
1: \documentclass[11pt]{amsart}

2:

3: \usepackage{fullpage}

4: \usepackage[latin1]{inputenc}

5: \usepackage{bbm,natbib}

6: \usepackage{amsmath}

7: \usepackage{stmaryrd}

8: \usepackage{alltt, amssymb}

9: \usepackage{graphicx}

10: \usepackage{url}

11:

12: \def\Jac{{\mathbf{Jac}}}

13:

14: \newcommand{\sC}{\mathsf{C}}

15: \newcommand{\sL}{{\mathsf{L}}}

16: \newcommand{\sM}{{\mathsf{M}}}

17: \newcommand{\N}{\mathbb{N}}

18: \newcommand{\cO}{{\mathcal O}}

19: \newcommand{\order}{{r}}

20: \newcommand{\precision}{{N}}

21: \newcommand{\basefield}{{\mathbb{K}}}

22:

23: \newcommand{\Mat}{{\mathsf{MM}}}

24: \renewcommand{\proof}{\noindent\textsc{Proof.} }

25: \newcommand{\foorp}{\hfill$\square$}

26: \newcommand{\tr}{\mathrm{trace}}

27:

28: \newcommand{\trunc}[3]{\left[ #1 \right]_{#2}^{#3}}

29: \newcommand{\truncl}[2]{\left\lfloor #1 \right\rfloor_{#2}}

30: \newcommand{\trunch}[2]{\left\lceil #1 \right\rceil^{#2}}

31: \newcommand{\intpart}[1]{\left\lfloor #1 \right\rfloor}

32:

33: \newtheorem{Theo}{Theorem}

34: \newtheorem{Prop}{Proposition}

35: \newtheorem{Lemme}{Lemma}

36:

37:

38: \usepackage{graphicx}

39: \usepackage{changebar}

40: \usepackage[plainpages=false,pdfpagelabels,colorlinks=true,citecolor=blue,hypertexnames=false]{hyperref}

41:

42:

43: \begin{document}

44:

45: \title{Fast computation of power series solutions \\ of systems of

46:   differential equations}

47:

48: \author{A. Bostan, F. Chyzak, F. Ollivier, B. Salvy, \'E. Schost, and A. Sedoglavic}

49: \thanks{Partially supported by a grant from the French \emph{Agence nationale pour la recherche}.}

50: %\date{Preliminary version 1.10 --- 11/04/2006}

51:

52: \begin{abstract}

53:   We propose new algorithms for the computation of the first~$\precision$ terms

54:   of a vector (resp.\ a basis) of power series solutions of a linear

55:   system of differential equations at an ordinary point, using a

56:   number of arithmetic operations which is quasi-linear with respect

57:   to~$\precision$.  Similar results are also given in the non-linear case. This extends

58:   previous results obtained by Brent and

59:   Kung for scalar differential equations of order one and two.

60: \end{abstract}

61: \maketitle

62:

63: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

64: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

65: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

66:

67: \section{Introduction}

68:

69: In this article, we are interested in the computation of the first $\precision$ terms of power

70: series solutions of differential equations. This problem arises in

71: combinatorics, where the desired power series is a generating

72: function, as well as in numerical analysis and in particular in

73: control theory.

74:

75: Let~$\basefield$ be a field. Given~$r+1$ formal power

76: series~${a_0(t),\dots,a_{\order}(t)}$ in~$\basefield[[t]]$, one of

77: our aims is to provide fast algorithms for solving the

78: linear differential equation

79: % of order $\order$:

80: \begin{equation} \label{lindiffeq}

81: a_\order(t) y^{(\order)}(t) + \dots + a_1(t) y'(t)+ a_0(t) y(t) = 0. %

82: \end{equation}

83: Specifically, under the hypothesis that~$t=0$ is an ordinary point

84: for Equation~\eqref{lindiffeq} (i.e., ${a_r(0) \neq 0}$), we give efficient

85: algorithms taking as input the first~$\precision$ terms of the power

86: series $a_0(t), \dots, a_\order(t)$ and answering the following algorithmic questions:

87: \begin{enumerate}

88: \item[{\bf i.}]  find the first~$\precision$ coefficients of

89:   the~$\order$ elements of a basis of power series solutions

90:   of~\eqref{lindiffeq};

91: \item[{\bf ii.}] given initial conditions~$\alpha_0, \dots,

92:   \alpha_{\order-1}$ in~$\basefield$, find the first~$\precision$

93:   coefficients of the unique solution~$y(t)$ in~$\basefield[[t]]$ of

94:   Equation~\eqref{lindiffeq} satisfying

95: \[

96:   y(0) = \alpha_0,\quad y'(0) = \alpha_1, \quad \dots,\quad y^{(\order-1)}(0) =

97:   \alpha_{\order-1}.

98: \]

99: \end{enumerate}

100: More generally, we also treat linear first-order systems of differential

101: equations. From the data of initial conditions~$v$

102: in~$\mathcal{M}_{\order\times\order} (\basefield)$

103: (resp.~$\mathcal{M}_{{\order} \times 1} (\basefield)$) and of the

104: first~$\precision$ coefficients of each entry of the matrices~$A$

105: and~$B$ in~$\mathcal{M}_{\order\times\order} (\basefield[[t]])$ (resp.~$b$

106: in~$\mathcal{M}_{{\order} \times 1} (\basefield[[t]])$), we propose

107: algorithms that compute the first~$\precision$ coefficients:

108: \begin{enumerate}

109: \item[\bf I.]  of a fundamental solution~$Y$ in~$\mathcal{M}_{\order\times\order}

110:   (\basefield[[t]])$ of~${Y' = AY + B}$, with~${Y(0)=v},\;{\det Y(0) \neq 0}$;

111: \item[\bf II.]  of the unique solution~$y(t)$

112:   in~$\mathcal{M}_{{\order} \times 1} (\basefield[[t]])$ of~${y' = Ay

113:     + b}$, satisfying~${y(0) =v}$.

114: \end{enumerate}

115: %% \begin{equation}\label{systlindiffeq:basis}

116: %% Y' = AY + B, \quad \text{with} \; A,B \in \mathcal{M}_{\order} (\basefield[[t]])

117: %% \end{equation}

118: %% and

119: %% \begin{equation}\label{systlindiffeq:single}

120: %% y' = Ay + b

121: %% \end{equation}

122: Obviously, if an algorithm of algebraic complexity~$\sC$ (i.e.,

123: using~$\sC$ arithmetic operations in~$\basefield$) is available for

124: problem~{\bf II}, then applying it~$r$ times solves problem~{\bf I} in

125: time~$r \,\sC$, while applying it to a companion matrix solves

126: problem~{\bf ii} in time~$\sC$ and problem~{\bf i} in~$r

127: \,\sC$. Conversely, an algorithm solving~{\bf i} (resp. {\bf I}) also

128: solves {\bf ii} (resp. {\bf II}) within the same complexity, plus that

129: of a linear combination of series. Our reason for distinguishing the

130: four problems {\bf i, ii, I, II} is that in many cases, we are able to

131: give algorithms of better complexity than obtained by these

132: reductions.
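
To make the companion-matrix reduction concrete, here is a sketch in notation of our own (the vector~$w$ and the explicit layout of the matrix are introduced for illustration only). Since~${a_\order(0)\neq 0}$, the series~$a_\order$ is invertible in~$\basefield[[t]]$, and setting~${w = (y, y', \dots, y^{(\order-1)})^T}$ turns Equation~\eqref{lindiffeq} into the first-order system
\[
w' = A\,w, \qquad
A = \begin{pmatrix}
0 & 1 & & \\
 & \ddots & \ddots & \\
 & & 0 & 1 \\
-a_0/a_\order & -a_1/a_\order & \cdots & -a_{\order-1}/a_\order
\end{pmatrix},
\qquad
w(0) = (\alpha_0, \alpha_1, \dots, \alpha_{\order-1})^T .
\]
For instance, for~${\order=2}$ the equation~${a_2 y'' + a_1 y' + a_0 y = 0}$ becomes
\[
\begin{pmatrix} y \\ y' \end{pmatrix}' =
\begin{pmatrix} 0 & 1 \\ -a_0/a_2 & -a_1/a_2 \end{pmatrix}
\begin{pmatrix} y \\ y' \end{pmatrix},
\]
and the first entry of the solution vector is the solution~$y$ sought in problem~{\bf ii}.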

133:

134: The most popular way of solving~{\bf i}, {\bf ii}, {\bf I}, and~{\bf II} is the

135: method of undetermined coefficients that requires~$\cO(\order^2

136: \precision^2)$ operations in~$\basefield$ for problem~{\bf i}

137: and~$\cO(\order \precision^2)$ operations in~$\basefield$ for~{\bf ii}.

138:   Regarding the dependence in~$\precision$, this is certainly

139: too expensive compared to the size of the output, which is only linear

140: in~$\precision$ in both cases. On the other hand, verifying the

141: correctness of the output for~{\bf ii} (resp.~{\bf i}) already

142: requires a number of operations in~$\basefield$ which is linear

143: (resp.\ quadratic) in~$\order$: this indicates that there is little

144: hope of improving the dependence in~$\order$.  Similarly, for

145: problems~{\bf I} and~{\bf II}, the method of undetermined coefficients

146: requires~$\cO(\precision^2)$ multiplications of~$\order\times \order$

147: scalar matrices (resp.\ of scalar matrix-vector products in

148: size~$\order$), leading to a computational cost which is reasonable

149: with respect to~$\order$, but not with respect to~$\precision$.

150:

151: By contrast, the algorithms proposed in this article have costs that

152: are linear (up to logarithmic factors) in the

153: complexity~$\sM(\precision)$ of polynomial multiplication in degree

154: less than~$\precision$ over~$\basefield$. Using Fast Fourier Transform

155: (FFT) these costs become nearly linear~---~up to polylogarithmic

156: factors~---~with respect to~$\precision$, for all of the four problems

157: above (precise complexity results are stated below).  Up to these

158: polylogarithmic terms in~$\precision$, this estimate is probably not

159: far from the lowest algebraic complexity one can expect: indeed, the

160: mere check of the correctness of the output requires, in each case, a

161: computational effort proportional to~$\precision$.

162:

163: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

164:

165: \subsection{Newton Iteration}

166: In the case of first-order equations ($r=1$), Brent and Kung have

167: shown in~\cite{BrKu78} (see also~\cite{Geddes1979,KuTr78}) that the problems

168: can be solved with complexity $\cO(\sM(\precision))$ by means of a

169: formal Newton iteration. Their algorithm is based on the fact that

170: solving the first-order differential equation~${y'(t) = a(t) y(t)}$,

171: with~$a(t)$ in~$\basefield[[t]]$ is equivalent to computing the

172: \emph{power series exponential\/}~$\exp(\int a(t))$.  This equivalence

173: is no longer true in the case of a system~${Y' = A(t) Y}$

174: (where~$A(t)$ is a power series matrix): for non-commutativity

175: reasons, the matrix exponential~${Y(t)= \exp(\int A(t))}$ is not a

176: solution of~${Y' = A(t) Y}$.
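
As a quick check of this claim (a worked example of ours, assuming that~$2$ and~$3$ are invertible in~$\basefield$), take~${A(t) = A_0 + tA_1}$ with constant matrices~$A_0$ and~$A_1$, and let~${E = \exp(\int A) = \exp(tA_0 + \tfrac{t^2}{2}A_1)}$. Expanding the exponential gives
\[
E = I_\order + tA_0 + \tfrac{t^2}{2}\left(A_1 + A_0^2\right)
  + t^3\left(\tfrac14\,(A_0A_1 + A_1A_0) + \tfrac16\,A_0^3\right) + \cO(t^4),
\]
and comparing the coefficients of~$E'$ and~$AE$ up to degree~$2$ yields
\[
E' - AE = \tfrac{t^2}{4}\left(A_0A_1 - A_1A_0\right) + \cO(t^3).
\]
Hence~$E$ satisfies~${Y'=AY}$ modulo~$t^3$ only when~$A_0$ and~$A_1$ commute.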

177:

178: Brent and Kung suggest a way to extend their result to higher orders,

179:  and the corresponding algorithm has been shown by van der Hoeven

180:  in~\cite{vdHoeven02} to have complexity~$\cO(\order^\order

181:  \,\sM(\precision))$. This is good with respect to~$\precision$, but

182:  the exponential dependence in the order~$\order$ is unacceptable.

183:

184: Instead, we solve this problem by devising a specific Newton iteration

185: for~${Y' = A(t) Y}$.  Thus we solve problems {\bf i} and {\bf I} in

186: $\cO(\Mat(\order,\precision))$, where $\Mat(\order,\precision)$ is the

187: number of operations in $\basefield$ required to multiply

188: $\order\times\order$ matrices with polynomial entries of degree less

189: than~$\precision$. For instance, when $\basefield=\mathbb{Q}$, this is

190: $\cO(\order^\omega \precision+r^2\sM(\precision))$, where

191: $\order^\omega$~can be seen as an abbreviation for~$\Mat(\order,1)$, see

192: \S\ref{ssec:complexity} below.

193:

194: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

195:

196: \subsection{Divide-and-conquer}

197: The resolution of problems {\bf i} and {\bf I} by Newton iteration

198: relies on the fact that a whole basis is computed. Dealing with

199: problems {\bf ii} and {\bf II}, we do not know how to preserve this

200: algorithmic structure, while simultaneously saving a factor $\order$.

201:

202: To solve problems~{\bf ii} and~{\bf II}, we therefore propose an

203: alternative algorithm, whose complexity is also nearly linear

204: in~$\precision$ (but not quite as good, being in

205: $\cO(\sM(\precision)\log\precision)$), but whose dependence in the

206: order~$\order$ is better: linear for~{\bf ii} and quadratic for~{\bf II}.

207: In a different model of computation with power series, based on

208: the so-called \emph{relaxed multiplication}, van der Hoeven briefly outlines

209: another algorithm~\cite[Section~4.5.2]{vdHoeven02} solving

210: problem~{\bf ii} in~$\cO(\order \,\sM(\precision) \log \precision)$.

211: To our knowledge, this result cannot be transferred to the usual model

212: of power series multiplication (called zealous in~\cite{vdHoeven02}).

213:

214: We use a divide-and-conquer technique similar to that used in the fast

215: Euclidean algorithm~\cite{Knuth70,Schonhage71,Strassen83}. For

216: instance, to solve problem~{\bf ii}, our algorithm divides it into two

217: similar problems of halved size. The key point is that the lowest

218: coefficients of the solution~$y(t)$ only depend on the lowest

219: coefficients of the coefficients~$a_i$.  Our algorithm first computes

220: the desired solution~$y(t)$ at precision only~$\precision/2$, then it

221: recovers the remaining coefficients of~$y(t)$ by recursively solving

222: at precision~$\precision/2$ a new differential equation.  The main

223: idea of this second algorithm is close to that used for solving

224: first-order difference equations in~\cite{GaGe97}.
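
The splitting step can be illustrated on the scalar first-order equation~${y' = ay + b}$. This is a sketch of ours: the symbols~$m$, $y_0$, $y_1$, $R$ and~$R_1$ are introduced here for illustration only, and the general algorithm may be organized differently. Let~${m = \precision/2}$ and write~${y = y_0 + t^m y_1}$ with~${y_0 = \trunch{y}{m}}$. Since~$y_0$ agrees with~$y$ modulo~$t^m$, the residual~${R = a y_0 + b - y_0'}$ is zero modulo~$t^{m-1}$, say~${R = t^{m-1} R_1}$. Substituting~${y = y_0 + t^m y_1}$ into the equation gives
\[
t^m y_1' + m\,t^{m-1} y_1 = t^m a\, y_1 + t^{m-1} R_1,
\qquad\text{that is,}\qquad
t\,y_1' + m\,y_1 = t\,a\,y_1 + R_1 .
\]
The upper half~$y_1$ therefore satisfies an equation of similar shape, which only needs to be solved at precision~${\precision - m}$, and~$R_1$ is obtained from~$y_0$ by one multiplication of truncated series. Under these assumptions the cost obeys a recurrence of the form~${\sC(\precision) = 2\,\sC(\precision/2) + \cO(\sM(\precision))}$, which is consistent with the bound~$\cO(\sM(\precision)\log\precision)$ quoted above.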

225:

226: We encapsulate our main complexity results in

227: Theorem~\ref{theo:linear} below.  When FFT is used, the

228: functions~$\sM(\precision)$ and~$\Mat(\order,\precision)$ grow linearly

229: in~$\precision$ up to logarithmic factors, see

230: \S\ref{ssec:complexity}. Thus, the results in the following theorem are quasi-optimal.

231: \begin{Theo}\label{theo:linear}

232:   Let~$\precision$ and~$\order$ be two positive integers and

233:   let\/~$\basefield$ be a field of characteristic zero or at

234:   least~$\precision$. Then:

235:   \begin{enumerate}

236:   \item[(a)] problems\/~{\bf i} and\/~{\bf I} can be solved

237:     using~$\cO\left(\Mat(\order,\precision) \right)$ operations

238:     in~$\basefield$;

239:   \item[(b)] problem\/~{\bf ii} can be solved using~$\cO\left(\order \,

240:     \sM (\precision) \log \precision\right)$ operations in~$\basefield$;

241:   \item[(c)] problem\/~{\bf II} can be solved using~$\cO\left(\order^2 \,

242:     \sM (\precision) \log \precision\right)$ operations in~$\basefield$.

243:   \end{enumerate}

244: \end{Theo}

245:

246: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

247:

248: \subsection{Special Coefficients}

249: For special classes of coefficients, we give different algorithms of

250: better complexity. We isolate two important classes of equations: that

251: with constant coefficients and that with polynomial coefficients.  In

252: the case of constant coefficients, our algorithms are based on the use

253: of the Laplace transform, which allows us to reduce the resolution of

254: differential equations with constant coefficients to manipulations

255: with rational functions.  The complexity results are summarized in the following theorem.

256: \begin{Theo}

257:   Let~$\precision$ and~$\order$ be two positive integers and

258:   let\/~$\basefield$ be a field of characteristic zero or at

259:   least~$\precision$. Then, for differential equations and systems with constant coefficients:

260:   \begin{enumerate}

261:   \item[(a)] problem\/~{\bf i} can be solved

262:     using~$\cO\left(\sM(\order)\,(\order+\precision) \right)$ operations

263:     in~$\basefield$;

264:   \item[(b)] problem\/~{\bf ii} can be solved using~$\cO\left(\sM(\order)\,(1+\precision/\order)\right)$ operations in~$\basefield$;

265:   \item[(c)] problem\/~{\bf I} can be solved using~$\cO\left( \order^{\omega+1}\log\order + \order\sM(\order)\precision \right)$ operations in~$\basefield$;

266:   \item[(d)] problem\/~{\bf II} can be solved using~$\cO\left( \order^\omega\log\order + \sM(\order)\precision \right)$ operations in~$\basefield$.

267:   \end{enumerate}

268: \end{Theo}

269: In the case of polynomial coefficients, we

270: exploit the linear recurrence satisfied by the coefficients of

271: solutions.  In Table~\ref{table1}, we gather the complexity estimates

272: corresponding to the best known solutions for each of the four

273: problems {\bf i}, {\bf ii}, {\bf I}, and~{\bf II} in the general case,

274: as well as in the above mentioned special cases. The algorithms are described in Section~\ref{sec:particular}.  In the polynomial

275: coefficients case, these results are well known. In the other cases,

276: to the best of our knowledge, the results improve upon existing

277: algorithms.

278:

279: \begin{table}

280: \renewcommand{\arraystretch}{1.4}

281: $$\begin{array}{||l|l|l|l||l||}\hline\hline   % & & & & \\

282:  \textsf{Problem} & \textsf{constant} & \textsf{polynomial}

283: & \textsf{power series} & \textsf{output}\\[-2mm]

284: \quad (\textsf{input, output}) & \textsf{coefficients} &

285: \textsf{coefficients} & \textsf{coefficients} & \textsf{size} \\

286: % & & & & \\

287:

288: \hline \hline \textbf{i} \quad  (\textsf{equation, basis}) &   \cO(\sM(\order)

289:  \precision) \;\hfill^\star & \cO(d \order^2 \precision) &   \cO(

290:  \Mat(\order, \precision)) \;\hfill ^\star &  \cO(\order \precision)\\

291:

292: \hline  \textbf{ii} \quad  (\textsf{equation, one solution}) &

293:  \cO(\sM(\order)  \precision/\order) \;\hfill^\star  &\cO(d

294:  \order \precision)  &\cO(\order \, \sM(\precision) \log \precision) \;\hfill ^\star & \cO(\precision)\\

295:

296:  \hline \hline

297:  \textbf{I} \quad (\textsf{system, basis}) &

298:  \cO(\order \sM(\order)

299:  \precision) \;\hfill^\star &  \cO(d \order^\omega \precision)

300:  & \cO(\Mat(\order, \precision))  \;\hfill ^\star  & \cO(\order^2 \precision)\\

301:

302: \hline \textbf{II} \quad  (\textsf{system, one solution})  &

303:  \cO(\sM(\order) \precision) \;\hfill ^\star &  \cO(d \order^2

304:  \precision)  & \cO(\order^2 \, \sM(\precision) \log

305:  \precision) \;\hfill ^\star  & \cO(\order \precision)\\

306:

307: \hline\hline

308: %\quad \quad \textsf{Input size} & \cO(\order^2)  &  \cO(d \order^2)  & \cO(\order^2

309: % \precision) \\ \hline \hline

310: \end{array}$$

311: \caption{Complexity of solving linear differential equations/systems for~$\precision\gg\order$.  Entries marked with a~`$\star$' correspond to new results. \label{table1}}

312: \end{table}

313:

314:

315:

316: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

317:

318: \subsection{Non-linear Systems}  As an important

319: consequence of Theorem~\ref{theo:linear}, we improve the known

320: complexity results for the more general problem of solving

321: \emph{non-linear} systems of differential equations. To do so, we use

322: a classical reduction technique from the non-linear to the linear

323: case, see for instance~\cite[Section~25]{Rall69}

324: and~\cite[Section~5.2]{BrKu78}. For simplicity, we only consider

325: non-linear systems of first order. There is no loss of generality in

326: doing so: more general cases can be reduced to that one by adding new

327: unknowns and possibly differentiating once. The following result

328: generalizes~\cite[Theorem~5.1]{BrKu78}.  If~${F=(F_1,\dots,F_r)}$ is a

329: differentiable function bearing on~$\order$

330: variables~${y_{1},\dots,y_{\order}}$, we use the notation~$\Jac(F)$

331: for the Jacobian matrix~$(\partial F_i/\partial y_j)_{1\leq i,j \leq \order}$.

332:

333: \begin{Theo}\label{theo:non-linear}

334:   Let~$\precision$, $\order$ be in~$\mathbb{N}$, let~$\basefield$ be a

335:   field of characteristic zero or at least\/~$\precision$ and

336:   let~$\varphi$ denote~${(\varphi_1,\dots,\varphi_{\order})}$,

337:   where~$\varphi_i(t,y)$ are multivariate power series

338:   in\/~$\basefield[[t,y_1,\dots,y_\order]]$.

339:   \par

340:   Let\/~${\sL :\N \to \N}$ be such that for all~$s(t)$

341:   in\/~$\mathcal{M}_{\order \times 1}(\basefield[[t]])$ and for all~$n$

342:   in\/~$\mathbb{N}$, the first~$n$ terms of~$\varphi(t,s(t))$ and

343:   of\/~$\Jac ({\varphi}) (t,s(t))$ can be computed in~$\sL(n)$ operations

344:   in\/~$\basefield$.  Suppose in addition that the function~${n \mapsto

345:     \sL(n)/n}$ is increasing. Given initial conditions~$v$

346:   in\/~$\mathcal{M}_{\order \times 1}(\basefield)$, if the differential

347:   system

348:   \[y'=\varphi(t,y),\qquad y(0)=v,\] admits a solution

349:   in\/~$\mathcal{M}_{\order \times 1} (\basefield[[t]])$, then the

350: first\/~$\precision$ terms of such a solution~$y(t)$ can be computed in

351: %  $\cO(\sL(N) + \Mat(\order,\precision))$ operations in $\basefield$.

352: %  $\cO(\sL(N) + \order^2 \sM(\precision) \log \precision)$ operations in $\basefield$.

353:   $\cO \left(\sL(\precision) + \min (\Mat(\order,\precision), \order^2

354:   \sM(\precision) \log \precision) \right)$ operations in~$\basefield$.

355: \end{Theo}

356: Werschulz~\cite[Theorem~3.2]{Werschulz80} gave an algorithm solving

357: the same problem using the integral Volterra-type equation technique

358: described in~\cite[pp.~172--173]{Rall69}.  With our notation, his

359: algorithm uses~$\cO \left(\sL(\precision) + \order^2 \precision \, \sM(\precision) \right)$

360: operations in~$\basefield$ to compute a

361: solution at precision~$\precision$. Thus, our algorithm is an

362: improvement for cases where $\sL(\precision)$ is known to be

363: subquadratic with respect to~$\precision$.

364:

365: The best known algorithms for power series composition in~${\order

366: \geq 2}$ variables require, at least on ``generic'' entries, a

367: number~${\sL(n) = \cO(n^{\order-1} \sM(n))}$ of operations in

368: $\basefield$ to compute the first~$n$ coefficients of the

369: composition~\cite[Section~3]{BrKu77}.  This complexity is nearly

370: optimal with respect to the size of a generic input. By contrast, in

371: the univariate case, the best known result~\cite[Th.~2.2]{BrKu78}

372: is~$\sL(n) = \cO(\sqrt{n \log n}\, \sM(n))$. For special entries,

373: however, better results can be obtained, already in the univariate

374: case: exponentials, logarithms, powers of univariate power series can

375: be computed~\cite[Section~13]{Brent75} in~$\sL(n) = \cO(\sM(n))$. As a

376: consequence, if~$\varphi$ is an~$\order$-variate sparse polynomial

377: with $m$~monomials of \emph{any} degree, then~$\sL(n) = \cO(m \order \,

378: \sM(n))$.

379:

380: Another important class of systems with such a

381: subquadratic~$\sL(\precision)$ is provided by \emph{rational systems},

382: where each~$\varphi_i$ is in~$\basefield(y_1,\dots,y_\order)$.

383: Supposing that the complexity of evaluation of~$\varphi$ is bounded

384: by~$L$ (i.e., for any point~$z$ in~$\basefield^\order$ at

385: which~$\varphi$ is well-defined, the value~$\varphi(z)$ can be

386: computed using at most~$L$ operations in~$\basefield$), then, the

387: Baur-Strassen theorem~\cite{BaSt83} implies that the complexity of

388: evaluation of the Jacobian~$\Jac(\varphi)$ is bounded by~$5L$, and

389: therefore, we can take~${\sL(n)= \sM(n) L}$ in the statement of

390: Theorem~\ref{theo:non-linear}.

391:

392: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

393:

394: \subsection{Basic Complexity Notation} \label{ssec:complexity}

395:

396: Our algorithms ultimately use, as a basic operation, multiplication

397: of matrices with entries that are polynomials (or truncated power

398: series).  Thus, to estimate their complexities in a unified manner,

399: we use a function~${\Mat : \N \times \N \to \N}$ such that any two~${r

400:   \times r}$ matrices with polynomial entries in~$\basefield[t]$ of

401: degree less than~$d$ can be multiplied using~$\Mat(r,d)$ operations

402: in~$\basefield$. In particular,~$\Mat(1,d)$ represents the number of

403: base field operations required to multiply two polynomials of degree

404: less than~$d$, while~$\Mat(r,1)$ is the arithmetic cost of scalar~${r

405:   \times r}$ matrix multiplication. For simplicity, we

406: denote~$\Mat(1,d)$ by~$\sM(d)$ and we have~${\Mat(r,1) =

407:   \cO(r^\omega)}$, where~${2 \leq \omega \leq 3}$ is the so-called {\em

408:   exponent of the matrix multiplication}, see, e.g.,~\cite{BuClSh97}

409: and~\cite{GaGe99}.

410:

411: Using the algorithms of~\cite{ScSt71,CaKa91}, one can take~$\sM(d)$

412: in~$\cO(d \log d \log \log d)$; over fields supporting FFT, one can

413: take~$\sM(d)$ in~$\cO(d\log d)$.  By~\cite{CaKa91} we can always

414: choose~$\Mat(r,d)$ in~${\cO(r^\omega \, \sM(d))}$, but better

415: estimates are known in important particular cases.  For instance, over

416: fields of characteristic~$0$ or larger than~$2d$, we have~${\Mat(r,d)

417:   = \cO( r^\omega d + r^2 \, \sM(d))}$, see~\cite[Th.~4]{BoSc05}.  To

418: simplify the complexity analyses of our algorithms, we suppose that the

419: {multiplication cost} function~$\Mat$ satisfies the following standard

420: growth hypotheses for all integers~$d_{1},d_{2}$ and~$r$: %(see, e.g., \cite{GaGe99}).

421: \begin{equation}\label{hyp:Mat}

422: \Mat(r,d_{1}d_{2}) \leq d_{1}^{2} \Mat (r,d_{2})

423: \qquad \text{and}  \qquad

424: \frac{\Mat(r,d_{1})}{d_{1}} \leq \frac{\Mat(r,d_{2})}{d_{2}}

425: \quad  \text{if $d_{1} \leq d_{2}$}.

426: \end{equation}

427: In particular, Equation~\eqref{hyp:Mat} implies the inequalities

428: \begin{equation} \label{ineq:Mat}

429: \begin{split}

430: 	\Mat(r,2^\kappa)+\Mat(r,2^{\kappa-1})+\Mat(r,2^{\kappa-2})+\dots+\Mat(r,1)&

431: 		\le 2\Mat(r,2^\kappa),\\

432: 	\sM(2^\kappa)+2\sM(2^{\kappa-1})+4\sM(2^{\kappa-2})+\dots+2^\kappa\sM(1)&

433: 		\le (\kappa+1)\sM(2^\kappa).

434: \end{split}

435: \end{equation}

436: These inequalities are crucial to prove the estimates in

437: Theorem~\ref{theo:linear} and Theorem~\ref{theo:non-linear}.  Note

438: also that when the available multiplication algorithm is slower than

439: quasi-linear (e.g., Karatsuba or naive multiplication), then in the

440: second inequality, the factor~$(\kappa+1)$ can be replaced by a constant

441: and thus the estimates $\sM(\precision)\log \precision$ in our complexities become

442: $\sM(\precision)$ in those cases.
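
For completeness, here is one way to derive~\eqref{ineq:Mat} from~\eqref{hyp:Mat} (a short verification of ours). Applying the second hypothesis in~\eqref{hyp:Mat} with~${d_1 = 2^{\kappa-j}}$ and~${d_2 = 2^{\kappa}}$ gives, for~${0 \le j \le \kappa}$,
\[
\Mat(r,2^{\kappa-j}) \le 2^{-j}\,\Mat(r,2^{\kappa}),
\qquad\text{hence}\qquad
\sum_{j=0}^{\kappa} \Mat(r,2^{\kappa-j})
\le \Bigl(\sum_{j=0}^{\kappa} 2^{-j}\Bigr)\Mat(r,2^{\kappa})
\le 2\,\Mat(r,2^{\kappa}).
\]
Taking~${r=1}$ in the same bound and multiplying by~$2^j$ gives~${2^j\,\sM(2^{\kappa-j}) \le \sM(2^{\kappa})}$, and summing over the~${\kappa+1}$ values of~$j$ yields
\[
\sum_{j=0}^{\kappa} 2^j\,\sM(2^{\kappa-j}) \le (\kappa+1)\,\sM(2^{\kappa}).
\]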

443:

444: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

445:

446: \subsection{Notation for Truncation}

447:

448: Our algorithms repeatedly split a polynomial into a lower and a

449: higher part. To this end, the following notation proves convenient.

450: Given a polynomial~$f$, the remainder and quotient of its Euclidean

451: division by~$t^k$ are respectively denoted $\trunch fk$ and~$\truncl

452: fk$.  Another occasional operation consists in taking a middle part

453: out of a polynomial.  To this end, we let $\trunc fkl$

454: denote~$\truncl{\trunch fl}{k}$.  Furthermore, we shall write $f=g\mod

455: t^k$ when two polynomials or series $f$ and~$g$ agree up to

456: degree~$k-1$ inclusive.  To get a nice behaviour of integration with

457: respect to truncation orders, all primitives of series are chosen with

458: zero as their constant coefficient.
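
As an illustration of this notation (an example of ours), take~${f = 1 + 2t + 3t^2 + 4t^3 + 5t^4}$. Then
\[
\trunch{f}{2} = 1 + 2t, \qquad
\truncl{f}{2} = 3 + 4t + 5t^2, \qquad
\trunc{f}{1}{3} = \truncl{\trunch{f}{3}}{1} = \truncl{1 + 2t + 3t^2}{1} = 2 + 3t,
\]
so that~${f = \trunch{f}{2} + t^2\,\truncl{f}{2}}$. With the convention on primitives, ${\int \trunch{f}{2} = t + t^2}$.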

459:

460: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

461: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

462: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

463:

464: \section{Newton Iteration for Systems of Linear Differential

465:   Equations}

466:

467: Let~${Y'(t) = A(t) Y(t)+B(t)}$ be a linear differential system,

468: where~$A(t)$ and~$B(t)$ are~${\order \times \order}$ matrices with

469: coefficients in~$\basefield[[t]]$. Given an invertible scalar

470: matrix~$Y_0$, an integer~${\precision \geq 1}$, and the expansions

471: of~$A$ and~$B$ up to precision~$\precision$, we show in this section

472: how to compute efficiently the power series expansion at

473: precision~$\precision$ of the unique solution of the Cauchy problem

474: $$Y'(t) = A(t) Y(t)+B(t) \quad \text{and} \quad Y(0) = Y_0.$$

475: This enables us to answer problems \textbf{I} and \textbf{i}, the

476: latter being a particular case of the former (through the application

477: to a companion matrix).

478:

479: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

480:

481: \subsection{Homogeneous Case}

482: First, we design a Newton-type iteration to solve the homogeneous

483: system~${Y'=A(t)Y}$.  The classical Newton iteration to solve an

484: equation $\phi(Y)=0$ is $Y_{\kappa+1}=Y_\kappa-U_\kappa$, where

485: $U_\kappa$ is a solution of the linearized equation

486: $D\phi|_{Y_\kappa}\cdot U=\phi(Y_\kappa)$ and $D\phi|_{Y_\kappa}$ is

487: the differential of~$\phi$ at~$Y_\kappa$. We apply this idea to the

488: map~${\phi: Y \mapsto Y'-AY}$. Since~$\phi$ is linear, it is its own

489: differential and the equation for~$U$ becomes

490: $$U'-AU=Y'_\kappa-AY_\kappa.$$

491: Taking into account the proper orders of truncation and using

492: Lagrange's method of variation of

493: parameters~\cite{Lagrange1869,Ince56}, we are thus led to the

494: iteration

495: \[\begin{cases}Y_{\kappa+1} &= Y_\kappa - \trunch {U_\kappa}

496: {2^{\kappa+1}},\\

497: U_{\kappa}& = Y_\kappa \int

498: \trunch{Y_\kappa^{-1}}{2^{\kappa+1}} \left(Y_\kappa' -

499:   \trunch{A}{2^{\kappa+1}} Y_\kappa\right).

500: \end{cases}

501: \]

502: Thus we need to compute (approximations

503: of) the solution~$Y$ and its inverse simultaneously.  Now, a well-known Newton

504: iteration for the inverse $Z$ of $Y$ is

505: \begin{equation}\label{Newton:inverse}

506: Z_{\kappa+1} =

507: \trunch

508:   {Z_{\kappa} + Z_{\kappa} (I_\order - Y Z_{\kappa})}

509:   {2^{\kappa+1}}.

510: \end{equation}

511:  It was introduced by Schulz~\cite{Schulz33} in

512: the case of real matrices; its version for matrices of power series is

513: given for instance in~\cite{MoCa79}.
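
A small scalar example (ours, with~${\order = 1}$) shows how iteration~\eqref{Newton:inverse} doubles the precision at each step. Take~${Y = 1 - t}$ and start from~${Z_0 = 1}$, which is the inverse of~$Y$ modulo~$t$. Then
\begin{align*}
Z_1 &= \trunch{Z_0 + Z_0\,(1 - Y Z_0)}{2} = \trunch{1 + t}{2} = 1 + t,\\
Z_2 &= \trunch{Z_1 + Z_1\,(1 - Y Z_1)}{4} = \trunch{(1 + t) + (1 + t)\,t^2}{4} = 1 + t + t^2 + t^3,
\end{align*}
where we used~${Y Z_1 = 1 - t^2}$. Indeed, $Z_1$ and~$Z_2$ agree with~${1/(1-t)}$ modulo~$t^2$ and~$t^4$, respectively.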

514:

515: \begin{figure}

516:   \begin{center}

517:     \fbox{\begin{minipage}{9cm}

518:       \medskip

519:       \begin{center}\textsf{SolveHomDiffSys}($A,\precision,Y_0$) \end{center}

520:       \textbf{Input:} ${Y_0,A_0, \dots, A_{\precision-2}}$

521:         in~$\mathcal{M}_{\order\times\order}(\basefield)$,

522:         ${A = \sum A_i t^i}$.

523:         \par\smallskip

524:         \textbf{Output:} ${Y=\sum_{i=0}^{\precision-1}Y_i t^i}$ in

525: $\mathcal{M}_{\order\times\order}(\basefield)[t]$ such that

526: ${Y' = A Y \mod t^{\precision-1}}$, and $Z=Y^{-1}\mod t^{\precision/2}$.

527:

528:         \begin{tabbing}

529:           \;\;\\$Y \leftarrow  (I_{\order}+ t  A_0) Y_0$

530:           \\$ Z \leftarrow Y_0^{-1}$

531:           \\$m \leftarrow 2$\\

532:           \textsf{while} $m \leq \precision/2$ \textsf{do}\\

533:           \hspace{0.5cm} $Z \leftarrow Z + \trunch {Z(I_{\order} - YZ)}{m} $\\

534:           \hspace{0.5cm} $Y \leftarrow Y - \trunch {Y\left(\int Z (Y' - \trunch{A}{2m-1} Y) \right)}{2m} $ \\

535:                                 %$+\sum_{i}\textsf{Coeff}(M', i)\frac{T^i}{i}$\\

536:           \hspace{0.5cm} $m \leftarrow 2m$ \\

537:           \textsf{return} $Y,Z$

538:         \end{tabbing}

539:       \end{minipage}

540:     }\end{center}

541:   \caption{Solving the Cauchy problem~$Y' = A(t) Y$,  $Y(0) = Y_0$ by Newton iteration.}

542:   \label{fig:hom}

543: \end{figure}

544: Putting together these considerations, we arrive at the algorithm

545: \textsf{SolveHomDiffSys} in Figure~\ref{fig:hom}, whose correctness

546: easily follows from Lemma~\ref{prop:Newton} below.  Note

547: that in the scalar case~(${\order=1}$) algorithm

548: \textsf{SolveHomDiffSys} coincides with the algorithm for power series

549: exponential proposed by Hanrot and Zimmermann~\cite{HaZi04}; see

550: also~\cite{Bernstein}. In the case~${\order>1}$, ours is a nontrivial

551: generalization of the latter. Because it takes primitives of series at

552: precision~$\precision$, algorithm \textsf{SolveHomDiffSys} requires

553: that the elements~${2,3,\dots,\precision-1}$ be invertible

554: in~$\basefield$. Its complexity~$\sC$ satisfies the

555: recurrence~${\sC(m) = \sC(m/2) + \cO(\Mat(\order,m))}$, which

556: implies, using the growth hypotheses~\eqref{hyp:Mat} on~$\Mat$, that~${\sC(\precision) = \cO(\Mat(\order,\precision))}$.

557: This proves the first assertion of

558: Theorem~\ref{theo:linear}.
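
To see the algorithm of Figure~\ref{fig:hom} at work, here is a hand trace on a toy input of our choosing: ${\order = 1}$, ${A = 1}$, ${Y_0 = 1}$ and~${\precision = 4}$, assuming that~$2$ and~$3$ are invertible in~$\basefield$. The initialization gives~${Y = 1 + t}$, ${Z = 1}$ and~${m = 2}$. The loop body is executed once, for~${m = 2}$:
\begin{align*}
Z &\leftarrow 1 + \trunch{1 - (1 + t)}{2} = 1 - t,\\
Y' - \trunch{A}{3}\,Y &= 1 - (1 + t) = -t,\\
\textstyle\int Z\,(Y' - \trunch{A}{3}\,Y) &= \textstyle\int \left(-t + t^2\right) = -\tfrac{t^2}{2} + \tfrac{t^3}{3},\\
Y &\leftarrow (1 + t) - \trunch{(1 + t)\left(-\tfrac{t^2}{2} + \tfrac{t^3}{3}\right)}{4}
   = 1 + t + \tfrac{t^2}{2} + \tfrac{t^3}{6}.
\end{align*}
Then~$m$ becomes~$4$ and the loop stops. The output~$Y$ is the expansion of~$\exp(t)$ modulo~$t^4$, it satisfies~${Y' - AY = -t^3/6 = 0 \mod t^{3}}$, and~${Z = 1 - t}$ is the inverse of~$Y$ modulo~$t^{2}$, as specified.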

559: %   It computes simultaneously the solutions $(Y,Z)$ of the problems

560: %   $$Y'-AY = 0 \bmod t^{\precision-1} \quad \text{and} \quad Z'+Z A = 0 \bmod

561: %   t^{\precision/2-1}.$$

562: %\bigskip

563:

564: % This is based on the following result, allowing to double the

565: % precision of the solution, by using only polynomial matrix operations.

566: \smallskip

567:

568:  \begin{Lemme}\label{prop:Newton}

569:    Let~$m$ be an even integer. Suppose

570:    that~$Y_{(0)}, Z_{(0)}$ in~$\mathcal{M}_{\order\times\order}(\basefield[t])$ satisfy

571:    \begin{equation*}

572:      I_{\order} - Y_{(0)} Z_{(0)} = 0 \mod t^{m/2} \quad \text{and} \quad

573:      Y_{(0)}' - AY_{(0)} = 0 \mod t^{m-1},

574:    \end{equation*}

575: and that $Y_{(0)}$ and~$Z_{(0)}$ are of degree less than $m$ and~$m/2$, respectively.

576:    Define

577:    \begin{equation*}

578:      Z:=\trunch {Z_{(0)} \left(2I_{\order}  - Y_{(0)} Z_{(0)} \right)} {m} \quad \text{and} \quad

579:      Y:=\trunch {Y_{(0)} \left(I_{\order} - \int Z  (Y_{(0)}'-AY_{(0)})  \right)} {2m}.

580:    \end{equation*}

581:    Then~$Y$ and~$Z$ satisfy the equations

582:    \begin{equation} \label{eq:double}

583:      I_{\order} - Y Z = 0 \mod t^{m} \quad \text{and} \quad

584:      Y' - AY = 0 \mod t^{2m-1}.

585:    \end{equation}

586:  \end{Lemme}

587: \proof Using the definitions of~$Y$ and~$Z$, it follows that

588: $$

589: I_{\order} - YZ = (I_{\order} -Y_{(0)} Z_{(0)})^2 - (Y -

590: Y_{(0)}) Z_{(0)} (2I_{\order} -Y_{(0)} Z_{(0)}) \mod t^m.

591: $$

592: Since~${I_{\order} -Y_{(0)} Z_{(0)}}$ is zero modulo~$t^{m/2}$ by hypothesis, and~${Y -

593:   Y_{(0)}}$ is zero modulo~$t^{m}$ because the integral defining~$Y$ is, the right-hand side is zero

594: modulo~$t^m$ and this establishes the first formula in

595: Equation~\eqref{eq:double}.  Similarly, write~${Q= \int Z

596:   (Y_{(0)}'-AY_{(0)})}$ and observe $Q=0\mod t^m$ to get the equality

597: $$

598: Y' - AY = (I_{\order}-YZ) (Y_{(0)}' - AY_{(0)}) - (Y_{(0)}' -

599: AY_{(0)}) Q \mod t^{2m-1}.

600: $$

601: Now,~${Y_{(0)}' - AY_{(0)}}=0 \mod t^{m-1}$, while~$Q$

602: and~${I_{\order} -YZ}$ are zero modulo~$t^{m}$ and therefore

603: the right-hand side of the last equation is zero modulo~$t^{2m-1}$,

604: proving the last part of the lemma.

605: \foorp
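\par\smallskip
As a sanity check, consider the scalar case with data chosen only for this illustration: ${\order=1}$, ${A=1}$, ${m=2}$, ${Y_{(0)}=1+t}$, ${Z_{(0)}=1}$, with $2$ and~$3$ invertible in~$\basefield$. The hypotheses hold, since ${1-Y_{(0)}Z_{(0)}=-t}$ and ${Y_{(0)}'-AY_{(0)}=-t}$ are both zero modulo~$t$. The definitions give
\begin{equation*}
  Z = \trunch{2-(1+t)}{2} = 1-t, \qquad
  \int Z\,(Y_{(0)}'-AY_{(0)}) = \int (t^2-t) = \frac{t^3}{3}-\frac{t^2}{2},
\end{equation*}
\begin{equation*}
  Y = \trunch{(1+t)\Bigl(1+\frac{t^2}{2}-\frac{t^3}{3}\Bigr)}{4} = 1+t+\frac{t^2}{2}+\frac{t^3}{6}.
\end{equation*}
One then checks that ${1-YZ = t^2/2+t^3/3+t^4/6}$ is zero modulo~$t^2$ and that ${Y'-AY=-t^3/6}$ is zero modulo~$t^3$, as predicted by~\eqref{eq:double}; here $Y$~is the expansion of~$\exp(t)$ modulo~$t^4$.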

606:

607: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

608:

609: \subsection{General Case}

610: We want to solve the equation~${Y'=AY +B}$, where~$B$ is an~${\order

611: \times \order}$ matrix with coefficients in~$\basefield[[t]]$.

612: Suppose that we have already computed the solution~$\widetilde{Y}$ of

613: the associated homogeneous equation~${\widetilde{Y}'=A \widetilde{Y}}$,

614: together with its inverse~$\widetilde{Z}$.  Then, by the method of

615: variation of parameters, ${Y_{(1)}= \widetilde{Y} \int \widetilde{Z}

616: B}$ is a particular solution of the inhomogeneous problem, thus the

617: solution with the same initial condition as~$\widetilde{Y}$ has the form~${Y = Y_{(1)}+\widetilde{Y}}$.
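\par\smallskip
The variation-of-parameters formula can be checked directly, assuming (as a convention for this check) that~$\int$ denotes the primitive with zero constant term: since~${\widetilde{Y}'=A\widetilde{Y}}$ and~${\widetilde{Y}\widetilde{Z}=I_\order}$,
\begin{equation*}
  Y_{(1)}' = \widetilde{Y}' \int \widetilde{Z}B + \widetilde{Y}\widetilde{Z}B
           = A\,\widetilde{Y} \int \widetilde{Z}B + B = A Y_{(1)} + B .
\end{equation*}
For instance, in the scalar case with~${A=B=1}$ and~${\widetilde{Y}(0)=1}$ over a field of characteristic zero, one has ${\widetilde{Y}=\exp(t)}$, ${\widetilde{Z}=\exp(-t)}$, hence ${Y_{(1)}=\exp(t)\,(1-\exp(-t))=\exp(t)-1}$ and ${Y_{(1)}+\widetilde{Y}=2\exp(t)-1}$, which indeed satisfies~${Y'=Y+1}$ and~${Y(0)=1}$.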

618:

619: \begin{figure}

620:   \begin{center}

621:     \fbox{\begin{minipage}{9.5cm}

622:         \medskip

623:         \begin{center}\textsf{SolveInhomDiffSys}($A,B,\precision,Y_0$) \end{center}

624:         \textbf{Input:} ${Y_0,A_0, \dots, A_{\precision-2}}$ in~$\mathcal{M}_{\order\times\order}(\basefield)$,

625:           ${A = \sum A_i t^i}$,

626:           \par\smallskip

627:           ${B_0, \dots, B_{\precision-2}}$ in~$\mathcal{M}_{\order\times\order}(\basefield)$,

628:             ${B(t) = \sum B_i t^i}$.

629:             \par\medskip

630:             \textbf{Output:} ${Y_1,\dots,Y_{\precision-1}}$

631:             in~$\mathcal{M}_{\order\times\order}(\basefield)$ such that ${Y=Y_0 + \sum Y_i

632:             t^i}$ satisfies~${Y' = A Y + B \mod t^{\precision-1}}$.

633:

634: \begin{tabbing}

635:   \;\;\\$\widetilde{Y},\widetilde{Z} \leftarrow \textsf{SolveHomDiffSys} (A,\precision,Y_0)$ \\

636:   $\widetilde{Z} \leftarrow \widetilde{Z} + \trunch {\widetilde{Z}(I_\order - \widetilde{Y}\widetilde{Z})} {\precision}$\\

637:   $Y \leftarrow \trunch {\widetilde{Y}   \int (\widetilde{Z}  B)} {\precision}$ \\

638:   $Y \leftarrow Y + \widetilde{Y}$\\

639:   \textsf{return} $Y$

640: \end{tabbing}

641: \end{minipage}

642: }\end{center}

643: \caption{Solving the Cauchy problem $Y' = A Y  + B, \; Y(0) = Y_0$ by Newton iteration.}

644: \label{fig:inhom}

645: \end{figure}

646:

647: Now, to compute the particular solution~$Y_{(1)}$ at

648: precision~$\precision$, we need to know both~$\widetilde{Y}$

649: and~$\widetilde{Z}$ at the same precision~$\precision$. To do this, we

650: first apply the algorithm for the homogeneous case and

651: iterate~\eqref{Newton:inverse} once. The resulting algorithm is

652: encapsulated in Figure~\ref{fig:inhom}.

653:

654: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

655: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

656: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

657:

658: \section{Divide-and-conquer Algorithm}\label{sec:DAC}

659:

660: We now give our second algorithm, which allows us to solve problems

661: {\bf ii} and~{\bf II} and to finish the proof of

662: Theorem~\ref{theo:linear}.  Before entering a detailed presentation,

663: let us briefly sketch the main idea in the particular case of a

664: homogeneous differential equation~${\mathcal{L}y=0}$,

665: where~$\mathcal{L}$ is a linear differential operator in~${\delta = t

666: \frac{d}{dt}}$ with coefficients in~$\basefield[[t]]$.

667: % FC, 11/04/2006: Je re-coupe, sinon cette phrase est trop longue !

668: (The introduction of~$\delta$ is only for pedagogical reasons.)  The

669: starting remark is that if a power series~$y$ is written as~${y_0 +

670: t^m y_1}$, then~${\mathcal{L}(\delta)y = \mathcal{L}(\delta)y_0 +

671: t^m\mathcal{L}(\delta + m)y_1}$. Thus, to compute a solution~$y$

672: of~${\mathcal{L}(\delta) y = 0 \mod t^{2m}}$, it suffices to determine

673: the lower part of~$y$ as a solution of ${\mathcal{L}(\delta) y_0 = 0

674: \mod t^m}$, and then to compute the higher part~$y_1$, as a solution

675: of the inhomogeneous equation~${\mathcal{L}(\delta + m) y_1 = - R \mod

676: t^{m}}$, where the remainder~$R$ is computed so that~${\mathcal{L}(\delta)

677: y_0 = t^m R \mod t^{2m}}$.
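\par\smallskip
The starting remark follows from a one-line computation. Writing (a notation used only for this check) ${\mathcal{L}(\delta)=\sum_k a_k(t)\,\delta^k}$ with~$a_k$ in~$\basefield[[t]]$, the product rule gives
\begin{equation*}
  \delta(t^m y_1) = t\,\bigl(m t^{m-1} y_1 + t^m y_1'\bigr) = t^m (\delta+m)\, y_1,
\end{equation*}
so that ${\delta^k(t^m y_1) = t^m(\delta+m)^k y_1}$ by induction on~$k$, and therefore
\begin{equation*}
  \mathcal{L}(\delta)(t^m y_1) = \sum_k a_k(t)\, t^m (\delta+m)^k y_1 = t^m \mathcal{L}(\delta+m)\, y_1 .
\end{equation*}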

678:

679: Our algorithm \textsf{DivideAndConquer} makes a recursive use of this idea. Since, during the

680: recursions, we are naturally led to treat inhomogeneous equations of a

681: slightly more general form than that of~{\bf II}, we introduce the

682: notation~$\mathcal{E}(s,p,m)$ for the vector equation

683: \begin{equation*}

684: t y' +   (p I_\order - tA) y =  s \mod t^{m}.

685: \end{equation*}

686: The algorithm is described in Figure~\ref{fig:algo-dac}.

687: Choosing~${p=0}$ and~${s(t) =t b(t)}$ we retrieve the equation of

688: problem~{\bf II}.  Our algorithm \textsf{Solve} to solve problem~{\bf

689: II} is thus a specialization of \textsf{DivideAndConquer}, defined by

690: making \textsf{Solve}$(A,b,\precision,v)$ simply call

691: \textsf{DivideAndConquer}$(A,tb,0,\precision,v)$. Its correctness relies on

692: the following immediate lemma.

693:

694: \begin{figure}

695:   \begin{center}

696:     \fbox{\begin{minipage}{8.5 cm}

697:         \medskip

698:         \begin{center}\textsf{DivideAndConquer($A,s,p,m,v$)} \end{center}

699:         \textbf{Input:} $A_0,\dots,A_{m-1}$ in~$\mathcal{M}_{\order\times\order}(\basefield)$,

700:         ${A = \sum A_i t^i}$, $s_0,\dots,s_{m-1},v$ in~$\mathcal{M}_{\order\times1}(\basefield)$,

701:         ${s = \sum s_i t^i}$, $p$ in~$\basefield$.

702:         \par\smallskip

703:         \textbf{Output:} ${y=\sum_{i=0}^{m-1}y_i t^i}$ in

704: $\mathcal{M}_{\order\times1}(\basefield)[t]$ such that

705: ${ty' + (pI_{\order}-tA)y=s \mod t^m}$, ${y(0)=v}$.

706:

707:         \begin{tabbing}

708:           \textsf{If}~$m=1$ \textsf{then} \\

709:           {\quad \textsf{if}} $p=0$ \textsf{then} \\

710:           {\quad \quad \textsf{return}} $v$\\

711:           {\quad \textsf{else}}  \textsf{return} $p^{-1} s(0)$\\

712:           \textsf{end if}\\

713:           $d \leftarrow \intpart{m/2}$\\

714:           $\widetilde{s} \leftarrow \trunch s{d}$\\

715:           $y_0 \leftarrow$ {\sf DivideAndConquer}($A,\widetilde{s},p,d,v$)\\

716:           $R \leftarrow \trunc{s- t y_0' - (p I_\order -tA) y_0}{d}{m} $ \\

717:           $y_1 \leftarrow$ {\sf DivideAndConquer}($A, R, p+d, m-d,v$)\\

718:           \textsf{return} $y_0 + t^d y_1$

719:         \end{tabbing}

720:       \end{minipage}

721:     }\end{center}

722:   \caption{Solving $ty' + (pI_{\order}-tA)y=s \mod t^m$, ${y(0)=v}$,

723:     by divide-and-conquer.}

724: \label{fig:algo-dac}\label{fig:2}

725:  \end{figure}

726:

727: \begin{Lemme}

728:   Let~$A$ in~$\mathcal{M}_{\order\times\order}(\basefield[[t]])$, $s$

729:   in~$\mathcal{M}_{\order \times 1}(\basefield[[t]])$, and let~$p,d$

730:   in~$\mathbb{N}$.  Decompose~$\trunch sm$ into a sum~${s_0 +

731:     t^d s_1}$.  Suppose that~$y_0$

732:   in~$\mathcal{M}_{\order\times1}(\basefield[[t]])$ satisfies the

733:   equation~$\mathcal{E}(s_0,p,d)$, set $R$ to be

734:   \begin{equation*}

735:     \trunch {(ty'_0 + (pI_\order - t A) y_0 - s_0)/t^d} {m-d},

736:   \end{equation*}

737:   and let~$y_1$ in~$\mathcal{M}_{\order \times 1}(\basefield[[t]])$ be

738:   a solution of the equation~${\mathcal{E}(s_1-R,p+d,m-d)}$.  Then the

739:   sum $y:= y_0 + t^d y_1$ is a solution of the

740:   equation~$\mathcal{E}(s,p,m)$.

741: \end{Lemme}
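\par\smallskip
The verification is a direct substitution; we sketch it assuming ${0 < d \le m}$. Since ${t\,(t^d y_1)' = t^d (t y_1' + d\, y_1)}$, one gets
\begin{equation*}
  t y' + (pI_\order - tA)\, y = \bigl(t y_0' + (pI_\order - tA)\, y_0\bigr)
   + t^d \bigl(t y_1' + ((p+d) I_\order - tA)\, y_1\bigr).
\end{equation*}
Modulo~$t^m$, the first bracket equals ${s_0 + t^d R}$ by definition of~$R$ (recall that $y_0$ satisfies~$\mathcal{E}(s_0,p,d)$), and the second one equals ${s_1 - R}$ modulo~$t^{m-d}$; the sum is thus ${s_0 + t^d s_1 = s \mod t^m}$.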

742:

743: The only divisions performed by our algorithm~\textsf{Solve} are by ${1, \dots, \precision-1}$.

744: As a consequence of this remark and of the previous lemma, we deduce the complexity estimates in the proposition below;

745: for a general matrix~$A$, this proves point~(c) in Theorem~\ref{theo:linear}, while the

746: particular case when $A$~is companion proves point~(b).
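\par\smallskip
As an illustration, here is a hand trace of \textsf{Solve}$(1,0,4,1)$, that is, of the scalar problem ${y'=y}$, ${y(0)=1}$ at precision~$4$, following Figure~\ref{fig:algo-dac}. We assume for this sketch that ${\trunc{f}{d}{m}}$ denotes the coefficients of ${t^d,\dots,t^{m-1}}$ in~$f$, shifted down by~$d$, and that $2$ and~$3$ are invertible in~$\basefield$. The top-level call has ${p=0}$, ${m=4}$, ${d=2}$.
\begin{enumerate}
\item The first recursive call (${p=0}$, ${m=2}$, ${d=1}$) returns ${v=1}$ for its lower part, computes ${R=\trunc{t}{1}{2}=1}$, and gets the upper part ${1^{-1}\cdot 1=1}$ from the base case with~${p=1}$; it returns ${y_0=1+t}$.
\item The remainder is ${R=\trunc{-t+t(1+t)}{2}{4}=\trunc{t^2}{2}{4}=1}$.
\item The second recursive call (${s=1}$, ${p=2}$, ${m=2}$, ${d=1}$) gets its lower part ${2^{-1}\cdot 1=1/2}$ from the base case, computes ${R=\trunc{1-(2-t)/2}{1}{2}=1/2}$, and gets the upper part ${3^{-1}\cdot(1/2)=1/6}$ from the base case with~${p=3}$; it returns ${y_1=1/2+t/6}$.
\end{enumerate}
The result is ${y_0+t^2y_1 = 1+t+t^2/2+t^3/6}$, and the only divisions performed are by $1$, $2$, and~$3$.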

747:

748: \begin{Prop}

749:   Given the first~$m$ terms of the entries

750:   of~$A\in\mathcal{M}_{\order\times\order}(\basefield[[t]])$ and

751:   of~$s\in\mathcal{M}_{\order \times 1}(\basefield[[t]])$,

752:   given~$v\in\mathcal{M}_{\order \times 1}(\basefield)$,

753:   algorithm~$\emph{\textsf{DivideAndConquer}}(A,s,p,m,v)$ computes a

754:   solution of the linear differential system~${ty' + (pI_{\order}-tA)

755:   y=s \mod t^m}$, ${y(0)=v}$, using~${\cO(\order^2 \, \sM(m) \log m)}$

756:   operations in~$\basefield$. If $A$ is a companion matrix, the cost

757:   reduces to ${\sC(m) = \cO(\order \, \sM(m) \log m)}$.

758: \end{Prop}

759: \proof The correctness of the algorithm follows from the previous

760: Lemma.  The cost~$\sC(m)$ of the algorithm satisfies the recurrence

761: $$ \sC(m) = \sC(\intpart{m/2}) + \sC(\left\lceil m/2 \right\rceil) + \order^2 \, \sM(m)

762: + \cO(\order m),$$ where the term $\order^2 \, \sM(m)$ comes from the

763: application of $A$ to $y_0$ used to compute the remainder~$R$. From this

764: recurrence, it is easy to infer that~${\sC(m) = \cO(\order^2 \, \sM(m)

765: \log m)}$. Finally, when $A$ is a companion matrix, the vector~$R$ can

766: be computed in time $\cO(\order \, \sM(m))$, which implies that in this

767: case~${\sC(m) = \cO(\order \, \sM(m) \log m)}$.

768: \foorp
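\par\smallskip
The step from the recurrence to the bound can be sketched as follows, under assumptions made only for this illustration: $m$~is a power of~$2$, ${\sM(m)\ge m}$, and ${2\,\sM(m/2)\le\sM(m)}$. Then ${\sC(m) \le 2\,\sC(m/2) + c\,\order^2\,\sM(m)}$ for some constant~$c$, and unrolling gives
\begin{equation*}
  \sC(m) \le m\,\sC(1) + c\,\order^2 \sum_{k=0}^{\log_2 m-1} 2^k\,\sM(m/2^k)
  \le m\,\sC(1) + c\,\order^2\,\sM(m)\log_2 m .
\end{equation*}
Since the base case costs ${\sC(1)=\cO(\order)}$, the first term is dominated by the second one.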

769:

770: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

771: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

772: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

773:

774: \section{Faster Algorithms for Special Coefficients}\label{sec:particular}

775:

776: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

777:

778: \subsection{Constant Coefficients}\label{ssec:const-coeffs}

779: Let~$A$ be a constant~${\order \times \order}$ matrix and let~$v$ be a

780: vector of initial conditions. Given~${\precision \geq 1}$, we want to

781: compute the first~$\precision$ coefficients of the series expansion of

782: a solution~$y$ in~$\mathcal{M}_{\order \times 1}(\basefield[[t]])$

783: of~${y' = Ay}$, with~${y(0) = v}$. In this setting, many

784: algorithms have been proposed to solve problems {\bf i}, {\bf ii},

785: {\bf I}, and {\bf II}, see for

786: instance~\cite{Pennell26,Putzer66,Kirchner67,Fulmer75,MoLo78,Leonard96,Liz98,Gu99,Gu01,HaFiSm01,MoLo03,LuRo04}.

787: Again, the most naive algorithm is based on the method of undetermined

788: coefficients. On the other hand, most books on differential equations,

789: see, e.g., \cite{Ince56,Coddington61,Arnold92}, recommend simplifying

790: the calculations using the Jordan form of matrices. The main drawback

791: of that approach is that computations are done over the algebraic

792: closure of the base field~$\basefield$. The best complexity result

793: known to us is given in~\cite{LuRo04} and it is quadratic in~$\order$.

794:

795: We concentrate first on problems~{\bf ii} and~{\bf II} (computing a

796: single solution for a single equation, or a first-order system).

797: Our algorithm for problem~{\bf II}

798: uses~${\cO(\order^\omega \log \order + \precision \sM(\order))}$

799: operations in~$\basefield$ for a general constant matrix~$A$ and

800: only~$\cO(\precision \sM(\order)/\order)$ operations in~$\basefield$ in

801: the case where~$A$ is a companion matrix (problem {\bf ii}). Despite

802: the simplicity of the solution, this is, to the best of our knowledge,

803: a new result.

804:

805: In order to compute~${y_\precision = \sum_{i=0}^\precision{A^i v

806: t^i/i!}}$, we first compute its Laplace

807: transform~${z_\precision=\sum_{i=0}^\precision {A^i v t^i}}$: indeed,

808: one can switch from~$y_\precision$ to~$z_\precision$ using

809: only~$\cO(\precision \order)$ operations in $\basefield$.  The

810: vector~$z_\precision$ is the truncation at order~${\precision + 1}$

811: of~${z=\sum_{i\ge0} A^i v t^i =(I-tA)^{-1} v}$. As a byproduct of a

812: more difficult question,~\cite[Prop.~10]{Storjohann02} shows

813: that~$z_\precision$ can be computed using~$\cO(\precision

814: \order^{\omega-1})$ operations in~$\basefield$. We propose a solution

815: of better complexity.

816:

817: By Cramer's rule,~$z$ is a vector of rational functions~$z_i(t)$, of

818: degree at most~$\order$.  The idea is to first compute~$z$ as a

819: rational function, and then to deduce its expansion

820: modulo~$t^{\precision +1}$. The first part of the algorithm does not

821: depend on~$\precision$ and thus it can be seen as a precomputation.

822: For instance, one can use%Algorithm \texttt{SeriesSolutionSmallRHS} in

823: ~\cite[Corollary~12]{Storjohann02} to compute $z$ in

824: complexity~$\cO(\order^{\omega} \log \order)$. In the second step of

825: the algorithm, we have to expand~$\order$ rational functions of degree

826: at most~$\order$ at precision~$\precision$.  Each such expansion

827: can be performed using~$\cO(\precision\sM(\order)/\order)$ operations

828: in~$\basefield$, see, e.g., the proof of~\cite[Prop.~1]{BoFlSaSc05}.

829: The total cost of the algorithm is thus~${\cO(\order^\omega\log \order

830: + \precision \sM(\order))}$. We give below a simplified variant with the

831: same complexity, avoiding the use of the algorithm

832: in~\cite{Storjohann02} for the precomputation step and relying instead

833: on a technique which is classical in the computation of minimal

834: polynomials~\cite{BuClSh97}.

835: \begin{enumerate}

836: \item Compute the vectors~$v,Av,A^2 v,A^3v,\dots,A^{2\order}v$

837:   in~$\cO(\order^\omega\log \order)$, as follows: \\ for~$\kappa$

838:   from~$1$ to~${1 + \log \order}$ do

839:     \begin{enumerate}

840:     \item compute~$A^{2^\kappa}$

841:     \item compute~${A^{2^\kappa} \times [v | Av | \cdots | A^{2^\kappa-1}v]}$,

842:         thus getting~${[A^{2^\kappa}v | A^{2^\kappa+1}v | \cdots | A^{2^{\kappa+1}-1}v]}$

843:     \end{enumerate}

844:   \item For each~${j=1, \dots, \order}$:

845: \begin{enumerate}

846: \item recover the rational fraction whose series expansion

847:   is~$\sum{(A^i v)_j t^i}$ by Pad\'e approximation

848:   in~$\cO(\sM(\order)\log \order)$ operations

849: \item compute its expansion up to precision $t^{\precision + 1}$

850:   in~$\cO(\precision \, \sM(\order)/\order)$ operations

851: \item recover the expansion of~$y$ from that of~$z$,

852:   using~$\cO(\precision)$ operations.

853: \end{enumerate}

854: \end{enumerate}

855: This yields the announced total cost of~${\cO(\order^\omega \log

856: \order + \precision \sM(\order))}$ operations for problem {\bf II}.
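\par\smallskip
As a small illustration of these steps (the matrix and vector below are chosen only for this example, over a field of characteristic zero), take
\begin{equation*}
  A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad
  v = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad\text{so that}\qquad
  z = (I_2 - tA)^{-1} v = \frac{1}{1+t^2} \begin{pmatrix} 1 \\ -t \end{pmatrix}.
\end{equation*}
Both entries are rational functions of degree at most~${\order=2}$, and their coefficients obey the recurrence ${c_{n+2}=-c_n}$ read off the denominator, which gives ${z = (1 - t^2 + t^4 - \cdots,\; -t + t^3 - \cdots)}$. Dividing the coefficient of~$t^i$ by~$i!$ yields ${y = (1 - t^2/2 + t^4/24 - \cdots,\; -t + t^3/6 - \cdots)}$, and one checks that ${y' = Ay}$ and~${y(0)=v}$.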

857:

858: We now turn to the estimation of the

859: cost for problems~{\bf i} and~{\bf I} (bases of solutions).

860: In the case of equations with constant coefficients, we use the

861: Laplace transform again. If $y = \sum_{i \geq 0} y_i t^i$ is a

862: solution of an order $\order$ equation with constant coefficients,

863: then the sequence $(z_i)=(i! y_i)$ is generated by a linear recurrence

864: with constant coefficients. Hence, the first terms $z_1,\dots,z_\precision$ can

865: be computed in $\cO(\precision\sM(\order)/\order)$ operations, using again the

866: algorithm described in~\cite[Prop.~1]{BoFlSaSc05}.

867: For problem~{\bf I}, the exponent~$\omega+1$ in the cost of the precomputation can be reduced to~$\omega$ by a very different approach; we omit the details for lack of space.
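\par\smallskip
The recurrence invoked above for a scalar equation can be made explicit. Writing the equation (with coefficients named~$c_j$ only for this sketch) as ${y^{(\order)} = c_{\order-1}y^{(\order-1)}+\dots+c_0 y}$ with~$c_j$ in~$\basefield$, the coefficient of~$t^i$ in~$y^{(j)}$ is ${(i+j)!\,y_{i+j}/i! = z_{i+j}/i!}$, so that extracting the coefficient of~$t^i$ and multiplying by~$i!$ gives
\begin{equation*}
  z_{i+\order} = c_{\order-1} z_{i+\order-1} + \dots + c_0 z_i \qquad\text{for all $i \ge 0$}.
\end{equation*}
For example, ${y''=y}$ with ${y(0)=1}$, ${y'(0)=0}$ gives ${z_{i+2}=z_i}$, hence ${(z_i)=(1,0,1,0,\dots)}$ and ${y = 1 + t^2/2! + t^4/4! + \cdots}$.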

868:

869:

870: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

871:

872: \subsection{Polynomial Coefficients}

873: If the coefficients in one of the problems {\bf i, ii, I}, and~{\bf II}

874: are polynomials in~$\basefield[t]$ of degree at most~$d$, using the

875: linear recurrence of order~$d$ satisfied by the coefficients of the

876: solution seemingly yields the lowest possible complexity.

877: Consider for instance problem~{\bf II}.

878: Plugging~${A=\sum_{i=0}^d t^i A_i}$, ${b=\sum_{i=0}^d t^i

879:   b_i}$, and~${y=\sum_{i\geq 0} t^i y_i}$ in the

880: equation~${y'=Ay+b}$, we arrive at the following recurrence

881: $$

882: y_{k+d+1} = (d+k+1)^{-1} (A_d y_k + A_{d-1}y_{k+1} + \dots + A_0

883: y_{k+d} + b_{k+d}), \quad \text{for all $k \geq -d$}.

884: $$

885: Thus, to compute~$y_0,\dots,y_\precision$, we need to

886: perform~${\precision d}$ matrix-vector products; this is done

887: using~${\cO (d \precision \order^2)}$ operations in~$\basefield$. A

888: similar analysis implies the other complexity estimates in the third

889: column of Table~\ref{table1}.
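\par\smallskip
For instance (a scalar example chosen only for illustration, over a field of characteristic zero), take ${\order=1}$, ${d=1}$, ${A=t}$, ${b=0}$, and ${y_0=1}$, so that ${A_0=0}$ and~${A_1=1}$. With the convention ${y_{-1}=0}$, the recurrence reads ${y_{k+2} = y_k/(k+2)}$ for~${k \ge -1}$, which gives
\begin{equation*}
  y_1 = 0, \quad y_2 = \tfrac12, \quad y_3 = 0, \quad y_4 = \tfrac18,
\end{equation*}
in agreement with the expansion ${\exp(t^2/2) = 1 + t^2/2 + t^4/8 + \cdots}$ of the solution of~${y'=ty}$, ${y(0)=1}$.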

890:

891: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

892: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

893: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

894:

895: \section{Non-linear Systems of Differential Equations}

896: Let~${\varphi(t,y) = (\varphi_1(t,y), \dots, \varphi_\order(t,y))}$,

897: where each~$\varphi_i$ is a power series

898: in~$\basefield[[t,y_1,\dots,y_\order]]$. We consider the first-order

899: non-linear system in~$y$

900: \[

901: %(\mathcal{N})\qquad \left\{

902: %\begin{aligned}

903: % y_1'(t)& = \varphi_1(t,y_1(t),\dots,y_\order(t)), \\

904: %&\,\,\vdots \\

905: % y_\order'(t)& = \varphi_\order(t,y_1(t),\dots,y_\order(t)).

906: %\end{aligned}

907: %\right.

908: (\mathcal{N})\qquad % \left\{

909: y_1'(t) = \varphi_1(t,y_1(t),\dots,y_\order(t)), \quad\dots,\quad

910:  y_\order'(t) = \varphi_\order(t,y_1(t),\dots,y_\order(t)).

911: %\right.

912: \]

913:

914: To solve~($\mathcal{N}$), we use the classical technique of

915: \emph{linearization}. The idea is to attach, to an \emph{approximate}

916: solution~$y_0$ of~($\mathcal{N}$), a \emph{tangent} system in the new unknown~$z$,

917: $$

918: (\mathcal{T},y_0) \qquad z' = \Jac(\varphi)(y_0) z - y_0'+

919: \varphi(y_0),

920: $$

921: which is linear and whose solutions serve to obtain a

922: better approximation of a true solution of~($\mathcal{N}$).  Indeed,

923: let us denote by~$(\mathcal{N}_m),(\mathcal{T}_m)$ the

924: systems~$(\mathcal{N}),(\mathcal{T})$ where all the equalities are

925: taken modulo~$t^m$.  Taylor's formula states that the

926: expansion~${\varphi(y+z) - \varphi(y) - \Jac(\varphi)(y) z}$

927: is equal to~$0$ modulo~$z^2$.

928: It is a simple matter to check that if~$y$ is a

929: solution of~$(\mathcal{N}_m)$ and if~$z$ is a solution

930: of~$(\mathcal{T}_{2m},y)$, then~${y+z}$ is a solution

931: of~$(\mathcal{N}_{2m})$.  This justifies the correctness of

932: Algorithm {\sf SolveNonLinearSys}.
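\par\smallskip
As an illustration, here is a hand trace of Algorithm \textsf{SolveNonLinearSys} on a scalar example chosen only for this sketch: ${\varphi(t,y)=y^2}$, ${v=1}$, ${\precision=4}$, with $2$ and~$3$ invertible in~$\basefield$; the exact solution is ${1/(1-t)}$. We use that, by its definition in Section~\ref{sec:DAC}, \textsf{Solve}$(A,b,2m,0)$ returns~$z$ modulo~$t^{2m}$ with ${z'=Az+b \mod t^{2m-1}}$ and~${z(0)=0}$.
\begin{enumerate}
\item For ${m=1}$ and ${y=1}$: ${A=2}$, ${b=1}$, and ${z=t}$; thus ${y=1+t}$.
\item For ${m=2}$ and ${y=1+t}$: ${A=2+2t}$, ${b=(1+t)^2-1=2t+t^2}$. Writing ${z=z_1t+z_2t^2+z_3t^3}$ and comparing coefficients in ${z'=Az+b \mod t^3}$ gives ${z_1=0}$, ${2z_2=2}$, ${3z_3=2z_2+1}$, so ${z=t^2+t^3}$; thus ${y=1+t+t^2+t^3}$.
\end{enumerate}
The loop stops since ${m=4>\precision/2}$, and the output agrees with ${1/(1-t)}$ modulo~$t^4$.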

933:

934:  To analyze the complexity of this algorithm, it suffices to remark

935:  that for each integer~$\kappa$ between $1$ and~$\intpart{\log \precision}$,

936:  one has to compute one solution of a linear inhomogeneous first-order

937:  system at precision~$2^\kappa$ and to evaluate~$\varphi$ and its

938:  Jacobian on a series at the same precision. This concludes the proof of Theorem~\ref{theo:non-linear}.

939:

940:

941: \begin{figure}[h]

942: \begin{center}

943:   \fbox{\begin{minipage}{9 cm}

944:

945:       \medskip

946:       \begin{center}\textsf{SolveNonLinearSys}($\varphi,v$) \end{center}

947:       \textbf{Input:} $\precision$ in~$\mathbb{N}$,

948:       $\varphi(t,y)$ in~$\basefield[[t,y_1,\dots,y_\order]]^{\order}$,

949:       $v$ in~$\basefield^\order$

950:       \par\smallskip

951:       \textbf{Output:} first~$\precision$ terms of a~$y(t)$

952:       in~$\basefield[[t]]^\order$ such that~${y(t)' = \varphi(t,y(t)) \mod

953:         t^\precision}$ and~${y(0) = v}$.

954: \begin{tabbing}

955:   \;\;    \\$m \leftarrow 1$\\

956: $y \leftarrow v$ \\

957:   \textsf{while} $m \leq \precision/2$ \textsf{do}\\

958:   \hspace{0.5cm} $A \leftarrow \trunch{\Jac(\varphi) (y)}{2m}$\\

959:   \hspace{0.5cm} $b \leftarrow \trunch{\varphi (y) - y'}{2m}$ \\

960: % \hspace{0.5cm} $z \leftarrow \textsf{Solve}(z' = Az + b \mod t^{2m}, z(0)=0)$ \\

961:   \hspace{0.5cm} $z \leftarrow \textsf{Solve}(A, b, 2m, 0)$ \\

962:   \hspace{0.5cm} $y \leftarrow y + z$ \\

963:   \hspace{0.5cm} $m \leftarrow 2m$ \\

964:   \textsf{return} $y$

965: \end{tabbing}

966: \end{minipage}

967: }\end{center}

968: \caption{Solving the non-linear differential system ${y' = \varphi(t,y), \; y(0) = v}$.}

969: \label{fig:nonlinear}

970:  \end{figure}

971:

972:

973: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

974: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

975: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

976:

977:  \section{Implementation and Timings}

978:

979:  We implemented our algorithms \textsf{SolveHomDiffSys} and

980:  \textsf{Solve} in Magma~\cite{Magma} and ran the programs on an Athlon processor at 2.2~GHz

981:  with 2~GB of memory.\footnote{All the computations have been done on the machines of

982:    the MEDICIS resource center

983:    \url{http://medicis.polytechnique.fr}.}  We used Magma's built-in

984:  polynomial arithmetic (using successively naive, Karatsuba and FFT

985:  multiplication algorithms), as well as Magma's scalar matrix

986:  multiplication (of cubic complexity in the ranges of our interest).

987:  We give three tables of timings. First, we compare in Figure

988:  \ref{fig:benchs} the performances of our algorithm

989:  \textsf{SolveHomDiffSys} with that of the naive quadratic algorithm,

990:  for computing a basis of (truncated power series) solutions of a

991:  homogeneous system.  The order of the system varies from $2$ to $16$,

992:  while the precision required for the solution varies from 256 to

993:  4096; the base field is $\mathbb{Z}/p\mathbb{Z}$, where $p$ is a 32-bit prime.

994:

995:

996: \begin{figure}

997: \begin{center}

998: \begin{tabular}{c||c|c|c|c}

999: $ \precision \ddots  \order$      &  2  & 4   & 8 & 16 \\

1000: \hline

1001: 256    &   0.02 \text{vs.}  2.09      & 0.08 \text{vs.}  6.11        &  0.44  \text{vs.}  28.16     &  2.859 \text{vs.}  168.96 \\

1002: 512    &  0.04 \text{vs.}  8.12       &  0.17 \text{vs.}  25.35     & 0.989  \text{vs.}  113.65  &  6.41 \text{vs.}  688.52 \\

1003: 1024  & 0.08 \text{vs.}  32.18      &  0.39  \text{vs.}  104.26  & 2.30 \text{vs.}  484.16     &  15   \text{vs.}  2795.71\\

1004: 2048  & 0.18  \text{vs.}  128.48   &  0.94 \text{vs.}  424.65   & 5.54 \text{vs.}  2025.68  &  36.62  \text{vs.} $> 3$\text{hours} $^\star$\\

1005: 4096  & 0.42 \text{vs.}  503.6      &  2.26 \text{vs.}  1686.42 &  13.69 \text{vs.}  8348.03  & 92.11  \text{vs.} $> 1/2$ \text{day}$^\star$ \\

1006:

1007: \end{tabular}

1008: \end{center}

1009: \caption{Computation of a basis of a linear homogeneous system with

1010:   $\order$ equations, at precision $\precision$: comparison of timings

1011:   (in seconds) between algorithm \textsf{SolveHomDiffSys} and the

1012:   naive algorithm. Entries marked with a `$\star$' are estimated timings.}

1013: \label{fig:benchs}

1014: \end{figure}

1015:

1016: Then we display in Figure~\ref{fig:newton} and Figure~\ref{fig:matmul}

1017: the timings obtained respectively with

1018: algorithm~\textsf{Solve\-HomDiffSys} and with the algorithm for

1019: polynomial matrix multiplication \textsf{PolyMatMul} that was used as

1020: a primitive of \textsf{SolveHomDiffSys}. The similar shapes of the two

1021: surfaces indicate that the complexity prediction of point (a) in

1022: Theorem~\ref{theo:linear} is well respected in our implementation:

1023: \textsf{SolveHomDiffSys} uses a constant number (between 4 and 5) of

1024: polynomial matrix multiplications; note that the abrupt jumps at powers of 2

1025: reflect the performance of Magma's FFT implementation of polynomial

1026: arithmetic.

1027:

1028: \begin{figure}[ht]

1029: \begin{center}

1030: \begin{minipage}[b]{0.45\textwidth}

1031: \centerline{\includegraphics[scale=0.35,angle=270]{MatMul}}

1032: \caption{Timings of algorithm \textsf{PolyMatMul}.  \label{fig:matmul}}

1033: \end{minipage}\hskip0.1\textwidth

1034: \begin{minipage}[b]{0.45\textwidth}

1035: \centerline{\includegraphics[scale=0.35,angle=270]{Newton}}

1036: \caption{Timings of algorithm  \textsf{SolveHomDiffSys}.  \label{fig:newton}}

1037: \end{minipage}

1038: \end{center}

1039: \end{figure}

1040:

1041:

1042:

1043: In Figure~\ref{fig:dac} we give the timings for the computation of

1044: one solution of a linear differential equation of order $2$, $4$, and

1045: $8$, respectively, using our algorithm~\textsf{Solve} in

1046: Section~\ref{sec:DAC}. Again, the shape of the three curves

1047: experimentally confirms the nearly linear behaviour established in

1048: point (b) of Theorem~\ref{theo:linear}, both in the

1049: precision~$\precision$ and in the order $\order$ of the complexity of

1050: algorithm~\textsf{Solve}. Finally, Figure~\ref{fig:dac+naive}

1051: displays the three curves from Figure~\ref{fig:dac} together with the

1052: timings curve for the naive quadratic algorithm computing one solution

1053: of a linear differential equation of order $2$.  The conclusion is

1054: that our algorithm~\textsf{Solve} outperforms the

1055: quadratic one already at low precision.

1056:

1057: \begin{figure}[ht]

1058: \begin{center}

1059: \begin{minipage}[b]{0.45\textwidth}

1060: \centerline{\includegraphics[scale=0.3,angle=270]{DAC}}

1061: \caption{Timings of algorithm \textsf{Solve} for equations of orders 2, 4, and~8. \label{fig:dac}}

1062: \end{minipage}\hskip0.1\textwidth

1063: \begin{minipage}[b]{0.45\textwidth}

1064: \centerline{\includegraphics[scale=0.3,angle=270]{DAC+Naive}}

1065: \caption{Same, compared to the naive algorithm for a second-order equation.\label{fig:dac+naive}}

1066: \end{minipage}

1067: \end{center}

1068: \end{figure}

1069:

1070:

1071: %precision~$\precision = 1048576$ in~24.53s; one at doubled

1072: %precision~$\precision=2097152$ in doubled time~49.05s; one for doubled

1073: We also implemented our algorithms of Section~\ref{ssec:const-coeffs}

1074: for the special case of constant coefficients. For lack of

1075: space, we only provide a few experimental results for

1076: problem~{\bf II}.  Over the same finite field, we computed: a solution

1077: of a linear system with~$\order=8$ at

1078: precision~$\precision\approx10^6$ in~24.53s; one at doubled precision

1079: in doubled time~49.05s; one for doubled order~$\order=16$ in doubled

1080: time~49.79s.

1081:

1082:

1083: \bibliographystyle{plain}

1084: \bibliography{focs}

1085:

1086: \end{document}

1087: