cs0604101/focs.tex
1: \documentclass[11pt]{amsart}
2: 
3: \usepackage{fullpage}
4: \usepackage[latin1]{inputenc}
5: \usepackage{bbm,natbib}
6: \usepackage{amsmath}
7: \usepackage{stmaryrd}
8: \usepackage{alltt, amssymb}
9: \usepackage{graphicx}
10: \usepackage{url}
11: 
12: \def\Jac{{\mathbf{Jac}}}
13: 
14: \newcommand{\sC}{\mathsf{C}}
15: \newcommand{\sL}{{\mathsf{L}}}
16: \newcommand{\sM}{{\mathsf{M}}}
17: \newcommand{\N}{\mathbb{N}}
18: \newcommand{\cO}{{\mathcal O}}
19: \newcommand{\order}{{r}}
20: \newcommand{\precision}{{N}}
21: \newcommand{\basefield}{{\mathbb{K}}}
22: 
23: \newcommand{\Mat}{{\mathsf{MM}}}
24: \renewcommand{\proof}{\noindent\textsc{Proof.} }
25: \newcommand{\foorp}{\hfill$\square$}
26: \newcommand{\tr}{\mathrm{trace}}
27: 
28: \newcommand{\trunc}[3]{\left[ #1 \right]_{#2}^{#3}}
29: \newcommand{\truncl}[2]{\left\lfloor #1 \right\rfloor_{#2}}
30: \newcommand{\trunch}[2]{\left\lceil #1 \right\rceil^{#2}}
31: \newcommand{\intpart}[1]{\left\lfloor #1 \right\rfloor}
32: 
33: \newtheorem{Theo}{Theorem}
34: \newtheorem{Prop}{Proposition}
35: \newtheorem{Lemme}{Lemma}
36: 
37: 
38: \usepackage{graphicx}
39: \usepackage{changebar}
40: \usepackage[plainpages=false,pdfpagelabels,colorlinks=true,citecolor=blue,hypertexnames=false]{hyperref}
41: 
42: 
43: \begin{document}
44: 
45: \title{Fast computation of power series solutions \\ of systems of
46:   differential equations}
47: 
48: \author{A. Bostan, F. Chyzak, F. Ollivier, B. Salvy, \'E. Schost, and A. Sedoglavic}
49: \thanks{Partially supported by a grant from the French \emph{Agence nationale pour la recherche}.}
50: %\date{Preliminary version 1.10 --- 11/04/2006}
51: 
52: \begin{abstract}
53:   We propose new algorithms for the computation of the first~$\precision$ terms
54:   of a vector (resp.\ a basis) of power series solutions of a linear
55:   system of differential equations at an ordinary point, using a
56:   number of arithmetic operations which is quasi-linear with respect
57:   to~$\precision$.  Similar results are also given in the non-linear case. This extends
58:   previous results obtained by Brent and
59:   Kung for scalar differential equations of order one and two.
60: \end{abstract}
61: \maketitle
62: 
63: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
64: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
65: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
66: 
67: \section{Introduction}
68: 
69: In this article, we are interested in the computation of the first $\precision$ terms of power  
70: series solutions of differential equations. This problem arises in  
71: combinatorics, where the desired power series is a generating  
72: function, as well as in numerical analysis and in particular in  
73: control theory.
74: 
75: Let~$\basefield$ be a field. Given~$r+1$ formal power
76: series~${a_0(t),\dots,a_{\order}(t)}$ in~$\basefield[[t]]$, one of
77: our aims is to provide fast algorithms for solving the
78: linear differential equation
79: % of order $\order$:
80: \begin{equation} \label{lindiffeq}
81: a_\order(t) y^{(\order)}(t) + \dots + a_1(t) y'(t)+ a_0(t) y(t) = 0. %
82: \end{equation}
83: Specifically, under the hypothesis that~$t=0$ is an ordinary point
84: for Equation~\eqref{lindiffeq} (i.e., ${a_r(0) \neq 0}$), we give efficient
85: algorithms taking as input the first~$\precision$ terms of the power
86: series $a_0(t), \dots, a_\order(t)$ and answering the following algorithmic questions:
87: \begin{enumerate}
88: \item[{\bf i.}]  find the first~$\precision$ coefficients of
89:   the~$\order$ elements of a basis of power series solutions
90:   of~\eqref{lindiffeq};
91: \item[{\bf ii.}] given initial conditions~$\alpha_0, \dots,
92:   \alpha_{\order-1}$ in~$\basefield$, find the first~$\precision$
93:   coefficients of the unique solution~$y(t)$ in~$\basefield[[t]]$ of
94:   Equation~\eqref{lindiffeq} satisfying 
95: \[
96:   y(0) = \alpha_0,\quad y'(0) = \alpha_1, \quad \dots,\quad y^{(\order-1)}(0) =
97:   \alpha_{\order-1}.
98: \]
99: \end{enumerate}
100: More generally, we also treat linear first-order systems of differential
101: equations. From the data of initial conditions~$v$
102: in~$\mathcal{M}_{\order\times\order} (\basefield)$
103: (resp.~$\mathcal{M}_{{\order} \times 1} (\basefield)$) and of the
104: first~$\precision$ coefficients of each entry of the matrices~$A$
105: and~$B$ in~$\mathcal{M}_{\order\times\order} (\basefield[[t]])$ (resp.~$b$
106: in~$\mathcal{M}_{{\order} \times 1} (\basefield[[t]])$), we propose
107: algorithms that compute the first~$\precision$ coefficients:
108: \begin{enumerate}
109: \item[\bf I.]  of a fundamental solution~$Y$ in~$\mathcal{M}_{\order\times\order}
110:   (\basefield[[t]])$ of~${Y' = AY + B}$, with~${Y(0)=v},\;{\det Y(0) \neq 0}$;
111: \item[\bf II.]  of the unique solution~$y(t)$
112:   in~$\mathcal{M}_{{\order} \times 1} (\basefield[[t]])$ of~${y' = Ay
113:     + b}$, satisfying~${y(0) =v}$.
114: \end{enumerate}
115: %% \begin{equation}\label{systlindiffeq:basis}
116: %% Y' = AY + B, \quad \text{with} \; A,B \in \mathcal{M}_{\order} (\basefield[[t]])
117: %% \end{equation}
118: %% and 
119: %% \begin{equation}\label{systlindiffeq:single}
120: %% y' = Ay + b
121: %% \end{equation}
122: Obviously, if an algorithm of algebraic complexity~$\sC$ (i.e.,
123: using~$\sC$ arithmetic operations in~$\basefield$) is available for
124: problem~{\bf II}, then applying it~$r$ times solves problem~{\bf I} in
125: time~$r \,\sC$, while applying it to a companion matrix solves
126: problem~{\bf ii} in time~$\sC$ and problem~{\bf i} in~$r
127: \,\sC$. Conversely, an algorithm solving~{\bf i} (resp.\ {\bf I}) also
128: solves {\bf ii} (resp.\ {\bf II}) within the same complexity, plus that
129: of a linear combination of series. Our reason for distinguishing the
130: four problems {\bf i, ii, I, II} is that in many cases, we are able to
131: give algorithms of better complexity than obtained by these
132: reductions.
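\par
As an illustration of the reduction through a companion matrix (a sketch only; the vector~$u$ and the matrix~$C$ are names introduced here for the example), set~$u = (y, y', \dots, y^{(\order-1)})^T$. Since~$a_\order(0)\neq 0$, the series~$a_\order$ is invertible in~$\basefield[[t]]$, and Equation~\eqref{lindiffeq} is equivalent to the first-order system
\[
u' = C u, \qquad
C = \begin{pmatrix}
0 & 1 & & \\
 & \ddots & \ddots & \\
 & & 0 & 1 \\
-a_0/a_\order & -a_1/a_\order & \cdots & -a_{\order-1}/a_\order
\end{pmatrix}.
\]
With the initial condition~$u(0) = (\alpha_0,\dots,\alpha_{\order-1})^T$, this is an instance of problem~{\bf II}, and the first entry of~$u$ is the solution~$y$ of problem~{\bf ii}.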
133: 
134: The most popular way of solving~{\bf i}, {\bf ii}, {\bf I}, and~{\bf II} is the
135: method of undetermined coefficients that requires~$\cO(\order^2
136: \precision^2)$ operations in~$\basefield$ for problem~{\bf i}
137: and~$\cO(\order \precision^2)$ operations in~$\basefield$ for
138: problem~{\bf ii}. Regarding the dependence in~$\precision$, this is certainly
139: too expensive compared to the size of the output, which is only linear
140: in~$\precision$ in both cases. On the other hand, verifying the
141: correctness of the output for~{\bf ii} (resp.~{\bf i}) already
142: requires a number of operations in~$\basefield$ which is linear
143: (resp.\ quadratic) in~$\order$: this indicates that there is little
144: hope of improving the dependence in~$\order$.  Similarly, for
145: problems~{\bf I} and~{\bf II}, the method of undetermined coefficients
146: requires~$\cO(\precision^2)$ multiplications of~$\order\times \order$
147: scalar matrices (resp.\ of scalar matrix-vector products in
148: size~$\order$), leading to a computational cost which is reasonable
149: with respect to~$\order$, but not with respect to~$\precision$.
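\par
For concreteness, here is the recurrence behind the method of undetermined coefficients for problem~{\bf II} (a sketch; the coefficient names~$y_n$, $A_n$ and~$b_n$ are introduced for this illustration). Writing~$y=\sum_n y_n t^n$, $A=\sum_n A_n t^n$ and~$b=\sum_n b_n t^n$, and extracting the coefficient of~$t^n$ in~$y'=Ay+b$, one gets
\[
(n+1)\, y_{n+1} = \sum_{k=0}^{n} A_k\, y_{n-k} + b_n, \qquad n \geq 0 .
\]
Computing~$y_{n+1}$ from~$y_0,\dots,y_n$ thus takes~$n+1$ matrix-vector products, and reaching precision~$\precision$ takes~$\sum_{n<\precision-1}(n+1)=\cO(\precision^2)$ of them, provided the integers~$1,\dots,\precision-1$ are invertible in~$\basefield$.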
150: 
151: By contrast, the algorithms proposed in this article have costs that
152: are linear (up to logarithmic factors) in the
153: complexity~$\sM(\precision)$ of polynomial multiplication in degree
154: less than~$\precision$ over~$\basefield$. Using Fast Fourier Transform
155: (FFT) these costs become nearly linear~---~up to polylogarithmic
156: factors~---~with respect to~$\precision$, for all of the four problems
157: above (precise complexity results are stated below).  Up to these
158: polylogarithmic terms in~$\precision$, this estimate is probably not
159: far from the lower algebraic complexity one can expect: indeed, the
160: mere check of the correctness of the output requires, in each case, a
161: computational effort proportional to~$\precision$.
162: 
163: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
164: 
165: \subsection{Newton Iteration}
166: In the case of first-order equations ($r=1$), Brent and Kung have
167: shown in~\cite{BrKu78} (see also~\cite{Geddes1979,KuTr78}) that the problems
168: can be solved with complexity $\cO(\sM(\precision))$ by means of a
169: formal Newton iteration. Their algorithm is based on the fact that
170: solving the first-order differential equation~${y'(t) = a(t) y(t)}$,
171: with~$a(t)$ in~$\basefield[[t]]$ is equivalent to computing the
172: \emph{power series exponential\/}~$\exp(\int a(t))$.  This equivalence
173: is no longer true in the case of a system~${Y' = A(t) Y}$
174: (where~$A(t)$ is a power series matrix): for non-commutativity
175: reasons, the matrix exponential~${Y(t)= \exp(\int A(t))}$ is in general
176: not a solution of~${Y' = A(t) Y}$.
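\par
The obstruction can be seen on a small example (an illustration only, over a field of characteristic zero; the matrices~$A_0$, $A_1$ and the name~$E$ are introduced here). Take~$A(t)=A_0+tA_1$ with constant matrices~$A_0$ and~$A_1$, so that~$\int A = tA_0+\tfrac{t^2}{2}A_1$. Expanding the exponential gives
\[
E := \exp\Bigl(\int A\Bigr) = I_\order + tA_0 + \frac{t^2}{2}\bigl(A_1+A_0^2\bigr)
+ t^3\Bigl(\frac{A_0A_1+A_1A_0}{4}+\frac{A_0^3}{6}\Bigr) + \cO(t^4),
\]
and a direct comparison of the coefficients of~$E'$ and~$AE$ yields
\[
E' - A E = \frac{t^2}{4}\,\bigl(A_0A_1 - A_1A_0\bigr) + \cO(t^3).
\]
Hence~$E$ already fails to solve~$Y'=AY$ at order~$t^2$ as soon as~$A_0$ and~$A_1$ do not commute.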
177: 
178: Brent and Kung suggest a way to extend their result to higher orders,
179:  and the corresponding algorithm has been shown by van der Hoeven
180:  in~\cite{vdHoeven02} to have complexity~$\cO(\order^\order
181:  \,\sM(\precision))$. This is good with respect to~$\precision$, but
182:  the exponential dependence in the order~$\order$ is unacceptable.
183: 
184: Instead, we solve this problem by devising a specific Newton iteration
185: for~${Y' = A(t) Y}$.  Thus we solve problems {\bf i} and {\bf I} in
186: $\cO(\Mat(\order,\precision))$, where $\Mat(\order,\precision)$ is the
187: number of operations in $\basefield$ required to multiply
188: $\order\times\order$ matrices with polynomial entries of degree less
189: than~$\precision$. For instance, when $\basefield=\mathbb{Q}$, this is
190: $\cO(\order^\omega \precision+r^2\sM(\precision))$, where
191: $\order^\omega$~can be seen as an abbreviation for~$\Mat(\order,1)$, see
192: \S\ref{ssec:complexity} below.
193: 
194: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
195: 
196: \subsection{Divide-and-conquer}
197: The resolution of problems {\bf i} and {\bf I} by Newton iteration
198: relies on the fact that a whole basis is computed. Dealing with
199: problems {\bf ii} and {\bf II}, we do not know how to preserve this
200: algorithmic structure, while simultaneously saving a factor $\order$.
201: 
202: To solve problems~{\bf ii} and~{\bf II}, we therefore propose an
203: alternative algorithm, whose complexity is also nearly linear
204: in~$\precision$ (but not quite as good, being in
205: $\cO(\sM(\precision)\log\precision)$), but whose dependence in the
206: order~$\order$ is better: linear for~{\bf ii} and quadratic for~{\bf
207: II}. In a different model of computation with power series, based on
208: the so-called \emph{relaxed multiplication}, van der Hoeven briefly outlines
209: another algorithm~\cite[Section~4.5.2]{vdHoeven02} solving
210: problem~{\bf ii} in~$\cO(\order \,\sM(\precision) \log \precision)$.
211: To our knowledge, this result cannot be transferred to the usual model
212: of power series multiplication (called zealous in~\cite{vdHoeven02}).
213: 
214: We use a divide-and-conquer technique similar to that used in the fast
215: Euclidean algorithm~\cite{Knuth70,Schonhage71,Strassen83}. For
216: instance, to solve problem~{\bf ii}, our algorithm divides it into two
217: similar problems of halved size. The key point is that the lowest
218: coefficients of the solution~$y(t)$ only depend on the lowest
219: coefficients of the coefficients~$a_i$.  Our algorithm first computes
220: the desired solution~$y(t)$ at precision only~$\precision/2$, then it
221: recovers the remaining coefficients of~$y(t)$ by recursively solving
222: at precision~$\precision/2$ a new differential equation.  The main
223: idea of this second algorithm is close to that used for solving
224: first-order difference equations in~\cite{GaGe97}.
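\par
The splitting can be made explicit on the system~$y'=Ay+b$ of problem~{\bf II} (a sketch under the assumption that~$\precision$ is even; the names~$m$, $y_{(0)}$, $y_{(1)}$ and~$R$ are introduced only for this illustration). Let~$m=\precision/2$ and write~$y=y_{(0)}+t^m y_{(1)}$ with~$y_{(0)}$ of degree less than~$m$. Substituting into the equation and dividing by~$t^{m-1}$ gives
\[
t\, y_{(1)}' + (m I_\order - tA)\, y_{(1)} = R,
\qquad
R = \frac{A y_{(0)} + b - y_{(0)}'}{t^{m-1}},
\]
where the division is exact because~$y_{(0)}' = Ay_{(0)}+b \mod t^{m-1}$. The upper half~$y_{(1)}$ is therefore governed by an equation of the same shape, up to the shift by~$m$ in the term of order zero, which suggests running the recursion on this slightly larger family of equations. Its right-hand side~$R$ is obtained from~$y_{(0)}$ by one matrix-vector product of truncated series.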
225: 
226: We encapsulate our main complexity results in
227: Theorem~\ref{theo:linear} below.  When FFT is used, the
228: functions~$\sM(\precision)$ and~$\Mat(\order,\precision)$ have, up to logarithmic terms, a nearly linear
229: growth in~$\precision$, see
230: \S\ref{ssec:complexity}. Thus, the results in the following theorem are quasi-optimal.
231: \begin{Theo}\label{theo:linear}
232:   Let~$\precision$ and~$\order$ be two positive integers and
233:   let\/~$\basefield$ be a field of characteristic zero or at
234:   least~$\precision$. Then:
235:   \begin{enumerate}
236:   \item[(a)] problems\/~{\bf i} and\/~{\bf I} can be solved
237:     using~$\cO\left(\Mat(\order,\precision) \right)$ operations
238:     in~$\basefield$;
239:   \item[(b)] problem\/~{\bf ii} can be solved using~$\cO\left(\order \,
240:     \sM (\precision) \log \precision\right)$ operations in~$\basefield$;
241:   \item[(c)] problem\/~{\bf II} can be solved using~$\cO\left(\order^2 \,
242:     \sM (\precision) \log \precision\right)$ operations in~$\basefield$.
243:   \end{enumerate}
244: \end{Theo}
245: 
246: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
247:   
248: \subsection{Special Coefficients}  
249: For special classes of coefficients, we give different algorithms of
250: better complexity. We isolate two important classes of equations: that
251: with constant coefficients and that with polynomial coefficients.  In
252: the case of constant coefficients, our algorithms are based on the use
253: of the Laplace transform, which allows us to reduce the resolution of
254: differential equations with constant coefficients to manipulations
255: with rational functions.  The complexity results are summarized in the following theorem.
256: \begin{Theo}
257:   Let~$\precision$ and~$\order$ be two positive integers and
258:   let\/~$\basefield$ be a field of characteristic zero or at
259:   least~$\precision$. Then, for differential equations and systems with constant coefficients:
260:   \begin{enumerate}
261:   \item[(a)] problem\/~{\bf i} can be solved
262:     using~$\cO\left(\sM(\order)\,(\order+\precision) \right)$ operations
263:     in~$\basefield$;
264:   \item[(b)] problem\/~{\bf ii} can be solved using~$\cO\left(\sM(\order)\,(1+\precision/\order)\right)$ operations in~$\basefield$;
265:   \item[(c)] problem\/~{\bf I} can be solved using~$\cO\left( \order^{\omega+1}\log\order + \order\sM(\order)\precision \right)$ operations in~$\basefield$;
266:   \item[(d)] problem\/~{\bf II} can be solved using~$\cO\left( \order^\omega\log\order + \sM(\order)\precision \right)$ operations in~$\basefield$.
267:   \end{enumerate}
268: \end{Theo}
269: In the case of polynomial coefficients, we
270: exploit the linear recurrence satisfied by the coefficients of
271: solutions.  In Table~\ref{table1}, we gather the complexity estimates
272: corresponding to the best known solutions for each of the four
273: problems {\bf i}, {\bf ii}, {\bf I}, and~{\bf II} in the general case,
274: as well as in the above mentioned special cases. The algorithms are described in Section~\ref{sec:particular}.  In the polynomial
275: coefficients case, these results are well known. In the other cases,
276: to the best of our knowledge, the results improve upon existing
277: algorithms.
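\par
To illustrate the recurrence in the polynomial case, consider the equation~$y''+t\,y=0$ (an example chosen here, not taken from the text above). Writing~$y=\sum_n y_n t^n$ and extracting the coefficient of~$t^n$ gives
\[
2\,y_2 = 0, \qquad (n+2)(n+1)\, y_{n+2} + y_{n-1} = 0 \quad \text{for } n \geq 1 .
\]
Each new coefficient costs a bounded number of operations in~$\basefield$, so the first~$\precision$ coefficients of the solution with given~$y_0$ and~$y_1$ are obtained in~$\cO(\precision)$ operations.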
278: 
279: \begin{table}
280: \renewcommand{\arraystretch}{1.4}
281: $$\begin{array}{||l|l|l|l||l||}\hline\hline   % & & & & \\
282:  \textsf{Problem} & \textsf{constant} & \textsf{polynomial}
283: & \textsf{power series} & \textsf{output}\\[-2mm]
284: \quad (\textsf{input, output}) & \textsf{coefficients} &
285: \textsf{coefficients} & \textsf{coefficients} & \textsf{size} \\  
286: % & & & & \\
287: 
288: \hline \hline \textbf{i} \quad  (\textsf{equation, basis}) &   \cO(\sM(\order)
289:  \precision) \;\hfill^\star & \cO(d \order^2 \precision) &   \cO(
290:  \Mat(\order, \precision)) \;\hfill ^\star &  \cO(\order \precision)\\
291: 
292: \hline  \textbf{ii} \quad  (\textsf{equation, one solution}) &
293:  \cO(\sM(\order)  \precision/\order) \;\hfill^\star  &\cO(d
294:  \order \precision)  &\cO(\order \, \sM(\precision) \log \precision) \;\hfill ^\star & \cO(\precision)\\
295: 
296:  \hline \hline
297:  \textbf{I} \quad (\textsf{system, basis}) &
298:  \cO(\order \sM(\order)
299:  \precision) \;\hfill^\star &  \cO(d \order^\omega \precision) 
300:  & \cO(\Mat(\order, \precision))  \;\hfill ^\star  & \cO(\order^2 \precision)\\
301: 
302: \hline \textbf{II} \quad  (\textsf{system, one solution})  &
303:  \cO(\sM(\order) \precision) \;\hfill ^\star &  \cO(d \order^2
304:  \precision)  & \cO(\order^2 \, \sM(\precision) \log
305:  \precision) \;\hfill ^\star  & \cO(\order \precision)\\
306:  
307: \hline\hline
308: %\quad \quad \textsf{Input size} & \cO(\order^2)  &  \cO(d \order^2)  & \cO(\order^2
309: % \precision) \\ \hline \hline  
310: \end{array}$$
311: \caption{Complexity of solving linear differential equations/systems for~$\precision\gg\order$.  Entries marked with a~`$\star$' correspond to new results. \label{table1}}
312: \end{table}
313: 
314: 
315: 
316: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
317: 
318: \subsection{Non-linear Systems}  As an important
319: consequence of Theorem~\ref{theo:linear}, we improve the known
320: complexity results for the more general problem of solving
321: \emph{non-linear} systems of differential equations. To do so, we use
322: a classical reduction technique from the non-linear to the linear
323: case, see for instance~\cite[Section~25]{Rall69}
324: and~\cite[Section~5.2]{BrKu78}. For simplicity, we only consider
325: non-linear systems of first order. There is no loss of generality in
326: doing so: more general cases can be reduced to that one by adding new
327: unknowns and possibly differentiating once. The following result
328: generalizes~\cite[Theorem~5.1]{BrKu78}.  If~${F=(F_1,\dots,F_r)}$ is a
329: differentiable function bearing on~$\order$
330: variables~${y_{1},\dots,y_{\order}}$, we use the notation~$\Jac(F)$
331: for the Jacobian matrix~$(\partial F_i/\partial y_j)_{1\leq i,j \leq \order}$.
332: 
333: \begin{Theo}\label{theo:non-linear}
334:   Let~$\precision$, $\order$ be in~$\mathbb{N}$, let~$\basefield$ be a
335:   field of characteristic zero or at least\/~$\precision$ and
336:   let~$\varphi$ denote~${(\varphi_1,\dots,\varphi_{\order})}$,
337:   where~$\varphi_i(t,y)$ are multivariate power series
338:   in\/~$\basefield[[t,y_1,\dots,y_\order]]$.
339:   \par
340:   Let\/~${\sL :\N \to \N}$ be such that for all~$s(t)$
341:   in\/~$\mathcal{M}_{\order \times 1}(\basefield[[t]])$ and for all~$n$
342:   in\/~$\mathbb{N}$, the first~$n$ terms of~$\varphi(t,s(t))$ and
343:   of\/~$\Jac ({\varphi}) (t,s(t))$ can be computed in~$\sL(n)$ operations
344:   in\/~$\basefield$.  Suppose in addition that the function~${n \mapsto
345:     \sL(n)/n}$ is increasing. Given initial conditions~$v$
346:   in\/~$\mathcal{M}_{\order \times 1}(\basefield)$, if the differential
347:   system
348:   \[y'=\varphi(t,y),\qquad y(0)=v,\] admits a solution 
349:   in\/~$\mathcal{M}_{\order \times 1} (\basefield[[t]])$, then the 
350: first\/~$\precision$ terms of such a solution~$y(t)$ can be computed in
351: %  $\cO(\sL(N) + \Mat(\order,\precision))$ operations in $\basefield$.
352: %  $\cO(\sL(N) + \order^2 \sM(\precision) \log \precision)$ operations in $\basefield$.
353:   $\cO \left(\sL(\precision) + \min (\Mat(\order,\precision), \order^2
354:   \sM(\precision) \log \precision) \right)$ operations in~$\basefield$.
355: \end{Theo}
356: Werschulz~\cite[Theorem~3.2]{Werschulz80} gave an algorithm solving
357: the same problem using the integral Volterra-type equation technique
358: described in~\cite[pp.~172--173]{Rall69}.  With our notation, his
359: algorithm uses~$\cO \left(\sL(\precision) + \order^2 \precision \,
360: \sM(\precision) \right)$ operations in~$\basefield$ to compute a
361: solution at precision~$\precision$. Thus, our algorithm is an
362: improvement for cases where $\sL(\precision)$ is known to be
363: subquadratic with respect to~$\precision$.
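\par
One way to picture the reduction from the non-linear to the linear case is the following linearization step (a sketch only; the names~$y_{(0)}$, $z$ and~$m$ are introduced for illustration, and the precise truncation orders are left aside). Suppose that~$y_{(0)}$ satisfies~$y_{(0)}(0)=v$ and~$y_{(0)}'=\varphi(t,y_{(0)}) \mod t^{m}$. A first-order Taylor expansion gives
\[
\varphi(t,y_{(0)}+z) = \varphi(t,y_{(0)}) + \Jac(\varphi)(t,y_{(0)})\, z + \cO(z^2),
\]
so one is led to solve the linear system
\[
z' = \Jac(\varphi)(t,y_{(0)})\, z + \bigl(\varphi(t,y_{(0)}) - y_{(0)}'\bigr), \qquad z(0)=0,
\]
which is an instance of problem~{\bf II}. Since the inhomogeneous term is zero modulo~$t^m$, the correction~$z$ is zero modulo~$t^{m+1}$, and the neglected term~$\cO(z^2)$ is zero modulo~$t^{2m+2}$: the precision of~$y_{(0)}+z$ is roughly doubled. The function~$\sL$ accounts for the evaluation of~$\varphi$ and~$\Jac(\varphi)$ at~$y_{(0)}$ in each such step.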
364: 
365: The best known algorithms for power series composition in~${\order
366: \geq 2}$ variables require, at least on ``generic'' entries, a
367: number~${\sL(n) = \cO(n^{\order-1} \sM(n))}$ of operations in
368: $\basefield$ to compute the first~$n$ coefficients of the
369: composition~\cite[Section~3]{BrKu77}.  This complexity is nearly
370: optimal with respect to the size of a generic input. By contrast, in
371: the univariate case, the best known result~\cite[Th.~2.2]{BrKu78}
372: is~$\sL(n) = \cO(\sqrt{n \log n}\, \sM(n))$. For special entries,
373: however, better results can be obtained, already in the univariate
374: case: exponentials, logarithms, powers of univariate power series can
375: be computed~\cite[Section~13]{Brent75} in~$\sL(n) = \cO(\sM(n))$. As a
376: consequence, if~$\varphi$ is an~$\order$-variate sparse polynomial
377: with $m$~monomials of \emph{any} degree, then~$\sL(n) = \cO(m \order \,
378: \sM(n))$.
379: 
380: Another important class of systems with such a
381: subquadratic~$\sL(\precision)$ is provided by \emph{rational systems},
382: where each~$\varphi_i$ is in~$\basefield(y_1,\dots,y_\order)$.
383: Supposing that the complexity of evaluation of~$\varphi$ is bounded
384: by~$L$ (i.e., for any point~$z$ in~$\basefield^\order$ at
385: which~$\varphi$ is well-defined, the value~$\varphi(z)$ can be
386: computed using at most~$L$ operations in~$\basefield$), then, the
387: Baur-Strassen theorem~\cite{BaSt83} implies that the complexity of
388: evaluation of the Jacobian~$\Jac(\varphi)$ is bounded by~$5L$, and
389: therefore, we can take~${\sL(n)= \sM(n) L}$ in the statement of
390: Theorem~\ref{theo:non-linear}.
391: 
392: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
393: 
394: \subsection{Basic Complexity Notation} \label{ssec:complexity}
395: 
396: Our algorithms ultimately use, as a basic operation, multiplication
397: of matrices with entries that are polynomials (or truncated power
398: series).  Thus, to estimate their complexities in a unified manner,
399: we use a function~${\Mat : \N \times \N \to \N}$ such that any two~${r
400:   \times r}$ matrices with polynomial entries in~$\basefield[t]$ of
401: degree less than~$d$ can be multiplied using~$\Mat(r,d)$ operations
402: in~$\basefield$. In particular,~$\Mat(1,d)$ represents the number of
403: base field operations required to multiply two polynomials of degree
404: less than~$d$, while~$\Mat(r,1)$ is the arithmetic cost of scalar~${r
405:   \times r}$ matrix multiplication. For simplicity, we
406: denote~$\Mat(1,d)$ by~$\sM(d)$ and we have~${\Mat(r,1) =
407:   \cO(r^\omega)}$, where~${2 \leq \omega \leq 3}$ is the so-called {\em
408:   exponent of the matrix multiplication}, see, e.g.,~\cite{BuClSh97}
409: and~\cite{GaGe99}.
410: 
411: Using the algorithms of~\cite{ScSt71,CaKa91}, one can take~$\sM(d)$
412: in~$\cO(d \log d \log \log d)$; over fields supporting FFT, one can
413: take~$\sM(d)$ in~$\cO(d\log d)$.  By~\cite{CaKa91} we can always
414: choose~$\Mat(r,d)$ in~${\cO(r^\omega \, \sM(d))}$, but better
415: estimates are known in important particular cases.  For instance, over
416: fields of characteristic~$0$ or larger than~$2d$, we have~${\Mat(r,d)
417:   = \cO( r^\omega d + r^2 \, \sM(d))}$, see~\cite[Th.~4]{BoSc05}.  To
418: simplify the complexity analyses of our algorithms, we suppose that the
419: {multiplication cost} function~$\Mat$ satisfies the following standard
420: growth hypotheses for all integers~$d_{1},d_{2}$ and~$r$: %(see, e.g., \cite{GaGe99}).
421: \begin{equation}\label{hyp:Mat}
422: \Mat(r,d_{1}d_{2}) \leq d_{1}^{2} \Mat (r,d_{2})
423: \qquad \text{and}  \qquad
424: \frac{\Mat(r,d_{1})}{d_{1}} \leq \frac{\Mat(r,d_{2})}{d_{2}}
425: \quad  \text{if $d_{1} \leq d_{2}$}.
426: \end{equation}
427: In particular, Equation~\eqref{hyp:Mat} implies the inequalities
428: \begin{equation} \label{ineq:Mat}
429: \begin{split}
430: 	\Mat(r,2^\kappa)+\Mat(r,2^{\kappa-1})+\Mat(r,2^{\kappa-2})+\dots+\Mat(r,1)&
431: 		\le 2\Mat(r,2^\kappa),\\
432: 	\sM(2^\kappa)+2\sM(2^{\kappa-1})+4\sM(2^{\kappa-2})+\dots+2^\kappa\sM(1)&
433: 		\le (\kappa+1)\sM(2^\kappa).
434: \end{split}
435: \end{equation}
436: These inequalities are crucial to prove the estimates in
437: Theorem~\ref{theo:linear} and Theorem~\ref{theo:non-linear}.  Note
438: also that when the available multiplication algorithm is slower than
439: quasi-linear (e.g., Karatsuba or naive multiplication), then in the
440: second inequality, the factor~$(\kappa+1)$ can be replaced by a constant
441: and thus the estimates $\sM(\precision)\log \precision$ in our complexities become
442: $\sM(\precision)$ in those cases.
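\par
To see where these inequalities come from (a short derivation added for convenience, using only the second hypothesis in~\eqref{hyp:Mat}), apply it with~$d_1=2^{\kappa-j}$ and~$d_2=2^\kappa$:
\[
\Mat(r,2^{\kappa-j}) \le 2^{-j}\,\Mat(r,2^\kappa), \qquad 0\le j\le\kappa .
\]
Summing over~$j$ and bounding the geometric sum~$\sum_{j=0}^{\kappa} 2^{-j}$ by~$2$ gives the first inequality. For~$r=1$ the same bound reads~$2^j\,\sM(2^{\kappa-j})\le\sM(2^\kappa)$, and adding these~$\kappa+1$ inequalities gives the second one.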
443: 
444: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
445: 
446: \subsection{Notation for Truncation}
447: 
448: Algorithms often need to split a polynomial into a lower and a
449: higher part. To this end, the following notation proves convenient.
450: Given a polynomial~$f$, the remainder and quotient of its Euclidean
451: division by~$t^k$ are respectively denoted $\trunch fk$ and~$\truncl
452: fk$.  Another occasional operation consists in taking a middle part
453: out of a polynomial.  To this end, we let $\trunc fkl$
454: denote~$\truncl{\trunch fl}{k}$.  Furthermore, we shall write $f=g\mod
455: t^k$ when two polynomials or series $f$ and~$g$ agree up to
456: degree~$k-1$ included.  To get a nice behaviour of integration with
457: respect to truncation orders, all primitives of series are chosen with
458: zero as their constant coefficient.
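\par
As a small example of this notation (the polynomial~$f$ below is chosen here for illustration), take~$f = 1+2t+3t^2+4t^3+5t^4$. Then
\[
\trunch{f}{2} = 1+2t, \qquad
\truncl{f}{2} = 3+4t+5t^2, \qquad
\trunc{f}{1}{3} = \truncl{1+2t+3t^2}{1} = 2+3t,
\]
and in particular~$f = \trunch{f}{2} + t^2\,\truncl{f}{2}$.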
459: 
460: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
461: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
462: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
463: 
464: \section{Newton Iteration for Systems of Linear Differential
465:   Equations}
466: 
467: Let~${Y'(t) = A(t) Y(t)+B(t)}$ be a linear differential system,
468: where~$A(t)$ and~$B(t)$ are~${\order \times \order}$ matrices with
469: coefficients in~$\basefield[[t]]$. Given an invertible scalar
470: matrix~$Y_0$, an integer~${\precision \geq 1}$, and the expansions
471: of~$A$ and~$B$ up to precision~$\precision$, we show in this section
472: how to compute efficiently the power series expansion at
473: precision~$\precision$ of the unique solution of the Cauchy problem
474: $$Y'(t) = A(t) Y(t)+B(t) \quad \text{and} \quad Y(0) = Y_0.$$ 
475: This enables us to answer problems \textbf{I} and \textbf{i}, the
476: latter being a particular case of the former (through the application
477: to a companion matrix).
478: 
479: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
480: 
481: \subsection{Homogeneous Case}
482: First, we design a Newton-type iteration to solve the homogeneous
483: system~${Y'=A(t)Y}$.  The classical Newton iteration to solve an
484: equation $\phi(y)=0$ is $Y_{\kappa+1}=Y_\kappa-U_\kappa$, where
485: $U_\kappa$ is a solution of the linearized equation
486: $D\phi|_{Y_\kappa}\cdot U=\phi(Y_\kappa)$ and $D\phi|_{Y_\kappa}$ is
487: the differential of~$\phi$ at~$Y_\kappa$. We apply this idea to the
488: map~${\phi: Y \mapsto Y'-AY}$. Since~$\phi$ is linear, it is its own
489: differential and the equation for~$U$ becomes
490: $$U'-AU=Y'_\kappa-AY_\kappa.$$
491: Taking into account the proper orders of truncation and using
492: Lagrange's method of variation of
493: parameters~\cite{Lagrange1869,Ince56}, we are thus led to the
494: iteration
495: \[\begin{cases}Y_{\kappa+1} &= Y_\kappa - \trunch {U_\kappa}
496: {2^{\kappa+1}},\\
497: U_{\kappa}& = Y_\kappa \int
498: \trunch{Y_\kappa^{-1}}{2^{\kappa+1}} \left(Y_\kappa' -
499:   \trunch{A}{2^{\kappa+1}} Y_\kappa\right).
500: \end{cases}
501: \]
502: Thus we need to compute (approximations
503: of) the solution~$Y$ and its inverse simultaneously.  Now, a well-known Newton
504: iteration for the inverse $Z$ of $Y$ is
505: \begin{equation}\label{Newton:inverse}
506: Z_{\kappa+1} =
507: \trunch
508:   {Z_{\kappa} + Z_{\kappa} (I_\order - Y Z_{\kappa})}
509:   {2^{\kappa+1}}.
510: \end{equation}
511:  It was introduced by Schulz~\cite{Schulz33} in
512: the case of real matrices; its version for matrices of power series is
513: given for instance in~\cite{MoCa79}.
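\par
A scalar example shows the doubling of precision in iteration~\eqref{Newton:inverse} (an illustration with~$\order=1$ and the series~$Y=1-t$, chosen here). Starting from~$Z_0 = Y(0)^{-1} = 1$, one gets
\begin{align*}
Z_1 &= \trunch{1 + \bigl(1-(1-t)\bigr)}{2} = 1+t,\\
Z_2 &= \trunch{(1+t) + (1+t)\bigl(1-(1-t)(1+t)\bigr)}{4}
     = \trunch{(1+t)(1+t^2)}{4} = 1+t+t^2+t^3,
\end{align*}
which are the expansions of~$1/(1-t)$ at precision~$2$ and~$4$, respectively.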
514: 
515: \begin{figure}
516:   \begin{center} 
517:     \fbox{\begin{minipage}{9cm}
518:       \medskip
519:       \begin{center}\textsf{SolveHomDiffSys}($A,\precision,Y_0$) \end{center}
520:       \textbf{Input:} ${Y_0,A_0, \dots, A_{\precision-2}}$
521:         in~$\mathcal{M}_{\order\times\order}(\basefield)$,
522:         ${A = \sum A_i t^i}$.
523:         \par\smallskip
524:         \textbf{Output:} ${Y=\sum_{i=0}^{\precision-1}Y_i t^i}$ in
525: $\mathcal{M}_{\order\times\order}(\basefield)[t]$ such that 
526: ${Y' = A Y \mod t^{\precision-1}}$, and $Z=Y^{-1}\mod t^{\precision/2}$.
527: 
528:         \begin{tabbing}
529:           \;\;\\$Y \leftarrow  (I_{\order}+ t  A_0) Y_0$ 
530:           \\$ Z \leftarrow Y_0^{-1}$
531:           \\$m \leftarrow 2$\\
532:           \textsf{while} $m \leq \precision/2$ \textsf{do}\\
533:           \hspace{0.5cm} $Z \leftarrow Z + \trunch {Z(I_{\order} - YZ)}{m} $\\
534:           \hspace{0.5cm} $Y \leftarrow Y - \trunch {Y\left(\int Z (Y' - \trunch{A}{2m-1} Y) \right)}{2m} $ \\
535:                                 %$+\sum_{i}\textsf{Coeff}(M', i)\frac{T^i}{i}$\\
536:           \hspace{0.5cm} $m \leftarrow 2m$ \\
537:           \textsf{return} $Y,Z$
538:         \end{tabbing}
539:       \end{minipage}
540:     }\end{center}
541:   \caption{Solving the Cauchy problem~$Y' = A(t) Y$,  $Y(0) = Y_0$ by Newton iteration.}
542:   \label{fig:hom}
543: \end{figure} 
544: Putting together these considerations, we arrive at the algorithm
545: \textsf{SolveHomDiffSys} in Figure~\ref{fig:hom}, whose correctness
546: easily follows from Lemma~\ref{prop:Newton} below.  Remark
547: that in the scalar case~(${\order=1}$) algorithm
548: \textsf{SolveHomDiffSys} coincides with the algorithm for power series
549: exponential proposed by Hanrot and Zimmermann~\cite{HaZi04}; see
550: also~\cite{Bernstein}. In the case~${\order>1}$, ours is a nontrivial
551: generalization of the latter. Because it takes primitives of series at
552: precision~$\precision$, algorithm \textsf{SolveHomDiffSys} requires
553: that the elements~${2,3,\dots,\precision-1}$ be invertible
554: in~$\basefield$. Its complexity~$\sC$ satisfies the
555: recurrence~${\sC(m) = \sC(m/2) + \cO(\Mat(\order,m))}$, which
556: implies, using the growth hypotheses on~$\Mat$, that~${\sC(\precision)
557:   = \cO(\Mat(\order,\precision))}$.  This proves the first assertion of
558: Theorem~\ref{theo:linear}.
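\par
As a sanity check, here is a trace of \textsf{SolveHomDiffSys} in the scalar case (an illustration with the inputs~$\order=1$, $A=1$, $Y_0=1$ and~$\precision=4$, chosen here). The initialization gives~$Y=1+t$, $Z=1$ and~$m=2$. Since~$m\leq \precision/2$, the loop body is executed once:
\begin{align*}
Z &\leftarrow 1 + \trunch{1-(1+t)}{2} = 1-t,\\
Y' - AY &= 1-(1+t) = -t, \qquad
\int Z\,(Y'-AY) = \int (-t+t^2) = -\frac{t^2}{2}+\frac{t^3}{3},\\
Y &\leftarrow (1+t) - \trunch{(1+t)\Bigl(-\frac{t^2}{2}+\frac{t^3}{3}\Bigr)}{4}
   = 1+t+\frac{t^2}{2}+\frac{t^3}{6}.
\end{align*}
Then~$m$ becomes~$4$ and the loop stops. The output~$Y$ is~$\exp(t)$ modulo~$t^4$, and~$Z=1-t$ is its inverse modulo~$t^2$, as expected.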
559: %   It computes simultaneously the solutions $(Y,Z)$ of the problems
560: %   $$Y'-AY = 0 \bmod t^{\precision-1} \quad \text{and} \quad Z'+Z A = 0 \bmod
561: %   t^{\precision/2-1}.$$
562: %\bigskip
563: 
564: % This is based on the following result, allowing to double the
565: % precision of the solution, by using only polynomial matrix operations.
566: \smallskip
567: 
568:  \begin{Lemme}\label{prop:Newton} 
569:    Let~$m$ be an even integer. Suppose
570:    that~$Y_{(0)}, Z_{(0)}$ in~$\mathcal{M}_{\order\times\order}(\basefield[t])$ satisfy
571:    \begin{equation*}
572:      I_{\order} - Y_{(0)} Z_{(0)} = 0 \mod t^{m/2} \quad \text{and} \quad
573:      Y_{(0)}' - AY_{(0)} = 0 \mod t^{m-1},
574:    \end{equation*}
575: and that they are of degree less than $m$ and~$m/2$, respectively.
576:    Define
577:    \begin{equation*}
578:      Z:=\trunch {Z_{(0)} \left(2I_{\order}  - Y_{(0)} Z_{(0)} \right)} {m} \quad \text{and} \quad
579:      Y:=\trunch {Y_{(0)} \left(I_{\order} - \int Z  (Y_{(0)}'-AY_{(0)})  \right)} {2m}.
580:    \end{equation*}
581:    Then~$Y$ and~$Z$ satisfy the equations
582:    \begin{equation} \label{eq:double}
583:      I_{\order} - Y Z = 0 \mod t^{m} \quad \text{and} \quad
584:      Y' - AY = 0 \mod t^{2m-1}.
585:    \end{equation}
586:  \end{Lemme}
587: \proof Using the definitions of~$Y$ and~$Z$, it follows that
588: $$
589: I_{\order} - YZ = (I_{\order} -Y_{(0)} Z_{(0)})^2 - (Y -
590: Y_{(0)}) Z_{(0)} (2I_{\order} -Y_{(0)} Z_{(0)}) \mod t^m.
591: $$
Since~${I_{\order} -Y_{(0)} Z_{(0)}}$ is zero modulo~$t^{m/2}$ by
hypothesis, its square is zero modulo~$t^m$. Moreover,~${Y - Y_{(0)}}$
equals~${-Y_{(0)} \int Z (Y_{(0)}'-AY_{(0)})}$ modulo~$t^{2m}$, which is zero
modulo~$t^m$ because~${Y_{(0)}'-AY_{(0)}}$ is zero modulo~$t^{m-1}$. Hence the
right-hand side is zero modulo~$t^m$ and this establishes the first formula in
Equation~\eqref{eq:double}.  Similarly, write~${Q= \int Z
  (Y_{(0)}'-AY_{(0)})}$ and observe that~$Q=0\mod t^m$ to get the equality
$$
Y' - AY = (I_{\order}-YZ) (Y_{(0)}' - AY_{(0)}) - (Y_{(0)}' -
AY_{(0)}) Q \mod t^{2m-1}.
$$
600: $$
601: Now,~${Y_{(0)}' - AY_{(0)}}=0 \mod t^{m-1}$, while~$Q$
602: and~${I_{\order} -YZ}$ are zero modulo~$t^{m}$ and therefore
603: the right-hand side of the last equation is zero modulo~$t^{2m-1}$,
604: proving the last part of the lemma. 
605: \foorp
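
The doubling step of Lemma~\ref{prop:Newton} can be sketched in Python for the scalar case~$\order=1$ over the rationals. This is only an illustration under our own assumptions: series are plain coefficient lists, products are naive (quadratic) rather than fast, and the helper names \texttt{pad}, \texttt{mul}, \texttt{sub}, \texttt{deriv}, \texttt{integ} and \texttt{newton\_step} are hypothetical and are not part of our Magma implementation.

```python
from fractions import Fraction

def pad(a, n):
    """Truncate or zero-pad a coefficient list to length n (i.e. modulo t^n)."""
    a = [Fraction(x) for x in a[:n]]
    return a + [Fraction(0)] * (n - len(a))

def mul(a, b, n):
    """Naive product of two series modulo t^n."""
    a, b = pad(a, n), pad(b, n)
    c = [Fraction(0)] * n
    for i in range(n):
        for j in range(n - i):
            c[i + j] += a[i] * b[j]
    return c

def sub(a, b, n):
    """Difference of two series modulo t^n."""
    return [x - y for x, y in zip(pad(a, n), pad(b, n))]

def deriv(a):
    """Formal derivative of a series."""
    return [i * a[i] for i in range(1, len(a))]

def integ(a):
    """Formal primitive with zero constant term."""
    return [Fraction(0)] + [Fraction(c) / (i + 1) for i, c in enumerate(a)]

def newton_step(A, Y0, Z0, m):
    """One step of the lemma: from Y0 mod t^m and Z0 mod t^(m/2),
    return Y mod t^(2m) and Z mod t^m (scalar case)."""
    Z = mul(Z0, sub([2], mul(Y0, Z0, m), m), m)
    defect = sub(deriv(Y0), mul(A, Y0, 2 * m - 1), 2 * m - 1)
    Q = integ(mul(Z, defect, 2 * m - 1))
    Y = mul(Y0, sub([1], Q, 2 * m), 2 * m)
    return Y, Z

# Usage: y' = y, starting from Y0 = 1 + t (m = 2) and Z0 = 1.
A = [Fraction(1)]
Y, Z = newton_step(A, [1, 1], [1], 2)   # Y mod t^4, Z mod t^2
Y, Z = newton_step(A, Y, Z, 4)          # Y mod t^8, Z mod t^4
```

In this toy run, two steps recover the first eight coefficients of~$e^{t}$ in~\texttt{Y} and the first four coefficients of~$e^{-t}$ in~\texttt{Z}.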
606: 
607: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
608: 
609: \subsection{General Case}
610: We want to solve the equation~${Y'=AY +B}$, where~$B$ is an~${\order
611: \times \order}$ matrix with coefficients in~$\basefield[[t]]$.
612: Suppose that we have already computed the solution~$\widetilde{Y}$ of
the associated homogeneous equation~${\widetilde{Y}'=A \widetilde{Y}}$,
614: together with its inverse~$\widetilde{Z}$.  Then, by the method of
615: variation of parameters, ${Y_{(1)}= \widetilde{Y} \int \widetilde{Z}
616: B}$ is a particular solution of the inhomogeneous problem, thus the
617: general solution has the form~${Y = Y_{(1)}+\widetilde{Y}}$.
618: 
619: \begin{figure}
620:   \begin{center} 
621:     \fbox{\begin{minipage}{9.5cm}
622:         \medskip
623:         \begin{center}\textsf{SolveInhomDiffSys}($A,B,\precision,Y_0$) \end{center}
624:         \textbf{Input:} ${Y_0,A_0, \dots, A_{\precision-2}}$ in~$\mathcal{M}_{\order\times\order}(\basefield)$,
625:           ${A = \sum A_i t^i}$,
626:           \par\smallskip
627:           ${B_0, \dots, B_{\precision-2}}$ in~$\mathcal{M}_{\order\times\order}(\basefield)$, 
628:             ${B(t) = \sum B_i t^i}$.
629:             \par\medskip
630:             \textbf{Output:} ${Y_1,\dots,Y_{\precision-1}}$
631:             in~$\mathcal{M}_{\order\times\order}(\basefield)$ such that ${Y=Y_0 + \sum Y_i
632:             t^i}$ satisfies~${Y' = A Y + B \mod t^{\precision-1}}$.
633: 
634: \begin{tabbing}
635:   \;\;\\$\widetilde{Y},\widetilde{Z} \leftarrow \textsf{SolveHomDiffSys} (A,\precision,Y_0)$ \\
636:   $\widetilde{Z} \leftarrow \widetilde{Z} + \trunch {\widetilde{Z}(I_\order - \widetilde{Y}\widetilde{Z})} {\precision}$\\
637:   $Y \leftarrow \trunch {\widetilde{Y}   \int (\widetilde{Z}  B)} {\precision}$ \\
638:   $Y \leftarrow Y + \widetilde{Y}$\\
639:   \textsf{return} $Y$
640: \end{tabbing}
641: \end{minipage}
642: }\end{center}
643: \caption{Solving the Cauchy problem $Y' = A Y  + B, \; Y(0) = Y_0$ by Newton iteration.}
644: \label{fig:inhom}
645: \end{figure} 
646: 
647: Now, to compute the particular solution~$Y_{(1)}$ at
648: precision~$\precision$, we need to know both~$\widetilde{Y}$
649: and~$\widetilde{Z}$ at the same precision~$\precision$. To do this, we
650: first apply the algorithm for the homogeneous case and
651: iterate~\eqref{Newton:inverse} once. The resulting algorithm is
652: encapsulated in Figure~\ref{fig:inhom}.
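
As a small sanity check of the variation-of-parameters formula (an illustrative example of ours, not taken from the experiments), take~$\order=1$, $A=1$, $B=1$ and~$Y_0=1$ at precision~$\precision=4$. Then
\begin{align*}
\widetilde{Y} &= 1 + t + \tfrac{t^2}{2} + \tfrac{t^3}{6} \mod t^4, &
\widetilde{Z} &= 1 - t + \tfrac{t^2}{2} - \tfrac{t^3}{6} \mod t^4,\\
\int \widetilde{Z} B &= t - \tfrac{t^2}{2} + \tfrac{t^3}{6} \mod t^4, &
Y_{(1)} &= \widetilde{Y} \int \widetilde{Z} B = t + \tfrac{t^2}{2} + \tfrac{t^3}{6} \mod t^4,
\end{align*}
so that~${Y = Y_{(1)} + \widetilde{Y} = 1 + 2t + t^2 + t^3/3 \mod t^4}$, which agrees with the expansion of the exact solution~${2e^t-1}$ of~${Y'=Y+1}$, ${Y(0)=1}$.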
653: 
654: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
655: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
656: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
657: 
658: \section{Divide-and-conquer Algorithm}\label{sec:DAC}
659: 
660: We now give our second algorithm, which allows us to solve problems
661: {\bf ii} and~{\bf II} and to finish the proof of
662: Theorem~\ref{theo:linear}.  Before entering a detailed presentation,
663: let us briefly sketch the main idea in the particular case of a
664: homogeneous differential equation~${\mathcal{L}y=0}$,
665: where~$\mathcal{L}$ is a linear differential operator in~${\delta = t
666: \frac{d}{dt}}$ with coefficients in~$\basefield[[t]]$.
667: % FC, 11/04/2006: Je re-coupe, sinon cette phrase est trop longue !
668: (The introduction of~$\delta$ is only for pedagogical reasons.)  The
669: starting remark is that if a power series~$y$ is written as~${y_0 +
670: t^m y_1}$, then~${\mathcal{L}(\delta)y = \mathcal{L}(\delta)y_0 +
671: t^m\mathcal{L}(\delta + m)y_1}$. Thus, to compute a solution~$y$
672: of~${\mathcal{L}(\delta) y = 0 \mod t^{2m}}$, it suffices to determine
673: the lower part of~$y$ as a solution of ${\mathcal{L}(\delta) y_0 = 0
674: \mod t^m}$, and then to compute the higher part~$y_1$, as a solution
675: of the inhomogeneous equation~${\mathcal{L}(\delta + m) y_1 = - R \mod
676: t^{m}}$, where the rest~$R$ is computed so that~${\mathcal{L}(\delta)
677: y_0 = t^m R \mod t^{2m}}$.
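
To make the starting remark explicit (a short derivation added for illustration), observe that
\[
\delta(t^m y_1) = t \left( m t^{m-1} y_1 + t^m y_1' \right) = t^m (\delta + m) y_1 ,
\]
so that, by induction,~${\delta^k (t^m y_1) = t^m (\delta+m)^k y_1}$ for all~${k \geq 0}$. Writing~${\mathcal{L}(\delta) = \sum_k a_k(t) \delta^k}$ with~$a_k$ in~$\basefield[[t]]$, linearity gives
\[
\mathcal{L}(\delta)(t^m y_1) = \sum_k a_k(t) \, t^m (\delta+m)^k y_1 = t^m \mathcal{L}(\delta+m) y_1 .
\]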
678: 
679: Our algorithm \textsf{DivideAndConquer} makes a recursive use of this idea. Since, during the
680: recursions, we are naturally led to treat inhomogeneous equations of a
681: slightly more general form than that of~{\bf II} we introduce the
682: notation~$\mathcal{E}(s,p,m)$ for the vector equation
683: \begin{equation*}
684: t y' +   (p I_\order - tA) y =  s \mod t^{m}.
685: \end{equation*}
686: The algorithm is described in Figure~\ref{fig:algo-dac}.
Choosing~${p=0}$ and~${s(t) =t b(t)}$, we retrieve the equation of
problem~{\bf II} multiplied by~$t$.  Our algorithm \textsf{Solve} for problem~{\bf
II} is thus a specialization of \textsf{DivideAndConquer}, defined by
making \textsf{Solve}$(A,b,\precision,v)$ simply call
\textsf{DivideAndConquer}$(A,tb,0,\precision,v)$. Its correctness relies on
the following immediate lemma. 
693: 
\begin{figure}
  \begin{center} 
    \fbox{\begin{minipage}{8.5 cm}
        \medskip
        \begin{center}\textsf{DivideAndConquer}($A,s,p,m,v$) \end{center}
	\textbf{Input:} $A_0,\dots,A_{m-1}$ in~$\mathcal{M}_{\order\times\order}(\basefield)$,
        ${A = \sum A_i t^i}$, $s_0,\dots,s_{m-1},v$ in~$\mathcal{M}_{\order\times1}(\basefield)$,
	${s = \sum s_i t^i}$, $p$ in~$\basefield$.
        \par\smallskip
        \textbf{Output:} ${y=\sum_{i=0}^{m-1}y_i t^i}$ in
$\mathcal{M}_{\order\times1}(\basefield)[t]$ such that 
${ty' + (pI_{\order}-tA)y=s \mod t^m}$, ${y(0)=v}$.

        \begin{tabbing}
          \textsf{if}~$m=1$ \textsf{then} \\
          {\quad \textsf{if}} $p=0$ \textsf{then} \\
          {\quad \quad \textsf{return}} $v$\\
          {\quad \textsf{else}}  \textsf{return} $p^{-1} s(0)$\\
          \textsf{end if}\\
          $d \leftarrow \intpart{m/2}$\\
          $y_0 \leftarrow$ {\sf DivideAndConquer}($A,\trunch s{d},p,d,v$)\\
          $R \leftarrow \trunc{s- t y_0' - (p I_\order -tA) y_0}{d}{m} $ \\
          $y_1 \leftarrow$ {\sf DivideAndConquer}($A, R, p+d, m-d,v$)\\
          \textsf{return} $y_0 + t^d y_1$
        \end{tabbing}
      \end{minipage}
    }\end{center}
  \caption{Solving $ty' + (pI_{\order}-tA)y=s \mod t^m$, ${y(0)=v}$, 
    by divide-and-conquer.}
\label{fig:algo-dac}\label{fig:2}
725:  \end{figure} 
726: 
727: \begin{Lemme}
728:   Let~$A$ in~$\mathcal{M}_{\order\times\order}(\basefield[[t]])$, $s$
729:   in~$\mathcal{M}_{\order \times 1}(\basefield[[t]])$, and let~$p,d$
730:   in~$\mathbb{N}$.  Decompose~$\trunch sm$ into a sum~${s_0 +
731:     t^d s_1}$.  Suppose that~$y_0$
732:   in~$\mathcal{M}_{\order\times1}(\basefield[[t]])$ satisfies the
733:   equation~$\mathcal{E}(s_0,p,d)$, set $R$ to be
734:   \begin{equation*}
735:     \trunch {(ty'_0 + (pI_\order - t A) y_0 - s_0)/t^d} {m-d},
736:   \end{equation*}
737:   and let~$y_1$ in~$\mathcal{M}_{\order \times 1}(\basefield[[t]])$ be
738:   a solution of the equation~${\mathcal{E}(s_1-R,p+d,m-d)}$.  Then the
739:   sum $y:= y_0 + t^d y_1$ is a solution of the
740:   equation~$\mathcal{E}(s,p,m)$.
741: \end{Lemme}
742: 
743: The only divisions performed along our algorithm~\textsf{Solve} are by 1, \dots, $\precision-1$.
744: As a consequence of this remark and of the previous lemma, we deduce the complexity estimates in the proposition below;
745: for a general matrix~$A$, this proves point~(c) in Theorem~\ref{theo:linear}, while the
746: particular case when $A$~is companion proves point~(b).
747: 
748: \begin{Prop}
749:   Given the first~$m$ terms of the entries
750:   of~$A\in\mathcal{M}_{\order\times\order}(\basefield[[t]])$ and
751:   of~$s\in\mathcal{M}_{\order \times 1}(\basefield[[t]])$,
752:   given~$v\in\mathcal{M}_{\order \times 1}(\basefield)$,
753:   algorithm~$\emph{\textsf{DivideAndConquer}}(A,s,p,m,v)$ computes a
754:   solution of the linear differential system~${ty' + (pI_{\order}-tA)
755:   y=s \mod t^m}$, ${y(0)=v}$, using~${\cO(\order^2 \, \sM(m) \log m)}$
756:   operations in~$\basefield$. If $A$ is a companion matrix, the cost 
757:   reduces to ${\sC(m) = \cO(\order \, \sM(m) \log m)}$.
758: \end{Prop}
759: \proof The correctness of the algorithm follows from the previous
760: Lemma.  The cost~$\sC(m)$ of the algorithm satisfies the recurrence
$$ \sC(m) = \sC(\intpart{m/2}) + \sC(\lceil m/2 \rceil) + \order^2 \, \sM(m)
+ \cO(\order m),$$ where the term $\order^2 \, \sM(m)$ comes from the
application of $A$ to $y_0$ used to compute the rest~$R$. From this
recurrence, it is easy to infer that~${\sC(m) = \cO(\order^2 \, \sM(m)
\log m)}$. Finally, when $A$ is a companion matrix, the vector~$R$ can
be computed in time $\cO(\order \, \sM(m))$, which implies that in this
767: case~${\sC(m) = \cO(\order \, \sM(m) \log m)}$.
768: \foorp
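
The recursion can be sketched as follows in Python over the rationals. This is a minimal illustration under our own assumptions, not our Magma code: series are lists of coefficient vectors, the product~$tAy_0$ is computed naively instead of with fast multiplication, and the names \texttt{mat\_vec}, \texttt{apply\_tA}, \texttt{divide\_and\_conquer} and \texttt{solve} are hypothetical. Since~$y_0$ has degree less than~$d$, only~$s$ and~$tAy_0$ contribute to the coefficients of index at least~$d$ that define~$R$.

```python
from fractions import Fraction

def mat_vec(M, x):
    """Product of a matrix (list of rows) by a vector."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def apply_tA(A, y, n):
    """Coefficients of t*A*y modulo t^n (A: list of matrices, y: list of vectors)."""
    r = len(y[0])
    out = [[Fraction(0)] * r for _ in range(n)]
    for i, Ai in enumerate(A):
        for j, yj in enumerate(y):
            k = i + j + 1
            if k < n:
                out[k] = [a + w for a, w in zip(out[k], mat_vec(Ai, yj))]
    return out

def divide_and_conquer(A, s, p, m, v):
    """Solve t y' + (p I - t A) y = s mod t^m; y(0) = v is used when p = 0."""
    if m == 1:
        if p == 0:
            return [list(v)]
        return [[Fraction(c) / p for c in s[0]]]
    d = m // 2
    y0 = divide_and_conquer(A, s[:d], p, d, v)
    tAy0 = apply_tA(A, y0, m)
    # Coefficients d..m-1 of s - t y0' - (p I - t A) y0, shifted down by d.
    R = [[a + w for a, w in zip(s[k], tAy0[k])] for k in range(d, m)]
    y1 = divide_and_conquer(A, R, p + d, m - d, v)
    return y0 + y1          # list concatenation encodes y0 + t^d y1

def solve(A, b, N, v):
    """First N coefficients of y' = A y + b, y(0) = v, via s = t b and p = 0."""
    r = len(v)
    s = [[Fraction(0)] * r] + [list(bi) for bi in b[:N - 1]]
    s += [[Fraction(0)] * r for _ in range(N - len(s))]
    return divide_and_conquer(A, s, 0, N, v)

# Usage: y1' = y2, y2' = -y1, y(0) = (0, 1), i.e. (sin t, cos t).
rot = [[[0, 1], [-1, 0]]]
sincos = solve(rot, [], 6, [0, 1])
```

In this sketch the divisions occur only in the base case, by the integers~$p$ met along the recursion, in agreement with the remark above.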
769: 
770: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
771: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
772: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
773: 
774: \section{Faster Algorithms for Special Coefficients}\label{sec:particular}
775: 
776: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
777: 
778: \subsection{Constant Coefficients}\label{ssec:const-coeffs}
779: Let~$A$ be a constant~${\order \times \order}$ matrix and let~$v$ be a
780: vector of initial conditions. Given~${\precision \geq 1}$, we want to
781: compute the first~$\precision$ coefficients of the series expansion of
782: a solution~$y$ in~$\mathcal{M}_{\order \times 1}(\basefield[[t]])$
of~${y' = Ay}$, with~${y(0) = v}$. In this setting, many
algorithms have been proposed to solve problems {\bf i}, {\bf ii},
{\bf I}, and {\bf II}, see for
instance~\cite{Pennell26,Putzer66,Kirchner67,Fulmer75,MoLo78,Leonard96,Liz98,Gu99,Gu01,HaFiSm01,MoLo03,LuRo04}.
Again, the most naive algorithm is based on the method of undetermined
coefficients. On the other hand, most books on differential equations,
see, e.g., \cite{Ince56,Coddington61,Arnold92}, recommend simplifying
790: the calculations using the Jordan form of matrices. The main drawback
791: of that approach is that computations are done over the algebraic
792: closure of the base field~$\basefield$. The best complexity result
793: known to us is given in~\cite{LuRo04} and it is quadratic in~$\order$.
794: 
795: We concentrate first on problems~{\bf ii} and~{\bf II} (computing a
796: single solution for a single equation, or a first-order system).
797: Our algorithm for problem~{\bf II}
798: uses~${\cO(\order^\omega \log \order + \precision \sM(\order))}$
799: operations in~$\basefield$ for a general constant matrix~$A$ and
800: only~$\cO(\precision \sM(\order)/\order)$ operations in~$\basefield$ in
801: the case where~$A$ is a companion matrix (problem {\bf ii}). Despite
802: the simplicity of the solution, this is, to the best of our knowledge,
803: a new result.
804: 
805: In order to compute~${y_\precision = \sum_{i=0}^\precision{A^i v
806: t^i/i!}}$, we first compute its Laplace
807: transform~${z_\precision=\sum_{i=0}^\precision {A^i v t^i}}$: indeed,
808: one can switch from~$y_\precision$ to~$z_\precision$ using
809: only~$\cO(\precision \order)$ operations in $\basefield$.  The
810: vector~$z_\precision$ is the truncation at order~${\precision + 1}$
811: of~${z=\sum_{i\ge0} A^i v t^i =(I-tA)^{-1} v}$. As a byproduct of a
812: more difficult question,~\cite[Prop.~10]{Storjohann02} shows
813: that~$z_\precision$ can be computed using~$\cO(\precision
814: \order^{\omega-1})$ operations in~$\basefield$. We propose a solution
815: of better complexity.
816: 
817: By Cramer's rule,~$z$ is a vector of rational functions~$z_i(t)$, of
818: degree at most~$\order$.  The idea is to first compute~$z$ as a
819: rational function, and then to deduce its expansion
820: modulo~$t^{\precision +1}$. The first part of the algorithm does not
821: depend on~$\precision$ and thus it can be seen as a precomputation.
822: For instance, one can use%Algorithm \texttt{SeriesSolutionSmallRHS} in
823: ~\cite[Corollary~12]{Storjohann02}, to compute $z$ in
824: complexity~$\cO(\order^{\omega} \log \order)$. In the second step of
825: the algorithm, we have to expand~$\order$ rational functions of degree
826: at most~$\order$ at precision~$\precision$.  Each such expansion
827: can be performed using~$\cO(\precision\sM(\order)/\order)$ operations
828: in~$\basefield$, see, e.g., the proof of~\cite[Prop.~1]{BoFlSaSc05}.
829: The total cost of the algorithm is thus~${\cO(\order^\omega\log \order
830: + \precision \sM(\order))}$. We give below a simplified variant with
831: same complexity, avoiding the use of the algorithm
832: in~\cite{Storjohann02} for the precomputation step and relying instead
833: on a technique which is classical in the computation of minimal
834: polynomials~\cite{BuClSh97}.
835: \begin{enumerate}
836: \item Compute the vectors~$v,Av,A^2 v,A^3v,\dots,A^{2r}v$
837:   in~$\cO(\order^\omega\log \order)$, as follows: \\ for~$\kappa$
838:   from~$1$ to~${1 + \log \order}$ do
839:     \begin{enumerate}
840:     \item compute~$A^{2^\kappa}$
841:     \item compute~${A^{2^\kappa} \times [v | Av | \cdots | A^{2^\kappa-1}v]}$,
842:         thus getting~${[A^{2^\kappa}v | A^{2^\kappa+1}v | \cdots | A^{2^{\kappa+1}-1}v]}$
843:     \end{enumerate}
844:   \item For each~${j=1, \dots, \order}$:
845: \begin{enumerate}
846: \item recover the rational fraction whose series expansion
847:   is~$\sum{(A^i v)_j t^i}$ by Pad\'e approximation
848:   in~$\cO(\sM(\order)\log \order)$ operations
849: \item compute its expansion up to precision $t^{\precision + 1}$
850:   in~$\cO(\precision \, \sM(\order)/\order)$ operations
851: \item recover the expansion of~$y$ from that of~$z$,
852:   using~$\cO(\precision)$ operations.
853: \end{enumerate}
854: \end{enumerate}
855: This yields the announced total cost of ~${\cO(\order^\omega \log
856: \order + \precision \sM(\order))}$ operations for problem {\bf II}.
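
For illustration, here is a small worked instance of the Laplace-transform approach (an example of ours, not taken from the experiments). Take~$\order=2$ and
\[
A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad
v = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
\]
Then
\[
z = (I_\order - tA)^{-1} v
  = \frac{1}{1+t^2} \begin{pmatrix} 1 & t \\ -t & 1 \end{pmatrix}
    \begin{pmatrix} 0 \\ 1 \end{pmatrix}
  = \frac{1}{1+t^2} \begin{pmatrix} t \\ 1 \end{pmatrix}
  = \begin{pmatrix} t - t^3 + t^5 - \cdots \\ 1 - t^2 + t^4 - \cdots \end{pmatrix},
\]
a vector of rational functions of degree at most~$\order$, and dividing the coefficient of~$t^i$ by~$i!$ gives
\[
y = \begin{pmatrix} t - t^3/6 + t^5/120 - \cdots \\ 1 - t^2/2 + t^4/24 - \cdots \end{pmatrix},
\]
that is, the expansions of~$\sin t$ and~$\cos t$, which indeed satisfy~${y'=Ay}$ and~${y(0)=v}$.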
857: 
858: We now turn to the estimation of the
859: cost for problems~{\bf i} and~{\bf I} (bases of solutions). 
860: In the case of equations with constant coefficients, we use the
861: Laplace transform again. If $y = \sum_{i \geq 0} y_i t^i$ is a
862: solution of an order $\order$ equation with constant coefficients,
863: then the sequence $(z_i)=(i! y_i)$ is generated by a linear recurrence
with constant coefficients. Hence, the first terms $z_1,\dots,z_\precision$ can
be computed in $\cO(\precision\sM(\order)/\order)$ operations, using again the
algorithm described in~\cite[Prop.~1]{BoFlSaSc05}.
For problem~{\bf I}, the exponent~$\omega+1$ in the cost of the precomputation can be reduced to~$\omega$ by a very different approach; for lack of space, we cannot give the details here.
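
The correspondence between~$(y_i)$ and~${(z_i)=(i!\,y_i)}$ can be sketched as follows in Python. This is only an illustration under our own assumptions: the recurrence is unrolled naively, in~$\cO(\precision\order)$ operations, rather than with the faster algorithm of~\cite[Prop.~1]{BoFlSaSc05}, and the function name \texttt{solve\_const\_scalar} and its argument conventions are hypothetical.

```python
from fractions import Fraction

def solve_const_scalar(c, init, N):
    """First N coefficients of y, where
    y^(r) = c[r-1] y^(r-1) + ... + c[0] y and y^(k)(0) = init[k] for k < r."""
    r = len(c)
    # z_i = i! * y_i = y^(i)(0) satisfies z_{i+r} = sum_k c[k] z_{i+k}.
    z = [Fraction(x) for x in init]
    for i in range(r, N):
        z.append(sum(c[k] * z[i - r + k] for k in range(r)))
    y, fact = [], 1
    for i in range(N):
        if i > 0:
            fact *= i
        y.append(z[i] / fact)      # inverse Laplace transform: y_i = z_i / i!
    return y

# Usage: y'' = -y, y(0) = 0, y'(0) = 1, i.e. y = sin t.
sine = solve_const_scalar([-1, 0], [0, 1], 6)
```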
868: 
869: 
870: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
871: 
872: \subsection{Polynomial Coefficients}
873: If the coefficients in one of the problems {\bf i, ii, I}, and~{\bf II}
874: are polynomials in~$\basefield[t]$ of degree at most~$d$, using the
875: linear recurrence of order~$d$ satisfied by the coefficients of the
876: solution seemingly yields the lowest possible complexity.
877: Consider for instance problem~{\bf II}.
Plugging~${A=\sum_{i=0}^d t^i A_i}$, ${b=\sum_{i=0}^d t^i
  b_i}$, and~${y=\sum_{i\geq 0} t^i y_i}$ in the
equation~${y'=Ay+b}$, we arrive at the following recurrence
(with the convention that~${y_i=0}$ for~${i<0}$ and~${b_i=0}$ for~${i>d}$)
881: $$
882: y_{k+d+1} = (d+k+1)^{-1} (A_d y_k + A_{d-1}y_{k+1} + \dots + A_0
883: y_{k+d} + b_{k+d}), \quad \text{for all $k \geq -d$}.
884: $$
885: Thus, to compute~$y_0,\dots,y_\precision$, we need to
886: perform~${\precision d}$ matrix-vector products; this is done
887: using~${\cO (d \precision \order^2)}$ operations in~$\basefield$. A
888: similar analysis implies the other complexity estimates in the third
889: column of Table~\ref{table1}.
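
The recurrence above translates directly into the following Python sketch over the rationals. It is an illustration under our own assumptions (coefficient matrices as nested lists, hypothetical names \texttt{mat\_vec} and \texttt{solve\_poly\_coeffs}), written with the index~${n=k+d}$, so that~${(n+1)\,y_{n+1} = A_0 y_n + \dots + A_d y_{n-d} + b_n}$.

```python
from fractions import Fraction

def mat_vec(M, x):
    """Product of a matrix (list of rows) by a vector."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

def solve_poly_coeffs(A, b, v, N):
    """Coefficients y_0..y_N of y' = A y + b, y(0) = v, where
    A = [A_0, ..., A_d] and b = [b_0, ..., b_d] are polynomial coefficients."""
    r = len(v)
    y = [[Fraction(c) for c in v]]
    for n in range(N):
        # Right-hand side coefficient of t^n; b_n = 0 beyond the degree of b.
        acc = [Fraction(c) for c in b[n]] if n < len(b) else [Fraction(0)] * r
        for i, Ai in enumerate(A):
            if n - i >= 0:          # y_j = 0 for j < 0
                acc = [a + w for a, w in zip(acc, mat_vec(Ai, y[n - i]))]
        y.append([a / (n + 1) for a in acc])
    return y

# Usage: y' = t y, y(0) = 1 (d = 1, A_0 = 0, A_1 = 1), i.e. y = exp(t^2/2).
gauss = solve_poly_coeffs([[[0]], [[1]]], [], [1], 6)
```

Each new coefficient costs at most~${d+1}$ matrix-vector products, which matches the~${\cO(d \precision \order^2)}$ estimate.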
890: 
891: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
892: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
893: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
894: 
895: \section{Non-linear Systems of Differential Equations}
896: Let~${\varphi(t,y) = (\varphi_1(t,y), \dots, \varphi_\order(t,y))}$,
897: where each~$\varphi_i$ is a power series
898: in~$\basefield[[t,y_1,\dots,y_\order]]$. We consider the first-order
899: non-linear system in~$y$
900: \[
901: %(\mathcal{N})\qquad \left\{
902: %\begin{aligned}
903: % y_1'(t)& = \varphi_1(t,y_1(t),\dots,y_\order(t)), \\
904: %&\,\,\vdots \\
905: % y_\order'(t)& = \varphi_\order(t,y_1(t),\dots,y_\order(t)).
906: %\end{aligned}
907: %\right.
908: (\mathcal{N})\qquad % \left\{
909: y_1'(t) = \varphi_1(t,y_1(t),\dots,y_\order(t)), \quad\dots,\quad
910:  y_\order'(t) = \varphi_\order(t,y_1(t),\dots,y_\order(t)).
911: %\right.
912: \]
913: 
914: To solve~($\mathcal{N}$), we use the classical technique of
915: \emph{linearization}. The idea is to attach, to an \emph{approximate}
916: solution~$y_0$ of~($\mathcal{N}$), a \emph{tangent} system in the new unknown~$z$,
917: $$
918: (\mathcal{T},y_0) \qquad z' = \Jac(\varphi)(y_0) z - y_0'+
919: \varphi(y_0),
920: $$
921: which is linear and whose solutions serve to obtain a
922: better approximation of a true solution of~($\mathcal{N}$).  Indeed,
923: let us denote by~$(\mathcal{N}_m),(\mathcal{T}_m)$ the
924: systems~$(\mathcal{N}),(\mathcal{T})$ where all the equalities are
925: taken modulo~$t^m$.  Taylor's formula states that the
926: expansion~${\varphi(y+z) - \varphi(y) - \Jac(\varphi)(y) z}$
927: is equal to~$0$ modulo~$z^2$.
928: It is a simple matter to check that if~$y$ is a
929: solution of~$(\mathcal{N}_m)$ and if~$z$ is a solution
930: of~$(\mathcal{T}_{2m},y)$, then~${y+z}$ is a solution
931: of~$(\mathcal{N}_{2m})$.  This justifies the correctness of 
932: Algorithm {\sf SolveNonLinearSys}.
933:  
934:  To analyze the complexity of this algorithm, it suffices to remark
935:  that for each integer~$\kappa$ between $1$ and~$\intpart{\log \precision}$,
936:  one has to compute one solution of a linear inhomogeneous first-order
937:  system at precision~$2^\kappa$ and to evaluate~$\varphi$ and its
938:  Jacobian on a series at the same precision. This concludes the proof of Theorem~\ref{theo:non-linear}.
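
The linearization loop can be sketched in Python for a scalar equation (${\order=1}$) over the rationals. This is a toy illustration under our own assumptions: the tangent system is solved by the naive quadratic method instead of \textsf{Solve}, the right-hand side~$\varphi$ and its derivative are passed as Python functions acting on truncated series, and all names (\texttt{pad}, \texttt{mul}, \texttt{solve\_linear}, \texttt{solve\_nonlinear}) are hypothetical.

```python
from fractions import Fraction

def pad(a, n):
    """Truncate or zero-pad a coefficient list to length n (i.e. modulo t^n)."""
    a = [Fraction(x) for x in a[:n]]
    return a + [Fraction(0)] * (n - len(a))

def mul(a, b, n):
    """Naive product of two series modulo t^n."""
    a, b = pad(a, n), pad(b, n)
    c = [Fraction(0)] * n
    for i in range(n):
        for j in range(n - i):
            c[i + j] += a[i] * b[j]
    return c

def solve_linear(A, b, n):
    """Naive stand-in for Solve: z_0..z_{n-1} with z' = A z + b, z(0) = 0."""
    z = [Fraction(0)] * n
    for k in range(n - 1):
        conv = sum(A[i] * z[k - i] for i in range(k + 1))
        z[k + 1] = (conv + b[k]) / (k + 1)
    return z

def solve_nonlinear(phi, jac, v, N):
    """Newton iteration for y' = phi(y), y(0) = v, doubling the precision."""
    m, y = 1, [Fraction(v)]
    while m <= N // 2:
        n = 2 * m
        A = pad(jac(y, n), n)                      # Jacobian at y, mod t^(2m)
        dy = pad([i * y[i] for i in range(1, len(y))], n)
        b = [f - d for f, d in zip(pad(phi(y, n), n), dy)]
        z = solve_linear(A, b, n)                  # tangent system
        y = [c + e for c, e in zip(pad(y, n), z)]
        m = n
    return y

# Usage: y' = y^2, y(0) = 1, whose solution is 1/(1 - t).
phi = lambda y, n: mul(y, y, n)
jac = lambda y, n: [2 * c for c in y]
geom = solve_nonlinear(phi, jac, 1, 8)
```

On this example the three iterations produce successively~$1+t$, then~${1+t+t^2+t^3}$, then the first eight coefficients of~${1/(1-t)}$.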
939: 
940: 
941: \begin{figure}[h]
942: \begin{center} 
943:   \fbox{\begin{minipage}{9 cm}
944:       
945:       \medskip
      \begin{center}\textsf{SolveNonLinearSys}($\precision,\varphi,v$) \end{center}
947:       \textbf{Input:} $\precision$ in~$\mathbb{N}$,
948:       $\varphi(t,y)$ in~$\basefield[[t,y_1,\dots,y_\order]]^{\order}$, 
949:       $v$ in~$\basefield^\order$
950:       \par\smallskip
      \textbf{Output:} first~$\precision$ terms of a series~$y(t)$
      in~$\basefield[[t]]^\order$ such that~${y(t)' = \varphi(t,y(t)) \mod
        t^\precision}$ and~${y(0) = v}$.
954: \begin{tabbing}
955:   \;\;    \\$m \leftarrow 1$\\
956: $y \leftarrow v$ \\
957:   \textsf{while} $m \leq \precision/2$ \textsf{do}\\
958:   \hspace{0.5cm} $A \leftarrow \trunch{\Jac(\varphi) (y)}{2m}$\\
959:   \hspace{0.5cm} $b \leftarrow \trunch{\varphi (y) - y'}{2m}$ \\
960: % \hspace{0.5cm} $z \leftarrow \textsf{Solve}(z' = Az + b \mod t^{2m}, z(0)=0)$ \\
961:   \hspace{0.5cm} $z \leftarrow \textsf{Solve}(A, b, 2m, 0)$ \\
962:   \hspace{0.5cm} $y \leftarrow y + z$ \\
963:   \hspace{0.5cm} $m \leftarrow 2m$ \\
964:   \textsf{return} $y$
965: \end{tabbing}
966: \end{minipage}
967: }\end{center}
968: \caption{Solving the non-linear differential system ${y' = \varphi(t,y), \; y(0) = v}$.}
969: \label{fig:nonlinear}
970:  \end{figure} 
971: 
972: 
973: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
974: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
975: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
976: 
977:  \section{Implementation and Timings}
978: 
 We implemented our algorithms \textsf{SolveHomDiffSys} and
 \textsf{Solve} in Magma~\cite{Magma} and ran the programs on an Athlon processor at 2.2~GHz
 with 2~GB of memory.\footnote{All the computations have been done on the machines of
   the MEDICIS resource center
   \url{http://medicis.polytechnique.fr}.}  We used Magma's built-in
 polynomial arithmetic (using successively naive, Karatsuba and FFT
 multiplication algorithms), as well as Magma's scalar matrix
 multiplication (of cubic complexity in the ranges of our interest).
 We give three sets of timings. First, we compare in
 Figure~\ref{fig:benchs} the performance of our algorithm
 \textsf{SolveHomDiffSys} with that of the naive quadratic algorithm,
 for computing a basis of (truncated power series) solutions of a
 homogeneous system.  The order of the system varies from $2$ to $16$,
 while the precision required for the solution varies from 256 to
 4096; the base field is $\mathbb{Z}/p\mathbb{Z}$, where $p$ is a 32-bit prime.
994: 
995: 
996: \begin{figure}
997: \begin{center}
998: \begin{tabular}{c||c|c|c|c}
999: $ \precision \ddots  \order$      &  2  & 4   & 8 & 16 \\
1000: \hline 
1001: 256    &   0.02 \text{vs.}  2.09      & 0.08 \text{vs.}  6.11        &  0.44  \text{vs.}  28.16     &  2.859 \text{vs.}  168.96 \\
1002: 512    &  0.04 \text{vs.}  8.12       &  0.17 \text{vs.}  25.35     & 0.989  \text{vs.}  113.65  &  6.41 \text{vs.}  688.52 \\
1003: 1024  & 0.08 \text{vs.}  32.18      &  0.39  \text{vs.}  104.26  & 2.30 \text{vs.}  484.16     &  15   \text{vs.}  2795.71\\
1004: 2048  & 0.18  \text{vs.}  128.48   &  0.94 \text{vs.}  424.65   & 5.54 \text{vs.}  2025.68  &  36.62  \text{vs.} $> 3$\text{hours} $^\star$\\
1005: 4096  & 0.42 \text{vs.}  503.6      &  2.26 \text{vs.}  1686.42 &  13.69 \text{vs.}  8348.03  & 92.11  \text{vs.} $> 1/2$ \text{day}$^\star$ \\
1006: 
1007: \end{tabular}
1008: \end{center}
\caption{Computation of a basis of solutions of a linear homogeneous system with
  $\order$ equations, at precision $\precision$: comparison of timings
  (in seconds) between algorithm \textsf{SolveHomDiffSys} and the
  naive algorithm. Entries marked with a `$\star$' are estimated timings.}
1013: \label{fig:benchs}
1014: \end{figure}
1015: 
Then we display in Figure~\ref{fig:matmul} and Figure~\ref{fig:newton}
the timings obtained respectively with the algorithm for polynomial
matrix multiplication \textsf{PolyMatMul}, which is used as a primitive
of \textsf{SolveHomDiffSys}, and with
algorithm~\textsf{Solve\-HomDiffSys} itself. The similar shapes of the two
surfaces indicate that the complexity prediction of point (a) in
Theorem~\ref{theo:linear} is well respected in our implementation:
\textsf{SolveHomDiffSys} uses a constant number (between 4 and 5) of
polynomial matrix multiplications; note that the abrupt jumps at powers of 2
reflect the performance of Magma's FFT implementation of polynomial
arithmetic.
1027: 
1028: \begin{figure}[ht]
1029: \begin{center}
1030: \begin{minipage}[b]{0.45\textwidth}
1031: \centerline{\includegraphics[scale=0.35,angle=270]{MatMul}}
1032: \caption{Timings of algorithm \textsf{PolyMatMul}.  \label{fig:matmul}}
1033: \end{minipage}\hskip0.1\textwidth
1034: \begin{minipage}[b]{0.45\textwidth}
1035: \centerline{\includegraphics[scale=0.35,angle=270]{Newton}}
\caption{Timings of algorithm \textsf{SolveHomDiffSys}.  \label{fig:newton}}
1037: \end{minipage}
1038: \end{center}
1039: \end{figure}
1040: 
1041: 
1042: 
In Figure~\ref{fig:dac} we give the timings for the computation of
one solution of a linear differential equation of order $2$, $4$, and
$8$, respectively, using our algorithm~\textsf{Solve} of
Section~\ref{sec:DAC}. Again, the shape of the three curves
experimentally confirms that the complexity of
algorithm~\textsf{Solve} is nearly linear both in the
precision~$\precision$ and in the order~$\order$, as established in
point (b) of Theorem~\ref{theo:linear}. Finally, Figure~\ref{fig:dac+naive}
displays the three curves from Figure~\ref{fig:dac} together with the
timings curve for the naive quadratic algorithm computing one solution
of a linear differential equation of order $2$.  The conclusion is
that our algorithm~\textsf{Solve} outperforms the
quadratic one already at small precisions.
1056: 
1057: \begin{figure}[ht]
1058: \begin{center}
1059: \begin{minipage}[b]{0.45\textwidth}
1060: \centerline{\includegraphics[scale=0.3,angle=270]{DAC}}
1061: \caption{Timings of algorithm \textsf{Solve} for equations of orders 2, 4, and~8. \label{fig:dac}}
1062: \end{minipage}\hskip0.1\textwidth
1063: \begin{minipage}[b]{0.45\textwidth}
1064: \centerline{\includegraphics[scale=0.3,angle=270]{DAC+Naive}}
1065: \caption{Same, compared to the naive algorithm for a second-order equation.\label{fig:dac+naive}}
1066: \end{minipage}
1067: \end{center}
1068: \end{figure}
1069: 
1070: 
1071: %precision~$\precision = 1048576$ in~24.53s; one at doubled
1072: %precision~$\precision=2097152$ in doubled time~49.05s; one for doubled
1073: We also implemented our algorithms of Section~\ref{ssec:const-coeffs}
for the special case of constant coefficients. For lack of space,
we only provide a few experimental results for
1076: problem~{\bf II}.  Over the same finite field, we computed: a solution
1077: of a linear system with~$\order=8$ at
1078: precision~$\precision\approx10^6$ in~24.53s; one at doubled precision
1079: in doubled time~49.05s; one for doubled order~$\order=16$ in doubled
1080: time~49.79s.
1081: 
1082: 
1083: \bibliographystyle{plain}
1084: \bibliography{focs}
1085: 
1086: \end{document}
1087: