1: \documentclass[11pt]{amsart}
2:
3: \usepackage{fullpage}
4: \usepackage[latin1]{inputenc}
5: \usepackage{bbm,natbib}
6: \usepackage{amsmath}
7: \usepackage{stmaryrd}
8: \usepackage{alltt, amssymb}
9: \usepackage{graphicx}
10: \usepackage{url}
11:
12: \def\Jac{{\mathbf{Jac}}}
13:
14: \newcommand{\sC}{\mathsf{C}}
15: \newcommand{\sL}{{\mathsf{L}}}
16: \newcommand{\sM}{{\mathsf{M}}}
17: \newcommand{\N}{\mathbb{N}}
18: \newcommand{\cO}{{\mathcal O}}
19: \newcommand{\order}{{r}}
20: \newcommand{\precision}{{N}}
21: \newcommand{\basefield}{{\mathbb{K}}}
22:
23: \newcommand{\Mat}{{\mathsf{MM}}}
24: \renewcommand{\proof}{\noindent\textsc{Proof.} }
25: \newcommand{\foorp}{\hfill$\square$}
26: \newcommand{\tr}{\mathrm{trace}}
27:
28: \newcommand{\trunc}[3]{\left[ #1 \right]_{#2}^{#3}}
29: \newcommand{\truncl}[2]{\left\lfloor #1 \right\rfloor_{#2}}
30: \newcommand{\trunch}[2]{\left\lceil #1 \right\rceil^{#2}}
31: \newcommand{\intpart}[1]{\left\lfloor #1 \right\rfloor}
32:
33: \newtheorem{Theo}{Theorem}
34: \newtheorem{Prop}{Proposition}
35: \newtheorem{Lemme}{Lemma}
36:
37:
39: \usepackage{changebar}
40: \usepackage[plainpages=false,pdfpagelabels,colorlinks=true,citecolor=blue,hypertexnames=false]{hyperref}
41:
42:
43: \begin{document}
44:
45: \title{Fast computation of power series solutions \\ of systems of
46: differential equations}
47:
48: \author{A. Bostan, F. Chyzak, F. Ollivier, B. Salvy, \'E. Schost, and A. Sedoglavic}
49: \thanks{Partially supported by a grant from the French \emph{Agence nationale pour la recherche}.}
50: %\date{Preliminary version 1.10 --- 11/04/2006}
51:
52: \begin{abstract}
53: We propose new algorithms for the computation of the first~$\precision$ terms
54: of a vector (resp.\ a basis) of power series solutions of a linear
55: system of differential equations at an ordinary point, using a
56: number of arithmetic operations which is quasi-linear with respect
57: to~$\precision$. Similar results are also given in the non-linear case. This extends
58: previous results obtained by Brent and
59: Kung for scalar differential equations of order one and two.
60: \end{abstract}
61: \maketitle
62:
63: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
64: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
65: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
66:
67: \section{Introduction}
68:
69: In this article, we are interested in the computation of the first $\precision$ terms of power
70: series solutions of differential equations. This problem arises in
71: combinatorics, where the desired power series is a generating
72: function, as well as in numerical analysis and in particular in
73: control theory.
74:
75: Let~$\basefield$ be a field. Given~$r+1$ formal power
76: series~${a_0(t),\dots,a_{\order}(t)}$ in~$\basefield[[t]]$, one of
77: our aims is to provide fast algorithms for solving the
78: linear differential equation
79: % of order $\order$:
80: \begin{equation} \label{lindiffeq}
81: a_\order(t) y^{(\order)}(t) + \dots + a_1(t) y'(t)+ a_0(t) y(t) = 0. %
82: \end{equation}
83: Specifically, under the hypothesis that~$t=0$ is an ordinary point
84: for Equation~\eqref{lindiffeq} (i.e., ${a_r(0) \neq 0}$), we give efficient
85: algorithms taking as input the first~$\precision$ terms of the power
86: series $a_0(t), \dots, a_\order(t)$ and answering the following algorithmic questions:
87: \begin{enumerate}
88: \item[{\bf i.}] find the first~$\precision$ coefficients of
89: the~$\order$ elements of a basis of power series solutions
90: of~\eqref{lindiffeq};
91: \item[{\bf ii.}] given initial conditions~$\alpha_0, \dots,
92: \alpha_{\order-1}$ in~$\basefield$, find the first~$\precision$
93: coefficients of the unique solution~$y(t)$ in~$\basefield[[t]]$ of
94: Equation~\eqref{lindiffeq} satisfying
95: \[
96: y(0) = \alpha_0,\quad y'(0) = \alpha_1, \quad \dots,\quad y^{(\order-1)}(0) =
97: \alpha_{\order-1}.
98: \]
99: \end{enumerate}
100: More generally, we also treat linear first-order systems of differential
101: equations. From the data of initial conditions~$v$
102: in~$\mathcal{M}_{\order\times\order} (\basefield)$
103: (resp.~$\mathcal{M}_{{\order} \times 1} (\basefield)$) and of the
104: first~$\precision$ coefficients of each entry of the matrices~$A$
105: and~$B$ in~$\mathcal{M}_{\order\times\order} (\basefield[[t]])$ (resp.~$b$
106: in~$\mathcal{M}_{{\order} \times 1} (\basefield[[t]])$), we propose
107: algorithms that compute the first~$\precision$ coefficients:
108: \begin{enumerate}
109: \item[\bf I.] of a fundamental solution~$Y$ in~$\mathcal{M}_{\order\times\order}
110: (\basefield[[t]])$ of~${Y' = AY + B}$, with~${Y(0)=v},\;{\det Y(0) \neq 0}$;
111: \item[\bf II.] of the unique solution~$y(t)$
112: in~$\mathcal{M}_{{\order} \times 1} (\basefield[[t]])$ of~${y' = Ay
113: + b}$, satisfying~${y(0) =v}$.
114: \end{enumerate}
115: %% \begin{equation}\label{systlindiffeq:basis}
116: %% Y' = AY + B, \quad \text{with} \; A,B \in \mathcal{M}_{\order} (\basefield[[t]])
117: %% \end{equation}
118: %% and
119: %% \begin{equation}\label{systlindiffeq:single}
120: %% y' = Ay + b
121: %% \end{equation}
122: Obviously, if an algorithm of algebraic complexity~$\sC$ (i.e.,
123: using~$\sC$ arithmetic operations in~$\basefield$) is available for
124: problem~{\bf II}, then applying it~$r$ times solves problem~{\bf I} in
125: time~$r \,\sC$, while applying it to a companion matrix solves
126: problem~{\bf ii} in time~$\sC$ and problem~{\bf i} in~$r
\,\sC$. Conversely, an algorithm solving~{\bf i} (resp.\ {\bf I}) also
solves~{\bf ii} (resp.\ {\bf II}) within the same complexity, plus that
129: of a linear combination of series. Our reason for distinguishing the
130: four problems {\bf i, ii, I, II} is that in many cases, we are able to
131: give algorithms of better complexity than obtained by these
132: reductions.
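
To make the companion-matrix reduction concrete, here is the standard
construction, written out as a sketch (the names~$w$ and~$C$ are
introduced here for illustration only). Since~${a_\order(0)\neq 0}$, the
series~$a_\order$ is invertible in~$\basefield[[t]]$, and
setting~${w=(y,y',\dots,y^{(\order-1)})^{T}}$ turns
Equation~\eqref{lindiffeq} into the first-order system~${w'=Cw}$ with
\[
C=\begin{pmatrix}
0 & 1 & & \\
 & \ddots & \ddots & \\
 & & 0 & 1 \\
-a_0/a_\order & -a_1/a_\order & \cdots & -a_{\order-1}/a_\order
\end{pmatrix},
\qquad
w(0)=(\alpha_0,\dots,\alpha_{\order-1})^{T}.
\]
The first entry of~$w$ is then the solution~$y$ asked for in
problem~{\bf ii}.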
133:
134: The most popular way of solving~{\bf i}, {\bf ii}, {\bf I}, and~{\bf II} is the
135: method of undetermined coefficients that requires~$\cO(\order^2
136: \precision^2)$ operations in~$\basefield$ for problem~{\bf i}
and~$\cO(\order \precision^2)$ operations in~$\basefield$ for~{\bf
ii}. Regarding the dependence in~$\precision$, this is certainly
139: too expensive compared to the size of the output, which is only linear
140: in~$\precision$ in both cases. On the other hand, verifying the
141: correctness of the output for~{\bf ii} (resp.~{\bf i}) already
142: requires a number of operations in~$\basefield$ which is linear
143: (resp.\ quadratic) in~$\order$: this indicates that there is little
144: hope of improving the dependence in~$\order$. Similarly, for
145: problems~{\bf I} and~{\bf II}, the method of undetermined coefficients
146: requires~$\cO(\precision^2)$ multiplications of~$\order\times \order$
147: scalar matrices (resp.\ of scalar matrix-vector products in
148: size~$\order$), leading to a computational cost which is reasonable
149: with respect to~$\order$, but not with respect to~$\precision$.
150:
151: By contrast, the algorithms proposed in this article have costs that
152: are linear (up to logarithmic factors) in the
153: complexity~$\sM(\precision)$ of polynomial multiplication in degree
154: less than~$\precision$ over~$\basefield$. Using Fast Fourier Transform
155: (FFT) these costs become nearly linear~---~up to polylogarithmic
156: factors~---~with respect to~$\precision$, for all of the four problems
157: above (precise complexity results are stated below). Up to these
158: polylogarithmic terms in~$\precision$, this estimate is probably not
159: far from the lower algebraic complexity one can expect: indeed, the
160: mere check of the correctness of the output requires, in each case, a
161: computational effort proportional to~$\precision$.
162:
163: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
164:
165: \subsection{Newton Iteration}
166: In the case of first-order equations ($r=1$), Brent and Kung have
167: shown in~\cite{BrKu78} (see also~\cite{Geddes1979,KuTr78}) that the problems
168: can be solved with complexity $\cO(\sM(\precision))$ by means of a
169: formal Newton iteration. Their algorithm is based on the fact that
170: solving the first-order differential equation~${y'(t) = a(t) y(t)}$,
171: with~$a(t)$ in~$\basefield[[t]]$ is equivalent to computing the
172: \emph{power series exponential\/}~$\exp(\int a(t))$. This equivalence
173: is no longer true in the case of a system~${Y' = A(t) Y}$
174: (where~$A(t)$ is a power series matrix): for non-commutativity
175: reasons, the matrix exponential~${Y(t)= \exp(\int A(t))}$ is not a
176: solution of~${Y' = A(t) Y}$.
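
The obstruction is already visible on the first few coefficients. As
an illustrative computation (assuming characteristic zero and taking
${A(t)=A_0+tA_1}$ with scalar matrices~$A_0,A_1$, a choice made here
only for the example), the solution of~${Y'=AY}$, ${Y(0)=I_\order}$ and
the exponential~${E=\exp(\int A)}$ expand as
\begin{align*}
Y &= I_\order + A_0 t + \tfrac12\left(A_0^2+A_1\right)t^2
 + \left(\tfrac16 A_0^3+\tfrac16 A_0A_1+\tfrac13 A_1A_0\right)t^3 + \cO(t^4),\\
E &= I_\order + A_0 t + \tfrac12\left(A_0^2+A_1\right)t^2
 + \left(\tfrac16 A_0^3+\tfrac14 A_0A_1+\tfrac14 A_1A_0\right)t^3 + \cO(t^4),
\end{align*}
so that~${Y-E=\tfrac1{12}(A_1A_0-A_0A_1)\,t^3+\cO(t^4)}$, which is
nonzero as soon as~$A_0$ and~$A_1$ do not commute.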
177:
178: Brent and Kung suggest a way to extend their result to higher orders,
179: and the corresponding algorithm has been shown by van der Hoeven
180: in~\cite{vdHoeven02} to have complexity~$\cO(\order^\order
181: \,\sM(\precision))$. This is good with respect to~$\precision$, but
182: the exponential dependence in the order~$\order$ is unacceptable.
183:
184: Instead, we solve this problem by devising a specific Newton iteration
185: for~${Y' = A(t) Y}$. Thus we solve problems {\bf i} and {\bf I} in
186: $\cO(\Mat(\order,\precision))$, where $\Mat(\order,\precision)$ is the
187: number of operations in $\basefield$ required to multiply
188: $\order\times\order$ matrices with polynomial entries of degree less
189: than~$\precision$. For instance, when $\basefield=\mathbb{Q}$, this is
190: $\cO(\order^\omega \precision+r^2\sM(\precision))$, where
191: $\order^\omega$~can be seen as an abbreviation for~$\Mat(\order,1)$, see
192: \S\ref{ssec:complexity} below.
193:
194: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
195:
196: \subsection{Divide-and-conquer}
197: The resolution of problems {\bf i} and {\bf I} by Newton iteration
198: relies on the fact that a whole basis is computed. Dealing with
199: problems {\bf ii} and {\bf II}, we do not know how to preserve this
200: algorithmic structure, while simultaneously saving a factor $\order$.
201:
To solve problems~{\bf ii} and~{\bf II}, we therefore propose an
alternative algorithm, whose complexity is also nearly linear
in~$\precision$ (though not quite as good, being in
$\cO(\sM(\precision)\log\precision)$), but whose dependence in the
order~$\order$ is better: linear for~{\bf ii} and quadratic for~{\bf
II}. In a different model of computation with power series, based on
208: the so-called \emph{relaxed multiplication}, van der Hoeven briefly outlines
209: another algorithm~\cite[Section~4.5.2]{vdHoeven02} solving
210: problem~{\bf ii} in~$\cO(\order \,\sM(\precision) \log \precision)$.
211: To our knowledge, this result cannot be transferred to the usual model
212: of power series multiplication (called zealous in~\cite{vdHoeven02}).
213:
214: We use a divide-and-conquer technique similar to that used in the fast
215: Euclidean algorithm~\cite{Knuth70,Schonhage71,Strassen83}. For
216: instance, to solve problem~{\bf ii}, our algorithm divides it into two
217: similar problems of halved size. The key point is that the lowest
218: coefficients of the solution~$y(t)$ only depend on the lowest
219: coefficients of the coefficients~$a_i$. Our algorithm first computes
220: the desired solution~$y(t)$ at precision only~$\precision/2$, then it
221: recovers the remaining coefficients of~$y(t)$ by recursively solving
222: at precision~$\precision/2$ a new differential equation. The main
223: idea of this second algorithm is close to that used for solving
224: first-order difference equations in~\cite{GaGe97}.
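
In the simplest case of a scalar first-order equation~${y'=ay}$, the
splitting can be sketched as follows (an illustration only, with ad hoc
names~$f$, $g$ and~$m$; it is not meant as a description of the general
algorithm). Write~${y=f+t^m g}$, where~$f$ is the truncation of~$y$ at
order~$m$. Substituting into the equation and dividing by~$t^{m-1}$
gives
\[
t\,g' + (m - t\,a)\,g \;=\; t^{-(m-1)}\left(a f - f'\right),
\]
where the right-hand side is a power series because~${f'-af=0 \mod
t^{m-1}}$. Thus the high part~$g$ satisfies a new linear differential
equation whose coefficients and right-hand side are known once~$f$ has
been computed, and only the first~$m$ coefficients of~$g$ are needed.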
225:
226: We encapsulate our main complexity results in
227: Theorem~\ref{theo:linear} below. When FFT is used, the
functions~$\sM(\precision)$ and~$\Mat(\order,\precision)$ grow
linearly in~$\precision$ up to logarithmic factors, see
230: \S\ref{ssec:complexity}. Thus, the results in the following theorem are quasi-optimal.
231: \begin{Theo}\label{theo:linear}
232: Let~$\precision$ and~$\order$ be two positive integers and
233: let\/~$\basefield$ be a field of characteristic zero or at
234: least~$\precision$. Then:
235: \begin{enumerate}
236: \item[(a)] problems\/~{\bf i} and\/~{\bf I} can be solved
237: using~$\cO\left(\Mat(\order,\precision) \right)$ operations
238: in~$\basefield$;
239: \item[(b)] problem\/~{\bf ii} can be solved using~$\cO\left(\order \,
240: \sM (\precision) \log \precision\right)$ operations in~$\basefield$;
241: \item[(c)] problem\/~{\bf II} can be solved using~$\cO\left(\order^2 \,
242: \sM (\precision) \log \precision\right)$ operations in~$\basefield$.
243: \end{enumerate}
244: \end{Theo}
245:
246: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
247:
248: \subsection{Special Coefficients}
249: For special classes of coefficients, we give different algorithms of
better complexity. We isolate two important classes of equations: those
with constant coefficients and those with polynomial coefficients. In
252: the case of constant coefficients, our algorithms are based on the use
253: of the Laplace transform, which allows us to reduce the resolution of
254: differential equations with constant coefficients to manipulations
255: with rational functions. The complexity results are summarized in the following theorem.
256: \begin{Theo}
257: Let~$\precision$ and~$\order$ be two positive integers and
258: let\/~$\basefield$ be a field of characteristic zero or at
259: least~$\precision$. Then, for differential equations and systems with constant coefficients:
260: \begin{enumerate}
261: \item[(a)] problem\/~{\bf i} can be solved
262: using~$\cO\left(\sM(\order)\,(\order+\precision) \right)$ operations
263: in~$\basefield$;
264: \item[(b)] problem\/~{\bf ii} can be solved using~$\cO\left(\sM(\order)\,(1+\precision/\order)\right)$ operations in~$\basefield$;
265: \item[(c)] problem\/~{\bf I} can be solved using~$\cO\left( \order^{\omega+1}\log\order + \order\sM(\order)\precision \right)$ operations in~$\basefield$;
266: \item[(d)] problem\/~{\bf II} can be solved using~$\cO\left( \order^\omega\log\order + \sM(\order)\precision \right)$ operations in~$\basefield$.
267: \end{enumerate}
268: \end{Theo}
269: In the case of polynomial coefficients, we
270: exploit the linear recurrence satisfied by the coefficients of
271: solutions. In Table~\ref{table1}, we gather the complexity estimates
272: corresponding to the best known solutions for each of the four
273: problems {\bf i}, {\bf ii}, {\bf I}, and~{\bf II} in the general case,
274: as well as in the above mentioned special cases. The algorithms are described in Section~\ref{sec:particular}. In the polynomial
275: coefficients case, these results are well known. In the other cases,
276: to the best of our knowledge, the results improve upon existing
277: algorithms.
278:
279: \begin{table}
280: \renewcommand{\arraystretch}{1.4}
281: $$\begin{array}{||l|l|l|l||l||}\hline\hline % & & & & \\
282: \textsf{Problem} & \textsf{constant} & \textsf{polynomial}
283: & \textsf{power series} & \textsf{output}\\[-2mm]
284: \quad (\textsf{input, output}) & \textsf{coefficients} &
285: \textsf{coefficients} & \textsf{coefficients} & \textsf{size} \\
286: % & & & & \\
287:
288: \hline \hline \textbf{i} \quad (\textsf{equation, basis}) & \cO(\sM(\order)
289: \precision) \;\hfill^\star & \cO(d \order^2 \precision) & \cO(
290: \Mat(\order, \precision)) \;\hfill ^\star & \cO(\order \precision)\\
291:
292: \hline \textbf{ii} \quad (\textsf{equation, one solution}) &
293: \cO(\sM(\order) \precision/\order) \;\hfill^\star &\cO(d
294: \order \precision) &\cO(\order \, \sM(\precision) \log \precision) \;\hfill ^\star & \cO(\precision)\\
295:
296: \hline \hline
297: \textbf{I} \quad (\textsf{system, basis}) &
298: \cO(\order \sM(\order)
299: \precision) \;\hfill^\star & \cO(d \order^\omega \precision)
300: & \cO(\Mat(\order, \precision)) \;\hfill ^\star & \cO(\order^2 \precision)\\
301:
302: \hline \textbf{II} \quad (\textsf{system, one solution}) &
303: \cO(\sM(\order) \precision) \;\hfill ^\star & \cO(d \order^2
304: \precision) & \cO(\order^2 \, \sM(\precision) \log
305: \precision) \;\hfill ^\star & \cO(\order \precision)\\
306:
307: \hline\hline
308: %\quad \quad \textsf{Input size} & \cO(\order^2) & \cO(d \order^2) & \cO(\order^2
309: % \precision) \\ \hline \hline
310: \end{array}$$
311: \caption{Complexity of solving linear differential equations/systems for~$\precision\gg\order$. Entries marked with a~`$\star$' correspond to new results. \label{table1}}
312: \end{table}
313:
314:
315:
316: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
317:
318: \subsection{Non-linear Systems} As an important
319: consequence of Theorem~\ref{theo:linear}, we improve the known
320: complexity results for the more general problem of solving
321: \emph{non-linear} systems of differential equations. To do so, we use
322: a classical reduction technique from the non-linear to the linear
323: case, see for instance~\cite[Section~25]{Rall69}
324: and~\cite[Section~5.2]{BrKu78}. For simplicity, we only consider
325: non-linear systems of first order. There is no loss of generality in
doing so: more general cases can be reduced to that one by adding new
unknowns and possibly differentiating once. The following result
generalizes~\cite[Theorem~5.1]{BrKu78}. If~${F=(F_1,\dots,F_\order)}$ is a
differentiable function of~$\order$
330: variables~${y_{1},\dots,y_{\order}}$, we use the notation~$\Jac(F)$
331: for the Jacobian matrix~$(\partial F_i/\partial y_j)_{1\leq i,j \leq \order}$.
332:
333: \begin{Theo}\label{theo:non-linear}
334: Let~$\precision$, $\order$ be in~$\mathbb{N}$, let~$\basefield$ be a
335: field of characteristic zero or at least\/~$\precision$ and
336: let~$\varphi$ denote~${(\varphi_1,\dots,\varphi_{\order})}$,
337: where~$\varphi_i(t,y)$ are multivariate power series
338: in\/~$\basefield[[t,y_1,\dots,y_\order]]$.
339: \par
340: Let\/~${\sL :\N \to \N}$ be such that for all~$s(t)$
341: in\/~$\mathcal{M}_{\order \times 1}(\basefield[[t]])$ and for all~$n$
342: in\/~$\mathbb{N}$, the first~$n$ terms of~$\varphi(t,s(t))$ and
343: of\/~$\Jac ({\varphi}) (t,s(t))$ can be computed in~$\sL(n)$ operations
344: in\/~$\basefield$. Suppose in addition that the function~${n \mapsto
345: \sL(n)/n}$ is increasing. Given initial conditions~$v$
346: in\/~$\mathcal{M}_{\order \times 1}(\basefield)$, if the differential
347: system
348: \[y'=\varphi(t,y),\qquad y(0)=v,\] admits a solution
349: in\/~$\mathcal{M}_{\order \times 1} (\basefield[[t]])$, then the
350: first\/~$\precision$ terms of such a solution~$y(t)$ can be computed in
351: % $\cO(\sL(N) + \Mat(\order,\precision))$ operations in $\basefield$.
352: % $\cO(\sL(N) + \order^2 \sM(\precision) \log \precision)$ operations in $\basefield$.
353: $\cO \left(\sL(\precision) + \min (\Mat(\order,\precision), \order^2
354: \sM(\precision) \log \precision) \right)$ operations in~$\basefield$.
355: \end{Theo}
356: Werschulz~\cite[Theorem~3.2]{Werschulz80} gave an algorithm solving
357: the same problem using the integral Volterra-type equation technique
358: described in~\cite[pp.~172--173]{Rall69}. With our notation, his
359: algorithm uses~$\cO \left(\sL(\precision) + \order^2 \precision \,
\sM(\precision) \right)$ operations in~$\basefield$ to compute a
361: solution at precision~$\precision$. Thus, our algorithm is an
362: improvement for cases where $\sL(\precision)$ is known to be
363: subquadratic with respect to~$\precision$.
364:
365: The best known algorithms for power series composition in~${\order
366: \geq 2}$ variables require, at least on ``generic'' entries, a
367: number~${\sL(n) = \cO(n^{\order-1} \sM(n))}$ of operations in
368: $\basefield$ to compute the first~$n$ coefficients of the
369: composition~\cite[Section~3]{BrKu77}. This complexity is nearly
370: optimal with respect to the size of a generic input. By contrast, in
371: the univariate case, the best known result~\cite[Th.~2.2]{BrKu78}
372: is~$\sL(n) = \cO(\sqrt{n \log n}\, \sM(n))$. For special entries,
373: however, better results can be obtained, already in the univariate
374: case: exponentials, logarithms, powers of univariate power series can
375: be computed~\cite[Section~13]{Brent75} in~$\sL(n) = \cO(\sM(n))$. As a
376: consequence, if~$\varphi$ is an~$\order$-variate sparse polynomial
377: with $m$~monomials of \emph{any} degree, then~$\sL(n) = \cO(m \order \,
378: \sM(n))$.
379:
380: Another important class of systems with such a
381: subquadratic~$\sL(\precision)$ is provided by \emph{rational systems},
382: where each~$\varphi_i$ is in~$\basefield(y_1,\dots,y_\order)$.
383: Supposing that the complexity of evaluation of~$\varphi$ is bounded
384: by~$L$ (i.e., for any point~$z$ in~$\basefield^\order$ at
385: which~$\varphi$ is well-defined, the value~$\varphi(z)$ can be
386: computed using at most~$L$ operations in~$\basefield$), then, the
387: Baur-Strassen theorem~\cite{BaSt83} implies that the complexity of
388: evaluation of the Jacobian~$\Jac(\varphi)$ is bounded by~$5L$, and
389: therefore, we can take~${\sL(n)= \sM(n) L}$ in the statement of
390: Theorem~\ref{theo:non-linear}.
391:
392: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
393:
394: \subsection{Basic Complexity Notation} \label{ssec:complexity}
395:
396: Our algorithms ultimately use, as a basic operation, multiplication
397: of matrices with entries that are polynomials (or truncated power
398: series). Thus, to estimate their complexities in a unified manner,
399: we use a function~${\Mat : \N \times \N \to \N}$ such that any two~${r
400: \times r}$ matrices with polynomial entries in~$\basefield[t]$ of
401: degree less than~$d$ can be multiplied using~$\Mat(r,d)$ operations
402: in~$\basefield$. In particular,~$\Mat(1,d)$ represents the number of
403: base field operations required to multiply two polynomials of degree
404: less than~$d$, while~$\Mat(r,1)$ is the arithmetic cost of scalar~${r
405: \times r}$ matrix multiplication. For simplicity, we
406: denote~$\Mat(1,d)$ by~$\sM(d)$ and we have~${\Mat(r,1) =
407: \cO(r^\omega)}$, where~${2 \leq \omega \leq 3}$ is the so-called {\em
408: exponent of the matrix multiplication}, see, e.g.,~\cite{BuClSh97}
409: and~\cite{GaGe99}.
410:
411: Using the algorithms of~\cite{ScSt71,CaKa91}, one can take~$\sM(d)$
412: in~$\cO(d \log d \log \log d)$; over fields supporting FFT, one can
413: take~$\sM(d)$ in~$\cO(d\log d)$. By~\cite{CaKa91} we can always
414: choose~$\Mat(r,d)$ in~${\cO(r^\omega \, \sM(d))}$, but better
415: estimates are known in important particular cases. For instance, over
416: fields of characteristic~$0$ or larger than~$2d$, we have~${\Mat(r,d)
417: = \cO( r^\omega d + r^2 \, \sM(d))}$, see~\cite[Th.~4]{BoSc05}. To
418: simplify the complexity analyses of our algorithms, we suppose that the
419: {multiplication cost} function~$\Mat$ satisfies the following standard
420: growth hypotheses for all integers~$d_{1},d_{2}$ and~$r$: %(see, e.g., \cite{GaGe99}).
421: \begin{equation}\label{hyp:Mat}
422: \Mat(r,d_{1}d_{2}) \leq d_{1}^{2} \Mat (r,d_{2})
423: \qquad \text{and} \qquad
424: \frac{\Mat(r,d_{1})}{d_{1}} \leq \frac{\Mat(r,d_{2})}{d_{2}}
425: \quad \text{if $d_{1} \leq d_{2}$}.
426: \end{equation}
427: In particular, Equation~\eqref{hyp:Mat} implies the inequalities
428: \begin{equation} \label{ineq:Mat}
429: \begin{split}
\Mat(r,2^\kappa)+\Mat(r,2^{\kappa-1})+\Mat(r,2^{\kappa-2})+\dots+\Mat(r,1)&
431: \le 2\Mat(r,2^\kappa),\\
432: \sM(2^\kappa)+2\sM(2^{\kappa-1})+4\sM(2^{\kappa-2})+\dots+2^\kappa\sM(1)&
433: \le (\kappa+1)\sM(2^\kappa).
434: \end{split}
435: \end{equation}
436: These inequalities are crucial to prove the estimates in
437: Theorem~\ref{theo:linear} and Theorem~\ref{theo:non-linear}. Note
438: also that when the available multiplication algorithm is slower than
439: quasi-linear (e.g., Karatsuba or naive multiplication), then in the
440: second inequality, the factor~$(\kappa+1)$ can be replaced by a constant
441: and thus the estimates $\sM(\precision)\log \precision$ in our complexities become
442: $\sM(\precision)$ in those cases.
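
For the reader's convenience, here is a short derivation of both
inequalities in~\eqref{ineq:Mat} from the second hypothesis
in~\eqref{hyp:Mat} (the summation index~$j$ is introduced here).
Taking~${d_1=2^{\kappa-j}}$ and~${d_2=2^\kappa}$
gives~${\Mat(r,2^{\kappa-j})\le 2^{-j}\,\Mat(r,2^\kappa)}$ for~${0\le
j\le\kappa}$, and in particular~${2^{j}\,\sM(2^{\kappa-j})\le
\sM(2^\kappa)}$ when~${r=1}$. Summing over~$j$ yields
\[
\sum_{j=0}^{\kappa}\Mat(r,2^{\kappa-j})
\le \Mat(r,2^\kappa)\sum_{j=0}^{\kappa}2^{-j}
\le 2\,\Mat(r,2^\kappa)
\qquad\text{and}\qquad
\sum_{j=0}^{\kappa}2^{j}\,\sM(2^{\kappa-j})
\le (\kappa+1)\,\sM(2^\kappa).
\]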
443:
444: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
445:
446: \subsection{Notation for Truncation}
447:
Our algorithms repeatedly split a polynomial into a lower and a
higher part. To this end, the following notation proves convenient.
450: Given a polynomial~$f$, the remainder and quotient of its Euclidean
451: division by~$t^k$ are respectively denoted $\trunch fk$ and~$\truncl
452: fk$. Another occasional operation consists in taking a middle part
453: out of a polynomial. To this end, we let $\trunc fkl$
denote~$\truncl{\trunch fl}{k}$. Furthermore, we shall write $f=g\mod
t^k$ when two polynomials or series $f$ and~$g$ agree up to
degree~$k-1$ inclusive. So that integration behaves well with
respect to truncation orders, all primitives of series are chosen with
zero as their constant coefficient.
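
As a small example of this notation (the polynomial is chosen here for
illustration), take~${f=1+2t+3t^2+4t^3+5t^4}$. Then
\[
\trunch{f}{2}=1+2t,\qquad
\truncl{f}{2}=3+4t+5t^2,\qquad
\trunc{f}{2}{4}=\truncl{\trunch{f}{4}}{2}=3+4t,
\]
and in particular~${f=\trunch{f}{2}+t^2\,\truncl{f}{2}}$.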
459:
460: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
461: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
462: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
463:
464: \section{Newton Iteration for Systems of Linear Differential
465: Equations}
466:
467: Let~${Y'(t) = A(t) Y(t)+B(t)}$ be a linear differential system,
468: where~$A(t)$ and~$B(t)$ are~${\order \times \order}$ matrices with
469: coefficients in~$\basefield[[t]]$. Given an invertible scalar
470: matrix~$Y_0$, an integer~${\precision \geq 1}$, and the expansions
471: of~$A$ and~$B$ up to precision~$\precision$, we show in this section
472: how to compute efficiently the power series expansion at
473: precision~$\precision$ of the unique solution of the Cauchy problem
474: $$Y'(t) = A(t) Y(t)+B(t) \quad \text{and} \quad Y(0) = Y_0.$$
475: This enables us to answer problems \textbf{I} and \textbf{i}, the
476: latter being a particular case of the former (through the application
477: to a companion matrix).
478:
479: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
480:
481: \subsection{Homogeneous Case}
482: First, we design a Newton-type iteration to solve the homogeneous
483: system~${Y'=A(t)Y}$. The classical Newton iteration to solve an
equation $\phi(Y)=0$ is $Y_{\kappa+1}=Y_\kappa-U_\kappa$, where
485: $U_\kappa$ is a solution of the linearized equation
486: $D\phi|_{Y_\kappa}\cdot U=\phi(Y_\kappa)$ and $D\phi|_{Y_\kappa}$ is
487: the differential of~$\phi$ at~$Y_\kappa$. We apply this idea to the
488: map~${\phi: Y \mapsto Y'-AY}$. Since~$\phi$ is linear, it is its own
489: differential and the equation for~$U$ becomes
490: $$U'-AU=Y'_\kappa-AY_\kappa.$$
491: Taking into account the proper orders of truncation and using
492: Lagrange's method of variation of
493: parameters~\cite{Lagrange1869,Ince56}, we are thus led to the
494: iteration
495: \[\begin{cases}Y_{\kappa+1} &= Y_\kappa - \trunch {U_\kappa}
496: {2^{\kappa+1}},\\
497: U_{\kappa}& = Y_\kappa \int
498: \trunch{Y_\kappa^{-1}}{2^{\kappa+1}} \left(Y_\kappa' -
499: \trunch{A}{2^{\kappa+1}} Y_\kappa\right).
500: \end{cases}
501: \]
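
The formula for~$U_\kappa$ can be motivated as follows (a heuristic
sketch that ignores truncations; the auxiliary unknown~$V$ is
introduced here). Looking for~$U$ in the form~${U=Y_\kappa V}$, one
gets
\[
U'-AU \;=\; Y_\kappa V' + \left(Y_\kappa'-AY_\kappa\right)V .
\]
Since~$Y_\kappa$ is already an approximate solution, the last term is a
product of two series of high valuation and can be neglected at the
current precision. This leaves~${Y_\kappa
V'=Y_\kappa'-AY_\kappa}$, that is,~${V=\int
Y_\kappa^{-1}\left(Y_\kappa'-AY_\kappa\right)}$.
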
502: Thus we need to compute (approximations
503: of) the solution~$Y$ and its inverse simultaneously. Now, a well-known Newton
504: iteration for the inverse $Z$ of $Y$ is
505: \begin{equation}\label{Newton:inverse}
506: Z_{\kappa+1} =
507: \trunch
508: {Z_{\kappa} + Z_{\kappa} (I_\order - Y Z_{\kappa})}
509: {2^{\kappa+1}}.
510: \end{equation}
511: It was introduced by Schulz~\cite{Schulz33} in
512: the case of real matrices; its version for matrices of power series is
513: given for instance in~\cite{MoCa79}.
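
For instance, in the scalar case with the illustrative
data~${Y=1-t}$ and~${Z_0=1}$, iteration~\eqref{Newton:inverse} gives
\begin{align*}
Z_1 &= \trunch{1+\bigl(1-(1-t)\bigr)}{2} = 1+t,\\
Z_2 &= \trunch{(1+t)+(1+t)\bigl(1-(1-t)(1+t)\bigr)}{4}
 = \trunch{(1+t)(1+t^2)}{4} = 1+t+t^2+t^3,
\end{align*}
which are the expansions of~$1/(1-t)$ at precisions~$2$ and~$4$: the
number of correct coefficients doubles at each step.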
514:
515: \begin{figure}
516: \begin{center}
517: \fbox{\begin{minipage}{9cm}
518: \medskip
519: \begin{center}\textsf{SolveHomDiffSys}($A,\precision,Y_0$) \end{center}
520: \textbf{Input:} ${Y_0,A_0, \dots, A_{\precision-2}}$
521: in~$\mathcal{M}_{\order\times\order}(\basefield)$,
522: ${A = \sum A_i t^i}$.
523: \par\smallskip
524: \textbf{Output:} ${Y=\sum_{i=0}^{\precision-1}Y_i t^i}$ in
525: $\mathcal{M}_{\order\times\order}(\basefield)[t]$ such that
526: ${Y' = A Y \mod t^{\precision-1}}$, and $Z=Y^{-1}\mod t^{\precision/2}$.
527:
528: \begin{tabbing}
529: \;\;\\$Y \leftarrow (I_{\order}+ t A_0) Y_0$
530: \\$ Z \leftarrow Y_0^{-1}$
531: \\$m \leftarrow 2$\\
532: \textsf{while} $m \leq \precision/2$ \textsf{do}\\
533: \hspace{0.5cm} $Z \leftarrow Z + \trunch {Z(I_{\order} - YZ)}{m} $\\
534: \hspace{0.5cm} $Y \leftarrow Y - \trunch {Y\left(\int Z (Y' - \trunch{A}{2m-1} Y) \right)}{2m} $ \\
535: %$+\sum_{i}\textsf{Coeff}(M', i)\frac{T^i}{i}$\\
536: \hspace{0.5cm} $m \leftarrow 2m$ \\
537: \textsf{return} $Y,Z$
538: \end{tabbing}
539: \end{minipage}
540: }\end{center}
541: \caption{Solving the Cauchy problem~$Y' = A(t) Y$, $Y(0) = Y_0$ by Newton iteration.}
542: \label{fig:hom}
543: \end{figure}
544: Putting together these considerations, we arrive at the algorithm
545: \textsf{SolveHomDiffSys} in Figure~\ref{fig:hom}, whose correctness
easily follows from Lemma~\ref{prop:Newton} below. Note
547: that in the scalar case~(${\order=1}$) algorithm
548: \textsf{SolveHomDiffSys} coincides with the algorithm for power series
549: exponential proposed by Hanrot and Zimmermann~\cite{HaZi04}; see
550: also~\cite{Bernstein}. In the case~${\order>1}$, ours is a nontrivial
551: generalization of the latter. Because it takes primitives of series at
552: precision~$\precision$, algorithm \textsf{SolveHomDiffSys} requires
553: that the elements~${2,3,\dots,\precision-1}$ be invertible
in~$\basefield$. Its complexity~$\sC$ satisfies the
recurrence~${\sC(m) = \sC(m/2) + \cO(\Mat(\order,m))}$, which
implies (using the growth hypotheses on~$\Mat$) that~${\sC(\precision)
= \cO(\Mat(\order,\precision))}$. This proves the first assertion of
558: Theorem~\ref{theo:linear}.
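
As a sanity check, here is a hand trace of \textsf{SolveHomDiffSys} in
the scalar case, on the illustrative input~${A=1}$, ${Y_0=1}$,
${\precision=4}$ (chosen here only as an example). Initialization
gives~${Y=1+t}$, ${Z=1}$ and~${m=2}$, and the single pass through the
loop computes
\begin{align*}
Z &\leftarrow 1+\trunch{1-(1+t)}{2} = 1-t,\\
\int Z\,(Y'-AY) &= \int (1-t)(-t) = -\tfrac12 t^2+\tfrac13 t^3,\\
Y &\leftarrow (1+t)-\trunch{(1+t)\left(-\tfrac12 t^2+\tfrac13 t^3\right)}{4}
 = 1+t+\tfrac12 t^2+\tfrac16 t^3.
\end{align*}
The output~$Y$ is the expansion of~$\exp(t)$ modulo~$t^4$, and~${Z=1-t}$
is its inverse modulo~$t^{2}$, in agreement with the specification.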
559: % It computes simultaneously the solutions $(Y,Z)$ of the problems
560: % $$Y'-AY = 0 \bmod t^{\precision-1} \quad \text{and} \quad Z'+Z A = 0 \bmod
561: % t^{\precision/2-1}.$$
562: %\bigskip
563:
564: % This is based on the following result, allowing to double the
565: % precision of the solution, by using only polynomial matrix operations.
566: \smallskip
567:
568: \begin{Lemme}\label{prop:Newton}
569: Let~$m$ be an even integer. Suppose
570: that~$Y_{(0)}, Z_{(0)}$ in~$\mathcal{M}_{\order\times\order}(\basefield[t])$ satisfy
571: \begin{equation*}
572: I_{\order} - Y_{(0)} Z_{(0)} = 0 \mod t^{m/2} \quad \text{and} \quad
573: Y_{(0)}' - AY_{(0)} = 0 \mod t^{m-1},
574: \end{equation*}
and that they are of degree less than $m$ and~$m/2$, respectively.
576: Define
577: \begin{equation*}
578: Z:=\trunch {Z_{(0)} \left(2I_{\order} - Y_{(0)} Z_{(0)} \right)} {m} \quad \text{and} \quad
579: Y:=\trunch {Y_{(0)} \left(I_{\order} - \int Z (Y_{(0)}'-AY_{(0)}) \right)} {2m}.
580: \end{equation*}
581: Then~$Y$ and~$Z$ satisfy the equations
582: \begin{equation} \label{eq:double}
583: I_{\order} - Y Z = 0 \mod t^{m} \quad \text{and} \quad
584: Y' - AY = 0 \mod t^{2m-1}.
585: \end{equation}
586: \end{Lemme}
587: \proof Using the definitions of~$Y$ and~$Z$, it follows that
588: $$
589: I_{\order} - YZ = (I_{\order} -Y_{(0)} Z_{(0)})^2 - (Y -
590: Y_{(0)}) Z_{(0)} (2I_{\order} -Y_{(0)} Z_{(0)}) \mod t^m.
591: $$
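Since~${I_{\order} -Y_{(0)} Z_{(0)}}$ is zero modulo~$t^{m/2}$ by
hypothesis and~${Y - Y_{(0)}}$ is zero modulo~$t^{m}$ (it is the
truncation of the product of~$Y_{(0)}$ with the primitive of a series
that is zero modulo~$t^{m-1}$), the right-hand side is zero
modulo~$t^m$ and this establishes the first formula in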
595: Equation~\eqref{eq:double}. Similarly, write~${Q= \int Z
596: (Y_{(0)}'-AY_{(0)})}$ and observe $Q=0\mod t^m$ to get the equality
597: $$
Y' - AY = (I_{\order}-YZ) (Y_{(0)}' - AY_{(0)}) - (Y_{(0)}' -
599: AY_{(0)}) Q \mod t^{2m-1}.
600: $$
601: Now,~${Y_{(0)}' - AY_{(0)}}=0 \mod t^{m-1}$, while~$Q$
602: and~${I_{\order} -YZ}$ are zero modulo~$t^{m}$ and therefore
603: the right-hand side of the last equation is zero modulo~$t^{2m-1}$,
604: proving the last part of the lemma.
605: \foorp
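
\smallskip
As an illustration of Lemma~\ref{prop:Newton}, here is a worked scalar
example (data chosen only for the sake of the example, assuming that~$2$
and~$3$ are invertible in~$\basefield$): take~${\order=1}$, ${A=1}$
and~${m=2}$. The polynomials~${Y_{(0)}=1+t}$ and~${Z_{(0)}=1}$ satisfy
the hypotheses, since~${1-Y_{(0)}Z_{(0)}=-t}$
and~${Y_{(0)}'-AY_{(0)}=-t}$. One step of the iteration then gives
\begin{align*}
Z &= \trunch{2-(1+t)}{2} = 1-t, \\
\int Z\,(Y_{(0)}'-AY_{(0)}) &= \int (-t+t^2) = -\frac{t^2}{2}+\frac{t^3}{3}, \\
Y &= \trunch{(1+t)\left(1+\frac{t^2}{2}-\frac{t^3}{3}\right)}{4}
   = 1+t+\frac{t^2}{2}+\frac{t^3}{6}.
\end{align*}
One checks that~${1-YZ = t^2/2+t^3/3+t^4/6}$ is zero modulo~$t^2$ and
that~${Y'-AY=-t^3/6}$ is zero modulo~$t^3$, in agreement
with~\eqref{eq:double}.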
606:
607: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
608:
609: \subsection{General Case}
We want to solve the equation~${Y'=AY +B}$, where~$B$ is an~${\order
\times \order}$ matrix with coefficients in~$\basefield[[t]]$.
Suppose that we have already computed the solution~$\widetilde{Y}$ of
the associated homogeneous equation~${\widetilde{Y}'=A \widetilde{Y}}$,
together with its inverse~$\widetilde{Z}$. Then, by the method of
variation of parameters, ${Y_{(1)}= \widetilde{Y} \int \widetilde{Z}
B}$ is a particular solution of the inhomogeneous problem that vanishes
at~${t=0}$; thus the solution with the same initial condition
as~$\widetilde{Y}$ is~${Y = Y_{(1)}+\widetilde{Y}}$.
618:
619: \begin{figure}
620: \begin{center}
621: \fbox{\begin{minipage}{9.5cm}
622: \medskip
623: \begin{center}\textsf{SolveInhomDiffSys}($A,B,\precision,Y_0$) \end{center}
624: \textbf{Input:} ${Y_0,A_0, \dots, A_{\precision-2}}$ in~$\mathcal{M}_{\order\times\order}(\basefield)$,
625: ${A = \sum A_i t^i}$,
626: \par\smallskip
627: ${B_0, \dots, B_{\precision-2}}$ in~$\mathcal{M}_{\order\times\order}(\basefield)$,
628: ${B(t) = \sum B_i t^i}$.
629: \par\medskip
630: \textbf{Output:} ${Y_1,\dots,Y_{\precision-1}}$
631: in~$\mathcal{M}_{\order\times\order}(\basefield)$ such that ${Y=Y_0 + \sum Y_i
632: t^i}$ satisfies~${Y' = A Y + B \mod t^{\precision-1}}$.
633:
634: \begin{tabbing}
635: \;\;\\$\widetilde{Y},\widetilde{Z} \leftarrow \textsf{SolveHomDiffSys} (A,\precision,Y_0)$ \\
636: $\widetilde{Z} \leftarrow \widetilde{Z} + \trunch {\widetilde{Z}(I_\order - \widetilde{Y}\widetilde{Z})} {\precision}$\\
637: $Y \leftarrow \trunch {\widetilde{Y} \int (\widetilde{Z} B)} {\precision}$ \\
638: $Y \leftarrow Y + \widetilde{Y}$\\
639: \textsf{return} $Y$
640: \end{tabbing}
641: \end{minipage}
642: }\end{center}
643: \caption{Solving the Cauchy problem $Y' = A Y + B, \; Y(0) = Y_0$ by Newton iteration.}
644: \label{fig:inhom}
645: \end{figure}
646:
647: Now, to compute the particular solution~$Y_{(1)}$ at
648: precision~$\precision$, we need to know both~$\widetilde{Y}$
649: and~$\widetilde{Z}$ at the same precision~$\precision$. To do this, we
650: first apply the algorithm for the homogeneous case and
651: iterate~\eqref{Newton:inverse} once. The resulting algorithm is
652: encapsulated in Figure~\ref{fig:inhom}.
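
To make the steps of Figure~\ref{fig:inhom} concrete, here is a small
scalar illustration (data chosen only for the sake of the example,
assuming that~$2$ and~$3$ are invertible in~$\basefield$):
take~${\order=1}$, ${A=1}$, ${B=1}$, ${Y_0=1}$ and~${\precision=4}$. The
homogeneous solution and its inverse
are~${\widetilde{Y}=1+t+t^2/2+t^3/6}$
and~${\widetilde{Z}=1-t+t^2/2-t^3/6}$ modulo~$t^4$, hence
$$
\int \widetilde{Z}B = t-\frac{t^2}{2}+\frac{t^3}{6} \mod t^4
\quad \text{and} \quad
\trunch{\widetilde{Y}\int \widetilde{Z}B}{4} = t+\frac{t^2}{2}+\frac{t^3}{6},
$$
so that the returned series is~${Y=1+2t+t^2+t^3/3}$. One checks
that~${Y'=2+2t+t^2}$ coincides with~${AY+B}$ modulo~$t^{3}$, as required
by the output specification.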
653:
654: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
655: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
656: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
657:
658: \section{Divide-and-conquer Algorithm}\label{sec:DAC}
659:
660: We now give our second algorithm, which allows us to solve problems
661: {\bf ii} and~{\bf II} and to finish the proof of
662: Theorem~\ref{theo:linear}. Before entering a detailed presentation,
663: let us briefly sketch the main idea in the particular case of a
664: homogeneous differential equation~${\mathcal{L}y=0}$,
665: where~$\mathcal{L}$ is a linear differential operator in~${\delta = t
666: \frac{d}{dt}}$ with coefficients in~$\basefield[[t]]$.
667: % FC, 11/04/2006: Je re-coupe, sinon cette phrase est trop longue !
668: (The introduction of~$\delta$ is only for pedagogical reasons.) The
669: starting remark is that if a power series~$y$ is written as~${y_0 +
670: t^m y_1}$, then~${\mathcal{L}(\delta)y = \mathcal{L}(\delta)y_0 +
671: t^m\mathcal{L}(\delta + m)y_1}$. Thus, to compute a solution~$y$
672: of~${\mathcal{L}(\delta) y = 0 \mod t^{2m}}$, it suffices to determine
673: the lower part of~$y$ as a solution of ${\mathcal{L}(\delta) y_0 = 0
674: \mod t^m}$, and then to compute the higher part~$y_1$, as a solution
of the inhomogeneous equation~${\mathcal{L}(\delta + m) y_1 = - R \mod
t^{m}}$, where the remainder~$R$ is computed so that~${\mathcal{L}(\delta)
y_0 = t^m R \mod t^{2m}}$.
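
The starting remark can be checked directly. The short derivation below
is a sketch for an operator~${\mathcal{L}(\delta)=\sum_k c_k(t)\delta^k}$
with finitely many coefficients~$c_k$ in~$\basefield[[t]]$ (the
notation~$c_k$ is introduced here only for this verification). By
Leibniz's rule,
$$
\delta(t^m y_1) = t\left(m t^{m-1} y_1 + t^m y_1'\right) = t^m(\delta+m)y_1,
$$
so that, by induction on~$k$, ${\delta^k(t^m y_1)=t^m(\delta+m)^k y_1}$.
Multiplying by~$c_k(t)$ and summing over~$k$ gives
$$
\mathcal{L}(\delta)(t^m y_1)=\sum_k c_k(t)\,t^m(\delta+m)^k y_1
= t^m\mathcal{L}(\delta+m)y_1,
$$
and the announced formula follows by linearity of~$\mathcal{L}$.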
678:
Our algorithm \textsf{DivideAndConquer} makes a recursive use of this idea. Since, during the
recursions, we are naturally led to treat inhomogeneous equations of a
slightly more general form than that of~{\bf II}, we introduce the
notation~$\mathcal{E}(s,p,m)$ for the vector equation
683: \begin{equation*}
684: t y' + (p I_\order - tA) y = s \mod t^{m}.
685: \end{equation*}
The algorithm is described in Figure~\ref{fig:algo-dac}.
Choosing~${p=0}$ and~${s(t) =t b(t)}$, we retrieve the equation of
problem~{\bf II}. Our algorithm \textsf{Solve} for problem~{\bf
II} is thus a specialization of \textsf{DivideAndConquer}, defined by
making \textsf{Solve}$(A,b,\precision,v)$ simply call
\textsf{DivideAndConquer}$(A,tb,0,\precision,v)$. Its correctness relies on
the following immediate lemma.
693:
\begin{figure}
\begin{center}
\fbox{\begin{minipage}{8.5 cm}
\medskip
\begin{center}\textsf{DivideAndConquer($A,s,p,m,v$)} \end{center}
\textbf{Input:} $A_0,\dots,A_{m-1}$ in~$\mathcal{M}_{\order\times\order}(\basefield)$,
${A = \sum A_i t^i}$, $s_0,\dots,s_{m-1},v$ in~$\mathcal{M}_{\order\times1}(\basefield)$,
${s = \sum s_i t^i}$, $p$ in~$\basefield$.
\par\smallskip
\textbf{Output:} ${y=\sum_{i=0}^{m-1}y_i t^i}$ in
$\mathcal{M}_{\order\times1}(\basefield)[t]$ such that
${ty' + (pI_{\order}-tA)y=s \mod t^m}$, ${y(0)=v}$.

\begin{tabbing}
\textsf{If}~$m=1$ \textsf{then} \\
{\quad \textsf{if}} $p=0$ \textsf{then} \\
{\quad \quad \textsf{return}} $v$\\
{\quad \textsf{else return}} $p^{-1} s(0)$\\
\textsf{end if}\\
$d \leftarrow \intpart{m/2}$\\
$y_0 \leftarrow$ {\sf DivideAndConquer}($A,\trunch s{d},p,d,v$)\\
$R \leftarrow \trunc{s- t y_0' - (p I_\order -tA) y_0}{d}{m} $ \\
$y_1 \leftarrow$ {\sf DivideAndConquer}($A, R, p+d, m-d,v$)\\
\textsf{return} $y_0 + t^d y_1$
\end{tabbing}
\end{minipage}
}\end{center}
\caption{Solving $ty' + (pI_{\order}-tA)y=s \mod t^m$, ${y(0)=v}$,
by divide-and-conquer.}
\label{fig:algo-dac}\label{fig:2}
\end{figure}
726:
727: \begin{Lemme}
728: Let~$A$ in~$\mathcal{M}_{\order\times\order}(\basefield[[t]])$, $s$
729: in~$\mathcal{M}_{\order \times 1}(\basefield[[t]])$, and let~$p,d$
730: in~$\mathbb{N}$. Decompose~$\trunch sm$ into a sum~${s_0 +
731: t^d s_1}$. Suppose that~$y_0$
732: in~$\mathcal{M}_{\order\times1}(\basefield[[t]])$ satisfies the
733: equation~$\mathcal{E}(s_0,p,d)$, set $R$ to be
734: \begin{equation*}
735: \trunch {(ty'_0 + (pI_\order - t A) y_0 - s_0)/t^d} {m-d},
736: \end{equation*}
737: and let~$y_1$ in~$\mathcal{M}_{\order \times 1}(\basefield[[t]])$ be
738: a solution of the equation~${\mathcal{E}(s_1-R,p+d,m-d)}$. Then the
739: sum $y:= y_0 + t^d y_1$ is a solution of the
740: equation~$\mathcal{E}(s,p,m)$.
741: \end{Lemme}
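
The verification is a direct computation, which can be sketched as
follows. Since~${(t^d y_1)' = d\,t^{d-1}y_1 + t^d y_1'}$, the
sum~${y=y_0+t^dy_1}$ satisfies
$$
ty' + (pI_\order - tA)y = \bigl(ty_0' + (pI_\order - tA)y_0\bigr)
+ t^d\bigl(ty_1' + ((p+d)I_\order - tA)y_1\bigr).
$$
By definition of~$R$, the first term equals~${s_0+t^dR}$ modulo~$t^m$,
while the second one equals~${t^d(s_1-R)}$ modulo~$t^m$ because~$y_1$
satisfies~${\mathcal{E}(s_1-R,p+d,m-d)}$. Their sum
is~${s_0+t^ds_1}$, which is~$s$ modulo~$t^m$.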
742:
743: The only divisions performed along our algorithm~\textsf{Solve} are by 1, \dots, $\precision-1$.
744: As a consequence of this remark and of the previous lemma, we deduce the complexity estimates in the proposition below;
745: for a general matrix~$A$, this proves point~(c) in Theorem~\ref{theo:linear}, while the
746: particular case when $A$~is companion proves point~(b).
747:
748: \begin{Prop}
749: Given the first~$m$ terms of the entries
750: of~$A\in\mathcal{M}_{\order\times\order}(\basefield[[t]])$ and
751: of~$s\in\mathcal{M}_{\order \times 1}(\basefield[[t]])$,
752: given~$v\in\mathcal{M}_{\order \times 1}(\basefield)$,
753: algorithm~$\emph{\textsf{DivideAndConquer}}(A,s,p,m,v)$ computes a
754: solution of the linear differential system~${ty' + (pI_{\order}-tA)
755: y=s \mod t^m}$, ${y(0)=v}$, using~${\cO(\order^2 \, \sM(m) \log m)}$
756: operations in~$\basefield$. If $A$ is a companion matrix, the cost
757: reduces to ${\sC(m) = \cO(\order \, \sM(m) \log m)}$.
758: \end{Prop}
\proof The correctness of the algorithm follows from the previous
Lemma. The cost~$\sC(m)$ of the algorithm satisfies the recurrence
$$ \sC(m) = \sC(\intpart{m/2}) + \sC(\left\lceil m/2 \right\rceil) + \order^2 \, \sM(m)
+ \cO(\order m),$$ where the term $\order^2 \, \sM(m)$ comes from the
application of $A$ to $y_0$ used to compute the remainder~$R$. From this
recurrence, it is easy to infer that~${\sC(m) = \cO(\order^2 \, \sM(m)
\log m)}$. Finally, when $A$ is a companion matrix, the vector~$R$ can
be computed in time $\cO(\order \, \sM(m))$, which implies that in this
case~${\sC(m) = \cO(\order \, \sM(m) \log m)}$.
768: \foorp
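
As a sanity check, the recursion of \textsf{DivideAndConquer} can be
traced by hand on a toy instance (chosen here purely for illustration,
assuming that~$2$ and~$3$ are invertible in~$\basefield$):
${\order=1}$, ${A=1}$, ${s=0}$, ${p=0}$, ${m=4}$ and~${v=1}$, which
corresponds to~${y'=y}$, ${y(0)=1}$. We read~$\trunc{f}{d}{m}$ as the
coefficients of~$t^d,\dots,t^{m-1}$ in~$f$, shifted down by~$t^d$, and
write \textsf{DAC} for \textsf{DivideAndConquer}.
\begin{align*}
\textsf{DAC}(A,0,0,2,1):&\quad y_0=v=1,\quad R=\trunc{t}{1}{2}=1,\quad
  y_1=1^{-1}\cdot 1=1,\quad \text{returns } 1+t,\\
\text{top level}:&\quad R=\trunc{-t+t(1+t)}{2}{4}=1,\\
\textsf{DAC}(A,1,2,2,1):&\quad y_0=2^{-1}\cdot 1,\quad
  R=\trunc{1-(2-t)/2}{1}{2}=\tfrac12,\quad
  y_1=3^{-1}\cdot\tfrac12,\quad \text{returns } \tfrac12+\tfrac t6.
\end{align*}
The value returned at top level is
thus~${(1+t)+t^2(1/2+t/6)=1+t+t^2/2+t^3/6}$, the expansion of the
exponential series modulo~$t^4$. The only divisions performed are
by~$1$, $2$ and~$3$.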
769:
770: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
771: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
772: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
773:
774: \section{Faster Algorithms for Special Coefficients}\label{sec:particular}
775:
776: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
777:
778: \subsection{Constant Coefficients}\label{ssec:const-coeffs}
779: Let~$A$ be a constant~${\order \times \order}$ matrix and let~$v$ be a
780: vector of initial conditions. Given~${\precision \geq 1}$, we want to
781: compute the first~$\precision$ coefficients of the series expansion of
782: a solution~$y$ in~$\mathcal{M}_{\order \times 1}(\basefield[[t]])$
of~${y' = Ay}$, with~${y(0) = v}$. In this setting, many
algorithms have been proposed to solve problems {\bf i}, {\bf ii},
{\bf I}, and {\bf II}; see for
instance~\cite{Pennell26,Putzer66,Kirchner67,Fulmer75,MoLo78,Leonard96,Liz98,Gu99,Gu01,HaFiSm01,MoLo03,LuRo04}.
Again, the most naive algorithm is based on the method of undetermined
coefficients. On the other hand, most books on differential equations,
see, e.g., \cite{Ince56,Coddington61,Arnold92}, recommend simplifying
the calculations using the Jordan form of matrices. The main drawback
791: of that approach is that computations are done over the algebraic
792: closure of the base field~$\basefield$. The best complexity result
793: known to us is given in~\cite{LuRo04} and it is quadratic in~$\order$.
794:
795: We concentrate first on problems~{\bf ii} and~{\bf II} (computing a
796: single solution for a single equation, or a first-order system).
797: Our algorithm for problem~{\bf II}
798: uses~${\cO(\order^\omega \log \order + \precision \sM(\order))}$
799: operations in~$\basefield$ for a general constant matrix~$A$ and
800: only~$\cO(\precision \sM(\order)/\order)$ operations in~$\basefield$ in
801: the case where~$A$ is a companion matrix (problem {\bf ii}). Despite
802: the simplicity of the solution, this is, to the best of our knowledge,
803: a new result.
804:
805: In order to compute~${y_\precision = \sum_{i=0}^\precision{A^i v
806: t^i/i!}}$, we first compute its Laplace
807: transform~${z_\precision=\sum_{i=0}^\precision {A^i v t^i}}$: indeed,
808: one can switch from~$y_\precision$ to~$z_\precision$ using
809: only~$\cO(\precision \order)$ operations in $\basefield$. The
810: vector~$z_\precision$ is the truncation at order~${\precision + 1}$
811: of~${z=\sum_{i\ge0} A^i v t^i =(I-tA)^{-1} v}$. As a byproduct of a
812: more difficult question,~\cite[Prop.~10]{Storjohann02} shows
813: that~$z_\precision$ can be computed using~$\cO(\precision
814: \order^{\omega-1})$ operations in~$\basefield$. We propose a solution
815: of better complexity.
816:
817: By Cramer's rule,~$z$ is a vector of rational functions~$z_i(t)$, of
818: degree at most~$\order$. The idea is to first compute~$z$ as a
819: rational function, and then to deduce its expansion
820: modulo~$t^{\precision +1}$. The first part of the algorithm does not
821: depend on~$\precision$ and thus it can be seen as a precomputation.
822: For instance, one can use%Algorithm \texttt{SeriesSolutionSmallRHS} in
823: ~\cite[Corollary~12]{Storjohann02}, to compute $z$ in
824: complexity~$\cO(\order^{\omega} \log \order)$. In the second step of
825: the algorithm, we have to expand~$\order$ rational functions of degree
826: at most~$\order$ at precision~$\precision$. Each such expansion
827: can be performed using~$\cO(\precision\sM(\order)/\order)$ operations
828: in~$\basefield$, see, e.g., the proof of~\cite[Prop.~1]{BoFlSaSc05}.
The total cost of the algorithm is thus~${\cO(\order^\omega\log \order
+ \precision \sM(\order))}$. We give below a simplified variant with
the same complexity, avoiding the use of the algorithm
832: in~\cite{Storjohann02} for the precomputation step and relying instead
833: on a technique which is classical in the computation of minimal
834: polynomials~\cite{BuClSh97}.
835: \begin{enumerate}
\item Compute the vectors~$v,Av,A^2 v,A^3v,\dots,A^{2\order}v$
  in~$\cO(\order^\omega\log \order)$ operations, as follows: \\ for~$\kappa$
  from~$1$ to~${1 + \log \order}$ do
839: \begin{enumerate}
840: \item compute~$A^{2^\kappa}$
841: \item compute~${A^{2^\kappa} \times [v | Av | \cdots | A^{2^\kappa-1}v]}$,
842: thus getting~${[A^{2^\kappa}v | A^{2^\kappa+1}v | \cdots | A^{2^{\kappa+1}-1}v]}$
843: \end{enumerate}
844: \item For each~${j=1, \dots, \order}$:
845: \begin{enumerate}
  \item recover the rational function whose series expansion
847: is~$\sum{(A^i v)_j t^i}$ by Pad\'e approximation
848: in~$\cO(\sM(\order)\log \order)$ operations
849: \item compute its expansion up to precision $t^{\precision + 1}$
850: in~$\cO(\precision \, \sM(\order)/\order)$ operations
851: \item recover the expansion of~$y$ from that of~$z$,
852: using~$\cO(\precision)$ operations.
853: \end{enumerate}
854: \end{enumerate}
This yields the announced total cost of~${\cO(\order^\omega \log
\order + \precision \sM(\order))}$ operations for problem {\bf II}.
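
For a concrete instance of these steps, consider the following toy
example (data chosen only for illustration, assuming that the
characteristic of~$\basefield$ is zero or larger than~$\precision$):
$$
A=\begin{pmatrix}0&1\\1&0\end{pmatrix},\quad
v=\begin{pmatrix}1\\0\end{pmatrix},\qquad
z=(I_\order-tA)^{-1}v=\frac{1}{1-t^2}\begin{pmatrix}1\\ t\end{pmatrix}.
$$
Here the vectors~$A^iv$ alternate between~$v$ and~$Av$, so the two
coordinate sequences are~$1,0,1,0,\dots$ and~$0,1,0,1,\dots$; Pad\'e
approximation recovers the fractions~${1/(1-t^2)}$
and~${t/(1-t^2)}$, whose common denominator encodes the
recurrence~${z_{i+2}=z_i}$ used for the expansion. Dividing the
coefficient of~$t^i$ by~$i!$ then yields
$$
y=\begin{pmatrix} 1+\frac{t^2}{2}+\frac{t^4}{24}+\cdots \\[2pt]
t+\frac{t^3}{6}+\cdots \end{pmatrix},
$$
and one checks that differentiating each coordinate gives the other
one, that is,~${y'=Ay}$ with~${y(0)=v}$.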
857:
858: We now turn to the estimation of the
859: cost for problems~{\bf i} and~{\bf I} (bases of solutions).
860: In the case of equations with constant coefficients, we use the
861: Laplace transform again. If $y = \sum_{i \geq 0} y_i t^i$ is a
862: solution of an order $\order$ equation with constant coefficients,
863: then the sequence $(z_i)=(i! y_i)$ is generated by a linear recurrence
with constant coefficients. Hence, the first terms $z_1,\dots,z_\precision$ can
be computed in $\cO(\precision\sM(\order)/\order)$ operations, using again the
algorithm described in~\cite[Prop.~1]{BoFlSaSc05}.
For problem~{\bf I}, the exponent~$\omega+1$ in the cost of the precomputation can be reduced to~$\omega$ by a very different approach; we cannot give the details here for lack of space.
868:
869:
870: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
871:
872: \subsection{Polynomial Coefficients}
If the coefficients in one of the problems {\bf i, ii, I}, and~{\bf II}
are polynomials in~$\basefield[t]$ of degree at most~$d$, using the
linear recurrence of order~${d+1}$ satisfied by the coefficients of the
solution seemingly yields the lowest possible complexity.
Consider for instance problem~{\bf II}.
Plugging~${A=\sum_{i=0}^d t^i A_i}$, ${b=\sum_{i=0}^d t^i
  b_i}$, and~${y=\sum_{i\geq 0} t^i y_i}$ in the
equation~${y'=Ay+b}$, we arrive at the following recurrence
881: $$
882: y_{k+d+1} = (d+k+1)^{-1} (A_d y_k + A_{d-1}y_{k+1} + \dots + A_0
883: y_{k+d} + b_{k+d}), \quad \text{for all $k \geq -d$}.
884: $$
with the conventions~${y_j=0}$ for~${j<0}$ and~${b_j=0}$ for~${j>d}$.
Thus, to compute~$y_0,\dots,y_\precision$, we need to
perform~${\precision d}$ matrix-vector products; this is done
using~${\cO (d \precision \order^2)}$ operations in~$\basefield$. A
similar analysis implies the other complexity estimates in the third
column of Table~\ref{table1}.
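
As a small worked instance of this recurrence (a toy example chosen for
illustration, assuming that~$2$ is invertible in~$\basefield$),
take~${\order=1}$, ${d=1}$, ${A=t}$, that is~${A_0=0}$ and~${A_1=1}$,
together with~${b=0}$ and~${y_0=1}$. The recurrence then reads
$$
y_{k+2}=(k+2)^{-1}\,y_k \quad \text{for all $k\geq -1$},\qquad y_{-1}=0,
$$
and gives successively~${y_1=0}$, ${y_2=1/2}$, ${y_3=0}$
and~${y_4=1/8}$. The resulting series~${y=1+t^2/2+t^4/8+\cdots}$ indeed
satisfies~${y'=t+t^3/2+\cdots=ty}$.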
890:
891: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
892: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
893: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
894:
895: \section{Non-linear Systems of Differential Equations}
896: Let~${\varphi(t,y) = (\varphi_1(t,y), \dots, \varphi_\order(t,y))}$,
897: where each~$\varphi_i$ is a power series
898: in~$\basefield[[t,y_1,\dots,y_\order]]$. We consider the first-order
899: non-linear system in~$y$
900: \[
901: %(\mathcal{N})\qquad \left\{
902: %\begin{aligned}
903: % y_1'(t)& = \varphi_1(t,y_1(t),\dots,y_\order(t)), \\
904: %&\,\,\vdots \\
905: % y_\order'(t)& = \varphi_\order(t,y_1(t),\dots,y_\order(t)).
906: %\end{aligned}
907: %\right.
908: (\mathcal{N})\qquad % \left\{
909: y_1'(t) = \varphi_1(t,y_1(t),\dots,y_\order(t)), \quad\dots,\quad
910: y_\order'(t) = \varphi_\order(t,y_1(t),\dots,y_\order(t)).
911: %\right.
912: \]
913:
914: To solve~($\mathcal{N}$), we use the classical technique of
915: \emph{linearization}. The idea is to attach, to an \emph{approximate}
916: solution~$y_0$ of~($\mathcal{N}$), a \emph{tangent} system in the new unknown~$z$,
917: $$
918: (\mathcal{T},y_0) \qquad z' = \Jac(\varphi)(y_0) z - y_0'+
919: \varphi(y_0),
920: $$
921: which is linear and whose solutions serve to obtain a
922: better approximation of a true solution of~($\mathcal{N}$). Indeed,
923: let us denote by~$(\mathcal{N}_m),(\mathcal{T}_m)$ the
924: systems~$(\mathcal{N}),(\mathcal{T})$ where all the equalities are
925: taken modulo~$t^m$. Taylor's formula states that the
926: expansion~${\varphi(y+z) - \varphi(y) - \Jac(\varphi)(y) z}$
927: is equal to~$0$ modulo~$z^2$.
928: It is a simple matter to check that if~$y$ is a
929: solution of~$(\mathcal{N}_m)$ and if~$z$ is a solution
930: of~$(\mathcal{T}_{2m},y)$, then~${y+z}$ is a solution
931: of~$(\mathcal{N}_{2m})$. This justifies the correctness of
932: Algorithm {\sf SolveNonLinearSys}.
933:
934: To analyze the complexity of this algorithm, it suffices to remark
935: that for each integer~$\kappa$ between $1$ and~$\intpart{\log \precision}$,
936: one has to compute one solution of a linear inhomogeneous first-order
937: system at precision~$2^\kappa$ and to evaluate~$\varphi$ and its
938: Jacobian on a series at the same precision. This concludes the proof of Theorem~\ref{theo:non-linear}.
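
To illustrate Algorithm {\sf SolveNonLinearSys} on a toy case (chosen
here only as an example, assuming that~$2$ and~$3$ are invertible
in~$\basefield$), take~${\order=1}$, ${\varphi(t,y)=y^2}$
and~${v=1}$, whose exact solution is~${1/(1-t)}$. Starting
from~${m=1}$ and~${y=1}$, the first pass computes~${A=2}$ and~${b=1}$;
the solution of~${z'=Az+b}$ with~${z(0)=0}$ at precision~$2$
is~${z=t}$, so that~${y=1+t}$. The second pass, with~${m=2}$, computes
$$
A=\trunch{2(1+t)}{4}=2+2t,\qquad
b=\trunch{(1+t)^2-1}{4}=2t+t^2,
$$
and identifying coefficients in~${z'=Az+b}$ for~${z=z_1t+z_2t^2+z_3t^3}$
gives~${z_1=0}$, ${2z_2=2}$ and~${3z_3=3}$, that is~${z=t^2+t^3}$.
The updated series~${y=1+t+t^2+t^3}$ agrees with~${1/(1-t)}$
modulo~$t^4$: each pass doubles the number of correct coefficients.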
939:
940:
941: \begin{figure}[h]
942: \begin{center}
943: \fbox{\begin{minipage}{9 cm}
944:
945: \medskip
\begin{center}\textsf{SolveNonLinearSys}($\varphi,\precision,v$) \end{center}
947: \textbf{Input:} $\precision$ in~$\mathbb{N}$,
948: $\varphi(t,y)$ in~$\basefield[[t,y_1,\dots,y_\order]]^{\order}$,
949: $v$ in~$\basefield^\order$
950: \par\smallskip
\textbf{Output:} first~$\precision$ terms of a series~$y(t)$
in~$\basefield[[t]]^\order$ such that~${y'(t) = \varphi(t,y(t)) \mod
  t^\precision}$ and~${y(0) = v}$.
954: \begin{tabbing}
955: \;\; \\$m \leftarrow 1$\\
956: $y \leftarrow v$ \\
957: \textsf{while} $m \leq \precision/2$ \textsf{do}\\
958: \hspace{0.5cm} $A \leftarrow \trunch{\Jac(\varphi) (y)}{2m}$\\
959: \hspace{0.5cm} $b \leftarrow \trunch{\varphi (y) - y'}{2m}$ \\
960: % \hspace{0.5cm} $z \leftarrow \textsf{Solve}(z' = Az + b \mod t^{2m}, z(0)=0)$ \\
961: \hspace{0.5cm} $z \leftarrow \textsf{Solve}(A, b, 2m, 0)$ \\
962: \hspace{0.5cm} $y \leftarrow y + z$ \\
963: \hspace{0.5cm} $m \leftarrow 2m$ \\
964: \textsf{return} $y$
965: \end{tabbing}
966: \end{minipage}
967: }\end{center}
968: \caption{Solving the non-linear differential system ${y' = \varphi(t,y), \; y(0) = v}$.}
969: \label{fig:nonlinear}
970: \end{figure}
971:
972:
973: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
974: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
975: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
976:
977: \section{Implementation and Timings}
978:
We implemented our algorithms \textsf{SolveHomDiffSys} and
\textsf{Solve} in Magma~\cite{Magma} and ran the programs on an Athlon processor at 2.2~GHz
with 2~GB of memory.\footnote{All the computations have been done on the machines of
  the MEDICIS resource center
  \url{http://medicis.polytechnique.fr}.} We used Magma's built-in
polynomial arithmetic (using successively naive, Karatsuba and FFT
multiplication algorithms), as well as Magma's scalar matrix
multiplication (of cubic complexity in the ranges of our interest).
We give three sets of timings. First, we compare in
Figure~\ref{fig:benchs} the performance of our algorithm
\textsf{SolveHomDiffSys} with that of the naive quadratic algorithm,
for computing a basis of (truncated power series) solutions of a
homogeneous system. The order of the system varies from $2$ to $16$,
while the precision required for the solution varies from 256 to
4096; the base field is $\mathbb{Z}/p\mathbb{Z}$, where $p$ is a 32-bit prime.
994:
995:
996: \begin{figure}
997: \begin{center}
998: \begin{tabular}{c||c|c|c|c}
999: $ \precision \ddots \order$ & 2 & 4 & 8 & 16 \\
1000: \hline
1001: 256 & 0.02 \text{vs.} 2.09 & 0.08 \text{vs.} 6.11 & 0.44 \text{vs.} 28.16 & 2.859 \text{vs.} 168.96 \\
1002: 512 & 0.04 \text{vs.} 8.12 & 0.17 \text{vs.} 25.35 & 0.989 \text{vs.} 113.65 & 6.41 \text{vs.} 688.52 \\
1003: 1024 & 0.08 \text{vs.} 32.18 & 0.39 \text{vs.} 104.26 & 2.30 \text{vs.} 484.16 & 15 \text{vs.} 2795.71\\
1004: 2048 & 0.18 \text{vs.} 128.48 & 0.94 \text{vs.} 424.65 & 5.54 \text{vs.} 2025.68 & 36.62 \text{vs.} $> 3$\text{hours} $^\star$\\
1005: 4096 & 0.42 \text{vs.} 503.6 & 2.26 \text{vs.} 1686.42 & 13.69 \text{vs.} 8348.03 & 92.11 \text{vs.} $> 1/2$ \text{day}$^\star$ \\
1006:
1007: \end{tabular}
1008: \end{center}
\caption{Computation of a basis of solutions of a linear homogeneous system with
  $\order$ equations, at precision $\precision$: comparison of timings
  (in seconds) between algorithm \textsf{SolveHomDiffSys} and the
  naive algorithm. Entries marked with a `$\star$' are estimated timings.}
1013: \label{fig:benchs}
1014: \end{figure}
1015:
Then we display in Figure~\ref{fig:matmul} and Figure~\ref{fig:newton}
the timings obtained respectively with the algorithm for
polynomial matrix multiplication \textsf{PolyMatMul}, which was used as
a primitive of \textsf{SolveHomDiffSys}, and with
algorithm~\textsf{Solve\-HomDiffSys} itself. The similar shapes of the two
surfaces indicate that the complexity prediction of point (a) in
Theorem~\ref{theo:linear} is well respected in our implementation:
\textsf{SolveHomDiffSys} uses a constant number (between 4 and 5) of
polynomial matrix multiplications; note that the abrupt jumps at powers of 2
reflect the performance of Magma's FFT implementation of polynomial
arithmetic.
1027:
1028: \begin{figure}[ht]
1029: \begin{center}
1030: \begin{minipage}[b]{0.45\textwidth}
1031: \centerline{\includegraphics[scale=0.35,angle=270]{MatMul}}
1032: \caption{Timings of algorithm \textsf{PolyMatMul}. \label{fig:matmul}}
1033: \end{minipage}\hskip0.1\textwidth
1034: \begin{minipage}[b]{0.45\textwidth}
1035: \centerline{\includegraphics[scale=0.35,angle=270]{Newton}}
\caption{Timings of algorithm \textsf{SolveHomDiffSys}. \label{fig:newton}}
1037: \end{minipage}
1038: \end{center}
1039: \end{figure}
1040:
1041:
1042:
In Figure~\ref{fig:dac} we give the timings for the computation of
one solution of a linear differential equation of order $2$, $4$, and
$8$, respectively, using our algorithm~\textsf{Solve} in
Section~\ref{sec:DAC}. Again, the shape of the three curves
experimentally confirms the nearly linear behaviour, both in the
precision~$\precision$ and in the order~$\order$, of the complexity of
algorithm~\textsf{Solve} established in point (b) of
Theorem~\ref{theo:linear}. Finally, Figure~\ref{fig:dac+naive}
displays the three curves from Figure~\ref{fig:dac} together with the
timing curve for the naive quadratic algorithm computing one solution
of a linear differential equation of order $2$. The conclusion is
that our algorithm~\textsf{Solve} outperforms the quadratic one
already at small precisions.
1056:
1057: \begin{figure}[ht]
1058: \begin{center}
1059: \begin{minipage}[b]{0.45\textwidth}
1060: \centerline{\includegraphics[scale=0.3,angle=270]{DAC}}
1061: \caption{Timings of algorithm \textsf{Solve} for equations of orders 2, 4, and~8. \label{fig:dac}}
1062: \end{minipage}\hskip0.1\textwidth
1063: \begin{minipage}[b]{0.45\textwidth}
1064: \centerline{\includegraphics[scale=0.3,angle=270]{DAC+Naive}}
1065: \caption{Same, compared to the naive algorithm for a second-order equation.\label{fig:dac+naive}}
1066: \end{minipage}
1067: \end{center}
1068: \end{figure}
1069:
1070:
1071: %precision~$\precision = 1048576$ in~24.53s; one at doubled
1072: %precision~$\precision=2097152$ in doubled time~49.05s; one for doubled
We also implemented our algorithms of Section~\ref{ssec:const-coeffs}
for the special case of constant coefficients. For lack of space,
we only provide a few experimental results for
problem~{\bf II}. Over the same finite field, we computed: a solution
of a linear system with~$\order=8$ at
precision~$\precision\approx10^6$ in~24.53s; one at twice this precision
in twice the time, 49.05s; and one for twice the order,~$\order=16$, in
about twice the time, 49.79s.
1081:
1082:
1083: \bibliographystyle{plain}
1084: \bibliography{focs}
1085:
1086: \end{document}
1087: