1: 
2: 
3: 
4: \documentclass{article}
5: 
6: \usepackage{amsmath,amsthm,amsfonts}
7: 
8: \usepackage{amssymb}
9: 
10: \usepackage{epsfig}
11: 
12: \usepackage{rotating}
13: 
14: \usepackage{subfigure}
15: 
16: 
17: 
18: %\newcommand{\p}[2]{\frac{\partial#1}{\partial#2}}
19: %\def{\p}{\partial }
20: 
21: %\newcommand{\sl R}{\mathscr{R}}
22: \def\sl R{\mathscr{R}}
23: \def\scrH{\mathscr{H}}
24: \def\scrS{\mathscr{S}}
25: \def\R{\Re}
26: 
27: \def\vp{{\varphi}}
28: %\newcommand{\E}{{E}}
29: \def\E{{E}}
30: %\newcommand{\G}{{\Gamma}}
31: \def\G{{\Gamma}}
32: %\newcommand{\hatx}{{\hat{x}}}
33: \def\hatx{{\hat{x}}}
34: %\newcommand{\haty}{{\hat{y}}}
35: \def\haty{{\hat{y}}}
36: %\newcommand{\hatR}{{\hat{R}}}
37: \def\hatR{{\hat{R}}}
38: %\newcommand{\hatvp}{{\hat{\vp}}}
39: \def\hatvp{{\hat{\vp}}}
40: %\renewcommand{\P}{{P}}
41: \def\P{{P}}
42: %\newcommand{\Q}{{Q}}
43: \def\Q{{Q}}
44: \def\LQ{{LQ}}
45: %\newcommand{\hatp}{{\hat{p}}}
46: \def\hatp{{\hat{p}}}
47: %\newcommand{\hatq}{{\hat{q}}}
48: %\def\hatq{{\hat{q}}}
49: %\newcommand{\tildep}{{\tilde{p}}}
50: \def\tildep{{\tilde{p}}}
51: %\newcommand{\tildeq}{{\tilde{q}}}
52: \def\tildeq{{\tilde{q}}}
53: \def\pH{\partial H}
54: \def\eps{\epsilon}
55: 
56: % ================================================================
57: 
58: \begin{document}
59: \centerline{\Large\bf PROBLEM REDUCTION, RENORMALIZATION, AND MEMORY}
60: \vskip14pt
61: 
62: \centerline{\bf Alexandre J.\ Chorin and Panagiotis Stinis}
63: \vskip12pt
64: \centerline{Department of Mathematics, University of California}
65: \centerline{and}
66: \centerline{Lawrence Berkeley National Laboratory}
67: \centerline{Berkeley, CA 94720}
68: 
69: 
70: \begin{abstract}
71: Methods for the reduction of the complexity of computational problems are
72: presented, as well as their connections to renormalization, scaling, and irreversible statistical mechanics. 
Several statistically stationary cases are analyzed; for time-dependent 
problems averaging usually fails, and averaged equations 
75: must be augmented by appropriate memory and random forcing terms. 
76: Approximations are described and
77: examples are given. 
78: \end{abstract}
79: 
80: 
81: \section{Introduction}\label{intro}
82: There are many problems in science which are too complex for numerical
83: solution as they stand. Examples include turbulence and other problems
84: where multiple scales must be taken into account. Such problems must be
85: reduced to more amenable forms before one computes. In the present paper
86: we would like to summarize some reduction methods that 
87: have been developed in recent years, together with an account of what was 
88: learned in the process. It is obvious that the problem has not been fully solved, 
89: but we think that the examples and the
90: conclusions  reached so far are useful.
91: 
92: In general terms,  a reduction to a more  amenable form is a renormalization
93: group transformation, as in physics --- a
94: transformation of a problem into a more tractable form while keeping 
95: quantities of interest invariant. A renormalization group transformation
96: involves an incomplete similarity transformation (see below for definitions),
97: and thus a reduction method is a search for hidden
similarities. This is a general feature of reduction methods,
99: and it will be illustrated in the examples. A successful problem reduction
100: produces a new problem which must in some asymptotic sense be similar to the
original problem. For general background on renormalization, see e.g.\ \cite{benfatto,fisher,stanley}.
102: 
103: In problems with strong time dependence, reduction methods resemble methods
104: for the analysis of thermodynamic systems not in equilibrium; indeed, those aspects
105: of the problem that are ignored in a reduced description conspire to 
106: destroy order and increase entropy.  Problem reduction for time-dependent
107: problems is basically renormalization group theory for non-equilibrium 
108: statistical mechanics. For background on such theory, see e.g. \cite{barenblatt,goldenfeld1,kevrekidis2}.
109: 
110: The content of the paper is as follows: In section \ref{ave} we consider Hamiltonian systems 
111: and their conditional expectations. 
112: In section \ref{block} we narrow the discussion to statistically stationary Hamiltonian systems
113: and recover Kadanoff real-space renormalization
114: groups and an interesting block Monte-Carlo method. In section \ref{kdv} we display an example that
115: exhibits and also extends the main features of this analysis in simple form.
116: 
117: In section \ref{mz}  we explain the Mori-Zwanzig formalism for the reduction of statistically
118: time-dependent problems. The analysis shows that averaging the equations is in general
not enough; one must take into account noise and a temporal memory. The Mori-Zwanzig
120: formalism is rather dense, and in the sections that follow we present various special
121: cases in which it can be simplified, in particular when the memory is very short or very long.
122: 
123: For the sake of readability, we remind the reader of the rudiments of similarity
124: theory
125: \cite{barenblatt}.
126: Suppose a variable $a$ is a function of variables
127: $a_1,a_2,\ldots, a_m$,
128: $b_1,b_2,\ldots, b_k$, where
129: $a_1,\ldots,  a_m$ have independent units, for example units of length and mass,
130: while the units  of
131: $b_1,\ldots, b_k$, can be formed from the units of 
132: $a_1,a_2,\ldots, a_m$.
133: Then there exist dimensionless variables
134: $\Pi=\frac{a}{a_1^{\alpha_1}\cdots a_m^{\alpha_m}}$,
135: $\Pi_i=\frac{b_i}{a_1^{\alpha_{i1}}\cdots a_m^{\alpha_{im}}}$,
136: $i=1,\ldots,k$, where the $\alpha_i,\alpha_{ij}$ are simple fractions, such that
137: $\Pi$ is a function of the $\Pi_{i}$:
138: \begin{equation}
139: \Pi=\Phi(\Pi_1,\ldots,\Pi_k).
140: \end{equation}
141: This is just a consequence  of the requirement that a physical relationship
142: be independent of the size of the units of measurement. At this stage nothing can be said about the
143: function $\Phi$. 
144: Now
145: suppose the variables $\Pi_i$ are small or large, and assume that the
146: function $\Phi$ 
147: has a non-zero finite limit as
148: its arguments tend to zero or to infinity; then $\Pi\sim$ constant, and one finds a power
149: monomial relation between $a$ and the $a_i$.
150: This is a complete similarity relation. 
151: If the function $\Phi$ does not have the assumed limit, it may
152: happen that for $\Pi_1$ small or large, $\Phi(\Pi_1)=\Pi_1^{\alpha}\Phi_1(\Pi_1)+\ldots$,
153: where the dots denote lower order terms, $\alpha$ is a constant, the other arguments of 
154: $\Phi$ have been omitted and $\Phi_1$ has a finite non-zero limit. 
155: One can then obtain a scaling (power monomial) expression for $a$ in terms of  the $a_i$ and $b_i$,
156: with undetermined powers which must be found by means other than dimensional analysis. 
157: The resulting power relation is an {\it incomplete}  similarity relation. 
158: Of course one may well have functions $\Phi$ with neither kind of similarity.
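
As a standard illustration of complete similarity (a textbook example, not taken from the references cited here), consider the period $t_p$ of a pendulum of length $l$ and mass $m$ swinging under gravity $g$ with initial angle $\theta_0$; all of these symbols are introduced only for this example. Here $a=t_p$, the variables $l$, $g$, $m$ have independent units, and $\theta_0$ is already dimensionless, so that
\begin{equation*}
\Pi=\frac{t_p}{l^{1/2}g^{-1/2}m^{0}},\qquad \Pi_1=\theta_0,\qquad \Pi=\Phi(\Pi_1).
\end{equation*}
As $\theta_0\rightarrow0$ the function $\Phi$ tends to the finite non-zero limit $2\pi$, and one recovers the power monomial $t_p=2\pi\sqrt{l/g}$. The exponents $1/2$, $-1/2$, $0$ are fixed by dimensional analysis alone; in an incomplete similarity relation at least one exponent would have to be found by other means.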
159: 
160: Incomplete similarity expresses what is invariant under a renormalization
161: group; all renormalization group transformations involve incomplete similarity, 
162: see  the books already cited as well as \cite{benettin} written before the
163: notion of incomplete similarity was formalized. The exponent $\alpha$ is called
164: an anomalous exponent.
165: 
166: The paper \cite{givon1} is a survey of reduction methods organized along different
167: lines and can be profitably read in tandem with the present paper.
168: 
169: 
170: 
171: \section{Averaging a Hamiltonian system} \label{ave} 
172: We begin by examining what happens when one tries to reduce the complexity of
173: a Hamiltonian system by averaging (see also \cite{CKK1,CKL,seibold}). 
174: Consider a system of nonlinear ordinary differential equations,
175: \begin{eqnarray}
176: \frac{d}{dt}\varphi(t)& =&R (\varphi(t)),\nonumber\\ 
177: \varphi(0)&=&x,
178: \label{eq:system}
179: \end{eqnarray}
180: where $\varphi$ and $x$ are $n$-dimensional vectors with components
181: $\varphi_i$ and $x_i$, and $R$ is a vector-valued function with components
182: $R_i$; $t$ is time. 
183: To each initial value $x$ in (\ref{eq:system}) corresponds a
184: trajectory $\varphi(t)=\varphi(x,t)$.
185: 
186: 
187: Suppose that we only want to find 
188: $m$ of the $n$ components
189: of the solution vector $\varphi(t)$ without finding the $n-m$ others. 
190: One has to assume something about the variables that are not evaluated, and we assume
that at time $t=0$ we have a joint probability density $f(x)$ for all the variables.
192: The variables we keep will have definite initial values $x_1,x_2,\dots,x_m$, and the 
193: rest of variables will then have a conditional probability density $f_m=f(x_1,\dots,x_m,x_{m+1},\dots)/Z_m$,
194: where $Z_m=\int_{-\infty}^{+\infty}f(x_1,\dots,x_m,x_{m+1},\dots)dx_{m+1}dx_{m+2}\cdots$ is a normalization
195: constant. Without some assumption about the missing variables the problem is meaningless;
196: this particular assumption is reasonable because in practice $f$ can often be estimated
197: from previous experience or from general considerations of statistical mechanics. 
198: The question is how to use this prior
199: knowledge in the evaluation of $\varphi(t)$.
200: 
201: 
202: Partition the vector $x$ so that $\hatx=(x_1,x_2,\dots,x_m)$, $\tilde x=(x_{m+1},\dots,
203: x_n)$ and $x=(\hat x,\tilde x)$, and similarly $\varphi=(\hat\varphi,\tilde\varphi),
204: R=(\hat  R,\tilde R)$.  In general the first $m$ components of $R$ depend on all the components of $\vp$,  $\hatR=\hatR(\varphi)=\hatR(\hat\varphi,\tilde\varphi)$;
205: if they do not we have a system of $m$ equations in $m$ variables and 
206: nothing further needs to be done. 
207: We want to calculate only 
208: the variables $\hat\varphi$; then
209: $(d/dt)\hat\varphi(t) =\hat R (\varphi(t))$ where the right hand
210: side depends on the variables $\tilde\varphi$ which are unknown at time $t$. 
211: We shall call the variables $\hat\varphi$ the ``resolved variables" and the 
212: remaining variables $\tilde\varphi$ the ``unresolved variables".
213: 
214: 
215: Consider in particular a Hamiltonian system as in \cite{CKK1},\cite{CKL}. There exists then
216: a Hamiltonian function $H=H(\varphi)$ such that for $i$ odd $R_i$, the $i$-th component
217: of the vector $R$ in (\ref{eq:system}) satisfies $R_i=\partial H\bigl/{\partial \varphi_{i+1}}$
218: while for $i$ even one has $R_i=-{\pH}\bigl/{\partial \varphi_{i-1}}$, with $n$, the size of the system, even. Assume furthermore that $f$, the initial probability
219: density, is  $f(\varphi)=Z^{-1}\exp(-H/T)$ 
220: where $T$ is a parameter, known in physics as the ``temperature", which will be set equal to one in much, but not all, of the discussion below.
221: In physics this density appears naturally and is known as the ``canonical" density; 
222: the normalizing constant $Z=Z(T)$ is the ``partition function".
223: This density $f$ is invariant, i.e.
224: sampling it and evolving the system in time commute.
225: 
226: 
227: 
228: A numerical analyst who wants to approximate the solution of an equation usually
229: starts by approximating the equation. 
230: If one solves for the resolved variables one has values for the variables $\hat\varphi$ available
231: at each instant $t$ and the best approximation should be a function of these variables; it is natural to seek a best approximation in the
232: mean square sense with respect to the invariant density $f$ at each time; the best approximation
233: in this sense is the conditional expectation 
$E[R(\varphi)|\hat\varphi]=\int R\,e^{-H}d\tilde\varphi\bigl/\int e^{-H}d\tilde\vp$ (note that we set $T=1$). 
235: This conditional expectation is the orthogonal projection of $R$ onto the space of
236: functions of $\hat{\varphi}$ with respect to the inner product $(u,v)=E[uv]=\int u(\varphi)v(\varphi)f(\varphi)d\varphi$, where $d\varphi$ denotes integration over all the components
237: of $\varphi$. 
We then try to approximate the system (\ref{eq:system}) by:
239: \begin{eqnarray}
240: \frac{d}{dt}\hat\varphi(t) &=& E[R (\varphi(t))|\hat\varphi(t)],\nonumber \\ 
241: \hat\varphi(0)&=&\hat x.
242: \label{foop}
243: \end{eqnarray}
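
The claim that the conditional expectation is the best mean square approximation can be checked in two lines; the following is a sketch of the standard argument, in which $g$ and $h$ denote arbitrary square integrable functions of $\hat\varphi$ alone (symbols introduced only for this check). With $E[R|\hat\varphi]=\int Rf\,d\tilde\varphi\bigl/\int f\,d\tilde\varphi$ one has
\begin{equation*}
E\bigl[(R-E[R|\hat\varphi])\,g(\hat\varphi)\bigr]
=\int g(\hat\varphi)\left(\int Rf\,d\tilde\varphi-E[R|\hat\varphi]\int f\,d\tilde\varphi\right)d\hat\varphi=0,
\end{equation*}
so that the error $R-E[R|\hat\varphi]$ is orthogonal to every function of $\hat\varphi$. Taking $g=E[R|\hat\varphi]-h$ then gives
\begin{equation*}
E\bigl[|R-h|^2\bigr]=E\bigl[|R-E[R|\hat\varphi]|^2\bigr]+E\bigl[|E[R|\hat\varphi]-h|^2\bigr]\ge E\bigl[|R-E[R|\hat\varphi]|^2\bigr],
\end{equation*}
with equality only when $h=E[R|\hat\varphi]$ almost everywhere.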
244: 
245: We have shown in \cite{CHK3,CKK1,CHK} that: 
246: (i) The new system (\ref{foop}) is also Hamiltonian:
247: \begin{equation}
248: E \left[\frac{\pH}{\partial \varphi_i}|\hat\varphi(t)\right]=\int\frac{\pH}{\partial \varphi_i}\exp(-H)d\tilde\varphi
249: \bigl/ 
250: \int \exp(-H)d\tilde\varphi
251: =\frac{\partial \hat H}{\partial\varphi_i},
252: \label{hald1}
253: \end{equation}
254: where $i\le m=$ the dimension of $\hat\varphi$, and 
255: \begin{equation}
\hat H=-\log\int \exp(-H)d\tilde \vp
257: \label{renham}
258: \end{equation}
259: is the new Hamiltonian.
260: 
261: (ii) The new canonical density $\hat f=Z^{-1}\exp(-\hat H)$ is invariant in the 
262: evolution of the
263: new, reduced, system.
264: 
265: (iii)
266: When the data are sampled from the canonical distribution,  the distribution of $\hat \varphi$ in the new system is its
267: marginal distribution in the old system; equivalently, 
268: the partition function $Z$  is the same for the old system and  for the new system.
269: 
270: 
Now the question is, what does the solution $\hat\varphi(t)$ of (\ref{foop}) represent?
It does not approximate the first $m$ components of the solution
$\varphi(t)$ of (\ref{eq:system}): the components of $\hat{\varphi}$ and 
274: the components of $\varphi$ live in spaces of different dimension and in general
275: the components of the latter in those higher $n-m$ dimensions are not small.
276: One could hope that what the solution of (\ref{foop}) approximates is the vector 
277: $E[\hat\varphi(t)|\hatx]$, the best estimate of the first components of the
278: solution at time $t$ given the partial initial information $\hatx$. 
279: This is the case for linear systems (where averaging and time integration
280: commute), and is approximately the case for limited time in some
other special situations, such as nearly linear systems and some systems where the 
282: ``unresolved variables" are fast.
283: However, in general this is not the case. We shall see below that a
284: reduced description of the solution of nonlinear systems in time
285: requires in general ``noise" and a ``memory".
286: 
287: The lack of convergence can be  understood by the following physics
288: argument. 
289: In physics a system in which the values of all
290: the variables are drawn from a canonical distribution is a system in thermal 
291: equilibrium. 
292: The assignment of 
293: definite values $\hat x$ to the variables $\hat \varphi$ at time $t=0$
294: amounts to taking the system out of equilibrium at $t=0$;
295: if the system is ergodic it will then decay to equilibrium in time, so that
296: all the variables become randomized and acquire the joint density $f$.
297: Thus 
298: the  predictive value of the partial initial data $\hat x$
299: decreases in time; all averages of the $\hat \varphi$ approach
300: equilibrium averages. However, the reduced system (\ref{foop}) is Hamiltonian, and the 
301: solutions it produces 
302: oscillate forever.
303: 
304: In Figure 1 we consider the Hald Hamiltonian system (\cite{CHK3}) with 
305: \begin{equation}
306: H=\frac{1}{2}\left(\varphi_1^2+\varphi_2^2 +\varphi_3^2+\varphi_4^2
307: +\varphi_1^2\varphi_3^2\right) 
308: \label{haldmodel}
309: \end{equation}
310: (physically, two linear oscillators with a nonlinear coupling).
311: We assume that $\varphi_1(0), \varphi_2(0)$ are given and sample the two other
312: initial data from the canonical distribution with $T=1$.
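
For this model the reduced system (\ref{foop}) can be written out in closed form; the following is a worked sketch, with $T=1$ and $\hat\varphi=(\varphi_1,\varphi_2)$, obtained directly from (\ref{renham}). The full equations of motion are
\begin{equation*}
\frac{d\varphi_1}{dt}=\varphi_2,\quad
\frac{d\varphi_2}{dt}=-\varphi_1(1+\varphi_3^2),\quad
\frac{d\varphi_3}{dt}=\varphi_4,\quad
\frac{d\varphi_4}{dt}=-\varphi_3(1+\varphi_1^2).
\end{equation*}
The integrals over $\varphi_3$ and $\varphi_4$ are Gaussian, so that
\begin{equation*}
\int\!\!\int e^{-H}d\varphi_3\,d\varphi_4=\frac{2\pi}{\sqrt{1+\varphi_1^2}}\,e^{-\frac12(\varphi_1^2+\varphi_2^2)},
\qquad
\hat H=\frac12\left(\varphi_1^2+\varphi_2^2\right)+\frac12\log\left(1+\varphi_1^2\right)-\log 2\pi,
\end{equation*}
and the reduced system is
\begin{equation*}
\frac{d\varphi_1}{dt}=\varphi_2,\qquad
\frac{d\varphi_2}{dt}=-\varphi_1\left(1+\frac{1}{1+\varphi_1^2}\right).
\end{equation*}
This agrees with the direct evaluation of the conditional expectation, since $E[\varphi_3^2|\hat\varphi]=1/(1+\varphi_1^2)$. The reduced system conserves $\hat H$, which is why its solutions cannot decay.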
313: 
314: \begin{figure}
315: \centering
316: \epsfig{file=plot_rslt.eps,height=3in}
\caption{Comparison of the evolution of $E[\varphi_1(t)|\varphi_1(0),\varphi_2(0)]$ (truth) with the 
prediction by the ``Galerkin" approximation and the prediction by the averaging procedure described 
in the text.}
320: \label{fig:unkno1}
321: \end{figure}
322: 
323: 
324: In Figure 1 are displayed (1) The result  for $\varphi_1$ of a ``Galerkin" calculation in which
325: the unresolved variables are set to zero (this is what is implicitly done in many
326: unresolved computations); (2) the result of the averaging procedure just described,
327: and (3) the true $E[\varphi_1(t)|\hatx]$, calculated by repeatedly sampling the initial data,
solving the full system, and averaging. As one can see, averaging is initially better than
the null ``Galerkin" method, but in the long run the truth decays while the solution of the
averaged system oscillates forever. For more detail, see \cite{CHK3}.
331: 
332: The procedure we have just described resembles sufficiently the averaging
333: methods used in some areas of engineering, for example the large-eddy
334: simulation methods in turbulence (see e.g. \cite{moser}) and in some multiscale problems (see e.g. \cite{kevrekidis}), to cast a very serious doubt on the broad validity of the latter.
335: For a description of special cases, with small fluctuations and particular structures, where this
336: procedure is legitimate, see \cite{givon1}.
337: 
338: 
339: 
340: 
341: \section{Prediction with no data and block Monte-Carlo}\label{block}
342: 
343: There is however a case where the construction of the preceding section
can be very useful: when $m=0$, i.e., when one tries to predict the future 
with no initial information. Equations (\ref{eq:system}) then sample the canonical
distribution and the reduced system samples a subset of variables
without sampling the others, and, as we have seen, keeps the statistics of the resolved
348: variables unchanged (see \cite{seibold} for an application to molecular dynamics).
349: 
350: To see what is happening, suppose the variables $\varphi_i$ are associated
351: with nodes on a regular lattice, for example, they may represent
352: spins in a solid, or originate in the spatial discretization
353: of a partial differential equation.
354: 
355: Divide the lattice into blocks of some fixed shape (for example, divide
356: a regular one-dimensional lattice into groups of two contiguous nodes). 
We have not yet specified how the variables are to be divided into
358: resolved and unresolved. Now decide to ``resolve" one variable per block,
359: and leave the others in the same block unresolved. The transformation between the old variables and the smaller set of resolved variables is a Kadanoff
360: renormalization group transformation \cite{kadanoff}; the Hamiltonian $\hat H$ defined
361: above in equation (\ref{renham}) is the renormalized Hamiltonian. We will now explain what this means.
362: 
363: 
Suppose the system described by the Hamiltonian is translation invariant.  The equations of
motion at any one point, say at the location labeled by $1$, have the same form as the equations of motion at
any other point. The relation between the right hand side of the reduced system and the 
367: right hand side of the old system can be rewritten as:
368: \begin{equation}
369: \frac{\partial \hat H}{\partial \varphi_1}=E[\frac{\partial H}{\partial \varphi_1}|\hat\varphi],
370: \label{start}
371: \end{equation}
372: where the expected value is with respect to the invariant density as before. This relation is the starting
373: point for the actual evaluation of $\hat H$.
374: 
375: Hamiltonians are functions of the variables $\varphi$. They can be expanded in the form:
376: \begin{equation}
377: H=\sum_ja_j\psi_j,
378: \label{expandH}
379: \end{equation}
380: where the $\psi_j$ are ``elementary Hamiltonians". In a translation invariant system, where
381: each equation has the same form as any other, the Hamiltonian is made up of sums over $i$ of terms
of the form $h(\varphi_i\varphi_{i+j})$ for various values of $j$, where $h$ is some function; these terms 
represent ``couplings" between variables $j$ apart;  one can then choose the elementary Hamiltonians to be
polynomials in $\varphi_i\varphi_{i+j}$ with a fixed $j$ in each $\psi_j$, i.e., one segregates the couplings between
385: variables $j$ apart into separate terms.
386: 
387: In a homogeneous system where there is only one variable per site it is enough to satisfy (\ref{start}) for one variable, say for $\varphi_1$. Define
388: $\psi'=\frac{\partial}{\partial\varphi_1}\psi$, noting that though $\psi$ is necessarily a function with
389: at least
as many arguments as there are components of $\varphi$, $\psi'$ can be sparse. Equation (\ref{start}) reduces to
391: \begin{equation}
392: \frac{\partial \hat H}{\partial \varphi_1}=\sum_ja_jP\psi'_j(\varphi)=\sum_j \hat{a}_j \psi'_j
393: (\hat{\varphi}),
394: \label{more1}
395: \end{equation}
396: with the projection $P$ defined as before by $Pg(\varphi)=E[g|\hat\varphi]$ for any function $g$ of $\varphi$.
We are now almost done. One can pick a basis in $\hat L_2$, the subspace of square integrable functions that depend only
on the variables $\hat\varphi$, which consists of a subset of the set of functions $\psi'$. The right-hand side
of equation (\ref{more1}) is then a linear combination of the $\psi'_j$; integration with respect to $\varphi_1$
400: requires only the erasure of the primes and yields a series for $\hat H$. The elements of $\tilde\varphi$ are
401: now gone, and one can relabel the remaining variables $\hat\varphi$ so that the terms in the series
402: have exactly the same form as before; the calculation can then be repeated, yielding a sequence of 
403: Hamiltonians with ever fewer variables: $H, H^{(1)}=\hat H$, $H^{(2)}=\hat H^{(1)}, \dots$. The corresponding
densities $f^{(n)}=Z^{-1}\exp(-H^{(n)}/T)$ can in principle be sampled by any sampling scheme, for example by Metropolis sampling 
405: (but there are caveats, see e.g. \cite{chorin9}).
406: 
407: At this point we have reduced the number of variables by a factor $L$ equal to the number of
408: variables in each  block, but this may well seem to be a pyrrhic victory. The Hamiltonians
one usually encounters are simple, in the sense that they involve few couplings: finite
410: differences typically link a few neighboring variables, and so do the usual spin Hamiltonians
411: in physics. As one reduces the number of variables, the new Hamiltonians become more complex,
with more terms in the series (\ref{expandH}); the cost per time step of solving the equations in time or
the cost per move in a Metropolis sampling typically increases fast as well. To see what has
been gained one must turn to the physics literature (see e.g. \cite{kadanoff,hohenberg}).
415: 
416: 
417: Consider the spatial correlation length $\ell$  which measures the range of values of 
418: $|j|$ over which the spatial covariances $E[\varphi_i\varphi_{i+j}]$ are non negligible,
and the correlation time $\tau$ which measures the range of values of $|s|$ over which the temporal covariances $E[\varphi_i(t)\varphi_i(t+s)]$
420: are non-negligible. For very large and very small values of the temperature $T$ (the variance 
421: parameter in the density  $f$) both the correlation time and the correlation length are small;
422: the properties of the system can then be found from calculations with a small number of variables and
423: it is not urgent to reduce the number of variables. There is a range of intermediate values of
$T$ for which the correlation length and time are large and then the reduction is worthwhile.
425: There often is a value $T_c$ of $T$, the ``critical value", for which $\ell=\infty$. Values of $T$
426: around $T_c$ are often of great interest.
427: 
428: 
429: Now we can see what the reduction can accomplish. If one tries to compute averages with $T$ near
$T_c$ one finds that the cost of computation is proportional to $\tau$: one has to compute long
431: enough to obtain independent samples of $\varphi$, and a new independent sample will not
432: appear until a time $\sim\tau$ has passed. The reductions above produce a system
433: with smaller $\ell$ and $\tau$ and therefore computation takes less time.
434: Though we started with the declared goal of reducing the number of variables, what has been
435: produced is more interesting: a new system with shorter correlations which is more amenable to
436: computation. It is not the raw number of variables that matters.
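
The sequence of renormalized Hamiltonians and the shortening of correlations can be followed explicitly in a Gaussian chain. This is a sketch under simplifying assumptions: the chain is infinite, $T=1$, there is one variable per site, and the coefficients $a$, $b$ with $b>2|a|$ are introduced only for this example. The Gaussian case is special in that the series (\ref{expandH}) stays closed under the transformation, which is not typical. Let
\begin{equation*}
H=\sum_i\left(\frac b2\varphi_i^2-a\varphi_i\varphi_{i+1}\right),
\end{equation*}
and resolve the even-numbered variables (blocks of two). Each odd-numbered variable couples only to its two neighbors, and
\begin{equation*}
\int \exp\left(-\frac b2\varphi^2+a\varphi s\right)d\varphi=\sqrt{\frac{2\pi}{b}}\exp\left(\frac{a^2s^2}{2b}\right),
\qquad s=\varphi_{2k}+\varphi_{2k+2},
\end{equation*}
so that, up to an additive constant,
\begin{equation*}
\hat H=\sum_k\left(\frac{b'}{2}\varphi_{2k}^2-a'\varphi_{2k}\varphi_{2k+2}\right),
\qquad b'=b-\frac{2a^2}{b},\quad a'=\frac{a^2}{b}.
\end{equation*}
After relabeling, $\hat H$ has the same form as $H$ and the step can be iterated. The covariances of the original chain are proportional to $\rho^{|j|}$, where $\rho+\rho^{-1}=b/a$ and $|\rho|<1$; since $b'/a'=(b/a)^2-2=\rho^2+\rho^{-2}$, the renormalized chain has $\rho'=\rho^2$, i.e., its correlation length measured in units of the new lattice spacing is halved at each step.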
437: 
438: The renormalization can be used with a multigrid scheme, in which one runs  up and down on different levels
439: of renormalization, on the finer ones to achieve accuracy and  the cruder ones to move fast from
440: one macroscopic configuration to another. A comparison with other multigrid
441: sampling schemes (see e.g. \cite{brandt}) reveals that we have derived a reasonably standard scheme, with however
442: a particularly effective way to store conditional expectations. For details see \cite{chorin9}.
443: 
444: 
An alternative method for obtaining the expansion coefficients for the renormalized Hamiltonians was proposed in \cite{stinis2}. The method is based on the maximization of the likelihood of the renormalized density. The maximization of the likelihood leads to a moment-matching problem. The moments in this case are the expectation values of the ``elementary Hamiltonians" (see above) with respect to the renormalized density. The solution of the moment-matching problem yields the expansion of the renormalized Hamiltonian.
446: 
447: The recognition of the links of probability with renormalization is largely due to Jona-Lasinio (see e.g. \cite{jl}). 
448: The connection of renormalization with incomplete similarity is too well known (see \cite{barenblatt, kadanoff, goldenfeld1})
449: to require further comment here.
450: 
451: 
452: 
\section{An example: The Korteweg-de Vries-Burgers equation}\label{kdv}
454: 
455: As an illustration of the ideas in the previous section, consider
456: the equation
457: \begin{equation}
458: u_t+uu_x=\epsilon u_{xx}-\beta u_{xxx},
459: \end{equation}
460: with boundary conditions
461: \begin{equation}
462: u(-\infty)=u_0,\ \ u(+\infty)=0, \ \ u_{x}(-\infty)=0,
463: \end{equation}
464: where the subscripts denote differentiation, $x$ is the spatial variable,
465: $t$ is time, $\epsilon>0$ is a diffusion coefficient, $\beta>0$ is a dispersion coefficient and $u_0>0$ is a given constant.
466: The boundary conditions create a traveling wave solution moving to the right
467: (towards $+\infty$) with velocity
468: $u_0/2$ which becomes steady in a moving framework as $t\rightarrow\infty$.
469: In nondimensional form the equation can be written as: 
470: \begin{equation}
u_t+uu_x=\frac{1}{R} u_{xx}-u_{xxx},
472: \label{kdvb}
473: \end{equation}
474: with $u_x(-\infty)=0$, $u(+\infty)=0$, $u(-\infty)=1$;
$R=\sqrt{\beta u_0}/\eps$ is a ``Reynolds number".
476: For $R\leq1$ the traveling wave has a monotonic profile,
477: while for $R>1$ the profile
478: is oscillatory, with oscillations whose wave length is of order 1 \cite{bona}.
479: At zero diffusion $(R=\infty)$
480: the stationary asymptotic wave train extends to infinity 
481: on the left. For finite $R$ the wave train is damped and the solution 
482: tends to 1 as $x$ decreases.
483: 
484: 
485: The steady wave profile can be found by noting that it satisfies an ordinary
differential equation, whose solution connects a spiral singularity at $x=-\infty$
487: to a saddle point at $x=+\infty$. 
488: At the steady state we average the solution at each point $x$ over the region
489: $\left(x-\ell/2, x+\ell /2\right)$ and call the result $\bar u$. 
490: Now look for 
491: an effective equation $g(v,v_x,v_{xx},\ldots)=0$
492: whose solution $v$ approximates $\bar u$; $v$ can be expected
493: to be smoother than the solution of (\ref{kdvb}) and thus require fewer mesh points
494: for an accurate numerical solution.
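
The ordinary differential equation for the steady profile can be written down explicitly; the following is a sketch in the dimensional variables of the first equation of this section, with $\xi=x-ct$, $u=U(\xi)$ and the perturbation $w$ introduced only for this derivation. Substitution gives $-cU'+UU'=\epsilon U''-\beta U'''$, and one integration, using $U(+\infty)=0$, yields
\begin{equation*}
-cU+\frac12U^2=\epsilon U'-\beta U''.
\end{equation*}
Evaluating at $\xi\rightarrow-\infty$, where $U=u_0$ and the derivatives vanish, gives $c=u_0/2$. Linearizing about the two rest states, $U=u_0+w$ and $U=w$ respectively, one finds
\begin{equation*}
\beta w''-\epsilon w'+\frac{u_0}{2}w=0\quad(\xi\rightarrow-\infty),
\qquad
\beta w''-\epsilon w'-\frac{u_0}{2}w=0\quad(\xi\rightarrow+\infty).
\end{equation*}
The first equation has exponents $\bigl(\epsilon\pm\sqrt{\epsilon^2-2\beta u_0}\bigr)/(2\beta)$, which are complex with positive real part when $\epsilon^2<2\beta u_0$: the state $U=u_0$ is then a spiral, and the oscillations decay as $\xi$ decreases. The second has real exponents of opposite sign, so the state $U=0$ is a saddle.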
495: 
496: We now make an analogy between the conditional expectations which define the 
497: renormalized variables in the previous sections 
498: and an 
499: averaging in space which defines ``renormalized"
500: variables for solutions of the KdVB equations that are stationary
501: in a moving  frame.
502: Averaging over an increasing length scale corresponds either to more 
503: renormalization steps or, equivalently, to renormalization with a greater
504: number of variables grouped together.
505: We pick a class of equations in which to seek the ``effective" equation,
506: the one whose solutions best approximate the averages of the true solution in the
507: mean square sense; the choice of mean-square approximation
508: in the KdVB case corresponds to the use of $L_2$ norms implied by the use
509: of conditional
510: expectations in the previous sections, and the choice of a class of equations in which to
511: look for the effective equation is analogous to the choice of a basis
512: for the representation of the Hamiltonian; the calculation of 
513: the best coefficients in the chosen class of ``effective" equations corresponds to the
514: evaluation of the coefficients in the series for the renormalized Hamiltonians. 
515: In the Hamiltonian case we average the right-hand-sides of the equations and
516: in the analogous KdVB case we attempt to average the solutions;
517: this must be so because in the KdVB case we do not have theorems which
518: guarantee that averaging the right-hand-sides produces the correct statistics for the
519: solutions.
520: 
521: We can look for an effective equation in the class of equations of
522: the form
523: \begin{equation}
-cv_x+vv_x=\epsilon_{eff} v_{xx}-v_{xxx}+\beta |v_x|^\alpha v_{xx}+\dots,
525: \end{equation}
where $\epsilon_{eff}\geq0,\alpha\geq0, \beta\geq0$ are constants and $c=1/2$ is the velocity of propagation 
527: of the steady wave (see also \cite{barenblatt3}).
528: The problem is to find the value of the parameters  in the effective equation which minimizes
529: \begin{equation}
530: I= \int_{-\infty}^{+\infty}|\bar u(x)-v(x)|^2 dx.
531: \label{min}
532: \end{equation}
One finds numerically that the last terms have little effect on the minimum of $I$ when $\ell\ge5$
534: (in the physics terminology, 
535: they are ``irrelevant").
536: The effective equation is thus a Burgers equation 
537: with a value of the dimensionless diffusion coefficient $\epsilon_{eff}$  different from $1/R$.
538: 
539: 
The minimization in (\ref{min}) was carried out in \cite{chorin10}, and it showed that the minimum
541: was achieved when $\epsilon_{eff}=R^{\nu}\Phi(\ell)$, with the exponent $\nu\sim 0.75$. Note that
542: when the diffusion coefficient $\epsilon\rightarrow0$,
543: then $\epsilon_{eff}\rightarrow \infty !$.
544: This is an incomplete similarity relation, as advertised, relating a ``bare" Reynolds number $R$ to
a ``dressed" Reynolds number $\epsilon_{eff}^{-1}$. 
546: The form of the effective equation could conceivably have been found by averaging the original 
547: equation, but the relation between the original $\epsilon$ and $\epsilon_{eff}$ requires
548: some form of renormalization-like reasoning.
549: 
550: 
551: \section{The Mori-Zwanzig formalism}\label{mz}
552: We now return to the problem we started investigating in Section \ref{ave}: How to determine the evolution of
553: a subset $\hat\varphi$ of components of a vector $\varphi$ described by a nonlinear set of equations
554: of the form (\ref{eq:system}). This is a nonlinear closure problem of a type much studied in
555: physics, and a variety of formalisms is available for the job. We choose the Mori-Zwanzig formalism of
556: irreversible statistical mechanics \cite{fick,grabert,mori,zwanzig,zwanzig2}, because it homes in on the basic difficulty, which is the
557: description of the memory in the system; the relation of this formalism to other nonlinear formalisms
558: is described in \cite{CHK04}. That a reduced description of a nonlinear system involves a memory
559: should be intuitively obvious: suppose you have $n>3$ billiard balls moving about on top of a table
560: and are trying to describe the motion of just three; the second ball may strike the seventh ball
561: at a time $t_1$ and the seventh ball may then strike the third ball at a later time. 
562: The third ball then ``remembers" the state of the system at time $t_1$, and if this memory is
563: not encoded in the explicit knowledge of where the seventh ball is at all times, then it has to be encoded in some
564: other way.  We are no longer assuming that the system is Hamiltonian nor that we know an invariant
565: density.
566: 
567: It is much easier to work with linear equations, and we start by finding a linear equation
568: equivalent to (not approximating!) the system (\ref{eq:system}). 
569: Introduce the linear Liouville operator 
570: $L= \sum_{i=1}^n R_i(x)
571: \frac{\partial}{\partial x_i}$, and the Liouville equation:
572: \begin{eqnarray}
573: \frac{\partial}{\partial t}u(x,t)& = &Lu(x,t) \nonumber\\
574: u(x,0)& = &g(x),
575: \label{Liouville}
576: \end{eqnarray}
577: with initial data $g(x)$. This is the partial differential
578: equation for which (\ref{eq:system}) is the set of characteristic equations. One can
verify that the solution of the Liouville equation is $u(x,t)=g(\varphi(x,t))$ (see e.g.\ \cite{CHK}).  In
particular, if $g(x)=x_i$, the solution is $u(x,t)=\varphi_i(x,t),$ the
$i$-th component of the solution of (\ref{eq:system}).
582: This linear partial differential equation is thus equivalent to 
583: the nonlinear system (\ref{eq:system}). The linearity of equation (\ref{Liouville}) greatly facilitates
584: the analysis.
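
As a check on this equivalence, consider the simplest instance (a scalar example chosen here for illustration, not taken from the references): $n=1$ and $R(x)=-ax$ with a constant $a$, so that $\varphi(x,t)=xe^{-at}$ and $L=-ax\,\partial/\partial x$. For any smooth $g$,
\begin{eqnarray*}
\frac{\partial}{\partial t}g(xe^{-at})&=&-axe^{-at}g'(xe^{-at}),\\
L\,g(xe^{-at})&=&-ax\frac{\partial}{\partial x}g(xe^{-at})=-axe^{-at}g'(xe^{-at}),
\end{eqnarray*}
so that $u(x,t)=g(\varphi(x,t))$ satisfies (\ref{Liouville}), and $u(x,0)=g(x)$.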
585: 
586: Introduce the semigroup notation $u(x,t)=(e^{t L}g)(x)=g(\varphi(x,t))$,
587: where $e^{tL}$ is the evolution operator associated with the operator $L$;
588: therefore $e^{tL}g(x)=g(e^{tL}x)$, and
589: one can also verify that
590: $e^{tL}L=Le^{tL}$ (this can be seen to be a change of variables formula).  Equation
591: (\ref{Liouville}) becomes
592: \[
593: \frac{\partial}{\partial t}e^{tL}g = L e^{tL} g = e^{tL} Lg.
594: \]
We suppose that as before we are given
the initial values of the
$m$ coordinates $\hatx$, and that the remaining $n-m$
coordinates $\tilde{x}$ are distributed according to the density $f$
conditioned by $\hatx$, where $f$ is given initially.
600: 
601: We define a projection operator $P$ by $Pg=E[g|\hatx]$. 
602: The conditioning variables are the initial values of $\hat \varphi$;
603: in section \ref{ave} the conditioning variables were the values of $\hat\varphi(t)$, which are
604: unusable here when we do not know the probability density at time $t$. Quantities such
605: as $P\hat \varphi(t)=E[\hat\varphi(t)|\hatx]$ are by definition the best estimates of
606: the future values of the variables $\hat\varphi$ given the partial data $\hatx$ and are
607: often the quantities of greatest interest.
608: 
609: Consider 
610: a resolved coordinate $\varphi_j(x,t)=e^{tL} x_j$ ($j\le m$), and split its time
611: derivative, $R_j(\varphi(x,t))=e^{tL} L x_j$ as follows: 
612: \begin{equation}
613: \frac {\partial}{\partial t} e^{tL} x_j = e^{tL} L x_j =  e^{tL}\P L x_j + e^{tL} \Q L x_j, 
614: \label{eq:split}
615: \end{equation}
where $\Q=I-\P$. Define $ \hat{R}_j(\hatx) = (\P R_j)(\hatx)$; the first
term is $e^{tL}\P L x_j =  \hat{R}_j(\hat{\varphi}(x,t))$ and is a function of the resolved components $\hat\varphi(x,t)$ only (although these components themselves depend on the whole vector of initial data $x$).
618: Note that if $Q$ were zero we would recover something that looks 
619: like the crude approximation of the previous section; however the conditioning
620: variables are not the same. We shall see that the term in $Q$ is essential.
621: 
622: We further split the remaining term $e^{tL} \Q L x_j$. This splitting will
623: bring it into a very useful form: a noise term, and a memory term whose kernel depends
624: on the correlations of the noise term. The fact that such a splitting is possible
is the essence of ``fluctuation-dissipation'' theorems (see e.g.\ \cite{landau}).
626: 
627: 
628: Let $w(x,t)=e^{t\Q L}\Q L x_j$, i.e., let 
629: $w(x,t)$ be a solution of the initial value problem: 
630: \begin{eqnarray}
631: \frac{\partial}{\partial t}w(x,t)&=&\Q L w(x,t)\  = \  Lw(x,t)- \P Lw(x,t) \nonumber\\
632: w(x,0)&=&\Q L x_j.
633: \label{ortho1}
634: \end{eqnarray}
If for some function $h(x)$, $Ph=0,$ then $Pe^{t\Q L}h=0$ for all time $t$, i.e., $e^{t\Q
636: L}$ maps the null space of $\P$ into itself.
637: 
638: The evolution operators $e^{tL}$ and $e^{t\Q L}$ satisfy the Duhamel
639: relation
640: \[
641: e^{t L} = e^{t\Q L} + \int_0^t e^{(t-s) L} \P L e^{s \Q L} \,ds.
642: \]
643: Hence,
644: \begin{equation}
645: e^{tL} Q L x_j = 
646: e^{t\Q L} \Q L x_j + \int_0^t e^{(t-s)L} \P L e^{s\Q L} \Q L x_j \,ds.
647: \label{dyson}
648: \end{equation}
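
The Duhamel relation can be checked by differentiation; the following short verification is a sketch that treats the operators formally. Denote the right-hand side of the relation by $D(t)$. Then $D(0)=I$, and, since $\frac{d}{dt}e^{tQL}=QLe^{tQL}$ and $\frac{\partial}{\partial t}e^{(t-s)L}=Le^{(t-s)L}$,
\begin{eqnarray*}
\frac{d}{dt}D(t)&=&QLe^{tQL}+PLe^{tQL}+\int_0^tLe^{(t-s)L}PLe^{sQL}\,ds\\
&=&L\left(e^{tQL}+\int_0^te^{(t-s)L}PLe^{sQL}\,ds\right)=LD(t),
\end{eqnarray*}
where $QL+PL=L$ was used. Thus $D(t)$ and $e^{tL}$ satisfy the same linear equation with the same initial value, and $D(t)=e^{tL}$.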
649: 
650: Collecting terms, we find
651: \begin{equation}
652: \frac {\partial}{\partial t} e^{tL} x_j =  e^{tL}\P L x_j + 
653: \int_0^t e^{(t-s) L} \P L e^{s Q L}Q L x_j \,ds +e^{tQL} Q L x_j
654: \label{eq:langevin}
655: \end{equation}
656: 
657: 
658: The first term on the right hand side is the
659: Markovian contribution to $\partial_t \varphi_j(x,t)$---it depends only on
660: the instantaneous value of the resolved $\hatvp(x,t)$.  The second
661: term depends on $x$ through the values of $\hatvp(x,s)$ at times $s$
662: between $0$ and $t$, and embodies a memory---a dependence on the past
663: values of the resolved variables.  Finally, the third term, which
664: depends on full knowledge of the initial conditions $x$, lies in the
665: null space of $\P$ and can be viewed as noise with statistics
666: determined by the initial conditions.
667: 
It is important to see that equation (\ref{eq:langevin}) is an identity. The memory and noise
terms have not been added artificially; their presence is a direct consequence of the original
670: equations of motion. However tempting it may be to average equations by taking one-time 
671: averages, the results will in general be wrong; one must add a memory and a noise as well.
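
A minimal example, introduced here only for illustration, makes the identity (\ref{eq:langevin}) concrete. Take $n=2$, $m=1$, the linear system $\frac{d}{dt}\varphi_1=\varphi_2$, $\frac{d}{dt}\varphi_2=-\varphi_1$, and assume that $x_2$ has mean zero and is independent of $x_1$, so that $Px_2=0$ and $Px_1=x_1$. Then $L=x_2\frac{\partial}{\partial x_1}-x_1\frac{\partial}{\partial x_2}$, $PLx_1=0$, $QLx_1=x_2$, and $QLx_2=-x_1+Px_1=0$, so that the orthogonal dynamics leave $x_2$ fixed: $e^{tQL}QLx_1=x_2$. The memory integrand is $e^{(t-s)L}PLx_2=-\varphi_1(x,t-s)$, and (\ref{eq:langevin}) reads
\begin{equation*}
\frac{\partial}{\partial t}\varphi_1(x,t)=-\int_0^t\varphi_1(x,t-s)\,ds+x_2.
\end{equation*}
With the exact solution $\varphi_1(x,t)=x_1\cos t+x_2\sin t$ the right-hand side equals $-x_1\sin t-x_2(1-\cos t)+x_2=-x_1\sin t+x_2\cos t$, which is indeed $\partial_t\varphi_1$. In this example the Markovian term vanishes, the memory kernel does not decay at all, and the noise is the constant $x_2$.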
672: 
673: 
674: If what is desired is $P\hat \varphi(t)$, the conditional expectation of
675: $\hat \varphi(t)$ given $\hat x$ (the best approximation in the sense of $L_2$ to $\hat\vp$ given the
partial data $\hat x$), then one can premultiply equation (\ref{eq:langevin}) by $P$; the noise term
677: then drops out and we find
678: \begin{equation}
679: \frac {\partial}{\partial t}P e^{tL} x_j = P e^{tL}\P L x_j + 
680: P\int_0^t e^{(t-s) L} \P L e^{s Q L}Q L x_j \,ds 
681: \label{eq:langevin_pro}
682: \end{equation}
683: Even if the system we start with is Hamiltonian, the Langevin
684: equation (\ref{eq:langevin}) is not;  the memory and the noise allow the system to forget
685: its initial values and decay to ``thermal equilibrium" as it should (see section \ref{ave}).
686: 
687: We now show that the memory term is a functional of the temporal correlations of the noise. 
688: To save on writing 
689: we restrict ourselves to cases where the operator $L$ is skew-symmetric,
i.e., $(Lu,v)=-(u,Lv)$ (remember $(u,v)=E[uv]$). The skew-symmetry holds in particular for
Hamiltonian systems with canonical data, see \cite{CHK3},\cite{evans}; however, here the assumption of skew-symmetry
is only an excuse to reduce the number of symbols,
not a
694: return to the Hamiltonian case. Pick an orthonormal basis $\{h_k=h_k(\hat x),k=1,\dots\}$ in 
695: the range of $P$, which is the space of functions of $\hat x$ 
696: (for example,
the $h_k$ could be Hermite polynomials in the variables $\hatx$). The projection of any function
$\psi(x,t)$
can be expanded as  $P\psi=\sum_k(\psi(x,t),h_k)h_k(\hatx)$, and in particular,
\begin{equation}
P(LQe^{sQL}QLx_j)=\sum_k(LQe^{sQL}QLx_j,h_k)h_k(\hat x),
\label{expand_fin}
\end{equation}
where a factor $Q$ has been inserted before the exponential, harmlessly because
$e^{sQL}QLx_j$ lies in the null space of $P$, on which $Q$ acts as the identity.
706: The memory term now becomes
707: \begin{eqnarray}
708: \int_0^te^{(t-s)L}PLe^{sQL}QLx_jds\!\!\!&=\!\!\!&\int_0^t\sum_ke^{(t-s)L}(LQe^{sQL}QLx_j,h_k)h_k(\hat x)ds\nonumber\\
709: \!\!\!&=&\!\!\!\sum_k\!\!\int_0^t(LQe^{sQL}QLx_j,h_k)h_k(\hat \varphi(t-s))ds;
710: \label{expand}
711: \end{eqnarray}
In the last identity we used the fact that each inner product is a number, independent of $x$, and therefore
commutes with the evolution operator $e^{(t-s)L}$, and also the fact that $e^{(t-s)L}h_k(\hatx)=h_k(\hat\varphi(t-s))$ by
definition.
Now $(LQe^{sQL}QLx_j,h_k(\hatx))=-(e^{sQL}QLx_j,QLh_k(\hatx))$ by the symmetry of $Q$
and the assumed skew-symmetry of $L$; each coefficient on the right hand side of equation
(\ref{expand}) is thus the ensemble average of the product of the value of the stochastic process $e^{tQL}QLx_j$ at time $t=s$
with the value of the stochastic process $e^{tQL}QLh_k(\hatx)$ at time $t=0$, i.e., it
is a temporal correlation. All these stochastic processes are in the range of $Q$ for all $t$;
they are therefore components of the noise.  Remember that by definition $Lx_j=R_j$ (a right-hand side in equations (\ref{eq:system})). $PLx_j$ is then an average of the right-hand side of (\ref{eq:system})
720: they are therefore components of the noise.  Remember that by definition $Lx_j=R_j$ (a right-hand side in equations (\ref{eq:system})). $PLx_j$ is then an average of the right-hand side of (\ref{eq:system})
721: and $QLx_j=R_j-E[R_j|\hat x]$ is the initial fluctuation in that right-hand side.
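
The two steps behind the identity $(LQe^{sQL}QLx_j,h_k)=-(e^{sQL}QLx_j,QLh_k)$ can be spelled out. Write $w=e^{sQL}QLx_j$; then
\begin{eqnarray*}
(LQw,h_k)&=&-(Qw,Lh_k)\qquad\mbox{(skew-symmetry of $L$)}\\
&=&-(w,QLh_k)\qquad\mbox{(symmetry of $Q$)}.
\end{eqnarray*}
The symmetry of $Q=I-P$ follows from that of the conditional expectation $P$, which is an orthogonal projection in the inner product $(u,v)=E[uv]$.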
722: 
723: 
724: The first, ``Markovian", term in equations (\ref{eq:langevin}) looks straightforward, but perils lurk there
725: as well. 
726: In general $R_j$ in equations (\ref{eq:system}) is nonlinear,
727: and so is $PLx_j=E[R_j|\hat x]$. $e^{tL}PLx_j$ is a nonlinear
728: function of the functions $\hat\varphi(t)$ which depends on all the components of $x$, not only on $\hat x$.
729: Some way of approximating this function must be found. If one looks for conditional expectations, one must
730: find a way to commute $P$ with a nonlinear function; for a discussion, see \cite{CHK3}. This bullet was dodged in section \ref{ave} when the conditioning variables were chosen to be $\hat\varphi(t)$ which change in time, but it may be hard to dodge here.
731: 
732: 
The task now at hand is to extract something usable from these rather cumbersome formulas. A very detailed presentation of
the analysis in this section can be found in \cite{c11}.
735: 
736: 
737: 
738: 
\section{Fluctuation-dissipation theorems}\label{fd}
740: 
We have established a relation between kernels in the memory term and the noise (the former is made up of covariances of the latter). This is the mathematical content of what are known as ``fluctuation-dissipation theorems'' in physics. However, under some specific restricted circumstances, the relation between noise and memory takes on more intuitively appealing forms, which we now briefly describe.
In physics one often takes a restricted basis in the range of $P$ consisting
of the coordinate functions $x_1,\dots,x_m$ (the components of $\hat{x}$). The resulting projection
is called there the ``linear projection'', as if $P$ as defined above were not linear.
745: The use of this projection is appropriate when the amplitude of the functions $\hat\phi(t)$ is small. 
746: One then has 
747: $h_k(\hat x)=x_k$ for $k\le m$. 
748: The correlations in equation (\ref{expand}) are then simply 
749: the temporal correlations of the noise (not of the full solutions of the system!). This is known as the fluctuation-dissipation theorem of the second kind.
750: 
751: Specialize further to a situation where there is a single resolved variable, say $\phi_1$, so that $m=1$
752: and $\hat\phi$ has a single component. The Mori-Zwanzig equation becomes:
753: 
754: \begin{equation*}
755: \frac{\partial}{\partial{t}} e^{tL}x_1=
756: e^{tL}PLx_1+e^{tQL}QLx_1+
757: \int_0^t e^{(t-s)L}PLe^{sQL}QLx_1ds,
758: \end{equation*}
759: or, 
\begin{multline}
\label{lmz}
\frac{\partial}{\partial{t}} \phi_1(x,t) =
(Lx_1,x_1)\phi_1(x,t)+e^{tQL}QLx_1 \\
+\int_0^t(LQe^{sQL}QLx_1,x_1)\phi_1(x,t-s)ds \\
=(Lx_1,x_1)\phi_1(x,t)+e^{tQL}QLx_1-
\int_0^t (e^{sQL}QLx_1,QLx_1)\phi_1(x,t-s)ds,
\end{multline}
where we have again inserted a harmless factor $Q$ in front of $e^{sQL}$, assumed that
$L$ was skew-symmetric as above, and for the sake of simplicity also assumed $(x_1,x_1)=1$
(if the last statement is not true the formulas can be adjusted appropriately).
Take the inner product of equation (\ref{lmz}) with $x_1$; you find:
\begin{multline}
\label{clmz}
\frac{\partial}{\partial{t}} (\phi_1(x,t),x_1)=(Lx_1,x_1)(\phi_1(x,t),x_1) \\
+(e^{tQL}QLx_1,x_1)-\int_0^t(e^{sQL}QLx_1,QLx_1)(\phi_1(x,t-s),x_1)ds \\
=(Lx_1,x_1)(\phi_1(x,t),x_1)-\int_0^t(e^{sQL}QLx_1,QLx_1)
(\phi_1(x,t-s),x_1)ds,
\end{multline}
779: because $Pe^{tQL}QLx_1=(e^{tQL}QLx_1,x_1)x_1=0$ 
780: and hence $(e^{tQL}QLx_1,x_1)=0.$ 
781: Multiply equation (\ref{clmz}) by $x_1$, and remember that  $P\phi_1(x,t)=(\phi_1(x,t),x_1)x_1.$ You find:
782: \begin{equation}
783: \label{plmz}
784: \frac{\partial}{\partial{t}} P\phi_1(x,t)= (Lx_1,x_1)P\phi_1(x,t)-
785: \int_0^t (e^{sQL}QLx_1,QLx_1)P\phi_1(x,t-s)ds. 
786: \end{equation}
You observe that the covariance $(\phi_1(x,t),x_1)$ and the projection of $\phi_1$ on $x_1$
obey the same homogeneous linear integro-differential equation. This is the fluctuation-dissipation theorem
of the first kind, which embodies the Onsager principle, according to which spontaneous fluctuations
790: in a system
791: decay at the same rate as perturbations imposed by external means,  when both are small
792: (so that the linear projection is adequate).
793: This reasoning can be extended to cases where there are multiple resolved variables, and this is
794: usually done with the added simplifying assumption that $(x_i,x_j)=0$ when $i\ne j$. We omit the details.
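
A small example, chosen here for illustration and not taken from the references, shows equation (\ref{clmz}) at work. Take the harmonic oscillator $\frac{d}{dt}\phi_1=\phi_2$, $\frac{d}{dt}\phi_2=-\phi_1$, with initial data $x_1,x_2$ sampled independently from unit Gaussians (the canonical density at temperature $T=1$), so that $L=x_2\frac{\partial}{\partial x_1}-x_1\frac{\partial}{\partial x_2}$ is skew-symmetric and $(x_1,x_1)=1$. With the linear projection $Pg=(g,x_1)x_1$ one finds $(Lx_1,x_1)=(x_2,x_1)=0$, $QLx_1=x_2$, and $QLx_2=-x_1+(x_1,x_1)x_1=0$, hence $e^{sQL}QLx_1=x_2$ and the memory kernel is $(e^{sQL}QLx_1,QLx_1)=(x_2,x_2)=1$. Writing $C(t)=(\phi_1(x,t),x_1)$, equation (\ref{clmz}) becomes
\begin{equation*}
\frac{d}{dt}C(t)=-\int_0^tC(t-s)\,ds,\qquad C(0)=1,
\end{equation*}
whose solution is $C(t)=\cos t$. This agrees with the direct calculation $(x_1\cos t+x_2\sin t,x_1)=\cos t$, and $P\phi_1(x,t)=x_1\cos t$ satisfies the same equation, as (\ref{plmz}) requires.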
795: 
796: 
797: 
798: \section{Very short and very long memory approximations}\label{short}
799: 
800: 
801: 
802: 
The approximation we shall examine in some detail is:
804: \begin{equation}
805: e^{tQL}\cong e^{tL},
806: \label{QLeL}
807: \end{equation}
808: and we will consider under what conditions  it is reasonable. 
We will find that it is reasonable both when memory is very short and when it is very long. The fact that the same approximation works for two opposite cases is not a paradox. The approximation (\ref{QLeL}) states that the orthogonal dynamics operator is very close to the full dynamics operator. In other words, the orthogonal dynamics, which evolve in a space orthogonal to that of the resolved variables, are insensitive to the coupling between resolved and unresolved variables. This can happen in particular when the orthogonal dynamics are very fast or when the orthogonal dynamics are very slow. The ansatz above should work when there is an effective decoupling of the equations for the resolved and unresolved variables. This raises the question of what determines the range of the memory. Is it possible to have a reduced model with very short or very long memory, depending on how one coarse-grains a particular system at hand? In \cite{stinis} evidence was presented that, for the Kuramoto-Sivashinsky equation, the range of the memory of a reduced model can vary dramatically, depending on whether all the unstable modes in the system are resolved or not. The construction of a reduced model corresponds to renormalization, and the two extreme cases can be interpreted as two fixed points of a renormalization scheme. In which one a reduced model will end up depends on how one renormalizes. Finally, note that the Duhamel formula can be used for an iterative solution of the orthogonal dynamics equation. The term $e^{tL}$ is the zero-th order term of an iterative solution for $e^{tQL}.$ This construction can be based on the use of Feynman diagrams.
811: 
812: 
813: First we examine the case when the memory is short, i.e., when the
814: various terms in the series (\ref{expand_fin}) vanish for $s$ beyond a small value; see \cite{majda} for 
815: a different approach to short-memory reduced model construction and \cite{stinis3} for comparison with the present short-memory approximation, as well as \cite{p} and the references therein.
816: 
817: The memory term in the Mori-Zwanzig equations (\ref{eq:langevin}) can be rewritten as 
818: \begin{equation}
819: \int_0^t e^{(t-s)L} \P L e^{s\Q L} \Q L x_j \,ds =
820: \int_0^t e^{(t-s)L} \P L \Q e^{s\Q L} \Q L x_j \,ds,
821: \end{equation}
822: where the insertion of the extra $\Q$ is harmless.
823: Adding and subtracting equal quantities, we find:
824: \begin{equation}
825: PLe^{sQL}QLx_j=PLQe^{sL} QLx_j + PLQ (e^{sQL}-e^{sL}) QLx_j;
826: \end{equation}
827: a Taylor series yields:
828: \begin{equation}
829: e^{sQL}-e^{sL}=I+sQL+\dots-I-sL-\dots=-sPL+O(s^2),
830: \end{equation}
831: and therefore, using $QP=0$, we find:
832: \begin{equation}
833: \int_0^t e^{(t-s)L} P L e^{sQL} Q L x_j \,ds = 
834: \int_0^t e^{(t-s)L} P L Q e^{sL} Q L x_j\,ds + O(t^3).
835: \end{equation}
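The size of the remainder can be made explicit. Combining the two preceding displays,
\begin{equation*}
PLQ(e^{sQL}-e^{sL})QLx_j=-s\,PLQPL\,QLx_j+O(s^2)=O(s^2),
\end{equation*}
because the term linear in $s$ contains the product $QP=0$. Integrating a quantity of order $s^2$ from $0$ to $t$ (treating the factor $e^{(t-s)L}$ formally as bounded) gives the $O(t^3)$ remainder.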
If $P$ is a finite rank projection then
\begin{equation}
P L e^{sQL} Q L x_j =
\sum_{k} (L Q e^{sQL} Q  L x_j, h_k) h_k(\hatx),
\end{equation}
where, as before, one can write $(LQe^{sQL}QLx_j, h_k)$ as $-(e^{sQL}QLx_j, QLh_k)$ when $L$ is skew-symmetric.
842: If the correlations $(e^{sQL}QLx_j,QLh_k)$  and also the correlations $(e^{sL}QLx_j,QLh_k)$ are significant only
843: over short times $s$, the approximation (\ref{QLeL}) provides an
844: acceptable approximation without requiring the solution of the
845: orthogonal dynamics equation (see \cite{stinis} for an application to the dimensional reduction of 
846: the Kuramoto-Sivashinsky equation and \cite{barber} for an application to molecular dynamics).
847: 
848: The limiting case of the short-memory approximation is when the correlations are delta functions. There is a large literature on solving 
849: equations (\ref{eq:langevin}) with the
850: assumption of delta function memory; usually this is done without explicit mention, as if it
851: were an obvious property of stochastic systems- an astonishing state of affairs
852: nearly 40 years after Alder and Wainwright demonstrated the long memory 
853: in a typical physical system \cite{a1}. All the dynamic (i.e., time-dependent)
854: renormalization group methods we can find depend on this assumption \cite{hohenberg}, and this remark goes a long way towards
855: explaining their relative lack of success in applications. We will no longer bother making
856: detailed comparisons with this dynamic renormalization literature; the point of view here is that reduction on the
857: basis of equations (\ref{eq:langevin}) is the right kind of renormalization, and anything with added drastic assumptions must be justified by appeal to that right kind.
858: 
859: 
860: Nevertheless, there are important circumstances where the very short memory assumption can be justified,
861: in particular in problems with separation of time scales, where the components of $\tilde\varphi(t)$,
862: the unresolved variables, vary on much faster scales than the resolved variables (see e.g. \cite{majda},\cite{stinis3}).
863: One can then set 
864: \begin{equation}
865: e^{tQL}QLx_j=A_jw_j'(t),
866: \label{assume}
867: \end{equation}
868: where the prime denotes a derivative, the $w_j(t)$ are independent unit Brownian motions,
and the $A_j$ are constants that must be derived from some prior knowledge.
Assume further that the projection $P$ is well represented by the physicists' ``linear'' projection and that the density used to perform the projections is invariant.
The kernel in the memory term becomes $-A_j^2\delta(t-s)$, and equations (\ref{eq:langevin}) become stochastic ordinary
872: differential equations of the usual kind. As usual (see e.g. \cite{just}), the
873: corresponding probability densities can be found via Fokker-Planck formalisms (or Kolmogorov
874: equations, in mathematicians' language). Everything is easier. There is a big literature on 
875: these methods which we recoil from surveying.
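
To see what such an equation looks like in the simplest case, take a single resolved variable with the linear projection, $(x_1,x_1)=1$, and skew-symmetric $L$, so that $(Lx_1,x_1)=0$ in equation (\ref{lmz}). The following is a sketch under two assumptions that are not spelled out above: the noise correlation is $(e^{sQL}QLx_1,QLx_1)=A_1^2\delta(s)$, and the delta function is symmetric, so that an integral ending at $s=0$ picks up half of its mass. Then
\begin{equation*}
\int_0^tA_1^2\delta(s)\phi_1(x,t-s)\,ds=\frac{A_1^2}{2}\phi_1(x,t),
\qquad
d\phi_1=-\frac{A_1^2}{2}\phi_1\,dt+A_1\,dw_1 .
\end{equation*}
This is an Ornstein-Uhlenbeck equation; its stationary variance is $A_1^2/(2\cdot A_1^2/2)=1$, consistent with the normalization $(x_1,x_1)=1$ and with the assumed invariance of the density.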
876: 
877: 
878: It is often the case that the quantities of interest are the components of $E[\hat\varphi|\hat x]$, and the corresponding projection $P$ is in general poorly approximated by the ``linear" projection. The formalism above readily extends to more general projections, with more terms in the basis chosen in the range of $P$ (see e.g. \cite{CHK3}), as long as one assumes that the temporal correlations of the new terms are fast decaying functions. Terms that have long correlation
879: times violate the ansatz (\ref{QLeL}) and can hamper rather than enhance accuracy (see e.g. \cite{stinis}). A way to pick the fast decaying terms in the projection of the memory kernel for problems that exhibit separation of time scales was presented in \cite{stinis3}. We should note here that projections which include higher than linear terms are at the heart of mode-coupling theory (see e.g. \cite{schofield}), which has proved very effective in tackling problems in condensed matter physics.
880: 
881: 
882: 
883: 
884: 
We examine now the validity of the ansatz $e^{tQL}\cong e^{tL}$ for cases with slowly decaying memory. Write the memory term in the Mori-Zwanzig equation (\ref{eq:langevin}) as
886: \begin{align*}
887: \int_0^t e^{(t-s)L}PLe^{sQL}QLx_jds &=\int_0^t Le^{(t-s)L} 
888: e^{sQL}QLx_jds \\ 
889: &-\int_0^t e^{(t-s)L}e^{sQL}QLQLx_jds ,
890: \end{align*} 
891: where we have used the commutation of $L$ and $QL$ with $e^{tL}$ and $e^{sQL},$ 
892: respectively. At this point, make the approximation (\ref{QLeL}), which 
893: eliminates
894: the $s$ dependence of both integrands and we have
895: $$\int_0^t e^{(t-s)L}PLe^{sQL}QLx_jds \cong t e^{tL} PLQLx_j.$$
896: All that remains of the integration in time is the coefficient $t$. 
897: One can get rid of the noise term by premultiplying equations (\ref{eq:langevin}) by a projection $\P$, as in equation (\ref{eq:langevin_pro}), and obtain a reduced non-autonomous set of differential equations. This approximation was named the $t$-model in \cite{CHK3} (see \cite{ingerman} for an application to the dimensional reduction of a nonlinear Schr\"odinger equation). Other cases where non-Markovian models can be approximated 
898: by Markovian equations with time-dependent coefficients can be found in \cite{raz}.
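
The collapse of the two integrands can be written out. Under the approximation (\ref{QLeL}), $e^{(t-s)L}e^{sQL}\cong e^{(t-s)L}e^{sL}=e^{tL}$, so that
\begin{align*}
&\int_0^tLe^{(t-s)L}e^{sQL}QLx_j\,ds-\int_0^te^{(t-s)L}e^{sQL}QLQLx_j\,ds\\
&\qquad\cong\int_0^t\left(Le^{tL}QLx_j-e^{tL}QLQLx_j\right)ds
=t\,e^{tL}(L-QL)QLx_j=t\,e^{tL}PLQLx_j,
\end{align*}
where $Le^{tL}=e^{tL}L$ and $L-QL=PL$ were used.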
899: 
900: 
901: We proceed to examine the order of accuracy of this approximation. We have
902: 
\begin{multline*}
\int_0^t e^{(t-s)L}PLe^{sQL}QLx_jds- t e^{tL} PLQLx_j = \\
\int_0^t [e^{(t-s)L}PLe^{sQL}-e^{tL} PL]QLx_jds.
\end{multline*}
907: Adding and subtracting equal quantities we find
908: 
909: $$ e^{(t-s)L}PLe^{sQL}=e^{tL}PL+e^{tL}[e^{-sL}PLe^{sQL}-PL],$$
910: and a Taylor series around $s=0$ gives
911: \begin{equation}\label{t-mod}
912: e^{-sL}PLe^{sQL}-PL =(I-sL+\ldots)PL(I+sQL+\ldots)-PL=O(s).
913: \end{equation}
914: This implies 
915: $$\int_0^t e^{(t-s)L}PLe^{sQL}QLx_jds=t e^{tL} PLQLx_j + O(t^2).$$
916: The $O(t^2)$ error estimate can be put into perspective by examining an alternate derivation of the $t$-model. If we expand the integrand of 
917: the memory term of the Mori-Zwanzig equation around $s=0$ and retain only
918: the leading term, we find
919: \begin{align*}
920: \int_0^t e^{(t-s)L}PLe^{sQL}QLx_jds &= \int_0^t [e^{tL}PLQLx_j 
921: +O(s)]ds\\
922: &=t e^{tL} PLQLx_j +O(t^2).
923: \end{align*}
924: If we retain only the leading term, we do not keep any information about
925: the time evolution of the integrand, which in turn means
no information about the evolution of the resolved component and of the
coupling to the orthogonal dynamics (through the term
$(LQe^{sQL}QLx_j,h_k)$). Such a drastic approximation is expected to be appropriate in cases where the memory term integrand is slowly decaying, so that information about its initial value is enough.
929: 
930: 
931: 
932: As an example, consider again the Hald model whose Hamiltonian is
933: \begin{equation}
934: H(\phi) = \frac{1}{2} (\phi_1^2 + \phi_2^2 + \phi_3^2 + \phi_4^2 + \phi_1^2 \phi_3^2).
935: \end{equation}
936: The resulting equations of motion are:
937: \begin{align*}
938: \frac{d\phi_1}{dt} &= \phi_2 \nonumber \\
939: \frac{d\phi_2}{dt} &= -\phi_1(1 + \phi_3^2) \nonumber \\
940: \frac{d\phi_3}{dt} &= \phi_4 \nonumber \\
941: \frac{d\phi_4}{dt} &= -\phi_3(1 + \phi_1^2).
942: \end{align*}
943: Suppose one wants to solve only for $\hat\phi=(\phi_1,\phi_2)$, with initial data 
944: $\hatx=(x_1,x_2)$. Assume the initial data $x_3,x_4$ are sampled from a canonical
945: density with temperature $T=1$. A quick calculation yields $E[x_3^2|x_1,x_2]=1/(1+x_1^2)$.
The advance in time described by the multiplication by $e^{tL}$ requires just the
substitution $\hatx\rightarrow\hat\phi$. If one commutes the nonlinear function evaluation and
the conditional averaging, i.e., writes $\P f(\hat\phi)=f(\P\hat\phi)$ (a ``mean-field
approximation''), and writes furthermore $\Phi(t)=\P\hat\phi=E[\hat\phi|\hatx]$, one finds
$\P e^{tL}PLx_1=\Phi_2$, $\P e^{tL}PLx_2=-\Phi_1(1+1/(1+\Phi_1^2))$; one can calculate
$\P e^{tL}PLQLx_j$ for $j=1,2$ and finally one finds:
952: 
953: \begin{align}
954: \frac{d}{dt}\Phi_1 &=\Phi_2 \nonumber \\
955: \frac{d}{dt} \Phi_2 &=-\Phi_1 (1 + \frac{1}{1 + \Phi_1^2}) -
956: 2 t \frac{\Phi_1^2 \Phi_2}{(1 + \Phi_1^2)^2}.
957: \label{eq:hald_t}
958: \end{align}
959: 
960: The last term represents the damping due to the loss of predictive power
961: of partial data; the coefficient of the last term increases in time and one may
962: worry that this last term eventually overpowers the equations and leads to some
odd behavior. This is not the case. Indeed, one can prove the following. If the system
one starts from, equation (\ref{eq:system}), is Hamiltonian with Hamiltonian $H$, and if the
initial data are sampled from an initial canonical density conditioned by partial data $\hat x$,
and if $\hat H$ is the renormalized Hamiltonian (in the sense of Section \ref{ave}), then
$(d/dt)\hat H \le0$, showing that the components of $\hat\phi$ decay as they should.
The proof requires a technical assumption (that the Hamiltonian $H$ can be written
as the sum of a function of $p$ and a function of $q$, a condition commonly satisfied) and
970: we omit it (see \cite{CHK3}). The reduced system (\ref{eq:hald_t}) was solved numerically in \cite{CHK3}
971: with gratifying results.
972: 
973: 
974: The $t$-model is the zero-th order term in a Taylor expansion (around $s=0$) of the integrand of the memory term in (\ref{eq:langevin}). However, nothing prevents us from keeping more terms in this expansion. Let $$K(\hat{\varphi}(t-s),s)=e^{(t-s)L}PLe^{sQL}QLx_j$$ and expand $K$ around $s=0$, i.e. $$K(\hat{\varphi}(t-s),s)=K(\hat{\varphi}(t),0)+s\frac{\partial K}{\partial s}|_{s=0}+\frac{1}{2}s^2 \frac{\partial^2 K}{\partial s^2}|_{s=0}+O(s^3).$$ In the case when $P$ is the finite-rank projection and the density used to define the projection is invariant, the derivatives of $K$ at $s=0$ are equal-time (static) correlations. In mode-coupling theory, such expressions are known as sum rules. One can assume a functional form for the memory term integrand around $s=0$, e.g. a Gaussian $a e^{-bs^2},$ and use the derivatives of $K$ at $s=0$ to estimate $a,b$ (see \cite{pomeau} for more on sum rules and mode-coupling theory).
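
For the Gaussian form the matching is elementary; the following is a sketch of that one choice, not a general recipe. With $K(s)\approx ae^{-bs^2}$ (suppressing the dependence on $\hat\varphi$),
\begin{equation*}
K|_{s=0}=a,\qquad \frac{\partial K}{\partial s}\Big|_{s=0}=0,\qquad \frac{\partial^2K}{\partial s^2}\Big|_{s=0}=-2ab,
\end{equation*}
so that $a=K|_{s=0}$ and $b=-\frac{1}{2}\frac{\partial^2K}{\partial s^2}|_{s=0}/K|_{s=0}$. The Gaussian form can therefore be matched only when the first derivative of $K$ at $s=0$ vanishes or is neglected, and it decays only if $b>0$.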
975: 
976: 
977: 
978: 
979: 
980: \section{Intermediate-range memory}\label{long}
981: 
There are intermediate cases where the memory is sufficiently long-range for the short-memory approximation to break down, yet not so slowly decaying that the $t$-model can give accurate results. At present, it is not known how to deal effectively with such cases. In a series of papers \cite{CHK,CHK2,CHK3} we presented special cases and their solutions. In particular in \cite{CHK3} we presented a detailed analysis of the
Hald system. We showed that the memory decays roughly at the same rate as the solution itself
(this is the general case in the absence of separation of scales). We expanded the various correlation functions at equilibrium (i.e., when there are no resolved variables) in Hermite
985: polynomials, evaluated the coefficients in the expansions by Monte-Carlo once and for all, and then obtained
986: a system of integro-differential approximations to equations (\ref{eq:langevin}) which we then solved
987: in various cases. This is a legitimate procedure which may be useful when the same system of equations has to be
988: solved repeatedly. 
989: These calculations do exhibit a salient feature of model reduction in time-dependent problems, which is that its set-up costs are often very high. 
990: The future remedy, if there is one, will surely lie in a deeper understanding of dynamical renormalization and in particular of the
991: way memory depends on scale.
992: 
993: 
994: \section{Acknowledgements} We would like to thank Prof. G.I. Barenblatt, Prof. O. Hald and Prof. R. Kupferman for many helpful 
995: discussions and comments. 
996: This work was supported in part by
997: the National Science Foundation under Grant DMS 04-32710, and by the Director,
998: Office of Science, Computational and Technology Research,
999: U.S.\ Department of Energy under Contract No.\ DE-AC03-76SF000098.
1000: 
1001: 
1002: \begin{thebibliography}{99}
1003: 
1004: \bibitem{a1} B. Alder and T. Wainwright, Decay of the velocity correlation function,
1005: Phys. Rev. A 1, (1970), pp. 1-12.
1006: 
1007: \bibitem{barber}
1008: J. Barber, Application of optimal prediction to molecular dynamics, PhD thesis, 2005,
1009: UC Berkeley Physics Dept.
1010: 
1011: 
1012: \bibitem{barenblatt}
1013: G.I. Barenblatt, Scaling. Cambridge University Press, Cambridge, 2002.
1014: 
1015: 
1016: \bibitem{barenblatt3}
1017: G.I. Barenblatt, M. Ivanov, and G.I. Shapiro,
1018: On the structure of wave fronts in nonlinear dissipative media.
1019: Arch. Rat. Mech. Anal. 87 (1985), pp. 293-303.
1020: 
1021: \bibitem{benettin}
1022: G. Benettin, C. di Castro, G. Jona-Lasinio, L. Peliti and A. Stella,
1023: On the equivalence of different renormalization groups,
in ``New developments in quantum theory and statistical mechanics'',
1025: Cargese Conf. Theor. Physics, M. Levy and P. Mitter (eds), Springer, NY, 
1026: (1976).
1027: 
1028: \bibitem{benfatto}
1029: G. Benfatto and G. Gallavotti, Renormalization group, 
1030: Physics notes Vol. 1, Princeton University Press, Princeton NJ (1995).
1031: 
1032: 
1033: 
1034: 
1035: \bibitem{bona}
J. Bona and M. Schonbek, Travelling-wave solutions to the Korteweg-de Vries-Burgers
1037: equation. Proc. Roy. Soc. Edinburgh 101A (1985), pp. 207-226.
1038: 
1039: \bibitem{brandt}A. Brandt and D. Ron, Renormalization Multigrid (RMG): 
1040: Statistically Optimal Renormalization Group Flow and Coarse-to-Fine Monte Carlo Acceleration, J. Stat. Phys. (2001) 102, 1-2, 
1041: 231-257.
1042: 
1043: \bibitem{kevrekidis2}
1044: L. Chen, P. Debenedetti, C. Gear and I. Kevrekidis, From molecular dynamics to coarse self-similar
1045: solutions: a simple example using equation-free computation, J. Non-Newt. Fluid. Mech. (2004), 120, 215.
1046: 
1047: \bibitem{chorin9}
1048: A.J. Chorin,
1049: Conditional expectations and renormalization, Multiscale Modeling and
1050: Simulation,  1 (2003) pp. 105-118.
1051: 
1052: \bibitem{chorin10}
1053: A.J. Chorin, 
Averaging and renormalization for the Korteweg-deVries-Burgers equation,
1055: Proc. Nat. Acad. Sci. 100, (2003), pp. 9674-9679.
1056: 
1057: \bibitem{c11}
1058: A.J. Chorin, Stochastic Tools for Mathematics and Science, American Math. Society, Providence RI (2005).
1059: 
1060: \bibitem{CHK}
1061: A.J. Chorin, O. Hald and R. Kupferman,
1062: Optimal prediction and the Mori-Zwanzig representation of irreversible
processes. Proc. Nat. Acad. Sci. USA, 97, (2000),
1064: pp. 2968-2973.
1065: 
1066: \bibitem{CHK2}
1067: A.J. Chorin, O. Hald and R. Kupferman,
Non-Markovian optimal prediction, Monte Carlo Meth. Appl., 7, (2001), pp. 99-109.
1069: 
1070: 
1071: \bibitem{CHK3}
1072: A.J. Chorin, O. Hald and R. Kupferman,
1073: Optimal prediction with memory, 
1074: Physica D 166, (2002), pp. 239-257.
1075: 
1076: \bibitem{CHK04}
1077: A.J. Chorin, O. Hald and R. Kupferman,
1078: Prediction from partial data, renormalization and averaging, J. Sci. Comp. (2005), (in press).
1079: 
1080: \bibitem{CKK1}
1081: A.J. Chorin, A. Kast and R. Kupferman,
1082: Optimal prediction of underresolved dynamics, Proc. Nat. Acad. Sci. USA (1998), 95, 4094.
1083: 
1084: \bibitem{CKL}
A.J. Chorin, R. Kupferman and D. Levy,
1086: Optimal prediction for Hamiltonian partial differential equations, J. Comp. Phys. (2000), 162, pp. 267-297.
1087: 
1088: 
1089: \bibitem{evans}
1090: D. Evans and G. Morriss, Statistical Mechanics of Nonequilibrium Liquids,
1091: Academic, London, 1990.
1092: 
1093: 
1094: \bibitem{fick}
E. Fick and G. Sauermann, The Quantum Statistics of Dynamical Processes,
1096: Springer, Berlin, 1990.
1097: 
1098: 
1099: \bibitem{fisher}
1100: M. Fisher, Renormalization group theory, its basis and formulation in statistical physics,
1101: Rev. Mod. Phys., 70, (1998), pp. 653-681.
1102: 
1103: 
1104: 
1105: \bibitem{givon1}
1106: D. Givon, R. Kupferman and A. Stuart, Extracting macroscopic dynamics: model problems and
1107: algorithms, Nonlinearity 17 (2004), pp. R55-R127.
1108: 
1109: 
1110: \bibitem{goldenfeld1}
1111: N. Goldenfeld, Lectures on Phase Transitions and the Renormalization Group,
1112: Perseus Books, Reading, Mass., 1992.
1113: 
1114: 
1115: \bibitem{grabert}
1116: H. Grabert, Projection Operator Techniques in Nonequilibrium Statistical
1117: Mechanics, Springer, Berlin, 1982.
1118: 
1119: 
1120: 
1121: \bibitem{hohenberg}
1122: P. Hohenberg and B. Halperin, Theory of dynamical critical phenomena, Rev. Mod. Phys., 49,
1123: (1977), pp. 435-479.
1124: 
1125: 
1126: \bibitem{ingerman}
1127: E. Ingerman, Modeling the loss of information in optimal prediction, PhD thesis, 2003,
1128: UC Berkeley Mathematics Dept.
1129: 
1130: 
\bibitem{jl}  G. Jona-Lasinio,  The renormalization group: a probabilistic view,
1132: Nuovo Cimento, 26 (1975), pp. 99-118.
1133: 
1134: 
1135: 
1136: \bibitem{just}
1137: W. Just, H. Kantz, C. Roedenbeck and M. Helm,
1138: Stochastic modeling: replacing the fast degrees of freedom by noise,
1139: J. Phys. A: Math. Gen. 34 (2001), pp. 3199-3213.
1140: 
1141: \bibitem{kadanoff}
1142: L. Kadanoff, Statistical Physics: Statics, Dynamics, and Renormalization,
1143: World Scientific, Singapore, 2000.
1144: 
1145: 
1146: 
1147: \bibitem{raz}
1148: R. Kupferman, Fractional kinetics in Kac-Zwanzig heat bath models, J. Stat. Phys. 114 (2004), 
1149: pp. 291-326.
1150: 
1151: 
1152: \bibitem{landau}
1153: L. Landau and E.M. Lifshitz, Statistical Physics, Part 1, Butterworth-Heinemann, 1980.
1154: 
1155: \bibitem{moser}
J. Langford and R. Moser, Optimal LES formulations for isotropic turbulence, J. Fluid Mech.
1157: (1999) 398, pp. 321-346.
1158: 
1159: \bibitem{majda}
A. Majda, I. Timofeyev and E. Vanden-Eijnden,
1161: A mathematical framework for stochastic climate models, Comm. Pure
1162: Appl. Math., 54 (2001), pp. 891-974.
1163: 
1164: \bibitem{mori}
1165: H. Mori, Transport, collective motion and Brownian motion, Prog. Theor. Phys. (1965) 33, 
1166: pp. 423-450.
1167: 
1168: \bibitem{zwanzig2}
1169: S. Nordholm and R. Zwanzig, A systematic derivation of exact generalized Brownian 
1170: motion theory, J. Stat. Phys., (1975) 13(4), pp. 347-371.
1171: 
1172: 
\bibitem{p} G. Papanicolaou, Asymptotic analysis
of stochastic equations, in Studies in Probability Theory, M. Rosenblatt (Ed.),
Studies in Mathematics, vol. 18, Math. Assoc. Am. (1978).
1176: 
1177: \bibitem{pomeau}
1178: Y. Pomeau and P. Resibois, Time dependent correlation functions and mode-mode coupling theories, Physics Reports C (1975) 2, pp. 63-139.
1179: 
1180: 
1181: \bibitem{seibold}
B. Seibold, Optimal prediction in molecular dynamics, Monte Carlo Meth. Appl. (2004), 10(1), pp. 25-50.
1183: 
1184: \bibitem{stanley}
1185: H. E. Stanley, Scaling, universality and renormalization, three pillars of modern 
critical phenomena, Rev. Mod. Phys., 71 (1999), pp. S358-S366.
1187: 
1188: 
1189: \bibitem{stinis}
1190: P. Stinis, Stochastic optimal prediction for the Kuramoto-Sivashinsky
1191: equation, Mult. Scale. Simul. 5 (2004), pp. 580-612.
1192: 
1193: \bibitem{stinis2}
1194: P. Stinis, A maximum likelihood algorithm for the estimation and renormalization of 
1195: exponential densities, J. Comp. Phys. (2005) (in press).
1196: 
1197: \bibitem{stinis3}
1198: P. Stinis, A comparative study of two stochastic mode reduction methods, Physica D (2004) (submitted).
1199: 
1200: 
1201: \bibitem{swendsen}
1202: R. Swendsen, Monte-Carlo renormalization group, Phys. Rev. Lett. 42 (1979),
1203: pp. 859-861.
1204: 
1205: \bibitem{kevrekidis}
K. Theodoropoulos, Y.-H. Qian and I.G. Kevrekidis, ``Coarse'' stability and bifurcation
1207: analysis using timesteppers: a reaction diffusion example, Proc. Natl. Acad. Sci. (2000), 97(18), 
1208: pp. 9840-9843.
1209: 
1210: \bibitem{schofield}
1211: R. van Zon and J. Schofield, Mode-coupling theory for multiple-point and multiple-time correlation functions, Phys. Rev. E (2002) 65, 011106.
1212: 
1213: \bibitem{zwanzig}
1214: R. Zwanzig, Nonlinear generalized Langevin equations, J. Stat. Phys., 9, (1973),
1215: pp. 215-220.
1216: 
1217: 
1218: 
1219: 
1220: 
1221: 
1222: 
1223: 
1224: \end{thebibliography}
1225: \end{document}
1226: