\documentclass{article}

\usepackage{amsmath,amsthm,amsfonts}
\usepackage{amssymb}
\usepackage{epsfig}
\usepackage{rotating}
\usepackage{subfigure}

%\newcommand{\p}[2]{\frac{\partial#1}{\partial#2}}
%\def{\p}{\partial }

%\newcommand{\sl R}{\mathscr{R}}
\def\sl R{\mathscr{R}}
\def\scrH{\mathscr{H}}
\def\scrS{\mathscr{S}}
\def\R{\Re}

\def\vp{{\varphi}}
%\newcommand{\E}{{E}}
\def\E{{E}}
%\newcommand{\G}{{\Gamma}}
\def\G{{\Gamma}}
%\newcommand{\hatx}{{\hat{x}}}
\def\hatx{{\hat{x}}}
%\newcommand{\haty}{{\hat{y}}}
\def\haty{{\hat{y}}}
%\newcommand{\hatR}{{\hat{R}}}
\def\hatR{{\hat{R}}}
%\newcommand{\hatvp}{{\hat{\vp}}}
\def\hatvp{{\hat{\vp}}}
%\renewcommand{\P}{{P}}
\def\P{{P}}
%\newcommand{\Q}{{Q}}
\def\Q{{Q}}
\def\LQ{{LQ}}
%\newcommand{\hatp}{{\hat{p}}}
\def\hatp{{\hat{p}}}
%\newcommand{\hatq}{{\hat{q}}}
%\def\hatq{{\hat{q}}}
%\newcommand{\tildep}{{\tilde{p}}}
\def\tildep{{\tilde{p}}}
%\newcommand{\tildeq}{{\tilde{q}}}
\def\tildeq{{\tilde{q}}}
\def\pH{\partial H}
\def\eps{\epsilon}

% ================================================================

\begin{document}

\centerline{\Large\bf PROBLEM REDUCTION, RENORMALIZATION, AND MEMORY}
\vskip14pt

\centerline{\bf Alexandre J.\ Chorin and Panagiotis Stinis}
\vskip12pt
\centerline{Department of Mathematics, University of California}
\centerline{and}
\centerline{Lawrence Berkeley National Laboratory}
\centerline{Berkeley, CA 94720}

\begin{abstract}
Methods for the reduction of the complexity of computational problems are
presented, as well as their connections to renormalization, scaling, and irreversible statistical mechanics.
Several statistically stationary cases are analyzed; for time-dependent
problems averaging usually fails, and averaged equations
must be augmented by appropriate memory and random forcing terms.
Approximations are described and
examples are given.
\end{abstract}

\section{Introduction}\label{intro}
There are many problems in science which are too complex for numerical
solution as they stand. Examples include turbulence and other problems
where multiple scales must be taken into account. Such problems must be
reduced to more amenable forms before one computes. In the present paper
we summarize some reduction methods that
have been developed in recent years, together with an account of what was
learned in the process. The problem has not been fully solved,
but we think that the examples and the
conclusions reached so far are useful.

In general terms, a reduction to a more amenable form is a renormalization
group transformation, as in physics: a
transformation of a problem into a more tractable form while keeping
quantities of interest invariant. A renormalization group transformation
involves an incomplete similarity transformation (see below for definitions),
and thus a reduction method is a search for hidden
similarities. This is a general feature of reduction methods,
and it will be illustrated in the examples. A successful problem reduction
produces a new problem which must in some asymptotic sense be similar to the
original problem. For general background on renormalization, see e.g.\ \cite{benfatto,fisher,stanley}.

In problems with strong time dependence, reduction methods resemble methods
for the analysis of thermodynamic systems not in equilibrium; indeed, those aspects
of the problem that are ignored in a reduced description conspire to
destroy order and increase entropy. Problem reduction for time-dependent
problems is basically renormalization group theory for non-equilibrium
statistical mechanics. For background on such theory, see e.g.\ \cite{barenblatt,goldenfeld1,kevrekidis2}.

The content of the paper is as follows. In section \ref{ave} we consider Hamiltonian systems
and their conditional expectations.
In section \ref{block} we narrow the discussion to statistically stationary Hamiltonian systems
and recover Kadanoff real-space renormalization
groups and an interesting block Monte-Carlo method. In section \ref{kdv} we display an example that
exhibits and also extends the main features of this analysis in simple form.

In section \ref{mz} we explain the Mori-Zwanzig formalism for the reduction of statistically
time-dependent problems. The analysis shows that averaging the equations is in general
not enough; one must take into account noise and a temporal memory. The Mori-Zwanzig
formalism is rather dense, and in the sections that follow we present various special
cases in which it can be simplified, in particular when the memory is very short or very long.

For the sake of readability, we remind the reader of the rudiments of similarity
theory \cite{barenblatt}.
Suppose a variable $a$ is a function of variables
$a_1,a_2,\ldots, a_m$,
$b_1,b_2,\ldots, b_k$, where
$a_1,\ldots, a_m$ have independent units, for example units of length and mass,
while the units of
$b_1,\ldots, b_k$ can be formed from the units of
$a_1,a_2,\ldots, a_m$.
Then there exist dimensionless variables
$\Pi=\frac{a}{a_1^{\alpha_1}\cdots a_m^{\alpha_m}}$,
$\Pi_i=\frac{b_i}{a_1^{\alpha_{i1}}\cdots a_m^{\alpha_{im}}}$,
$i=1,\ldots,k$, where the $\alpha_i,\alpha_{ij}$ are simple fractions, such that
$\Pi$ is a function of the $\Pi_{i}$:
\begin{equation}
\Pi=\Phi(\Pi_1,\ldots,\Pi_k).
\end{equation}
This is just a consequence of the requirement that a physical relationship
be independent of the size of the units of measurement. At this stage nothing can be said about the
function $\Phi$.
Now suppose the variables $\Pi_i$ are small or large, and assume that the
function $\Phi$ has a non-zero finite limit as
its arguments tend to zero or to infinity; then $\Pi\sim$ constant, and one finds a power
monomial relation between $a$ and the $a_i$.
This is a complete similarity relation.
If the function $\Phi$ does not have the assumed limit, it may
happen that for $\Pi_1$ small or large, $\Phi(\Pi_1)=\Pi_1^{\alpha}\Phi_1(\Pi_1)+\ldots$,
where the dots denote lower order terms, $\alpha$ is a constant, the other arguments of
$\Phi$ have been omitted and $\Phi_1$ has a finite non-zero limit.
One can then obtain a scaling (power monomial) expression for $a$ in terms of the $a_i$ and $b_i$,
with undetermined powers which must be found by means other than dimensional analysis.
The resulting power relation is an {\it incomplete} similarity relation.
Of course one may well have functions $\Phi$ with neither kind of similarity.
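
As a standard textbook illustration of complete similarity (not taken from the references above; the symbols $\tau_p$, $l$, $m_p$, $g_0$ and $\theta_0$ are introduced only for this example), consider the period $\tau_p$ of a pendulum of length $l$ and mass $m_p$ in a gravitational field of strength $g_0$, released from rest at an angle $\theta_0$. The variables $l$, $g_0$, $m_p$ have independent units, while $\theta_0$ is already dimensionless, so that
\[
\Pi=\frac{\tau_p}{l^{1/2}\,g_0^{-1/2}\,m_p^{0}},\qquad \Pi_1=\theta_0,\qquad \Pi=\Phi(\theta_0).
\]
Dimensional analysis alone says nothing about $\Phi$. The additional fact that $\Phi(\theta_0)\rightarrow2\pi$ as $\theta_0\rightarrow0$, a finite non-zero limit, gives the complete similarity relation $\tau_p\approx2\pi\sqrt{l/g_0}$ for small oscillations.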

Incomplete similarity expresses what is invariant under a renormalization
group; all renormalization group transformations involve incomplete similarity,
see the books already cited as well as \cite{benettin}, written before the
notion of incomplete similarity was formalized. The exponent $\alpha$ is called
an anomalous exponent.

The paper \cite{givon1} is a survey of reduction methods organized along different
lines and can be profitably read in tandem with the present paper.

\section{Averaging a Hamiltonian system} \label{ave}
We begin by examining what happens when one tries to reduce the complexity of
a Hamiltonian system by averaging (see also \cite{CKK1,CKL,seibold}).
Consider a system of nonlinear ordinary differential equations,
\begin{eqnarray}
\frac{d}{dt}\varphi(t)& =&R (\varphi(t)),\nonumber\\
\varphi(0)&=&x,
\label{eq:system}
\end{eqnarray}
where $\varphi$ and $x$ are $n$-dimensional vectors with components
$\varphi_i$ and $x_i$, and $R$ is a vector-valued function with components
$R_i$; $t$ is time.
To each initial value $x$ in (\ref{eq:system}) corresponds a
trajectory $\varphi(t)=\varphi(x,t)$.

Suppose that we only want to find
$m$ of the $n$ components
of the solution vector $\varphi(t)$ without finding the $n-m$ others.
One has to assume something about the variables that are not evaluated, and we assume
that at time $t=0$ we have a joint probability density $f(x)$ for all the variables.
The variables we keep will have definite initial values $x_1,x_2,\dots,x_m$, and the
rest of the variables will then have a conditional probability density $f_m=f(x_1,\dots,x_m,x_{m+1},\dots)/Z_m$,
where $Z_m=\int_{-\infty}^{+\infty}f(x_1,\dots,x_m,x_{m+1},\dots)dx_{m+1}dx_{m+2}\cdots$ is a normalization
constant. Without some assumption about the missing variables the problem is meaningless;
this particular assumption is reasonable because in practice $f$ can often be estimated
from previous experience or from general considerations of statistical mechanics.
The question is how to use this prior
knowledge in the evaluation of $\varphi(t)$.

Partition the vector $x$ so that $\hatx=(x_1,x_2,\dots,x_m)$, $\tilde x=(x_{m+1},\dots,
x_n)$ and $x=(\hat x,\tilde x)$, and similarly $\varphi=(\hat\varphi,\tilde\varphi)$,
$R=(\hat R,\tilde R)$. In general the first $m$ components of $R$ depend on all the components of $\vp$, $\hatR=\hatR(\varphi)=\hatR(\hat\varphi,\tilde\varphi)$;
if they do not, we have a system of $m$ equations in $m$ variables and
nothing further needs to be done.
We want to calculate only
the variables $\hat\varphi$; then
$(d/dt)\hat\varphi(t) =\hat R (\varphi(t))$, where the right-hand
side depends on the variables $\tilde\varphi$, which are unknown at time $t$.
We shall call the variables $\hat\varphi$ the ``resolved variables'' and the
remaining variables $\tilde\varphi$ the ``unresolved variables''.

Consider in particular a Hamiltonian system as in \cite{CKK1,CKL}. There exists then
a Hamiltonian function $H=H(\varphi)$ such that, for $i$ odd, the $i$-th component $R_i$
of the vector $R$ in (\ref{eq:system}) satisfies $R_i=\partial H\bigl/{\partial \varphi_{i+1}}$,
while for $i$ even one has $R_i=-{\pH}\bigl/{\partial \varphi_{i-1}}$, with $n$, the size of the system, even. Assume furthermore that $f$, the initial probability
density, is $f(\varphi)=Z^{-1}\exp(-H/T)$,
where $T$ is a parameter, known in physics as the ``temperature'', which will be set equal to one in much, but not all, of the discussion below.
In physics this density appears naturally and is known as the ``canonical'' density;
the normalizing constant $Z=Z(T)$ is the ``partition function''.
This density $f$ is invariant, i.e.,
sampling it and evolving the system in time commute.

A numerical analyst who wants to approximate the solution of an equation usually
starts by approximating the equation.
If one solves for the resolved variables, one has values for the variables $\hat\varphi$ available
at each instant $t$, and the best approximation should be a function of these variables. It is natural to seek a best approximation in the
mean square sense with respect to the invariant density $f$ at each time; the best approximation
in this sense is the conditional expectation
$E[R(\varphi)|\hat\varphi]=\int R(\varphi)e^{-H}d\tilde\varphi\bigl/\int e^{-H}d\tilde\varphi$ (note that we set $T=1$).
This conditional expectation is the orthogonal projection of $R$ onto the space of
functions of $\hat{\varphi}$ with respect to the inner product $(u,v)=E[uv]=\int u(\varphi)v(\varphi)f(\varphi)d\varphi$, where $d\varphi$ denotes integration over all the components
of $\varphi$.
We then try to approximate the system (\ref{eq:system}) by:
\begin{eqnarray}
\frac{d}{dt}\hat\varphi(t) &=& E[R (\varphi(t))|\hat\varphi(t)],\nonumber \\
\hat\varphi(0)&=&\hat x.
\label{foop}
\end{eqnarray}

We have shown in \cite{CHK3,CKK1,CHK} that:

(i) The new system (\ref{foop}) is also Hamiltonian:
\begin{equation}
E \left[\frac{\pH}{\partial \varphi_i}\Big|\hat\varphi(t)\right]=\int\frac{\pH}{\partial \varphi_i}\exp(-H)d\tilde\varphi
\bigl/
\int \exp(-H)d\tilde\varphi
=\frac{\partial \hat H}{\partial\varphi_i},
\label{hald1}
\end{equation}
where $i\le m$, the dimension of $\hat\varphi$, and
\begin{equation}
\hat H=-\log\int \exp(-H)d\tilde \vp
\label{renham}
\end{equation}
is the new Hamiltonian.

(ii) The new canonical density $\hat f=Z^{-1}\exp(-\hat H)$ is invariant in the
evolution of the
new, reduced, system.

(iii) When the data are sampled from the canonical distribution, the distribution of $\hat \varphi$ in the new system is its
marginal distribution in the old system; equivalently,
the partition function $Z$ is the same for the old system and for the new system.
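
Statements (i) and (iii) can be checked directly; the following is a short sketch, assuming that differentiation and integration may be interchanged. For $i\le m$, differentiating (\ref{renham}) gives
\[
\frac{\partial\hat H}{\partial\varphi_i}
=-\frac{\partial}{\partial\varphi_i}\log\int e^{-H}d\tilde\varphi
=\frac{\int\frac{\partial H}{\partial\varphi_i}\,e^{-H}d\tilde\varphi}{\int e^{-H}d\tilde\varphi},
\]
which is the conditional expectation in (\ref{hald1}). Similarly, since $e^{-\hat H}=\int e^{-H}d\tilde\varphi$,
\[
\int e^{-\hat H}d\hat\varphi=\int\!\!\int e^{-H}d\tilde\varphi\,d\hat\varphi=Z,
\]
so that $\hat f$ is the marginal density of $\hat\varphi$ and the partition function is unchanged.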

Now the question is, what does the solution $\hat\varphi(t)$ of (\ref{foop}) represent?
It does not approximate the first $m$ components of the solution
$\varphi(t)$ of (\ref{eq:system}): the components of $\hat{\varphi}$ and
the components of $\varphi$ live in spaces of different dimension, and in general
the components of the latter in those higher $n-m$ dimensions are not small.
One could hope that what the solution of (\ref{foop}) approximates is the vector
$E[\hat\varphi(t)|\hatx]$, the best estimate of the first components of the
solution at time $t$ given the partial initial information $\hatx$.
This is the case for linear systems (where averaging and time integration
commute), and is approximately the case for limited time in some
other special situations: nearly linear systems, and some systems where the
``unresolved variables'' are fast.
However, in general this is not the case. We shall see below that a
reduced description of the solution of nonlinear systems in time
requires in general ``noise'' and a ``memory''.

The lack of convergence can be understood by the following physics
argument.
In physics a system in which the values of all
the variables are drawn from a canonical distribution is a system in thermal
equilibrium.
The assignment of
definite values $\hat x$ to the variables $\hat \varphi$ at time $t=0$
amounts to taking the system out of equilibrium at $t=0$;
if the system is ergodic it will then decay to equilibrium in time, so that
all the variables become randomized and acquire the joint density $f$.
Thus
the predictive value of the partial initial data $\hat x$
decreases in time; all averages of the $\hat \varphi$ approach
equilibrium averages. However, the reduced system (\ref{foop}) is Hamiltonian, and the
solutions it produces
oscillate forever.

In Figure \ref{fig:unkno1} we consider the Hald Hamiltonian system \cite{CHK3} with
\begin{equation}
H=\frac{1}{2}\left(\varphi_1^2+\varphi_2^2 +\varphi_3^2+\varphi_4^2
+\varphi_1^2\varphi_3^2\right)
\label{haldmodel}
\end{equation}
(physically, two linear oscillators with a nonlinear coupling).
We assume that $\varphi_1(0), \varphi_2(0)$ are given and sample the two other
initial data from the canonical distribution with $T=1$.
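
For this model the reduced system can be written in closed form; the following is a sketch of the calculation with $T=1$ and $\hat\varphi=(\varphi_1,\varphi_2)$. With the sign convention of this section, the full equations of motion are
\[
\dot\varphi_1=\varphi_2,\qquad
\dot\varphi_2=-\varphi_1\left(1+\varphi_3^2\right),\qquad
\dot\varphi_3=\varphi_4,\qquad
\dot\varphi_4=-\varphi_3\left(1+\varphi_1^2\right).
\]
Given $\varphi_1$, the conditional density of $\varphi_3$ is Gaussian with mean zero and variance $1/(1+\varphi_1^2)$, so that $E[\varphi_3^2|\hat\varphi]=1/(1+\varphi_1^2)$ and the averaged system (\ref{foop}) reads
\[
\frac{d}{dt}\varphi_1=\varphi_2,\qquad
\frac{d}{dt}\varphi_2=-\varphi_1\left(1+\frac{1}{1+\varphi_1^2}\right).
\]
The same equations follow from (\ref{renham}): carrying out the two Gaussian integrals over $\varphi_3$ and $\varphi_4$ gives
\[
\hat H=\frac12\left(\varphi_1^2+\varphi_2^2\right)+\frac12\log\left(1+\varphi_1^2\right)-\log 2\pi ,
\]
with $\partial\hat H/\partial\varphi_2=\varphi_2$ and $-\partial\hat H/\partial\varphi_1=-\varphi_1-\varphi_1/(1+\varphi_1^2)$.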

\begin{figure}
\centering
\epsfig{file=plot_rslt.eps,height=3in}
\caption{Comparison of the evolution of $E[\varphi_1(t)|\varphi_1(0),\varphi_2(0)]$ (truth) to the
prediction by the ``Galerkin'' approximation and the prediction by the averaging procedure described
in the text.}
\label{fig:unkno1}
\end{figure}

Figure \ref{fig:unkno1} displays (1) the result for $\varphi_1$ of a ``Galerkin'' calculation in which
the unresolved variables are set to zero (this is what is implicitly done in many
unresolved computations); (2) the result of the averaging procedure just described;
and (3) the true $E[\varphi_1(t)|\hatx]$, calculated by repeatedly sampling the initial data,
solving the full system, and averaging. As one can see, averaging is initially better than
the null ``Galerkin'' method, but in the long run the truth decays while the solution of the
averaged system oscillates forever. For more detail, see \cite{CHK3}.

The procedure we have just described resembles sufficiently the averaging
methods used in some areas of engineering, for example the large-eddy
simulation methods in turbulence (see e.g.\ \cite{moser}) and in some multiscale problems (see e.g.\ \cite{kevrekidis}), to cast a very serious doubt on the broad validity of the latter.
For a description of special cases, with small fluctuations and particular structures, where this
procedure is legitimate, see \cite{givon1}.

\section{Prediction with no data and block Monte-Carlo}\label{block}

There is however a case where the construction of the preceding section
can be very useful: the case $m=0$, i.e., when one tries to predict the future
with no initial information. Equations (\ref{eq:system}) then sample the canonical
distribution, and the reduced system samples a subset of variables
without sampling the others and, as we have seen, keeps the statistics of the resolved
variables unchanged (see \cite{seibold} for an application to molecular dynamics).

To see what is happening, suppose the variables $\varphi_i$ are associated
with nodes on a regular lattice; for example, they may represent
spins in a solid, or originate in the spatial discretization
of a partial differential equation.

Divide the lattice into blocks of some fixed shape (for example, divide
a regular one-dimensional lattice into groups of two contiguous nodes).
We have not yet specified how the variables are to be divided into
resolved and unresolved. Now decide to ``resolve'' one variable per block,
and leave the others in the same block unresolved. The transformation between the old variables and the smaller set of resolved variables is a Kadanoff
renormalization group transformation \cite{kadanoff}; the Hamiltonian $\hat H$ defined
above in equation (\ref{renham}) is the renormalized Hamiltonian. We will now explain what this means.

Suppose the system described by the Hamiltonian is translation invariant. The equations of
motion at any one point, say at the location labeled by $1$, have the same form as the equations of motion at
any other point. The relation between the right-hand side of the reduced system and the
right-hand side of the old system can be rewritten as:
\begin{equation}
\frac{\partial \hat H}{\partial \varphi_1}=E\left[\frac{\partial H}{\partial \varphi_1}\Big|\hat\varphi\right],
\label{start}
\end{equation}
where the expected value is with respect to the invariant density as before. This relation is the starting
point for the actual evaluation of $\hat H$.

Hamiltonians are functions of the variables $\varphi$. They can be expanded in the form:
\begin{equation}
H=\sum_ja_j\psi_j,
\label{expandH}
\end{equation}
where the $\psi_j$ are ``elementary Hamiltonians''. In a translation invariant system, where
each equation has the same form as any other, the Hamiltonian is made up of sums over $i$ of terms
of the form $h(\varphi_i\varphi_{i+j})$ for various values of $j$, where $h$ is some function; these terms
represent ``couplings'' between variables $j$ apart. One can then choose the elementary Hamiltonians to be
polynomials in $\varphi_i\varphi_{i+j}$ with a fixed $j$ in each $\psi_j$, i.e., one segregates the couplings between
variables $j$ apart into separate terms.

In a homogeneous system where there is only one variable per site it is enough to satisfy (\ref{start}) for one variable, say for $\varphi_1$. Define
$\psi'=\frac{\partial}{\partial\varphi_1}\psi$, noting that though $\psi$ is necessarily a function with
at least
as many arguments as there are components of $\varphi$, $\psi'$ can be sparse. Equation (\ref{start}) reduces to
\begin{equation}
\frac{\partial \hat H}{\partial \varphi_1}=\sum_ja_jP\psi'_j(\varphi)=\sum_j \hat{a}_j \psi'_j
(\hat{\varphi}),
\label{more1}
\end{equation}
with the projection $P$ defined as before by $Pg(\varphi)=E[g|\hat\varphi]$ for any function $g$ of $\varphi$.
We are now almost done. One can pick a basis in $\hat L_2$, the subspace of square integrable functions that depend only
on the variables $\hat\varphi$, which consists of a subset of the set of functions $\psi'$. The right-hand side
of equation (\ref{more1}) is then a linear combination of the $\psi'$; integration with respect to $\varphi_1$
requires only the erasure of the primes and yields a series for $\hat H$. The elements of $\tilde\varphi$ are
now gone, and one can relabel the remaining variables $\hat\varphi$ so that the terms in the series
have exactly the same form as before; the calculation can then be repeated, yielding a sequence of
Hamiltonians with ever fewer variables: $H, H^{(1)}=\hat H$, $H^{(2)}=\hat H^{(1)}, \dots$. The corresponding
densities $f^{(n)}=Z^{-1}\exp(-H^{(n)}/T)$ can in principle be sampled by any sampling scheme, for example by Metropolis sampling
(but there are caveats, see e.g.\ \cite{chorin9}).
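
As a minimal illustration of one such step, consider a hypothetical three-site fragment (an assumption made here for concreteness, not a model taken from the references) with $T=1$, a coupling constant $a$ with $|a|<1/\sqrt2$ so that the density is normalizable, and
\[
H=\frac12\left(\varphi_1^2+\varphi_2^2+\varphi_3^2\right)+a\left(\varphi_1\varphi_2+\varphi_2\varphi_3\right),
\]
in which the middle variable $\varphi_2$ is left unresolved. Completing the square in $\varphi_2$ gives
\[
\int e^{-H}d\varphi_2=\sqrt{2\pi}\,\exp\left(-\frac12\left(\varphi_1^2+\varphi_3^2\right)+\frac{a^2}{2}\left(\varphi_1+\varphi_3\right)^2\right),
\]
so that, by (\ref{renham}),
\[
\hat H=\frac{1-a^2}{2}\left(\varphi_1^2+\varphi_3^2\right)-a^2\varphi_1\varphi_3-\frac12\log2\pi .
\]
The two remaining variables, which become neighbors after relabeling, are now coupled with coefficient $-a^2$ in place of $a$, and the coefficient of the on-site term has changed as well. In this Gaussian fragment the renormalized Hamiltonian keeps the form of the original one; for non-Gaussian Hamiltonians this is in general not so.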

At this point we have reduced the number of variables by a factor $L$ equal to the number of
variables in each block, but this may well seem to be a pyrrhic victory. The Hamiltonians
one usually encounters are simple, in the sense that they involve few couplings: finite
differences typically link a few neighboring variables, and so do the usual spin Hamiltonians
in physics. As one reduces the number of variables, the new Hamiltonians become more complex,
with more terms in the series (\ref{expandH}); the cost per time step of solving the equations in time, or
the cost per move in Metropolis sampling, typically increases fast as well. To see what has
been gained one must turn to the physics literature (see e.g.\ \cite{kadanoff,hohenberg}).

Consider the spatial correlation length $\ell$, which measures the range of values of
$|j|$ over which the spatial covariances $E[\varphi_i\varphi_{i+j}]$ are non-negligible,
and the correlation time $\tau$, which measures the range of values of $|s|$ over which the temporal covariances $E[\varphi_i(t)\varphi_i(t+s)]$
are non-negligible. For very large and very small values of the temperature $T$ (the variance
parameter in the density $f$) both the correlation time and the correlation length are small;
the properties of the system can then be found from calculations with a small number of variables and
it is not urgent to reduce the number of variables. There is a range of intermediate values of
$T$ for which the correlation length and time are large, and then the reduction is worthwhile.
There often is a value $T_c$ of $T$, the ``critical value'', for which $\ell=\infty$. Values of $T$
around $T_c$ are often of great interest.

Now we can see what the reduction can accomplish. If one tries to compute averages with $T$ near
$T_c$ one finds that the cost of computation is proportional to $\tau$: one has to compute long
enough to obtain independent samples of $\varphi$, and a new independent sample will not
appear until a time $\sim\tau$ has passed. The reductions above produce a system
with smaller $\ell$ and $\tau$, and therefore computation takes less time.
Though we started with the declared goal of reducing the number of variables, what has been
produced is more interesting: a new system with shorter correlations which is more amenable to
computation. It is not the raw number of variables that matters.

The renormalization can be used with a multigrid scheme, in which one runs up and down on different levels
of renormalization, on the finer ones to achieve accuracy and on the cruder ones to move fast from
one macroscopic configuration to another. A comparison with other multigrid
sampling schemes (see e.g.\ \cite{brandt}) reveals that we have derived a reasonably standard scheme, with however
a particularly effective way to store conditional expectations. For details see \cite{chorin9}.

An alternative method for obtaining the expansion coefficients for the renormalized Hamiltonians was proposed in \cite{stinis2}. The method is based on the maximization of the likelihood of the renormalized density, which leads to a moment-matching problem. The moments in this case are the expectation values of the ``elementary Hamiltonians'' (see above) with respect to the renormalized density. The solution of the moment-matching problem yields the expansion of the renormalized Hamiltonian.

The recognition of the links of probability with renormalization is largely due to Jona-Lasinio (see e.g.\ \cite{jl}).
The connection of renormalization with incomplete similarity is too well known (see \cite{barenblatt,kadanoff,goldenfeld1})
to require further comment here.

\section{An example: The Korteweg-de Vries-Burgers equation}\label{kdv}

As an illustration of the ideas in the previous section, consider
the equation
\begin{equation}
u_t+uu_x=\epsilon u_{xx}-\beta u_{xxx},
\end{equation}
with boundary conditions
\begin{equation}
u(-\infty)=u_0,\ \ u(+\infty)=0, \ \ u_{x}(-\infty)=0,
\end{equation}
where the subscripts denote differentiation, $x$ is the spatial variable,
$t$ is time, $\epsilon>0$ is a diffusion coefficient, $\beta>0$ is a dispersion coefficient and $u_0>0$ is a given constant.
The boundary conditions create a traveling wave solution moving to the right
(towards $+\infty$) with velocity
$u_0/2$, which becomes steady in a moving frame as $t\rightarrow\infty$.
In nondimensional form, with $u$ measured in units of $u_0$ and $x$ in units of $\sqrt{\beta/u_0}$, the equation can be written as:
\begin{equation}
u_t+uu_x=\frac{1}{R} u_{xx}-u_{xxx},
\label{kdvb}
\end{equation}
with $u_x(-\infty)=0$, $u(+\infty)=0$, $u(-\infty)=1$;
$R=\sqrt{\beta u_0}/\epsilon$ is a ``Reynolds number''.
For $R\leq1$ the traveling wave has a monotonic profile,
while for $R>1$ the profile
is oscillatory, with oscillations whose wave length is of order 1 \cite{bona}.
At zero diffusion $(R=\infty)$
the stationary asymptotic wave train extends to infinity
on the left. For finite $R$ the wave train is damped and the solution
tends to 1 as $x$ decreases.

The steady wave profile can be found by noting that it satisfies an ordinary
differential equation, whose solution connects a spiral singularity at $x=-\infty$
to a saddle point at $x=+\infty$.
At the steady state we average the solution at each point $x$ over the region
$\left(x-\ell/2, x+\ell /2\right)$ and call the result $\bar u$.
Now look for
an effective equation $g(v,v_x,v_{xx},\ldots)=0$
whose solution $v$ approximates $\bar u$; $v$ can be expected
to be smoother than the solution of (\ref{kdvb}) and thus require fewer mesh points
for an accurate numerical solution.

We now make an analogy between the conditional expectations which define the
renormalized variables in the previous sections
and an
averaging in space which defines ``renormalized''
variables for solutions of the KdVB equation that are stationary
in a moving frame.
Averaging over an increasing length scale corresponds either to more
renormalization steps or, equivalently, to renormalization with a greater
number of variables grouped together.
We pick a class of equations in which to seek the ``effective'' equation,
the one whose solutions best approximate the averages of the true solution in the
mean square sense. The choice of mean-square approximation
in the KdVB case corresponds to the use of $L_2$ norms implied by the use
of conditional
expectations in the previous sections, and the choice of a class of equations in which to
look for the effective equation is analogous to the choice of a basis
for the representation of the Hamiltonian; the calculation of
the best coefficients in the chosen class of ``effective'' equations corresponds to the
evaluation of the coefficients in the series for the renormalized Hamiltonians.
In the Hamiltonian case we average the right-hand sides of the equations, and
in the analogous KdVB case we attempt to average the solutions;
this must be so because in the KdVB case we do not have theorems which
guarantee that averaging the right-hand sides produces the correct statistics for the
solutions.

We can look for an effective equation in the class of equations of
the form
\begin{equation}
-cv_x+vv_x=\epsilon_{eff} v_{xx}-v_{xxx}+\beta |v_x|^\alpha v_{xx}+\dots,
\end{equation}
where $\epsilon_{eff}\geq0$, $\alpha\geq0$, $\beta\geq0$ are constants (in this equation $\alpha$ and $\beta$ denote fitting parameters, not the anomalous exponent or the dispersion coefficient used earlier) and $c=1/2$ is the velocity of propagation
of the steady wave (see also \cite{barenblatt3}).
The problem is to find the values of the parameters in the effective equation which minimize
\begin{equation}
I= \int_{-\infty}^{+\infty}|\bar u(x)-v(x)|^2 dx.
\label{min}
\end{equation}
One finds numerically that the last terms have little effect on the minimum of $I$ when $\ell\ge5$
(in the physics terminology,
they are ``irrelevant'').
The effective equation is thus a Burgers equation
with a value of the dimensionless diffusion coefficient $\epsilon_{eff}$ different from $1/R$.

The minimization in (\ref{min}) was carried out in \cite{chorin10}, and it showed that the minimum
was achieved when $\epsilon_{eff}=R^{\nu}\Phi(\ell)$, with the exponent $\nu\approx 0.75$. Note that
when the diffusion coefficient $\epsilon\rightarrow0$,
then $\epsilon_{eff}\rightarrow \infty$!
This is an incomplete similarity relation, as advertised, relating a ``bare'' Reynolds number $R$ to
a ``dressed'' Reynolds number $\epsilon_{eff}^{-1}$.
The form of the effective equation could conceivably have been found by averaging the original
equation, but the relation between the original $\epsilon$ and $\epsilon_{eff}$ requires
some form of renormalization-like reasoning.
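
To make the scaling concrete, here is a short piece of arithmetic based only on the fitted form $\epsilon_{eff}=R^{\nu}\Phi(\ell)$ with the approximate value $\nu=0.75$ quoted above; the function $\Phi(\ell)$ is not specified here and is simply held fixed. At fixed $\ell$,
\[
\frac{\epsilon_{eff}(2R)}{\epsilon_{eff}(R)}=2^{\nu}\approx1.68,
\]
so that doubling the bare Reynolds number increases the effective diffusion coefficient by roughly two thirds. In terms of a dressed Reynolds number $R_{eff}=\epsilon_{eff}^{-1}$ (a symbol introduced here only for convenience),
\[
R_{eff}=\frac{R^{-\nu}}{\Phi(\ell)},
\]
a power law whose exponent cannot come from dimensional analysis, since $R$ is already dimensionless.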

549:

550:

551: \section{The Mori-Zwanzig formalism}\label{mz}

552: We now return to the problem we started investigating in Section \ref{ave}: How to determine the evolution of

553: a subset $\hat\varphi$ of components of a vector $\varphi$ described by a nonlinear set of equations

554: of the form (\ref{eq:system}). This is a nonlinear closure problem of a type much studied in

555: physics, and a variety of formalisms is available for the job. We choose the Mori-Zwanzig formalism of

556: irreversible statistical mechanics \cite{fick,grabert,mori,zwanzig,zwanzig2}, because it homes in on the basic difficulty, which is the

557: description of the memory in the system; the relation of this formalism to other nonlinear formalisms

558: is described in \cite{CHK04}. That a reduced description of a nonlinear system involves a memory

559: should be intuitively obvious: suppose you have $n>3$ billiard balls moving about on top of a table

560: and are trying to describe the motion of just three; the second ball may strike the seventh ball

561: at a time $t_1$ and the seventh ball may then strike the third ball at a later time.

562: The third ball then ``remembers" the state of the system at time $t_1$, and if this memory is

563: not encoded in the explicit knowledge of where the seventh ball is at all times, then it has to be encoded in some

564: other way.  We are no longer assuming that the system is Hamiltonian nor that we know an invariant

565: density.

566:

567: It is much easier to work with linear equations, and we start by finding a linear equation

568: equivalent to (not approximating!) the system (\ref{eq:system}).

569: Introduce the linear Liouville operator

570: $L= \sum_{i=1}^n R_i(x)

571: \frac{\partial}{\partial x_i}$, and the Liouville equation:

572: \begin{eqnarray}

573: \frac{\partial}{\partial t}u(x,t)& = &Lu(x,t) \nonumber\\

574: u(x,0)& = &g(x),

575: \label{Liouville}

576: \end{eqnarray}

577: with initial data $g(x)$. This is the partial differential

578: equation for which (\ref{eq:system}) is the set of characteristic equations. One can

579: verify that the solution of the Liouville equation is $u(x,t)=g(\varphi(x,t))$ (see e.g. \cite{CHK}).  In

580: particular, if $g(x)=x_i$, the solution is $u(x,t)=\varphi_i(x,t),$ the

581: $i$-th component of the solution of (\ref{eq:system}).

582: This linear partial differential equation is thus equivalent to

583: the nonlinear system (\ref{eq:system}). The linearity of equation (\ref{Liouville}) greatly facilitates

584: the analysis.
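As a quick sanity check on this equivalence (an illustration that is not part of the original argument), take the scalar system $\frac{d}{dt}\varphi=-\varphi$, $\varphi(x,0)=x$, i.e., assume $n=1$ and $R(x)=-x$, an example chosen here only for its simplicity. Then $\varphi(x,t)=xe^{-t}$ and $L=-x\,\partial/\partial x$, and for any differentiable $g$
\begin{align*}
\frac{\partial}{\partial t}g(xe^{-t}) &= -xe^{-t}g'(xe^{-t}),\\
L\,g(xe^{-t}) &= -x\frac{\partial}{\partial x}g(xe^{-t}) = -xe^{-t}g'(xe^{-t}),
\end{align*}
so that $u(x,t)=g(\varphi(x,t))$ does satisfy (\ref{Liouville}) in this case, with $u(x,0)=g(x)$.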

585:

586: Introduce the semigroup notation $u(x,t)=(e^{t L}g)(x)=g(\varphi(x,t))$,

587: where $e^{tL}$ is the evolution operator associated with the operator $L$;

588: therefore $e^{tL}g(x)=g(e^{tL}x)$, and

589: one can also verify that

590: $e^{tL}L=Le^{tL}$ (this can be seen to be a change of variables formula).  Equation

591: (\ref{Liouville}) becomes

592: \[

593: \frac{\partial}{\partial t}e^{tL}g = L e^{tL} g = e^{tL} Lg.

594: \]

595: We suppose that as before we are given

596: the initial values of the

597: $m$ coordinates $\hatx$, and that the distribution of the remaining $n-m$

598: coordinates $\tilde{x}$ is the conditional density, $f$

599: conditioned by $\hatx$, where $f$ is initially given.

600:

601: We define a projection operator $P$ by $Pg=E[g|\hatx]$.

602: The conditioning variables are the initial values of $\hat \varphi$;

603: in section \ref{ave} the conditioning variables were the values of $\hat\varphi(t)$, which are

604: unusable here when we do not know the probability density at time $t$. Quantities such

605: as $P\hat \varphi(t)=E[\hat\varphi(t)|\hatx]$ are by definition the best estimates of

606: the future values of the variables $\hat\varphi$ given the partial data $\hatx$ and are

607: often the quantities of greatest interest.

608:

609: Consider

610: a resolved coordinate $\varphi_j(x,t)=e^{tL} x_j$ ($j\le m$), and split its time

611: derivative, $R_j(\varphi(x,t))=e^{tL} L x_j$ as follows:

612: \begin{equation}

613: \frac {\partial}{\partial t} e^{tL} x_j = e^{tL} L x_j =  e^{tL}\P L x_j + e^{tL} \Q L x_j,

614: \label{eq:split}

615: \end{equation}

616: where $\Q=I-\P$. Define $ \hat{R}_j(\hatx) = (\P R_j)(\hatx)$; the first

617: term is $e^{tL}\P L x_j =  \hat{R}_j(\hat{\varphi}(x,t))$ and depends on the solution only through the resolved components $\hat\varphi(x,t)$ (which themselves depend on the whole vector of initial data $x$).

618: Note that if $Q$ were zero we would recover something that looks

619: like the crude approximation of the previous section; however the conditioning

620: variables are not the same. We shall see that the term in $Q$ is essential.

621:

622: We further split the remaining term $e^{tL} \Q L x_j$. This splitting will

623: bring it into a very useful form: a noise term, and a memory term whose kernel depends

624: on the correlations of the noise term. The fact that such a splitting is possible

625: is the essence of ``fluctuation-dissipation" theorems (see e.g. \cite{landau}).

626:

627:

628: Let $w(x,t)=e^{t\Q L}\Q L x_j$, i.e., let

629: $w(x,t)$ be a solution of the initial value problem:

630: \begin{eqnarray}

631: \frac{\partial}{\partial t}w(x,t)&=&\Q L w(x,t)\  = \  Lw(x,t)- \P Lw(x,t) \nonumber\\

632: w(x,0)&=&\Q L x_j.

633: \label{ortho1}

634: \end{eqnarray}

635: If for some function $h(x)$, $Ph=0,$ then $Pe^{t\Q L}h=0$ for all time $t$, i.e., $e^{t\Q

636: L}$ maps the null space of $\P$ into itself.
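A short argument for this claim, sketched here under the assumption that $e^{tQL}$ is well defined and differentiable in $t$: since $PQ=P(I-P)=0$,
\[
\frac{\partial}{\partial t}Pe^{tQL}h = PQLe^{tQL}h = 0, \qquad Pe^{tQL}h\big|_{t=0}=Ph=0,
\]
and therefore $Pe^{tQL}h=0$ for all $t$.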

637:

638: The evolution operators $e^{tL}$ and $e^{t\Q L}$ satisfy the Duhamel

639: relation

640: \[

641: e^{t L} = e^{t\Q L} + \int_0^t e^{(t-s) L} \P L e^{s \Q L} \,ds.

642: \]

643: Hence,

644: \begin{equation}

645: e^{tL} Q L x_j =

646: e^{t\Q L} \Q L x_j + \int_0^t e^{(t-s)L} \P L e^{s\Q L} \Q L x_j \,ds.

647: \label{dyson}

648: \end{equation}
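The Duhamel relation can be checked by differentiation; the following is a formal sketch that assumes all the operators involved may be differentiated under the integral sign. Let $M(t)$ (a symbol introduced here for convenience) denote the right-hand side of the relation. Then $M(0)=I$ and
\begin{align*}
\frac{d}{dt}M(t) &= QLe^{tQL}+PLe^{tQL}+L\int_0^te^{(t-s)L}PLe^{sQL}\,ds\\
&= L\left(e^{tQL}+\int_0^te^{(t-s)L}PLe^{sQL}\,ds\right)=LM(t),
\end{align*}
where $QL+PL=L$ was used. Since $e^{tL}$ satisfies the same equation with the same initial value, $M(t)=e^{tL}$.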

649:

650: Collecting terms, we find

651: \begin{equation}

652: \frac {\partial}{\partial t} e^{tL} x_j =  e^{tL}\P L x_j +

653: \int_0^t e^{(t-s) L} \P L e^{s Q L}Q L x_j \,ds +e^{tQL} Q L x_j

654: \label{eq:langevin}

655: \end{equation}

656:

657:

658: The first term on the right hand side is the

659: Markovian contribution to $\partial_t \varphi_j(x,t)$---it depends only on

660: the instantaneous value of the resolved $\hatvp(x,t)$.  The second

661: term depends on $x$ through the values of $\hatvp(x,s)$ at times $s$

662: between $0$ and $t$, and embodies a memory---a dependence on the past

663: values of the resolved variables.  Finally, the third term, which

664: depends on full knowledge of the initial conditions $x$, lies in the

665: null space of $\P$ and can be viewed as noise with statistics

666: determined by the initial conditions.

667:

668: It is important to see that equation (\ref{eq:langevin}) is an identity. The memory and noise

669: terms have not been added artificially, their presence is a direct consequence of the original

670: equations of motion. However tempting it may be to average equations by taking one-time

671: averages, the results will in general be wrong; one must add a memory and a noise as well.

672:

673:

674: If what is desired is $P\hat \varphi(t)$, the conditional expectation of

675: $\hat \varphi(t)$ given $\hat x$ (the best approximation in the sense of $L_2$ to $\hat\vp$ given the

676: partial data $\hat x$), then one can premultiply equation (\ref{eq:langevin}) by $P$; the noise term

677: then drops out and we find

678: \begin{equation}

679: \frac {\partial}{\partial t}P e^{tL} x_j = P e^{tL}\P L x_j +

680: P\int_0^t e^{(t-s) L} \P L e^{s Q L}Q L x_j \,ds

681: \label{eq:langevin_pro}

682: \end{equation}

683: Even if the system we start with is Hamiltonian, the Langevin

684: equation (\ref{eq:langevin}) is not;  the memory and the noise allow the system to forget

685: its initial values and decay to ``thermal equilibrium" as it should (see section \ref{ave}).

686:

687: We now show that the memory term is a functional of the temporal correlations of the noise.

688: To save on writing

689: we restrict ourselves to cases where the operator $L$ is skew-symmetric,

690: i.e., $(Lu,v)=-(u,Lv)$ (remember $(u,v)=E[uv]$). The skew-symmetry holds in particular for

691: Hamiltonian systems with canonical data, see \cite{CHK3},\cite{evans}; however, here the assumption of skew-symmetry

692: is only an excuse to reduce the number of symbols,

693: not a

694: return to the Hamiltonian case. Pick an orthonormal basis $\{h_k=h_k(\hat x),k=1,\dots\}$ in

695: the range of $P$, which is the space of functions of $\hat x$

696: (for example,

697: the $h_k$ could be Hermite polynomials in the variables $\hatx$). The projection of any function

698: $\psi(x,t)$

699: can be expanded as  $P\psi=\sum_k(\psi(x,t),h_k)h_k(\hatx)$, and in particular,

700: \begin{equation}

701: P(LQe^{sQL}QLx_j)=\sum_k(LQe^{sQL}QLx_j,h_k)h_k(\hat x),

702: \label{expand_fin}

703: \end{equation}

704: where a factor $Q$ has been inserted before the exponentials, harmlessly because

705: the operators that follow it all live in the null space of $P$.

706: The memory term now becomes

707: \begin{eqnarray}

708: \int_0^te^{(t-s)L}PLe^{sQL}QLx_jds\!\!\!&=\!\!\!&\int_0^t\sum_ke^{(t-s)L}(LQe^{sQL}QLx_j,h_k)h_k(\hat x)ds\nonumber\\

709: \!\!\!&=&\!\!\!\sum_k\!\!\int_0^t(LQe^{sQL}QLx_j,h_k)h_k(\hat \varphi(t-s))ds;

710: \label{expand}

711: \end{eqnarray}

712: In the last identity we used the fact that the parenthesis is independent of time and therefore

713: commutes with the time evolution operator $e^{(t-s)L}$, and also the fact that $e^{(t-s)L}h_k(\hatx)=h_k(\hat\varphi(t-s))$ by

714: definition.

715: Now $(LQe^{sQL}QLx_j,h_k(\hatx))=-(e^{sQL}QLx_j,QLh_k(\hatx))$ by the symmetry of $Q$

716: and the assumed skew-symmetry of $L$; each term on the right hand side of equation

717: (\ref{expand}) is the ensemble average of the product of the value of the stochastic process $e^{tQL}QLx_j$ at time $t=s$

718: with the value of the stochastic process $e^{tQL}QLh_k(\hatx)$ evaluated at time $t=0$, i.e., it

719: is a temporal correlation. All these stochastic processes are in the range of $Q$ for all $t$,

720: they are therefore components of the noise.  Remember that by definition $Lx_j=R_j$ (a right-hand side in equations (\ref{eq:system})). $PLx_j$ is then an average of the right-hand side of (\ref{eq:system})

721: and $QLx_j=R_j-E[R_j|\hat x]$ is the initial fluctuation in that right-hand side.
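The identity used above can be spelled out in two steps; this is only an unpacking of the argument in the text, under the same assumptions ($L$ skew-symmetric, $P$ and hence $Q$ symmetric with respect to $(u,v)=E[uv]$). Writing $F_j(s)=e^{sQL}QLx_j$, a shorthand introduced here for convenience, and recalling that $QF_j(s)=F_j(s)$,
\begin{align*}
(LQF_j(s),h_k) &= -(QF_j(s),Lh_k) \\
&= -(F_j(s),QLh_k).
\end{align*}
The first equality is the skew-symmetry of $L$ and the second is the symmetry of $Q$.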

722:

723:

724: The first, ``Markovian", term in equations (\ref{eq:langevin}) looks straightforward, but perils lurk there

725: as well.

726: In general $R_j$ in equations (\ref{eq:system}) is nonlinear,

727: and so is $PLx_j=E[R_j|\hat x]$. $e^{tL}PLx_j$ is a nonlinear

728: function of the functions $\hat\varphi(t)$ which depends on all the components of $x$, not only on $\hat x$.

729: Some way of approximating this function must be found. If one looks for conditional expectations, one must

730: find a way to commute $P$ with a nonlinear function; for a discussion, see \cite{CHK3}. This bullet was dodged in section \ref{ave} when the conditioning variables were chosen to be $\hat\varphi(t)$ which change in time, but it may be hard to dodge here.

731:

732:

733: The task now at hand is to extract something usable from these rather cumbersome formulas. A very detailed presentation of

734: the analysis in this section can be found in \cite{c11}.

735:

736:

737:

738:

739:   \section{Fluctuation-dissipation theorems}\label{fd}

740:

741: We have established a relation between kernels in the memory term and the noise (the former is made up of covariances of the latter). This is the mathematical content of what are known as ``fluctuation-dissipation theorems" in physics. However, under some specific restricted circumstances, the relation between noise and memory takes on more intuitively appealing forms, which we now briefly describe.

742: In physics one often takes a restricted basis in the range of $P$ consisting

743: of the coordinate functions $x_1,...,x_m$ (the components of $\hat{x}$). The resulting projection

744: is called there the ``linear projection", as if $P$ as defined above were not linear.

745: The use of this projection is appropriate when the amplitude of the functions $\hat\phi(t)$ is small.

746: One then has

747: $h_k(\hat x)=x_k$ for $k\le m$.

748: The correlations in equation (\ref{expand}) are then simply

749: the temporal correlations of the noise (not of the full solutions of the system!). This is known as the fluctuation-dissipation theorem of the second kind.

750:

751: Specialize further to a situation where there is a single resolved variable, say $\phi_1$, so that $m=1$

752: and $\hat\phi$ has a single component. The Mori-Zwanzig equation becomes:

753:

754: \begin{equation*}

755: \frac{\partial}{\partial{t}} e^{tL}x_1=

756: e^{tL}PLx_1+e^{tQL}QLx_1+

757: \int_0^t e^{(t-s)L}PLe^{sQL}QLx_1ds,

758: \end{equation*}

759: or,

760: \begin{multline}

761: \label{lmz}

762: \frac{\partial}{\partial{t}} \phi_1(x,t) =

763: (Lx_1,x_1)\phi_1(x,t)+e^{tQL}QLx_1 \\

764: +\int_0^t(LQe^{sQL}QLx_1,x_1)\phi_1(x,t-s)ds \\

765: =(Lx_1,x_1)\phi_1(x,t)+e^{tQL}QLx_1-

766: \int_0^t (e^{sQL}QLx_1,QLx_1)\phi_1(x,t-s)ds,

767: \end{multline}

768: where we have again inserted a harmless factor $Q$ in front of $e^{sQL}$, assumed that

769: $L$ was skew-symmetric as above, and for the sake of simplicity also assumed $(x_1,x_1)=1$

770: (if the last statement is not true the formulas can be adjusted appropriately).

771: Taking the inner product of equation (\ref{lmz}) with $x_1$, you find:

772: \begin{multline}

773: \label{clmz}

774: \frac{\partial}{\partial{t}} (\phi_1(x,t),x_1)=(Lx_1,x_1)(\phi_1(x,t),x_1) \\

775: +(e^{tQL}QLx_1,x_1)-\int_0^t(e^{sQL}QLx_1,QLx_1)(\phi_1(x,t-s),x_1)ds \\

776: =(Lx_1,x_1)(\phi_1(x,t),x_1)-\int_0^t(e^{sQL}QLx_1,QLx_1)

777: (\phi_1(x,t-s),x_1)ds,

778: \end{multline}

779: because $Pe^{tQL}QLx_1=(e^{tQL}QLx_1,x_1)x_1=0$

780: and hence $(e^{tQL}QLx_1,x_1)=0.$

781: Multiply equation (\ref{clmz}) by $x_1$, and remember that  $P\phi_1(x,t)=(\phi_1(x,t),x_1)x_1.$ You find:

782: \begin{equation}

783: \label{plmz}

784: \frac{\partial}{\partial{t}} P\phi_1(x,t)= (Lx_1,x_1)P\phi_1(x,t)-

785: \int_0^t (e^{sQL}QLx_1,QLx_1)P\phi_1(x,t-s)ds.

786: \end{equation}

787: You observe that the covariance $(\phi_1(x,t),x_1)$ and the projection of $\phi_1$ on $x_1$

788: obey the same homogeneous linear integro-differential equation. This is the fluctuation-dissipation theorem

789: of the first kind, which embodies the Onsager principle, according to which spontaneous fluctuations

790: in a system

791: decay at the same rate as perturbations imposed by external means,  when both are small

792: (so that the linear projection is adequate).

793: This reasoning can be extended to cases where there are multiple resolved variables, and this is

794: usually done with the added simplifying assumption that $(x_i,x_j)=0$ when $i\ne j$. We omit the details.
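For completeness, here is how the formulas change when $(x_1,x_1)\ne1$; this is a sketch of the adjustment alluded to above, not a formula taken from the references. The linear projection becomes $Pg=\frac{(g,x_1)}{(x_1,x_1)}x_1$, and equation (\ref{plmz}) is replaced by
\[
\frac{\partial}{\partial t}P\phi_1(x,t)=\frac{(Lx_1,x_1)}{(x_1,x_1)}P\phi_1(x,t)-\int_0^t\frac{(e^{sQL}QLx_1,QLx_1)}{(x_1,x_1)}P\phi_1(x,t-s)\,ds,
\]
so that the memory kernel is the covariance of the noise normalized by $(x_1,x_1)=E[x_1^2]$.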

795:

796:

797:

798: \section{Very short and very long memory approximations}\label{short}

799:

800:

801:

802:

803: The approximation we shall examine in some detail is:

804: \begin{equation}

805: e^{tQL}\cong e^{tL},

806: \label{QLeL}

807: \end{equation}

808: and we will consider under what conditions  it is reasonable.

809: We will find that it is reasonable both when memory is very short and when it is very long. The fact that the same approximation works for two opposite cases is not a paradox. The approximation (\ref{QLeL}) states that the orthogonal dynamics operator is very close to the full dynamics operator. In other words, the orthogonal dynamics, which evolve in a space orthogonal to that of the resolved variables, are insensitive to the coupling between resolved and unresolved variables. This can happen in particular when the orthogonal dynamics are very fast or when the orthogonal dynamics are very slow. The ansatz above should work when there is an effective decoupling of the equations for the resolved and unresolved variables. This raises the question of what determines the range of the memory. Is it possible to have a reduced model with very short or very long memory, depending on how one coarse-grains a particular system at hand? In \cite{stinis} evidence was presented that, for

810: the Kuramoto-Sivashinsky equation, the range of the memory of a reduced model can vary dramatically, depending on whether all the unstable modes in the system are resolved or not. The construction of a reduced model corresponds to renormalization, and the two extreme cases can be interpreted as two fixed points of a renormalization scheme. In which one a reduced model will end up depends on how one renormalizes. Finally, note that the Duhamel formula can be used for an iterative solution of the orthogonal dynamics equation. The term $e^{tL}$ is the zero-th order term of an iterative solution for $e^{tQL}.$ This construction can be based on the use of Feynman diagrams.

811:

812:

813: First we examine the case when the memory is short, i.e., when the

814: various terms in the series (\ref{expand_fin}) vanish for $s$ beyond a small value; see \cite{majda} for

815: a different approach to short-memory reduced model construction and \cite{stinis3} for comparison with the present short-memory approximation, as well as \cite{p} and the references therein.

816:

817: The memory term in the Mori-Zwanzig equations (\ref{eq:langevin}) can be rewritten as

818: \begin{equation}

819: \int_0^t e^{(t-s)L} \P L e^{s\Q L} \Q L x_j \,ds =

820: \int_0^t e^{(t-s)L} \P L \Q e^{s\Q L} \Q L x_j \,ds,

821: \end{equation}

822: where the insertion of the extra $\Q$ is harmless.

823: Adding and subtracting equal quantities, we find:

824: \begin{equation}

825: PLe^{sQL}QLx_j=PLQe^{sL} QLx_j + PLQ (e^{sQL}-e^{sL}) QLx_j;

826: \end{equation}

827: a Taylor series yields:

828: \begin{equation}

829: e^{sQL}-e^{sL}=I+sQL+\dots-I-sL-\dots=-sPL+O(s^2),

830: \end{equation}

831: and therefore, using $QP=0$, we find:

832: \begin{equation}

833: \int_0^t e^{(t-s)L} P L e^{sQL} Q L x_j \,ds =

834: \int_0^t e^{(t-s)L} P L Q e^{sL} Q L x_j\,ds + O(t^3).

835: \end{equation}
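To see where the $O(t^3)$ comes from (a short unpacking of the step above, with no assumption beyond formal Taylor expansion), note that the $O(s)$ part of the correction vanishes identically:
\[
PLQ(e^{sQL}-e^{sL})QLx_j = -s\,PL\,QP\,LQLx_j+O(s^2)=O(s^2),
\]
because $QP=0$. The integrand of the memory term is therefore changed by $O(s^2)$, and $\int_0^tO(s^2)\,ds=O(t^3)$.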

836: If $P$ is a finite rank projection then

837: \begin{equation}

838: P L e^{sQL} Q L x_j =

839: \sum_{k} (L Q e^{sQL} Q  L x_j, h_k) h_k(\hatx),

840: \end{equation}

841: where, as before, one can write $(LQe^{sQL}QLx_j, h_k)$ as $-(e^{sQL}QLx_j, QLh_k)$ when $L$ is skew-symmetric.

842: If the correlations $(e^{sQL}QLx_j,QLh_k)$  and also the correlations $(e^{sL}QLx_j,QLh_k)$ are significant only

843: over short times $s$, the approximation (\ref{QLeL}) provides an

844: acceptable approximation without requiring the solution of the

845: orthogonal dynamics equation (see \cite{stinis} for an application to the dimensional reduction of

846: the Kuramoto-Sivashinsky equation and \cite{barber} for an application to molecular dynamics).

847:

848: The limiting case of the short-memory approximation is when the correlations are delta functions. There is a large literature on solving

849: equations (\ref{eq:langevin}) with the

850: assumption of delta function memory; usually this is done without explicit mention, as if it

851: were an obvious property of stochastic systems, an astonishing state of affairs

852: nearly 40 years after Alder and Wainwright demonstrated the long memory

853: in a typical physical system \cite{a1}. All the dynamic (i.e., time-dependent)

854: renormalization group methods we can find depend on this assumption \cite{hohenberg}, and this remark goes a long way towards

855: explaining their relative lack of success in applications. We will no longer bother making

856: detailed comparisons with this dynamic renormalization literature; the point of view here is that reduction on the

857: basis of equations (\ref{eq:langevin}) is the right kind of renormalization, and anything with added drastic assumptions must be justified by appeal to that right kind.

858:

859:

860: Nevertheless, there are important circumstances where the very short memory assumption can be justified,

861: in particular in problems with separation of time scales, where the components of $\tilde\varphi(t)$,

862: the unresolved variables, vary on much faster scales than the resolved variables (see e.g. \cite{majda},\cite{stinis3}).

863: One can then set

864: \begin{equation}

865: e^{tQL}QLx_j=A_jw_j'(t),

866: \label{assume}

867: \end{equation}

868: where the prime denotes a derivative, the $w_j(t)$ are independent unit Brownian motions,

869: and the $A_j$ constants that must be derived from some prior knowledge.

870: Assume further that the projection $P$ is well represented by the physicists' ``linear" projection and that the density used to perform the projections is invariant.

871: The memory kernel becomes $-A_j^2\delta(t-s)$, and equations (\ref{eq:langevin}) become stochastic ordinary

872: differential equations of the usual kind. As usual (see e.g. \cite{just}), the

873: corresponding probability densities can be found via Fokker-Planck formalisms (or Kolmogorov

874: equations, in mathematicians' language). Everything is easier. There is a big literature on

875: these methods which we recoil from surveying.
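As an illustration, consider the following sketch, which rests on additional assumptions that are not in the text: a single resolved variable $x_1$ with $(x_1,x_1)=1$, the linear projection of Section \ref{fd}, and the convention $\int_0^t\delta(s)f(t-s)\,ds=\frac12 f(t)$ for a delta function located at the endpoint of the interval of integration. The noise covariance is then $(e^{sQL}QLx_1,QLx_1)=A_1^2\delta(s)$ and equation (\ref{lmz}) reduces to
\[
\frac{d}{dt}\phi_1 = \left[(Lx_1,x_1)-\tfrac12A_1^2\right]\phi_1+A_1w_1'(t),
\]
a linear stochastic differential equation in which the damping coefficient $\frac12A_1^2$ is fixed by the amplitude of the noise. When $(Lx_1,x_1)=0$, as is the case for skew-symmetric $L$, the stationary variance of this process is $A_1^2/(2\cdot\frac12A_1^2)=1$, consistent with the normalization $(x_1,x_1)=1$.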

876:

877:

878: It is often the case that the quantities of interest are the components of $E[\hat\varphi|\hat x]$, and the corresponding projection $P$ is in general poorly approximated by the ``linear" projection. The formalism above readily extends to more general projections, with more terms in the basis chosen in the range of $P$ (see e.g. \cite{CHK3}), as long as one assumes that the temporal correlations of the new terms are fast decaying functions. Terms that have long correlation

879: times violate the ansatz (\ref{QLeL}) and can hamper rather than enhance accuracy (see e.g. \cite{stinis}). A way to pick the fast decaying terms in the projection of the memory kernel for problems that exhibit separation of time scales was presented in \cite{stinis3}. We should note here that projections which include higher than linear terms are at the heart of mode-coupling theory (see e.g. \cite{schofield}), which has proved very effective in tackling problems in condensed matter physics.

880:

881:

882:

883:

884:

885: We examine now the validity of the ansatz $e^{tQL}\cong e^{tL}$ for cases with slowly decaying memory. Write the memory term in the Mori-Zwanzig equation (\ref{eq:langevin}) as

886: \begin{align*}

887: \int_0^t e^{(t-s)L}PLe^{sQL}QLx_jds &=\int_0^t Le^{(t-s)L}

888: e^{sQL}QLx_jds \\

889: &-\int_0^t e^{(t-s)L}e^{sQL}QLQLx_jds ,

890: \end{align*}

891: where we have used the commutation of $L$ and $QL$ with $e^{tL}$ and $e^{sQL},$

892: respectively. At this point, make the approximation (\ref{QLeL}), which

893: eliminates

894: the $s$ dependence of both integrands and we have

895: $$\int_0^t e^{(t-s)L}PLe^{sQL}QLx_jds \cong t e^{tL} PLQLx_j.$$

896: All that remains of the integration in time is the coefficient $t$.

897: One can get rid of the noise term by premultiplying equations (\ref{eq:langevin}) by a projection $\P$, as in equation (\ref{eq:langevin_pro}), and obtain a reduced non-autonomous set of differential equations. This approximation was named the $t$-model in \cite{CHK3} (see \cite{ingerman} for an application to the dimensional reduction of a nonlinear Schr\"odinger equation). Other cases where non-Markovian models can be approximated

898: by Markovian equations with time-dependent coefficients can be found in \cite{raz}.
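The step leading to the $t$-model can be written out as follows; this only unpacks the computation above under the approximation (\ref{QLeL}). Replacing $e^{sQL}$ by $e^{sL}$ and using $e^{(t-s)L}e^{sL}=e^{tL}$ gives
\begin{align*}
Le^{(t-s)L}e^{sQL}QLx_j-e^{(t-s)L}e^{sQL}QLQLx_j &\cong e^{tL}LQLx_j-e^{tL}QLQLx_j\\
&= e^{tL}(L-QL)QLx_j = e^{tL}PLQLx_j,
\end{align*}
which is independent of $s$, so that the integral over $(0,t)$ contributes only the factor $t$.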

899:

900:

901: We proceed to examine the order of accuracy of this approximation. We have

902:

903: \begin{multline*}

904: \int_0^t e^{(t-s)L}PLe^{sQL}QLx_jds- t e^{tL} PLQLx_j = \\

905: \int_0^t [e^{(t-s)L}PLe^{sQL}-e^{tL} PL]QLx_jds.

906: \end{multline*}

907: Adding and subtracting equal quantities we find

908:

909: $$ e^{(t-s)L}PLe^{sQL}=e^{tL}PL+e^{tL}[e^{-sL}PLe^{sQL}-PL],$$

910: and a Taylor series around $s=0$ gives

911: \begin{equation}\label{t-mod}

912: e^{-sL}PLe^{sQL}-PL =(I-sL+\ldots)PL(I+sQL+\ldots)-PL=O(s).

913: \end{equation}

914: This implies

915: $$\int_0^t e^{(t-s)L}PLe^{sQL}QLx_jds=t e^{tL} PLQLx_j + O(t^2).$$

916: The $O(t^2)$ error estimate can be put into perspective by examining an alternate derivation of the $t$-model. If we expand the integrand of

917: the memory term of the Mori-Zwanzig equation around $s=0$ and retain only

918: the leading term, we find

919: \begin{align*}

920: \int_0^t e^{(t-s)L}PLe^{sQL}QLx_jds &= \int_0^t [e^{tL}PLQLx_j

921: +O(s)]ds\\

922: &=t e^{tL} PLQLx_j +O(t^2).

923: \end{align*}

924: If we retain only the leading term, we do not keep any information about

925: the time evolution of the integrand, which in turn means

926: no information about the evolution of the resolved component and of the

927: coupling to the orthogonal dynamics (through the term

928: $(LQe^{sQL}QLx_j,h_k)$). Such a drastic approximation is expected to be appropriate in cases where the memory term integrand is slowly decaying, so that information about its initial value is enough.

929:

930:

931:

932: As an example, consider again the Hald model whose Hamiltonian is

933: \begin{equation}

934: H(\phi) = \frac{1}{2} (\phi_1^2 + \phi_2^2 + \phi_3^2 + \phi_4^2 + \phi_1^2 \phi_3^2).

935: \end{equation}

936: The resulting equations of motion are:

937: \begin{align*}

938: \frac{d\phi_1}{dt} &= \phi_2 \nonumber \\

939: \frac{d\phi_2}{dt} &= -\phi_1(1 + \phi_3^2) \nonumber \\

940: \frac{d\phi_3}{dt} &= \phi_4 \nonumber \\

941: \frac{d\phi_4}{dt} &= -\phi_3(1 + \phi_1^2).

942: \end{align*}

943: Suppose one wants to solve only for $\hat\phi=(\phi_1,\phi_2)$, with initial data

944: $\hatx=(x_1,x_2)$. Assume the initial data $x_3,x_4$ are sampled from a canonical

945: density with temperature $T=1$. A quick calculation yields $E[x_3^2|x_1,x_2]=1/(1+x_1^2)$.

946: The advance in time described by the multiplication by $e^{tL}$ requires just the

947: substitution $\hatx\rightarrow\hat\phi$. If one commutes the nonlinear function evaluation and

948: the conditional averaging, i.e., writes $\P f(\hat\phi)=f(\P\hat\phi)$ (a ``mean-field

949: approximation"), and writes furthermore $\Phi(t)=\P\hat\phi=E[\hat\phi|\hatx]$, one finds

950: $\P e^{tL}PLx_1=\Phi_2$, $\P e^{tL}PLx_2=-\Phi_1(1+1/(1+\Phi_1^2))$; one can calculate

951: $\P e^{tL}PLQLx_j$ for $j=1,2$ and finally one finds:

952:

953: \begin{align}

954: \frac{d}{dt}\Phi_1 &=\Phi_2 \nonumber \\

955: \frac{d}{dt} \Phi_2 &=-\Phi_1 (1 + \frac{1}{1 + \Phi_1^2}) -

956: 2 t \frac{\Phi_1^2 \Phi_2}{(1 + \Phi_1^2)^2}.

957: \label{eq:hald_t}

958: \end{align}

959:

960: The last term represents the damping due to the loss of predictive power

961: of partial data; the coefficient of the last term increases in time and one may

962: worry that this last term eventually overpowers the equations and leads to some

963: odd behavior. This is not the case. Indeed, one can prove the following. If the system

964: one starts from, equation (\ref{eq:system}), is Hamiltonian with Hamiltonian $H$, and if the

965: initial data are sampled from an initial canonical density conditioned by partial data $\hat x$,

966: and if $\hat H$ is the renormalized Hamiltonian (in the sense of Section \ref{ave}), then

967: $(d/dt)\hat H \le0$, showing that the components of $\hat\phi$ decay as they should.

968: The proof requires a technical assumption (that the Hamiltonian $H$ can be written

969: as the sum of a function of $p$ and a function of $q$, a condition commonly satisfied) and

970: we omit it (see \cite{CHK3}). The reduced system (\ref{eq:hald_t}) was solved numerically in \cite{CHK3}

971: with gratifying results.
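The coefficients in (\ref{eq:hald_t}) can be recovered by a direct computation, sketched here for the reader's convenience; the intermediate steps are a reconstruction and should be checked against \cite{CHK3}. With $T=1$ the conditional density of $(x_3,x_4)$ given $\hatx$ is proportional to $\exp(-\frac12x_3^2(1+x_1^2)-\frac12x_4^2)$, so that $E[x_3^2|\hatx]=1/(1+x_1^2)$ and $E[x_3x_4|\hatx]=0$. Then $QLx_1=0$ and
\begin{align*}
QLx_2 &= -x_1\left(x_3^2-\frac{1}{1+x_1^2}\right),\\
LQLx_2 &= x_2\left(-x_3^2+\frac{1-x_1^2}{(1+x_1^2)^2}\right)-2x_1x_3x_4,\\
PLQLx_2 &= x_2\left(-\frac{1}{1+x_1^2}+\frac{1-x_1^2}{(1+x_1^2)^2}\right)=-\frac{2x_1^2x_2}{(1+x_1^2)^2}.
\end{align*}
Multiplying by $t$, substituting $\hatx\rightarrow\hat\phi$ and applying the mean-field approximation gives the last term in (\ref{eq:hald_t}).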

972:

973:

974: The $t$-model is the zero-th order term in a Taylor expansion (around $s=0$) of the integrand of the memory term in (\ref{eq:langevin}). However, nothing prevents us from keeping more terms in this expansion. Let $$K(\hat{\varphi}(t-s),s)=e^{(t-s)L}PLe^{sQL}QLx_j$$ and expand $K$ around $s=0$, i.e. $$K(\hat{\varphi}(t-s),s)=K(\hat{\varphi}(t),0)+s\frac{\partial K}{\partial s}|_{s=0}+\frac{1}{2}s^2 \frac{\partial^2 K}{\partial s^2}|_{s=0}+O(s^3).$$ In the case when $P$ is the finite-rank projection and the density used to define the projection is invariant, the derivatives of $K$ at $s=0$ are equal-time (static) correlations. In mode-coupling theory, such expressions are known as sum rules. One can assume a functional form for the memory term integrand around $s=0$, e.g. a Gaussian $a e^{-bs^2},$ and use the derivatives of $K$ at $s=0$ to estimate $a,b$ (see \cite{pomeau} for more on sum rules and mode-coupling theory).
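For instance, here is a sketch with hypothetical notation: let $\kappa(s)$ stand for a scalar memory kernel such as $(LQe^{sQL}QLx_j,h_k)$, and assume that it is even in $s$, so that $\kappa'(0)=0$. Matching the Gaussian ansatz to the first two nonvanishing Taylor coefficients,
\[
ae^{-bs^2}=a-abs^2+O(s^4)\quad\mbox{and}\quad \kappa(s)=\kappa(0)+\tfrac12\kappa''(0)s^2+O(s^3),
\]
gives $a=\kappa(0)$ and $b=-\kappa''(0)/(2\kappa(0))$; a decaying kernel requires $\kappa''(0)/\kappa(0)<0$.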

975:

976:

977:

978:

979:

980: \section{Intermediate-range memory}\label{long}

981:

982: There are intermediate cases where the memory is sufficiently long-range for the short-memory approximation to break down, yet not so slowly decaying that the $t$-model can give accurate results. At present, it is not known how to deal effectively with such cases. In a series of papers \cite{CHK}-\cite{CHK3} we presented special cases and their solutions. In particular in \cite{CHK3} we presented a detailed analysis of the

983: Hald system. We showed that the memory decays roughly at the same rate as the solution itself

984: (this is the general case in the absence of separation of scales). We expanded the various correlation functions at equilibrium (i.e., when there are no resolved variables) in Hermite

985: polynomials, evaluated the coefficients in the expansions by Monte-Carlo once and for all, and then obtained

986: a system of integro-differential approximations to equations (\ref{eq:langevin}) which we then solved

987: in various cases. This is a legitimate procedure which may be useful when the same system of equations has to be

988: solved repeatedly.

989: These calculations do exhibit a salient feature of model reduction in time-dependent problems, which is that its set-up costs are often very high.

990: The future remedy, if there is one, will surely lie in a deeper understanding of dynamical renormalization and in particular of the

991: way memory depends on scale.

992:

993:

994: \section{Acknowledgements} We would like to thank Prof. G.I. Barenblatt, Prof. O. Hald and Prof. R. Kupferman for many helpful

995: discussions and comments.

996: This work was supported in part by

997: the National Science Foundation under Grant DMS 04-32710, and by the Director,

998: Office of Science, Computational and Technology Research,

999: U.S.\ Department of Energy under Contract No.\ DE-AC03-76SF000098.

1000:

1001:

1002: \begin{thebibliography}{99}

1003:

1004: \bibitem{a1} B. Alder and T. Wainwright, Decay of the velocity correlation function,

1005: Phys. Rev. A 1, (1970), pp. 1-12.

1006:

1007: \bibitem{barber}

1008: J. Barber, Application of optimal prediction to molecular dynamics, PhD thesis, 2005,

1009: UC Berkeley Physics Dept.

1010:

1011:

1012: \bibitem{barenblatt}

1013: G.I. Barenblatt, Scaling. Cambridge University Press, Cambridge, 2002.

1014:

1015:

1016: \bibitem{barenblatt3}

1017: G.I. Barenblatt, M. Ivanov, and G.I. Shapiro,

1018: On the structure of wave fronts in nonlinear dissipative media.

1019: Arch. Rat. Mech. Anal. 87 (1985), pp. 293-303.

1020:

1021: \bibitem{benettin}

1022: G. Benettin, C. di Castro, G. Jona-Lasinio, L. Peliti and A. Stella,

1023: On the equivalence of different renormalization groups,

1024: in ``New developments in quantum theory and statistical mechanics",

1025: Cargese Conf. Theor. Physics, M. Levy and P. Mitter (eds), Springer, NY,

1026: (1976).

1027:

1028: \bibitem{benfatto}

1029: G. Benfatto and G. Gallavotti, Renormalization group,

1030: Physics notes Vol. 1, Princeton University Press, Princeton NJ (1995).

1031:

1032:

1033:

1034:

1035: \bibitem{bona}

1036: J. Bona and M. Schonbek, Travelling-wave solutions to the Korteweg-de Vries-Burgers

1037: equation. Proc. Roy. Soc. Edinburgh 101A (1985), pp. 207-226.

1038:

1039: \bibitem{brandt}A. Brandt and D. Ron, Renormalization Multigrid (RMG):

1040: Statistically Optimal Renormalization Group Flow and Coarse-to-Fine Monte Carlo Acceleration, J. Stat. Phys. (2001) 102(1-2),

1041: pp. 231-257.

1042:

1043: \bibitem{kevrekidis2}

1044: L. Chen, P. Debenedetti, C. Gear and I. Kevrekidis, From molecular dynamics to coarse self-similar

1045: solutions: a simple example using equation-free computation, J. Non-Newt. Fluid. Mech. (2004), 120, 215.

1046:

1047: \bibitem{chorin9}

1048: A.J. Chorin,

1049: Conditional expectations and renormalization, Multiscale Modeling and

1050: Simulation,  1 (2003) pp. 105-118.

1051:

1052: \bibitem{chorin10}

1053: A.J. Chorin,

1054: Averaging and renormalization for the Korteveg-deVries-Burgers equation,

1055: Proc. Nat. Acad. Sci. 100, (2003), pp. 9674-9679.

1056:

1057: \bibitem{c11}

1058: A.J. Chorin, Stochastic Tools for Mathematics and Science, American Math. Society, Providence RI (2005).

1059:

1060: \bibitem{CHK}

1061: A.J. Chorin, O. Hald and R. Kupferman,

1062: Optimal prediction and the Mori-Zwanzig representation of irreversible

1063: processes. Proc. Nat. Acad. Sci. USA, 97, (2000),

1064: pp. 2968-2973.

1065:

1066: \bibitem{CHK2}

1067: A.J. Chorin, O. Hald and R. Kupferman,

1068: Non-Markovian optimal prediction, Monte Carlo Meth. Appl., 7, (2001), pp. 99-109.

1069:

1070:

1071: \bibitem{CHK3}

1072: A.J. Chorin, O. Hald and R. Kupferman,

1073: Optimal prediction with memory,

1074: Physica D 166, (2002), pp. 239-257.

1075:

1076: \bibitem{CHK04}

1077: A.J. Chorin, O. Hald and R. Kupferman,

1078: Prediction from partial data, renormalization and averaging, J. Sci. Comp. (2005), (in press).

1079:

1080: \bibitem{CKK1}

1081: A.J. Chorin, A. Kast and R. Kupferman,

1082: Optimal prediction of underresolved dynamics, Proc. Nat. Acad. Sci. USA (1998), 95, 4094.

1083:

1084: \bibitem{CKL}

1085: A.J. Chorin, R. Kupferman and D. Levy,

1086: Optimal prediction for Hamiltonian partial differential equations, J. Comp. Phys. (2000), 162, pp. 267-297.

1087:

1088:

1089: \bibitem{evans}

1090: D. Evans and G. Morriss, Statistical Mechanics of Nonequilibrium Liquids,

1091: Academic, London, 1990.

1092:

1093:

1094: \bibitem{fick}

1095: E. Fick and G. Sauermann, The Quantum Statistics of Dynamical Processes,

1096: Springer, Berlin, 1990.

1097:

1098:

1099: \bibitem{fisher}

1100: M. Fisher, Renormalization group theory, its basis and formulation in statistical physics,

1101: Rev. Mod. Phys., 70, (1998), pp. 653-681.

1102:

1103:

1104:

1105: \bibitem{givon1}

1106: D. Givon, R. Kupferman and A. Stuart, Extracting macroscopic dynamics: model problems and

1107: algorithms, Nonlinearity 17 (2004), pp. R55-R127.

1108:

1109:

1110: \bibitem{goldenfeld1}

1111: N. Goldenfeld, Lectures on Phase Transitions and the Renormalization Group,

1112: Perseus Books, Reading, Mass., 1992.

1113:

1114:

1115: \bibitem{grabert}

1116: H. Grabert, Projection Operator Techniques in Nonequilibrium Statistical

1117: Mechanics, Springer, Berlin, 1982.

1118:

1119:

1120:

1121: \bibitem{hohenberg}

1122: P. Hohenberg and B. Halperin, Theory of dynamical critical phenomena, Rev. Mod. Phys., 49,

1123: (1977), pp. 435-479.

1124:

1125:

1126: \bibitem{ingerman}

1127: E. Ingerman, Modeling the loss of information in optimal prediction, PhD thesis, 2003,

1128: UC Berkeley Mathematics Dept.

1129:

1130:

1131: \bibitem{jl} G. Jona-Lasinio, The renormalization group: a probabilistic view,

1132: Nuovo Cimento, 26 (1975), pp. 99-118.

1133:

1134:

1135:

1136: \bibitem{just}

1137: W. Just, H. Kantz, C. Roedenbeck and M. Helm,

1138: Stochastic modeling: replacing the fast degrees of freedom by noise,

1139: J. Phys. A: Math. Gen. 34 (2001), pp. 3199-3213.

1140:

1141: \bibitem{kadanoff}

1142: L. Kadanoff, Statistical Physics: Statics, Dynamics, and Renormalization,

1143: World Scientific, Singapore, 2000.

1144:

1145:

1146:

1147: \bibitem{raz}

1148: R. Kupferman, Fractional kinetics in Kac-Zwanzig heat bath models, J. Stat. Phys. 114 (2004),

1149: pp. 291-326.

1150:

1151:

1152: \bibitem{landau}

1153: L. Landau and E.M. Lifshitz, Statistical Physics, Part 1, Butterworth-Heinemann, 1980.

1154:

1155: \bibitem{moser}

1156: J. Langford and R. Moser, Optimal LES formulations for isotropic turbulence, J. Fluid. Mech.

1157: (1999) 398, pp. 321-346.

1158:

1159: \bibitem{majda}

1160: A. Majda, I. Timofeyev and E. Vanden Eijnden,

1161: A mathematical framework for stochastic climate models, Comm. Pure

1162: Appl. Math., 54 (2001), pp. 891-974.

1163:

1164: \bibitem{mori}

1165: H. Mori, Transport, collective motion and Brownian motion, Prog. Theor. Phys. (1965) 33,

1166: pp. 423-450.

1167:

1168: \bibitem{zwanzig2}

1169: S. Nordholm and R. Zwanzig, A systematic derivation of exact generalized Brownian

1170: motion theory, J. Stat. Phys., (1975) 13(4), pp. 347-371.

1171:

1172:

1173: \bibitem{p} G. Papanicolaou, Asymptotic analysis

1174: of stochastic equations, in Studies in Probability Theory, M. Rosenblatt (ed.),

1175: Studies in Mathematics, vol. 18, Math. Assoc. Am. (1978).

1176:

1177: \bibitem{pomeau}

1178: Y. Pomeau and P. Resibois, Time dependent correlation functions and mode-mode coupling theories, Physics Reports C (1975) 2, pp. 63-139.

1179:

1180:

1181: \bibitem{seibold}

1182: B. Seibold, Optimal prediction in molecular dynamics, Monte Carlo Meth. Appl. (2004) 10(1), pp. 25-50.

1183:

1184: \bibitem{stanley}

1185: H. E. Stanley, Scaling, universality and renormalization, three pillars of modern

1186: critical phenomena, Rev. Mod. Phys., 71 (1999), pp. S358-S366.

1187:

1188:

1189: \bibitem{stinis}

1190: P. Stinis, Stochastic optimal prediction for the Kuramoto-Sivashinsky

1191: equation, Mult. Scale. Simul. 5 (2004), pp. 580-612.

1192:

1193: \bibitem{stinis2}

1194: P. Stinis, A maximum likelihood algorithm for the estimation and renormalization of

1195: exponential densities, J. Comp. Phys. (2005) (in press).

1196:

1197: \bibitem{stinis3}

1198: P. Stinis, A comparative study of two stochastic mode reduction methods, Physica D (2004) (submitted).

1199:

1200:

1201: \bibitem{swendsen}

1202: R. Swendsen, Monte-Carlo renormalization group, Phys. Rev. Lett. 42 (1979),

1203: pp. 859-861.

1204:

1205: \bibitem{kevrekidis}

1206: K. Theodoropoulos, Y.-H. Qian and I.G. Kevrekidis, ``Coarse'' stability and bifurcation

1207: analysis using timesteppers: a reaction diffusion example, Proc. Natl. Acad. Sci. (2000), 97(18),

1208: pp. 9840-9843.

1209:

1210: \bibitem{schofield}

1211: R. van Zon and J. Schofield, Mode-coupling theory for multiple-point and multiple-time correlation functions, Phys. Rev. E (2002) 65, 011106.

1212:

1213: \bibitem{zwanzig}

1214: R. Zwanzig, Nonlinear generalized Langevin equations, J. Stat. Phys., 9, (1973),

1215: pp. 215-220.

1216:

1217:

1218:

1219:

1220:

1221:

1222:

1223:

1224: \end{thebibliography}

1225: \end{document}

1226: