
1: % The Organization of Intrinsic Computation: Complexity-Entropy Diagrams

2: % dpf: 12/06/02

3: % jpc: 12/11/02, 12/14/02

4: %  cm: 12/18/02

5: % dpf: 12/11/03, dynamics, CAs, Ising sections only

6: % jpc: 2/11/04, 6/18/04

7: % dpf: 8/22/04

8: % jpc: 2/05/06

9: % dpf: 2/09/06

10: % jpc: 3/04/06

11: % dpf: 3/07/08:  edits throughout.  updated Ising figures.  removed

12: %                comp mech discussion

13: % jpc: 3/19/08, 5/14/08, 6/23/08, submitted to CHAOS

14:

15: \documentclass[superscriptaddress,twocolumn,showpacs,preprintnumbers,floatfix]{revtex4}

16: %\documentclass[12pt]{article}

17:

18: \usepackage{url}

19: \usepackage{graphics}

20: \usepackage{amsfonts}

21: \usepackage{amsmath}

22: \usepackage{amssymb}

23: \usepackage{epsf}

24:

25: \input{cmechabbrev.tex}

26:

27: %\oddsidemargin=0.0in

28: %\evensidemargin=0.0in

29:

30: %\topmargin=-0.16in

31: % NOTE:  dpf needs this or else the text is truncated when printing

32:

33: %\textwidth=6.5in

34: %\textheight=8.74in

35: %\headsep=0in

36: %\parindent=.5in

37: %\footskip=.20in

38:

39: \newcommand{\mc}[1]{\mathcal{#1}}

40:

41: \begin{document}

42:

43: \title{The Organization of Intrinsic Computation:\\

44: Complexity-Entropy Diagrams and\\

45: the Diversity of Natural Information Processing}

46:

47: \author{David P. Feldman}

48: \email{dave@hornacek.coa.edu}

49: \affiliation{College of the Atlantic, Bar Harbor, ME 04609}

50: \affiliation{Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501}

51: \affiliation{Complexity Sciences Center and Physics Department,

52: University of California, Davis, One Shields Ave, Davis CA 95616}

53:

54: \author{Carl S. McTague}

55: \email{c.mctague@dpmms.cam.ac.uk}

56: \affiliation{DPMMS, Centre for Mathematical Sciences,

57: University of Cambridge, Wilberforce Road, Cambridge, CB3 0WB, England}

58: \affiliation{Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501}

59:

60: \author{James P. Crutchfield}

61: \email{chaos@cse.ucdavis.edu}

62: \affiliation{Complexity Sciences Center and Physics Department,

63: University of California, Davis, One Shields Ave, Davis CA 95616}

64: \affiliation{Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501}

65:

66: \date{\today}

67:

68: \begin{abstract}

69: Intrinsic computation refers to how dynamical systems store,

70: structure, and transform historical and spatial information. By

71: graphing a measure of structural complexity against a measure of

72: randomness, complexity-entropy diagrams display the range and

73: different kinds of intrinsic computation across an entire class of

74: systems.  Here, we use complexity-entropy diagrams to analyze intrinsic

75: computation in a broad array of deterministic nonlinear and linear

76: stochastic processes, including maps of the interval, cellular

77: automata and Ising spin systems in one and two dimensions, Markov

78: chains, and probabilistic minimal finite-state machines. Since

79: complexity-entropy diagrams are a function only of observed

80: configurations, they can be used to compare systems without reference

81: to system coordinates or parameters. It has been known for some time

82: that in special cases complexity-entropy diagrams reveal that high

83: degrees of information processing are associated with phase

84: transitions in the underlying process space, the so-called ``edge of

85: chaos''. Generally, though, complexity-entropy diagrams differ

86: substantially in character, demonstrating a genuine diversity of

87: distinct kinds of intrinsic computation.

88: \end{abstract}

89:

90: % Insert PACS numbers on next line

91: \pacs{

92: %02.50.Ey  % Stochastic processes

93: %02.50.Ga % Markov processes

94: 05.20.-y % Classical statistical mechanics

95: 05.45.-a  % Nonlinear dynamics and nonlinear dynamical systems

96: %05.45.Tp  % Time series analysis

97: %65.40.Gr % Thermodynamics of solids: Entropy and other thermodynamical

98:           % quantities

99: 89.70.+c % Information science

100: 89.75.Kd  % Complex Systems: Patterns

101: }

102:

103: \preprint{Santa Fe Institute Working Paper 08-06-XXX}

104: \preprint{arxiv.org:0806.XXXX [nlin.CD]}

105:

106: \keywords{structure, randomness, intrinsic computation, excess entropy,

107: entropy rate, statistical complexity, dynamical systems, spin systems,

108: cellular automata, epsilon-machines}

109:

110: \maketitle

111:

112: \bibliographystyle{unsrt}

113:

114: %\tableofcontents

115:

116: %\section*{}

117:

118: {\bf

119: Discovering organization in the natural world is one of science's central

120: goals. Recent innovations in nonlinear mathematics and physics, in concert

121: with analyses of how dynamical systems store and process information,

122: have produced a growing body of results on quantitative ways to measure

123: natural organization. These efforts had their origin in earlier investigations

124: of the origins of randomness. Eventually, however, it was realized that

125: measures of randomness do not capture the property of organization. This

126: led to the recent efforts to develop measures that are, on the one hand,

127: as generally applicable as the randomness measures but which, on the other,

128: capture a system's complexity---its organization, structure, memory, regularity,

129: symmetry, and pattern. Here---analyzing processes from dynamical systems,

130: statistical mechanics, stochastic processes, and automata theory---we

131: show that measures of structural complexity are a necessary and useful

132: complement to describing natural systems only in terms of their randomness.

133: The result is a broad appreciation of the kinds of information processing

134: embedded in nonlinear systems. This, in turn, suggests new physical substrates

135: to harness for future developments of novel forms of computation.

136: }

137:

138: \section{Introduction}

139:

140: The past several decades have produced a growing body of work on ways

141: to measure the organization of natural systems. (For early work, see,

142: e.g., Refs.~\cite{Crut83a,Shaw84,Wolf84,Benn86,Hube86,Gras86,Szep86,Erik87,Kopp87,Land88a,Lloy88,Lind88b,Szep89a,Crut89,Benn90,Crut90,Badi91a,Li91,Crut92c,Bate93a};

143: for more recent reviews, see

144: Refs.~\cite{Wack94,Ebel97b,Badi97,Feld98a,Feld98b,Bial01a,Crut03a,Shal01a}.)

145: The original interest derived from explorations, from the 1960s to

146: the mid-1980s, of behavior generated by nonlinear dynamical systems.

147: The thread that focused especially on pattern and structural complexity

148: originated, in effect, in attempts to reconstruct geometry \cite{Pack80},

149: topology \cite{Muld93a}, equations of motion \cite{Crut87a}, periodic orbits

150: \cite{Auer87a}, and stochastic processes \cite{Fras90b} from observations

151: of nonlinear processes.  More recently, developing

152: and using measures of complexity has been a concern of researchers

153: studying neural computation \cite{Tono94a,Wenn05a}, the clinical

154: analysis of patterns from a variety of medical signals and imaging

155: technologies \cite{Sapa98a,Marw02a,Youn05a}, and machine learning and

156: synchronization \cite{Bial00a,Neme00a,Crut01b,Debo04a,Feld04a},

157: to mention only a few contemporary applications.

158:

159: These efforts, however, have their origin in an earlier period in

160: which the central concern was not the emergence of organization,

161: but rather the origins of randomness. Specifically, measures were

162: developed and refined that quantify the degree of randomness and

163: unpredictability generated by dynamical systems. These

164: quantities---metric entropy, Lyapunov characteristic exponents,

165: fractal dimensions, and so on---now provide an often-used and well

166: understood set of tools for detecting and quantifying deterministic

167: chaos of various kinds. In the arena of stochastic processes,

168: Shannon's entropy rate predates even these and has been productively

169: used for half a century as a measure of an information source's degree

170: of randomness or unpredictability \cite{Cove91}.

171:

172: Over this long early history, researchers came to appreciate that

173: dynamical systems were capable of an astonishing array of behaviors

174: that could not be meaningfully summarized by the entropy rate or

175: fractal dimension.  The reason for this is that, by their definition,

176: these measures of randomness do not capture the property of

177: organization. This realization led to the considerable contemporary

178: efforts just cited to develop measures that are as generally

179: applicable as the randomness measures but that capture a system's

180: complexity---its organization, structure, memory, regularity,

181: symmetry, pattern, and so on.

182:

183: Complexity measures which do this are often referred to as {\em

184: statistical} or {\em structural complexities} to indicate that they

185: capture a property distinct from randomness. In contrast, {\em

186: deterministic complexities}---such as the Shannon entropy rate,

187: Lyapunov characteristic exponents, and the Kolmogorov-Chaitin

188: complexity---are maximized for random systems. In essence, they are

189: simply alternative ways of measuring the same property---degrees of

190: randomness. Here, we shall emphasize complexity of the structural

191: and statistical sort which measures a property complementary to

192: randomness.  We will demonstrate, across a broad range of model

193: systems, that measures of structural complexity are a necessary and

194: useful addition to describing a process in terms of its randomness.

195:

196:

197: \subsection{Structural Complexity}

198:

199: How might one go about developing a structural complexity measure? A typical

200: starting point is to argue that the structural complexity of a system

201: must reach a maximum between the system's perfectly ordered and perfectly

202: disordered extremes

203: \cite{Crut82b,Hube86,Gras86,Benn90,Crut89,Crut92c,Kopp87,Gell96}.

204: The basic idea behind these claims is that a system which is either

205: perfectly predictable (e.g., a periodic sequence) or perfectly

206: unpredictable (e.g., a fair coin toss) is deemed to have zero

207: structural complexity. Thus, the argument goes, a system with either

208: zero entropy or maximal entropy (usually normalized to one) has zero

209: complexity; these systems are simple and not highly structured. This

210: line of reasoning further posits that in between these extremes lies

211: complexity.  Those objects that we intuitively consider to be complex

212: must involve a continuous element of newness or novelty (i.e.,

213: entropy), but not to such an extent that the novelty becomes

214: completely unpredictable and degenerates into mere noise.

215:

216: In summary, then, it is common practice to require that a structural

217: complexity measure vanish in the perfectly ordered and perfectly

218: disordered limits. Between these limits, the complexity is usually

219: assumed  to achieve a maximum.  These requirements are often

220: taken as axioms from which one constructs a complexity measure

221: that is a single-valued function of randomness as measured by, say, entropy.

222: In both technical and popular scientific literatures, it is not uncommon

223: to find a ``complexity'' plotted against entropy in merely schematic form

224: as a sketch of a generic complexity function that vanishes for extreme

225: values of entropy and achieves a maximum in a middle region

226: \cite{Hube86,Atla91a,Gell94a,Flak99a}.  Several authors, in fact, have taken

227: these as the \emph{only} constraints defining complexity

228: \cite{Shin99,Lope95,Plas96,Calb01a,Lope01a}.

229:

230: Here we take a different approach: {\em We do not prescribe how

231: complexity depends on entropy.} One reason for this is that a useful

232: complexity measure needs to do more than satisfy the boundary

233: conditions of vanishing in the high- and low-entropy limits

234: \cite{Feld98a,Crut00a,Bind00}. In particular, a useful complexity

235: measure should have an unambiguous interpretation that accounts in

236: some direct way for how correlations are {\em organized} in a system.

237: To that end we consider a well defined and frequently used

238: complexity measure---the \emph{excess entropy}---and empirically

239: examine its relationship to entropy for a variety of systems.

240:

241:

242: \subsection{Complexity-Entropy Diagrams}

243:

244: The diagnostic tool that will be the focal point for our studies is

245: the {\em complexity-entropy diagram}. Introduced in

246: Ref.~\cite{Crut89}, a complexity-entropy diagram plots structural

247: complexity (vertical axis) versus randomness (horizontal axis) for

248: systems in a given model class. Complexity-entropy diagrams

249: allow for a direct view of the complexity-entropy relationship within

250: and across different systems. For example, one can easily read whether

251: or not complexity is a single-valued function of entropy.

252:

253: The complexity and entropy measures that we use capture a system's

254: {\em intrinsic computation} \cite{Crut92c}: how a system stores,

255: organizes, and transforms information.  A crucial point is that these

256: measures of intrinsic computation are properties of the system's

257: configurations. They do not require knowledge of the equations of

258: motion or Hamiltonian or of system parameters (e.g., temperature,

259: dissipation, or spin-coupling strength) that generated the configurations.

260: Hence, in addition to the many cases in which they can be calculated

261: analytically, they can be inductively calculated from observations of

262: symbolic sequences or configurations.

263:

264: Thus, a complexity-entropy diagram measures intrinsic computation

265: in a parameter-free way.  This allows for the direct comparison of

266: intrinsic computation across very different classes since a

267: complexity-entropy diagram expresses this in terms of common

268: ``information-processing'' coordinates.  As such, a complexity-entropy

269: diagram demonstrates how much a given resource (e.g., stored

270: information) is required to produce a given amount of randomness

271: (entropy), or how much novelty (entropy) is needed to produce a

272: certain amount of statistical complexity.

273:

274: Recently, a form of complexity-entropy diagram has been used in the

275: study of anatomical MRI brain images \cite{Youn05a,Youn08a}.  This

276: work showed that complexity-entropy diagrams give a reliable way

277: to distinguish between ``normal'' brains and those experiencing

278: cortical thinning, a condition associated with Alzheimer's disease.

279: Complexity-entropy diagrams have also recently been used as part of a

280: proposed test to distinguish chaos from noise \cite{Ross07a}.  And

281: Ref.~\cite{Mart06a} calculates complexity-entropy diagrams for a

282: handful of different complexity measures using the sequences generated

283: by the symbolic dynamics of various chaotic maps.

284:

285: Historically, one of the motivations behind complexity-entropy diagrams was

286: to explore the common claim that complexity achieves a {\em sharp} maximum

287: at a well defined boundary between the order-disorder extremes. This led,

288: for example, to the widely popularized notion of the ``edge of chaos''

289: \cite{Pack88a,Kauf93a,Lang90a,Wald92a,Ray94a,Melb00a,Bert03a,Bert04a}---namely,

290: that objects achieve maximum complexity at a \emph{boundary} between

291: order and disorder.

292: Although these particular claims have been criticized \cite{Mitc93}, during

293: the same period it was shown that at the \emph{onset of chaos} complexity

294: does reach a maximum. Specifically, Ref.~\cite{Crut89} showed that the

295: \emph{statistical complexity} diverges at the accumulation point of the

296: period-doubling route to chaos. This led to an analytical theory that

297: describes exactly the interdependence of complexity and entropy for this

298: universal route to chaos \cite{Crut90}. Similarly, another complexity measure,

299: the \emph{excess entropy} \cite{Crut83a,Shaw84,Gras86,Lind88b,Li91,Crut03a,Rate96a,Freu96a,Schu02a}

300: has also been shown to diverge at the period-doubling critical point.

301:

302: This latter work gave some hope that there would be a universal relationship

303: between complexity and entropy---that some appropriately defined measure of

304: complexity plotted against an appropriate entropy would have the same

305: functional form for a wide variety of systems. In part, the motivation for

306: this was the remarkable success of scaling and data collapse for critical

307: phenomena.  Data collapse is a phenomenon in which certain variables for very

308: different systems collapse onto a single curve when appropriately rescaled

309: near the critical point of a continuous phase transition. For example, the

310: magnetization and susceptibility exhibit data collapse near the

311: ferromagnet-paramagnet transition.  See, for example,

312: Refs.~\cite{Stan99a,Yeom92} for further discussion.

313: Data collapse reveals that different

314: systems---e.g., different materials with different critical

315: temperatures---possess a deep similarity despite differences in their details.

316:

317: The hope, then, was to find a similar universal curve for complexity as

318: a function of entropy. One now sees that this is not and, fortunately, cannot

319: be the case. Notwithstanding special parametrized examples, such as

320: period-doubling and other routes to chaos, a wide range of complexity-entropy

321: relationships exists \cite{Crut92c,Li91,Crut97a,Feld98a}. This is a point

322: that we will repeatedly reinforce in the following.

323:

324:

325: \subsection{Surveying Complexity-Entropy Diagrams}

326:

327: We will present a survey of the relationships between structure and

328: randomness for a number of familiar, well studied systems including

329: deterministic nonlinear and linear stochastic processes and well known

330: models of computation. The systems we study include maps of the

331: interval, cellular automata and Ising models in one and two

332: dimensions, Markov chains, and minimal finite-state machines. To our

333: knowledge, this is the first such cross-model survey of

334: complexity-entropy diagrams.

335:

336: The main conclusion that emerges from our results is that there is a large

337: range of possible complexity-entropy behaviors. Specifically, there is

338: not a universal complexity-entropy curve, there is not a general

339: complexity-entropy transition, nor is it the case that complexity-entropy

340: diagrams for different systems are even qualitatively similar. These results

341: give a concrete picture of the very different types of relationship between

342: a system's rate of information production and the structural organization

343: which produces that randomness. This diversity opens up a number

344: of interesting mathematical questions, and it appears to suggest a new

345: kind of richness in nature's organization of intrinsic computation.

346: %and to bode well for the future creation of new information

347: %technologies.

348:

349: Our exploration of intrinsic computation is structured as follows: In

350: Section \ref{Info.Theory.Review} we briefly review several

351: information-theoretic quantities, most notably the entropy rate and

352: the excess entropy. In Section \ref{Comp.Ent.Section} we present results

353: for the complexity-entropy diagrams for a wide range of model systems.

354: %: the

355: %logistic map, the tent map, one- and two-dimensional Ising models; one-

356: %and two-dimensional cellular automata; Markov chains; and topological

357: %Markov processes, which are related to finite-state models from computation

358: %theory.

359: In Section \ref{Discussion} we discuss our results, make a number

360: of general comments and observations, and conclude by summarizing.

361:

362:

363: % **********************************************************************

364: \section{Entropy and Complexity Measures}

365: \label{Info.Theory.Review}

366:

367: \subsection{Information-Theoretic Quantities}

368:

369: The complexity-entropy diagrams we will examine make use of two

370: information-theoretic quantities: the excess entropy and the entropy

371: rate.  In this section we fix notation and give a brief but

372: self-contained review of them.

373:

374: We begin by describing the stochastic process generated by a system.

375: Specifically, we are interested here in describing the character of

376: bi-infinite, one-dimensional sequences:

377: $\BiInfinity = \ldots,  S_{-2}, S_{-1}, S_0, S_1, \ldots$, where

378: the $S_i$'s are random variables that assume values $s_i$ in a

379: finite alphabet $\mathcal{A}$. Throughout, we follow the standard

380: convention that a lower-case letter refers to a particular value of

381: the random variable denoted by the corresponding upper-case letter.

382: In the following, the index $i$ on the $S_i$'s will refer to either

383: space or time.

384:

385: A \emph{process} is, quite simply, the distribution over all possible

386: sequences generated by a system: $\Prob(\BiInfinity)$. Let $\Prob(s_i^L)$

387: denote the probability that a block $S_i^L = S_i S_{i+1} \ldots S_{i+L-1}$

388: of $L$ consecutive symbols takes on the particular values

389: $s_i, s_{i+1}, \ldots , s_{i+L-1} \in \mathcal{A}$. We will assume that

390: the distribution over blocks is stationary: $\Prob(S_i^L) =

391: \Prob(S_{i+M}^L)$ for all $i$, $M$, and $L$. And so we will drop the

392: index on the block probabilities.  When there is no confusion, then,

393: we denote by $s^L$ a particular sequence of $L$ symbols, and use

394: $\Prob(s^L)$ to denote the probability that the particular $L$-block

395: occurs.

396:

397: The \emph{support} of a process is the set of allowed sequences---i.e.,

398: those with positive probability. In the parlance of computation theory,

399: a process' support is a formal language: the set of all finite length

400: words that occur at least once in an infinite sequence.

401:

402: A special class of processes that we will consider in subsequent

403: sections are {\em Order-$R$ Markov Chains}.  These processes are those

404: for which the joint distribution can be conditionally factored into

405: words $S^R$ of length $R$---that is,

406: \begin{equation}

407: \Prob(\BiInfinity) = \ldots \Prob(S_i^R|S_{i-R}^R)

408:   \Prob(S_{i+R}^R|S_i^R) \Prob(S_{i+2R}^R|S_{i+R}^R) \ldots \;.

409: \label{R.Markovian}

410: \end{equation}

411: In other words, knowledge of the current length-$R$ word is all that

412: is needed to determine the distribution of future symbols.  As a

413: result, the states of the Markov chain are associated with the

414: $|\mathcal{A}|^R$ possible values that can be assumed by a length-$R$

415: word.

416:
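As a minimal illustration (the alphabet and order here are chosen only for
concreteness), take a binary alphabet $\mathcal{A} = \{0,1\}$ and $R = 2$.
The chain then has $|\mathcal{A}|^R = 4$ states, one for each of the words
$00$, $01$, $10$, and $11$, and the order-$2$ Markov property says that
conditioning on anything beyond the two most recent symbols does not change
the distribution of the next one:
\begin{equation*}
\Prob(S_{i+2} | \ldots, S_{i-1}, S_i, S_{i+1})
  \, = \, \Prob(S_{i+2} | S_i, S_{i+1}) \;.
\end{equation*}
Such a process is therefore specified by four numbers, one
next-symbol probability for each state.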

417: We now briefly review several central quantities of information theory

418: that we will use to develop measures of unpredictability and entropy.

419: For details see any textbook on information theory; e.g.,

420: Ref.~\cite{Cove91}.  Let $X$ be a random variable that assumes the

421: values $x \in {\cal X}$, where ${\cal X}$ is a finite set. The probability

422: that $X$ assumes the value $x$ is given by $\Prob(x)$. Also, let $Y$

423: be a random variable that assumes values $y\in {\cal Y}$.

424:

425: The \emph{Shannon entropy} of the variable $X$ is given by:

426: \begin{equation}

427: H[X] \, \equiv \,  - \sum_{x \in {\cal X}} \Prob(x)

428: \log_2 \Prob(x) \; .

429: \end{equation}

430: The units are given in \emph{bits}. This quantity measures the uncertainty

431: associated with the random variable $X$. Equivalently, $H[X]$ is also

432: the average amount of memory needed to store outcomes of variable $X$.

433:
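As a quick worked example (added for concreteness; the probabilities are
illustrative), let $X$ be a binary variable. For a fair coin, with both
outcomes having probability $1/2$,
\begin{equation*}
H[X] \, = \, -\tfrac{1}{2} \log_2 \tfrac{1}{2}
  - \tfrac{1}{2} \log_2 \tfrac{1}{2} \, = \, 1 \; {\rm bit} \;,
\end{equation*}
whereas for a biased coin whose outcomes have probabilities $1/4$ and $3/4$,
\begin{equation*}
H[X] \, = \, -\tfrac{1}{4} \log_2 \tfrac{1}{4}
  - \tfrac{3}{4} \log_2 \tfrac{3}{4} \, \approx \, 0.811 \; {\rm bits} \;.
\end{equation*}
The biased coin is more predictable, and its entropy is correspondingly
smaller.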

434: The \emph{joint entropy} of two random variables, $X$ and $Y$, is defined as:

435: \begin{equation}

436: H[X,Y] \, \equiv \, -\sum_{x \in {\cal X}, y \in {\cal Y} } \Prob(x,y)

437: \log_2 \Prob(x,y) \;.

438: \end{equation}

439: It is a measure of the uncertainty associated with the joint distribution

440: $\Prob(X,Y)$. The \emph{conditional entropy} is defined as:

441: \begin{equation}

442: H[X|Y]  \, \equiv \,- \sum_{x \in {\cal X}, y \in {\cal Y} }

443: \Prob(x,y) \log_2  \Prob(x|y) \;,

444: \end{equation}

445: and gives the average uncertainty of the conditional probability

446: $\Prob(X|Y)$. That is, $H[X|Y]$ tells us how uncertain, on

447: average, we are about $X$, given that the outcome of $Y$ is known.

448:

449: Finally, the \emph{mutual information} is defined as:

450: \begin{equation}

451: I [X;Y]  \, \equiv \, H[X] - H[X|Y] \;.

452: \label{MI.def}

453: \end{equation}

454: It measures the average reduction of uncertainty

455: of one variable due to knowledge of another.  If knowing $Y$ on

456: average reduces uncertainty about $X$, then it makes sense to say that

457: $Y$ carries information about $X$.  Note that $I[X;Y] = I[Y;X]$.

458:
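The symmetry of the mutual information can be verified in one step. Using
the chain rule $H[X,Y] = H[Y] + H[X|Y]$, which follows from the definitions
above by writing $\Prob(x,y) = \Prob(x|y) \Prob(y)$, Eq.~(\ref{MI.def})
becomes
\begin{equation*}
I[X;Y] \, = \, H[X] + H[Y] - H[X,Y] \;,
\end{equation*}
an expression that is unchanged when $X$ and $Y$ are exchanged. Two limiting
cases serve as a check: if $X$ and $Y$ are independent, then
$H[X,Y] = H[X] + H[Y]$ and $I[X;Y] = 0$; if $Y$ is a copy of $X$, then
$H[X,Y] = H[X]$ and $I[X;Y] = H[X]$.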

459: \subsection{Entropy Growth and Entropy Rate}

460:

461: With these definitions set, we are ready to develop an information-theoretic

462: measure of a process's randomness. Our starting point is to consider blocks

463: of consecutive variables. The \emph{block entropy} is the total Shannon

464: entropy of length-$L$ sequences:

465: \begin{equation}

466: H(L) \, \equiv \, - \sum_{ s^L \in {\cal A}^L} \Prob(s^L)

467:   \log_2 \Prob(s^L) \;,

468: \label{H.def}

469: \end{equation}

470: where $L > 0$. The sums run over all possible blocks of

471: length $L$. We define $H(0) \equiv 0$. The block entropy grows monotonically

472: with block length: $H(L) \geq H(L-1)$.

473:

474: For stationary processes the total Shannon entropy typically grows linearly

475: with $L$.  That is, for sufficiently large $L$, $H(L) \sim L$. This leads

476: one to define the \emph{entropy rate} $\hmu$ as:

477: \begin{equation}

478:    \hmu \, \equiv \,\lim_{L \rightarrow \infty} \frac{H(L)}{L }\;.

479: \label{hmu.def}

480: \end{equation}

481: The units of $\hmu$ are \emph{bits per symbol}.

482: This limit exists for all stationary sequences \cite[Chapter

483:   4.2]{Cove91}. The entropy rate is also known as the \emph{metric

484:   entropy} in dynamical systems theory and is equivalent to the

485: \emph{thermodynamic entropy density} familiar from equilibrium

486: statistical mechanics.

487:

488: The entropy rate can be given an additional interpretation as

489: follows.  First, we define an $L$-dependent entropy rate estimate:

490: \begin{eqnarray}

491: \hmu(L)  &\, = \,& H(L) - H(L\!-\!1) \\

492:  & \, = \, & H[S_L|S_{L-1}, S_{L-2} , \ldots , S_1] \;, \;\; L >0 \;.

493: \label{hmu.L.def}

494: \end{eqnarray}

495: We set $\hmu(0) = \log_2 |{\cal A}|$. In words, then, $\hmu(L)$

496: is the average

497: uncertainty of the next variable $S_L$, given that the previous

498: $L\!-\!1$ symbols have been seen.  Geometrically, $\hmu(L)$ is the

499: two-point slope of the total entropy growth curve $H(L)$.  Since

500: conditioning on more variables can never increase the entropy, it

501: follows that $\hmu(L) \leq \hmu(L-1)$.  In the $L \rightarrow \infty$

502: limit, $\hmu(L)$ is equal to the entropy rate defined above in

503: Eq.~(\ref{hmu.def}):

504: \begin{equation}

505:   \hmu \, = \, \lim_{L \rightarrow \infty} \hmu(L) \;.

506: \label{hmu.conditional}

507: \end{equation}

508: Again, this limit exists for all stationary processes \cite{Cove91}.

509: Equation (\ref{hmu.conditional}) tells us that $\hmu$ may be viewed

510: as the irreducible randomness in a process---the randomness

511: that persists even after statistics over longer and longer blocks of

512: variables are taken into account.

513:
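Two simple limiting cases, offered here only as illustrative examples, make
these definitions concrete. For a sequence of independent fair coin flips,
every length-$L$ block has probability $2^{-L}$, so that
\begin{equation*}
H(L) \, = \, L \;, \qquad \hmu(L) \, = \, 1 \;\; {\rm for~all} \;\; L \geq 1 \;,
\qquad \hmu \, = \, 1 \; {\rm bit~per~symbol} \;.
\end{equation*}
For the period-$2$ process that generates $\ldots 010101 \ldots$, stationarity
requires that the two phases be equally likely. For every $L \geq 1$ exactly
two length-$L$ blocks occur, each with probability $1/2$, and so
\begin{equation*}
H(L) \, = \, 1 \;\; {\rm for} \;\; L \geq 1 \;, \qquad \hmu(1) \, = \, 1 \;,
\qquad \hmu(L) \, = \, 0 \;\; {\rm for} \;\; L \geq 2 \;, \qquad \hmu \, = \, 0 \;.
\end{equation*}
In the first case the apparent randomness never decreases as longer blocks
are examined; in the second it disappears entirely once a single symbol has
been observed.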

514: \subsection{Excess Entropy}

515:

516: The entropy rate gives a reliable and well understood measure of

517: the randomness or disorder intrinsic to a process. However, as the

518: introduction noted, this tells us little about the underlying system's

519: organization, structure, or correlations.  Looking at the manner in

520: which $\hmu(L)$ converges to its asymptotic value $\hmu$, however,

521: provides one measure of these properties.

522:

523: When observations only over length-$L$ blocks are taken into account,

524: a process appears to have an entropy rate of $\hmu(L)$.  This

525: quantity is at least as large as the true, asymptotic value of the entropy

526: rate $\hmu$.  As a result, the process appears more random by

527: $\hmu(L) - \hmu$ bits.  Summing these entropy over-estimates over

528: $L$, one obtains the {\em excess entropy}

529: \cite{Crut83a,Shaw84,Gras86,Lind88b}:

530: \begin{equation}

531: \EE \, \equiv \,  \sum_{L=1}^{\infty} [\hmu(L) - \hmu] \;.

532: \label{E.def}

533: \end{equation}

534: The units of $\EE$ are \emph{bits}. The excess entropy tells us

535: how much information must be gained before it is possible to infer

536: the actual per-symbol randomness $\hmu$.  It is large if the system

537: possesses many regularities or correlations that manifest themselves

538: only at large scales. As such, the excess entropy can serve as a

539: measure of global structure or correlation present in the system.

540:
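A small worked example, included only as an illustration, shows how
Eq.~(\ref{E.def}) is evaluated. Consider the period-$2$ process that generates
$\ldots 010101 \ldots$. A single symbol, seen in isolation, looks like a fair
coin flip, so $\hmu(1) = 1$. Once one symbol has been seen, however, the next
is determined, so $\hmu(L) = 0$ for $L \geq 2$ and $\hmu = 0$. The sum
therefore has a single nonzero term:
\begin{equation*}
\EE \, = \, [\hmu(1) - \hmu] + \sum_{L=2}^{\infty} [\hmu(L) - \hmu]
  \, = \, 1 + 0 \, = \, 1 \; {\rm bit} \;.
\end{equation*}
This one bit is the phase information needed to synchronize to the sequence.
By contrast, for independent fair coin flips $\hmu(L) = \hmu = 1$ for all
$L \geq 1$, every term vanishes, and $\EE = 0$.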

541: This interpretation is strengthened by noting that the excess

542: entropy can also be expressed as the mutual information between two

543: adjacent semi-infinite blocks of variables \cite{Li91,Crut03a}:

544: \begin{equation}

545:   \EE \, = \, \lim_{L \rightarrow \infty} I[ S_{-L}, S_{-L+1}, \ldots,

546:   S_{-1}; S_0, S_1, \ldots, S_{L-1}] \;.

547: \label{E.mutual.info}

548: \end{equation}

549: Thus, the excess entropy measures one type of memory of the

550: system; it tells us how much knowledge of one half of the system

551: reduces our uncertainty about the other half. If the sequence of

552: random variables is a time series, then $\EE$ is the amount of

553: information the past shares with the future.

554:

555: The excess entropy may also be given a geometric interpretation.

556: The existence of the entropy rate suggests that $H(L)$ grows

557: linearly with $L$ for large $L$ and that the growth rate, or

558: slope, is given by $\hmu$.  It is then possible to show that the

559: excess entropy is the ``$y$-intercept'' of the asymptotic form for

560: $H(L)$ \cite{Shaw84,Gras86,Li91,Arno96,Bial00a,Neme00a}:

561: \begin{equation}

562:   H(L) \, \sim \, \EE + \hmu L ~,

563:   \; {\rm as} \; L \rightarrow \infty \;.

564: \label{H.Scaling.Form}

565: \end{equation}

566: Or, rearranging, we have

567: \begin{equation}

568:   \EE \, = \, \lim_{L\rightarrow \infty} \left[ H(L) - \hmu L

569:   \right] \;.

570: \end{equation}

571:
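As an illustrative check of Eq.~(\ref{H.Scaling.Form}), consider a simple
first-order Markov process chosen here purely as an example: the binary
process in which a $0$ is always followed by a $1$, while a $1$ is followed
by a $0$ or a $1$ with equal probability (often called the golden mean
process). Its stationary symbol probabilities are $\Prob(0) = 1/3$ and
$\Prob(1) = 2/3$. Uncertainty about the next symbol arises only after a $1$,
so $\hmu = \tfrac{2}{3} \cdot 1 = \tfrac{2}{3}$ bit per symbol. The allowed
$2$-blocks $01$, $10$, and $11$ each occur with probability $1/3$. Hence
\begin{equation*}
H(1) \, = \, \log_2 3 - \tfrac{2}{3} \, \approx \, 0.918 \;, \qquad
H(2) \, = \, \log_2 3 \, \approx \, 1.585 \;.
\end{equation*}
Because the process is order-$1$ Markovian, $\hmu(L) = \hmu$ for all
$L \geq 2$, and the asymptotic form holds exactly for $L \geq 1$. The
intercept is then
\begin{equation*}
\EE \, = \, H(1) - \hmu \, = \, \log_2 3 - \tfrac{4}{3}
  \, \approx \, 0.252 \; {\rm bits} \;,
\end{equation*}
and one can confirm that $\EE + 2 \hmu = \log_2 3 = H(2)$, as the linear form
requires.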

572: This form of the excess entropy highlights another interpretation: $\EE$ is

573: the \emph{cost of amnesia}. If an observer has extracted enough information

574: from a system (at large $L$) to predict it optimally ($\sim \hmu$), but

575: suddenly loses all of that information, the process will then appear more

576: random by an amount $H(L) - \hmu L$.

577:

578: To close, note that the excess entropy, a term coined in Ref.~\cite{Crut83a},

579: goes by a number of different names,

580: including ``stored information'' \cite{Shaw84}; ``effective

581: measure complexity'' \cite{Gras86,Lind88b,Lind89a,Erik87,Ebel02a};

582: ``complexity'' \cite{Li91,Arno96}; ``predictive information''

583: \cite{Bial00a,Neme00a}; and ``reduced R\'enyi entropy of order $1$''

584: \cite{Csor89a,Kauf91a}. For recent reviews on excess entropy, entropy

585: convergence in general, and applications of this approach see

586: Refs.~\cite{Ebel97b,Crut03a,Bial00a}.

587:

588: \subsection{Intrinsic Information Processing Coordinates}

589: \label{Sec:ComplexityEntropyDiagram}

590:

591: In the model classes examined below, we shall take the excess

592: entropy $\EE$ as our measure of complexity and use the entropy rate

593: $\hmu$ as the randomness measure. The excess entropy $\EE$ and the

594: entropy rate $\hmu$ are exactly the two quantities that specify the

595: large-$L$ asymptotic form for the block entropy

596: Eq.~(\ref{H.Scaling.Form}).

597: The set of all $(\hmu, \EE)$ pairs is thus geometrically equivalent to

598: the set of all straight lines with non-negative slope and intercept.

599: Clearly, a line's slope and intercept are independent quantities. Thus,

600: there is no {\em a priori} reason to anticipate any relationship between

601: $\hmu$ and $\EE$, a point emphasized early on by Li \cite{Li91}.

602:

603: It is helpful in the following to know that for binary order-$R$

604: Markov processes there is an upper bound on the excess entropy:

605: \begin{equation}

606: \EE \leq R (1-h_\mu) \;.

607: \label{EE_hmu_bound}

608: \end{equation}

609: We sketch a justification of this result here; for the derivation, see

610: \cite[Proposition 11]{Crut03a}. First, recall that the excess entropy may

611: be written as the mutual information between two semi-infinite blocks, as

612: indicated in Eq.~(\ref{E.mutual.info}).  However, given the process is

613: order-$R$ Markovian, Eq.~(\ref{R.Markovian}), the excess entropy reduces

614: to the mutual information between two adjacent $R$-blocks. From

615: Eq.~(\ref{MI.def}), we see that the excess entropy is the entropy of

616: an $R$-block minus the entropy of an $R$-block conditioned on its neighboring

617: $R$-block:

618: \begin{equation}

619:  \EE \, = \, H(R) - H[S_i^R|S_{i-R}^R] \;.

620: \label{E.markov}

621: \end{equation}

622: (Note that this only holds in the special case of order-$R$ Markov processes.

623: It is \emph{not} true in general.)

624: The first term on the right hand side of Eq.~(\ref{E.markov}) is

625: maximized when the distribution over the $R$-block is uniform, in

626: which case $H(R) = R$.  The second term on the right hand side is

627: minimized by assuming that the conditional entropy of the two blocks

628: is given simply by $R h_\mu$---i.e., $R$ times the per-symbol entropy

629: rate $h_\mu$.  In other words, we obtain a lower bound by assuming that

630: the process is independent, identically distributed over $R$-blocks.

631: Combining the two bounds gives Eq.~(\ref{EE_hmu_bound}).
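
The origin of the $R h_\mu$ term can be made explicit with the chain rule for conditional entropies. The following is a sketch under two assumptions: that $S_i^R$ denotes the block $S_i S_{i+1} \cdots S_{i+R-1}$, and that the process is stationary, so that for an order-$R$ Markov process $\hmu = H[S_j | S_{j-R} \cdots S_{j-1}]$ for any $j$:
\begin{eqnarray*}
 H[S_i^R | S_{i-R}^R] & = & \sum_{k=0}^{R-1} H[S_{i+k} | S_{i-R} \cdots S_{i+k-1}] \\
 & = & \sum_{k=0}^{R-1} H[S_{i+k} | S_{i+k-R} \cdots S_{i+k-1}] \\
 & = & R \, \hmu \;.
\end{eqnarray*}
The second line drops the symbols more than $R$ steps back, which the order-$R$ Markov property permits. Under these assumptions Eq.~(\ref{E.markov}) reads $\EE = H(R) - R \hmu$, and $H(R) \leq R$ for binary symbols then gives the bound. For example, with $R = 2$ and $\hmu = 0.5$ the bound is $\EE \leq 2 (1 - 0.5) = 1$ bit.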

632:

633: It is also helpful in the following to know that for periodic processes

634: $\hmu = 0$ (perfectly predictable) and $\EE = \log_2 p$, where $p$ is

635: the period \cite{Crut03a}. In this case, $\EE$ is the amount of information

636: required to distinguish the $p$ phases of the cycle.

637:

638: \subsection{Calculating Complexities and Entropies}

639: \label{numerical.methods.section}

640:

641: As is now clear, all quantities of interest depend on knowing sequence

642: probabilities $\Prob (s^L)$. These can be obtained by direct analytical

643: approximation given a model or by numerical estimation via simulation.

644: Sometimes, in special cases, the complexity and entropy can be calculated

645: in closed form.

646:

647: For some, but not all, of the process classes studied in the following,

648: we estimate the various information-theoretic quantities by simulation.

649: We generate a long sequence, keeping track of the frequency of occurrence

650: of words up to some finite length $L$. The word counts are stored in

651: a dynamically generated parse tree, allowing us to go out to $L = 120$

652: in some cases. We first make a rough estimate of the topological entropy

653: using a small $L$ value. This entropy determines the sparseness of the

654: parse tree, which in turn determines how large a tree can be stored in

655: a given amount of memory. From the word and subword frequencies

656: $\Prob(s^L)$, one directly calculates $H(L)$ and, thus, $\hmu$ and

657: $\EE$.  Estimation errors in these quantities are a function of

658: statistical errors in $\Prob(s^L)$.
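
For concreteness, the estimation step can be written out as follows. The finite-$L$ forms below are one common choice and are assumptions of this sketch; they are not necessarily the exact estimators used for the figures. Here $n(s^L)$ is the number of occurrences of the word $s^L$ among $N$ sampled words, and both symbols are introduced only for this illustration:
\begin{eqnarray*}
 \Prob(s^L) & \approx & n(s^L) / N \;, \\
 H(L) & = & - \sum_{s^L} \Prob(s^L) \log_2 \Prob(s^L) \;, \\
 \hmu(L) & = & H(L) - H(L-1) \;, \\
 \EE(L) & = & H(L) - L \, \hmu(L) \;.
\end{eqnarray*}
The last two lines are the slope and intercept of the block-entropy curve evaluated at finite $L$, in the spirit of Eq.~(\ref{H.Scaling.Form}). When $\EE$ is finite, they approach $\hmu$ and $\EE$ as $L$ grows.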

659:

660: Here, we are mainly interested in gaining a general sense of the behavior

661: of the entropy rate $\hmu$ and the excess entropy $\EE$. And so, for the

662: purposes of our survey, this direct method is sufficient.  The vast

663: majority of our estimates are accurate to within $1\%$. If extremely

664: accurate estimates are needed, there exist a variety of techniques for

665: correcting for estimator bias

666: \cite{Gras88a,Gras89a,Herz94a,Schu96,deWi99a,Neme02a}.  When one is

667: working with finite data, there is also the question of what errors

668: occur, since the $L \rightarrow \infty$ limit cannot be taken. For

669: more on this issue, see Ref.~\cite{Crut03a}.

670:

671: Regardless of these potential subtleties, the entropy rate and

672: excess entropy can be reliably estimated via simulation, given access

673: to a reasonably large amount of data. Moreover, this estimation is

674: purely inductive---one does not need to use knowledge of the

675: underlying equations of motion or the hidden states that produced the

676: sequence.  Nevertheless, for several of the model classes we

677: consider---one-dimensional Ising models, Markov chains, and

678: topological Markov chains---we calculate the quantities using

679: closed-form expressions, leading to essentially no error.

680:

681:

682: % **********************************************************************

683: % **********************************************************************

684: % **********************************************************************

685:

686: \section{Complexity-Entropy Diagrams}

687: \label{Comp.Ent.Section}

688:

689: In the following sections we present a survey of intrinsic computation

690: across a wide range of process classes.

691: We think of a \emph{class} of system as given by equations of motion,

692: or other specification for a stochastic process,

693: that are parametrized in some way---a pair of control parameters

694: in a one-dimensional map or the energy of a Hamiltonian, say. The

695: space of parameters, then, is the concrete representation of the

696: space of possible systems, and a class of system is a subset of the

697: set of all possible processes. A point in the parameter space is then

698: a particular \emph{system}, whose intrinsic computation we will

699: summarize by a pair of numbers---one a measure of randomness, the

700: other a measure of structure. In several cases, these measures are estimated

701: from sequences generated by the temporal or spatial process.

702:

703: \subsection{One-Dimensional Discrete Iterated Maps}

704:

705: Here we look at the symbolic dynamics generated by two iterated maps

706: of the interval---the well studied \emph{logistic} and \emph{tent

707: maps}---of the form:

708: \begin{equation}

709: x_{n+1} = f_\mu (x_n) ~,

710: \end{equation}

711: where $\mu$ is a parameter that controls the nonlinear function $f$,

712: $x_n \in [0,1]$, and one starts with $x_0$, the \emph{initial condition}.

713: The logistic and tent maps are canonical examples of systems

714: exhibiting deterministic chaos.

715: The nonlinear iterated function $f$  consists of two monotone

716: pieces. And so, one can analyze the maps' behavior on the interval via

717: a \emph{generating partition} that reduces a sequence of continuous states

718: $x_0, x_1, x_2, \ldots$ to a binary sequence $s_0, s_1, s_2,

719: \ldots$ \cite{Bai89a}. The binary partition is given by

720: \begin{equation}

721: s_i =  \left\{

722:   \begin{array}{cl}

723: 	0 & x_i \leq \frac{1}{2} \\

724: 	\\

725: 	1 & x_i > \frac{1}{2}

726:   \end{array}

727:   \right. ~.

728: \end{equation}

729: The binary sequence may be viewed as a {\em code} for the set of initial

730: conditions that produce the sequence. When the maps are chaotic, arbitrarily

731: long binary

732: sequences produced using this partition code for arbitrarily small

733: intervals of initial conditions on the chaotic attractor. Hence, one

734: can explore many of these maps' properties via binary sequences.

735:

736: \subsubsection{Logistic Map}

737:

738: We begin with the logistic map of the unit interval:

739: \begin{equation}

740:     f(x) \, = \, rx(1-x) \;,

741: \label{logistic}

742: \end{equation}

743: where the control parameter $r \in [0,4]$. We iterate this starting

744: with an arbitrary initial condition $x_0 \in [0,1]$.  In

745: Fig.~\ref{logistic.E.hmu.vs.r.plot} we show numerical estimates of the

746: excess entropy $\EE$ and the entropy rate $\hmu$ as a function of

747: $r$.  Notice that both $\EE$ and $\hmu$ change in a complicated manner

748: as the parameter $r$ is varied continuously.
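
To make the symbolic coding concrete, here is a short hand iteration of Eq.~(\ref{logistic}) at $r = 4$ from the initial condition $x_0 = 0.3$. Both values are chosen only for illustration, and the iterates are rounded to four decimal places. Each iterate is coded as $0$ if it is at most $\frac{1}{2}$ and as $1$ otherwise:
\begin{eqnarray*}
 x_0 = 0.3 & \rightarrow & s_0 = 0 \;, \\
 x_1 = 4 (0.3)(0.7) = 0.84 & \rightarrow & s_1 = 1 \;, \\
 x_2 = 4 (0.84)(0.16) = 0.5376 & \rightarrow & s_2 = 1 \;, \\
 x_3 \approx 0.9943 & \rightarrow & s_3 = 1 \;, \\
 x_4 \approx 0.0225 & \rightarrow & s_4 = 0 \;.
\end{eqnarray*}
The symbol sequence thus begins $01110$. The quantities $\EE$ and $\hmu$ are estimated from the statistics of words in long sequences of this kind.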

749:

750: As $r$ increases from $3.0$ to approximately $3.5926$,

751: %$3.6785735104283219$,

752: the logistic map undergoes a series of period-doubling bifurcations.

753: For $r \in (3.0,3.2361)$ the sequences generated by the logistic

754: %3.2360679774997897

755: map are periodic with period two, for $r \in

756: (3.2361,3.4986)$ the sequences are period 4,

757: %3.2360679774997897,3.4985616993277016

758: and for $r \in (3.4986,3.5546)$ the sequences are

759: %3.4985616993277016,3.5546408627688249

760: period $8$. For all periodic sequences of period $p$, the entropy

761: rate $\hmu$ is zero and the excess entropy $\EE$ is $\log_2 p$. So, as

762: the period doubles, the excess entropy increases by one bit.  This can

763: be seen in the staircase on the left hand side of

764: Fig.~\ref{logistic.E.hmu.vs.r.plot}.  At $r \approx

765: 3.5926$, the logistic map becomes chaotic, as evidenced by

766: a positive entropy rate.  For further discussion of the

767: phenomenology of the logistic map, see almost any modern textbook on

768: nonlinear dynamics, e.g., Refs.~\cite{Peit92,Ott93}.

769:

770: % **********************************************************************

771: \begin{figure}[tbp]

772: \epsfxsize=3.2in

773: \begin{center}

774: \leavevmode

775: \epsffile{logistic.Ehmu.vs.r.eps}

776: \end{center}

777: \vspace{-.6cm}

778: \caption{Excess entropy $\EE$ and entropy rate $\hmu$ as a function of

779: the parameter $r$.  The top curve is excess entropy. The $r$ values

780: were sampled uniformly as $r$ was varied from $3.4$ to $4.0$ in

781: increments of $0.0001$.  The largest $L$ used was $L=30$ for systems

782: with low entropy.  For each parameter value with positive entropy, $1

783: \times 10^7$ words of length $L$ were sampled.  }

784: \label{logistic.E.hmu.vs.r.plot}

785: \vspace{-.2cm}

786: \end{figure}

787: % **********************************************************************

788:

789: Looking at Fig.~\ref{logistic.E.hmu.vs.r.plot}, it is difficult to see

790: how $\EE$ and $\hmu$ are related.  This relationship can be seen

791: much more clearly in Fig.~\ref{logistic.banded.plot}, in which we show

792: the complexity-entropy diagram for the same system.  That is, we plot

793: $(\hmu,\EE)$ pairs.  This lets us

794: look at how the excess entropy and the entropy rate are related,

795: independent of the parameter $r$.

796:

797: %This is similar in spirit to the

798: %idea behind a phase plane.  For example, when studying two coupled

799: %ordinary differential equations, e.g., a Lotka-Volterra predator-prey

800: %system, one could plot the values of the two populations as a function

801: %of time.  It is often more revealing, however, to plot one population

802: %against the other, to reveal how the two populations are related.  The

803: %resulting phase plane plot more directly reveals the relationship

804: %between the two variables.

805:

806: % **********************************************************************

807: \begin{figure}[htbp]

808: \epsfxsize=3.2in

809: \begin{center}

810: \leavevmode

811: \epsffile{logistic.two-bands.eps}

812: \end{center}

813: \vspace{-.8cm}

814: \caption{Entropy rate and excess entropy $(\hmu,\EE)$-pairs for the

815:   logistic map. Points from regions of the map in which the bifurcation

816: diagram has one or two bands are colored differently. There are $3214$

817: parameter values sampled for the one-band region and $3440$ values for

818: the two-band region.  The $r$ values were sampled uniformly.  The

819: one-band region is $r \in (3.6786, 4.0)$; the two-band

820: region is $r \in (3.5926, 3.6786)$. The largest

821: $L$ used was $L=30$ for systems with low entropy.  For each parameter

822: value with positive entropy, $1 \times 10^7$ words of length $L$ were

823: sampled.  }

824: \label{logistic.banded.plot}

825: \vspace{-.2cm}

826: \end{figure}

827: % **********************************************************************

828:

829: Figure \ref{logistic.banded.plot} shows that there is a definite

830: relationship between $\EE$ and $\hmu$---one that is not immediately evident

831: from looking at Fig.~\ref{logistic.E.hmu.vs.r.plot}.  Note, however,

832: that this relationship is not a simple one.  In particular, complexity

833: is not a function of entropy: $\EE \neq g(\hmu)$.  For a given value

834: of $\hmu$, multiple excess entropy values $\EE$ are possible.

835:

836: There are several additional empirical observations to extract from

837: Fig.~\ref{logistic.banded.plot}. First, the shape appears to be

838: self-similar. This is not at all surprising, given that the logistic

839: map's bifurcation diagram itself is self-similar.  Second, note the

840: clumpy, nonuniform clustering of $(\hmu, \EE)$ pairs within the dense

841: region. Third, note that there is a fairly well defined lower bound.

842: Fourth, for a given value of the

843: entropy rate $h_\mu$ there are many possible values for the excess

844: entropy $\EE$.   However, it appears as if not all $\EE$ values are

845: possible for a given $h_\mu$.  Lastly, note that there does not appear to be

846: any phase transition (at finite $h_\mu$) in the complexity-entropy diagram.

847: Strictly speaking, such a transition does occur, but it does so at

848: zero entropy rate.  As the period doublings accumulate, the excess entropy

849: grows without bound.  As a result, the possible excess entropy values

850: at $h_\mu = 0$ on the complexity-entropy diagram are unbounded.  For

851: further discussion, see Ref.~\cite{Crut90}.

852:

853:

854: \subsubsection{Tent Map}

855:

856: We next consider the {\em tent map}:

857: \begin{equation}

858:  f(x) \, = \,  \left\{ \begin{array}{ll} ax &

859: x < \frac{1}{2} \\ \\ \hspace{1mm}  a(1-x)  & x \geq \frac{1}{2}

860:         \end{array} \right.   \; ,

861: \end{equation}

862: where $a \in [0,2]$ is the control parameter. For $a \in [1,2]$,

863: the entropy rate $\hmu = \log_2 a$; when $a \in [0,1]$, $\hmu = 0$.

864: Figure~\ref{tent.plot} shows $1,200$ $(\hmu,\EE)$-pairs in which

865: $\EE$ is calculated numerically from empirical estimates of the

866: binary word distribution $\Prob(s^L)$.

867:

868: % **********************************************************************

869: \begin{figure}[tbp]

870: \epsfxsize=3.0in

871: \begin{center}

872: \leavevmode

873: \epsffile{tent.compent.eps}

874: \end{center}

875: \vspace{-.6cm}

876: \caption{Excess entropy $\EE$ versus entropy rate $\hmu$ for the

877:   tent map. The $L$ used to estimate $\Prob(s^L)$, and so $\EE$ and

878:   $\hmu$, varied depending on the $a$ parameter. The largest $L$ used

879:   was $L=120$ at low $\hmu$. The plot shows $1,200$ $(\hmu,\EE)$-pairs.

880:   The parameter was incremented every $\Delta a = 5 \times 10^{-4}$ for

881:   $a \in [1,1.2]$ and then incremented every $\Delta a = 0.001$ for

882:   $a \in [1.2,2.0]$. For each parameter value with positive entropy,

883:   $10^7$ words of length $L$ were sampled.

884:   }

885: \label{tent.plot}

886: \vspace{-.2cm}

887: \end{figure}

888: % ********************************************************************

889:

890: Reference \cite{Crut90} developed a phenomenological theory that explains

891: the properties of the tent map at the so-called \emph{band-merging points},

892: where bands of the chaotic attractor merge pairwise as a function of the

893: control parameter. The behavior at these points is

894: \emph{noisy periodic}---the order of band visitations is periodic,

895: but motion within each band is deterministically chaotic. Band mergings occur when

896: $a = 2^{2^{-n}}$. The symbolic-dynamic process is described by a Markov

897: chain consisting of a periodic cycle of $2^n$ states in which

898: all state-to-state transitions are nonbranching except for one where

899: $s_i = 0$ or $s_i = 1$ with equal probability. Thus, each phase

900: of the Markov chain has zero entropy per transition, except for the one

901: that has a branching entropy of $1$ bit. The entropy rate at band-mergings

902: is thus $\hmu = 2^{-n}$, with $n$ an integer.

903:

904: The excess entropy for the symbolic-dynamic process at the

905: $2^n$-to-$2^{n-1}$ band-merging is simply $\EE = \log_2 2^n = n$.

906: That is, the process carries $n$ bits of phase information. Putting

907: these facts together, then, we have a very simple relationship in

908: the complexity-entropy diagram at band-mergings:

909: \begin{equation}

910:   \EE \, = \, -\log_2 \hmu ~.

911: \label{tent.theory}

912: \end{equation}

913: This is graphed as the dashed line in Fig.~\ref{tent.plot}.

914: It is clear that the entire complexity-entropy diagram is much richer

915: than this simple expression indicates. Nonetheless, Eq.

916: (\ref{tent.theory}) does capture the overall shape quite well.
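
Evaluating these expressions at the first few band-merging points gives a quick numerical check (values of $a$ rounded to four decimal places):
\[
\begin{array}{cccc}
 n & a = 2^{2^{-n}} & \hmu = \log_2 a & \EE = -\log_2 \hmu \\
 1 & 1.4142 & 1/2 & 1 \\
 2 & 1.1892 & 1/4 & 2 \\
 3 & 1.0905 & 1/8 & 3
\end{array}
\]
Each successive band-merging halves the entropy rate and adds one bit of phase information.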

917:

918: Note that, in sharp contrast to the logistic map, for the tent map it

919: does appear as if the excess entropy takes on only a single value for each

920: value of the entropy rate $\hmu$. The reason for this is straightforward.

921: The entropy rate $\hmu$ is a simple monotonic function of the parameter

922: $a$---$h_\mu = \log_2 a$---and so there is a one-to-one relationship

923: between them. As a result, each $h_\mu$ value on the complexity-entropy

924: diagram corresponds to one and only one value of $a$ and, in turn, corresponds

925: to one and only one value of $\EE$. Interestingly, the excess entropy appears

926: to be a continuous function of $h_\mu$, although not a differentiable one.

927:

928: \subsection{Ising Spin Systems}

929:

930: We now investigate the complexity-entropy diagrams of the Ising model

931: in one and two spatial dimensions. Ising models are among the simplest

932: physical models of spatially extended systems.  Originally introduced

933: to model magnetic materials, they are now used to model a wide range

934: of cooperative phenomena and order-disorder transitions and, more

935: generally, are viewed as generic models of spatially extended,

936: statistical mechanical systems \cite{Chri05b,Seth06a}.  Like the

937: logistic and tent maps, Ising models are also studied as an

938: intrinsically interesting mathematical topic.

939: As we will see, Ising models provide an

940: interesting contrast with the intrinsic computation seen in the

941: interval maps.

942:

943: Specifically, we consider spin-$1/2$ Ising models

944: with nearest (NN) and next-nearest neighbor (NNN) interactions. The

945: Hamiltonian (energy function) for such a system is:

946: \begin{eqnarray}

947:   {\cal H} \,& =&  \, -J_1 \sum_{\langle i,j\rangle_{\rm nn}} S_{i} S_{j}

948:   \nonumber \\ & &  -J_2

949:   \sum_{\langle i,j\rangle_{{\rm nnn}} } S_{i} S_{j} \,-\, B \sum_{i} S_{i} \;,

950: \label{Hamiltonian}

951: \end{eqnarray}

952: where the first (second) sum is understood to run over all NN (NNN)

953: pairs of spins.  In one dimension, a spin's nearest neighbors will

954: consist of two spins, one to the right and one to the left, whereas

955: in two dimensions a spin will have four nearest neighbors---left,

956: right, up, and down.  Each spin $S_{i}$ is a binary variable: $S_{i}

957: \,\in\, \{-1,+1\}$.  The coupling constant $J_1$ is a parameter that

958: when positive (negative) makes it energetically favorable for NN

959: spins to (anti-)align.  The constant $J_2$ has the same effect on NNN

960: spins.  The parameter $B$ may be viewed as an external field; its

961: effect is to make it energetically favorable for spins to point up

962: (i.e., have a value of $+1$) instead of down.  The probability of a

963: configuration is taken to be proportional to its Boltzmann weight:

964: the probability of a spin configuration ${\cal C}$ is proportional

965: to $e^{-\beta{\cal H}({\cal C})}$, where $\beta = 1/T$ is the inverse

966: temperature.
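
As a small illustration of the Boltzmann weighting, consider an isolated NN pair with $J_2 = 0$ and $B = 0$. This two-spin setting and the parameter values below are chosen only as an example. From Eq.~(\ref{Hamiltonian}), the aligned pair has energy $-J_1$ and the anti-aligned pair has energy $+J_1$, so
\[
 \frac{\Prob(\mbox{anti-aligned})}{\Prob(\mbox{aligned})}
 \, = \, \frac{e^{-\beta J_1}}{e^{+\beta J_1}}
 \, = \, e^{-2 \beta J_1} \;.
\]
For the antiferromagnetic coupling $J_1 = -1$ at $T = 1$, this ratio is $e^{2} \approx 7.39$, so an anti-aligned pair is about seven times more likely than an aligned one. As $T \rightarrow \infty$ the ratio tends to $1$ and the two arrangements become equally likely.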

967:

968: In equilibrium statistical mechanics, the entropy density is a

969: monotonically increasing function of the temperature.  Quite generically,

970: a plot of the entropy $h_\mu$ as a function of temperature $T$

971: resembles that of the top plot in Fig.~\ref{2DCriticalhvsE}.  Thus,

972: $h_\mu$ may be viewed as a nonlinearly rescaled temperature.  One

973: might ask, then, why one might want to plot complexity versus entropy:

974: Isn't a plot of complexity versus temperature qualitatively the same?

975: Indeed, the two plots would look very similar.  However, there

976: are two major benefits of complexity-entropy diagrams for statistical

977: mechanical systems.  First,

978: the entropy captures directly the system's unpredictability, measured

979: in bits per spin.  The entropy thus measures the system's

980: information processing properties.  Second, plotting complexity versus

981: entropy and not temperature allows for a direct comparison of the

982: range of information processing properties of statistical mechanical

983: systems with systems for which there is not a well defined

984: temperature, such as the deterministic dynamical systems of the

985: previous section or the cellular automata of the subsequent one.

986:

987:

988: \subsubsection{One-Dimensional Ising System}

989:

990: \label{1D.spin.section}

991:

992: We begin by examining one-dimensional Ising systems.

993: In Refs.~\cite{Crut97a,Feld98b,Feld98c} two of the authors developed

994: exact, analytic transfer-matrix methods for calculating $\hmu$ and

995: $\EE$ in the thermodynamic ($N \rightarrow \infty$) limit.  These

996: methods make use of the fact that the NNN Ising model is order-$2$

997: Markovian.  We used

998: these methods to produce Fig.~\ref{Ising.Batcape}, the

999: complexity-entropy diagram for the NNN Ising system with

1000: antiferromagnetic coupling constants $J_1$ and $J_2$ that tend to

1001: anti-align coupled spins. The figure gives a

1002: scatter plot of $10^5$ $(\hmu, \EE)$ pairs for system parameters that

1003: were sampled randomly

1004: from the following ranges: $J_1 \in [-8,0]$, $J_2 \in [-8,0]$, $T \in

1005: [0.05,6.05]$, and $B \in [0,3]$. For each parameter realization, the

1006: excess entropy $\EE$ and entropy density $\hmu$ were calculated.

1007: Fig.~\ref{Ising.Batcape} is rather striking---the $(\hmu,\EE)$ pairs

1008: are organized in the shape of a ``batcape''. Why does the plot have

1009: this form?

1010:

1011: % **********************************************************************

1012: \begin{figure}[tbp]

1013: \epsfxsize=3.5in

1014: \begin{center}

1015: \leavevmode

1016: %\includegraphics[width=4.3in]{batcape.eps}

1017: %\epsffile{batcape.lowres.ps}

1018: \epsffile{batcape.eps}

1019: \end{center}

1020: \vspace{-6mm}

1021: \caption{Complexity-entropy diagram for the one-dimensional, spin-$1/2$

1022:   antiferromagnetic Ising model with nearest- and next-nearest-neighbor

1023:   interactions. $10^5$ system parameters were sampled randomly from the

1024:   following ranges: $J_1 \in [-8,0]$, $J_2 \in [-8,0]$,

1025:   $T \in [0.05,6.05]$, and $B \in [0,3]$. For each parameter setting,

1026:   the excess entropy $\EE$ and entropy density $\hmu$ were calculated

1027:   analytically.

1028:   }

1029: \label{Ising.Batcape}

1030: \vspace{-2mm}

1031: \end{figure}

1032: % **********************************************************************

1033:

1034: Recall that if a sequence over a binary alphabet is periodic with period

1035: $p$, then $\EE = \log_2 p$ and $\hmu = 0$. Thus, the ``tips'' of the batcape

1036: at $\hmu = 0$ correspond to crystalline (periodic) spin configurations with

1037: periods $1$, $2$, $3$, and $4$. For example, the $(0,0)$ point is the

1038: period-$1$ configuration with all spins aligned. These periodic regimes

1039: correspond to the system's different possible ground states. As the entropy

1040: density increases, the cape tips widen and eventually join.

1041:

1042: Figure~\ref{Ising.Batcape} demonstrates in graphical form that there

1043: is organization in the process space defined by the Hamiltonian of

1044: Eq.~(\ref{Hamiltonian}). Specifically, for antiferromagnetic couplings,

1045: $\EE$ and $\hmu$ values do not uniformly fill the plane. There are forbidden

1046: regions in the complexity-entropy plane. Adding randomness ($\hmu$) to

1047: the periodic ground states does not immediately destroy them.  That is,

1048: there are low-entropy states that are almost-periodic. The apparent upper

1049: linear bound is that of Eq.~(\ref{EE_hmu_bound}) for a system with

1050: at most $4$ Markov states or, equivalently, an order-$2$ Markov chain:

1051: $\EE \leq 2 - 2 \hmu$.

1052:

1053: In contrast, in the logistic map's complexity-entropy diagram

1054: (Fig.~\ref{logistic.banded.plot}) one does not see anything

1055: remotely like the batcape. This indicates that there are no

1056: low-entropy, almost-periodic configurations related to the exactly

1057: periodic configurations generated at zero-entropy along the

1058: period-doubling route to chaos. Increasing the parameter there

1059: does not add randomness to a periodic orbit. Rather, it causes a

1060: system bifurcation to a higher-period orbit.

1061:

1062: \subsubsection{Two-Dimensional Ising Model}

1063:

1064: Thus far we have considered only one-dimensional systems, either

1065: temporal or spatial.  However,

1066: the excess entropy can be extended to apply to two-dimensional

1067: configurations as well; for details, see Ref.~\cite{Feld03a}.  Using

1068: methods from there, we calculated the excess entropy

1069: and entropy density for the two-dimensional Ising model with nearest-

1070: and next-nearest-neighbor interactions.  In other words, we calculated

1071: the complexity-entropy diagram for the two-dimensional version of the

1072: system whose complexity-entropy diagram is shown in

1073: Fig.~\ref{Ising.Batcape}.  There are several different definitions for

1074: the excess entropy in two dimensions, all of which are similar but not

1075: identical.  In Fig.~\ref{2DIsingBatcape} we used a version that is

1076: based on the mutual information and, hence, is denoted ${\EE}_I$

1077: \cite{Feld03a}.

1078: %See Ref.~\cite{Feld03a} for a discussion of the different forms of

1079: %two-dimensional excess entropy.

1080:

1081: Figure \ref{2DIsingBatcape} gives a scatter plot of $4,500$

1082: complexity-entropy pairs.  System parameters in

1083: Eq.~(\ref{Hamiltonian}) were sampled randomly from the following

1084: ranges: $J_1 \in [-3,0]$, $J_2 \in [-3,0]$, $T \in [0.05,4.05]$, and

1085: $B = 0$. For each parameter setting, the excess entropy $\EE_I$ and

1086: entropy density $\hmu$ were estimated numerically; the configurations

1087: themselves were generated via a Monte Carlo simulation. For each $(\hmu,\EE)$

1088: point the simulation was run for $200,000$ Monte Carlo updates per site to

1089: equilibrate. Configuration data was then taken for $20,000$ Monte Carlo

1090: updates per site. The lattice size was a square of $48 \times 48$ spins.

1091: The long equilibration time is necessary because, for some Ising models at

1092: low temperature, single-spin flip dynamics of the sort used here have very

1093: long transient times \cite{Spir01a,Spir01b,Vazq02a}.
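
For readers unfamiliar with single-spin-flip dynamics, the Metropolis rule is one standard example. It is shown here purely as an illustration and is not necessarily the rule used to generate Fig.~\ref{2DIsingBatcape}. The notation ${\rm nn}(i)$ and ${\rm nnn}(i)$ for the sets of nearest and next-nearest neighbors of site $i$ is introduced only for this sketch. A proposed flip $S_i \rightarrow -S_i$ changes the energy of Eq.~(\ref{Hamiltonian}) by $\Delta {\cal H}$ and is accepted with probability $P_{\rm accept}$:
\begin{eqnarray*}
 \Delta {\cal H} & = & 2 S_i \left( J_1 \sum_{j \in {\rm nn}(i)} S_j
   + J_2 \sum_{j \in {\rm nnn}(i)} S_j + B \right) \;, \\
 P_{\rm accept} & = & \min \left( 1, \, e^{-\beta \Delta {\cal H}} \right) \;.
\end{eqnarray*}
At low temperature $\beta$ is large, so energy-raising flips are rarely accepted. This is consistent with the long transients noted above.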

1094:

1095: Note the similarity between Figs.~\ref{Ising.Batcape} and \ref{2DIsingBatcape}.

1096: For the 2D model, there is also a near-linear upper bound:

1097: $\EE \leq 5 (1-h_\mu)$. In addition, one sees periodic

1098: spin configurations, as evidenced by the horizontal bands. An $\EE_I$ of

1099: $1$ bit corresponds to a checkerboard of period $2$; $\EE_I = 3$

1100: corresponds to a checkerboard of period $4$; while $\EE_I = 2$

1101: corresponds to a ``staircase'' pattern of period $4$.  See

1102: Ref.~\cite{Feld03a} for illustrations.  The two period-$4$

1103: configurations are both ground states for the model in the parameter

1104: regime in which $|J_2| < |J_1|$ and $J_2 < 0$. At low temperatures,

1105: the state into which the system settles is a matter of chance.

1106:

1107: % **********************************************************************

1108: \begin{figure}[tbp]

1109: \epsfxsize=3.5in

1110: \begin{center}

1111: \leavevmode

1112: %\includegraphics[width=4.3in]{2DIsingBatcape.eps}

1113: \epsffile{2dnnn.batcape.Ei.ps}

1114: \end{center}

1115: \vspace{-6mm}

1116: \caption{Complexity-entropy diagram for the two-dimensional, spin-$1/2$

1117:   anti\-ferromagnetic Ising model with nearest- and next-nearest-neighbor

1118:   interactions. System parameters were sampled randomly from the

1119:   following ranges: $J_1 \in [-3,0]$, $J_2 \in [-3,0]$, $T \in

1120:   [0.05,4.05]$, and $B = 0$. For each parameter setting, the excess

1121:   entropy $\EE_I$ and entropy density $\hmu$ were estimated

1122:   numerically.}

1123: \vspace{-4mm}

1124: \label{2DIsingBatcape}

1125: \end{figure}

1126: % **********************************************************************

1127:

1128: Thus, the horizontal streaks in the low-entropy region of

1129: Fig.~\ref{2DIsingBatcape} are the different ground states possible for

1130: the system.  In this regard Fig.~\ref{2DIsingBatcape} is qualitatively

1131: similar to Fig.~\ref{Ising.Batcape}---in each there are several

1132: possible ground states at $\hmu = 0$ that persist as the entropy

1133: density is increased. However, in the two-dimensional system of

1134: Fig.~\ref{2DIsingBatcape} one sees a scatter of other values

1135: around the periodic bands.  There are even $\EE_I$ values larger than

1136: $3$.  These $\EE_I$ values arise when parameters are selected in which

1137: the NN and NNN coupling strengths are similar; $J_1 \approx J_2$.

1138: When this is the case, there is no energy cost associated with a

1139: horizontal or vertical defect between the two possible ground states.

1140: As a result, for low temperatures the system effectively freezes into

1141: horizontal or vertical strips consisting of the different ground states.

1142: Depending on the number of strips and their relative widths, a number

1143: of different $\EE_I$ values are possible, including values well above

1144: $3$, indicating very complex spatial structure.

1145:

1146: Despite these differences, the similarities between the

1147: complexity-entropy plots for the one- and two-dimensional systems are

1148: clearly evident.  This is all the more noteworthy since one- and

1149: two-dimensional Ising models are regarded as very different sorts of

1150: system by those who focus solely on phase transitions. The

1151: two-dimensional Ising model has a critical phase transition while the

1152: one-dimensional model does not.  And, more generally, two-dimensional random

1153: fields are usually considered very different mathematical entities

1154: from one-dimensional sequences. Nevertheless, the two complexity-entropy

1155: diagrams show that, away from criticality, the one- and two-dimensional

1156: Ising systems' ranges of intrinsic computation are similar.

1157:

1158: \subsubsection{Ising Model Phase Transition}

1159:

1160: %Figure \ref{2DIsingBatcape} may be initially surprising.

1161: As noted above, the two-dimensional Ising

1162: model is well known as a canonical model of a system that undergoes a

1163: continuous phase transition---a discontinuous change in the system's

1164: properties as a parameter is continuously varied.  The 2D NN Ising

1165: model with ferromagnetic ($J_1>0$) bonds and no NNN coupling ($J_2 =

1166: 0$) and zero external field ($B=0$)

1167: undergoes a phase transition at $T = T_c \approx 2.269$ when $J_1 =

1168: 1$.  At the critical temperature $T_c$ the magnetic susceptibility

1169: diverges and the specific heat is not differentiable.  In

1170: Fig.~\ref{2DIsingBatcape} we restricted ourselves to antiferromagnetic

1171: couplings and thus did not sample in the region of parameter space in

1172: which the phase transition occurs.
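For reference, and as a standard result rather than something computed here, the quoted critical temperature follows from Onsager's exact solution of the square-lattice NN model. In units where Boltzmann's constant is $1$ (an assumption about the units used above),
\[
  T_c = \frac{2 J_1}{\ln ( 1 + \sqrt{2} )} \approx \frac{2}{0.8814} \approx 2.269
  \quad (J_1 = 1) ~.
\]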

1173:

1174: What happens if we fix $J_1 = 1$, $J_2 = 0$, and $B=0$, and vary the

1175: temperature? In this case, we see that the complexity, as measured by $\EE$,

1176: shows a sharp maximum near the critical temperature $T_c$. Figure

1177: \ref{2DCriticalhvsE} shows results obtained via a Monte Carlo simulation on

1178: a $100 \times 100$ lattice. We used a Wolff cluster algorithm and periodic

1179: boundary conditions. After $10^6$ Monte Carlo steps (one step is one proposed

1180: cluster flip), $25{,}000$ configurations were sampled, with $200$ Monte

1181: Carlo steps between measurements.  This process was repeated for over

1182: $200$ temperatures between $T=0$ and $T=6$, sampled more densely

1183: near the critical region.

1184: %In Fig.~\ref{2DCriticalhvsE} the

1185: %lines between the points are just guides to the eye.

1186:

1187: % **********************************************************************

1188: \begin{figure}[tbp]

1189: \epsfxsize=3.5in

1190: \begin{center}

1191: \leavevmode

1192: \epsffile{2d.nn.ising.T.ent.eps}\\

1193: \vspace{-5mm}

1194: \epsfxsize=3.5in

1195: \epsffile{2d.nn.ising.T.E.eps}\\

1196: \vspace{-5mm}

1197: \epsfxsize=3.5in

1198: \epsffile{2d.nn.ising.comp.ent.eps}

1199: \end{center}

1200: \vspace{-6mm}

1201: \caption{Entropy rate vs.~temperature, excess entropy vs.~temperature, and

1202:   the complexity-entropy diagram for the 2D NN ferromagnetic Ising

1203:   model.  Monte Carlo results for $200$ temperatures between $0$ and

1204:   $6$.  The temperature was sampled more densely near the critical

1205:   temperature. For further discussion, see text.  }

1206: \vspace{-4mm}

1207: \label{2DCriticalhvsE}

1208: \end{figure}

1209: % **********************************************************************

1210:

1211: In Fig.~\ref{2DCriticalhvsE} we first plot entropy density $\hmu$ and

1212: excess entropy $\EE$ versus temperature.  As expected, the

1213: excess entropy reaches a maximum at the critical temperature $T_c$.

1214: At $T_c$ the correlations in the system decay algebraically, whereas

1215: they decay exponentially for all other $T$ values.  Hence, $\EE$,

1216: which may be viewed as a global measure of correlation, is maximized at

1217: $T_c$.  For the system of Fig.~\ref{2DCriticalhvsE}, $T_c$ appears to

1218: have an approximate value of $2.42$. This is above the exact value for an

1219: infinite system, which is $T_c \approx 2.27$. Our estimated value is higher,

1220: as one expects for a finite lattice. At the critical temperature,  $h_\mu

1221: \approx 0.57$, and $\EE \approx 0.413$.

1222:

1223: Also in Fig.~\ref{2DCriticalhvsE} we show the complexity-entropy

1224: diagram for the 2D Ising model.  This complexity-entropy diagram is a

1225: single curve, instead of the scatter plots seen in the

1226: previous complexity-entropy diagrams.  The reason is that

1227: we varied a single parameter, the temperature, and

1228: entropy is a single-valued function of the temperature, as can clearly

1229: be seen in the first plot in Fig.~\ref{2DCriticalhvsE}.  Hence, there

1230: is only one value of $\hmu$ for each temperature, leading to a single

1231: curve for the complexity-entropy diagram.

1232:

1233: Note that the peak in the complexity-entropy diagram for the

1234: 2D Ising model is rather rounded, whereas $\EE$ plotted versus

1235: temperature shows a much sharper peak.  The reason for this rounding

1236: is that the entropy density $\hmu$ changes very rapidly near $T_c$.

1237: The effect is to smooth the $\EE$ curve when plotted against $\hmu$.

1238:

1239: A similar complexity-entropy diagram was produced by Arnold \cite{Arno96}. He also

1240: estimated the excess entropy, but did so by considering only one-dimensional

1241: sequences of measurements obtained at a single site, while a Monte Carlo

1242: simulation generated a sequence of two-dimensional configurations. Thus,

1243: those results do not account for two-dimensional structure but, rather,

1244: reflect properties of the dynamics of the particular Monte Carlo updating

1245: algorithm used. Nevertheless, the results of Ref.~\cite{Arno96} are

1246: qualitatively similar to ours.

1247:

1248: Erb and Ay \cite{Erb04a} have calculated the \emph{multi-information} for

1249: the two-dimensional Ising model as a function of temperature. The

1250: multi-information is the difference between the entropy rate and the

1251: entropy of a single site: $H(1) - h_\mu$. That is, the multi-information

1252: is only the leading term in the sum which defines the excess entropy,

1253: Eq.~(\ref{E.def}). (Recall that $h_\mu(1) = H(1)$.) They find that the

1254: multi-information is a continuous function of the temperature and that

1255: it reaches a sharp peak at the critical temperature \cite[Fig.~4]{Erb04a}.
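To make the relation explicit, suppose that Eq.~(\ref{E.def}) has the usual convergence form, a sum over the amounts by which the length-$L$ entropy-rate estimates $\hmu(L)$ exceed $\hmu$. This explicit form is an assumption here, since the equation is not repeated in this section. Separating the $L=1$ term and using $\hmu(1) = H(1)$ gives
\begin{align*}
  \EE &= \sum_{L=1}^{\infty} \left[ \hmu(L) - \hmu \right] \\
      &= \underbrace{\left[ H(1) - \hmu \right]}_{\text{multi-information}}
         + \sum_{L=2}^{\infty} \left[ \hmu(L) - \hmu \right] ~.
\end{align*}
If each remaining term is non-negative, as it is whenever $\hmu(L)$ decreases monotonically to $\hmu$, then the multi-information is a lower bound on $\EE$.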

1256:

1257: % **********************************************************************

1258: % **********************************************************************

1259: \subsection{Cellular Automata}

1260:

1261: The next process class we consider is \emph{cellular automata} (CAs)

1262: in one and two spatial dimensions. Like spin systems, CAs are common

1263: prototypes used to model spatially extended dynamical systems. For reviews

1264: see, e.g., Refs.~\cite{Wolf83a,Chop98a,Ilac01a}. Unlike the Ising

1265: models of the previous section, the CAs that we study here are

1266: deterministic. There is no noise or temperature in the system.

1267:

1268: The states of the CAs we shall consider consist of one- or

1269: two-dimensional \emph{configurations}

1270: ${\mathbf s} = \ldots s^{-1} , s^0 , s^1 , \ldots $ of discrete $K$-ary

1271: \emph{local states} $s^i \in \{ 0, 1, \ldots , K-1 \}$. The

1272: configurations change in time according to a \emph{global update

1273: function} $\mathbf \Phi$:

1274: \begin{equation}

1275: {\mathbf s}_{t+1} = {\mathbf \Phi} {\mathbf s}_t ~,

1276: \end{equation}

1277: starting from an \emph{initial configuration} ${\mathbf s}_0$. What makes CAs

1278: \emph{cellular} is that configurations evolve according to a

1279: \emph{local update rule}. The value $s_{t+1}^i$ of site $i$ at the next time

1280: step is a function $\phi$ of the site's previous value and the values of

1281: neighboring sites within some \emph{radius} $r$:

1282: \begin{equation}

1283: s_{t+1}^i = \phi ( s_t^{i-r}, \ldots, s_t^i, \ldots, s_t^{i+r} ) ~.

1284: \end{equation}

1285: All sites are updated synchronously. The CA update rule $\phi$ consists

1286: of specifying the \emph{output value} $s_{t+1}$ for all possible

1287: \emph{neighborhood configurations}

1288: $\eta_t = s_t^{i-r}, \ldots, s_t^i, \ldots, s_t^{i+r}$.

1289: Thus, for 1D radius-$r$ CAs, there are $K^{2r+1}$ possible neighborhood

1290: configurations and $K^{K^{2r+1}}$ possible CA rules. The $r=1$, $K=2$

1291: 1D CAs are called {\em elementary cellular automata}\/ \cite{Wolf83a}.
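As a quick check of these counts for the binary case ($K=2$), where each of the $2^{2r+1}$ neighborhoods is assigned one of two output values:
\begin{align*}
  r = 1: &\quad 2^{3} = 8 \text{ neighborhoods}, \quad
          2^{8} = 256 \text{ rules} ~; \\
  r = 2: &\quad 2^{5} = 32 \text{ neighborhoods}, \quad
          2^{32} \approx 4.3 \times 10^9 \text{ rules} ~.
\end{align*}
The first line recovers the $256$ elementary CAs.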

1292:

1293: In all CA simulations reported we began with an arbitrary random initial

1294: configuration ${\mathbf s}_0$ and iterated the CA several thousand times

1295: to let transient behavior die away. Configuration statistics were then

1296: accumulated for an additional period of thousands of time steps, as

1297: appropriate. Periodic boundary conditions on the underlying lattice

1298: were used.

1299:

1300: %To motivate the questions we will address, we begin the following

1301: %section by closely reviewing several earlier investigations of

1302: %two-dimensional CAs. We then turn to consider the simpler case

1303: %of $r = 1$ and $r = 2$ 1D CAs.

1304:

1305:

1306: %\subsubsection{One-Dimensional Cellular Automata}

1307:

1308: In Fig.~\ref{1D.rad2.spatial.hvsE} we

1309: show the results of calculating various complexity-entropy diagrams

1310: for 1D, $r = 2$, $K=2$ (binary) cellular automata.  There are $2^{2^5}

1311: \approx 4.3 \times 10^9$ such CAs.  We cannot examine all $4.3$

1312: billion CAs; instead we sample the space of these CAs uniformly.

1313: %For the data of Fig.~\ref{1D.rad2.spatial.HvsI}, the lattice has

1314: %$1000$ sites; a  transient time of $1000$ iterations was used. We plot

1315: %$\hmu$ versus $\EE$ for temporal sequences.

1316: For the data of

1317: Fig.~\ref{1D.rad2.spatial.hvsE}, the lattice has $5\times 10^4$ sites

1318: and a transient time of $5 \times 10^4$ iterations was used. We plot

1319: $\hmu$ versus $\EE$ for spatial sequences.  Plots for the temporal

1320: sequences are qualitatively similar. There are several things to

1321: observe in these diagrams.

1322:

1323:

1324: % **********************************************************************

1325: \begin{figure}[tbp]

1326: \epsfxsize=3.0in

1327: \begin{center}

1328: \leavevmode

1329: \epsffile{1d.radius2.spatial.hvsE.eps}

1330: \end{center}

1331: \vspace{-6mm}

1332: \caption{Spatial entropy density $h^s_\mu$ and spatial excess entropy

1333:   $\EE^s$ for a random sampling of $10^3$ $r = 2$, binary 1D CAs.

1334:    }

1335: \vspace{-2mm}

1336: \label{1D.rad2.spatial.hvsE}

1337: \end{figure}

1338: % **********************************************************************

1339:

1340: One feature to notice in Fig.~\ref{1D.rad2.spatial.hvsE} is that no sharp

1341: peak in the excess entropy appears at some intermediate $\hmu$ value. In

1342: contrast, the maximum possible excess entropy falls off moderately rapidly

1343: with increasing $\hmu$. A linear upper bound, $\EE \leq 4 ( 1 - h_\mu)$,

1344: is almost completely respected. Note that, as is the case with the other

1345: complexity-entropy diagrams presented here, for all $\hmu$ values except

1346: $\hmu = 1$, there is a range of possible excess entropies.

1347:

1348: %Note also that the scale of the complexity-entropy diagram

1349: %for radius-$2$ shown in Fig.~\ref{1D.rad2.spatial.hvsE} is the same as

1350: %that of the radius-$1$ (or elementary) CAs in Fig.~\ref{ECA}.

1351:

1352: %We now turn our attention to the $H(1)$ versus $I_2$ plot shown in

1353: %Fig.~\ref{1D.rad2.spatial.HvsI}.

1354: %Note that these diagrams are strikingly different than those for the

1355: %2D, $8$-state CAs shown in Figs.~\ref{Langton} and

1356: %\ref{Langton.Bogus.Rescaling}. For the 1D CAs, nothing even close

1357: %to an ``edge of chaos'' is seen---the complexity, as measured by the

1358: %two-point mutual information, is maximized at $\hmu = 1$.

1359:

1360: %\subsubsection{Comparison with Earlier Results}

1361:

1362: In the early 1990s there was considerable exploration of the

1363: organization of CA rule space. In particular, a series of papers

1364: \cite{Lang90a,Li90b,Woot90a,Lang91a} looked at two-dimensional

1365: eight-state ($K=8$) cellular automata, with a neighborhood size of

1366: $5$ sites---the site itself and its nearest neighbors to the north,

1367: east, west, and south. These references reported evidence for

1368: the existence of a phase transition in the complexity-entropy diagram at a

1369: critical entropy level. In contrast, however, here and in the previous

1370: sections we find no evidence for such a transition. The reasons that

1371: Refs.~\cite{Lang90a,Li90b,Woot90a,Lang91a} report a transition are two-fold.

1372: First, they used very restricted measures of randomness and complexity:

1373: entropy of \emph{single isolated} sites and mutual information of neighboring

1374: \emph{pairs} of single sites, respectively. These choices have the effect of

1375: projecting organization \emph{onto} their complexity-entropy diagrams. The

1376: organization seen is largely a reflection of constraints on the chosen

1377: measures, not of intrinsic properties of the CAs. Second, they do not sample

1378: the space of CAs uniformly; rather, they parametrize the space of CAs and

1379: sample only by sweeping their single parameter. This results in a sample of

1380: CA space that is very different from uniform and that is biased toward higher

1381: complexity CAs. For a further discussion of complexity-entropy diagrams for

1382: cellular automata, including a discussion of

1383: Refs.~\cite{Lang90a,Li90b,Woot90a,Lang91a}, see Ref.~\cite{Feld06a}.

1384:

1385:

1386: % **********************************************************************

1387: \subsection{Markov Chain Processes}

1388:

1389: In this and the next section, we consider two classes of process that

1390: provide a basis of comparison for the preceding nonlinear dynamics

1391: and statistical mechanical systems: those generated by Markov chains and

1392: topological \eMs. These classes are complementary to each other in the

1393: following sense.  Topological \eMs\ represent structure in terms of which

1394: sequences (or configurations) are allowed or not. When we explore the space

1395: of topological \eMs, the associated processes differ in which sets of

1396: sequences occur and which are forbidden. In contrast, when exploring

1397: Markov chains, we fix a set of allowed words---in the present case the

1398: full set of binary sequences---and then vary the probability with

1399: which subwords occur. These two classes thus represent two different

1400: types of possible organization in intrinsic computation---types that

1401: were mixed in the preceding example systems.

1402:

1403: In Fig.~\ref{null} we plot $\EE$ versus $\hmu$ for order-$2$

1404: ($4$-state) Markov chains over a binary alphabet. Each element in the

1405: stochastic transition matrix $T$ is chosen uniformly from the unit

1406: interval. The elements of the matrix are then normalized row by row so

1407: that $\sum_j T_{ij} = 1$. We generated $10^5$ such matrices and formed

1408: the complexity-entropy diagram shown in Fig.~\ref{null}. Since these

1409: processes are order-$2$ Markov chains, the bound of

1410: Eq.~(\ref{EE_hmu_bound}) applies.  This bound is the sharp, linear

1411: upper limit evident in Fig.~\ref{null}: $\EE \, = 2 - 2\hmu$.
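The form of this limit can be recovered in two steps. The sketch below assumes that Eq.~(\ref{EE_hmu_bound}) is the order-$R$ bound $\EE \leq R ( 1 - \hmu )$ and that $\EE$ is the sum of the amounts by which the length-$L$ entropy-rate estimates $\hmu(L)$ exceed $\hmu$. For an order-$R$ Markov chain $\hmu(L) = \hmu$ for all $L > R$, and for a binary alphabet $\hmu(L) \leq 1$, so
\begin{align*}
  \EE = \sum_{L=1}^{R} \left[ \hmu(L) - \hmu \right]
      \leq \sum_{L=1}^{R} \left[ 1 - \hmu \right]
      = R ( 1 - \hmu ) ~.
\end{align*}
Setting $R = 2$ gives $\EE \leq 2 - 2 \hmu$, which allows up to $2$ bits of excess entropy at $\hmu = 0$ and forces $\EE = 0$ at $\hmu = 1$.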

1412:

1413: It is illustrative to compare the $4$-state Markov chains considered

1414: here with the 1D NNN Ising models of Sec. \ref{1D.spin.section}.

1415: The order-$2$ (or $4$-state Markov) chains with a binary alphabet are

1416: those systems

1417: for which the value of a site depends on the previous two sites, but

1418: no others.  In terms of spin systems, then, this is a spin-$1/2$

1419: (i.e., binary) system with nearest- and  next-nearest neighbors.  The

1420: transition matrix for the Markov chain is $4 \times 4$ and thus has

1421: $16$ elements.  However, since each row of the transition matrix

1422: must be normalized, there are $12$ independent parameters for this

1423: model class.  In contrast, there are only $3$ independent parameters

1424: for the 1D NNN Ising chain---the parameters $J_1$, $J_2$, $B$, and

1425: the temperature $T$.  One of the parameters may be viewed as

1426: setting an energy scale, so only three are independent.
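The two counts can be written out explicitly. The Ising count below assumes that the Hamiltonian of Eq.~(\ref{Hamiltonian}) is linear in $J_1$, $J_2$, and $B$ and that configurations are weighted by a Boltzmann factor with Boltzmann's constant set to $1$:
\begin{align*}
  \text{Markov chain:} &\quad 4 \times 4 - 4 = 12 ~, \\
  \text{Ising chain:}  &\quad \{ J_1, J_2, B, T \} \rightarrow
                         \{ J_1/T, \, J_2/T, \, B/T \} ~.
\end{align*}
The $4$ subtracted in the first line are the row-normalization constraints. In the second line the Boltzmann weight depends on the Hamiltonian only through its ratio to $T$, so only the three ratios affect the configuration probabilities.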

1427:

1428: Thus, the 1D NNN systems are a proper subset of the $4$-state Markov chains.

1429: Note that their complexity-entropy diagrams are very different, as a quick

1430: glance at Figs.~\ref{Ising.Batcape} and \ref{null} confirms. The reason for

1431: this is that the Ising model, due to its parametrization (via the Hamiltonian

1432: of Eq.~(\ref{Hamiltonian})), samples the space of processes in a

1433: very different way than the Markov chains. This underscores the

1434: crucial role played by the choice of model and, so too, the choice in

1435: parametrizing a model space. Different parametrizations of the same

1436: model class, when sampled uniformly over those parameters, yield

1437: complexity-entropy diagrams with different structural properties.

1438:

1439:

1440: % **********************************************************************

1441: \begin{figure}[tbp]

1442: \epsfxsize=3.0in

1443: \begin{center}

1444: \leavevmode

1445: \epsffile{markov.eps}

1446: \end{center}

1447: \vspace{-6mm}

1448: \caption{Excess-entropy, entropy-rate pairs for $10^5$ randomly

1449:   selected $4$-state Markov chains.

1450:   }

1451: \vspace{-2mm}

1452: \label{null}

1453: \end{figure}

1454: % **********************************************************************

1455:

1456: \subsection{The Space of Processes: Topological \EMs}

1457:

1458: The preceding model classes are familiar from dynamical systems theory,

1459: statistical mechanics, and stochastic process theory. Each has served

1460: an historical purpose in its respective field---purposes that reflect

1461: mathematically, physically, or statistically useful parametrizations of

1462: the space of processes. In the preceding sections we explored these

1463: classes, asked what sort of processes they could generate, and then

1464: calculated complexity-entropy pairs for each process to reveal the

1465: range of possible information processing within each class.

1466:

1467: Is there a way, though, to directly explore the space of processes, without

1468: assuming a particular model class or parametrization? Can each process be

1469: taken at face value and tell us how it is structured? More to the point, can

1470: we avoid making structural assumptions, as done in the preceding sections?

1471:

1472: Affirmative answers to these questions are found in the approach laid out

1473: by \emph{computational mechanics} \cite{Crut89,Crut92c,Shal01a}.

1474: Computational mechanics demonstrates that each process has an optimal,

1475: minimal, and unique representation---the \emph{\eM}---that captures the

1476: process's structure. Due to optimality, minimality, and uniqueness, the

1477: \eM\ may be viewed as \emph{the} representation of its associated process. In

1478: this sense, this representation is parameter free. To determine an \eM\ for a

1479: process one calculates a set of \emph{causal states} and their transitions.

1480: In other words, one does not specify a priori the number of states or

1481: the transition structure between them. Determining the \eM\ makes no such

1482: structural assumptions \cite{Crut92c,Shal01a}.

1483:

1484: Using the one-to-one relationship between processes and their \eMs, here we

1485: invert the preceding logic of going from a process to its \eM. We explore the

1486: space of processes by systematically enumerating \eMs\ and then calculating

1487: their excess entropies ${\bf E}$ and their entropy rates $h_\mu$. This gives

1488: a direct view of how intrinsic computation is organized in the space of

1489: processes.

1490:

1491: As a complement to the Markov chain exploration of how intrinsic computation

1492: depends on transition probability variation, here we examine how an \eM's

1493: structure (states and their connectivity) affects information processing. We

1494: do this by restricting attention to the class of \emph{topological \eMs} whose

1495: branching transition probabilities are fair (equally probable). (An example

1496: is shown in Fig.~\ref{f_mn}.)

1497:

1498: If we regard two \eMs\ isomorphic up to variation in transition

1499: probabilities as members of a single equivalence class, then each such

1500: class of \eMs\ contains precisely one topological \eM. (Symbolic dynamics

1501: \cite{Lind95a} refers to a related class of representations as

1502: \emph{topological Markov chains}. An essential, and important,

1503: difference is that \eMs\ always have the smallest number of states.)

1504:

1505: \begin{table}

1506: \begin{tabular}{|c|c|}

1507: \hline

1508:   {Causal States} & {Topological}   \\

1509:      $n$          & {\eMs}          \\

1510: \hline

1511:        1          & 3               \\

1512:        2          & 7               \\

1513:        3          & 78              \\

1514:        4          & 1,388           \\

1515:        5          & 35,186          \\

1516: \hline

1517: \end{tabular}

1518: \caption{The number of topological binary \eMs\ up to $n = 5$ causal states.

1519:   (After Ref. \protect\cite{McTa05a}.)

1520: \label{proclangcount}

1521:   }

1522: \end{table}

1523:

1524: It turns out that the topological \eMs\ with a finite number of states can be

1525: systematically enumerated \cite{McTa05a}. Here we consider only \eMs\ for

1526: binary processes: $\mathcal{A} = \{0,1\}$. Two \eMs\ are

1527: isomorphic and generate essentially the same stochastic process,

1528: if they are related by a relabeling of states or if their output

1529: symbols are exchanged: $0$ is mapped to $1$ and vice versa. The number

1530: of isomorphically distinct topological \eMs\ of $n=1,\ldots,5$~states

1531: is listed in Table~\ref{proclangcount}.

1532:

1533: % **********************************************************************

1534: \begin{figure*}[tbp]

1535: \epsfxsize=6.0in

1536: \begin{center}

1537: \leavevmode

1538: \epsffile{Evshmu.eps}

1539: \end{center}

1540: \vspace{-6mm}

1541: \caption{Complexity-entropy pairs $(h_\mu,\EE)$ for all topological binary

1542:   \eMs\ with $n = 1, \ldots, 4$ states and for $35{,}041$ of the $35{,}186$

1543:   $5$-state \eMs. The excess entropy is estimated as $\EE(L) = H(L) - L \hmu$

1544:   using the exact value for the entropy rate $\hmu$ and a storage-efficient

1545:   type-class algorithm \cite{Youn93a} for the block entropy $H(L)$. The

1546:   estimates were made by increasing $L$ until $\EE(L) - \EE(L-1) < \delta$,

1547:   where $\delta = 0.0001$ for $1$, $2$, and $3$ states; $\delta = 0.0050$

1548:   for $4$ states; and $\delta = 0.0100$ for 5 states.

1549:   }

1550: \label{process.plot}

1551: \end{figure*}

1552: % **********************************************************************

1553:

1554: In Fig.~\ref{process.plot} we plot their $(\hmu,\EE)$ pairs. There one

1555: sees that the complexity-entropy diagram exhibits quite a bit of

1556: organization, with variations from very low to very high density of

1557: \eMs\ co-existing with several distinct vertical (iso-entropy)

1558: families. To better understand the structure in the complexity-entropy

1559: diagram, though, it is helpful to consider bounds on the complexities

1560: and entropies of Fig.~\ref{process.plot}. The minimum complexity, $\EE = 0$,

1561: corresponds to machines with only a single state. There are two possibilities

1562: for such binary \eMs. Either they generate all $1$s (or $0$s) or all sequences

1563: occurring with equal probability (at each length). If the latter, then

1564: $\hmu = 1$; if the former, $\hmu = 0$. These two points, $(0,0)$ and $(1,0)$,

1565: are denoted with solid circles along Fig.~\ref{process.plot}'s horizontal axis.

1566:

1567: % **********************************************************************

1568: \begin{figure}[tbp]

1569: \epsfxsize=3.0in

1570: \begin{center}

1571: \leavevmode

1572: \epsffile{f_mn.eps}

1573: \end{center}

1574: \caption{An example topological \eM\ for a cyclic process in $\mc{F}_{5,3}$.

1575:   Note that branching occurs only between pairs of successive states in

1576:   the cyclic chain.   The excess entropy for this process is $\log_2 5

1577:   \approx 2.32$, and the entropy rate is $3/5$.

1578:   }

1579: \label{f_mn}

1580: \end{figure}

1581: % **********************************************************************

1582:

1583: The maximum $\EE$ in the complexity-entropy diagram is $\log_2 5

1584: \approx 2.3219$.  One such \eM\ corresponds to the zero-entropy,

1585: period-$5$ processes. And there are four similar processes with

1586: periods $p = 1, 2, 3, 4$ at the points $(0, \log_2 p)$. These are

1587: denoted on the figure by the tokens along the left vertical

1588: axis.
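The points on the two axes can be checked against the finite-$L$ estimate $\EE(L) = H(L) - L \hmu$ used for Fig.~\ref{process.plot}. The block entropies below come from counting the equally likely length-$L$ words in each case and are written out here only as an illustration:
\begin{align*}
  \text{all $1$s:}     &\quad H(L) = 0 ~,~ \hmu = 0 ~,~ \EE(L) = 0 ~; \\
  \text{fair coin:}    &\quad H(L) = L ~,~ \hmu = 1 ~,~ \EE(L) = L - L = 0 ~; \\
  \text{period-$p$:}   &\quad H(L) = \log_2 p ~~ (L \geq p) ~,~ \hmu = 0 ~,~
                         \EE(L) = \log_2 p ~.
\end{align*}
In the periodic case a word of length $L \geq p$ identifies which of the $p$ equally likely phases the process is in, so $H(L)$ saturates at $\log_2 p$; for $p = 5$ this is the maximum $\log_2 5 \approx 2.32$.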

1589:

1590: There are other period-$5$ \emph{cyclic, partially random} processes with

1591: maximal complexity, though; those with causal states in a cyclic

1592: chain. These have $b = 1, 2, 3, 4$ branching transitions between successive

1593: states in the chain and so positive entropy. These appear as a horizontal

1594: line of enlarged square tokens along the upper portion of the

1595: complexity-entropy diagram.

1596: Denote the family of $p$-cyclic processes with $b$ branchings as

1597: $\mc{F}_{p,b}$. An \eM\ illustrating $\mc{F}_{5,3}$ is shown in

1598: Fig.~\ref{f_mn}.  The excess entropy for this process is $\log_2 5

1599: \approx 2.32$, and the entropy rate is $3/5$.

1600:

1601: Since \eMs\ for cyclic processes consist of states in a single loop,

1602: their excess entropies provide an upper bound among \eMs\ that generate

1603: $p$-cyclic processes with $b$ branching states, namely:

1604: \begin{equation}

1605:   \EE(\mc{F}_{p,b}) = \log_2 (p) ~.

1606: \end{equation}

1607: Clearly, $\EE(\mc{F}_{p,b}) \rightarrow \infty$ as $p \rightarrow \infty$.

1608: Their entropy rates are given by a similarly simple expression:

1609: \begin{equation}

1610:   \hmu(\mc{F}_{p,b}) = \frac{b}{p} ~.

1611: \end{equation}

1612: Note that $\hmu(\mc{F}_{p,b}) \rightarrow 0$ as $p \rightarrow \infty$

1613: with fixed $b$ and $\hmu(\mc{F}_{p,b}) \rightarrow 1$ as $b \rightarrow p$.

1614: Together, then, the family $\mc{F}_{5,b}$ gives an upper bound to the

1615: complexity-entropy diagram.
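The entropy-rate expression follows from a short calculation. It assumes, as in Fig.~\ref{f_mn}, that the $p$ causal states of the cycle are visited in turn, so that each has stationary probability $1/p$, and that each of the $b$ branching states emits a fair binary symbol while the remaining $p-b$ states emit a fixed symbol. Writing $\sigma$ for a causal state and $X$ for the emitted symbol (notation introduced here for illustration),
\begin{align*}
  \hmu(\mc{F}_{p,b}) = \sum_{\sigma} \Pr(\sigma) \, H[X | \sigma]
    = b \cdot \frac{1}{p} \cdot 1 + (p-b) \cdot \frac{1}{p} \cdot 0
    = \frac{b}{p} ~.
\end{align*}
For $p = 5$ and $b = 1, 2, 3, 4$ this places the family at
\[
  (\hmu, \EE) \approx (0.2, 2.32) ,~ (0.4, 2.32) ,~ (0.6, 2.32) ,~ (0.8, 2.32) ~,
\]
the third of which is the machine of Fig.~\ref{f_mn}.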

1616:

1617: The processes $\mc{F}_{p,b}$ are representatives of the highest points

1618: of the prominent jutting vertical towers of \eMs\ so prevalent in

1619: Fig.~\ref{process.plot}. It therefore seems reasonable

1620: to expect the $(\hmu,\EE)$~coordinates for $p$-cyclic process

1621: languages to possess at least $p-1$~vertical towers, distributed

1622: evenly at $\hmu=b/p$, $b=1, \dots, p-1$, and for these towers to

1623: correspond with towers of $m$-cyclic process languages whenever $m$ is

1624: a multiple of~$p$.

1625:

1626: These upper bounds are one key difference from earlier classes in which there

1627: was a decreasing linear upper bound on complexity as a function of entropy

1628: rate: $\EE \leq R(1-h_\mu)$. That is, in the space of processes, many are not

1629: so constrained. The subspace of topological \eMs\ illustrates that there are

1630: many highly entropic, highly structured processes. Some of the more

1631: familiar model classes appear to inherit, in their implied parametrization of

1632: process space, a bias away from such processes.

1633:

1634: It is easy to see that the families $\mc{F}_{p,p-1}$ and

1635: $\mc{F}_{p,1}$ provide upper and lower bounds for~$\hmu$,

1636: respectively, among the process languages that achieve

1637: maximal~$\EE$ and for which~$\hmu > 0$. Indeed, the smallest

1638: positive~$\hmu$ possible is achieved when only a single one of

1639: the equally probable states has more than one outgoing transition.

1640:

1641: More can be said about this picture of the space of intrinsic

1642: computation spanned by topological \eMs\ \cite{McTa05a}. Here,

1643: however, our aim is to illustrate how rich the diversity of intrinsic

1644: computation can be and to do so independent of conventional

1645: model-class parametrizations. These results allow us to probe in a

1646: systematic way a subset of processes in which structure dominates.

1647:

1648: \section{Discussion and Conclusion}

1649: \label{Discussion}

1650:

1651: Complexity-entropy diagrams provide a common view of the intrinsic

1652: information processing embedded in different processes. We used them

1653: to compare markedly different systems: one-dimensional maps of the

1654: unit interval; one- and two-dimensional Ising models; cellular automata;

1655: Markov chains; and topological \eMs. The exploration of each class turned

1656: different knobs in the sense that we adjusted different parameters:

1657: temperature, nonlinearity, coupling strength, cellular automaton rule,

1658: and transition probabilities.  Moreover, these parameters had very

1659: different effects.  Changing the temperature and coupling constants in the

1660: Ising models altered the probabilities of configurations, but it did not

1661: change which configurations were allowed to occur. In contrast, the

1662: topological \eMs\ exactly expressed what it means for different processes to

1663: have different sets of allowed sequences. Changing the CA rules or the

1664: nonlinearity parameter in the logistic map combined these effects: the

1665: allowed sequences or the probability of sequences or both changed.

1666: In this way, the survey illustrated in dramatic fashion one of the benefits

1667: of the complexity-entropy diagram: it allows for a common comparison

1668: across rather varied systems.

1669:

1670: For example, the complexity-entropy diagram for the radius-$2$,

1671: one-dimensional cellular automata, shown in Fig.~\ref{1D.rad2.spatial.hvsE},

1672: is very different from that of the logistic map, shown in

1673: Fig.~\ref{logistic.banded.plot}.  For the logistic map, there is a

1674: distinct lower bound for the excess entropy as a function of the

1675: entropy rate. In Fig.~\ref{logistic.banded.plot} this is seen as the

1676: large forbidden region at the diagram's lower portion. In sharp contrast,

1677: in Fig.~\ref{1D.rad2.spatial.hvsE} no such forbidden region is seen.

1678:

1679: At a more general level of comparison, the survey showed that for a given

1680: $\hmu$, the excess entropy $\EE$ can be arbitrarily small.  This suggests

1681: that the intrinsic computation of cellular automata and the logistic map are

1682: organized in fundamentally different ways.  In turn, the 1D and 2D

1683: Ising systems exhibit yet another kind of

1684: information processing capability. Each of has well defined ground

1685: states---seen as the zero-entropy tips of the ``batcapes'' in

1686: Figs.~\ref{Ising.Batcape} and \ref{2DIsingBatcape}. These ground states are

1687: robust under small amounts of noise---i.e., as the temperature increases from

1688: zero. Thus, there are almost-periodic configurations at low entropy. In

1689: contrast, there do not appear to be any almost-periodic configurations at low

1690: entropy for the logistic map of Fig.~\ref{logistic.banded.plot}.

1691:

1692: Our last example, topological \eMs, was a rather different kind of

1693: model class. In fact, we argued that it gave a direct view into the

1694: very structure of the space of processes. In this sense, the

1695: complexity-entropy diagram was parameter free.  Note, however, that by

1696: choosing all branching probabilities to be fair, we intentionally biased

1697: this model class toward high-complexity, high-entropy processes. Nevertheless,

1698: the distinction between the topological \eM\ complexity-entropy diagram of

1699: Fig.~\ref{process.plot} and the others is striking.

1700:

1701: The diversity of possible complexity-entropy diagrams points to their

1702: utility as a way to compare information processing across different model classes.

1703: Complexity-entropy diagrams can be empirically calculated from observed

1704: configurations themselves. The organization reflected in the complexity-entropy

1705: diagram then provides clues as to an appropriate model class to use for the

1706: system at hand.  For example, if one found a complexity-entropy diagram with

1707: a batcape structure like that of Figs.~\ref{Ising.Batcape} and

1708: \ref{2DIsingBatcape}, this would suggest that the class could be well modeled

1709: using energies that, in turn, were expressed via a Hamiltonian.

1710: Complexity-entropy diagrams may also be of use in classifying behavior

1711: within a model class.  For example, as noted above, a type of

1712: complexity-entropy diagram has already been successfully used to

1713: distinguish between different types of structure in anatomical MRI

1714: images of brains \cite{Youn05a,Youn08a}.
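
The empirical calculation mentioned above can be illustrated with a minimal
sketch. It assumes that block entropies $H(L)$, the Shannon entropies of
length-$L$ words, have been estimated from the observed configurations, and
that a finite maximum block length $L_{\max}$ (a symbol introduced here only
for illustration) stands in for the $L \to \infty$ limit:
\begin{align*}
  \hmu &\approx H(L_{\max}) - H(L_{\max}-1) , \\
  \EE  &\approx H(L_{\max}) - L_{\max} \, \hmu .
\end{align*}
Each observed system then contributes one point $(\hmu,\EE)$ to the
complexity-entropy diagram. With exact block entropies, the first expression
approaches $\hmu$ from above and the second approaches $\EE$ from below as
$L_{\max}$ grows. Finite-sample error in the estimated $H(L)$ is a separate
concern that this sketch does not address.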

1715:

1716: Ultimately, the main conclusion to draw from this survey is that there

1717: is a large diversity of complexity-entropy diagrams. There is certainly

1718: not a universal complexity-entropy curve, as once hoped. Nor is it the case

1719: that there are even qualitative similarities among complexity-entropy

1720: diagrams.  They

1721: capture distinctive structure in the intrinsic information processing

1722: capabilities of a class of processes. This diversity is not a negative

1723: result. Rather, it indicates the utility of this type of

1724: intrinsic computation analysis, and it optimistically points

1725: to the richness of information processing available in the mathematical and

1726: natural worlds. Simply put, information processing is too complex to be

1727: simply universal.

1728:

1729: \section*{Acknowledgments}

1730:

1731: Our understanding of the relationships between complexity and entropy

1732: has benefited from numerous discussions with Chris Ellison, Kristian

1733: Lindgren, John Mahoney, Susan McKay, Cris Moore, Mats Nordahl, Dan Upper,

1734: Patrick Yannul, and Karl Young. The authors thank, in particular, Chris

1735: Ellison for help in producing the \eM\ complexity-entropy diagram.

1736: This work was supported at the Santa Fe Institute under the Computation,

1737: Dynamics, and Inference Program via SFI's core grants from the National

1738: Science and MacArthur Foundations. Direct support was provided from DARPA

1739: contract F30602-00-2-0583. The CSC Network Dynamics Program funded by Intel

1740: Corporation also supported this work. DPF thanks the Department of Physics

1741: and Astronomy at the University of Maine for its hospitality. The REU students,

1742: including one of the authors (CM), who worked on related parts of the

1743: project at SFI were supported by the NSF during the summers of 2002 and

1744: 2003.

1745:

1746: \bibliography{dpf}

1747:

1748: \end{document}

1749: