1: % The Organization of Intrinsic Computation: Complexity-Entropy Diagrams
2: % dpf: 12/06/02
3: % jpc: 12/11/02, 12/14/02
4: % cm: 12/18/02
5: % dpf: 12/11/03, dynamics, CAs, Ising sections only
6: % jpc: 2/11/04, 6/18/04
7: % dpf: 8/22/04
8: % jpc: 2/05/06
9: % dpf: 2/09/06
10: % jpc: 3/04/06
11: % dpf: 3/07/08: edits throughout. updated Ising figures. removed
12: % comp mech discussion
13: % jpc: 3/19/08, 5/14/08, 6/23/08, submitted to CHAOS
14:
15: \documentclass[superscriptaddress,twocolumn,showpacs,preprintnumbers,floatfix]{revtex4}
16: %\documentclass[12pt]{article}
17:
18: \usepackage{url}
19: \usepackage{graphics}
20: \usepackage{amsfonts}
21: \usepackage{amsmath}
22: \usepackage{amssymb}
23: \usepackage{epsf}
24:
25: \input{cmechabbrev.tex}
26:
27: %\oddsidemargin=0.0in
28: %\evensidemargin=0.0in
29:
30: %\topmargin=-0.16in
31: % NOTE: dpf needs this or else the text is truncated when printing
32:
33: %\textwidth=6.5in
34: %\textheight=8.74in
35: %\headsep=0in
36: %\parindent=.5in
37: %\footskip=.20in
38:
39: \newcommand{\mc}[1]{\mathcal{#1}}
40:
41: \begin{document}
42:
43: \title{The Organization of Intrinsic Computation:\\
44: Complexity-Entropy Diagrams and\\
45: the Diversity of Natural Information Processing}
46:
47: \author{David P. Feldman}
48: \email{dave@hornacek.coa.edu}
49: \affiliation{College of the Atlantic, Bar Harbor, ME 04609}
50: \affiliation{Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501}
51: \affiliation{Complexity Sciences Center and Physics Department,
52: University of California, Davis, One Shields Ave, Davis CA 95616}
53:
54: \author{Carl S. McTague}
55: \email{c.mctague@dpmms.cam.ac.uk}
56: \affiliation{DPMMS, Centre for Mathematical Sciences,
57: University of Cambridge, Wilberforce Road, Cambridge, CB3 0WB, England}
58: \affiliation{Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501}
59:
60: \author{James P. Crutchfield}
61: \email{chaos@cse.ucdavis.edu}
62: \affiliation{Complexity Sciences Center and Physics Department,
63: University of California, Davis, One Shields Ave, Davis CA 95616}
64: \affiliation{Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501}
65:
66: \date{\today}
67:
68: \begin{abstract}
69: Intrinsic computation refers to how dynamical systems store,
70: structure, and transform historical and spatial information. By
71: graphing a measure of structural complexity against a measure of
72: randomness, complexity-entropy diagrams display the range and
73: different kinds of intrinsic computation across an entire class of
74: systems. Here, we use complexity-entropy diagrams to analyze intrinsic
75: computation in a broad array of deterministic nonlinear and linear
76: stochastic processes, including maps of the interval, cellular
77: automata and Ising spin systems in one and two dimensions, Markov
78: chains, and probabilistic minimal finite-state machines. Since
79: complexity-entropy diagrams are a function only of observed
80: configurations, they can be used to compare systems without reference
81: to system coordinates or parameters. It has been known for some time
82: that in special cases complexity-entropy diagrams reveal that high
83: degrees of information processing are associated with phase
84: transitions in the underlying process space, the so-called ``edge of
85: chaos''. Generally, though, complexity-entropy diagrams differ
86: substantially in character, demonstrating a genuine diversity of
87: distinct kinds of intrinsic computation.
88: \end{abstract}
89:
90: % Insert PACS numbers on next line
91: \pacs{
92: %02.50.Ey % Stochastic processes
93: %02.50.Ga % Markov processes
94: 05.20.-y % Classical statistical mechanics
95: 05.45.-a % Nonlinear dynamics and nonlinear dynamical systems
96: %05.45.Tp % Time series analysis
97: %65.40.Gr % Thermodynamics of solids: Entropy and other thermodynamical
98: % quantities
99: 89.70.+c % Information science
100: 89.75.Kd % Complex Systems: Patterns
101: }
102:
103: \preprint{Santa Fe Institute Working Paper 08-06-XXX}
104: \preprint{arxiv.org:0806.XXXX [nlin.CD]}
105:
106: \keywords{structure, randomness, intrinsic computation, excess entropy,
107: entropy rate, statistical complexity, dynamical systems, spin systems,
108: cellular automata, epsilon-machines}
109:
110: \maketitle
111:
112: \bibliographystyle{unsrt}
113:
114: %\tableofcontents
115:
116: %\section*{}
117:
118: {\bf
119: Discovering organization in the natural world is one of science's central
120: goals. Recent innovations in nonlinear mathematics and physics, in concert
121: with analyses of how dynamical systems store and process information,
122: have produced a growing body of results on quantitative ways to measure
123: natural organization. These efforts had their origin in earlier investigations
124: of the origins of randomness. Eventually, however, it was realized that
125: measures of randomness do not capture the property of organization. This
126: led to the recent efforts to develop measures that are, on the one hand,
127: as generally applicable as the randomness measures but which, on the other,
128: capture a system's complexity---its organization, structure, memory, regularity,
129: symmetry, and pattern. Here---analyzing processes from dynamical systems,
130: statistical mechanics, stochastic processes, and automata theory---we
131: show that measures of structural complexity are a necessary and useful
132: complement to describing natural systems only in terms of their randomness.
133: The result is a broad appreciation of the kinds of information processing
134: embedded in nonlinear systems. This, in turn, suggests new physical substrates
135: to harness for future developments of novel forms of computation.
136: }
137:
138: \section{Introduction}
139:
140: The past several decades have produced a growing body of work on ways
141: to measure the organization of natural systems. (For early work, see,
142: e.g., Refs.~\cite{Crut83a,Shaw84,Wolf84,Benn86,Hube86,Gras86,Szep86,Erik87,Kopp87,Land88a,Lloy88,Lind88b,Szep89a,Crut89,Benn90,Crut90,Badi91a,Li91,Crut92c,Bate93a};
143: for more recent reviews, see
144: Refs.~\cite{Wack94,Ebel97b,Badi97,Feld98a,Feld98b,Bial01a,Crut03a,Shal01a}.)
145: The original interest derived from explorations, during the 1960s to
146: the mid-1980s, of behavior generated by nonlinear dynamical systems.
147: The thread that focused especially on pattern and structural complexity
148: originated, in effect, in attempts to reconstruct geometry \cite{Pack80},
149: topology \cite{Muld93a}, equations of motion \cite{Crut87a}, periodic orbits
150: \cite{Auer87a}, and stochastic processes \cite{Fras90b} from observations
151: of nonlinear processes. More recently, developing
152: and using measures of complexity has been a concern of researchers
153: studying neural computation \cite{Tono94a,Wenn05a}, the clinical
154: analysis of patterns from a variety of medical signals and imaging
155: technologies \cite{Sapa98a,Marw02a,Youn05a}, and machine learning and
156: synchronization \cite{Bial00a,Neme00a,Crut01b,Debo04a,Feld04a},
157: to mention only a few contemporary applications.
158:
159: These efforts, however, have their origin in an earlier period in
160: which the central concern was not the emergence of organization,
161: but rather the origins of randomness. Specifically, measures were
162: developed and refined that quantify the degree of randomness and
163: unpredictability generated by dynamical systems. These
164: quantities---metric entropy, Lyapunov characteristic exponents,
165: fractal dimensions, and so on---now provide an often-used and well
166: understood set of tools for detecting and quantifying deterministic
167: chaos of various kinds. In the arena of stochastic processes,
168: Shannon's entropy rate predates even these and has been productively
169: used for half a century as a measure of an information source's degree
170: of randomness or unpredictability \cite{Cove91}.
171:
172: Over this long early history, researchers came to appreciate that
173: dynamical systems were capable of an astonishing array of behaviors
174: that could not be meaningfully summarized by the entropy rate or
175: fractal dimension. The reason for this is that, by their definition,
176: these measures of randomness do not capture the property of
177: organization. This realization led to the considerable contemporary
178: efforts just cited to develop measures that are as generally
179: applicable as the randomness measures but that capture a system's
180: complexity---its organization, structure, memory, regularity,
181: symmetry, pattern, and so on.
182:
183: Complexity measures which do this are often referred to as {\em
184: statistical} or {\em structural complexities} to indicate that they
185: capture a property distinct from randomness. In contrast, {\em
186: deterministic complexities}---such as the Shannon entropy rate,
187: Lyapunov characteristic exponents, and the Kolmogorov-Chaitin
188: complexity---are maximized for random systems. In essence, they are
189: simply alternatives to measuring the same property---degrees of
190: randomness. Here, we shall emphasize complexity of the structural
191: and statistical sort which measures a property complementary to
192: randomness. We will demonstrate, across a broad range of model
193: systems, that measures of structural complexity are a necessary and
194: useful addition to describing a process in terms of its randomness.
195:
196:
197: \subsection{Structural Complexity}
198:
199: How might one go about developing a structural complexity measure? A typical
200: starting point is to argue that the structural complexity of a system
201: must reach a maximum between the system's perfectly ordered and perfectly
202: disordered extremes
203: \cite{Crut82b,Hube86,Gras86,Benn90,Crut89,Crut92c,Kopp87,Gell96}.
204: The basic idea behind these claims is that a system which is either
205: perfectly predictable (e.g., a periodic sequence) or perfectly
206: unpredictable (e.g., a fair coin toss) is deemed to have zero
207: structural complexity. Thus, the argument goes, a system with either
208: zero entropy or maximal entropy (usually normalized to one) has zero
209: complexity; these systems are simple and not highly structured. This
210: line of reasoning further posits that in between these extremes lies
211: complexity. Those objects that we intuitively consider to be complex
212: must involve a continuous element of newness or novelty (i.e.,
213: entropy), but not to such an extent that the novelty becomes
214: completely unpredictable and degenerates into mere noise.
215:
216: In summary, then, it is common practice to require that a structural
217: complexity measure vanish in the perfectly ordered and perfectly
218: disordered limits. Between these limits, the complexity is usually
219: assumed to achieve a maximum. These requirements are often
220: taken as axioms from which one constructs a complexity measure
221: that is a single-valued function of randomness as measured by, say, entropy.
222: In both technical and popular scientific literatures, it is not uncommon
223: to find a ``complexity'' plotted against entropy in merely schematic form
224: as a sketch of a generic complexity function that vanishes for extreme
225: values of entropy and achieves a maximum in a middle region
226: \cite{Hube86,Atla91a,Gell94a,Flak99a}. Several authors, in fact, have taken
227: these as the \emph{only} constraints defining complexity
228: \cite{Shin99,Lope95,Plas96,Calb01a,Lope01a}.
229:
230: Here we take a different approach: {\em We do not prescribe how
231: complexity depends on entropy.} One reason for this is that a useful
232: complexity measure needs to do more than satisfy the boundary
233: conditions of vanishing in the high- and low-entropy limits
234: \cite{Feld98a,Crut00a,Bind00}. In particular, a useful complexity
235: measure should have an unambiguous interpretation that accounts in
236: some direct way for how correlations are {\em organized} in a system.
237: To that end we consider a well defined and frequently used
238: complexity measure---the \emph{excess entropy}---and empirically
239: examine its relationship to entropy for a variety of systems.
240:
241:
242: \subsection{Complexity-Entropy Diagrams}
243:
244: The diagnostic tool that will be the focal point for our studies is
245: the {\em complexity-entropy diagram}. Introduced in
246: Ref.~\cite{Crut89}, a complexity-entropy diagram plots structural
247: complexity (vertical axis) versus randomness (horizontal axis) for
248: systems in a given model class. Complexity-entropy diagrams
249: allow for a direct view of the complexity-entropy relationship within
250: and across different systems. For example, one can easily read whether
251: or not complexity is a single-valued function of entropy.
252:
253: The complexity and entropy measures that we use capture a system's
254: {\em intrinsic computation} \cite{Crut92c}: how a system stores,
255: organizes, and transforms information. A crucial point is that these
256: measures of intrinsic computation are properties of the system's
257: configurations. They do not require knowledge of the equations of
258: motion or Hamiltonian or of system parameters (e.g., temperature,
259: dissipation, or spin-coupling strength) that generated the configurations.
260: Hence, in addition to the many cases in which they can be calculated
261: analytically, they can be inductively calculated from observations of
262: symbolic sequences or configurations.
263:
264: Thus, a complexity-entropy diagram measures intrinsic computation
265: in a parameter-free way. This allows for the direct comparison of
266: intrinsic computation across very different classes since a
267: complexity-entropy diagram expresses this in terms of common
268: ``information-processing'' coordinates. As such, a complexity-entropy
269: diagram demonstrates how much of a given resource (e.g., stored
270: information) is required to produce a given amount of randomness
271: (entropy), or how much novelty (entropy) is needed to produce a
272: certain amount of statistical complexity.
273:
274: Recently, a form of complexity-entropy diagram has been used in the
275: study of anatomical MRI brain images \cite{Youn05a,Youn08a}. This
276: work showed that complexity-entropy diagrams give a reliable way
277: to distinguish between ``normal'' brains and those experiencing
278: cortical thinning, a condition associated with Alzheimer's disease.
279: Complexity-entropy diagrams have also recently been used as part of a
280: proposed test to distinguish chaos from noise \cite{Ross07a}. And
281: Ref.~\cite{Mart06a} calculates complexity-entropy diagrams for a
282: handful of different complexity measures using the sequences generated
283: by the symbolic dynamics of various chaotic maps.
284:
285: Historically, one of the motivations behind complexity-entropy diagrams was
286: to explore the common claim that complexity achieves a {\em sharp} maximum
287: at a well defined boundary between the order-disorder extremes. This led,
288: for example, to the widely popularized notion of the ``edge of chaos''
289: \cite{Pack88a,Kauf93a,Lang90a,Wald92a,Ray94a,Melb00a,Bert03a,Bert04a}---namely,
290: that objects achieve maximum complexity at a \emph{boundary} between
291: order and disorder.
292: Although these particular claims have been criticized \cite{Mitc93}, during
293: the same period it was shown that at the \emph{onset of chaos} complexity
294: does reach a maximum. Specifically, Ref.~\cite{Crut89} showed that the
295: \emph{statistical complexity} diverges at the accumulation point of the
296: period-doubling route to chaos. This led to an analytical theory that
297: describes exactly the interdependence of complexity and entropy for this
298: universal route to chaos \cite{Crut90}. Similarly, another complexity measure,
299: the \emph{excess entropy} \cite{Crut83a,Shaw84,Gras86,Lind88b,Li91,Crut03a,Rate96a,Freu96a,Schu02a}
300: has also been shown to diverge at the period-doubling critical point.
301:
302: This latter work gave some hope that there would be a universal relationship
303: between complexity and entropy---that some appropriately defined measure of
304: complexity plotted against an appropriate entropy would have the same
305: functional form for a wide variety of systems. In part, the motivation for
306: this was the remarkable success of scaling and data collapse for critical
307: phenomena. Data collapse is a phenomenon in which certain variables for very
308: different systems collapse onto a single curve when appropriately rescaled
309: near the critical point of a continuous phase transition. For example, the
310: magnetization and susceptibility exhibit data collapse near the
311: ferromagnet-paramagnet transition. See, for example,
312: Refs.~\cite{Stan99a,Yeom92} for further discussion.
313: Data collapse reveals that different
314: systems---e.g., different materials with different critical
315: temperatures---possess a deep similarity despite differences in their details.
316:
317: The hope, then, was to find a similar universal curve for complexity as
318: a function of entropy. One now sees that this is not and, fortunately, cannot
319: be the case. Notwithstanding special parametrized examples, such as
320: period-doubling and other routes to chaos, a wide range of complexity-entropy
321: relationships exists \cite{Crut92c,Li91,Crut97a,Feld98a}. This is a point
322: that we will repeatedly reinforce in the following.
323:
324:
325: \subsection{Surveying Complexity-Entropy Diagrams}
326:
327: We will present a survey of the relationships between structure and
328: randomness for a number of familiar, well studied systems including
329: deterministic nonlinear and linear stochastic processes and well known
330: models of computation. The systems we study include maps of the
331: interval, cellular automata and Ising models in one and two
332: dimensions, Markov chains, and minimal finite-state machines. To our
333: knowledge, this is the first such cross-model survey of
334: complexity-entropy diagrams.
335:
336: The main conclusion that emerges from our results is that there is a large
337: range of possible complexity-entropy behaviors. Specifically, there is
338: not a universal complexity-entropy curve, there is not a general
339: complexity-entropy transition, nor is it the case that complexity-entropy
340: diagrams for different systems are even qualitatively similar. These results
341: give a concrete picture of the very different types of relationship between
342: a system's rate of information production and the structural organization
343: which produces that randomness. This diversity opens up a number
344: of interesting mathematical questions, and it appears to suggest a new
345: kind of richness in nature's organization of intrinsic computation.
346: %and to bode well for the future creation of new information
347: %technologies.
348:
349: Our exploration of intrinsic computation is structured as follows: In
350: Section \ref{Info.Theory.Review} we briefly review several
351: information-theoretic quantities, most notably the entropy rate and
352: the excess entropy. In Section \ref{Comp.Ent.Section} we present results
353: for the complexity-entropy diagrams for a wide range of model systems.
354: %: the
355: %logistic map, the tent map, one- and two-dimensional Ising models; one-
356: %and two-dimensional cellular automata; Markov chains; and topological
357: %Markov processes, which are related to finite-state models from computation
358: %theory.
359: In Section \ref{Discussion} we discuss our results, make a number
360: of general comments and observations, and conclude by summarizing.
361:
362:
363: % **********************************************************************
364: \section{Entropy and Complexity Measures}
365: \label{Info.Theory.Review}
366:
367: \subsection{Information-Theoretic Quantities}
368:
369: The complexity-entropy diagrams we will examine make use of two
370: information-theoretic quantities: the excess entropy and the entropy
371: rate. In this section we fix notation and give a brief but
372: self-contained review of them.
373:
374: We begin by describing the stochastic process generated by a system.
375: Specifically, we are interested here in describing the character of
376: bi-infinite, one-dimensional sequences:
377: $\BiInfinity = \ldots, S_{-2}, S_{-1}, S_0, S_1, \ldots$, where
378: the $S_i$'s are random variables that assume values $s_i$ in a
379: finite alphabet $\mathcal{A}$. Throughout, we follow the standard
380: convention that a lower-case letter refers to a particular value of
381: the random variable denoted by the corresponding upper-case letter.
382: In the following, the index $i$ on the $S_i$'s will refer to either
383: space or time.
384:
385: A \emph{process} is, quite simply, the distribution over all possible
386: sequences generated by a system: $\Prob(\BiInfinity)$. Let $\Prob(s_i^L)$
387: denote the probability that a block $S_i^L = S_i S_{i+1} \ldots S_{i+L-1}$
388: of $L$ consecutive symbols takes on the particular values
389: $s_i, s_{i+1}, \ldots , s_{i+L-1} \in \mathcal{A}$. We will assume that
390: the distribution over blocks is stationary: $\Prob(S_i^L) =
391: \Prob(S_{i+M}^L)$ for all $i$, $M$, and $L$. And so we will drop the
392: index on the block probabilities. When there is no confusion, then,
393: we denote by $s^L$ a particular sequence of $L$ symbols, and use
394: $\Prob(s^L)$ to denote the probability that the particular $L$-block
395: occurs.
396:
397: The \emph{support} of a process is the set of allowed sequences---i.e.,
398: those with positive probability. In the parlance of computation theory,
399: a process' support is a formal language: the set of all finite length
400: words that occur at least once in an infinite sequence.
401:
402: A special class of processes that we will consider in subsequent
403: sections are {\em Order-$R$ Markov Chains}. These processes are those
404: for which the joint distribution can be conditionally factored into
405: words $S^R$ of length $R$---that is,
406: \begin{equation}
407: \Prob(\BiInfinity) = \ldots \Prob(S_i^R|S_{i-R}^R)
408: \Prob(S_{i+R}^R|S_i^R) \Prob(S_{i+2R}^R|S_{i+R}^R) \ldots \;.
409: \label{R.Markovian}
410: \end{equation}
411: In other words, knowledge of the current length-$R$ word is all that
412: is needed to determine the distribution of future symbols. As a
413: result, the states of the Markov chain are associated with the
414: $|\mathcal{A}|^R$ possible values that can be assumed by a length-$R$
415: word.
416:
417: We now briefly review several central quantities of information theory
418: that we will use to develop measures of unpredictability and entropy.
419: For details see any textbook on information theory; e.g.,
420: Ref.~\cite{Cove91}. Let $X$ be a random variable that assumes the
421: values $x \in {\cal X}$, where ${\cal X}$ is a finite set. The probability
422: that $X$ assumes the value $x$ is given by $\Prob(x)$. Also, let $Y$
423: be a random variable that assumes values $y\in {\cal Y}$.
424:
425: The \emph{Shannon entropy} of the variable $X$ is given by:
426: \begin{equation}
427: H[X] \, \equiv \, - \sum_{x \in {\cal X}} \Prob(x) \log_2
428: \Prob(x) \; .
429: \end{equation}
430: The units are given in \emph{bits}. This quantity measures the uncertainty
431: associated with the random variable $X$. Equivalently, $H[X]$ is also
432: the average amount of memory needed to store outcomes of variable $X$.
433:
434: The \emph{joint entropy} of two random variables, $X$ and $Y$, is defined as:
435: \begin{equation}
436: H[X,Y] \, \equiv \, -\sum_{x \in {\cal X}, y \in {\cal Y} } \Prob(x,y)
437: \log_2 \Prob(x,y) \;.
438: \end{equation}
439: It is a measure of the uncertainty associated with the joint distribution
440: $\Prob(X,Y)$. The \emph{conditional entropy} is defined as:
441: \begin{equation}
442: H[X|Y] \, \equiv \,- \sum_{x \in {\cal X}, y \in {\cal Y} }
443: \Prob(x,y) \log_2 \Prob(x|y) \;,
444: \end{equation}
445: and gives the average uncertainty of the conditional probability
446: $\Prob(X|Y)$. That is, $H[X|Y]$ tells us how uncertain, on
447: average, we are about $X$, given that the outcome of $Y$ is known.
448:
449: Finally, the \emph{mutual information} is defined as:
450: \begin{equation}
451: I [X;Y] \, \equiv \, H[X] - H[X|Y] \;.
452: \label{MI.def}
453: \end{equation}
454: It measures the average reduction of uncertainty
455: of one variable due to knowledge of another. If knowing $Y$ on
456: average reduces uncertainty about $X$, then it makes sense to say that
457: $Y$ carries information about $X$. Note that $I[X;Y] = I[Y;X]$.
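
The symmetry noted above can be made explicit. The following short
derivation is a sketch that uses only the definitions given in this
section together with the product rule $\Prob(x,y) = \Prob(x|y) \Prob(y)$:
\begin{eqnarray}
  H[X|Y] & \, = \, & - \sum_{x \in {\cal X}, y \in {\cal Y} }
     \Prob(x,y) \log_2 \frac{\Prob(x,y)}{\Prob(y)} \nonumber \\
  & \, = \, & H[X,Y] - H[Y] \;,
\end{eqnarray}
so that
\begin{equation}
  I[X;Y] \, = \, H[X] + H[Y] - H[X,Y] \;.
\end{equation}
The right hand side is unchanged when $X$ and $Y$ are exchanged, which
gives $I[X;Y] = I[Y;X]$.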
458:
459: \subsection{Entropy Growth and Entropy Rate}
460:
461: With these definitions set, we are ready to develop an information-theoretic
462: measure of a process's randomness. Our starting point is to consider blocks
463: of consecutive variables. The \emph{block entropy} is the total Shannon
464: entropy of length-$L$ sequences:
465: \begin{equation}
466: H(L) \, \equiv \, - \sum_{ s^L \in {\cal A}^L} \Prob(s^L)
467: \log_2 \Prob(s^L) \;,
468: \label{H.def}
469: \end{equation}
470: where $L > 0$. The sum runs over all possible blocks of
471: length $L$. We define $H(0) \equiv 0$. The block entropy grows monotonically
472: with block length: $H(L) \geq H(L-1)$.
473:
474: For stationary processes the total Shannon entropy typically grows linearly
475: with $L$. That is, for sufficiently large $L$, $H(L) \sim L$. This leads
476: one to define the \emph{entropy rate} $\hmu$ as:
477: \begin{equation}
478: \hmu \, \equiv \,\lim_{L \rightarrow \infty} \frac{H(L)}{L }\;.
479: \label{hmu.def}
480: \end{equation}
481: The units of $\hmu$ are \emph{bits per symbol}.
482: This limit exists for all stationary sequences \cite[Chapter
483: 4.2]{Cove91}. The entropy rate is also known as the \emph{metric
484: entropy} in dynamical systems theory and is equivalent to the
485: \emph{thermodynamic entropy density} familiar from equilibrium
486: statistical mechanics.
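
Two limiting cases, chosen here purely for illustration, show how these
definitions behave. For a fair coin, every block of length $L$ over
${\cal A} = \{0,1\}$ has probability $\Prob(s^L) = 2^{-L}$, so that
\begin{equation}
  H(L) \, = \, - \sum_{ s^L \in {\cal A}^L} 2^{-L} \log_2 2^{-L}
  \, = \, L
\end{equation}
and $\hmu = 1$ bit per symbol. For the period-$2$ process
$\ldots 010101 \ldots$, with its two phases taken to be equally likely,
only two blocks occur for any $L \geq 1$ (one beginning with $0$, one
beginning with $1$), each with probability $1/2$. Hence $H(L) = 1$ for
all $L \geq 1$ and $\hmu = \lim_{L \rightarrow \infty} 1/L = 0$.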
487:
488: The entropy rate can be given an additional interpretation as
489: follows. First, we define an $L$-dependent entropy rate estimate:
490: \begin{eqnarray}
491: \hmu(L) &\, = \,& H(L) - H(L\!-\!1) \\
492: & \, = \, & H[S_L|S_{L-1}, S_{L-2} , \ldots , S_1] \;, \;\; L >0 \;.
493: \label{hmu.L.def}
494: \end{eqnarray}
495: We set $\hmu(0) = \log_2 |{\cal A}|$. In words, then, $\hmu(L)$
496: is the average
497: uncertainty of the next variable $S_L$, given that the previous
498: $L\!-\!1$ symbols have been seen. Geometrically, $h_{\mu}(L)$ is the
499: two-point slope of the total entropy growth curve $H(L)$. Since
500: conditioning on more variables can never increase the entropy, it
501: follows that $\hmu(L) \leq \hmu(L-1)$. In the $L \rightarrow \infty$
502: limit, $\hmu(L)$ is equal to the entropy rate defined above in
503: Eq.~(\ref{hmu.def}):
504: \begin{equation}
505: \hmu \, = \, \lim_{L \rightarrow \infty} \hmu(L) \;.
506: \label{hmu.conditional}
507: \end{equation}
508: Again, this limit exists for all stationary processes \cite{Cove91}.
509: Equation (\ref{hmu.conditional}) tells us that $\hmu$ may be viewed
510: as the irreducible randomness in a process---the randomness
511: that persists even after statistics over longer and longer blocks of
512: variables are taken into account.
513:
514: \subsection{Excess Entropy}
515:
516: The entropy rate gives a reliable and well understood measure of
517: the randomness or disorder intrinsic to a process. However, as the
518: introduction noted, this tells us little about the underlying system's
519: organization, structure, or correlations. Looking at the manner in
520: which $\hmu(L)$ converges to its asymptotic value $\hmu$, however,
521: provides one measure of these properties.
522:
523: When observations only over length-$L$ blocks are taken into account,
524: a process appears to have an entropy rate of $\hmu(L)$. This
525: quantity is at least as large as the true, asymptotic value of the entropy
526: rate $\hmu$. As a result, the process appears more random by
527: $\hmu(L) - \hmu$ bits. Summing these entropy over-estimates over
528: $L$, one obtains the {\em excess entropy}
529: \cite{Crut83a,Shaw84,Gras86,Lind88b}:
530: \begin{equation}
531: \EE \, \equiv \, \sum_{L=1}^{\infty} [\hmu(L) - \hmu] \;.
532: \label{E.def}
533: \end{equation}
534: The units of $\EE$ are \emph{bits}. The excess entropy tells us
535: how much information must be gained before it is possible to infer
536: the actual per-symbol randomness $\hmu$. It is large if the system
537: possesses many regularities or correlations that manifest themselves
538: only at large scales. As such, the excess entropy can serve as a
539: measure of global structure or correlation present in the system.
540:
541: This interpretation is strengthened by noting that the excess
542: entropy can also be expressed as the mutual information between two
543: adjacent semi-infinite blocks of variables \cite{Li91,Crut03a}:
544: \begin{equation}
545: \EE \, = \, \lim_{L \rightarrow \infty} I[ S_{-L}, S_{-L+1}, \ldots,
546: S_{-1}; S_0, S_1, \ldots, S_{L-1}] \;.
547: \label{E.mutual.info}
548: \end{equation}
549: Thus, the excess entropy measures one type of memory of the
550: system; it tells us how much knowledge of one half of the system
551: reduces our uncertainty about the other half. If the sequence of
552: random variables is a time series, then $\EE$ is the amount of
553: information the past shares with the future.
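
Both expressions for the excess entropy can be checked on a simple case,
used here only as an illustration: the period-$2$ process
$\ldots 010101 \ldots$ with its two phases taken to be equally likely.
Here $H(0) = 0$, $H(L) = 1$ for all $L \geq 1$, and $\hmu = 0$, so
$\hmu(1) = 1$ and $\hmu(L) = 0$ for $L \geq 2$.
Equation (\ref{E.def}) then gives
\begin{equation}
  \EE \, = \, \sum_{L=1}^{\infty} [\hmu(L) - \hmu]
  \, = \, 1 + 0 + 0 + \cdots \, = \, 1 \; {\rm bit} \;.
\end{equation}
Equivalently, any past block fixes the phase and hence the entire future,
so the mutual information in Eq.~(\ref{E.mutual.info}) is the entropy of
a future block, $1$ bit, minus a vanishing conditional entropy.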
554:
555: The excess entropy may also be given a geometric interpretation.
556: The existence of the entropy rate suggests that $H(L)$ grows
557: linearly with $L$ for large $L$ and that the growth rate, or
558: slope, is given by $\hmu$. It is then possible to show that the
559: excess entropy is the ``$y$-intercept'' of the asymptotic form for
560: $H(L)$ \cite{Shaw84,Gras86,Li91,Arno96,Bial00a,Neme00a}:
561: \begin{equation}
562: H(L) \, \sim \, \EE + \hmu L ~,
563: \; {\rm as} \; L \rightarrow \infty \;.
564: \label{H.Scaling.Form}
565: \end{equation}
566: Or, rearranging, we have
567: \begin{equation}
568: \EE \, = \, \lim_{L\rightarrow \infty} \left[ H(L) - \hmu L
569: \right] \;.
570: \end{equation}
571:
572: This form of the excess entropy highlights another interpretation: $\EE$ is
573: the \emph{cost of amnesia}. If an observer has extracted enough information
574: from a system (at large $L$) to predict it optimally ($\sim \hmu$), but
575: suddenly loses all of that information, the process will then appear more
576: random by an amount $H(L) - \hmu L$.
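
As an illustration of the intercept form
$\EE = \lim_{L \rightarrow \infty} [ H(L) - \hmu L ]$, consider an
order-$1$ Markov chain, a special case chosen here only as an example.
The conditional entropy in Eq.~(\ref{hmu.L.def}) then depends only on the
immediately preceding symbol, so $\hmu(L) = \hmu$ for all $L \geq 2$.
Summing the increments $\hmu(L) = H(L) - H(L\!-\!1)$ gives
\begin{equation}
  H(L) \, = \, H(1) + (L-1) \hmu
  \, = \, \left[ H(1) - \hmu \right] + \hmu L \;, \;\; L \geq 1 \;.
\end{equation}
Comparing with Eq.~(\ref{H.Scaling.Form}), the intercept is
$\EE = H(1) - \hmu$, and in this special case the asymptotic form is
exact for every $L \geq 1$.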
577:
578: To close, note that the excess entropy, a term coined in Ref.~\cite{Crut83a},
579: goes by a number of different names,
580: including ``stored information'' \cite{Shaw84}; ``effective
581: measure complexity'' \cite{Gras86,Lind88b,Lind89a,Erik87,Ebel02a};
582: ``complexity'' \cite{Li91,Arno96}; ``predictive information''
583: \cite{Bial00a,Neme00a}; and ``reduced R\'enyi entropy of order $1$''
584: \cite{Csor89a,Kauf91a}. For recent reviews on excess entropy, entropy
585: convergence in general, and applications of this approach see
586: Refs.~\cite{Ebel97b,Crut03a,Bial00a}.
587:
588: \subsection{Intrinsic Information Processing Coordinates}
589: \label{Sec:ComplexityEntropyDiagram}
590:
591: In the model classes examined below, we shall take the excess
592: entropy $\EE$ as our measure of complexity and use the entropy rate
593: $\hmu$ as the randomness measure. The excess entropy $\EE$ and the
594: entropy rate $\hmu$ are exactly the two quantities that specify the
595: large-$L$ asymptotic form for the block entropy,
596: Eq.~(\ref{H.Scaling.Form}).
597: The set of all $(\hmu, \EE)$ pairs is thus geometrically equivalent to
598: the set of all straight lines with non-negative slope and intercept.
599: Clearly, a line's slope and intercept are independent quantities. Thus,
600: there is no {\em a priori} reason to anticipate any relationship between
601: $\hmu$ and $\EE$, a point emphasized early on by Li \cite{Li91}.
602:
603: It is helpful in the following to know that for binary order-$R$
604: Markov processes there is an upper bound on the excess entropy:
605: \begin{equation}
606: \EE \leq R (1-h_\mu) \;.
607: \label{EE_hmu_bound}
608: \end{equation}
609: We sketch a justification of this result here; for the derivation, see
610: \cite[Proposition 11]{Crut03a}. First, recall that the excess entropy may
611: be written as the mutual information between two semi-infinite blocks, as
612: indicated in Eq.~(\ref{E.mutual.info}). However, given the process is
613: order-$R$ Markovian, Eq.~(\ref{R.Markovian}), the excess entropy reduces
614: to the mutual information between two adjacent $R$-blocks. From
615: Eq.~(\ref{MI.def}), we see that the excess entropy is the entropy of
616: an $R$-block minus the entropy of an $R$-block conditioned on its neighboring
617: $R$-block:
618: \begin{equation}
619: \EE \, = \, H(R) - H[S_i^R|S_{i-R}^R] \;.
620: \label{E.markov}
621: \end{equation}
622: (Note that this only holds in the special case of order-$R$ Markov processes.
623: It is \emph{not} true in general.)
624: The first term on the right hand side of Eq.~(\ref{E.markov}) is
625: maximized when the distribution over the $R$-block is uniform, in
626: which case $H(R) = R$. The second term on the right hand side is
627: minimized by assuming that the conditional entropy of the two blocks
628: is given simply by $R h_\mu$---i.e., $R$ times the per-symbol entropy
629: rate $h_\mu$. In other words, we obtain a lower bound by assuming that
630: the process is independent, identically distributed over $R$-blocks.
631: Combining the two bounds gives Eq.~(\ref{EE_hmu_bound}).
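
One way to make the two steps explicit is sketched below. It assumes the
chain rule for block entropies of a stationary process,
$H[S_i^R|S_{i-R}^R] = H(2R) - H(R)$, together with the fact that for an
order-$R$ Markov process $H(L+1) - H(L) = \hmu$ for all $L \geq R$, so
that $H(2R) - H(R) = R \hmu$. In this sketch the conditional term is
evaluated directly rather than bounded:
\begin{eqnarray*}
\EE & = & H(R) - \left[ H(2R) - H(R) \right] \\
    & = & H(R) - R \hmu \\
    & \leq & R - R \hmu \, = \, R (1 - \hmu) ~.
\end{eqnarray*}
The inequality uses only $H(R) \leq R$ for a binary alphabet. For
$R = 2$, for example, this reads $\EE \leq 2 - 2 \hmu$.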
632:
633: It is also helpful in the following to know that for periodic processes
634: $\hmu = 0$ (perfectly predictable) and $\EE = \log_2 p$, where $p$ is
635: the period \cite{Crut03a}. In this case, $\EE$ is the amount of information
636: required to distinguish the $p$ phases of the cycle.
637:
638: \subsection{Calculating Complexities and Entropies}
639: \label{numerical.methods.section}
640:
641: As is now clear, all quantities of interest depend on knowing sequence
642: probabilities $\Prob (s^L)$. These can be obtained by direct analytical
643: approximation given a model or by numerical estimation via simulation.
644: Sometimes, in special cases, the complexity and entropy can be calculated
645: in closed form.
646:
647: For some, but not all, of the process classes studied in the following,
648: we estimate the various information-theoretic quantities by simulation.
649: We generate a long sequence, keeping track of the frequency of occurrence
650: of words up to some finite length $L$. The word counts are stored in
651: a dynamically generated parse tree, allowing us to go out to $L = 120$
652: in some cases. We first make a rough estimate of the topological entropy
653: using a small $L$ value. This entropy determines the sparseness of the
654: parse tree, which in turn determines how large a tree can be stored in
655: a given amount of memory. From the word and subword frequencies
656: $\Prob(s^L)$, one directly calculates $H(L)$ and, thus, $\hmu$ and
657: $\EE$. Estimation errors in these quantities are a function of
658: statistical errors in $\Prob(s^L)$.
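
For concreteness, one common way to turn the word frequencies into
estimates is sketched below. The text above does not state which
finite-$L$ estimators were used, so the length-dependent quantities
$\hmu(L)$ and $\EE(L)$ are illustrative assumptions and not a description
of the actual procedure:
\begin{eqnarray*}
H(L) & = & - \sum_{s^L} \Prob(s^L) \log_2 \Prob(s^L) ~, \\
\hmu(L) & = & H(L) - H(L-1) ~, \\
\EE(L) & = & H(L) - L \, \hmu(L) ~.
\end{eqnarray*}
For processes with finite $\EE$, these approach $\hmu$ and $\EE$ as $L$
grows, consistent with Eq.~(\ref{H.Scaling.Form}).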
659:
660: Here, we are mainly interested in gaining a general sense of the behavior
661: of the entropy rate $\hmu$ and the excess entropy $\EE$. And so, for the
662: purposes of our survey, this direct method is sufficient. The vast
majority of our estimates are accurate to within $1\%$. If extremely
664: accurate estimates are needed, there exist a variety of techniques for
665: correcting for estimator bias
666: \cite{Gras88a,Gras89a,Herz94a,Schu96,deWi99a,Neme02a}. When one is
667: working with finite data, there is also the question of what errors
668: occur, since the $L \rightarrow \infty$ limit cannot be taken. For
669: more on this issue, see Ref.~\cite{Crut03a}.
670:
671: Regardless of these potential subtleties, the entropy rate and
672: excess entropy can be reliably estimated via simulation, given access
673: to a reasonably large amount of data. Moreover, this estimation is
674: purely inductive---one does not need to use knowledge of the
675: underlying equations of motion or the hidden states that produced the
676: sequence. Nevertheless, for several of the model classes we
677: consider---one-dimensional Ising models, Markov chains, and
678: topological Markov chains---we calculate the quantities using
679: closed-form expressions, leading to essentially no error.
680:
681:
682: % **********************************************************************
683: % **********************************************************************
684: % **********************************************************************
685:
686: \section{Complexity-Entropy Diagrams}
687: \label{Comp.Ent.Section}
688:
689: In the following sections we present a survey of intrinsic computation
690: across a wide range of process classes.
691: We think of a \emph{class} of system as given by equations of motion,
692: or other specification for a stochastic process,
693: that are parametrized in some way---a pair of control parameters
694: in a one-dimensional map or the energy of a Hamiltonian, say. The
695: space of parameters, then, is the concrete representation of the
696: space of possible systems, and a class of system is a subset of the
697: set of all possible processes. A point in the parameter space is then
698: a particular \emph{system}, whose intrinsic computation we will
699: summarize by a pair of numbers---one a measure of randomness, the
700: other a measure of structure. In several cases, these measures are estimated
701: from sequences generated by the temporal or spatial process.
702:
703: \subsection{One-Dimensional Discrete Iterated Maps}
704:
705: Here we look at the symbolic dynamics generated by two iterated maps
of the interval---the well-studied \emph{logistic} and \emph{tent
707: maps}---of the form:
708: \begin{equation}
709: x_{n+1} = f_\mu (x_n) ~,
710: \end{equation}
711: where $\mu$ is a parameter that controls the nonlinear function $f$,
712: $x_n \in [0,1]$, and one starts with $x_0$, the \emph{initial condition}.
713: The logistic and tent maps are canonical examples of systems
714: exhibiting deterministic chaos.
715: The nonlinear iterated function $f$ consists of two monotone
716: pieces. And so, one can analyze the maps' behavior on the interval via
717: a \emph{generating partition} that reduces a sequence of continuous states
718: $x_0, x_1, x_2, \ldots$ to a binary sequence $s_0, s_1, s_2,
719: \ldots$ \cite{Bai89a}. The binary partition is given by
720: \begin{equation}
721: s_i = \left\{
722: \begin{array}{cl}
0 & x_i \leq \frac{1}{2} \\
\\
1 & x_i > \frac{1}{2}
726: \end{array}
727: \right. ~.
728: \end{equation}
729: The binary sequence may be viewed as a {\em code} for the set of initial
730: conditions that produce the sequence. When the maps are chaotic, arbitrarily
731: long binary
732: sequences produced using this partition code for arbitrarily small
733: intervals of initial conditions on the chaotic attractor. Hence, one
734: can explore many of these maps' properties via binary sequences.
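
As a small worked illustration, take the logistic map
$f(x) = rx(1-x)$, introduced below, at $r = 4$ with $x_0 = 0.3$. Both
values are our own choices, picked only for convenience. The first few
iterates and their symbols under the binary partition are:
\begin{displaymath}
\begin{array}{c|ccccc}
i   & 0   & 1    & 2      & 3      & 4 \\ \hline
x_i & 0.3 & 0.84 & 0.5376 & 0.9943 & 0.0225 \\
s_i & 0   & 1    & 1      & 1      & 0
\end{array}
\end{displaymath}
Here $x_3$ and $x_4$ are rounded to four decimal places. The
continuous orbit is thus reduced to the binary word $01110$.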
735:
736: \subsubsection{Logistic Map}
737:
738: We begin with the logistic map of the unit interval:
739: \begin{equation}
740: f(x) \, = \, rx(1-x) \;,
741: \label{logistic}
742: \end{equation}
743: where the control parameter $r \in [0,4]$. We iterate this starting
744: with an arbitrary initial condition $x_0 \in [0,1]$. In
745: Fig.~\ref{logistic.E.hmu.vs.r.plot} we show numerical estimates of the
746: excess entropy $\EE$ and the entropy rate $\hmu$ as a function of
$r$. Notice that both $\EE$ and $\hmu$ change in a complicated manner
748: as the parameter $r$ is varied continuously.
749:
As $r$ increases from $3.0$ to approximately $3.5699$,
the logistic map undergoes a series of period-doubling bifurcations.
For $r \in (3.0,3.4495)$ the sequences generated by the logistic
map are periodic with period $2$, for $r \in
(3.4495,3.5441)$ the sequences are period $4$,
and for $r \in (3.5441,3.5644)$ the sequences are
period $8$. For all periodic sequences of period $p$, the entropy
rate $\hmu$ is zero and the excess entropy $\EE$ is $\log_2 p$. So, as
the period doubles, the excess entropy increases by one bit. This can
be seen in the staircase on the left-hand side of
Fig.~\ref{logistic.E.hmu.vs.r.plot}. At $r \approx
3.5699$, the period doublings accumulate; beyond this value the
logistic map first becomes chaotic, as evidenced by
766: a positive entropy rate. For further discussion of the
767: phenomenology of the logistic map, see almost any modern textbook on
768: nonlinear dynamics, e.g., Refs.~\cite{Peit92,Ott93}.
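
The heights of the steps in the staircase follow directly from
$\EE = \log_2 p$. As a quick arithmetic check for the first three
periods, and for a generic doubling $p \rightarrow 2p$:
\begin{displaymath}
\log_2 2 = 1 ~, \quad \log_2 4 = 2 ~, \quad \log_2 8 = 3 ~, \quad
\log_2 (2p) - \log_2 p = 1 ~.
\end{displaymath}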
769:
770: % **********************************************************************
771: \begin{figure}[tbp]
772: \epsfxsize=3.2in
773: \begin{center}
774: \leavevmode
775: \epsffile{logistic.Ehmu.vs.r.eps}
776: \end{center}
777: \vspace{-.6cm}
778: \caption{Excess entropy $\EE$ and entropy rate $\hmu$ as a function of
779: the parameter $r$. The top curve is excess entropy. The $r$ values
780: were sampled uniformly as $r$ was varied from $3.4$ to $4.0$ in
781: increments of $0.0001$. The largest $L$ used was $L=30$ for systems
782: with low entropy. For each parameter value with positive entropy, $1
783: \times 10^7$ words of length $L$ were sampled. }
784: \label{logistic.E.hmu.vs.r.plot}
785: \vspace{-.2cm}
786: \end{figure}
787: % **********************************************************************
788:
789: Looking at Fig.~\ref{logistic.E.hmu.vs.r.plot}, it is difficult to see
how $\EE$ and $\hmu$ are related. This relationship can be seen
791: much more clearly in Fig.~\ref{logistic.banded.plot}, in which we show
792: the complexity-entropy diagram for the same system. That is, we plot
793: $(\hmu,\EE)$ pairs. This lets us
794: look at how the excess entropy and the entropy rate are related,
795: independent of the parameter $r$.
796:
797: %This is similar in spirit to the
798: %idea behind a phase plane. For example, when studying two coupled
799: %ordinary differential equations, e.g., a Lotka-Volterra predator-prey
800: %system, one could plot the values of the two populations as a function
801: %of time. It is often more revealing, however, to plot one population
802: %against the other, to reveal how the two populations are related. The
803: %resulting phase plane plot more directly reveals the relationship
804: %between the two variables.
805:
806: % **********************************************************************
807: \begin{figure}[htbp]
808: \epsfxsize=3.2in
809: \begin{center}
810: \leavevmode
811: \epsffile{logistic.two-bands.eps}
812: \end{center}
813: \vspace{-.8cm}
\caption{Entropy rate and excess entropy $(\hmu,\EE)$-pairs for
the logistic map. Points from regions of the map in which the bifurcation
816: diagram has one or two bands are colored differently. There are $3214$
817: parameter values sampled for the one-band region and $3440$ values for
818: the two-band region. The $r$ values were sampled uniformly. The
819: one-band region is $r \in (3.6786, 4.0)$; the two-band
820: region is $r \in (3.5926, 3.6786)$. The largest
821: $L$ used was $L=30$ for systems with low entropy. For each parameter
822: value with positive entropy, $1 \times 10^7$ words of length $L$ were
823: sampled. }
824: \label{logistic.banded.plot}
825: \vspace{-.2cm}
826: \end{figure}
827: % **********************************************************************
828:
829: Figure \ref{logistic.banded.plot} shows that there is a definite
830: relationship between $\EE$ and $\hmu$---one that is not immediately evident
831: from looking at Fig.~\ref{logistic.E.hmu.vs.r.plot}. Note, however,
832: that this relationship is not a simple one. In particular, complexity
833: is not a function of entropy: $\EE \neq g(\hmu)$. For a given value
834: of $\hmu$, multiple excess entropy values $\EE$ are possible.
835:
836: There are several additional empirical observations to extract from
837: Fig.~\ref{logistic.banded.plot}. First, the shape appears to be
838: self-similar. This is not at all surprising, given that the logistic
839: map's bifurcation diagram itself is self-similar. Second, note the
840: clumpy, nonuniform clustering of $(\hmu, \EE)$ pairs within the dense
841: region. Third, note that there is a fairly well defined lower bound.
842: Fourth, for a given value of the
843: entropy rate $h_\mu$ there are many possible values for the excess
844: entropy $\EE$. However, it appears as if not all $\EE$ values are
845: possible for a given $h_\mu$. Lastly, note that there does not appear to be
846: any phase transition (at finite $h_\mu$) in the complexity-entropy diagram.
847: Strictly speaking, such a transition does occur, but it does so at
848: zero entropy rate. As the period doublings accumulate, the excess entropy
849: grows without bound. As a result, the possible excess entropy values
850: at $h_\mu = 0$ on the complexity-entropy diagram are unbounded. For
further discussion, see Ref.~\cite{Crut90}.
852:
853:
854: \subsubsection{Tent Map}
855:
856: We next consider the {\em tent map}:
857: \begin{equation}
858: f(x) \, = \, \left\{ \begin{array}{ll} ax &
859: x < \frac{1}{2} \\ \\ \hspace{1mm} a(1-x) & x \geq \frac{1}{2}
860: \end{array} \right. \; ,
861: \end{equation}
862: where $a \in [0,2]$ is the control parameter. For $a \in [1,2]$,
863: the entropy rate $\hmu = \log_2 a$; when $a \in [0,1]$, $\hmu = 0$.
864: Fig.~\ref{tent.plot} shows $1,200$ $(\hmu,\EE)$-pairs in which
865: $\EE$ is calculated numerically from empirical estimates of the
866: binary word distribution $\Prob(s^L)$.
867:
868: % **********************************************************************
869: \begin{figure}[tbp]
870: \epsfxsize=3.0in
871: \begin{center}
872: \leavevmode
873: \epsffile{tent.compent.eps}
874: \end{center}
875: \vspace{-.6cm}
876: \caption{Excess entropy $\EE$ versus entropy density $\hmu$ for the
877: tent map. The $L$ used to estimate $\Prob(s^L)$, and so $\EE$ and
878: $\hmu$, varied depending on the $a$ parameter. The largest $L$ used
879: was $L=120$ at low $\hmu$. The plot shows $1,200$ $(\hmu,\EE)$-pairs.
880: The parameter was incremented every $\Delta a = 5 \times 10^{-4}$ for
881: $a \in [1,1.2]$ and then incremented every $\Delta a = 0.001$ for
882: $a \in [1.2,2.0]$. For each parameter value with positive entropy,
883: $10^7$ words of length $L$ were sampled.
884: }
885: \label{tent.plot}
886: \vspace{-.2cm}
887: \end{figure}
888: % ********************************************************************
889:
890: Reference \cite{Crut90} developed a phenomenological theory that explains
891: the properties of the tent map at the so-called \emph{band-merging points},
892: where bands of the chaotic attractor merge pairwise as a function of the
control parameter. The behavior at these points is
\emph{noisy periodic}---the order of band visitations is periodic,
but motion within each band is deterministically chaotic. The
band-merging points occur at
$a = 2^{2^{-n}}$. The symbolic-dynamic process is described by a Markov
897: chain consisting of a periodic cycle of $2^n$ states in which
898: all state-to-state transitions are nonbranching except for one where
899: $s_i = 0$ or $s_i = 1$ with equal probability. Thus, each phase
900: of the Markov chain has zero entropy per transition, except for the one
901: that has a branching entropy of $1$ bit. The entropy rate at band-mergings
902: is thus $\hmu = 2^{-n}$, with $n$ an integer.
903:
904: The excess entropy for the symbolic-dynamic process at the
905: $2^n$-to-$2^{n-1}$ band-merging is simply $\EE = \log_2 2^n = n$.
906: That is, the process carries $n$ bits of phase information. Putting
907: these facts together, then, we have a very simple relationship in
908: the complexity-entropy diagram at band-mergings:
909: \begin{equation}
910: \EE \, = \, -\log_2 \hmu ~.
911: \label{tent.theory}
912: \end{equation}
913: This is graphed as the dashed line in Fig.~\ref{tent.plot}.
914: It is clear that the entire complexity-entropy diagram is much richer
915: than this simple expression indicates. Nonetheless, Eq.
916: (\ref{tent.theory}) does capture the overall shape quite well.
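
To make Eq.~(\ref{tent.theory}) concrete, the first few band-merging
points can be tabulated from $a = 2^{2^{-n}}$, $\hmu = \log_2 a = 2^{-n}$,
and $\EE = n$. The selection $n = 1, 2, 3$ is ours, for illustration,
and the values of $a$ are rounded to four decimal places:
\begin{displaymath}
\begin{array}{c|ccc}
n & a & \hmu & \EE \\ \hline
1 & 1.4142 & 1/2 & 1 \\
2 & 1.1892 & 1/4 & 2 \\
3 & 1.0905 & 1/8 & 3
\end{array}
\end{displaymath}
In each row $-\log_2 \hmu = n = \EE$, so halving the entropy rate adds
one bit of excess entropy.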
917:
918: Note that, in sharp contrast to the logistic map, for the tent map it
919: does appear as if the excess entropy takes on only a single value for each
920: value of the entropy rate $\hmu$. The reason for this is straightforward.
921: The entropy rate $\hmu$ is a simple monotonic function of the parameter
922: $a$---$h_\mu = \log_2 a$---and so there is a one-to-one relationship
923: between them. As a result, each $h_\mu$ value on the complexity-entropy
924: diagram corresponds to one and only one value of $a$ and, in turn, corresponds
925: to one and only one value of $\EE$. Interestingly, the excess entropy appears
926: to be a continuous function of $h_\mu$, although not a differentiable one.
927:
928: \subsection{Ising Spin Systems}
929:
930: We now investigate the complexity-entropy diagrams of the Ising model
931: in one and two spatial dimensions. Ising models are among the simplest
932: physical models of spatially extended systems. Originally introduced
933: to model magnetic materials, they are now used to model a wide range
934: of cooperative phenomena and order-disorder transitions and, more
935: generally, are viewed as generic models of spatially extended,
936: statistical mechanical systems \cite{Chri05b,Seth06a}. Like the
937: logistic and tent maps, Ising models are also studied as an
938: intrinsically interesting mathematical topic.
939: As we will see, Ising models provide an
940: interesting contrast with the intrinsic computation seen in the
941: interval maps.
942:
Specifically, we consider spin-$1/2$ Ising models
with nearest-neighbor (NN) and next-nearest-neighbor (NNN) interactions. The
945: Hamiltonian (energy function) for such a system is:
946: \begin{eqnarray}
947: {\cal H} \,& =& \, -J_1 \sum_{\langle i,j\rangle_{\rm nn}} S_{i} S_{j}
948: \nonumber \\ & & -J_2
949: \sum_{\langle i,j\rangle_{{\rm nnn}} } S_{i} S_{j} \,-\, B \sum_{i} S_{i} \;,
950: \label{Hamiltonian}
951: \end{eqnarray}
952: where the first (second) sum is understood to run over all NN (NNN)
pairs of spins. In one dimension, a spin's nearest neighbors will
954: consist of two spins, one to the right and one to the left, whereas
955: in two dimensions a spin will have four nearest neighbors---left,
956: right, up, and down. Each spin $S_{i}$ is a binary variable: $S_{i}
957: \,\in\, \{-1,+1\}$. The coupling constant $J_1$ is a parameter that
958: when positive (negative) makes it energetically favorable for NN
959: spins to (anti-)align. The constant $J_2$ has the same effect on NNN
960: spins. The parameter $B$ may be viewed as an external field; its
961: effect is to make it energetically favorable for spins to point up
962: (i.e., have a value of $+1$) instead of down. The probability of a
963: configuration is taken to be proportional to its Boltzmann weight:
964: the probability of a spin configuration ${\cal C}$ is proportional
965: to $e^{-\beta{\cal H}({\cal C})}$, where $\beta = 1/T$ is the inverse
966: temperature.
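
As a minimal illustration of the Boltzmann weight, consider a single NN
pair of spins with $J_2 = 0$ and $B = 0$. This two-spin setting is a
simplification we introduce for illustration only. From
Eq.~(\ref{Hamiltonian}), an aligned configuration has energy $-J_1$ and
an anti-aligned one has energy $+J_1$, so the ratio of their
probabilities is
\begin{displaymath}
\frac{\Prob({\rm aligned})}{\Prob({\rm anti\mbox{-}aligned})}
\, = \, \frac{e^{\beta J_1}}{e^{-\beta J_1}} \, = \, e^{2 J_1 / T} ~.
\end{displaymath}
For an antiferromagnetic coupling $J_1 = -1$ at $T = 1$, this ratio is
$e^{-2} \approx 0.135$. An anti-aligned configuration is then favored
by a factor of roughly $7.4$.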
967:
968: In equilibrium statistical mechanics, the entropy density is a
969: monotonic increasing function of the temperature. Quite generically,
970: a plot of the entropy $h_\mu$ as a function of temperature $T$
971: resembles that of the top plot in Fig.~\ref{2DCriticalhvsE}. Thus,
972: $h_\mu$ may be viewed as a nonlinearly rescaled temperature. One
973: might ask, then, why one might want to plot complexity versus entropy:
974: Isn't a plot of complexity versus temperature qualitatively the same?
975: Indeed, the two plots would look very similar. However, there
976: are two major benefits of complexity-entropy diagrams for statistical
977: mechanical systems. First,
978: the entropy captures directly the system's unpredictability, measured
979: in bits per spin. The entropy thus measures the system's
980: information processing properties. Second, plotting complexity versus
981: entropy and not temperature allows for a direct comparison of the
982: range of information processing properties of statistical mechanical
983: systems with systems for which there is not a well defined
984: temperature, such as the deterministic dynamical systems of the
985: previous section or the cellular automata of the subsequent one.
986:
987:
988: \subsubsection{One-Dimensional Ising System}
989:
990: \label{1D.spin.section}
991:
992: We begin by examining one-dimensional Ising systems.
993: In Refs.~\cite{Crut97a,Feld98b,Feld98c} two of the authors developed
994: exact, analytic transfer-matrix methods for calculating $\hmu$ and
995: $\EE$ in the thermodynamic ($N \rightarrow \infty$) limit. These
methods make use of the fact that the NNN Ising model is order-$2$
997: Markovian. We used
998: these methods to produce Fig.~\ref{Ising.Batcape}, the
999: complexity-entropy diagram for the NNN Ising system with
1000: antiferromagnetic coupling constants $J_1$ and $J_2$ that tend to
1001: anti-align coupled spins. The figure gives a
1002: scatter plot of $10^5$ $(\hmu, \EE)$ pairs for system parameters that
1003: were sampled randomly
1004: from the following ranges: $J_1 \in [-8,0]$, $J_2 \in [-8,0]$, $T \in
1005: [0.05,6.05]$, and $B \in [0,3]$. For each parameter realization, the
1006: excess entropy $\EE$ and entropy density $\hmu$ were calculated.
1007: Fig.~\ref{Ising.Batcape} is rather striking---the $(\hmu,\EE)$ pairs
1008: are organized in the shape of a ``batcape''. Why does the plot have
1009: this form?
1010:
1011: % **********************************************************************
1012: \begin{figure}[tbp]
1013: \epsfxsize=3.5in
1014: \begin{center}
1015: \leavevmode
1016: %\includegraphics[width=4.3in]{batcape.eps}
1017: %\epsffile{batcape.lowres.ps}
1018: \epsffile{batcape.eps}
1019: \end{center}
1020: \vspace{-6mm}
1021: \caption{Complexity-entropy diagram for the one-dimensional, spin-$1/2$
1022: antiferromagnetic Ising model with nearest- and next-nearest-neighbor
1023: interactions. $10^5$ system parameters were sampled randomly from the
1024: following ranges: $J_1 \in [-8,0]$, $J_2 \in [-8,0]$,
1025: $T \in [0.05,6.05]$, and $B \in [0,3]$. For each parameter setting,
1026: the excess entropy $\EE$ and entropy density $\hmu$ were calculated
1027: analytically.
1028: }
1029: \label{Ising.Batcape}
1030: \vspace{-2mm}
1031: \end{figure}
1032: % **********************************************************************
1033:
1034: Recall that if a sequence over a binary alphabet is periodic with period
1035: $p$, then $\EE = \log_2 p$ and $\hmu = 0$. Thus, the ``tips'' of the batcape
1036: at $\hmu = 0$ correspond to crystalline (periodic) spin configurations with
1037: periods $1$, $2$, $3$, and $4$. For example, the $(0,0)$ point is the
1038: period-$1$ configuration with all spins aligned. These periodic regimes
1039: correspond to the system's different possible ground states. As the entropy
1040: density increases, the cape tips widen and eventually join.
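
Evaluating $\EE = \log_2 p$ for these four periods gives the heights of
the tips on the $\hmu = 0$ axis, with the third value rounded to three
decimal places:
\begin{displaymath}
\log_2 1 = 0 ~, \quad \log_2 2 = 1 ~, \quad \log_2 3 \approx 1.585 ~,
\quad \log_2 4 = 2 ~.
\end{displaymath}
All four lie at or below $2$ bits, which is the value of
Eq.~(\ref{EE_hmu_bound}) evaluated at $R = 2$ and $\hmu = 0$.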
1041:
1042: Figure~\ref{Ising.Batcape} demonstrates in graphical form that there
1043: is organization in the process space defined by the Hamiltonian of
1044: Eq.~(\ref{Hamiltonian}). Specifically, for antiferromagnetic couplings,
1045: $\EE$ and $\hmu$ values do not uniformly fill the plane. There are forbidden
1046: regions in the complexity-entropy plane. Adding randomness ($\hmu$) to
1047: the periodic ground states does not immediately destroy them. That is,
1048: there are low-entropy states that are almost-periodic. The apparent upper
1049: linear bound is that of Eq.~(\ref{EE_hmu_bound}) for a system with
at most $4$ Markov states or, equivalently, an order-$2$ Markov chain:
1051: $\EE \leq 2 - 2 \hmu$.
1052:
1053: In contrast, in the logistic map's complexity-entropy diagram
1054: (Fig.~\ref{logistic.banded.plot}) one does not see anything
1055: remotely like the batcape. This indicates that there are no
1056: low-entropy, almost-periodic configurations related to the exactly
1057: periodic configurations generated at zero-entropy along the
1058: period-doubling route to chaos. Increasing the parameter there
1059: does not add randomness to a periodic orbit. Rather, it causes a
1060: system bifurcation to a higher-period orbit.
1061:
1062: \subsubsection{Two-Dimensional Ising Model}
1063:
1064: Thus far we have considered only one-dimensional systems, either
1065: temporal or spatial. However,
1066: the excess entropy can be extended to apply to two-dimensional
1067: configurations as well; for details, see Ref.~\cite{Feld03a}. Using
1068: methods from there, we calculated the excess entropy
1069: and entropy density for the two-dimensional Ising model with nearest-
1070: and next-nearest-neighbor interactions. In other words, we calculated
1071: the complexity-entropy diagram for the two-dimensional version of the
1072: system whose complexity-entropy diagram is shown in
1073: Fig.~\ref{Ising.Batcape}. There are several different definitions for
1074: the excess entropy in two dimensions, all of which are similar but not
identical. In Fig.~\ref{2DIsingBatcape} we used a version that is
1076: based on the mutual information and, hence, is denoted ${\EE}_I$
1077: \cite{Feld03a}.
1078: %See Ref.~\cite{Feld03a} for a discussion of the different forms of
1079: %two-dimensional excess entropy.
1080:
1081: Figure \ref{2DIsingBatcape} gives a scatter plot of $4,500$
1082: complexity-entropy pairs. System parameters in
1083: Eq.~(\ref{Hamiltonian}) were sampled randomly from the following
1084: ranges: $J_1 \in [-3,0]$, $J_2 \in [-3,0]$, $T \in [0.05,4.05]$, and
1085: $B = 0$. For each parameter setting, the excess entropy $\EE_I$ and
1086: entropy density $\hmu$ were estimated numerically; the configurations
1087: themselves were generated via a Monte Carlo simulation. For each $(\hmu,\EE)$
1088: point the simulation was run for $200,000$ Monte Carlo updates per site to
1089: equilibrate. Configuration data was then taken for $20,000$ Monte Carlo
1090: updates per site. The lattice size was a square of $48 \times 48$ spins.
1091: The long equilibration time is necessary because, for some Ising models at
1092: low temperature, single-spin flip dynamics of the sort used here have very
1093: long transient times \cite{Spir01a,Spir01b,Vazq02a}.
1094:
1095: Note the similarity between Figs.~\ref{Ising.Batcape} and \ref{2DIsingBatcape}.
1096: For the 2D model, there is also a near-linear upper bound:
1097: $\EE \leq 5 (1-h_\mu)$. In addition, one sees periodic
1098: spin configurations, as evidenced by the horizontal bands. An $\EE_I$ of
1099: $1$ bit corresponds to a checkerboard of period $2$; $\EE_I = 3$
1100: corresponds to a checkerboard of period $4$; while $\EE_I = 2$
1101: corresponds to a ``staircase'' pattern of period $4$. See
1102: Ref.~\cite{Feld03a} for illustrations. The two period-$4$
1103: configurations are both ground states for the model in the parameter
1104: regime in which $|J_2| < |J_1|$ and $J_2 < 0$. At low temperatures,
1105: the state into which the system settles is a matter of chance.
1106:
1107: % **********************************************************************
1108: \begin{figure}[tbp]
1109: \epsfxsize=3.5in
1110: \begin{center}
1111: \leavevmode
1112: %\includegraphics[width=4.3in]{2DIsingBatcape.eps}
1113: \epsffile{2dnnn.batcape.Ei.ps}
1114: \end{center}
1115: \vspace{-6mm}
1116: \caption{Complexity-entropy diagram for the two-dimensional, spin-$1/2$
1117: anti\-ferromagnetic Ising model with nearest- and next-nearest-neighbor
1118: interactions. System parameters were sampled randomly from the
1119: following ranges: $J_1 \in [-3,0]$, $J_2 \in [-3,0]$, $T \in
1120: [0.05,4.05]$, and $B = 0$. For each parameter setting, the excess
1121: entropy $\EE_I$ and entropy density $\hmu$ were estimated
1122: numerically.}
1123: \vspace{-4mm}
1124: \label{2DIsingBatcape}
1125: \end{figure}
1126: % **********************************************************************
1127:
1128: Thus, the horizontal streaks in the low-entropy region of
1129: Fig.~\ref{2DIsingBatcape} are the different ground states possible for
1130: the system. In this regard Fig.~\ref{2DIsingBatcape} is qualitatively
1131: similar to Fig.~\ref{Ising.Batcape}---in each there are several
1132: possible ground states at $\hmu = 0$ that persist as the entropy
1133: density is increased. However, in the two-dimensional system of
1134: Fig.~\ref{2DIsingBatcape} one sees a scatter of other values
1135: around the periodic bands. There are even $\EE_I$ values larger than
1136: $3$. These $\EE_I$ values arise when parameters are selected in which
the NN and NNN coupling strengths are similar: $J_1 \approx J_2$.
1138: When this is the case, there is no energy cost associated with a
1139: horizontal or vertical defect between the two possible ground states.
As a result, for low temperatures the system effectively freezes into
1141: horizontal or vertical strips consisting of the different ground states.
1142: Depending on the number of strips and their relative widths, a number
1143: of different $\EE_I$ values are possible, including values well above
1144: $3$, indicating very complex spatial structure.
1145:
Despite these differences, the similarities between the
complexity-entropy plots for the one- and two-dimensional systems are
clearly evident. This is all the more noteworthy since one- and
two-dimensional Ising models are regarded as very different sorts of
system by those who focus solely on phase transitions. The
two-dimensional Ising model has a critical phase transition while the
one-dimensional model does not. And, more broadly, two-dimensional random
fields are usually considered very different mathematical entities
from one-dimensional sequences. Nevertheless, the two complexity-entropy
1155: diagrams show that, away from criticality, the one- and two-dimensional
1156: Ising systems' ranges of intrinsic computation are similar.
1157:
1158: \subsubsection{Ising Model Phase Transition}
1159:
1160: %Figure \ref{2DIsingBatcape} may be initially surprising.
1161: As noted above, the two-dimensional Ising
1162: model is well known as a canonical model of a system that undergoes a
1163: continuous phase transition---a discontinuous change in the system's
1164: properties as a parameter is continuously varied. The 2D NN Ising
1165: model with ferromagnetic ($J_1>0$) bonds and no NNN coupling ($J_2 =
1166: 0$) and zero external field ($B=0$)
1167: undergoes a phase transition at $T = T_c \approx 2.269$ when $J_1 =
1168: 1$. At the critical temperature $T_c$ the magnetic susceptibility
1169: diverges and the specific heat is not differentiable. In
1170: Fig.~\ref{2DIsingBatcape} we restricted ourselves to antiferromagnetic
1171: couplings and thus did not sample in the region of parameter space in
1172: which the phase transition occurs.
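
The critical temperature quoted above is the exact Onsager result for
the square lattice. In units where Boltzmann's constant is $1$, as
implied by $\beta = 1/T$, it reads
\begin{displaymath}
T_c \, = \, \frac{2 J_1}{\ln \left( 1 + \sqrt{2} \right)}
\, \approx \, 2.269 \, J_1 ~,
\end{displaymath}
which gives $T_c \approx 2.269$ for $J_1 = 1$.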
1173:
1174: What happens if we fix $J_1 = 1$, $J_2 = 0$, and $B=0$, and vary the
1175: temperature? In this case, we see that the complexity, as measured by $\EE$,
1176: shows a sharp maximum near the critical temperature $T_c$. Figure
1177: \ref{2DCriticalhvsE} shows results obtained via a Monte Carlo simulation on
1178: a $100 \times 100$ lattice. We used a Wolff cluster algorithm and periodic
boundary conditions. After $10^6$ Monte Carlo steps (one step is one proposed
cluster flip), $25{,}000$ configurations were sampled, with $200$ Monte
Carlo steps between measurements. This process was repeated for over
$200$ temperatures between $T=0$ and $T=6$, with temperatures sampled
more densely near the critical region.
1184: %In Fig.~\ref{2DCriticalhvsE} the
1185: %lines between the points are just guides to the eye.
1186:
1187: % **********************************************************************
1188: \begin{figure}[tbp]
1189: \epsfxsize=3.5in
1190: \begin{center}
1191: \leavevmode
1192: \epsffile{2d.nn.ising.T.ent.eps}\\
1193: \vspace{-5mm}
1194: \epsfxsize=3.5in
1195: \epsffile{2d.nn.ising.T.E.eps}\\
1196: \vspace{-5mm}
1197: \epsfxsize=3.5in
1198: \epsffile{2d.nn.ising.comp.ent.eps}
1199: \end{center}
1200: \vspace{-6mm}
1201: \caption{Entropy rate vs.~temperature, excess entropy vs.~temperature, and
1202: the complexity-entropy diagram for the 2D NN ferromagnetic Ising
1203: model. Monte Carlo results for $200$ temperatures between $0$ and
1204: $6$. The temperature was sampled more densely near the critical
1205: temperature. For further discussion, see text. }
1206: \vspace{-4mm}
1207: \label{2DCriticalhvsE}
1208: \end{figure}
1209: % **********************************************************************
1210:
1211: In Fig.~\ref{2DCriticalhvsE} we first plot entropy density $\hmu$ and
1212: excess entropy $\EE$ versus temperature. As expected, the
1213: excess entropy reaches a maximum at the critical temperature $T_c$.
1214: At $T_c$ the correlations in the system decay algebraically, whereas
they decay exponentially for all other $T$ values. Hence, $\EE$,
which may be viewed as a global measure of correlation, is maximized at
$T_c$. For the system of Fig.~\ref{2DCriticalhvsE}, $T_c$ appears to
have an approximate value of $2.42$. This is above the exact value for an
infinite system, $T_c \approx 2.27$, as one expects for a finite lattice.
At the estimated critical temperature, $h_\mu \approx 0.57$ and
$\EE \approx 0.413$.
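
As a point of reference, the infinite-lattice value follows from Onsager's
exact solution. The expression below is a sketch under two assumptions that
the text does not state explicitly: a square lattice and units in which
Boltzmann's constant is one. Under these assumptions,
\begin{equation}
T_c = \frac{2 J_1}{\ln ( 1 + \sqrt{2} )} \approx 2.269 \, J_1 ~,
\end{equation}
so the value read off from Fig.~\ref{2DCriticalhvsE} lies above the
infinite-lattice value by a relative amount
\begin{equation}
\frac{2.42 - 2.269}{2.269} \approx 0.067 ~,
\end{equation}
that is, by roughly $7\%$.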
1222:
1223: Also in Fig.~\ref{2DCriticalhvsE} we show the complexity-entropy
1224: diagram for the 2D Ising model. This complexity-entropy diagram is a
1225: single curve, instead of the scatter plots seen in the
1226: previous complexity-entropy diagrams. The reason is that
1227: we varied a single parameter, the temperature, and
1228: entropy is a single-valued function of the temperature, as can clearly
1229: be seen in the first plot in Fig.~\ref{2DCriticalhvsE}. Hence, there
1230: is only one value of $\hmu$ for each temperature, leading to a single
1231: curve for the complexity-entropy diagram.
1232:
1233: Note that the peak in the complexity-entropy diagram for the
1234: 2D Ising model is rather rounded, whereas $\EE$ plotted versus
1235: temperature shows a much sharper peak. The reason for this rounding
1236: is that the entropy density $\hmu$ changes very rapidly near $T_c$.
1237: The effect is to smooth the $\EE$ curve when plotted against $\hmu$.
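
The rounding can be made explicit with the chain rule. As a sketch,
assume that $\hmu$ is a differentiable, monotonically increasing function
of $T$ over the range of interest, as the first panel of
Fig.~\ref{2DCriticalhvsE} suggests. The slope of the complexity-entropy
curve is then
\begin{equation}
\frac{d \EE}{d \hmu} = \frac{d \EE / dT}{d \hmu / dT} ~.
\end{equation}
Near $T_c$ the denominator is large, so a steep change of $\EE$ with
temperature corresponds to a more moderate slope when $\EE$ is plotted
against $\hmu$. Equivalently, a narrow temperature interval around $T_c$
is stretched over a comparatively wide interval of $\hmu$.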
1238:
A similar complexity-entropy diagram was produced by Arnold \cite{Arno96}. He also
1240: estimated the excess entropy, but did so by considering only one-dimensional
1241: sequences of measurements obtained at a single site, while a Monte Carlo
1242: simulation generated a sequence of two-dimensional configurations. Thus,
1243: those results do not account for two-dimensional structure but, rather,
1244: reflect properties of the dynamics of the particular Monte Carlo updating
1245: algorithm used. Nevertheless, the results of Ref.~\cite{Arno96} are
1246: qualitatively similar to ours.
1247:
1248: Erb and Ay \cite{Erb04a} have calculated the \emph{multi-information} for
1249: the two-dimensional Ising model as a function of temperature. The
multi-information is the difference between the entropy of a single site
and the entropy rate: $H(1) - h_\mu$. That is, the multi-information
is only the leading term in the sum that defines the excess entropy,
Eq.~(\ref{E.def}). (Recall that $h_\mu(1) = H(1)$.) They find that the
1254: multi-information is a continuous function of the temperature and that
1255: it reaches a sharp peak at the critical temperature \cite[Fig.~4]{Erb04a}.
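
The relation between the two quantities can be spelled out as follows.
This is a sketch for the one-dimensional setting, and it assumes that
Eq.~(\ref{E.def}) is the usual sum over the finite-$L$ entropy-rate
estimates $h_\mu(L) \equiv H(L) - H(L-1)$:
\begin{equation}
\EE = \sum_{L=1}^{\infty} \left[ h_\mu(L) - h_\mu \right]
    = \left[ H(1) - h_\mu \right]
    + \sum_{L=2}^{\infty} \left[ h_\mu(L) - h_\mu \right] ~.
\end{equation}
The first bracket is the multi-information as described above. Since
$h_\mu(L) \geq h_\mu$ for every $L$, each remaining term is nonnegative,
and so the multi-information is a lower bound on $\EE$.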
1256:
1257: % **********************************************************************
1258: % **********************************************************************
1259: \subsection{Cellular Automata}
1260:
1261: The next process class we consider is \emph{cellular automata} (CAs)
1262: in one and two spatial dimensions. Like spin systems, CAs are common
1263: prototypes used to model spatially extended dynamical systems. For reviews
1264: see, e.g., Refs.~\cite{Wolf83a,Chop98a,Ilac01a}. Unlike the Ising
1265: models of the previous section, the CAs that we study here are
1266: deterministic. There is no noise or temperature in the system.
1267:
1268: The states of the CAs we shall consider consist of one- or
1269: two-dimensional \emph{configurations}
1270: ${\mathbf s} = \ldots s^{-1} , s^0 , s^1 , \ldots $ of discrete $K$-ary
1271: \emph{local states} $s^i \in \{ 0, 1, \ldots , K-1 \}$. The
1272: configurations change in time according to a \emph{global update
1273: function} $\mathbf \Phi$:
1274: \begin{equation}
{\mathbf s}_{t+1} = {\mathbf \Phi} {\mathbf s}_t ~,
1276: \end{equation}
1277: starting from an \emph{initial configuration} ${\mathbf s}_0$. What makes CAs
1278: \emph{cellular} is that configurations evolve according to a
1279: \emph{local update rule}. The value $s_{t+1}^i$ of site $i$ at the next time
1280: step is a function $\phi$ of the site's previous value and the values of
1281: neighboring sites within some \emph{radius} $r$:
1282: \begin{equation}
s_{t+1}^i = \phi ( s_t^{i-r}, \ldots, s_t^i, \ldots, s_t^{i+r} ) ~.
1284: \end{equation}
All sites are updated synchronously. The CA update rule $\phi$ consists
of specifying the \emph{output value} $s_{t+1}^i$ for all possible
\emph{neighborhood configurations}
$\eta_t = s_t^{i-r}, \ldots, s_t^i, \ldots, s_t^{i+r}$.
Thus, for 1D radius-$r$ CAs, there are $K^{2r+1}$ possible neighborhood
configurations and $K^{K^{2r+1}}$ possible CA rules. The $r=1$, $K=2$
1D CAs are called {\em elementary cellular automata}\/ \cite{Wolf83a}.
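
For the binary case used below, the counting is easily made concrete.
With $K=2$, each of the $2^{2r+1}$ neighborhood configurations is assigned
one of two output values, so that
\begin{align}
r = 1 : & \quad 2^{3} = 8 ~\mbox{neighborhoods}, \quad
  2^{8} = 256 ~\mbox{rules} ~, \\
r = 2 : & \quad 2^{5} = 32 ~\mbox{neighborhoods}, \quad
  2^{32} = 4{,}294{,}967{,}296 ~\mbox{rules} ~.
\end{align}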
1292:
1293: In all CA simulations reported we began with an arbitrary random initial
1294: configuration ${\mathbf s}_0$ and iterated the CA several thousand times
1295: to let transient behavior die away. Configuration statistics were then
1296: accumulated for an additional period of thousands of time steps, as
1297: appropriate. Periodic boundary conditions on the underlying lattice
1298: were used.
1299:
1300: %To motivate the questions we will address, we begin the following
1301: %section by closely reviewing several earlier investigations of
1302: %two-dimensional CAs. We then turn to consider the simpler case
1303: %of $r = 1$ and $r = 2$ 1D CAs.
1304:
1305:
1306: %\subsubsection{One-Dimensional Cellular Automata}
1307:
1308: In Fig.~\ref{1D.rad2.spatial.hvsE} we
1309: show the results of calculating various complexity-entropy diagrams
1310: for 1D, $r = 2$, $K=2$ (binary) cellular automata. There are $2^{2^5}
1311: \approx 4.3 \times 10^9$ such CAs. We cannot examine all $4.3$
billion CAs; instead we sample the space of these CAs uniformly.
1313: %For the data of Fig.~\ref{1D.rad2.spatial.HvsI}, the lattice has
1314: %$1000$ sites; a transient time of $1000$ iterations was used. We plot
1315: %$\hmu$ versus $\EE$ for temporal sequences.
1316: For the data of
1317: Fig.~\ref{1D.rad2.spatial.hvsE}, the lattice has $5\times 10^4$ sites
1318: and a transient time of $5 \times 10^4$ iterations was used. We plot
1319: $\hmu$ versus $\EE$ for spatial sequences. Plots for the temporal
1320: sequences are qualitatively similar. There are several things to
1321: observe in these diagrams.
1322:
1323:
1324: % **********************************************************************
1325: \begin{figure}[tbp]
1326: \epsfxsize=3.0in
1327: \begin{center}
1328: \leavevmode
1329: \epsffile{1d.radius2.spatial.hvsE.eps}
1330: \end{center}
1331: \vspace{-6mm}
1332: \caption{Spatial entropy density $h^s_\mu$ and spatial excess entropy
1333: $\EE^s$ for a random sampling of $10^3$ $r = 2$, binary 1D CAs.
1334: }
1335: \vspace{-2mm}
1336: \label{1D.rad2.spatial.hvsE}
1337: \end{figure}
1338: % **********************************************************************
1339:
1340: One feature to notice in Fig.~\ref{1D.rad2.spatial.hvsE} is that no sharp
peak in the excess entropy appears at some intermediate $\hmu$ value. Rather,
the maximum possible excess entropy falls off moderately rapidly
1343: with increasing $\hmu$. A linear upper bound, $\EE \leq 4 ( 1 - h_\mu)$,
1344: is almost completely respected. Note that, as is the case with the other
1345: complexity-entropy diagrams presented here, for all $\hmu$ values except
1346: $\hmu = 1$, there is a range of possible excess entropies.
1347:
1348: %Note also that the scale of the complexity-entropy diagram
1349: %for radius-$2$ shown in Fig.~\ref{1D.rad2.spatial.hvsE} is the same as
1350: %that of the radius-$1$ (or elementary) CAs in Fig.~\ref{ECA}.
1351:
1352: %We now turn our attention to the $H(1)$ versus $I_2$ plot shown in
1353: %Fig.~\ref{1D.rad2.spatial.HvsI}.
1354: %Note that these diagrams are strikingly different than those for the
1355: %2D, $8$-state CAs shown in Figs.~\ref{Langton} and
1356: %\ref{Langton.Bogus.Rescaling}. For the 1D CAs, nothing even close
1357: %to an ``edge of chaos'' is seen---the complexity, as measured by the
1358: %two-point mutual information, is maximized at $\hmu = 1$.
1359:
1360: %\subsubsection{Comparison with Earlier Results}
1361:
In the early 1990s there was considerable exploration of the
organization of CA rule space. In particular, a series of papers
\cite{Lang90a,Li90b,Woot90a,Lang91a} looked at two-dimensional
eight-state ($K=8$) cellular automata, with a neighborhood size of
$5$ sites---the site itself and its nearest neighbors to the north,
east, west, and south. These references reported evidence for
1368: the existence of a phase transition in the complexity-entropy diagram at a
1369: critical entropy level. In contrast, however, here and in the previous
1370: sections we find no evidence for such a transition. The reasons that
1371: Refs.~\cite{Lang90a,Li90b,Woot90a,Lang91a} report a transition are two-fold.
1372: First, they used very restricted measures of randomness and complexity:
1373: entropy of \emph{single isolated} sites and mutual information of neighboring
1374: \emph{pairs} of single sites, respectively. These choices have the effect of
1375: projecting organization \emph{onto} their complexity-entropy diagrams. The
1376: organization seen is largely a reflection of constraints on the chosen
measures, not of intrinsic properties of the CAs. Second, they do not sample
the space of CAs uniformly; rather, they parametrize the space of CAs and
sample only by sweeping their single parameter. This results in a sample of
CA space that is very different from uniform and that is biased toward
higher-complexity CAs. For a further discussion of complexity-entropy diagrams for
1382: cellular automata, including a discussion of
1383: Refs.~\cite{Lang90a,Li90b,Woot90a,Lang91a}, see Ref.~\cite{Feld06a}.
1384:
1385:
1386: % **********************************************************************
1387: \subsection{Markov Chain Processes}
1388:
1389: In this and the next section, we consider two classes of process that
1390: provide a basis of comparison for the preceding nonlinear dynamics
1391: and statistical mechanical systems: those generated by Markov chains and
1392: topological \eMs. These classes are complementary to each other in the
1393: following sense. Topological \eMs\ represent structure in terms of which
1394: sequences (or configurations) are allowed or not. When we explore the space
1395: of topological \eMs, the associated processes differ in which sets of
1396: sequences occur and which are forbidden. In contrast, when exploring
1397: Markov chains, we fix a set of allowed words---in the present case the
1398: full set of binary sequences---and then vary the probability with
1399: which subwords occur. These two classes thus represent two different
1400: types of possible organization in intrinsic computation---types that
1401: were mixed in the preceding example systems.
1402:
1403: In Fig.~\ref{null} we plot $\EE$ versus $\hmu$ for order-$2$
1404: ($4$-state) Markov chains over a binary alphabet. Each element in the
1405: stochastic transition matrix $T$ is chosen uniformly from the unit
1406: interval. The elements of the matrix are then normalized row by row so
1407: that $\sum_j T_{ij} = 1$. We generated $10^5$ such matrices and formed
1408: the complexity-entropy diagram shown in Fig.~\ref{null}. Since these
1409: processes are order-$2$ Markov chains, the bound of
Eq.~(\ref{EE_hmu_bound}) applies. This bound is the sharp, linear
1411: upper limit evident in Fig.~\ref{null}: $\EE \, = 2 - 2\hmu$.
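
The origin of this line can be sketched in a few steps, assuming that
Eq.~(\ref{EE_hmu_bound}) is the order-$R$ Markov bound
$\EE \leq R ( 1 - \hmu )$. For an order-$R$ Markov process the block
entropies grow linearly once $L \geq R$, that is,
$H(L) = H(R) + (L - R) \hmu$, so that
\begin{equation}
\EE = \lim_{L \rightarrow \infty} \left[ H(L) - L \hmu \right]
    = H(R) - R \hmu ~.
\end{equation}
For a binary alphabet $H(R) \leq R$ bits, and setting $R = 2$ gives
\begin{equation}
\EE = H(2) - 2 \hmu \leq 2 - 2 \hmu ~.
\end{equation}
Equality holds when all four length-$2$ blocks are equally probable.
The line therefore runs from $\EE = 2$ at $\hmu = 0$ down to $\EE = 0$
at $\hmu = 1$.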
1412:
It is instructive to compare the $4$-state Markov chains considered
here with the 1D NNN Ising models of Sec.~\ref{1D.spin.section}.
The order-$2$ (or $4$-state) Markov chains with a binary alphabet are
those systems
for which the value of a site depends on the previous two sites, but
no others. In terms of spin systems, then, this is a spin-$1/2$
(i.e., binary) system with nearest- and next-nearest-neighbor interactions. The
1420: transition matrix for the Markov chain is $4 \times 4$ and thus has
1421: $16$ elements. However, since each row of the transition matrix
1422: must be normalized, there are $12$ independent parameters for this
model class. In contrast, the 1D NNN Ising chain has four
parameters---$J_1$, $J_2$, $B$, and the temperature $T$. One of
these may be viewed as setting an energy scale, so only three are
independent.
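
The two counts can be written out explicitly. For the Markov chains, the
$16$ matrix elements are subject to one normalization constraint per row:
\begin{equation}
4 \times 4 - 4 = 12 ~.
\end{equation}
For the Ising chain, the following is a sketch under the assumptions that
the Hamiltonian of Eq.~(\ref{Hamiltonian}) is linear in $J_1$, $J_2$, and
$B$ and that Boltzmann's constant is set to one. Configuration
probabilities then depend on the energy only through its ratio to the
temperature, so the process is fixed by the three ratios
\begin{equation}
\left( \frac{J_1}{T}, \frac{J_2}{T}, \frac{B}{T} \right) ~,
\end{equation}
rather than by the four parameters separately.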
1427:
1428: Thus, the 1D NNN systems are a proper subset of the $4$-state Markov chains.
1429: Note that their complexity-entropy diagrams are very different, as a quick
1430: glance at Figs.~\ref{Ising.Batcape} and \ref{null} confirms. The reason for
1431: this is that the Ising model, due to its parametrization (via the Hamiltonian
1432: of Eq.~(\ref{Hamiltonian})), samples the space of processes in a
1433: very different way than the Markov chains. This underscores the
1434: crucial role played by the choice of model and, so too, the choice in
1435: parametrizing a model space. Different parametrizations of the same
1436: model class, when sampled uniformly over those parameters, yield
1437: complexity-entropy diagrams with different structural properties.
1438:
1439:
1440: % **********************************************************************
1441: \begin{figure}[tbp]
1442: \epsfxsize=3.0in
1443: \begin{center}
1444: \leavevmode
1445: \epsffile{markov.eps}
1446: \end{center}
1447: \vspace{-6mm}
1448: \caption{Excess-entropy, entropy-rate pairs for $10^5$ randomly
1449: selected $4$-state Markov chains.
1450: }
1451: \vspace{-2mm}
1452: \label{null}
1453: \end{figure}
1454: % **********************************************************************
1455:
1456: \subsection{The Space of Processes: Topological \EMs}
1457:
1458: The preceding model classes are familiar from dynamical systems theory,
statistical mechanics, and stochastic process theory. Each has served
a historical purpose in its respective field---purposes that reflect
1461: mathematically, physically, or statistically useful parametrizations of
1462: the space of processes. In the preceding sections we explored these
1463: classes, asked what sort of processes they could generate, and then
1464: calculated complexity-entropy pairs for each process to reveal the
1465: range of possible information processing within each class.
1466:
1467: Is there a way, though, to directly explore the space of processes, without
1468: assuming a particular model class or parametrization? Can each process be
1469: taken at face value and tell us how it is structured? More to the point, can
1470: we avoid making structural assumptions, as done in the preceding sections?
1471:
1472: Affirmative answers to these questions are found in the approach laid out
1473: by \emph{computational mechanics} \cite{Crut89,Crut92c,Shal01a}.
1474: Computational mechanics demonstrates that each process has an optimal,
1475: minimal, and unique representation---the \emph{\eM}---that captures the
1476: process's structure. Due to optimality, minimality, and uniqueness, the
1477: \eM\ may be viewed as \emph{the} representation of its associated process. In
1478: this sense, this representation is parameter free. To determine an \eM\ for a
1479: process one calculates a set of \emph{causal states} and their transitions.
1480: In other words, one does not specify a priori the number of states or
the transition structure between them. Determining the \eM\ makes no such
structural assumptions \cite{Crut92c,Shal01a}.
1483:
1484: Using the one-to-one relationship between processes and their \eMs, here we
1485: invert the preceding logic of going from a process to its \eM. We explore the
1486: space of processes by systematically enumerating \eMs\ and then calculating
their excess entropies $\EE$ and their entropy rates $\hmu$. This gives
1488: a direct view of how intrinsic computation is organized in the space of
1489: processes.
1490:
1491: As a complement to the Markov chain exploration of how intrinsic computation
1492: depends on transition probability variation, here we examine how an \eM's
1493: structure (states and their connectivity) affects information processing. We
1494: do this by restricting attention to the class of \emph{topological \eMs} whose
1495: branching transition probabilities are fair (equally probable). (An example
1496: is shown in Fig.~\ref{f_mn}.)
1497:
If we regard two \eMs\ that are isomorphic up to variation in transition
probabilities as members of a single equivalence class, then each such
class of \eMs\ contains precisely one topological \eM. (Symbolic dynamics
\cite{Lind95a} refers to a related class of representations as
\emph{topological Markov chains}. An essential
difference is that \eMs\ always have the smallest number of states.)
1504:
1505: \begin{table}
1506: \begin{tabular}{|c|c|}
1507: \hline
1508: {Causal States} & {Topological} \\
$n$ & {\eMs} \\
1510: \hline
1511: 1 & 3 \\
1512: 2 & 7 \\
1513: 3 & 78 \\
1514: 4 & 1,388 \\
1515: 5 & 35,186 \\
1516: \hline
1517: \end{tabular}
1518: \caption{The number of topological binary \eMs\ up to $n = 5$ causal states.
1519: (After Ref. \protect\cite{McTa05a}.)
1520: \label{proclangcount}
1521: }
1522: \end{table}
1523:
1524: It turns out that the topological \eMs\ with a finite number of states can be
1525: systematically enumerated \cite{McTa05a}. Here we consider only \eMs\ for
binary processes: $\mathcal{A} = \{0,1\}$. Two \eMs\ are
isomorphic, and generate essentially the same stochastic process,
if they are related by a relabeling of states or if their output
symbols are exchanged: $0$ is mapped to $1$ and vice versa. The number
1530: of isomorphically distinct topological \eMs\ of $n=1,\ldots,5$~states
1531: is listed in Table~\ref{proclangcount}.
1532:
1533: % **********************************************************************
1534: \begin{figure*}[tbp]
1535: \epsfxsize=6.0in
1536: \begin{center}
1537: \leavevmode
1538: \epsffile{Evshmu.eps}
1539: \end{center}
1540: \vspace{-6mm}
1541: \caption{Complexity-entropy pairs $(h_\mu,\EE)$ for all topological binary
\eMs\ with $n = 1, \ldots, 4$ states and for $35{,}041$ of the $35{,}186$
1543: $5$-state \eMs. The excess entropy is estimated as $\EE(L) = H(L) - L \hmu$
1544: using the exact value for the entropy rate $\hmu$ and a storage-efficient
1545: type-class algorithm \cite{Youn93a} for the block entropy $H(L)$. The
1546: estimates were made by increasing $L$ until $\EE(L) - \EE(L-1) < \delta$,
1547: where $\delta = 0.0001$ for $1$, $2$, and $3$ states; $\delta = 0.0050$
for $4$ states; and $\delta = 0.0100$ for $5$ states.
1549: }
1550: \label{process.plot}
1551: \end{figure*}
1552: % **********************************************************************
1553:
1554: In Fig.~\ref{process.plot} we plot their $(\hmu,\EE)$ pairs. There one
1555: sees that the complexity-entropy diagram exhibits quite a bit of
1556: organization, with variations from very low to very high density of
1557: \eMs\ co-existing with several distinct vertical (iso-entropy)
1558: families. To better understand the structure in the complexity-entropy
1559: diagram, though, it is helpful to consider bounds on the complexities
1560: and entropies of Fig.~\ref{process.plot}. The minimum complexity, $\EE = 0$,
1561: corresponds to machines with only a single state. There are two possibilities
for such binary \eMs. Either they generate all $1$s (or all $0$s), or they
generate all sequences with equal probability (at each length). If the latter, then
1564: $\hmu = 1$; if the former, $\hmu = 0$. These two points, $(0,0)$ and $(1,0)$,
1565: are denoted with solid circles along Fig.~\ref{process.plot}'s horizontal axis.
1566:
1567: % **********************************************************************
1568: \begin{figure}[tbp]
1569: \epsfxsize=3.0in
1570: \begin{center}
1571: \leavevmode
1572: \epsffile{f_mn.eps}
1573: \end{center}
1574: \caption{An example topological \eM\ for a cyclic process in $\mc{F}_{5,3}$.
1575: Note that branching occurs only between pairs of successive states in
1576: the cyclic chain. The excess entropy for this process is $\log_2 5
1577: \approx 2.32$, and the entropy rate is $3/5$.
1578: }
1579: \label{f_mn}
1580: \end{figure}
1581: % **********************************************************************
1582:
1583: The maximum $\EE$ in the complexity-entropy diagram is $\log_2 5
1584: \approx 2.3219$. One such \eM\ corresponds to the zero-entropy,
1585: period-$5$ processes. And there are four similar processes with
1586: periods $p = 1, 2, 3, 4$ at the points $(0, \log_2 p)$. These are
1587: denoted on the figure by the tokens along the left vertical
1588: axis.
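
For reference, the excess entropies of these periodic processes,
$\EE = \log_2 p$, take the values
\begin{equation}
\log_2 p = 0, ~1, ~1.585, ~2, ~2.322
\quad \mbox{for} \quad p = 1, 2, 3, 4, 5 ~,
\end{equation}
where the entries for $p = 3$ and $p = 5$ are rounded to three decimal
places. The $p = 1$ case coincides with the point $(0,0)$ on the
horizontal axis.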
1589:
1590: There are other period-$5$ \emph{cyclic, partially random} processes with
1591: maximal complexity, though; those with causal states in a cyclic
1592: chain. These have $b = 1, 2, 3, 4$ branching transitions between successive
states in the chain and so positive entropy. These appear as a horizontal
line of enlarged square tokens along the upper portion of the
complexity-entropy diagram.
1596: Denote the family of $p$-cyclic processes with $b$ branchings as
1597: $\mc{F}_{p,b}$. An \eM\ illustrating $\mc{F}_{5,3}$ is shown in
1598: Fig.~\ref{f_mn}. The excess entropy for this process is $\log_2 5
1599: \approx 2.32$, and the entropy rate is $3/5$.
1600:
Since \eMs\ for cyclic processes consist of states in a single loop,
their excess entropies provide an upper bound among \eMs\ that generate
$p$-cyclic processes with $b$ branching states, namely:
1604: \begin{equation}
1605: \EE(\mc{F}_{p,b}) = \log_2 (p) ~.
1606: \end{equation}
1607: Clearly, $\EE(\mc{F}_{p,b}) \rightarrow \infty$ as $p \rightarrow \infty$.
1608: Their entropy rates are given by a similarly simple expression:
1609: \begin{equation}
1610: \hmu(\mc{F}_{p,b}) = \frac{b}{p} ~.
1611: \end{equation}
1612: Note that $\hmu(\mc{F}_{p,b}) \rightarrow 0$ as $p \rightarrow \infty$
1613: with fixed $b$ and $\hmu(\mc{F}_{p,b}) \rightarrow 1$ as $b \rightarrow p$.
1614: Together, then, the family $\mc{F}_{5,b}$ gives an upper bound to the
1615: complexity-entropy diagram.
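
The entropy-rate expression follows from a short calculation, sketched
here under two assumptions that are consistent with Fig.~\ref{f_mn}: the
$p$ causal states are visited with equal probability $1/p$, and each
branching state has exactly two equally probable outgoing transitions.
Writing $\sigma$ for a causal state (a symbol introduced here only for
illustration), each branching state contributes one bit and each
non-branching state contributes none:
\begin{equation}
\hmu(\mc{F}_{p,b})
  = \sum_{\sigma} \Pr(\sigma) \, H[\mbox{next symbol} \,|\, \sigma]
  = b \cdot \frac{1}{p} \cdot 1 + (p-b) \cdot \frac{1}{p} \cdot 0
  = \frac{b}{p} ~.
\end{equation}
For $p = 5$ and $b = 1, 2, 3, 4$, this places the family at
\begin{equation}
(\hmu, \EE) = (0.2, \log_2 5), ~(0.4, \log_2 5),
  ~(0.6, \log_2 5), ~(0.8, \log_2 5) ~,
\end{equation}
with $\log_2 5 \approx 2.32$ in each case.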
1616:
1617: The processes $\mc{F}_{p,b}$ are representatives of the highest points
1618: of the prominent jutting vertical towers of \eMs\ so prevalent in
1619: Fig.~\ref{process.plot}. It therefore seems reasonable
1620: to expect the $(\hmu,\EE)$~coordinates for $p$-cyclic process
1621: languages to possess at least $p-1$~vertical towers, distributed
1622: evenly at $\hmu=b/p$, $b=1, \dots, p-1$, and for these towers to
1623: correspond with towers of $m$-cyclic process languages whenever $m$ is
1624: a multiple of~$p$.
1625:
1626: These upper bounds are one key difference from earlier classes in which there
1627: was a decreasing linear upper bound on complexity as a function of entropy
1628: rate: $\EE \leq R(1-h_\mu)$. That is, in the space of processes, many are not
1629: so constrained. The subspace of topological \eMs\ illustrates that there are
1630: many highly entropic, highly structured processes. Some of the more
1631: familiar model classes appear to inherit, in their implied parametrization of
1632: process space, a bias away from such processes.
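
A single pair of numbers illustrates the contrast. Using the expressions
above, the family $\mc{F}_{5,4}$ sits at
\begin{equation}
(\hmu, \EE) = \left( \frac{4}{5}, \log_2 5 \right)
  \approx (0.8, 2.32) ~,
\end{equation}
whereas the order-$2$ Markov chain bound of Fig.~\ref{null}, evaluated at
the same entropy rate, allows at most
\begin{equation}
\EE \leq 2 \left( 1 - \frac{4}{5} \right) = 0.4 ~.
\end{equation}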
1633:
1634: It is easy to see that the families $\mc{F}_{p,p-1}$ and
1635: $\mc{F}_{p,1}$ provide upper and lower bounds for~$\hmu$,
1636: respectively, among the process languages that achieve
1637: maximal~$\EE$ and for which~$\hmu > 0$. Indeed, the smallest
positive~$\hmu$ possible is achieved when only a single one of
1639: the equally probable states has more than one outgoing transition.
1640:
1641: More can be said about this picture of the space of intrinsic
1642: computation spanned by topological \eMs\ \cite{McTa05a}. Here,
1643: however, our aim is to illustrate how rich the diversity of intrinsic
1644: computation can be and to do so independent of conventional
1645: model-class parametrizations. These results allow us to probe in a
1646: systematic way a subset of processes in which structure dominates.
1647:
1648: \section{Discussion and Conclusion}
1649: \label{Discussion}
1650:
1651: Complexity-entropy diagrams provide a common view of the intrinsic
1652: information processing embedded in different processes. We used them
1653: to compare markedly different systems: one-dimensional maps of the
1654: unit interval; one- and two-dimensional Ising models; cellular automata;
1655: Markov chains; and topological \eMs. The exploration of each class turned
1656: different knobs in the sense that we adjusted different parameters:
1657: temperature, nonlinearity, coupling strength, cellular automaton rule,
1658: and transition probabilities. Moreover, these parameters had very
1659: different effects. Changing the temperature and coupling constants in the
1660: Ising models altered the probabilities of configurations, but it did not
1661: change which configurations were allowed to occur. In contrast, the
1662: topological \eMs\ exactly expressed what it means for different processes to
1663: have different sets of allowed sequences. Changing the CA rules or the
1664: nonlinearity parameter in the logistic map combined these effects: the
1665: allowed sequences or the probability of sequences or both changed.
1666: In this way, the survey illustrated in dramatic fashion one of the benefits
1667: of the complexity-entropy diagram: it allows for a common comparison
1668: across rather varied systems.
1669:
1670: For example, the complexity-entropy diagram for the radius-$2$,
1671: one-dimensional cellular automata, shown in Fig.~\ref{1D.rad2.spatial.hvsE},
1672: is very different from that of the logistic map, shown in
1673: Fig.~\ref{logistic.banded.plot}. For the logistic map, there is a
1674: distinct lower bound for the excess entropy as a function of the
1675: entropy rate. In Fig.~\ref{logistic.banded.plot} this is seen as the
1676: large forbidden region at the diagram's lower portion. In sharp contrast,
1677: in Fig.~\ref{1D.rad2.spatial.hvsE} no such forbidden region is seen.
1678:
At a more general level of comparison, the survey showed that for a given
$\hmu$, the excess entropy $\EE$ can be arbitrarily small. This suggests
that the intrinsic computation of cellular automata and that of the
logistic map are organized in fundamentally different ways. In turn, the
1D and 2D Ising systems exhibit yet another kind of
information processing capability. Each has well-defined ground
1685: states---seen as the zero-entropy tips of the ``batcapes'' in
1686: Figs.~\ref{Ising.Batcape} and \ref{2DIsingBatcape}. These ground states are
1687: robust under small amounts of noise---i.e., as the temperature increases from
1688: zero. Thus, there are almost-periodic configurations at low entropy. In
1689: contrast, there do not appear to be any almost-periodic configurations at low
1690: entropy for the logistic map of Fig.~\ref{logistic.banded.plot}.
1691:
1692: Our last example, topological \eMs, was a rather different kind of
1693: model class. In fact, we argued that it gave a direct view into the
1694: very structure of the space of processes. In this sense, the
1695: complexity-entropy diagram was parameter free. Note, however, that by
1696: choosing all branching probabilities to be fair, we intentionally biased
1697: this model class toward high-complexity, high-entropy processes. Nevertheless,
1698: the distinction between the topological \eM\ complexity-entropy diagram of
1699: Fig.~\ref{process.plot} and the others is striking.
1700:
1701: The diversity of possible complexity-entropy diagrams points to their
1702: utility as a way to compare information processing across different classes.
1703: Complexity-entropy diagrams can be empirically calculated from observed
1704: configurations themselves. The organization reflected in the complexity-entropy
1705: diagram then provides clues as to an appropriate model class to use for the
1706: system at hand. For example, if one found a complexity-entropy diagram with
1707: a batcape structure like that of Figs.~\ref{Ising.Batcape} and
\ref{2DIsingBatcape}, this would suggest that the class could be well modeled
1709: using energies that, in turn, were expressed via a Hamiltonian.
1710: Complexity-entropy diagrams may also be of use in classifying behavior
1711: within a model class. For example, as noted above, a type of
1712: complexity-entropy diagram has already been successfully used to
1713: distinguish between different types of structure in anatomical MRI
1714: images of brains \cite{Youn05a,Youn08a}.
1715:
1716: Ultimately, the main conclusion to draw from this survey is that there
1717: is a large diversity of complexity-entropy diagrams. There is certainly
1718: not a universal complexity-entropy curve, as once hoped. Nor is it the case
1719: that there are even qualitative similarities among complexity-entropy
1720: diagrams. They
1721: capture distinctive structure in the intrinsic information processing
1722: capabilities of a class of processes. This diversity is not a negative
1723: result. Rather, it indicates the utility of this type of
1724: intrinsic computation analysis, and it optimistically points
1725: to the richness of information processing available in the mathematical and
1726: natural worlds. Simply put, information processing is too complex to be
1727: simply universal.
1728:
1729: \section*{Acknowledgments}
1730:
1731: Our understanding of the relationships between complexity and entropy
1732: has benefited from numerous discussions with Chris Ellison, Kristian
1733: Lindgren, John Mahoney, Susan McKay, Cris Moore, Mats Nordahl, Dan Upper,
Patrick Yannul, and Karl Young. The authors thank, in particular, Chris
Ellison for help in producing the \eM\ complexity-entropy diagram.
1736: This work was supported at the Santa Fe Institute under the Computation,
1737: Dynamics, and Inference Program via SFI's core grants from the National
1738: Science and MacArthur Foundations. Direct support was provided from DARPA
1739: contract F30602-00-2-0583. The CSC Network Dynamics Program funded by Intel
1740: Corporation also supported this work. DPF thanks the Department of Physics
1741: and Astronomy at the University of Maine for its hospitality. The REUs,
1742: including one of the authors (CM), who worked on related parts of the
1743: project at SFI were supported by the NSF during the summers of 2002 and
1744: 2003.
1745:
1746: \bibliography{dpf}
1747:
1748: \end{document}
1749: