1: \documentstyle[12pt,amsmath,amsfonts,theorem]{article}
2: 
3: 
4: 
5: % Parameters for both A4 and Letter paper
6: \setlength{\textheight}{214mm}  
7: \setlength{\textwidth}{140mm}    
8: \setlength{\topmargin}{0mm}
9: \setlength{\headheight}{0mm}
10: \setlength{\headsep}{16mm}
11: \setlength{\evensidemargin}{12mm}
12: \setlength{\oddsidemargin}{12mm}
13: \setlength{\footskip}{8mm}
14: \setlength{\parindent}{0mm}
15: \setlength{\parskip}{1.5mm}
16: \pagestyle{myheadings}
17: \markright{Andrei~N.~Soklakov}
18: 
19: 
20: %% Parameters for Letter paper
21: %\setlength{\textheight}{215mm}   %  297mm - 50mm
22: %\setlength{\textwidth}{165mm}    %  210mm - 50mm
23: %\setlength{\topmargin}{-5mm}
24: %\setlength{\headheight}{0mm}
25: %\setlength{\headsep}{15mm}
26: %\setlength{\evensidemargin}{0mm}
27: %\setlength{\oddsidemargin}{0mm}
28: %\setlength{\footskip}{8mm}
29: %\setlength{\parindent}{0mm}
30: %\setlength{\parskip}{1.5mm}
31: %\pagestyle{myheadings}
32: %\markright{Andrei~N.~Soklakov}
33: 
34: 
35: 
36: %% Parameters for A4 paper
37: %\setlength{\textheight}{237mm}   %  297mm - 50mm
38: %\setlength{\textwidth}{160mm}    %  210mm - 50mm
39: %\setlength{\topmargin}{-5mm}
40: %\setlength{\headheight}{0mm}
41: %\setlength{\headsep}{15mm}
42: %\setlength{\evensidemargin}{0mm}
43: %\setlength{\oddsidemargin}{0mm}
44: %\setlength{\footskip}{8mm}
45: %\setlength{\parindent}{0mm}
46: %\setlength{\parskip}{1.5mm}
47: %\pagestyle{myheadings}
48: %\markright{Andrei~N.~Soklakov}
49: 
50: \theoremstyle{break}
51: 
52: \renewcommand{\abstractname}{}
53: 
54: 
55: \newtheorem{definition}{Definition}
56: \newtheorem{remark}{Remark}
57: \newtheorem{lemma}{Lemma}
58: \newtheorem{theorem}{Theorem}
59: \newtheorem{example}{Example}
60: 
61: \newcommand{\set}[1]{{\mathbb{#1}}}
62: \newcommand{\ba}{\mbox{\boldmath $a$}}
63: \newcommand{\bepsilon}{\mbox{\boldmath $\epsilon$}}
64: \newcommand{\br}{\mbox{\boldmath $r$}}
65: \newcommand{\bv}{\mbox{\boldmath $v$}}
66: \newcommand{\bV}{\mbox{\boldmath $V$}}
67: \newcommand{\cU}{{\cal U}}
68: \newcommand{\cV}{{\cal V}}
69: 
70: \newcommand{\Jf}{{}^J\!f}
71: \newcommand{\ttS}{{\tt S}}
72: \newcommand{\ttU}{{\tt U}}
73: \newcommand{\ttV}{{\tt V}}
74: 
75: \newcommand{\ttu}{{\tt u}}
76: \newcommand{\ttv}{{\tt v}}
77: 
78: \newcommand{\oo}[1]{\overset{\circ}{#1}}
79: \newcommand{\ooo}[1]{\overset{\circ\circ}{#1}}
80: 
81: \begin{document}
82: \title{ \vspace{-1.7 cm}
83: Complexity analysis for algorithmically simple strings}
84: \author{Andrei N. Soklakov\footnote{e-mail: a.soklakov@rhul.ac.uk}\\
85: \\
86: {\it Department of Mathematics} \\
87: {\it Royal Holloway, University of London}\\
88: {\it Egham, Surrey TW20 0EX, United Kingdom}}
89: 
90: \date{25 February 2002}
91: 
92: \maketitle
93: 
94: \begin{abstract}
95: \vspace{-9mm}
96: Given a reference computer, Kolmogorov complexity is
97: a well defined function on all binary strings. 
98: In the standard approach, however, only
99: the asymptotic properties of such functions are considered
100: because they do not depend on the reference computer.
101: We argue that this approach can be more useful if it is refined
102: to include an important practical case of simple binary strings.
103: Kolmogorov complexity calculus may be developed
104: for this case if we restrict the class of available reference computers.
105: The interesting problem is to define a class of computers
106: which is restricted in a {\it natural} way modeling the
107: real-life situation where only a limited class of computers
108: is physically available to us. We give an example of what such a natural
109: restriction might look like mathematically, and show that under such
110: restrictions some error terms, even logarithmic in complexity, can
111: disappear from the standard complexity calculus.
112: 
113: {\it Keywords:} Kolmogorov complexity; Algorithmic information theory.
114: \end{abstract}
115: 
116: \section{Introduction}
117: 
118: The asymptotic nature of Kolmogorov complexity 
119: calculus renders it significantly less useful in practical applications
120: such as inference by the minimum description length (MDL)
121: principle~\cite{Rissanen_1978}.
122: In the classical MDL approach~\cite{Rissanen_1997}
123: this problem is solved by replacing
124: Kolmogorov complexity with a phenomenological
125: complexity measure just before performing the actual inference.
126: Such a measure can be chosen to suit a particular application,
127: whereas the general form of the MDL constructions can be
128: considered as a consequence of the asymptotic properties
129: of Kolmogorov complexity (consult section 5.5 in Ref.~\cite{LiVitanyi}).
130: Here we propose a different
131: approach. We argue that Kolmogorov complexity can become
132: more practical if we restrict the class of reference computers.
133: 
134: 
135: 
136: Computer science is not the only field which can benefit
137: from the proposed research. There is a growing
138: interest in using Kolmogorov complexity as a fundamental
139: {\it physical} concept. This includes applications in
140: thermodynamics~\cite{Bennett_1982,Bennett_1987,Zurek_1989}%
141: \footnote{Consult~\cite{LiVitanyi} for further references.}, the theory of
142: chaos~\cite{Brudno_1978,Brudno_1982,Ford_1983,SchackCaves_1992}%
143: \footnote{
144: Consult~\cite{LiVitanyi} for further references.},
145: the physics of 
146: computation~(consult~\cite{LiVitanyi} and references therein),
147: and many other areas of modern theoretical 
148: physics~\cite{Dzhunushaliev,SoklakovSchack,Soklakov00}.
149: It is however very difficult to use Kolmogorov complexity
150: in any concrete physical setting, or indeed, in any concrete
151: application. For that we need a much more detailed
152: calculus that can be applied to particular cases of reference computers.
153: The main aim of this article is to stimulate
154: further research in developing such a {\it practical} complexity calculus.
155: 
156: This article is organized as follows.
157: In section~\ref{Basic} we review some basic definitions.
158: In section~\ref{Main} we present the main conceptual arguments
159: of the paper. In section~\ref{Example} we give an example of how
160: one can build a restricted class of computers in a ``natural'' way.
161: Considering one of the central equalities of the standard complexity
162: calculus, we give an illustration of how the error terms may be reduced.
163: 
164: 
165: \section{Basic definitions} \label{Basic}
166: 
167: Let
168: $\set{X}=\{\Lambda,0,1,00,01,10,11,000,\dots\}$
169: be the set of
170: finite binary strings where $\Lambda$ is the string of length 0.
171: A set of strings $\set{Y}\subset \set{X}$ with the property that no string in
172: $\set{Y}$ is a prefix of another is called an instantaneous code.
173:  A prefix computer is a partial recursive
174: function
175: $C: \set{Y}\times \set{X}\to \set{X}$.
176: For each $p\in \set{Y}$ (program string) and for
177: each $d\in \set{X}$ (data string) the output of the computation is either
178: undefined or given by $C(p,d)\in \set{X}$.
179: Kolmogorov complexity
180: of a string $\alpha$ given a data string $d$ relative to a computer
181: $C$ is defined as the length
182: $K_C(\alpha|d)$ of the shortest program that
183: makes $C$ compute $\alpha$ given data~$d$:
184: \begin{equation}
185: K_C(\alpha|d)\equiv\min_{p}\{ |p|\; {\big{|}}\;C(p,d)=\alpha\}\,,
186: \end{equation}
187: where $|p|$ denotes the length of the program $p$ (in bits).
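
The definition can be made concrete with a toy computer. The computer $C$ below is a hypothetical construction introduced here purely for illustration; it is not universal and is not taken from the references.

\begin{example}
Take the instantaneous code $\set{Y}=\{0,10,11\}$ and define, for every
data string $d\in\set{X}$,
\[
C(0,d)=d\,,\ \ \ C(10,d)=dd\,,\ \ \ C(11,d)=\Lambda\,,
\]
where $dd$ denotes $d$ concatenated with itself. For $d=01$ we obtain
$K_C(01|01)=1$, $K_C(0101|01)=2$ and $K_C(\Lambda|01)=2$. No program
produces, say, the string $11$ from $d=01$, so the minimum in the
definition is taken over an empty set; the usual convention is then
$K_C(11|01)=\infty$. For $d=\Lambda$ all three programs output $\Lambda$,
and therefore $K_C(\Lambda|\Lambda)=1$.
\end{example}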
188: 
189: Since this complexity measure depends strongly on the
190: reference computer, it is important to find an optimal computer $U$ such
191: that the complexity of any string relative to $U$ is not much higher than
192: the complexity of the same string relative to any other computer $C$.
193: Mathematically, a computer $U$ is called optimal if
194: \begin{equation}
195: \forall C\ \ \exists\kappa_C\ \mbox{such that } \forall \alpha,d:\ \  K_U(\alpha|d)\leq K_C(\alpha|d)
196: +\kappa_C\,,
197: \end{equation}
198:  where $\kappa_C$ is a constant depending
199: on $C$ (and $U$) but not on $\alpha$ or $d$. 
200: It turns out that the set of prefix computers contains such a $U$ and,
201: moreover, it can be constructed so that any prefix computer
202: can be simulated by $U$: for further details consult~\cite{LiVitanyi}.
203: Such a $U$ is called a universal prefix computer and its choice is not unique.
204: Using some particular universal prefix computer $U$ as a reference,
205: the conditional Kolmogorov complexity of $\alpha$
206: given $\beta$ is defined as $K_U(\alpha|\beta)$.
207: 
208: The above definitions are generalized for the case
209: of many strings as follows. We choose and fix a particular recursive bijection
210: $B: \set{X}\times \set{X}\to \set{X}$ for use throughout the rest of this paper.
211:  Let $\{\alpha^i\}_{i=1}^{n}$
212: be a set of $n$ strings $\alpha^i\in \set{X}$.
213: For $2\leq k\leq n\;$ we define
214: ${\langle\alpha^1,\alpha^2,\dots,\alpha^k\rangle}\equiv
215: B({\langle\alpha^1,\dots,\alpha^{k-1}\rangle},\alpha^k)$,
216: and ${\langle\alpha^1\rangle}\equiv\alpha^1$.
217: We can now define $K_U(\alpha^1,\dots,\alpha^n| \beta^1,\dots,\beta^k)
218: \equiv K_U({\langle \alpha^1,\dots,\alpha^n\rangle }|{\langle\beta^1,
219: \dots,\beta^k\rangle})$.
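
The bijection $B$ is not specified further in this paper. The following sketch shows one possible choice, given only as an illustrative assumption; the index function $n$ and the map $\pi$ are notation introduced for this example.

\begin{example}
Let $n(x)$ be the position of $x$ in the enumeration
$\Lambda,0,1,00,01,\dots$ of $\set{X}$, counted from $n(\Lambda)=0$, so that
$n(0)=1$, $n(1)=2$, $n(00)=3$ and, in general, $n(x)$ equals $2^{|x|}-1$
plus the value of $x$ read as a binary number. Let
$\pi(a,b)=\frac{1}{2}(a+b)(a+b+1)+b$ be the Cantor pairing of two non-negative
integers. Both $n$ and $\pi$ are recursive bijections, so we may take
$B(x,y)$ to be the string whose index is $\pi(n(x),n(y))$. Then
\[
\pi(n(0),n(1))=\pi(1,2)=8 \ \ \ \mbox{and hence}\ \ \
\langle 0,1\rangle=001\,,
\]
because $n(001)=2^3-1+1=8$. For three strings,
$\langle 0,1,\Lambda\rangle=B(001,\Lambda)$ has index $\pi(8,0)=36$,
which gives $\langle 0,1,\Lambda\rangle=00101$ since
$n(00101)=2^5-1+5=36$.
\end{example}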
220: 
221: For any two universal prefix computers $U_1$ and $U_2$ we have, by
222: definition, $|K_{U_1}(\alpha|\beta) -K_{U_2}(\alpha|\beta)|
223: \leq \kappa(U_1,U_2)$
224: where $\kappa(U_1,U_2)$ is a constant that depends only on $U_1$
225: and $U_2$ and not on $\alpha$ or $\beta$. Most of the research on
226: Kolmogorov complexity is focused on the asymptotic case of
227: nearly random long strings, when $\kappa(U_1,U_2)$ can
228: be neglected in comparison to the value of the complexity.
229: In such cases, Kolmogorov
230: complexity becomes an asymptotically absolute measure of the
231: complexity of individual strings. For this reason,
232: many fundamental properties of Kolmogorov complexity are established
233: up to an error term which is asymptotically small compared to the
234: complexity of strings involved. For instance,  the standard
235: analysis of the prefix Kolmogorov
236: complexity~(\cite{LiVitanyi}, Section 3.9.2)
237: gives
238: \begin{equation}            \label{ErrorDelta}
239: K_U(\alpha,\gamma|\beta)=K_U(\alpha|\gamma,\beta)
240:                                                        +K_U(\gamma|\beta)+\Delta\,,
241: \end{equation}
242: where $\Delta$ is an error term which grows logarithmically
243: with the complexity of the considered strings. This is an example
244: of an asymptotic property that all Kolmogorov measures of complexity
245: have irrespective of the choice of the reference computer.
246: Of course, it is important to know that all Kolmogorov measures
247: of complexity share many of their asymptotic properties.
248: For any given reference computer, however,
249: Kolmogorov complexity is a well defined function on all binary strings.
250: Even from a purely mathematical viewpoint it is interesting
251: to study the properties of such functions beyond the asymptotics.
252: As for the applied viewpoint, consider, by analogy, mathematical
253: analysis. This theory would be much less useful if we studied
254: only asymptotic properties of functions.
255: 
256: 
257: 
258: 
259: 
260: 
261: \section{Main arguments}\label{Main}
262: 
263: Without significant knowledge about the reference computer,
264: Kolmogorov complexity can be considered only up to an additive
265: error term $O(1)$.
266: Error terms even as small as $O(1)$ make it impossible
267: to use Occam's razor to discriminate between simple
268: hypotheses. The importance of this problem becomes
269: apparent once we recognize that the domain of simple hypotheses
270: is absolutely crucial in our everyday life as well as in fundamental
271: science. Indeed, it is often the case that, after extensive analysis,
272: the greatest scientific discoveries can be expressed in a form so simple
273: that they are readily understood even by schoolchildren.
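
A small numerical illustration may help here. All numbers below are hypothetical and are chosen only to show how an additive constant can reverse the ordering of two simple hypotheses.

\begin{example}
Let $U_1$ and $U_2$ be two universal prefix computers with
$\kappa(U_1,U_2)=6$, and let $h_1$ and $h_2$ be two hypotheses with
$K_{U_1}(h_1|d)=10$ and $K_{U_1}(h_2|d)=14$. Relative to $U_1$ the
hypothesis $h_1$ is the simpler one. The bound
$|K_{U_1}(h|d)-K_{U_2}(h|d)|\leq 6$ only guarantees that
\[
4\leq K_{U_2}(h_1|d)\leq 16
\ \ \ \mbox{and}\ \ \
8\leq K_{U_2}(h_2|d)\leq 20\,.
\]
Values such as $K_{U_2}(h_1|d)=15$ and $K_{U_2}(h_2|d)=9$ are consistent
with these bounds, in which case $h_2$ is the simpler hypothesis relative
to $U_2$.
\end{example}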
274: 
275: Humans can relatively easily discriminate between different hypotheses
276: even when the Kolmogorov complexities involved are rather small.
277: This gives them an enormous advantage over the present-day
278: theoretical models. A good example is Kepler's theory of planetary motion.
279: In what was
280: a major breakthrough in theoretical astronomy at the time,
281: Kepler introduced elliptical orbits as a better alternative to the complicated
282: Copernican planetary model of superimposed epicycles.
283: At the level of accuracy provided by Brahe's experiments, the original
284: Copernican model had to be refined by introducing additional
285: epicycles: the Keplerian theory appeared to be simpler
286: and therefore better by Occam's razor. This apparently obvious
287: fact cannot be established using the standard formalism
288: of Kolmogorov complexity: whereas Kepler's theory can be simpler
289: relative to some type of computers, the Copernican model can be
290: simpler relative to some other type of reference computers.
291: 
292: Much simpler examples can be found in tests that are
293: designed by humans to test their own intelligence.
294: A typical problem in such tests is to find the
295: next element in a sequence of symbols. For example,
296: if the first four elements of a sequence are 1,2,3,4 
297: an intelligent person is supposed to see the simplest
298: pattern and predict 5 as the next element of the sequence.
299: As in the previous example, all humans would agree that
300: predicting 5 would correspond to the choice of the simplest
301: hypothesis, whereas the standard formalism of Kolmogorov
302: complexity cannot be used to justify this.
303: It seems entirely plausible that
304: the ultimate theory of artificial intelligence and,
305: in particular, inductive
306: inference, can achieve human-like results only if the
307: building blocks of the theory, such as Kolmogorov complexity,
308: are made sensitive to small variations in the complexity of hypotheses.
309: 
310: 
311: The $O(1)$ ambiguity in the classical definition of Kolmogorov complexity
312: and the error terms like $\Delta$ in Eq.~(\ref{ErrorDelta})
313: are the price we pay for having an unrestricted class of reference computers.
314: Every human perceives complexity with respect to their own
315: built-in reference computer -- the brain.
316: As in the case of abstract reference computers,
317: human brains are not identical. However, they are similar enough to
318: allow for a sharper discrimination between individual theories
319: on the basis of their complexity. This suggests that further progress
320: in applications of Kolmogorov complexity to the theory of induction
321: can be made possible if we find a natural way
322: of restricting the class of reference computers.
323: 
324: 
325: We see from this discussion that some restrictions on the
326: class of reference computers are needed. 
327: It is desirable, however, to have a complexity
328: theory which would be as general as possible. As a compromise,
329: we can try to group all possible reference computers into restricted
330: classes. Although we may want to study all such classes,
331: we can argue that due to biological, technological, and
332: other limitations only one class of reference computers is
333: physically available to us.
334: A definition of this realistic class of reference computers would
335: be the crucial link between the abstract theory of
336: Kolmogorov complexity and the practical theories of induction and
337: computer learning.
338: 
339: What kind of restriction of the class of reference computers can be
340: seen as natural? It appears natural to assume that given some particular
341: level of technology one can build more powerful computers only at the
342: expense of a more complex internal design. 
343: In section~\ref{Example} we use this observation
344: to construct an example of a ``natural''
345: restriction of the class of reference computers. 
346: Roughly speaking, this restriction entails
347: the requirement that switching to a more complex reference
348: computer should always be accompanied by an
349: equivalent reduction of program lengths.
350: Using some particular universal computer $U$
351: as a reference, we define the complexity of a computer $W_s$
352: from the set $\{W_i\}$ given data $d$ as $K_U(s|d)$.
353: We then construct a particular set of computers $\{W_i\}$
354: such that the sum of the complexity of a computer and the length
355: of a program for it is the same for all 
356: equivalent\footnote{Two programs $p_1$ and $p_2$ for computers
357: $C_1$ and $C_2$ are called equivalent iff $C_1(p_1,d)=C_2(p_2,d)$.}
358: programs and for all
359: computers in the set $\{W_i\}$
360: (consult section~\ref{Example} for details).
361: This gives us a tradeoff between computer complexity and
362: program lengths similar to what one would expect in the
363: real world where we face various practical limitations.
364: Together with the original reference computer $U$,
365: the computers $\{W_i\}$ form a ``naturally'' restricted class.
366:  It is natural to define a
367: computer $W$ which is universal for this class by setting $W(p,\langle
368: s,d\rangle)=W_s(p,d)$, where $U$ is included by defining
369: $W_\Lambda\equiv U$.
370: Using any such $W$ as a reference we can see that, in principle,
371: even error terms logarithmic in complexity can
372: be removed from the standard complexity calculus. In particular,
373: we prove that for any triple of simple strings $\alpha,\beta,\gamma$,
374: we have
375:  \begin{equation}                       \label{Kw}
376: K_W(\alpha,\gamma|\langle\Lambda,\beta\rangle)=
377:      K_W(\alpha|\gamma,\beta)
378:    +K_W(\gamma|\langle\Lambda,\beta\rangle)+{\mbox{\rm const}}\,,
379: \end{equation}
380: where the constant depends only on the reference machine $W$
381: (not on $\alpha$, $\beta$ or~$\gamma$). Apart from subtleties
382: associated with the operation of combining strings into pairs, this
383: is analogous to Eq.~(\ref{ErrorDelta}) with the important difference
384: that the error term is replaced by a constant.
385: 
386: In the standard complexity calculus the above equation holds only up to
387: an error term which grows logarithmically with the complexity
388: of the considered strings. As we explained earlier, this is unacceptable
389: if we want to analyze the complexity of simple strings. 
390: The error terms
391: are especially troublesome if we want to use the complexity calculus
392: as a part of inductive inference based on the MDL principle.
393: In such cases we are interested in the {\it position}
394: of the minimum rather than in the approximate value of the complexity.
395: The error term can significantly shift the position of the minimum
396: even when the errors in the value of the complexity are minor. This can
397: introduce uncontrollable errors into the inference results.
398: In our case, however, equation~(\ref{Kw}) is exact in the sense
399: that the constant does not influence the position of critical points
400: so it can be safely ignored in applications such as induction by the
401: MDL principle.
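
The difference between a constant and a varying error term can be seen in a short worked example. The description lengths and error values below are hypothetical and serve only to illustrate the argument.

\begin{example}
Suppose three hypotheses $h_1$, $h_2$, $h_3$ have exact description lengths
$12$, $10$ and $11$ bits, so that the minimum is attained at $h_2$.
Adding the same constant, say $7$, to every length gives
$19$, $17$ and $18$: the minimum is still at $h_2$. If instead each length
is known only up to a hypothesis-dependent error, for instance
$\Delta_1=0$, $\Delta_2=3$ and $\Delta_3=1$, the apparent lengths become
$12$, $13$ and $12$, and $h_2$ is no longer selected, although no
individual error exceeds $3$ bits.
\end{example}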
402: 
403: 
404: \section{Example} \label{Example}
405: 
406: 
407: As we explained in section~\ref{Main}, a natural restriction of the class
408: of reference computers can make Kolmogorov complexity more
409: useful in applications such as inference and computer learning.
410: In this section we consider one possible way of making such a restriction.
411: We show that, in the important case of simple strings,
412: the proposed restriction effectively removes the
413: error term in Eq.~(\ref{ErrorDelta}),
414: which has important applications in physics~\cite{Soklakov00}.
415: 
416: \begin{definition}
417: Fix $\delta\in\set{N}$. A set of strings $\set{S}_\delta\subseteq \set{X}$
418: is called $\delta$-simple iff for any two strings $\alpha,\gamma \in \set{S}_\delta$
419: we have
420: \begin{equation}
421:         |\alpha|<\delta\,,\ \ \ |\gamma|<\delta\,, 
422:         {\rm\ \ \  and\ \ \ }|\langle\alpha,\gamma\rangle|<\delta\,,
423: \end{equation}
424: where $|\cdot|$ denotes the string length.
425: \end{definition}
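
Whether a given set is $\delta$-simple depends on the choice of the bijection $B$. The example below assumes, purely for illustration, a particular $B$ that is not fixed by this paper: strings are indexed by their position $n(x)$ in the enumeration $\Lambda,0,1,00,\dots$ (with $n(\Lambda)=0$), and $\langle x,y\rangle$ is the string with index $\pi(n(x),n(y))$, where $\pi(a,b)=\frac{1}{2}(a+b)(a+b+1)+b$.

\begin{example}
Let $\delta=4$. The strings of length less than $4$ are exactly those with
index at most $n(111)=14$. For the set $\{\Lambda,0,1\}$ the indices are
$0$, $1$ and $2$, and $\pi(a,b)\leq\pi(2,2)=12$ for all
$a,b\in\{0,1,2\}$, so every pair has length less than $4$ and the set is
$4$-simple. Adding the string $00$, with $n(00)=3$, breaks this property:
$\pi(n(00),n(1))=\pi(3,2)=17>14$, so
$\langle 00,1\rangle=0010$ has length $4$ and the set
$\{\Lambda,0,1,00\}$ is not $4$-simple.
\end{example}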
426: 
427: Following Chaitin \cite{Chaitin75}, consider a list of infinitely many
428: requirements ${\langle r_k,l_k(d)\rangle}$ $(k=0,1,2,\dots)$ for the
429: construction of a computer. Each requirement
430: ${\langle r_k,l_k(d)\rangle}$ requests that a program
431: of length $l_k(d)$ be assigned to the result $r_k$ if the computer is given
432: data $d$.  The requirements are said to satisfy the Kraft inequality
433: if $\sum_{k}2^{-l_k(d)}\leq 1$: for such requirements there exists an
434: instantaneous code characterized by the set of string lengths $\{l_k(d)\}$.
435: A computer $C$ is said to satisfy the requirements if there are precisely as
436: many programs $p$ of length $l(d)$ such that $C(p,d)=r$ as there are pairs
437: ${\langle r,l(d)\rangle}$ in the list of requirements.
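
A finite toy list of requirements illustrates these notions. The list below is a hypothetical example and does not depend on the data $d$; in each requirement the first entry is the requested result and the second is the requested program length.

\begin{example}
Consider the four requirements
$\langle 0,1\rangle$, $\langle 1,2\rangle$, $\langle 00,3\rangle$ and
$\langle 01,3\rangle$. The Kraft inequality is satisfied because
\[
2^{-1}+2^{-2}+2^{-3}+2^{-3}=1\leq 1\,,
\]
and a corresponding instantaneous code is $\{0,10,110,111\}$. The computer
defined by $C(0,d)=0$, $C(10,d)=1$, $C(110,d)=00$ and $C(111,d)=01$
satisfies the requirements: each result has exactly one program of the
requested length. By contrast, requested lengths $1,1,2$ give
$2^{-1}+2^{-1}+2^{-2}=5/4>1$, and indeed no instantaneous code exists,
since the two programs of length $1$ must be $0$ and $1$, and every string
of length $2$ has one of them as a prefix.
\end{example}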
438: 
439: Fix a universal computer $U$ which can be constructed from an effectively 
440: given list of requirements (consult~\cite{Chaitin75}, Theorem 3.2).
441: Consider the set of all programs $\{p_k\}$ for $U$
442: such that the output of computation $U(p_k,d)$ is defined.
443: Since $B$ is a bijection, we can write $U(p_k,d)={\langle r_k,s_k\rangle}$,
444: where $r_k$ and $s_k$ are strings from $\set{X}$.
445: Moreover, because $U$ is a universal computer, any pair of strings
446: ${\langle \alpha,\gamma\rangle}$ can be generated this way.
447: In what follows we consider only those $p_k$ for which $s_k\neq\Lambda$.
448: For every fixed $s$ from
449: the set $\{s_k\}$ we construct a list of requirements
450: \begin{equation}                                                                         \label{requirements}
451:   {\langle r_k,|p_k|-K_U(s|d)+\kappa^s_d\rangle }\,,\ k=1,2,\dots
452: \end{equation}
453: where $|p_k|$ is the length of the program $p_k$, and $\kappa^s_d$ is some
454: constant.
455: It was shown~(\cite{Chaitin75}, Theorem 3.8)
456: that the constant $\kappa^s_d$ can be chosen large enough
457: such that these requirements satisfy the Kraft inequality.
458: Fix any $\delta\in\set{N}$, and consider
459: a sublist of requirements~(\ref{requirements}):
460: \begin{equation}                                                                  
461: {\langle r_k,|p_k|-K_U(s|d)+\kappa^s_d\rangle}
462: \ \ \ r_k,\, d\in \set{S}_\delta\,,
463: \end{equation}
464: where $\set{S}_\delta$ is the set of $\delta$-simple strings.
465: Since $\set{S}_\delta$ is finite, the maximum
466: $\kappa\equiv\max\{\kappa^s_d|\,s,d\in \set{S}_\delta\}$ exists.
467: We then choose $\kappa^s_d=\kappa$ for all $s,d\in\set{S}_\delta$ and construct a new list
468: of requirements
469: \begin{equation}                                                                      \label{requirements2}
470: {\langle r_k,|p_k|-K_U(s|d)+\kappa\rangle}
471: \ \ \ r_k,\, d\in \set{S}_\delta\,.
472: \end{equation}
473: For any fixed $s\in\set{S}_\delta$ these requirements satisfy
474: the Kraft inequality
475: by construction. Furthermore, since  $\set{S}_\delta$ is finite and
476: $B$ is recursive these requirements can be effectively given.
477: This means that for any $s\in\set{S}_\delta$ there
478: is a computer $W_s$ that satisfies these requirements:
479: consult (\cite{Chaitin75}, Theorem 3.2) for further details.
480: 
481: For each value of $s\in\set{S}_\delta\setminus\{\Lambda\}$ we use
482: (\ref{requirements2}) to construct
483: one $W_s$. We define $W_\Lambda=U$, and form the set 
484: $\set{W}_U\equiv\{W_s |\,s\in \set{S}_\delta\}$.
485: This set contains the original computer $U$ as a somewhat special
486: element. Having the computer $U$ at our disposal, it would take at least
487: $K_U(s|d)$ bits to specify any other $W_s$ from the set $\set{W}_U$
488: given data $d$. We can now see that requirements~(\ref{requirements2})
489: are designed
490: in such a way that more complex computers, i.e. larger $K_U(s|d)$,
491: will have shorter programs,
492: $l_k(d)= |p_k|-K_U(s|d)+\kappa$.
493: % so that the sum of the program length
494: %$l_k(d)$ and $K_U(s|d)$ is the same for all $W_s$.
495: This is exactly the property that we wanted to use as a
496: natural restriction that defines a realistic class of computers.
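
The tradeoff can be illustrated with simple arithmetic. The numerical values below are hypothetical and are not derived from any particular universal computer.

\begin{example}
Let $\kappa=7$, and suppose that two programs $p_j$ and $p_k$ for $U$,
both of length $20$, give $U(p_j,d)=\langle r,s\rangle$ and
$U(p_k,d)=\langle r,s'\rangle$ with $K_U(s|d)=5$ and $K_U(s'|d)=12$.
According to requirements~(\ref{requirements2}), the computer $W_s$
produces $r$ from a program of length $20-5+7=22$, whereas the more
complex computer $W_{s'}$ produces $r$ from a program of length
$20-12+7=15$. In both cases the program length plus the complexity of the
computer equals $27$, i.e. the length of the original program for $U$
plus $\kappa$.
\end{example}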
497: 
498: In what follows we restrict our attention
499: to the set $\set{W}_U$. We define a computer $W$
500: which is universal for the set $\set{W}_U$, i.e. which is designed
501: to simulate any computer $W_s\in \set{W}_U$:
502: \begin{equation}
503:   W(p,{\langle s,d\rangle})\equiv W_s(p,d)\,.
504: \end{equation}
505: 
506: \begin{theorem}
507: For any $\alpha,d \in \set{S}_\delta$, and for any
508: $\gamma \in \set{S}_\delta\setminus \{\Lambda\}$, we have
509: \begin{equation}                                                                                  \label{KwKu}
510: K_{W}(\alpha|\gamma,d)
511:    =K_W(\alpha,\gamma|\langle\Lambda, d\rangle)
512:       -K_W(\gamma|\langle\Lambda, d\rangle) + \kappa\,.
513: \end{equation}
514: \end{theorem}
515: 
516: {\bf Proof}\\
517: Consider the program
518: $\tilde{p}_k$ which causes $W_s\in\set{W}_U$
519: to produce the result $r_k\in\set{S}_\delta$
520: given data~$d$
521: \begin{equation}                                                                                          \label{r}
522:  W_s(\tilde{p}_k,d)=r_k\,.
523: \end{equation}
524: By definition of $W_s$, the length of $\tilde{p}_k$ satisfies the
525: requirement
526: \begin{equation}                                                                                     \label{pAbs}
527: \forall s\in\set{S}_\delta\setminus \{\Lambda\} {\rm\ \, and\ }
528:  \forall d\in\set{S}_\delta: \ \ \ 
529: |\tilde{p}_k|=|p_k|-K_U(s|d)+\kappa\,,
530: \end{equation}
531: where $p_k$ is the program for $U$ such that
532: \begin{equation}                                                                                               \label{rs}
533: U(p_k,d)={\langle r_k,s_k\rangle }\,,\ \ s_k\neq\Lambda\,.
534: \end{equation}
535: We define the set $\set{K}\equiv\{ i |\, U(p_{i},d)={\langle r_k,s_k\rangle}\}$,
536: which can contain more than one element since some
537: of the pairs $\{{\langle r_k,s_k\rangle}\}$ can coincide.
538:  From the construction of
539: $W_s$ we note that requirements
540: (\ref{requirements2}) associate exactly one program $\tilde{p}_k$
541: with the corresponding program $p_k$. In other words there is
542: a one-to-one correspondence between programs
543: $\tilde{p}_k$ and $p_k$ (which is given explicitly by the index $k$).
544: This means that the set
545: $\set{K}$ coincides with the set $\tilde{\set{K}}
546: \equiv\{ i |\, W_s(\tilde{p}_{i},d)=r_k\} $.
547: Since $U$, $d$ and $s$ are fixed, and using the identity
548: $\set{K}=\tilde{\set{K}}$, we have from Eq.~(\ref{pAbs})
549: \begin{equation}                                                                    \label{ShortestLengths}
550: \min_{k\in\tilde{\set{K}}}|\tilde{p}_k|
551:  =\min_{k\in{\set{K}}}|p_k|-K_U(s|d)+\kappa\,,
552: \ \ s\in \set{S}_\delta\setminus \{\Lambda\}\,.
553: \end{equation}
554: By definition of $W$ we have
555: \begin{equation}
556: W(\tilde{p}_k,{\langle s,d \rangle })
557: \equiv  W_s(\tilde{p}_k,d)=r_k\,,\ s\neq\Lambda\,.
558: \end{equation}
559: This means, by definition of Kolmogorov complexity, that
560: $K_W(r_k|s,d)=\min_{i\in\tilde{\set{K}}}|\tilde{p}_i|$, $s\neq\Lambda$.
561: Similarly from Eq.~(\ref{rs}), 
562: we have $K_U(r_k,s_k|d)=\min_{i\in{\set{K}}}|p_i|$
563: and therefore Eq.~(\ref{ShortestLengths}) becomes
564: \begin{equation} \label{AlmostThere}
565: K_W(r_k|s,d)=K_U(r_k,s_k|d)-K_U(s|d)+\kappa\,.
566: \end{equation}
567: Because $W(p,\langle\Lambda,d\rangle)=U(p,d)$ we have, for instance,
568: $K_U(s|d)=K_W(s|\langle\Lambda,d\rangle)$. Using this observation
569: to transform both terms on the right-hand side of Eq.~(\ref{AlmostThere}), and 
570: choosing $s=s_k$ we have Eq.~(\ref{KwKu}) as required. $\Box$
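
The final substitution in the proof can be written out explicitly. The remark below only restates steps already contained in the proof and introduces no new assumptions.

\begin{remark}
Since $W_\Lambda\equiv U$, we have
\[
K_U(r_k,s_k|d)=K_W(r_k,s_k|\langle\Lambda,d\rangle)
\ \ \ \mbox{and}\ \ \
K_U(s_k|d)=K_W(s_k|\langle\Lambda,d\rangle)\,.
\]
Setting $s=s_k$ in Eq.~(\ref{AlmostThere}) and inserting these two
identities gives
\[
K_W(r_k|s_k,d)=K_W(r_k,s_k|\langle\Lambda,d\rangle)
  -K_W(s_k|\langle\Lambda,d\rangle)+\kappa\,,
\]
which is Eq.~(\ref{KwKu}) with $\alpha=r_k$ and $\gamma=s_k$.
\end{remark}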
571: 
572: Note that, since $U$ is an arbitrary prefix computer,
573: the above analysis provides a grouping of all
574: possible reference computers into naturally restricted classes.
575: 
576: 
577: \section{Acknowledgments}
578: 
579: It is my pleasure to acknowledge many helpful suggestions by
580: Jens G. Jensen, A.S. Johnson and Yuri Kalnishkan.
581: 
582: \thebibliography{References}
583: 
584: \bibitem{Bennett_1982} C.H. Bennett, Thermodynamics of computation --
585:                                                a review,
586:                                                Int.\ J.\ Theor.\ Phys.\ {\bf 21}
587:                                                (1982)
588:                                                905-940.
589: 
590: \bibitem{Bennett_1987} C.H. Bennett, Demons, engines and the second
591:                                                law, Sci. American (Nov. 1987) 108-116.
592: 
593: 
594: \bibitem{Brudno_1978} A.A. Brudno, The complexity of the trajectories
595:                                              of a dynamical system, Russ.\ Math.\ Surv.\ {\bf 33}
596:                                               (1978) 197-198.
597: 
598: \bibitem{Brudno_1982}  A.A. Brudno, Entropy and the complexity of the
599:                                                trajectories of a dynamical system, Trans. Moscow
600:                                                Math.\ Soc.\ (1983) 127-151;
601:                                                and references therein.
602: 
603: 
604: \bibitem{Chaitin75} G.J. Chaitin, A theory of program size formally identical
605:                                        to information theory, J. ACM {\bf 22} (1975) 329-340.
606: 
607: 
608: \bibitem{Dzhunushaliev} V.D. Dzhunushaliev, Kolmogorov's algorithmic
609:                                complexity and its probability interpretation in quantum
610:                                gravity, Class.\ Quantum Grav. {\bf 15} (1998) 603-612.
611: 
612: 
613: \bibitem{Ford_1983} J. Ford, How random is a coin toss?, Physics Today
614:                                          {\bf 36} (1983) 40-47.
615: 
616: 
617: \bibitem{LiVitanyi} M. Li and P. Vit\'anyi, An Introduction
618:                     to Kolmogorov Complexity and Its Applications
619:                     (Springer-Verlag New York, ed.\ 2, 1997) and references therein.
620: 
621: \bibitem{Rissanen_1978}  J. Rissanen, Modeling by shortest
622:                                                  data description,
623:                                                  Automatica {\bf 14} (1978), 465-471.
624: 
625: \bibitem{Rissanen_1997} J. Rissanen, Stochastic complexity in learning,
626:                                                 J.\ Comput.\ Sys.\ Sci.\ {\bf 55} (1997) 89-95.
627: 
628: 
629: \bibitem{SchackCaves_1992} R. Schack and C.M. Caves, Information
630:                                                         and entropy in the baker's map,
631:                                                         Phys.\ Rev.\ Lett.\ {\bf 69} (1992) 3413-3416;
632:                                                         and references therein.
633: 
634: \bibitem{SoklakovSchack} A.N. Soklakov and R. Schack,
635:                   Preparation information and optimal decompositions
636:                   for mixed quantum states, J.\ Mod.\ Optics {\bf 47}
637:                   (2000) 2265-2276.  
638: 
639: \bibitem{Soklakov00} A.N. Soklakov, Occam's razor as a formal basis for
640:                                           a physical theory, Found.\ Phys.\ Lett.\ to appear;
641:                                           also available as arXiv:math-ph/0009007.
642: 
643: 
644: \bibitem{Zurek_1989} W.H. Zurek, Algorithmic randomness and physical
645:                                            entropy, Phys.\ Rev.\ A {\bf 40} (1989) 4731-4751;
646:                                            and references therein. 
647: 
648: \end{document}
649: