
1: \documentclass[12pt]{article}\usepackage{amsmath,amsfonts,theorem}

2:

3:

4:

5: % Parameters for both A4 and Letter paper

6: \setlength{\textheight}{214mm}

7: \setlength{\textwidth}{140mm}

8: \setlength{\topmargin}{0mm}

9: \setlength{\headheight}{0mm}

10: \setlength{\headsep}{16mm}

11: \setlength{\evensidemargin}{12mm}

12: \setlength{\oddsidemargin}{12mm}

13: \setlength{\footskip}{8mm}

14: \setlength{\parindent}{0mm}

15: \setlength{\parskip}{1.5mm}

16: \pagestyle{myheadings}

17: \markright{Andrei~N.~Soklakov}

18:

19:

20: %% Parameters for Letter paper

21: %\setlength{\textheight}{215mm}   %  297mm - 50mm

22: %\setlength{\textwidth}{165mm}    %  210mm - 50mm

23: %\setlength{\topmargin}{-5mm}

24: %\setlength{\headheight}{0mm}

25: %\setlength{\headsep}{15mm}

26: %\setlength{\evensidemargin}{0mm}

27: %\setlength{\oddsidemargin}{0mm}

28: %\setlength{\footskip}{8mm}

29: %\setlength{\parindent}{0mm}

30: %\setlength{\parskip}{1.5mm}

31: %\pagestyle{myheadings}

32: %\markright{Andrei~N.~Soklakov}

33:

34:

35:

36: %% Parameters for A4 paper

37: %\setlength{\textheight}{237mm}   %  297mm - 50mm

38: %\setlength{\textwidth}{160mm}    %  210mm - 50mm

39: %\setlength{\topmargin}{-5mm}

40: %\setlength{\headheight}{0mm}

41: %\setlength{\headsep}{15mm}

42: %\setlength{\evensidemargin}{0mm}

43: %\setlength{\oddsidemargin}{0mm}

44: %\setlength{\footskip}{8mm}

45: %\setlength{\parindent}{0mm}

46: %\setlength{\parskip}{1.5mm}

47: %\pagestyle{myheadings}

48: %\markright{Andrei~N.~Soklakov}

49:

50: \theoremstyle{break}

51:

52: \renewcommand{\abstractname}{}

53:

54:

55: \newtheorem{definition}{Definition}

56: \newtheorem{remark}{Remark}

57: \newtheorem{lemma}{Lemma}

58: \newtheorem{theorem}{Theorem}

59: \newtheorem{example}{Example}

60:

61: \newcommand{\set}[1]{{\mathbb{#1}}}

62: \newcommand{\ba}{\mbox{\boldmath $a$}}

63: \newcommand{\bepsilon}{\mbox{\boldmath $\epsilon$}}

64: \newcommand{\br}{\mbox{\boldmath $r$}}

65: \newcommand{\bv}{\mbox{\boldmath $v$}}

66: \newcommand{\bV}{\mbox{\boldmath $V$}}

67: \newcommand{\cU}{{\cal U}}

68: \newcommand{\cV}{{\cal V}}

69:

70: \newcommand{\Jf}{{}^J\!f}

71: \newcommand{\ttS}{{\tt S}}

72: \newcommand{\ttU}{{\tt U}}

73: \newcommand{\ttV}{{\tt V}}

74:

75: \newcommand{\ttu}{{\tt u}}

76: \newcommand{\ttv}{{\tt v}}

77:

78: \newcommand{\oo}[1]{\overset{\circ}{#1}}

79: \newcommand{\ooo}[1]{\overset{\circ\circ}{#1}}

80:

81: \begin{document}

82: \title{ \vspace{-1.7 cm}

83: Complexity analysis for algorithmically simple strings}

84: \author{Andrei N. Soklakov\footnote{e-mail: a.soklakov@rhul.ac.uk}\\

85: \\

86: {\it Department of Mathematics} \\

87: {\it Royal Holloway, University of London}\\

88: {\it Egham, Surrey TW20 0EX, United Kingdom}}

89:

90: \date{25 February 2002}

91:

92: \maketitle

93:

94: \begin{abstract}

95: \vspace{-9mm}

96: Given a reference computer, Kolmogorov complexity is

97: a well defined function on all binary strings.

98: In the standard approach, however, only

99: the asymptotic properties of such functions are considered

100: because they do not depend on the reference computer.

101: We argue that this approach can be more useful if it is refined

102: to include an important practical case of simple binary strings.

103: Kolmogorov complexity calculus may be developed

104: for this case if we restrict the class of available reference computers.

105: The interesting problem is to define a class of computers

106: which is restricted in a {\it natural} way modeling the

107: real-life situation where only a limited class of computers

108: is physically available to us. We give an example of what such a natural

109: restriction might look like mathematically, and show that under such

110: restrictions some error terms, even logarithmic in complexity, can

111: disappear from the standard complexity calculus.

112:

113: {\it Keywords:} Kolmogorov complexity; Algorithmic information theory.

114: \end{abstract}

115:

116: \section{Introduction}

117:

118: The asymptotic nature of Kolmogorov complexity

119: calculus renders it significantly less useful in practical applications

120: such as inference by the minimum description length (MDL)

121: principle~\cite{Rissanen_1978}.

122: In the classical MDL approach~\cite{Rissanen_1997}

123: this problem is solved by replacing

124: Kolmogorov complexity with a phenomenological

125: complexity measure just before performing the actual inference.

126: Such a measure can be chosen to suit a particular application,

127: whereas the general form of the MDL constructions can be

128: considered as a consequence of the asymptotic properties

129: of Kolmogorov complexity (consult section 5.5 in Ref.~\cite{LiVitanyi}).

130: Here we propose a different

131: approach. We argue that Kolmogorov complexity can become

132: more practical if we restrict the class of reference computers.

133:

134:

135:

136: Computer science is not the only field which can benefit

137: from the proposed research. There is a growing

138: interest in using Kolmogorov complexity as a fundamental

139: {\it physical} concept. This includes applications in

140: thermodynamics~\cite{Bennett_1982,Bennett_1987,Zurek_1989}%

141: \footnote{Consult~\cite{LiVitanyi} for further references.}, theory of

142: chaos~\cite{Brudno_1978,Brudno_1982,Ford_1983,SchackCaves_1992}%

143: \footnote{

144: Consult~\cite{LiVitanyi} for further references.},

145: physics of

146: computation~(consult~\cite{LiVitanyi} and references therein),

147: and many other areas of modern theoretical

148: physics~\cite{Dzhunushaliev,SoklakovSchack,Soklakov00}.

149: It is however very difficult to use Kolmogorov complexity

150: in any concrete physical setting, or indeed, in any concrete

151: application. For that we need a much more detailed

152: calculus that can be applied to particular cases of reference computers.

153: The main aim of this article is to stimulate

154: further research in developing such a {\it practical} complexity calculus.

155:

156: This article is organized as follows.

157: In section~\ref{Basic} we review some basic definitions.

158: In section~\ref{Main} we present the main conceptual arguments

159: of the paper. In section~\ref{Example} we give an example of how

160: one can build a restricted class of computers in a ``natural'' way.

161: Considering one of the central equalities of the standard complexity

162: calculus we give an illustration of how the error terms may be reduced.

163:

164:

165: \section{Basic definitions} \label{Basic}

166:

167: Let

168: $\set{X}=\{\Lambda,0,1,00,01,10,11,000,\dots\}$

169: be the set of

170: finite binary strings where $\Lambda$ is the string of length 0.

171: A set of strings $\set{Y}\subset \set{X}$ with the property that no string in

172: $\set{Y}$ is a prefix of another is called an instantaneous code.

173:  A prefix computer is a partial recursive

174: function

175: $C: \set{Y}\times \set{X}\to \set{X}$.

176: For each $p\in \set{Y}$ (program string) and for

177: each $d\in \set{X}$ (data string) the output of the computation is either

178: undefined or given by $C(p,d)\in \set{X}$.

179: Kolmogorov complexity

180: of a string $\alpha$ given a data string $d$ relative to a computer

181: $C$ is defined as the length

182: $K_C(\alpha|d)$ of the shortest program that

183: makes $C$ compute $\alpha$ given data~$d$:

184: \begin{equation}

185: K_C(\alpha|d)\equiv\min_{p}\{ |p|\; {\big{|}}\;C(p,d)=\alpha\}\,,

186: \end{equation}

187: where $|p|$ denotes the length of the program $p$ (in bits).
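
\begin{example}
As a toy illustration (the computer $C$ below is a hypothetical choice made
only for this example and is not used elsewhere), take the instantaneous code
$\set{Y}=\{0,10,11\}$ and define, for every $d\in\set{X}$,
\[
C(0,d)=d\,,\ \ \ C(10,d)=\Lambda\,,\ \ \ C(11,d)=0\,.
\]
Then $K_C(d|d)=1$ for every $d$, because the one-bit program $0$ copies the data.
For $d\notin\{\Lambda,0\}$ we have $K_C(\Lambda|d)=K_C(0|d)=2$, whereas
$K_C(\Lambda|\Lambda)=1$ since the one-bit program already outputs $\Lambda$
when $d=\Lambda$. For a string that no program produces, the minimum is taken
over an empty set and the complexity relative to this $C$ is not defined.
\end{example}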

188:

189: Since this complexity measure depends strongly on the

190: reference computer, it is important to find an optimal computer $U$ such

191: that the complexity of any string relative to $U$ is not much higher than

192: the complexity of the same string relative to any other computer $C$.

193: Mathematically, a computer $U$ is called optimal if

194: \begin{equation}

195: \forall C\ \ \exists\kappa_C\ \mbox{such that } \forall \alpha,d:\ \  K_U(\alpha|d)\leq K_C(\alpha|d)

196: +\kappa_C\,,

197: \end{equation}

198:  where $\kappa_C$ is a constant depending

199: on $C$ (and $U$) but not on $\alpha$ or $d$.

200: It turns out that the set of prefix computers contains such a $U$ and,

201: moreover, it can be constructed so that any prefix computer

202: can be simulated by $U$: for further details consult~\cite{LiVitanyi}.

203: Such a $U$ is called a universal prefix computer and its choice is not unique.

204: Using some particular universal prefix computer $U$ as a reference,

205: the conditional Kolmogorov complexity of $\alpha$

206: given $\beta$ is defined as $K_U(\alpha|\beta)$.

207:

208: The above definitions are generalized for the case

209: of many strings as follows. We choose and fix a particular recursive bijection

210: $B: \set{X}\times \set{X}\to \set{X}$ for use throughout the rest of this paper.

211:  Let $\{\alpha^i\}_{i=1}^{n}$

212: be a set of $n$ strings $\alpha^i\in \set{X}$.

213: For $2\leq k\leq n\;$ we define

214: ${\langle\alpha^1,\alpha^2,\dots,\alpha^k\rangle}\equiv

215: B({\langle\alpha^1,\dots,\alpha^{k-1}\rangle},\alpha^k)$,

216: and ${\langle\alpha^1\rangle}\equiv\alpha^1$.

217: We can now define $K_U(\alpha^1,\dots,\alpha^n| \beta^1,\dots,\beta^k)

218: \equiv K_U({\langle \alpha^1,\dots,\alpha^n\rangle }|{\langle\beta^1,

219: \dots,\beta^k\rangle})$.
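
\begin{example}
Unfolding the recursion for $n=3$ gives
\[
\langle\alpha^1,\alpha^2,\alpha^3\rangle
  = B\big(B(\alpha^1,\alpha^2),\alpha^3\big)\,.
\]
One possible choice of $B$, assumed here purely for illustration, is obtained
as follows. Identify each string $x$ with the integer $n(x)$ such that the
binary expansion of $n(x)+1$ is $1x$, so that
$\Lambda,0,1,00,\dots$ correspond to $0,1,2,3,\dots$, and set
\[
n\big(B(x,y)\big)=\frac{\big(n(x)+n(y)\big)\big(n(x)+n(y)+1\big)}{2}+n(y)
\]
(the Cantor pairing function). With this choice
$B(\Lambda,\Lambda)=\Lambda$, $B(0,\Lambda)=0$ and $B(\Lambda,0)=1$.
The last value shows that $\langle\Lambda,d\rangle$ need not coincide with $d$.
\end{example}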

220:

221: For any two universal prefix computers $U_1$ and $U_2$ we have, by

222: definition, $|K_{U_1}(\alpha|\beta) -K_{U_2}(\alpha|\beta)|

223: \leq \kappa(U_1,U_2)$

224: where $\kappa(U_1,U_2)$ is a constant that depends only on $U_1$

225: and $U_2$ and not on $\alpha$ or $\beta$. Most of the research on

226: Kolmogorov complexity is focused on the asymptotic case of

227: nearly random long strings, when $\kappa(U_1,U_2)$ can

228: be neglected in comparison to the value of the complexity.

229: In such cases, Kolmogorov

230: complexity becomes an asymptotically absolute measure of the

231: complexity of individual strings. For this reason,

232: many fundamental properties of Kolmogorov complexity are established

233: up to an error term which is asymptotically small compared to the

234: complexity of strings involved. For instance,  the standard

235: analysis of the prefix Kolmogorov

236: complexity~(\cite{LiVitanyi}, Section 3.9.2)

237: gives

238: \begin{equation}            \label{ErrorDelta}

239: K_U(\alpha,\gamma|\beta)=K_U(\alpha|\gamma,\beta)

240:                                                        +K_U(\gamma|\beta)+\Delta\,,

241: \end{equation}

242: where $\Delta$ is an error term which grows logarithmically

243: with the complexity of the considered strings. This is an example

244: of an asymptotic property that all Kolmogorov measures of complexity

245: have irrespective of the choice of the reference computer.

246: Of course, it is important to know that all Kolmogorov measures

247: of complexity share many of their asymptotic properties.

248: For any given reference computer, however,

249: Kolmogorov complexity is a well defined function on all binary strings.

250: Even from a purely mathematical viewpoint it is interesting

251: to study the properties of such functions beyond the asymptotics.

252: As for the applied viewpoint, consider, by analogy, mathematical

253: analysis. This theory would be much less useful if we studied

254: only asymptotic properties of functions.

255:

256:

257:

258:

259:

260:

261: \section{Main arguments}\label{Main}

262:

263: Without significant knowledge about the reference computer,

264: Kolmogorov complexity can be considered only up to an additive

265: error term $O(1)$.

266: Error terms even as small as $O(1)$ make it impossible

267: to use Occam's razor to discriminate between simple

268: hypotheses. The importance of this problem becomes

269: apparent once we recognize that the domain of simple hypotheses

270: is absolutely crucial in our everyday life as well as in fundamental

271: science. Indeed, it is often the case that, after extensive analysis,

272: the greatest scientific discoveries can be expressed in a form so simple

273: that they are readily understood even by schoolchildren.

274:

275: Humans can relatively easily discriminate between different hypotheses

276: even when the Kolmogorov complexities involved are rather small.

277: This gives them an enormous advantage over the present-day

278: theoretical models. A good example is Kepler's theory of planetary motion.

279: In what was

280: a major breakthrough in theoretical astronomy at the time,

281: Kepler introduced elliptical orbits as a better alternative to the complicated

282: Copernican planetary model of superimposed epicycles.

283: At the level of accuracy provided by Brahe's experiments, the original

284: Copernican model had to be refined by introducing additional

285: epicycles: the Keplerian theory appeared to be simpler

286: and therefore better by Occam's razor. This apparently obvious

287: fact cannot be established using the standard formalism

288: of Kolmogorov complexity: whereas Kepler's theory can be simpler

289: relative to some type of computers, the Copernican model can be

290: simpler relative to some other type of reference computers.

291:

292: Much simpler examples can be found in tests that are

293: designed by humans to test their own intelligence.

294: A typical problem in such tests is to find the

295: next element in a sequence of symbols. For example,

296: if the first four elements of a sequence are 1,2,3,4

297: an intelligent person is supposed to see the simplest

298: pattern and predict 5 as the next element of the sequence.

299: As in the previous example, all humans would agree that

300: predicting 5 would correspond to the choice of the simplest

301: hypothesis, whereas the standard formalism of Kolmogorov

302: complexity cannot be used to justify this.

303: It seems entirely plausible that

304: the ultimate theory of artificial intelligence and,

305: in particular, inductive

306: inference, can achieve human-like results only if the

307: building blocks of the theory, such as Kolmogorov complexity,

308: are made sensitive to small variations in the complexity of hypotheses.

309:

310:

311: The $O(1)$ ambiguity in the classical definition of Kolmogorov complexity

312: and error terms such as $\Delta$ in Eq.~(\ref{ErrorDelta})

313: are the price we pay for having an unrestricted class of reference computers.

314: Every human perceives complexity with respect to their own

315: built-in reference computer -- the brain.

316: As in the case of abstract reference computers,

317: human brains are not identical. However, they are similar enough to

318: allow for a sharper discrimination between individual theories

319: on the basis of their complexity. This suggests that further progress

320: in applications of Kolmogorov complexity to the theory of induction

321: can be made possible if we find a natural way

322: of restricting the class of reference computers.

323:

324:

325: We see from this discussion that some restrictions on the

326: class of reference computers are needed.

327: It is desirable, however, to have a complexity

328: theory which would be as general as possible. As a compromise,

329: we can try to group all possible reference computers into restricted

330: classes. Although we may want to study all such classes,

331: we can argue that due to biological, technological, and

332: other limitations only one class of reference computers is

333: physically available to us.

334: A definition of this realistic class of reference computers would

335: be the crucial link between the abstract theory of

336: Kolmogorov complexity and the practical theories of induction and

337: computer learning.

338:

339: What kind of restriction of the class of reference computers can be

340: seen as natural? It appears natural to assume that given some particular

341: level of technology one can build more powerful computers only at the

342: expense of a more complex internal design.

343: In section~\ref{Example} we use this observation

344: to construct an example of a ``natural''

345: restriction of the class of reference computers.

346: Roughly speaking, this restriction entails

347: the requirement that switching to a more complex reference

348: computer should always be accompanied by an

349: equivalent reduction of program lengths.

350: Using some particular universal computer $U$

351: as a reference, we define the complexity of a computer $W_s$

352: from the set $\{W_i\}$ given data $d$ as $K_U(s|d)$.

353: We then construct a particular set of computers $\{W_i\}$

354: such that the sum of the complexity of a computer and the length

355: of a program for it is the same for all

356: equivalent\footnote{Two programs $p_1$ and $p_2$ for computers

357: $C_1$ and $C_2$ are called equivalent iff $C_1(p_1,d)=C_2(p_2,d)$.}

358: programs and for all

359: computers in the set $\{W_i\}$

360: (consult section~\ref{Example} for details).

361: This gives us a tradeoff between computer complexity and

362: program lengths similar to what one would expect in the

363: real world where we face various practical limitations.

364: Together with the original reference computer $U$, the

365: computers $\{W_i\}$ form a ``naturally'' restricted class.

366:  It is natural to define a

367: computer $W$ which is universal for this class by setting $W(p,\langle

368: s,d\rangle)=W_s(p,d)$, where $U$ is included by defining

369: $W_\Lambda\equiv U$.

370: Using any such $W$ as a reference we can see that, in principle,

371: even error terms logarithmic in complexity can

372: be removed from the standard complexity calculus. In particular,

373: we prove that for any triple of simple strings $\alpha,\beta,\gamma$,

374: we have

375:  \begin{equation}                       \label{Kw}

376: K_W(\alpha,\gamma|\langle\Lambda,\beta\rangle)=

377:      K_W(\alpha|\gamma,\beta)

378:    +K_W(\gamma|\langle\Lambda,\beta\rangle)+{\mbox{\rm const}}\,,

379: \end{equation}

380: where the constant depends only on the reference machine $W$

381: (not on $\alpha$, $\beta$ or~$\gamma$). Apart from subtleties

382: associated with the operation of combining strings into pairs, this

383: is analogous to Eq.~(\ref{ErrorDelta}) with the important difference

384: that the error term is replaced by a constant.

385:

386: In the standard complexity calculus the above equation holds only up to

387: an error term which grows logarithmically with the complexity

388: of the considered strings. As we explained earlier, this is unacceptable

389: if we want to analyze the complexity of simple strings.

390: The error terms

391: are especially troublesome if we want to use the complexity calculus

392: as a part of inductive inference based on the MDL principle.

393: In such cases we are interested in the {\it position}

394: of the minimum rather than in the approximate value of complexity.

395: The error term can significantly shift the position of the minimum

396: even when the errors in the value of complexity are minor. This can

397: introduce uncontrollable errors into the inference results.

398: In our case, however, equation~(\ref{Kw}) is exact in the sense

399: that the constant does not influence the position of critical points

400: so it can be safely ignored in applications such as induction by the

401: MDL principle.
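
\begin{remark}
The distinction between a constant and a varying error term can be made
explicit. If $f(h)$ denotes the exact complexity of a hypothesis $h$,
then for any constant $c$
\[
\arg\min_h\,[f(h)+c]=\arg\min_h f(h)\,,
\]
whereas an error term $\Delta(h)$ that varies with $h$ can move the minimum.
As a purely hypothetical numerical illustration, let $f(h_1)=10$ and
$f(h_2)=12$ bits, so that $h_1$ is preferred. With $\Delta(h_1)=3$ and
$\Delta(h_2)=0$ the approximate values become $13$ and $12$ bits, and the
preference is reversed although neither value is off by more than three bits.
\end{remark}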

402:

403:

404: \section{Example} \label{Example}

405:

406:

407: As we explained in section~\ref{Main}, a natural restriction of the class

408: of reference computers can make Kolmogorov complexity more

409: useful in applications such as inference and computer learning.

410: In this section we consider one possible way of making such a restriction.

411: We show that, in the important case of simple strings,

412: the proposed restriction effectively removes the

413: error term in Eq.~(\ref{ErrorDelta}),

414: which has important applications in physics~\cite{Soklakov00}.

415:

416: \begin{definition}

417: Fix $\delta\in\set{N}$. A set of strings $\set{S}_\delta\subseteq \set{X}$

418: is called $\delta$-simple iff for any two strings $\alpha,\gamma \in \set{S}_\delta$

419: we have

420: \begin{equation}

421:         |\alpha|<\delta\,,\ \ \ |\gamma|<\delta\,,

422:         {\rm\ \ \  and\ \ \ }|\langle\alpha,\gamma\rangle|<\delta\,,

423: \end{equation}

424: where $|\cdot|$ denotes the string length.

425: \end{definition}

426:

427: Following Chaitin \cite{Chaitin75}, consider a list of infinitely many

428: requirements ${\langle r_k,l_k(d)\rangle}$ $(k=0,1,2,\dots)$ for the

429: construction of a computer. Each requirement

430: ${\langle r_k,l_k(d)\rangle}$ requests that a program

431: of length $l_k(d)$ be assigned to the result $r_k$ if the computer is given

432: data $d$.  The requirements are said to satisfy the Kraft inequality

433: if $\sum_{k}2^{-l_k(d)}\leq 1$: for such requirements there exists an

434: instantaneous code characterized by the set of string lengths $\{l_k(d)\}$.

435: A computer $C$ is said to satisfy the requirements if there are precisely as

436: many programs $p$ of length $l(d)$ such that $C(p,d)=r$ as there are pairs

437: ${\langle r,l(d)\rangle}$ in the list of requirements.
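
\begin{example}
For fixed data $d$, consider the (hypothetical) list of lengths
$l_k(d)=k+1$, $k=0,1,2,\dots$ These satisfy the Kraft inequality with
equality,
\[
\sum_{k=0}^{\infty}2^{-(k+1)}=1\,,
\]
and a corresponding instantaneous code is $\{0,10,110,1110,\dots\}$,
in which the $k$-th program consists of $k$ ones followed by a zero.
In contrast, any list containing the lengths $1,1,2$ already gives
$2^{-1}+2^{-1}+2^{-2}=5/4>1$, so no instantaneous code with these
program lengths exists.
\end{example}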

438:

439: Fix a universal computer $U$ which can be constructed from an effectively

440: given list of requirements (consult~\cite{Chaitin75}, Theorem 3.2).

441: Consider the set of all programs $\{p_k\}$ for $U$

442: such that the output of computation $U(p_k,d)$ is defined.

443: Since $B$ is a bijection, we can write $U(p_k,d)={\langle r_k,s_k\rangle}$,

444: where $r_k$ and $s_k$ are strings from $\set{X}$.

445: Moreover, because $U$ is a universal computer, any pair of strings

446: ${\langle \alpha,\gamma\rangle}$ can be generated this way.

447: In what follows we consider only those $p_k$ for which $s_k\neq\Lambda$.

448: For every fixed $s$ from

449: the set $\{s_k\}$ we construct a list of requirements

450: \begin{equation}                                                                         \label{requirements}

451:   {\langle r_k,|p_k|-K_U(s|d)+\kappa^s_d\rangle }\,,\ k=1,2,\dots

452: \end{equation}

453: where $|p_k|$ is the length of the program $p_k$, and $\kappa^s_d$ is some

454: constant.

455: It was shown~(\cite{Chaitin75}, Theorem 3.8)

456: that the constant $\kappa^s_d$ can be chosen large enough

457: such that these requirements satisfy the Kraft inequality.

458: Fix any $\delta\in\set{N}$, and consider

459: a sublist of requirements~(\ref{requirements}):

460: \begin{equation}

461: {\langle r_k,|p_k|-K_U(s|d)+\kappa^s_d\rangle}

462: \ \ \ r_k,\, d\in \set{S}_\delta\,,

463: \end{equation}

464: where $\set{S}_\delta$ is the set of $\delta$-simple strings.

465: Since $\set{S}_\delta$ is finite, we can find

466: $\kappa\equiv\max\{\kappa^s_d|\,s,d\in \set{S}_\delta\}$,

467: then choose $\kappa^s_d=\kappa$ for all $s,d\in\set{S}_\delta$, and construct a new list

468: of requirements

469: \begin{equation}                                                                      \label{requirements2}

470: {\langle r_k,|p_k|-K_U(s|d)+\kappa\rangle}

471: \ \ \ r_k,\, d\in \set{S}_\delta\,.

472: \end{equation}

473: For any fixed $s\in\set{S}_\delta$ these requirements satisfy

474: the Kraft inequality

475: by construction. Furthermore, since  $\set{S}_\delta$ is finite and

476: $B$ is recursive these requirements can be effectively given.

477: This means that for any $s\in\set{S}_\delta$ there

478: is a computer $W_s$ that satisfies these requirements:

479: consult (\cite{Chaitin75}, Theorem 3.2) for further details.

480:

481: For each value of $s\in\set{S}_\delta\setminus\{\Lambda\}$ we use

482: (\ref{requirements2}) to construct

483: one $W_s$. We define $W_\Lambda=U$, and form the set

484: $\set{W}_U\equiv\{W_s |\,s\in \set{S}_\delta\}$.

485: This set contains the original computer $U$ as a somewhat special

486: element. Having the computer $U$ at our disposal, it would take at least

487: $K_U(s|d)$ bits to specify any other $W_s$ from the set $\set{W}_U$

488: given data $d$. We can now see that requirements~(\ref{requirements2})

489: are designed

490: in such a way that more complex computers, i.e. larger $K_U(s|d)$,

491: will have shorter programs,

492: $l_k(d)= |p_k|-K_U(s|d)+\kappa$.

493: % so that the sum of the program length

494: %$l_k(d)$ and $K_U(s|d)$ is the same for all $W_s$.

495: This is exactly the property that we wanted to use as a

496: natural restriction that defines a realistic class of computers.
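
\begin{example}
As a purely hypothetical numerical illustration, let $\kappa=5$ and suppose
that $U$ has two programs $p_k$ and $p_{k'}$ of the same length $20$ with
$U(p_k,d)=\langle r,s\rangle$ and $U(p_{k'},d)=\langle r,s'\rangle$.
If $K_U(s|d)=3$ and $K_U(s'|d)=8$, then
\[
W_s:\ \ l_k(d)=20-3+5=22\,,\ \ \ \ \ \
W_{s'}:\ \ l_{k'}(d)=20-8+5=17\,.
\]
The more complex computer $W_{s'}$ thus produces $r$ from a shorter program,
and in both cases the program length plus the complexity of the computer
equals $20+\kappa=25$.
\end{example}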

497:

498: In what follows we restrict our attention

499: to the set $\set{W}_U$. We define a computer $W$

500: which is universal for the set $\set{W}_U$, i.e. which is designed

501: to simulate any computer $W_s\in \set{W}_U$:

502: \begin{equation}

503:   W(p,{\langle s,d\rangle})\equiv W_s(p,d)\,.

504: \end{equation}

505:

506: \begin{theorem}

507: For any $\alpha,d \in \set{S}_\delta$, and for any

508: $\gamma \in \set{S}_\delta\setminus \{\Lambda\}$, we have

509: \begin{equation}                                                                                  \label{KwKu}

510: K_{W}(\alpha|\gamma,d)

511:    =K_W(\alpha,\gamma|\langle\Lambda, d\rangle)

512:       -K_W(\gamma|\langle\Lambda, d\rangle) + \kappa\,.

513: \end{equation}

514: \end{theorem}

515:

516: {\bf Proof}\\

517: Consider the program

518: $\tilde{p}_k$ which causes $W_s\in\set{W}_U$

519: to produce the result $r_k\in\set{S}_\delta$

520: given data~$d$

521: \begin{equation}                                                                                          \label{r}

522:  W_s(\tilde{p}_k,d)=r_k\,.

523: \end{equation}

524: By definition of $W_s$, the length of $\tilde{p}_k$ satisfies the

525: requirement

526: \begin{equation}                                                                                     \label{pAbs}

527: \forall s\in\set{S}_\delta\setminus \{\Lambda\} {\rm\ \, and\ }

528:  \forall d\in\set{S}_\delta: \ \ \

529: |\tilde{p}_k|=|p_k|-K_U(s|d)+\kappa\,,

530: \end{equation}

531: where $p_k$ is the program for $U$ such that

532: \begin{equation}                                                                                               \label{rs}

533: U(p_k,d)={\langle r_k,s_k\rangle }\,,\ \ s_k\neq\Lambda\,.

534: \end{equation}

535: We define the set $\set{K}\equiv\{ i |\, U(p_{i},d)={\langle r_k,s_k\rangle}\}$,

536: which can contain more than one element since some

537: of the pairs $\{{\langle r_k,s_k\rangle}\}$ can coincide.

538:  From the construction of

539: $W_s$ we note that requirements

540: (\ref{requirements2}) associate exactly one program $\tilde{p}_k$

541: with the corresponding program $p_k$. In other words there is

542: a one-to-one correspondence between programs

543: $\tilde{p}_k$ and $p_k$ (which is given explicitly by the index $k$).

544: This means that the set

545: $\set{K}$ coincides with the set $\tilde{\set{K}}

546: \equiv\{ i |\, W_s(\tilde{p}_{i},d)=r_k\} $.

547: Since $U$, $d$ and $s$ are fixed, and using the identity

548: $\set{K}=\tilde{\set{K}}$, we have from Eq.~(\ref{pAbs})

549: \begin{equation}                                                                    \label{ShortestLengths}

550: \min_{k\in\tilde{\set{K}}}|\tilde{p}_k|

551:  =\min_{k\in{\set{K}}}|p_k|-K_U(s|d)+\kappa\,,

552: \ \ s\in \set{S}_\delta\setminus \{\Lambda\}\,.

553: \end{equation}

554: By definition of $W$ we have

555: \begin{equation}

556: W(\tilde{p}_k,{\langle s,d \rangle })

557: \equiv  W_s(\tilde{p}_k,d)=r_k\,,\ s\neq\Lambda\,.

558: \end{equation}

559: This means, by definition of Kolmogorov complexity, that

560: $K_W(r_k|s,d)=\min_{i\in\tilde{\set{K}}}|\tilde{p}_i|$, $s\neq\Lambda$.

561: Similarly from Eq.~(\ref{rs}),

562: we have $K_U(r_k,s_k|d)=\min_{i\in{\set{K}}}|p_i|$

563: and therefore Eq.~(\ref{ShortestLengths}) becomes

564: \begin{equation} \label{AlmostThere}

565: K_W(r_k|s,d)=K_U(r_k,s_k|d)-K_U(s|d)+\kappa\,.

566: \end{equation}

567: Because $W(p,\langle\Lambda,d\rangle)=U(p,d)$ we have, for instance,

568: $K_U(s|d)=K_W(s|\langle\Lambda,d\rangle)$. Using this observation

569: to transform both terms on the right-hand side of Eq.~(\ref{AlmostThere}), and

570: choosing $s=s_k$ we have Eq.~(\ref{KwKu}) as required. $\Box$
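
\begin{remark}
Moving the term $K_W(\gamma|\langle\Lambda,d\rangle)$ to the left-hand side,
Eq.~(\ref{KwKu}) reads
\[
K_W(\alpha,\gamma|\langle\Lambda,d\rangle)
  =K_W(\alpha|\gamma,d)
  +K_W(\gamma|\langle\Lambda,d\rangle)-\kappa\,,
\]
which is Eq.~(\ref{Kw}) with $\beta=d$ and ${\mbox{\rm const}}=-\kappa$.
\end{remark}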

571:

572: Note that, since $U$ is an arbitrary universal prefix computer,

573: the above analysis provides a grouping of all

574: possible reference computers into naturally restricted classes.

575:

576:

577: \section{Acknowledgments}

578:

579: It is my pleasure to acknowledge many helpful suggestions by

580: Jens G. Jensen, A.S. Johnson and Yuri Kalnishkan.

581:

582: \thebibliography{References}

583:

584: \bibitem{Bennett_1982} C.H. Bennett, Thermodynamics of computation --

585:                                                a review,

586:                                                Int.\ J.\ Theor.\ Phys.\ {\bf 21}

587:                                                (1982)

588:                                                905-940.

589:

590: \bibitem{Bennett_1987} C.H. Bennett, Demons, engines and the second

591:                                                law, Sci. American (Nov. 1987) 108-116.

592:

593:

594: \bibitem{Brudno_1978} A.A. Brudno, The complexity of the trajectories

595:                                              of a dynamical system, Russ.\ Math.\ Surv.\ {\bf 33}

596:                                               (1978) 197-198.

597:

598: \bibitem{Brudno_1982}  A.A. Brudno, Entropy and the complexity of the

599:                                                trajectories of a dynamical system, Trans. Moscow

600:                                                Math.\ Soc.\ (1983) 127-151;

601:                                                and references therein.

602:

603:

604: \bibitem{Chaitin75} G.J. Chaitin, A theory of program size formally identical

605:                                        to information theory, J. ACM {\bf 22} (1975) 329-340.

606:

607:

608: \bibitem{Dzhunushaliev} V.D. Dzhunushaliev, Kolmogorov's algorithmic

609:                                complexity and its probability interpretation in quantum

610:                                gravity, Class.\ Quantum Grav. {\bf 15} (1998) 603-612.

611:

612:

613: \bibitem{Ford_1983} J. Ford, How random is a coin toss?, Physics Today

614:                                          {\bf 36} (1983) 40-47.

615:

616:

617: \bibitem{LiVitanyi} M. Li and P. Vit\'anyi, An introduction

618:                     to Kolmogorov Complexity and Its Applications

619:                     (Springer-Verlag New York, ed.\ 2, 1997) and references therein.

620:

621: \bibitem{Rissanen_1978}  J. Rissanen, Modeling by shortest

622:                                                  data description,

623:                                                  Automatica {\bf 14} (1978), 465-471.

624:

625: \bibitem{Rissanen_1997} J. Rissanen, Stochastic complexity in learning,

626:                                                 J.\ Comput.\ Sys.\ Sci.\ {\bf 55} (1997) 89-95.

627:

628:

629: \bibitem{SchackCaves_1992} R. Schack and C.M. Caves, Information

630:                                                         and entropy in the baker's map,

631:                                                         Phys.\ Rev.\ Lett.\ {\bf 69} (1992) 3413-3416;

632:                                                         and references therein.

633:

634: \bibitem{SoklakovSchack} A.N. Soklakov and R. Schack,

635:                   Preparation information and optimal decompositions

636:                   for mixed quantum states, J.\ Mod.\ Optics {\bf 47}

637:                   (2000) 2265-2276.

638:

639: \bibitem{Soklakov00} A.N. Soklakov, Occam's razor as a formal basis for

640:                                           a physical theory, Found.\ Phys.\ Lett.\ to appear;

641:                                           also available as arXiv:math-ph/0009007.

642:

643:

644: \bibitem{Zurek_1989} W.H. Zurek, Algorithmic randomness and physical

645:                                            entropy, Phys.\ Rev.\ A {\bf 40} (1989) 4731-4751;

646:                                            and references therein.

647:

648: \end{document}

649: