\documentclass[12pt]{article}

\newif\ifnew
\newtrue
\newcommand{\old}[1]{\ifnew\else #1\fi}
\newcommand{\new}[1]{\ifnew #1\fi}

\begin{document}

\title{\new{Using Information Theory Approach to Randomness Testing}
 \footnote{
\new{The authors were supported by INTAS
     grant no. 00-738 and Russian Foundation for Basic Research under Grant no. 03-01-00495.}
 }}
\author{B. Ya. Ryabko and V.A. Monarev
}
\date{}
\maketitle

\begin{abstract}
We address the problem of detecting deviations of a binary sequence
from randomness, which is very important for random number generators (RNG)
and pseudorandom number generators (PRNG). Namely, we consider a
null hypothesis $H_0$ that a given bit sequence is generated by
the Bernoulli source with equal probabilities of 0 and 1, and the
alternative hypothesis $H_1$ that the sequence is generated by a
stationary and ergodic source which differs from the source under
$H_0$. We show that data compression methods can be used as a
basis for such testing and describe two new tests for randomness,
which are based on ideas of universal coding. Known statistical
tests and the suggested ones are applied to testing PRNGs. These
experiments show that the power of the new tests is greater than
that of many known algorithms.

\end{abstract}

\textbf{Keywords:} {\it Hypothesis testing, Randomness testing,
Random number testing, Universal code, Information Theory, Random
number generator, Shannon entropy.}
%\end{keywords}
\newpage

\section{Introduction}

The randomness testing of random number and pseudorandom number
generators is used for many purposes, including cryptographic,
modeling and simulation applications; see, for example, Knuth,
1981; L'Ecuyer, 1994; Maurer, 1992; Menezes A. and others, 1996.
For such applications a required bit sequence should be truly
random, i.e., by definition, such a sequence could be interpreted
as the result of the flips of a ``fair'' coin with sides that are
labeled ``0'' and ``1'' (for short, it is called a random sequence;
see Rukhin and others, 2001). More formally, we will consider the
main hypothesis $H_0$ that a bit sequence is generated by the
Bernoulli source with equal probabilities of 0's and 1's.
Associated with this null hypothesis is the alternative
hypothesis $H_1$ that the sequence is generated by a stationary
and ergodic source which generates letters from $\{0,1\}$ and
differs from the source under $H_0$.

In this paper we consider some tests which are based on
results and ideas of Information Theory and, in particular,
source coding theory. First, we show that a universal code can be
used for randomness testing. (Recall that, by definition, a
universal code can compress a sequence asymptotically down to the
Shannon entropy per letter when the sequence is generated by a
stationary and ergodic source.) If we take into account that the
Shannon per-bit entropy is maximal (1 bit) if $H_0$ is true and is
less than 1 if $H_1$ is true (Billingsley, 1965; Gallager, 1968),
we see that it is natural to use this property and universal codes
for randomness testing because, in principle, such a test can
detect any deviation from randomness that can be described
in the framework of the stationary and ergodic source model. Loosely
speaking, the test rejects $H_0$ if a binary sequence can be
compressed by the universal code (or data compression
method) under consideration.

It should be noted that the idea of using compressibility as a
measure of randomness has a long history in mathematics. The point
is that, on the one hand, the problem of randomness testing is
quite important in practice, but, on the other hand, this problem
is closely connected with such deep theoretical issues as the
definition of randomness, the logical basis of probability
theory, randomness and complexity, etc.; see Kolmogorov, 1965; Li
and Vitanyi, 1997; Knuth, 1981; Maurer, 1992. Thus, Kolmogorov
suggested defining the randomness of a sequence, informally, via
the length of the shortest program which can create the sequence
(if one of the universal Turing machines is used as a computer).
So, loosely speaking, the randomness (or Kolmogorov complexity) of
a finite sequence is equal to the length of its shortest description. It is
known that the Kolmogorov complexity is not computable and,
therefore, cannot be used for randomness testing. On the other
hand, each lossless data compression code can be considered as a
method for upper bounding the Kolmogorov complexity. Indeed, if
$x$ is a binary word, $\phi$ is a data compression code and
$\phi(x)$ is the codeword of $x$, then the length of the codeword
$|\phi(x)|$ is an upper bound for the Kolmogorov complexity of
the word $x$ (up to an additive constant). So, again we see that the codeword length of a
lossless data compression method can be used for randomness
testing.

In this paper we suggest tests for randomness which are based on
results and ideas of source coding theory.

Firstly, we show how to build a test based on any data
compression method and give some examples of the application of such
tests to PRNG testing. It should be noted that data compression
methods have already been considered as a basis for randomness testing in the
literature. For example, Maurer's Universal Statistical Test, the
Lempel-Ziv Compression Test and the Approximate Entropy Test are
connected with universal codes and are quite popular in practice;
see, for example, Rukhin and others, 2001. In contrast to known
methods, the suggested approach makes it possible to construct a test
for randomness based on any lossless data compression method,
even if the distribution law of the codeword lengths is not known.

Secondly, we describe two new tests conceptually connected with
universal codes. When both tests are applied, a tested sequence
$x_1 x_2 \ldots x_n$ is divided into subwords $x_1 x_2 \ldots x_s,$
$\:x_{s+1} x_{s+2} \ldots x_{2s},\: \ldots ,\,$ $s\geq 1,$ and the
hypothesis $H^*_0$ that the subwords obey the uniform distribution
(i.e., each subword is generated with probability $2^{-s}$) is
tested against $H^*_1 =\neg H^*_0$. The key idea of the new tests
is as follows. All subwords from the set $\{0,1\}^s$ are ordered
and this order changes after processing each subword $\:x_{j s+1}
x_{j s+2} \ldots x_{(j+1)s}, \, j= 0,1, \ldots $ in such a way that,
loosely speaking, the more frequent subwords have small ordinals.
When the new tests are applied, the frequencies of different
ordinals are estimated (instead of the frequencies of the subwords, as
in, say, the chi-square test).

The natural question is how to choose the block length $s$ in such
schemes. We show that, informally speaking, the block length $s$
should be taken quite large due to the existence of so-called
{\it two-faced processes}. More precisely, it is shown that for
each integer $s^*$ there exists a process $\xi$ such that for each
binary word $u$ the process $\xi$ creates $u$ with probability
$2^{-|u|}$ if the length of $u$ ($|u|$) is less than or equal
to $s^*$, but, on the other hand, the probability distribution
$\xi(v)$ is very far from uniform if the length of the words $v$
is greater than $s^*.$ (So, if we use a test with block length
$s \leq s^*,$ the sequences generated by $\xi$ will look
random, in spite of the fact that $\xi$ is far from being random.)

The outline of the paper is as follows. In Section 2 the general
method for constructing randomness testing algorithms based on
lossless data compressors is described. Two new tests for
randomness, which are based on constructions of universal coding,
as well as the two-faced processes, are described in Section
3. In Section 4 the new tests are experimentally compared with
methods from ``A statistical test suite for random and
pseudorandom number generators for cryptographic applications'',
which was recently suggested by Rukhin and others, 2001. It turns
out that the new tests are more powerful than known ones.

\section{Data compression methods as a basis for randomness testing}

\textbf{2.1. Randomness testing based on data compression}

Let $A$ be a finite alphabet and $A^n$ be the set of all words of
length $n$ over $A$, where $n$ is an integer. By definition,
$A^* =\bigcup_{n=1}^\infty A^n $ and $A^\infty$ is the set of all
infinite words $x_1x_2 \ldots $ over the alphabet $A$. A data
compression method (or code) $\varphi$ is defined as a set of
mappings $\varphi_n $ such that $\varphi_n : A^n \rightarrow \{
0,1 \}^*,\, n= 1,2, \ldots\, $ and for each pair of different
words $x,y \in A^n$, $\varphi_n(x) \neq \varphi_n(y) .$
Informally, this means that the code $\varphi$ can be applied to
compress each message of any length $n > 0$ over the alphabet $A$,
and the message can be decoded if its code is known.

Now we can describe a statistical test which can be constructed
based on any code $\varphi$. Let $n$ be an integer and
$\hat{H}_0$ be the hypothesis that the words from the set $ A^n $
obey the uniform distribution, i.e., $p(u)= |A|^{-n}\, $ for each
$ \, u \in A^n .$ (Here and below $|x|$ is the length of $x$ if $x$
is a word, and the number of elements if $x$ is a set.) Let the
required level of significance (or Type I error) be $\alpha ,\,
\alpha \in (0,1).$ The main idea of the suggested test is
quite natural: well-compressed words should be considered
non-random, and $\hat{H}_0$ should be rejected. More exactly, we
define the critical value of the suggested test by
\begin{equation}\label{cr}
t_\alpha = n \log |A| - \log (1/ \alpha) - 1\,.
\end{equation}
(Here and below $\log x = \log_2 x$.)

Let $u$ be a word from $A^n$. By definition, the hypothesis
$\hat{H}_0$ is accepted if $ |\varphi_n (u) | > t_\alpha $ and
rejected if $ |\varphi_n (u) | \leq t_\alpha .$ We denote this
test by $\Gamma_{\alpha,\,\varphi}^{(n)}.$

\textbf{Theorem 1.} {\it For each integer $n$ and each code
$\varphi$, the Type I error of the described test
$\Gamma_{\alpha,\,\varphi}^{(n)}$ is not larger than $\alpha .$}

\emph{Proof} is given in the Appendix.
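
The test $\Gamma_{\alpha,\,\varphi}^{(n)}$ only needs the codeword length $|\varphi_n(u)|$ of the observed word, so any lossless compressor can play the role of $\varphi$. The following Python sketch is our own illustration, not the authors' implementation. It assumes that $\varphi$ is the general-purpose \texttt{zlib} compressor, that $A$ is the set of 256 byte values (so $n \log|A| = 8n$ bits for $n$ bytes), and that $|\varphi_n(u)|$ is eight times the length of the compressed output in bytes. Since \texttt{zlib} output can be decompressed, $\varphi_n$ is injective, which is all that Theorem~1 requires. The function name \texttt{compression\_test} is hypothetical.

\begin{verbatim}
```python
import math
import random
import zlib


def compression_test(data: bytes, alpha: float) -> bool:
    """Sketch of Gamma_{alpha,phi}^{(n)} with phi = zlib (our choice, for illustration).

    The alphabet A is the set of 256 byte values, so for n = len(data) we have
    n log|A| = 8 * len(data) bits.  Returns True if H0 is rejected.
    """
    n_log_a = 8 * len(data)                           # n log|A|
    t_alpha = n_log_a - math.log2(1 / alpha) - 1      # critical value (cr)
    codeword_bits = 8 * len(zlib.compress(data, 9))   # |phi_n(u)| in bits
    return codeword_bits <= t_alpha                   # well compressed => reject H0


if __name__ == "__main__":
    periodic = bytes(range(256)) * 40                  # obviously non-random input
    noise = random.Random(2005).randbytes(10_000)      # stand-in for data under H0
    print("periodic: reject H0 =", compression_test(periodic, 0.01))
    print("noise:    reject H0 =", compression_test(noise, 0.01))
```
\end{verbatim}

A well-compressible input such as a periodic byte string leads to the rejection of $\hat{H}_0$, whereas incompressible data is accepted, because \texttt{zlib} then adds a few bytes of overhead instead of shortening it.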

\textbf{Comment 1.} The described test can be modified in such a
way that the Type I error is equal to $\alpha.$ For this
purpose we define the set $A_\gamma$ by $$ A_\gamma = \{x: x \in
A^n\: \: \& \:\;|\varphi_n(x)| = \gamma \} $$ and an integer $g$
for which the two following inequalities are valid:
\begin{equation}\label{g}  \sum_{j=0}^g
|A_j|\: \leq\,  \alpha |A|^n\, <\, \sum_{j=0}^{g+1} |A_j| \,.
\end{equation} Now the modified test can be described as
follows.

For $x \in A^n$: if $|\varphi_n(x)| \leq g$, then $\hat{H}_0$
is rejected; if $|\varphi_n(x)| > g+1$, then $\hat{H}_0$ is
accepted; and if $|\varphi_n(x)| = g+1$, the hypothesis
$\hat{H}_0$ is accepted with probability $$ \Bigl(\sum_{j=0}^{g+1}
|A_j|\, - \, \alpha |A|^n\Bigr) / |A_{g+1}| $$ and rejected with
probability $$ 1\:-\, \Bigl(\sum_{j=0}^{g+1} |A_j|\, - \, \alpha
|A|^n\Bigr) / |A_{g+1}| \,.$$ (Here we use a randomized criterion;
for the definition see, for example, Kendall and Stuart, 1961, part
22.11.) We denote this test by
$\Upsilon_{\alpha,\,\varphi}^{(n)}.$

\textbf{Claim 1.} {\it For each integer $n$ and each code
$\varphi$, the Type I error of the described test
$\Upsilon_{\alpha,\,\varphi}^{(n)}$ is equal to $\alpha .$}

\emph{Proof} is given in the Appendix.
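
Unlike $\Gamma_{\alpha,\,\varphi}^{(n)}$, the test $\Upsilon_{\alpha,\,\varphi}^{(n)}$ needs the whole distribution of codeword lengths, i.e., the numbers $|A_j|$. The following Python sketch assumes that this distribution is available as the list of codeword lengths $|\varphi_n(x)|$ of all $x \in A^n$ (feasible only for small $n$ or for codes whose length distribution is known). It finds the integer $g$ defined by the two inequalities of Comment 1 and the probability of rejection for words with $|\varphi_n(x)| = g+1$. The function names and the toy code in the usage part are hypothetical.

\begin{verbatim}
```python
from collections import Counter


def randomized_threshold(lengths, alpha):
    """Parameters of the randomized test Upsilon_{alpha,phi}^{(n)} of Comment 1.

    `lengths` lists |phi_n(x)| for every word x in A^n, so len(lengths) = |A|^n.
    Returns (g, p_reject): H0 is rejected if |phi_n(x)| <= g, accepted if
    |phi_n(x)| > g + 1, and rejected with probability p_reject if |phi_n(x)| = g + 1.
    """
    budget = alpha * len(lengths)          # alpha |A|^n
    sizes = Counter(lengths)               # sizes[j] = |A_j|
    g, cumulative = -1, 0                  # cumulative = sum_{j <= g} |A_j|
    # Find g with sum_{j=0}^{g} |A_j| <= alpha |A|^n < sum_{j=0}^{g+1} |A_j|
    while cumulative + sizes.get(g + 1, 0) <= budget:
        g += 1
        cumulative += sizes.get(g, 0)
    size_next = sizes[g + 1]               # |A_{g+1}| > 0 because the sum exceeds the budget
    p_accept = (cumulative + size_next - budget) / size_next
    return g, 1.0 - p_accept


def type_one_error(lengths, alpha):
    """Exact probability that the randomized test rejects H0 when x is uniform on A^n."""
    g, p_reject = randomized_threshold(lengths, alpha)
    rejected = sum(1 for length in lengths if length <= g)
    rejected += p_reject * lengths.count(g + 1)
    return rejected / len(lengths)


if __name__ == "__main__":
    # hypothetical non-singular code on {0,1}^3 with codewords 0,1,00,01,10,11,000,001
    toy_lengths = [1, 1, 2, 2, 2, 2, 3, 3]
    print(randomized_threshold(toy_lengths, 0.3))
    print(type_one_error(toy_lengths, 0.3))
```
\end{verbatim}

For this toy code and $\alpha = 0.3$ we have $\alpha |A|^n = 2.4$, so $g = 1$, words with codeword length 2 are rejected with probability $0.1$, and the Type I error is $(2 + 4 \cdot 0.1)/8 = 0.3$, as Claim 1 states.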

We can see that this criterion has the level of significance (or
Type I error) exactly $\alpha,$ whereas the first criterion,
which is based on the critical value (\ref{cr}), has a level of
significance that could be less than $\alpha .$ In spite of this
drawback, the first criterion may be more useful due to its
simplicity. Moreover, such an approach makes it possible to use
a data compression method $\psi$ for testing even in the case where
the distribution of the lengths $|\psi_n(x)|, x \in A^n,$ is not
known.

\textbf{Comment 2.} We have considered codes for which
different words of the same length have different codewords. (In
Information Theory such codes are sometimes called non-singular.)
Quite often a stronger restriction is required in Information
Theory. Namely, it is required that each sequence
$\varphi_n(x_1)\varphi_n(x_2) \ldots\varphi_n(x_r), r \geq 1,$ of
encoded words from the set $A^n, n\geq 1,$ can be uniquely decoded
into $x_1x_2 \ldots x_r$. Such codes are called uniquely decodable.
For example, let $A=\{a,b\}$; the code $\psi_1(a) = 0, \psi_1(b) =
00$ is obviously non-singular, but it is not uniquely decodable.
(Indeed, the word $000$ can be decoded as both $ab$ and $ba.$) It
is well known in Information Theory that if a code $\varphi$ is
uniquely decodable, then the following Kraft inequality is valid:
\begin{equation}\label{KRAFT}
\sum_{u \in A^n}\: 2^{- |\varphi_n (u) |} \leq 1\:,
\end{equation}
see, for example, Gallager, 1968.
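
For instance, the code $\psi_1$ from the example above satisfies (\ref{KRAFT}),
$$ 2^{-|\psi_1(a)|} + 2^{-|\psi_1(b)|} = 2^{-1} + 2^{-2} = \frac{3}{4} \leq 1, $$
although it is not uniquely decodable. Thus (\ref{KRAFT}) is a necessary condition for unique decodability, but not a sufficient one.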

If it is known that the code is uniquely decodable, the suggested
critical value (\ref{cr}) can be changed. Let us define
\begin{equation}\label{cr2}
\hat{t}_\alpha = n \log |A| - \log (1/ \alpha) \,.
\end{equation}

Let, as before, $u$ be a word from $A^n$. By definition, the
hypothesis $\hat{H}_0$ is accepted if $ |\varphi_n (u) | >
\hat{t}_\alpha $ and rejected if $ |\varphi_n (u) | \leq
\hat{t}_\alpha .$ We denote this test by
$\hat{\Gamma}_{\alpha,\varphi}^{(n)}.$

\textbf{Claim 2.} {\it For each integer $n$ and each uniquely
decodable code $\varphi$, the Type I error of the described test
$\hat{\Gamma}_{\alpha,\varphi}^{(n)}$ is not larger than
$\alpha.$}

\emph{Proof} is given in the Appendix.

So, we can see from (\ref{cr}) and (\ref{cr2}) that the critical
value is larger if the code is uniquely decodable. On the other
hand, the difference is only one bit, and (\ref{cr}) can be used
without a large loss of test power even in the case of
uniquely decodable codes.

It should not be a surprise that the level of significance (or
Type I error) does not depend on the alternative hypothesis $H_1,$
but, of course, the power of a test (and the Type II error) will
be determined by $H_1.$

Examples of testing by real data compression methods will be
given in Section 4.

\textbf{2.2. Randomness testing based on universal codes.}

We will consider the main
hypothesis $H_0$ that the letters of a given sequence $x_1x_2
\ldots x_t, \, x_i \in A,\, $ are independent and identically
distributed (i.i.d.) with equal probabilities of all $a \in A $
and the alternative hypothesis $H_1$ that the sequence is
generated by a stationary and ergodic source which generates
letters from $A$ and differs from the source under $H_0$. (If $A=
\{0,1\}$, this i.i.d. source coincides with the Bernoulli source with
equal probabilities of 0 and 1.) The definitions
of a stationary and ergodic source and of the Shannon entropy of
such sources can be found in Billingsley, 1965, and Gallager,
1968.

We will consider statistical tests which are based on universal
coding and universal prediction. First we define a universal code.

By definition, $\varphi$ is a universal code if for each
stationary and ergodic source (or process) $\pi$ the following
equality is valid with probability 1 (according to the measure
$\pi$):
\begin{equation}\label{un}
 \lim_{n \rightarrow \infty}  |\varphi_n(x_1 \ldots x_n)| /
n = h(\pi)\,,
\end{equation}
where $h(\pi)$ is the Shannon entropy. (Such codes exist; see
Ryabko, 1984.) It is well known in Information Theory that
$h(\pi)= \log |A|$ if $H_0$ is true, and $h(\pi)< \log |A|$ if
$H_1$ is true; see, for example, Billingsley, 1965; Gallager, 1968.
From this property and (\ref{un}) we easily obtain the
following theorem.

\textbf{Theorem 2.} {\it Let $\varphi$ be a universal code,
$\alpha \in (0,1)$ be a level of significance and a sequence
$x_1x_2 \ldots x_n, \, n \geq 1, \, $ be generated by a stationary
ergodic source $\pi$. If the test
$\Gamma_{\alpha,\,\varphi}^{(n)}$ described above is applied for testing $H_0$
(against $H_1$), then, with probability 1, the Type I error is not
larger than $\alpha$, and the Type II error goes to 0 when
$n\rightarrow \infty$.}

So, we can see that each good universal code can be used as a
basis for randomness testing. But the converse is not
true. Let, for example, $A = \{0,1\}$ and let there be a code whose
codeword length per letter is
asymptotically equal to $0.5+ h(\pi) / 2$ for each source
$\pi$ (with probability 1, where, as before, $h(\pi)$ is the
Shannon entropy). This code is not universal, because its codeword
length per letter does not tend to the entropy, but, obviously, such a code
could be used as a basis for a test of randomness (its codeword length
per letter tends to 1 only if $h(\pi) = 1$). So, informally
speaking, the set of tests is larger than the set of universal
codes.

Note that closely related problems were considered by Bailey (1974), who
obtained many important results in this field.

\section{Two new tests for randomness and two-faced processes}

Firstly, we suggest two tests which are based on ideas of
universal coding, but they are described in such a way that they can be
understood without any knowledge of Information Theory.

\textbf{3.1. The ``book stack'' test}

Let, as before, there be given an alphabet $A= \{a_1, \ldots , a_S
\},$ a source which generates letters from $A,$ and the two following
hypotheses: the source is i.i.d. and $p(a_1)= \cdots = p(a_S) =
1/S\:$ ($H_0$), and $H_1 = \neg H_0.$ We should test the hypotheses
based on a sample $x_1 x_2 \ldots x_n,\, n\geq 1\,,\,$ generated
by the source. When the ``book stack'' test is applied, all letters
from $A$ are ordered from 1 to $S$ and this order is changed after
observing each letter $x_t$ according to the formula
\begin{equation}\label{nu}
\nu^{t+1}(a)=\cases{1,&if $x_t = a\,$;\cr
           \nu^t(a)+1,&if $\nu^t(a) < \nu^t(x_t)$;\cr
           \nu^t(a), &if $ \nu^t(a) > \nu^t(x_t)$\, ,}
\end{equation}
where $\nu^t$ is the order after observing $x_1 x_2 \ldots x_{t-1}$,
$t = 1, \ldots, n$, so that $\nu^1$ is the initial order, which is
defined arbitrarily. (For example, we can define $\nu^1(a_i) = i$,
$i = 1, \ldots, S$.) Let us explain
(\ref{nu}) informally. Suppose that the letters of $A$ make a
stack, like a stack of books, and $\nu^1(a)$ is the position of $a$
in the stack. Let the first letter $x_1$ of the word $x_1 x_2
\ldots x_n$ be $a$. If it takes the $i_1$th position in the stack
($\nu^1(a)= i_1$), then take $a$ out of the stack and put it on
the top. (It means that the order is changed according to
(\ref{nu}).) Repeat the procedure with the second letter $x_2$
and the stack obtained, etc.

It may help to understand the main idea of the suggested method
if we take into account that, if $H_1$ is true, then frequent
letters from $A$ (like frequently used books) will have relatively
small numbers (will spend more time near the top of the stack).
On the other hand, if $H_0$ is true, the probability of finding each
letter $x_i$ at each position $j$ is equal to $1/S$.

Let us proceed with the description of the test. The set of all
indexes $\{1, \ldots, S \}$ is divided into $r$, $r \geq 2$,
subsets $A_1 = \{ 1,2,\ldots, k_1 \}, $ $ A_2 = \{ k_1+1,\ldots,
k_2 \}, \ldots , A_r = \{ k_{r-1}+1,\ldots, k_r \}$, where $k_r = S$.
Then, using $x_1 x_2 \ldots x_n$, we calculate how many $\nu^t(x_t),$
$t=1,\ldots, n,$ belong to a subset $A_k$, $k=1,\ldots, r$. We denote this
number by $n_k$ (or, more formally, $n_k = | \{ t : \nu^t(x_t) \in
A_k, t=1,\ldots, n \}|,  k=1,\ldots, r$). Obviously, if $H_0$ is
true, the probability of the event $ \nu^t(x_t) \in A_k$ is equal
to $ |A_k|/S.$ Then, using a ``common'' chi-square test, we test the
hypothesis $\hat{H}_0$: $P\{  \nu^t(x_t) \in A_k \}= |A_k|/S$, $k = 1, \ldots, r$,
based on the empirical frequencies $n_1,\ldots,n_r$, against
$\hat{H}_1= \neg \hat{H}_0.$ Let us recall that the value
\begin{equation}\label{x2}
 x^2=\sum_{i=1}^{r}\frac{(n_i - n (|A_i|/S ) )^2}{n
(|A_i|/S )} \end{equation} is calculated when the chi-square test
is applied; see, for example, Kendall and Stuart, 1961. It is known
that $x^2$ asymptotically follows the $\chi$-square distribution
with $(r-1)$ degrees of freedom ($\chi^2_{r-1}$) if $\hat{H}_0$ is
true. If the level of significance (or Type I error)
of the $\chi^2$
test is $\alpha, \alpha \in (0,1), $ the hypothesis $\hat{H}_0$ is
accepted when $x^2$ from (\ref{x2}) is less than the
\emph{$(1-\alpha)$-quantile} of the $\chi^2_{r-1}$ distribution;
see, for example, Kendall and Stuart, 1961.

We do not describe an exact rule for constructing the subsets
$\{A_1, A_2, \ldots, A_r \}$, but we recommend
performing some experiments to find the parameters which make
the sample size minimal (or, at least, acceptable). The point is
that there are many cryptographic and other applications where it
is possible to carry out some experiments to optimize the
parameter values and then to test the hypothesis based on
independent data. For example, in the case of testing a PRNG it is
possible to seek suitable parameters using a part of the generated
sequence and then to test the PRNG using a new part of the
sequence.

Let us consider a simple example. Let $A= \{a_1, \ldots , a_6 \}$,
$r=2$, $A_1= \{1, 2, 3\}$, $A_2= \{4, 5, 6\}$, and
$x_1 \ldots x_8 = a_3 a_6 a_3 a_3 a_6 a_1 a_6 a_1.$ If the stack
(listed from the top) is initially $a_1, a_2, a_3, a_4, a_5, a_6$,
then after $x_1$ it becomes $a_3, a_1, a_2, a_4, a_5, a_6$, after
$x_2$ it becomes $a_6, a_3, a_1, a_2, a_4, a_5$, etc., and
$n_1 = 7, n_2 = 1.$ We can see that
the letters $ a_3 $ and $a_6$ are quite frequent and the ``book
stack'' indicates this nonuniformity quite well. (Indeed, the
expected values of $n_1$ and $n_2$ under $H_0$ equal $4$, whereas the observed
values are 7 and 1, respectively.)
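
The book stack test can be summarised in a few lines of code. The following Python sketch is our illustration, not the authors' implementation: it uses naive $O(S)$ list operations for (\ref{nu}), represents the subsets $A_1, \ldots, A_r$ by their right end points $k_1 < \cdots < k_r = S$, and returns the counts $n_1, \ldots, n_r$ together with the statistic $x^2$ of (\ref{x2}). The function names are hypothetical. Applied to the example above, it reproduces $n_1 = 7$, $n_2 = 1$, which gives $x^2 = (7-4)^2/4 + (1-4)^2/4 = 4.5$.

\begin{verbatim}
```python
def book_stack_positions(seq, alphabet):
    """Return nu^t(x_t) for t = 1..n: the 1-based stack position of each
    letter just before it is moved to the top according to rule (nu)."""
    stack = list(alphabet)              # nu^1: initial order, top of the stack first
    positions = []
    for x in seq:
        i = stack.index(x)              # naive O(S) search for x_t
        positions.append(i + 1)
        stack.insert(0, stack.pop(i))   # move x_t to the top; letters above it move down
    return positions


def chi_square_counts(positions, bounds, S):
    """bounds = [k_1, ..., k_r] with k_r = S, i.e. A_j = {k_{j-1}+1, ..., k_j}.
    Returns the counts n_1, ..., n_r and the statistic x^2 of (x2)."""
    n = len(positions)
    counts = [0] * len(bounds)
    for p in positions:
        j = next(j for j, k in enumerate(bounds) if p <= k)  # subset containing p
        counts[j] += 1
    x2, lower = 0.0, 0
    for n_j, k in zip(counts, bounds):
        expected = n * (k - lower) / S  # n |A_j| / S
        x2 += (n_j - expected) ** 2 / expected
        lower = k
    return counts, x2


if __name__ == "__main__":
    alphabet = ["a1", "a2", "a3", "a4", "a5", "a6"]
    sample = ["a3", "a6", "a3", "a3", "a6", "a1", "a6", "a1"]
    pos = book_stack_positions(sample, alphabet)
    print(pos)                                # [3, 6, 2, 1, 2, 3, 2, 2]
    print(chi_square_counts(pos, [3, 6], 6))  # ([7, 1], 4.5)
```
\end{verbatim}

To complete the test, $x^2$ is compared with the $(1-\alpha)$-quantile of $\chi^2_{r-1}$, taken from tables or computed, e.g., with \texttt{scipy.stats.chi2.ppf(1 - alpha, r - 1)}; for a sample as short as in the example the asymptotic $\chi^2$ approximation is, of course, not reliable.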

Examples of practical applications of this test will be given in
Section 4, but here we make two notes. Firstly, we pay attention
to the complexity of this algorithm. The ``naive'' method of
transformation according to (\ref{nu}) could take a number of
operations proportional to $S,$ but there exist algorithms which
can perform all operations in (\ref{nu}) using $O( \log S )$
operations. Such algorithms can be based on AVL-trees; see, for
example, Aho, Hopcroft and Ullman, 1976.

The last comment concerns the name of the method. The ``book
stack'' structure is quite popular in Information Theory and
Computer Science. In Information Theory this structure was first
suggested as a basis of a universal code by Ryabko, 1980, and
was rediscovered by Bentley, Sleator, Tarjan and Wei in 1986, and
Elias in 1987 (see also a comment of Ryabko (1987) about the history
of this code). In the English-language literature this code is
frequently called the ``Move-to-Front'' (MTF) scheme, as it was
suggested by Bentley, Sleator, Tarjan and Wei. Now this data
structure is used in caching and many other algorithms in
Computer Science under the name ``Move-to-Front''. It is also worth
noting that the book stack was first considered by the Soviet
mathematician M.L. Cetlin as an example of a self-adaptive
system in the 1960's; see Rozanov, 1971.

\textbf{3.2. The order test}

This test is also based on changing the order $\nu^t(a)$ of the
alphabet letters, but the rule of the order change differs from
(\ref{nu}). To describe the rule, we first define
$\lambda^{t+1}(a)$ as the number of occurrences of $a$ in the word
$x_1\ldots x_{t-1}x_t$ (so that $\lambda^1(a) = 0$). At each moment
$t$ the alphabet letters are ordered according to $\nu^t$ in such a
way that, by definition, for each pair of letters $a$ and $b$,
$\nu^t(a) < \nu^t(b)$ if $\lambda^t(a) > \lambda^t(b)$, i.e., more
frequent letters have smaller ordinals (ties may be broken
arbitrarily). For example, if $A= \{a_1, a_2, a_3 \}$ and
$x_1 x_2 x_3 = a_3 a_2 a_3$, the possible orders (listing the indexes
of the letters from the first position to the last) can be as
follows: $\nu^1=(1, 2, 3),$ $ \nu^2=(3, 1, 2),$ $
\nu^3=(3, 2, 1),$ $ \nu^4=(3, 2, 1).$ In all other respects this
method coincides with the book stack. (The set of all indexes
$\{1, \ldots, S \} $ is divided into $r$ subsets,
etc.)

Obviously, after observing each letter $x_t$ the value
$\lambda^t(x_t)$ should be increased and the order $\nu^t$ should
be changed. It is worth noting that there exist a data structure
and an algorithm which allow maintaining the alphabet letters
ordered in such a way that the number of operations spent is
constant, independently of the size of the alphabet. This data
structure was described by Moffat, 1999, and Ryabko and Rissanen,
2003.
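
A minimal Python sketch of the ordinals $\nu^t(x_t)$ of the order test is given below; the ordinals are then grouped into the subsets $A_1, \ldots, A_r$ and tested with the chi-square statistic (\ref{x2}) exactly as for the book stack test. This is our illustration, not the constant-time structure of Moffat or of Ryabko and Rissanen: it simply re-sorts the whole alphabet after each letter, which costs $O(S \log S)$ operations per letter. Breaking ties by keeping the previous relative order is our assumption (the text allows any tie-breaking rule); with it, the sketch reproduces the orders $\nu^2, \nu^3, \nu^4$ of the example above. The function name is hypothetical.

\begin{verbatim}
```python
def order_test_positions(seq, alphabet):
    """Return nu^t(x_t) for t = 1..n in the order test: letters are kept
    sorted by decreasing count lambda^t, so frequent letters get small ordinals."""
    order = list(alphabet)               # nu^1: all counts are zero
    count = {a: 0 for a in alphabet}     # count[a] = lambda^t(a)
    positions = []
    for x in seq:
        positions.append(order.index(x) + 1)  # nu^t(x_t), 1-based
        count[x] += 1                         # lambda^{t+1}(x_t)
        # list.sort is stable, so letters with equal counts keep their previous
        # relative order (one admissible tie-breaking rule)
        order.sort(key=lambda a: -count[a])
    return positions


if __name__ == "__main__":
    print(order_test_positions(["a3", "a2", "a3"], ["a1", "a2", "a3"]))  # [3, 3, 1]
```
\end{verbatim}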

\textbf{3.3. Two-faced processes and the choice of the block
length for process testing}

There are many methods for testing $H_0$ against $H_1$ in which
the bit stream is divided into words (blocks) of length $s$, $s
\geq 1,$ and the sequence of blocks $x_1x_2\ldots x_s$,
$x_{s+1}\ldots x_{2s},\ldots $ is considered as a sequence of letters, where
each letter belongs to the alphabet $B_s = \{ 0,1 \}^s $ and has
probability $2^{-s}$ if $H_0$ is true. For instance, both
tests described above, the methods of Ryabko, Stognienko and Shokin
(2003) and many other algorithms are of this kind. That
is why the question of choosing the block length $s$ is
considered here.

As was mentioned in the introduction, there exist two-faced
processes which, on the one hand, are far from being truly
random, but, on the other hand, can be distinguished from
truly random ones only when the block length $s$ is large.
From the information-theoretic point of view, two-faced
processes can be simply described as follows. For a two-faced
process which generates letters from $\{ 0,1 \}$, the limit
Shannon entropy is (much) less than 1 and, on the other hand, the
$s$-order entropy ($h_s$) is maximal ($h_s =1$ bit per letter)
for relatively large $s.$

We describe two families of two-faced processes $T(k, \pi)$ and
$\bar{T}(k, \pi)$, where $k=1,2, \ldots,\,$ and $ \pi \in (0,1)$
are parameters. The processes $T(k,\pi)$ and $\bar{T}(k, \pi)$ are
Markov chains of connectivity (memory) $k$ which generate
letters from $\{0,1\}$. It is convenient to define them
inductively. The process $T(1,\pi)$ is defined by the conditional
probabilities $P_{T(1, \pi)}(0/0) = \pi$, $P_{T(1, \pi)}(0/1) =
1-\pi $ (obviously, $P_{T(1, \pi)}(1/0) =1- \pi$, $P_{T(1,
\pi)}(1/1) = \pi $). The process $\bar{T}(1,\pi)$ is defined by
$P_{\bar{T}(1, \pi)}(0/0) =1- \pi$, $P_{\bar{T}(1, \pi)}(0/1) = \pi
$. Assume that $T(k, \pi)$ and $\bar{T}(k, \pi)$ are defined and
describe $T(k+1, \pi)$ and $\bar{T}(k+1, \pi)$ as follows:
$$ P_{T(k+1, \pi)}(0/0u) = P_{T(k, \pi)}(0/u), \quad P_{T(k+1, \pi)}(1/0u) = P_{T(k, \pi)}(1/u), $$
$$ P_{T(k+1, \pi)}(0/1u) = P_{\bar{T}(k, \pi)}(0/u), \quad P_{T(k+1, \pi)}(1/1u) = P_{\bar{T}(k, \pi)}(1/u), $$
and, vice versa,
$$ P_{\bar{T}(k+1, \pi)}(0/0u) = P_{\bar{T}(k, \pi)}(0/u), \quad P_{\bar{T}(k+1, \pi)}(1/0u) = P_{\bar{T}(k, \pi)}(1/u), $$
$$ P_{\bar{T}(k+1, \pi)}(0/1u) = P_{T(k, \pi)}(0/u), \quad P_{\bar{T}(k+1, \pi)}(1/1u) = P_{T(k, \pi)}(1/u) $$
for each $u \in B_k$ (here $vu$ is the concatenation of the words
$v$ and $u$). For example,
$$ P_{T(2,\pi)}(0/00) = \pi, \quad
P_{T(2,\pi)}(0/01) = 1-\pi, \quad P_{T(2,\pi)}(0/10) = 1-\pi, \quad
P_{T(2,\pi)}(0/11) = \pi. $$
The following theorem shows that two-faced processes exist.

\textbf{Theorem 3.} {\it For each $\pi \in (0,1)$ the $s$-order
Shannon entropy ($h_s$) of the processes $T(k, \pi)$ and
$\bar{T}(k, \pi)$ equals 1 bit per letter for $s=0,1,\ldots , k$,
whereas the limit Shannon entropy ($h_\infty $) equals $ - (\pi
\log_2 \pi + (1- \pi) \log_2 (1-\pi) ).$}

The proof of the theorem is given in the Appendix, but here we
consider examples of ``typical'' sequences of the processes
$T(1,\pi)$ and $\bar{T}(1,\pi)$ for, say, $\pi = 1/5$. Examples are
$ 010101101010100101\ldots$ and $ 000011111000111111000\ldots .$ We can
see that each sequence contains approximately equal numbers of 0's
and 1's. (That is why the first-order Shannon entropy is
1 bit per letter.) On the other hand, neither sequence looks
truly random, because both, obviously, contain too many long
subwords like either $101010\ldots$ or $000\ldots 11111\ldots$ (In other
words, the second-order Shannon entropy is much less than 1 bit per
letter.) Hence, if a randomness test is based on estimating the
frequencies of 0's and 1's only, then such a test will not be
able to find deviations from randomness.
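
To make the inductive definition concrete, the following Python sketch (our illustration; all names are hypothetical) computes $P_{T(k,\pi)}(0/u)$ and $P_{\bar{T}(k,\pi)}(0/u)$ directly from the recursion, samples the process, and tabulates non-overlapping blocks. We assume that a context $u$ is written with the oldest letter first; for these processes the choice does not matter, since one can check by induction that $P_{T(k,\pi)}(0/u) = \pi$ if $u$ contains an even number of 1's and $1-\pi$ otherwise (and the other way round for $\bar{T}(k,\pi)$). For $T(2, 0.2)$ the frequencies of the 2-letter blocks should all be close to $1/4$, whereas the 3-letter blocks with an odd number of 1's should have total frequency close to $0.8$, in line with Theorem 3.

\begin{verbatim}
```python
import random
from collections import Counter


def p_zero(k, pi, context, bar=False):
    """P(0 / context) for T(k, pi) (bar=False) or T-bar(k, pi) (bar=True),
    following the inductive definition; `context` is a string of k bits."""
    if len(context) != k:
        raise ValueError("context must contain exactly k bits")
    if k == 1:
        p = pi if context == "0" else 1 - pi  # T(1, pi): P(0/0) = pi, P(0/1) = 1 - pi
        return 1 - p if bar else p            # T-bar(1, pi) swaps these values
    head, rest = context[0], context[1:]
    if head == "0":
        return p_zero(k - 1, pi, rest, bar)   # P(0/0u): same family, memory k - 1
    return p_zero(k - 1, pi, rest, not bar)   # P(0/1u): T and T-bar swap roles


def sample(k, pi, n, bar=False, seed=0):
    """Generate n >= k letters; the first k letters form a uniform random context."""
    rng = random.Random(seed)
    bits = [rng.choice("01") for _ in range(k)]
    while len(bits) < n:
        context = "".join(bits[-k:])          # last k letters, oldest first (assumption)
        bits.append("0" if rng.random() < p_zero(k, pi, context, bar) else "1")
    return "".join(bits)


def block_frequencies(bits, s):
    """Relative frequencies of the non-overlapping s-letter blocks of `bits`."""
    blocks = [bits[i:i + s] for i in range(0, len(bits) - s + 1, s)]
    return {b: c / len(blocks) for b, c in sorted(Counter(blocks).items())}


if __name__ == "__main__":
    x = sample(2, 0.2, 60_000, seed=1)
    print(block_frequencies(x, 2))   # should all be close to 1/4
    print(block_frequencies(x, 3))   # odd-parity words should share about 0.8
```
\end{verbatim}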

So, if we return to the question about the block length of tests
and take into account the existence of two-faced processes, it
seems that the block length should be taken as large as possible.
But this is not so. The following informal consideration could be
useful for choosing the block length. The point is that
statistical tests can be applied only if words from the sequence
\begin{equation}\label{s}
x_1x_2\ldots x_s, \:x_{s+1}\ldots x_{2s},\ldots,  \:x_{(m-1)s+1}
x_{(m-1)s+2}\ldots x_{m s} \end{equation}
are repeated (at least a few times)
with high probability (here $m s $ is the sample length).

If, on the contrary, all words in (\ref{s}) are unique (with high
probability) when $H_0$ is true, a sensible test cannot be
constructed based on a division into $s$-letter words. So, the
word length $s$ should be chosen in such a way that some words
from the sequence (\ref{s}) are repeated with high probability
when $H_0$ is true. Now our problem can be formulated as
follows. There is a binary sequence $x_1x_2\ldots x_n$ generated
by the Bernoulli source with $P(x_i=0)= P(x_i=1)= 1/2 $, and we
want to find a block length $s$ such that the sequence (\ref{s})
with $ m= \lfloor n/s\rfloor $ contains some repetitions (with
high probability). This problem is well known in probability
theory and is sometimes called the birthday problem. Namely, the
standard statement of the problem is as follows. There are $S=
2^s$ cells and $m\, (=n/s)$ pellets. Each pellet is put in one of
the cells with probability $1/S$. It is known in Probability
Theory that if $m = c\, \sqrt{ S}, c >0,$ then the average number
of cells with at least two pellets equals $c^2\, (1/2 + o
(1)\,),$ where $S$ goes to $\infty \,;$ see Kolchin, Sevast'yanov
and Chistyakov, 1976. In our case the number of cells with at
least two pellets is equal to the number of words from the
sequence (\ref{s}) which occur two (or more) times. Taking into
account that $S=2^s, m= n/s,$ we obtain from $m = c\, \sqrt{ S}, c
>0,$ an informal rule for choosing the length of words
in (\ref{s}): \begin{equation}\label{sn} n \asymp s 2^{s/2}
\end{equation} where $n$ is the length of a sample $x_1x_2
\ldots x_n$ and $s$ is the block length. If $s$ is much larger, the
sequence (\ref{s}) does not have repeated words (when $H_0$ is true)
and it is difficult to build a sensible test. On the other hand,
if $s$ is much smaller, large classes of alternative
hypotheses cannot be tested (due to the existence of two-faced
processes). It is worth noting that it is impossible to have a
universal choice of $s,$ because it is impossible to avoid the
two-faced phenomenon. This fact can also be explained
on the basis of the following known result of Information Theory: it is
impossible to have guaranteed rate of code convergence

611: universally for all ergodic sources; see Bailey, 1976, Ryabko,

612: 1984. That is why, it is impossible to choose a universal length

613: $s.$ On the other hand, there are many applications where the

614: word length $s$ can be chosen experimentally. (But, of course,

615: such experiments should be performed on the independent data.)
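
The birthday-problem estimate and the resulting rule can be checked
with a few lines of Python. The sketch below computes the exact
expected number of cells receiving at least two pellets,
$S\,\bigl(1-(1-1/S)^m - (m/S)(1-1/S)^{m-1}\bigr)$, compares it with
$c^2/2$ for $m = c\sqrt{S}$, and returns the largest $s$ with
$c\, s\, 2^{s/2} \le n$. The explicit constant $c$ (which the sign
$\asymp$ hides) and the function names are our assumptions, not part
of the original method.

```python
import math


def expected_repeated_cells(S, m):
    """Exact expected number of cells holding at least two of m pellets
    thrown independently and uniformly into S cells."""
    q = 1.0 - 1.0 / S
    p_empty = q ** m                         # P(a fixed cell gets no pellet)
    p_single = m * (1.0 / S) * q ** (m - 1)  # P(a fixed cell gets exactly one)
    return S * (1.0 - p_empty - p_single)


def block_length(n, c=1.0):
    """Largest s >= 1 with c * s * 2**(s/2) <= n, i.e. the rule n ~ s 2^{s/2}
    with an explicit (assumed) constant c."""
    s = 1
    while c * (s + 1) * 2 ** ((s + 1) / 2) <= n:
        s += 1
    return s


S = 2 ** 20
for c in (0.5, 1.0, 2.0):
    m = round(c * math.sqrt(S))
    print(c, expected_repeated_cells(S, m), c * c / 2)
print(block_length(10 ** 6))
```

With $c=1$ this gives $s = 30$ for $n = 10^6$; because (\ref{sn})
holds only up to a constant factor, the value of $c$ matters, which
is one more reason to tune $s$ experimentally.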

\section{The experiments}

In this section we describe experiments carried out to compare the
new tests with known ones. We compare the order test, the book
stack test, tests based on standard data compression methods, and
the tests from Rukhin and others, 2001. The tests of Rukhin and
others were selected on the basis of a comprehensive theoretical
and experimental analysis and can be considered the state of the
art in randomness testing. We also test the method recently
published by Ryabko, Stognienko and Shokin, 2004, because it
appeared later than the book of Rukhin and others.

We used data generated by the PRNG RANDU (described in Dudewicz
and Ralley, 1981) and random bits from ``The Marsaglia Random
Number CDROM'' (see http://stat.fsu.edu/diehard/cdrom/). RANDU is
a linear congruential generator (LCG), i.e., a generator defined by
the recurrence $$X_{n+1}=(A\, X_n+C) \bmod M ,$$ where $X_{n}$ is
the $n$-th generated number; RANDU has the parameters
$A=2^{16}+3$, $C= 0$, $M= 2^{31}$, $X_0 = 1$. These sources of
random data were chosen because the random bits from ``The
Marsaglia Random Number CDROM'' are considered good random numbers,
whereas RANDU is known to be a poor PRNG. It is also known that the
lowest digits of $X_n$ are ``less random'' than the leading digits
(Knuth, 1981); that is why in our experiments with RANDU we extract
an eight-bit word from each generated $X_i$ by the formula
$\hat{X}_i = \lfloor X_i/2^{23} \rfloor$.
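
For reproducibility, here is a minimal Python sketch of RANDU with
the parameters above and of the byte extraction $\hat{X}_i =
\lfloor X_i/2^{23}\rfloor$. How the bytes are serialized into a bit
file (here most significant bit first) is our assumption. The checks
also confirm the well-known identity $X_{n+2} = (6X_{n+1} - 9X_n)
\bmod 2^{31}$, which follows from $A^2 = 2^{32} + 6\cdot 2^{16} + 9
\equiv 6A - 9 \pmod{2^{31}}$ and is the classical reason why RANDU
is a poor generator.

```python
A, C, M = 2 ** 16 + 3, 0, 2 ** 31   # RANDU parameters


def randu(seed, count):
    """Return X_1, ..., X_count of X_{n+1} = (A X_n + C) mod M, with X_0 = seed."""
    xs, x = [], seed
    for _ in range(count):
        x = (A * x + C) % M
        xs.append(x)
    return xs


def randu_bytes(seed, count):
    """Keep only the leading 8 of the 31 bits: floor(X_i / 2^23)."""
    return bytes(x >> 23 for x in randu(seed, count))


def to_bits(data):
    """Serialize bytes into bits, most significant bit first (an assumption)."""
    return [(b >> (7 - k)) & 1 for b in data for k in range(8)]


bits = to_bits(randu_bytes(1, 6250))   # 50 000 bits, the shortest length in Table 1
```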

The behavior of the tests was investigated for files of different
lengths (see the tables below). We generated 100 different files of
each length and applied each of the above tests to each file with
significance level 0.01 (or less, see below). So, if a test is
applied to a truly random bit sequence, on average 1 file out of
100 should be rejected. All results are given in the tables, where
the integers in the cells are the numbers of rejected files (out
of 100). If no number of rejections is given for a certain length
and test, the test cannot be applied to files of that length.
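
Under $H_0$, and assuming the 100 files are independent, the number
of rejected files is binomially distributed with parameters $100$
and $\alpha$, so small deviations from 1 in the tables are to be
expected. The Python sketch below computes the tail probabilities
that help to read the tables; it is an aid to interpretation, not
part of the original experiments.

```python
from math import comb


def tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


for k in range(1, 6):
    print(k, round(tail(100, 0.01, k), 4))
```

For example, with $\alpha = 0.01$ the probability of 4 or more
rejections out of 100 files under $H_0$ is below 2\%.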

Table~1 contains the results of testing sequences of different
lengths generated by RANDU, whereas Table~2 contains the results
of applying all tests to $5\,000\,000$-bit sequences either
generated by RANDU or taken from ``The Marsaglia Random Number
CDROM''. For example, the first number in the second row of
Table~1 is 56. This means that there were 100 files of length
$5\cdot 10^4$ bits generated by the PRNG RANDU, and when the order
test was applied, the hypothesis $H_0$ was rejected 56 times out of
100 (and, correspondingly, accepted 44 times). The first number in
the third row shows that $H_0$ was rejected 42 times when the book
stack test was applied to the same 100 files. The second number in
the second row shows that $H_0$ was rejected 100 times when the
order test was applied to 100 files of $10^5$ bits generated by
RANDU, etc.

Let us first comment on the tests based on the popular data
compression methods RAR and ARJ. In these cases we applied each
method to a file and first determined the length of the compressed
data. Then we used the test $\Gamma_{\alpha,\,\varphi}^{(n)}$ with
the critical value (\ref{cr}) as follows. The alphabet size is
$|A|= 2^8 = 256$, so $n \log |A|$ is simply the length of the file
in bits before compression (whereas $n$ is its length in bytes).
Taking $\alpha = 0.01$, we see from (\ref{cr}) that the hypothesis
of randomness ($H_0$) should be rejected if the length of the
compressed file is less than or equal to $n \log |A| - 8$ bits.
(Strictly speaking, in this case $\alpha \leq 2^{-7} = 1/128$.)
Since the length of computer files is measured in bytes, this rule
is very simple: if an $n$-byte file is actually compressed (i.e.,
the length of the encoded file is $n-1$ bytes or less), the file is
not random (and $H_0$ is rejected). So, the tables contain the
numbers of cases where files were actually compressed.
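
The decision rule can be sketched in Python as follows. We
substitute the standard-library DEFLATE compressor \texttt{zlib}
for RAR and ARJ, which were the programs actually used in the
experiments; since any lossless compressor is an injective code,
the critical value $t_\alpha = n\log|A| - \log(1/\alpha) - 1$ from
the proof of Theorem~1 in the Appendix applies to it as well. The
function names are illustrative.

```python
import math
import zlib


def critical_length_bits(n_bytes, alpha):
    """t_alpha = n log|A| - log(1/alpha) - 1 with |A| = 256 and n = n_bytes."""
    return 8 * n_bytes - math.log2(1 / alpha) - 1


def compression_rejects(data, alpha=0.01):
    """Reject H0 (randomness) if the compressed length in bits is <= t_alpha."""
    compressed_bits = 8 * len(zlib.compress(data, 9))
    return compressed_bits <= critical_length_bits(len(data), alpha)
```

For $\alpha = 0.01$ the threshold is $8n - 7.64$ bits, so a
byte-granular output is rejected exactly when it has at most $n-1$
bytes, which is the rule stated above.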

Let us now comment on the parameters of the considered methods. As
mentioned above, we investigated all methods from the book of
Rukhin and others, 2001, the test of Ryabko, Stognienko and Shokin,
2004 (RSS test for short), the two tests based on data compression
algorithms described above, the order test and the book stack
test. Some tests have parameters which must be specified; in such
cases the parameter values are given in the table row that follows
the test results. For some tests from the book of Rukhin and
others, the parameters can be chosen from a certain interval. In
such cases we repeated all calculations three times, taking the
minimal possible value of the parameter, the maximal one and the
average one, and the table gives the results for the value at
which the number of rejections of the hypothesis $H_0$ is maximal.

The parameters of RSS, the book stack test and the order test were
chosen on the basis of special experiments carried out on
independent data. (These algorithms are implemented as a Java
program, which is available at \verb|http://web.ict.nsc.ru/~rng/|.)
In all cases these experiments showed that, for all three
algorithms, the optimal block length is close to the one given by
the informal rule (\ref{sn}).

The tables show that the new tests detect non-randomness more
efficiently than the known ones. Apparently, the main reason is
that the RSS, book stack and order tests deal with block lengths as
large as possible, whereas many other tests are designed for other
goals. A second reason may be their ability to adapt: the new tests
can find subwords that are more frequent than others and use them
for testing, whereas many other tests look for particular
deviations from randomness.
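
To make the adaptivity argument concrete, here is a simplified
Python sketch in the spirit of the book stack test: a move-to-front
transformation of $s$-bit words followed by a chi-square test on how
often the current word is found in the upper part $A_1$ of the
stack. It is our illustration under stated assumptions, not the
authors' Java implementation; the test as defined earlier in the
paper may differ in details, and the plain list used here costs
$O(2^s)$ per step, so it is practical only for small $s$.

```python
import math


def book_stack_test(bits, s, upper, alpha=0.01):
    """Simplified book-stack (move-to-front) randomness test (a sketch).

    The bit sequence is cut into non-overlapping s-bit words. Each word is
    looked up in a stack of all 2**s words; we count how often it lies in
    the top `upper` positions (the set A_1) and then move it to the top.
    Under H0 the position is uniform, so a hit has probability upper / 2**s;
    a chi-square statistic with one degree of freedom compares the counts.
    Returns (statistic, p_value, reject).
    """
    size = 2 ** s
    stack = list(range(size))            # initial order is arbitrary
    m = len(bits) // s
    hits = 0
    for i in range(m):
        word = int("".join(map(str, bits[i * s:(i + 1) * s])), 2)
        pos = stack.index(word)          # O(2**s): fine for a small sketch
        if pos < upper:
            hits += 1
        stack.pop(pos)
        stack.insert(0, word)            # move the word to the top
    p1 = upper / size
    e1, e2 = m * p1, m * (1 - p1)
    stat = (hits - e1) ** 2 / e1 + ((m - hits) - e2) ** 2 / e2
    p_value = math.erfc(math.sqrt(stat / 2))   # chi-square tail, 1 d.f.
    return stat, p_value, p_value < alpha
```

Frequently repeated words stay near the top of the stack, so the
number of hits in $A_1$ grows; the test thus adapts to whichever
subwords happen to be frequent without being told which ones to look
for.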

In conclusion, the results obtained show that the new tests, as
well as the ideas of Information Theory in general, can be useful
tools for randomness testing.

\begin{table}[h]
\caption{Number of files generated by PRNG RANDU and recognized
as non-random by different tests, for different file lengths (in
bits).}
\begin{center}
 \begin{tabular}{|c|c|c|c|c|}
\hline \rule{0pt}{2.8ex}Name of test / Length of file
  &$5\cdot 10^4$ &$10^5$&$5\cdot 10^5$& $10^6$\\
\hline \rule{0pt}{2.3ex}Order test &56&100&100&100\\
 \rule{0pt}{2.3ex}Book stack &42&100&100&100\\
\cline{2-5}
 \rule{0pt}{2.3ex}{\it parameters for both tests} &\multicolumn{4}{|c|}{$s=20$, $|A_1|=5\sqrt{2^{s}}$}\\
\hline
 \rule{0pt}{2.3ex}RSS &4&75&100&100\\
\cline{4-5}
 \rule{0pt}{2.3ex}{\it parameters} &$s=16$&$s=17$&\multicolumn{2}{|c|}{$s=20$}\\
\hline
\rule{0pt}{2.3ex} RAR &0&0&100&100\\
\rule{0pt}{2.3ex} ARJ &0&0&99&100\\
\hline \rule{0pt}{2.3ex}Frequency& 2&1&1&2\\ \hline
\rule{0pt}{2.3ex}Block Frequency &1&2&1&1\\
 \rule{0pt}{2.3ex}{\it parameters} &$M=1000$&$M=2000$&$M=10^5$&$M=20000$\\
\hline \rule{0pt}{2.3ex}Cumulative Sums&2&1&2&1\\ \hline
 \rule{0pt}{2.3ex}Runs&0&2&1&1\\
\hline
  \rule{0pt}{2.3ex}Longest Run of Ones &0&1&0&0\\
\hline
  \rule{0pt}{2.3ex}Rank &0&1&1&0\\
\hline \rule{0pt}{2.3ex}Discrete Fourier Transform &0&0&0&1\\
\hline
\rule{0pt}{2.3ex}Non-overlapping Templates &--&--&--&2\\
 \rule{0pt}{2.3ex}{\it parameters} &&&&$m=10$\\
\hline\rule{0pt}{2.3ex} Overlapping Templates&--&--&--&2\\
 \rule{0pt}{2.3ex}{\it parameters} &&&&$m=10$\\
\hline
\rule{0pt}{2.3ex}Universal Statistical &--&--&1&1\\
 \rule{0pt}{2.3ex}{\it parameters} &&&$L=6$&$L=7$\\
 \rule{0pt}{2.3ex} &&&$Q=640$&$Q=1280$\\
\hline
\rule{0pt}{2.3ex}Approximate Entropy&1&2&2&7\\
 \rule{0pt}{2.3ex}{\it parameters} &$m=5$&$m=11$&$m=13$&$m=14$\\
\hline \rule{0pt}{2.3ex}Random Excursions &--&--&--&2\\ \hline
\rule{0pt}{2.3ex}Random Excursions Variant&--&--&--&2\\ \hline
\rule{0pt}{2.3ex}Serial &0&1&2&2\\
 \rule{0pt}{2.3ex}{\it parameters} &$m=6$&$m=14$&$m=16$&$m=8$\\
\hline \rule{0pt}{2.3ex}Lempel-Ziv Complexity&--&--&--&1\\ \hline
\rule{0pt}{2.3ex}Linear Complexity &--&--&--&3\\
 \rule{0pt}{2.3ex}{\it parameters} &&&&$M=2500$\\
\hline
 \end{tabular}
 \end{center}
 \end{table}

\begin{table}[!hbt]
 %\refstepcounter{table}
  \caption{Number of $5\,000\,000$-bit files, generated by PRNG RANDU
or taken from ``The Marsaglia Random Number CDROM'' (column
``random''), which are recognized as non-random.}
\begin{center}
 \begin{tabular}{|c|c|c|}
\hline
 \rule{0pt}{2.8ex}Name of test / Kind of file
  &RANDU &random\\
\hline \rule{0pt}{2.3ex}Order test &100&3\\
 \rule{0pt}{2.3ex}Book stack &100&0\\
\cline{2-3}
 \rule{0pt}{2.3ex}{\it parameters for both tests} &\multicolumn{2}{|c|}{$s=24$, $|A_1|=5\sqrt{2^{s}}$}\\
\hline
 \rule{0pt}{2.3ex}RSS &100&1\\
\cline{2-3}
 \rule{0pt}{2.3ex}{\it parameters} &$s=24$&$s=24$\\
\hline
\rule{0pt}{2.3ex} RAR &100&0\\
\rule{0pt}{2.3ex} ARJ &100&0\\
\hline \rule{0pt}{2.3ex}Frequency& 2&1\\
\hline \rule{0pt}{2.3ex}Block Frequency &2&1\\
 \rule{0pt}{2.3ex}{\it parameters} &$M=10^6$&$M=10^5$\\
\hline
\rule{0pt}{2.3ex}Cumulative Sums&3&2\\
\hline
 \rule{0pt}{2.3ex}Runs&2&2\\
\hline
  \rule{0pt}{2.3ex}Longest Run of Ones &2&0\\
\hline
  \rule{0pt}{2.3ex}Rank &1&1\\
\hline
\rule{0pt}{2.3ex}Discrete Fourier Transform &89&9\\
\hline
\rule{0pt}{2.3ex} Non-overlapping Templates&5&5\\
 \rule{0pt}{2.3ex}{\it parameters}&$m=10$&$m=10$\\
\hline
\rule{0pt}{2.3ex} Overlapping Templates&4&1\\
 \rule{0pt}{2.3ex}{\it parameters} &$m=10$&$m=10$\\
\hline
\rule{0pt}{2.3ex}Universal Statistical &1&2\\
 \rule{0pt}{2.3ex}{\it parameters} &$L=9$&$L=9$\\
 \rule{0pt}{2.3ex} &$Q=5120$&$Q=5120$\\
\hline
\rule{0pt}{2.3ex}Approximate Entropy&100&89\\
 \rule{0pt}{2.3ex}{\it parameters} &$m=17$&$m=17$\\
\hline
\rule{0pt}{2.3ex}Random Excursions &4&3\\
\hline
\rule{0pt}{2.3ex}Random Excursions Variant&3&3\\
\hline
\rule{0pt}{2.3ex}Serial &100&2\\
 \rule{0pt}{2.3ex}{\it parameters} &$m=19$&$m=19$\\
\hline
\rule{0pt}{2.3ex}Lempel-Ziv Complexity&0&0\\
\hline
\rule{0pt}{2.3ex}Linear Complexity &4&3\\
 \rule{0pt}{2.3ex}{\it parameters} &$M=5000$ & $M=2500$ \\
\hline
 \end{tabular}
 \end{center}
 \end{table}

\section{Appendix}

\emph{Proof} of Theorem 1. First we estimate the number of words
$u$ whose codewords $\varphi_n(u)$ have length less than or equal
to an integer $\tau$. Obviously, at most one word can be encoded by
the empty codeword, at most two words by codewords of length 1,
\ldots, at most $2^i$ words by codewords of length $i$, etc. Taking
into account that $\varphi_n(u) \neq \varphi_n(v)$ for different
$u$ and $v$, we obtain the inequality
$$ | \{ u: |\varphi_n(u) | \leq \tau \} | \leq \sum_{i=0}^\tau 2^i
= 2^{\tau+1}- 1. $$
From this inequality (applied with $\tau = \lfloor t_\alpha
\rfloor$ if $t_\alpha$ is not an integer) and (\ref{cr}) we can see
that the number of words from $A^n$ whose codelength is less than
or equal to $t_\alpha = n \log |A| - \log (1/ \alpha) - 1$ is not
greater than $2^{n \log |A| - \log (1/\alpha)}$. So, we obtain
$$ | \{ u: |\varphi_n(u) | \leq t_\alpha \} | \leq \alpha |A|^n .$$
Taking into account that all words from $A^n$ have equal
probabilities if $H_0$ is true, we obtain from the last inequality,
(\ref{cr}) and the description of the test
$\Gamma_{\alpha,\varphi}^{(n)}$ that
$$ Pr \{ |\varphi_n(u) | \leq t_\alpha \} \leq \alpha |A|^n / |A|^n
= \alpha $$
if $H_0$ is true. The theorem is proved.
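
The counting argument can be checked by brute force for small $n$.
The Python sketch below uses the injective code that is most
favourable to short codewords, which assigns the words of $A^n$
(here $A=\{0,1\}$) to the distinct binary strings (empty string,
$0$, $1$, $00$, $01$, \ldots) in order of increasing length, and
verifies that the fraction of words with codelength at most
$t_\alpha$ never exceeds $\alpha$. The choice of code, of test
values and the function names are ours.

```python
import math


def shortest_injective_lengths(count):
    """Lengths of the `count` shortest distinct binary strings (empty string
    first): 1 string of length 0, 2 of length 1, ..., 2**i of length i."""
    lengths, i = [], 0
    while len(lengths) < count:
        lengths.extend([i] * min(2 ** i, count - len(lengths)))
        i += 1
    return lengths


def rejection_fraction(n, alpha):
    """Fraction of u in {0,1}^n with |phi_n(u)| <= t_alpha, where
    t_alpha = n log|A| - log(1/alpha) - 1 and |A| = 2."""
    t_alpha = n - math.log2(1 / alpha) - 1
    lengths = shortest_injective_lengths(2 ** n)
    return sum(1 for length in lengths if length <= t_alpha) / 2 ** n
```

For instance, with $n = 10$ and $\alpha = 2^{-5}$ we get
$t_\alpha = 4$, and exactly $2^5 - 1 = 31$ of the $1024$ words are
rejected, a fraction just below $\alpha$.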

\emph{Proof} of Claim 1. The proof is based on a direct
calculation of the probability of rejection when $H_0$ is true.
From the description of the test $\Upsilon_{\alpha,\varphi}^{(n)}$
and the definition of $g$ (see (\ref{s})) we obtain the following
chain of equalities:
$$ Pr \{ H_0 \mbox{ is rejected} \}= Pr \{ |\varphi_n(u) | \leq g \}
+ Pr \{ |\varphi_n(u) | = g+1 \} \Big( 1 - \Big( \sum_{j=0}^{g+1}
|A_j| - \alpha |A|^n \Big) \Big/ |A_{g+1}| \Big) $$
$$ = \frac{1}{|A|^n} \Big( \sum_{j=0}^{g} |A_j| + |A_{g+1}| -
\sum_{j=0}^{g+1} |A_j| + \alpha |A|^n \Big) = \alpha . $$
The claim is proved.
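
Claim 1 shows that randomizing on the boundary length $g+1$ makes
the level exactly $\alpha$. The Python sketch below checks this
with exact rational arithmetic. It assumes, as the proof suggests,
that $A_j$ is the set of words whose codeword has length $j$ and
that $g$ is the largest integer with $\sum_{j=0}^{g} |A_j| \le
\alpha |A|^n$; the function names are ours.

```python
from fractions import Fraction


def randomized_rule(sizes, alpha):
    """Return (g, q) for the randomized test of Claim 1 (a sketch).

    sizes[j] = |A_j|, the number of words u in A^n with |phi_n(u)| = j
    (an assumed definition); sum(sizes) = |A|^n. g is taken to be the
    largest index with |A_0| + ... + |A_g| <= alpha |A|^n (g = -1 if none),
    and q is the probability of rejecting H0 when |phi_n(u)| = g + 1.
    """
    budget = alpha * sum(sizes)           # alpha |A|^n
    g, cum = -1, 0
    while g + 1 < len(sizes) and cum + sizes[g + 1] <= budget:
        g += 1
        cum += sizes[g]
    if g + 1 == len(sizes):               # only possible when alpha >= 1
        return g, Fraction(0)
    # q = 1 - (sum_{j <= g+1} |A_j| - alpha |A|^n) / |A_{g+1}|
    q = 1 - Fraction(cum + sizes[g + 1] - budget) / sizes[g + 1]
    return g, q


def rejection_probability(sizes, alpha):
    """Exact P(H0 rejected) when all |A|^n words are equally likely (H0)."""
    g, q = randomized_rule(sizes, alpha)
    below = sum(sizes[: g + 1])                          # |phi_n(u)| <= g
    boundary = sizes[g + 1] * q if g + 1 < len(sizes) else 0
    return (below + boundary) / sum(sizes)


print(randomized_rule([1, 2, 4, 8, 16, 32, 1], Fraction(1, 10)))
print(rejection_probability([1, 2, 4, 8, 16, 32, 1], Fraction(1, 10)))
```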

\emph{Proof} of Claim 2. We may assume that $\hat{t}_\alpha$ in
(\ref{cr2}) is an integer. (Otherwise, we obtain the same test by
taking $\lfloor\hat{t}_\alpha\rfloor$ as a new critical value.)
From the Kraft inequality (\ref{KRAFT}) we obtain
$$ 1\geq \sum_{u \in A^n } 2^{- |\varphi_n (u)|} \geq | \{u:
|\varphi_n (u)|\leq \hat{t}_\alpha \} | \, 2^{-\hat{t}_\alpha}. $$
This inequality and (\ref{cr2}) yield
$$ | \{u: |\varphi_n (u)|\leq \hat{t}_\alpha \} | \leq \alpha
|A|^n. $$
If $H_0$ is true, then the probability of each $u \in A^n$ equals
$|A|^{-n}$, and from the last inequality we obtain
$$ Pr \{ |\varphi_n (u)| \leq \hat{t}_\alpha \} = |A|^{-n} \, |
\{u: |\varphi_n (u)|\leq \hat{t}_\alpha \} | \leq \alpha $$
if $H_0$ is true. The claim is proved.

\emph{Proof} of Theorem 3. We prove the theorem for the process
$T(k, \pi)$; the proof is valid for $\bar{T}(k, \pi)$, too. First
we show that
\begin{equation}\label{a3} p^*(x_1\ldots x_d)=2^{-d}, \end{equation}
$(x_1\ldots x_{d}) \in \{ 0,1 \}^{d}$, $d =1, \ldots , k$, is a
stationary distribution for the processes $T(k, \pi)$ (and
$\bar{T}(k, \pi)$) for all $k=1,2, \ldots$ and $\pi \in (0,1)$. For
any $k \geq 1$, (\ref{a3}) will be proved if we show that the
system of equations
$$ P_{T(k, \pi)}(x_1\ldots x_d)= P_{T(k, \pi)}(0x_1\ldots x_{d-1})\,
P_{T(k,\pi)}(x_d/0x_1\ldots x_{d-1}) $$
$$ +\,P_{T(k,\pi)}(1x_1\ldots x_{d-1})\, P_{T(k,
\pi)}(x_d/1x_1\ldots x_{d-1}) $$
has the solution $p(x_1\ldots x_d)=2^{-d}$, $(x_1\ldots x_{d}) \in
\{ 0,1 \}^{d}$, $d =1,2, \ldots, k$. This is easily seen for
$d = k$ if we take into account that, by the definitions of
$T(k, \pi)$ and $\bar{T}(k, \pi)$, the equality
$$ P_{T(k,\pi)}(x_k/0x_1\ldots x_{k-1}) + P_{T(k,\pi)}(x_k/1x_1\ldots
x_{k-1})=1 $$
is valid for all $(x_1\ldots x_{k}) \in \{ 0,1 \}^{k}$: substituting
$p = 2^{-k}$ into the right-hand side of the system gives
$2^{-k}\cdot 1 = 2^{-k}$. From this and the law of total
probability (summing $2^{-k}$ over the $2^{k-d}$ extensions of
$x_1\ldots x_d$) we immediately obtain (\ref{a3}) for $d < k$.
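
For $k=1$ the argument can be written out explicitly. Assume,
consistently with the equality $P_{T(1,\pi)}(x_1/0) +
P_{T(1,\pi)}(x_1/1) = 1$, that $P_{T(1,\pi)}(0/0) =
P_{T(1,\pi)}(1/1) = \pi$ and $P_{T(1,\pi)}(1/0) = P_{T(1,\pi)}(0/1)
= 1-\pi$ (this choice of transition probabilities is our assumption
about the definition of $T(1,\pi)$; for $\bar{T}(1,\pi)$ the roles
of $\pi$ and $1-\pi$ are exchanged). Substituting $p(0) = p(1) =
1/2$ into the stationarity equation gives
$$ p(0)\,P_{T(1,\pi)}(0/0) + p(1)\,P_{T(1,\pi)}(0/1) = \frac{1}{2}\,
\pi + \frac{1}{2}\,(1-\pi) = \frac{1}{2} = p(0), $$
and, in the same way, $p(0)\,P_{T(1,\pi)}(1/0) +
p(1)\,P_{T(1,\pi)}(1/1) = \frac{1}{2}(1-\pi) + \frac{1}{2}\,\pi =
\frac{1}{2} = p(1)$.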

Let us prove the second claim of the theorem. From the definitions
of $T(k, \pi)$ and $\bar{T}(k, \pi)$ we can see that either
$P_{T(k,\pi)}(0/x_1\ldots x_{k})= \pi$, $P_{T(k,
\pi)}(1/x_1\ldots x_{k})=1-\pi$, or $P_{T(k, \pi)}(0/x_1\ldots
x_{k})=1- \pi$, $P_{T(k,\pi)}(1/x_1\ldots x_{k})=\pi$. That is why
$h(x_{k+1}/x_1\ldots x_{k}) = - (\pi \log_2 \pi + (1- \pi) \log_2
(1-\pi) )$ and, since the conditional distribution of $x_{k+1}$
depends only on the $k$ preceding letters, $h_\infty = - (\pi
\log_2 \pi + (1- \pi) \log_2 (1-\pi) )$. The theorem is proved.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Acknowledgment} The authors wish to thank one of the
anonymous reviewers for information about the unpublished thesis of
David Harold Bailey.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%**********************************************************
%*                   Bibliography                         *
%**********************************************************
\newpage
\begin{thebibliography}{5}

\bibitem{Aho}
Aho A.V., Hopcroft J.E., Ullman J.D. {\it The design and analysis
of computer algorithms.} Addison-Wesley, Reading, MA, 1976.

\bibitem{B}
Bailey D.H. {\it Sequential schemes for classifying and predicting
ergodic processes.} PhD Dissertation, Stanford University, 1976.

\bibitem{BSTW}
Bentley J.L., Sleator D.D., Tarjan R.E., Wei V.K. {\it A Locally
Adaptive Data Compression Scheme.} Comm. ACM, v.29, 1986,
pp.320--330.

\bibitem{Billingsley}
Billingsley P. {\it Ergodic theory and information.} John Wiley \&
Sons, 1965.

\bibitem{RANDU}
Dudewicz E.J., Ralley T.G. {\it The Handbook of Random Number
Generation and Testing With TESTRAND Computer Code.} v.4 of
American Series in Mathematical and Management Sciences, American
Sciences Press, Inc., Columbus, Ohio, 1981.

\bibitem{E}
Elias P. {\it Interval and Recency Rank Source Coding: Two On-Line
Adaptive Variable-Length Schemes.} IEEE Trans. Inform. Theory,
v.33, N 1, 1987, pp.3--10.

\bibitem{Ga}
Gallager R.G. {\it Information Theory and Reliable Communication.}
Wiley, New York, 1968.

\bibitem{KS}
Kendall M.G., Stuart A. {\it The advanced theory of statistics;
Vol.~2: Inference and relationship.} London, 1961.

\bibitem{K}
Knuth D.E. {\it The art of computer programming.} Vol.~2.
Addison-Wesley, 1981.

\bibitem{Ko}
Kolmogorov A.N. {\it Three approaches to the quantitative
definition of information.} Problems of Inform. Transmission, v.1,
1965, pp.3--11.

\bibitem{Kr}
Krichevsky R. {\it Universal Compression and Retrieval.} Kluwer
Academic Publishers, 1993.

\bibitem{Le}
L'Ecuyer P. {\it Uniform random number generation.} Annals of
Operations Research, 1994.

\bibitem{Vi}
Li M., Vit\'anyi P. {\it An Introduction to Kolmogorov Complexity
and Its Applications.} 2nd edition, Springer-Verlag, New York,
1997.

\bibitem{M1}
Marsaglia G. {\it The structure of linear congruential sequences.}
In: S.K. Zaremba (ed.), Applications of Number Theory to Numerical
Analysis, Academic Press, New York, 1972, pp.248--285.

\bibitem{M2}
Marsaglia G., Zaman A. {\it Monkey tests for random number
generators.} Computers Math. Applic., v.26, 1993, pp.1--10.

\bibitem{ma}
Maurer U. {\it A universal statistical test for random bit
generators.} Journal of Cryptology, v.5, N 2, 1992, pp.89--105.

\bibitem{me}
Menezes A., van Oorschot P., Vanstone S. {\it Handbook of Applied
Cryptography.} CRC Press, 1996.

\bibitem{mo}
Moeschlin O., Grycko E., Pohl C., Steinert F. {\it Experimental
Stochastics.} Springer-Verlag, Berlin Heidelberg, 1998.

\bibitem{Moffat99}
Moffat A. {\it An improved data structure for cumulative
probability tables.} Software -- Practice and Experience, v.29,
N 7, 1999, pp.647--659.

\bibitem{RO}
Rozanov Yu.A. {\it The Random Processes.} Moscow, ``Nauka''
(``Science''), 1971.

\bibitem{rng}
Rukhin A. and others. {\it A statistical test suite for random and
pseudorandom number generators for cryptographic applications.}
NIST Special Publication 800-22 (with revision dated May 15, 2001).
http://csrc.nist.gov/rng/SP800-22b.pdf

\bibitem{R1}
Ryabko B.Ya. {\it Information Compression by a Book Stack.}
Problems of Information Transmission, v.16, N 4, 1980, pp.16--21.

\bibitem{R3}
Ryabko B.Ya. {\it Twice-universal coding.} Problems of Information
Transmission, N 3, 1984, pp.173--177.

\bibitem{R2}
Ryabko B.Ya. {\it A locally adaptive data compression scheme
(Letter).} Comm. ACM, v.30, N 9, 1987, p.792.

\bibitem{RR}
Ryabko B., Rissanen J. {\it Fast Adaptive Arithmetic Code for Large
Alphabet Sources with Asymmetrical Distributions.} IEEE
Communications Letters, v.7, N 1, 2003, pp.33--35.

\bibitem{RSS}
Ryabko B.Ya., Stognienko V.S., Shokin Yu.I. {\it A new test for
randomness and its application to some cryptographic problems.}
Journal of Statistical Planning and Inference, 2004 (accepted;
available online, doi:10.1016/S0378-3758(03)00149-6).

\end{thebibliography}

\end{document}