1: \documentclass[12pt]{article}
2:
3: \newif\ifnew
4: \newtrue
5: \newcommand{\old}[1]{\ifnew\else #1\fi}
6: \newcommand{\new}[1]{\ifnew #1\fi}
7:
8: \begin{document}
9:
10: \title{ \new{Using Information Theory Approach to Randomness Testing}
11: \footnote {
12: \new{ The authors were supported by INTAS
13: grant no. 00-738 and Russian Foundation for Basic Research under Grant no. 03-01-00495.}
14: } }
15: \author{ B. Ya. Ryabko and V.A. Monarev
16: }
17: \date{}
18: \maketitle
19:
20:
21: \begin{abstract}
22: We address the problem of detecting deviations of binary sequence
23: from randomness,which is very important for random number (RNG)
24: and pseudorandom number generators (PRNG). Namely, we consider a
25: null hypothesis $H_0$ that a given bit sequence is generated by
26: Bernoulli source with equal probabilities of 0 and 1 and the
27: alternative hypothesis $H_1$ that the sequence is generated by a
28: stationary and ergodic source which differs from the source under
29: $H_0$. We show that data compression methods can be used as a
30: basis for such testing and describe two new tests for randomness,
31: which are based on ideas of universal coding. Known statistical
32: tests and suggested ones are applied for testing PRNGs. Those
33: experiments show that the power of the new tests is greater than
34: of many known algorithms.
35:
36: \end{abstract}
37:
38: \textbf{Keywords:} { \it Hypothesis testing, Randomness testing,
39: Random number testing, Universal code, Information Theory, Random
40: number generator, Shannon entropy. }
41: %\end{keywords}
42: \newpage
43:
44: \section{Introduction }
45:
46: The randomness testing of random number and pseudorandom number
47: generators is used for many purposes including cryptographic,
48: modeling and simulation applications; see, for example, Knuth,
49: 1981; L'Ecuyer, 1994; Maurer,1992; Menezes A. and others, 1996.
50: For such applications a required bit sequence should be true
51: random, i.e., by definition, such a sequence could be interpreted
52: as the result of the flips of a "fair" coin with sides that are
53: labeled "0" and "1" (for short, it is called a random sequence;
54: see Rukhin and others, 2001). More formally, we will consider the
55: main hypothesis $H_0$ that a bit sequence is generated by the
56: Bernoulli source with equal probabilities of 0's and 1's.
57: Associated with this null hypothesis is the alternative
58: hypothesis $H_1$ that the sequence is generated by a stationary
59: and ergodic source which generates letters from $\{0,1\}$ and
60: differs from the source under $H_0$.
61:
62:
63: In this paper we will consider some tests which are based on
64: results and ideas of Information Theory and, in particular, the
65: source coding theory. First, we show that a universal code can be
66: used for randomness testing. (Let us recall that, by definition,
67: the universal code can compress a sequence asymptotically till the
68: Shannon entropy per letter when the sequence is generated by a
69: stationary and ergodic source). If we take into account that the
70: Shannon per-bit entropy is maximal (1 bit) if $H_0$ is true and is
71: less than 1 if $H_1$ is true (Billingsley, 1965; Gallager, 1968),
72: we see that it is natural to use this property and universal codes
73: for randomness testing because, in principle, such a test can
74: distinguish each deviation from randomness, which can be described
75: in a framework of the stationary and ergodic source model. Loosely
76: speaking, the test rejects $H_0$ if a binary sequence can be
77: compressed by a considered universal code (or a data compression
78: method.)
79:
80: It should be noted that the idea to use the compressibility as a
81: measure of randomness has a long history in mathematics. The point
82: is that, on the one hand, the problem of randomness testing is
83: quite important for practice, but, on the other hand, this problem
84: is closely connected with such deep theoretical issues as the
85: definition of randomness, the logical basis of probability
86: theory, randomness and complexity, etc; see Kolmogorov, 1965; Li
87: and Vitanyi, 1997; Knuth, 1981; Maurer,1992. Thus, Kolmogorov
88: suggested to define the randomness of a sequence, informally, as
89: the length of the shortest program, which can create the sequence
90: (if one of the universal Turing machines is used as a computer).
91: So, loosely speaking, the randomness (or Kolmogorov complexity) of
92: the finite sequence is equal to its shortest description. It is
93: known that the Kolmogorov complexity is not computable and,
94: therefore, cannot be used for randomness testing. On the other
95: hand, each lossless data compression code can be considered as a
96: method for upper bounding the Kolmogorov complexity. Indeed, if
97: $x$ is a binary word, $\phi$ is a data compression code and
98: $\phi(x)$ is the codeword of $x$, then the length of the codeword
99: $|\phi(x)|$ is the upper bound for the Kolmogorov complexity of
100: the word $x$. So, again we see that the codeword length of the
101: lossless data compression method can be used for randomness
102: testing.
103:
104: In this paper we suggest tests for randomness, which are based on
105: results and ideas of the source coding theory.
106:
107: Firstly, we show how to build a test basing on any data
108: compression method and give some examples of application of such
109: test to PRNG's testing. It should be noted that data compression
110: methods were considered as a basis for randomness testing in
111: literature. For example, Maurer's Universal Statistical Test,
112: Lempel-Ziv Compression Test and Approximate Entropy Test are
113: connected with universal codes and are quite popular in practice,
114: see, for example, Rukhin and others, 2001. In contrast to known
115: methods, the suggested approach gives a possibility to make a test
116: for randomness, basing on any lossless data compression method
117: even if a distribution law of the codeword lengths is not known.
118:
119: Secondly, we describe two new tests, conceptually connected with
120: universal codes. When both tests are applied, a tested sequence
121: $x_1 x_2 ... x_n$ is divided into subwords $x_1 x_2 ... x_s,$
122: $\:x_{s+1} x_{s+2} ... x_{2s},\: \ldots ,\,$ $s\geq 1,$ and the
123: hypothesis $H^*_0$ that the subwords obey the uniform distribution
124: (i.e. each subword is generated with the probability $2^{-s}$) is
125: tested against $H^*_1 =\neg H^*_0$. The key idea of the new tests
126: is as follows. All subwords from the set $ \{0,1\}^s $ are ordered
127: and this order changes after processing each subword $\:x_{j s+1}
128: x_{j s+2} ... x_{(j+1)s}, \, j= 0,1, \ldots $ in such a way that,
129: loosely speaking, the more frequent subwords have small ordinals.
130: When the new tests are applied, the frequency of different
131: ordinals are estimated (instead of frequencies of the subwords as
132: for, say, chi- square test).
133:
134: The natural question is how to choose the block length $s$ in such
135: schemes. We show that, informally speaking, the block length $s$
136: should be taken quite large due to the existence of so called
137: {\it two-faced processes}. More precisely, it is shown that for
138: each integer $s^*$ there exists such a process $\xi$ that for each
139: binary word $u$ the process $\xi$ creates $u$ with the probability
140: $2^{-|u|}$ if the length of the $u$ ($|u|$) is less than or equal
141: to $s^*$, but, on the other hand, the probability distribution
142: $\xi(v)$ is very far from uniform if the length of the words $v$
143: is greater than $s^*.$ (So, if we use a test with the block length
144: $s \leq s^*,$ the sequences generated by $\xi$ will look like
145: random, in spite of $\xi$ is far from being random.)
146:
147:
148: The outline of the paper is as follows. In Section 2 the general
149: method for construction randomness testing algorithms basing on
150: lossless data compressors is described. Two new tests for
151: randomness, which are based on constructions of universal coding,
152: as well as the two-faced processes, are described in the Section
153: 3. In Section 4 the new tests are experimentally compared with
154: methods from " A statistical test suite for random and
155: pseudorandom number generators for cryptographic applications",
156: which was recently suggested by Rukhin and others, 2001. It turns
157: out that the new tests are more powerful than known ones.
158: \section{Data compression methods as a basis for randomness testing }
159:
160: \textbf{2.1. Randomness testing based on data compression}
161:
162: Let $A$ be a finite alphabet and $A^n$ be the set of all words of
163: the length $n$ over $A$, where $n$ is an integer. By definition,
164: $A^* =\bigcup_{n=1}^\infty A^n $ and $A^\infty$ is the set of all
165: infinite words $x_1x_2 \ldots $ over the alphabet $A$. A data
166: compression method (or code) $\varphi$ is defined as a set of
167: mappings $\varphi_n $ such that $\varphi_n : A^n \rightarrow \{
168: 0,1 \}^*,\, n= 1,2, \ldots\, $ and for each pair of different
169: words $x,y \in A^n \:$ $\varphi_n(x) \neq \varphi_n(y) .$
170: Informally, it means that the code $\varphi$ can be applied for
171: compression of each message of any length $n, n
172: > 0 $ over alphabet $A$ and the message can be decoded if
173: its code is known.
174:
175: Now we can describe a statistical test which can be constructed
176: basing on any code $\varphi$. Let $n$ be an integer and
177: $\hat{H}_0$ be a hypothesis that the words from the set $ A^n $
178: obey the uniform distribution, i.e., $p(u)= |A|^{-n}\, $ for each
179: $ \, u \in \{0,1\}^n .$ (Here and below $|x|$ is the length if $x$
180: is a word, and the number of elements if $x$ is a set.) Let a
181: required level of significance (or a Type I error) be $\alpha ,\,
182: \alpha \in (0,1).$ The following main idea of a suggested test is
183: quite natural: The well compressed words should be considered as
184: non- random and $\hat{H}_0$ should be rejected. More exactly, we
185: define a critical value of the suggested test by
186: \begin{equation}\label{cr}
187: t_\alpha = n \log |A| - \log (1/ \alpha) - 1\,.
188: \end{equation}
189: (Here and below $\log x = \log_2 x$.)
190:
191: Let $u$ be a word from $A^n$. By definition, the hypothesis
192: $\hat{H}_0$ is accepted if $ |\varphi_n (u) | > t_\alpha $ and
193: rejected, if $ |\varphi_n (u) | \leq t_\alpha .$ We denote this
194: test by $\Gamma_{\alpha,\,\varphi}^{(n)}.$
195:
196: \textbf{Theorem 1.} { \it For each integer $n$ and a code
197: $\varphi$, the Type I error of the described test
198: $\Gamma_{\alpha,\,\varphi}^{(n)}$ is not larger than $\alpha .$ }
199:
200: \emph{Proof} is given in Appendix.
201:
202: \textbf{ Comment 1}. The described test can be modified in such a
203: way that the Type I error will be equal to $\alpha.$ For this
204: purpose we define the set $A_\gamma$ by $$ A_\gamma = \{x: x \in
205: A^n\: \: \& \:\;|\varphi_n(x)| = \gamma \} $$ and an integer $g$
206: for which the two following inequalities are valid:
207: \begin{equation}\label{s} \sum_{j=0}^g
208: |A_j|\: \leq\, \alpha |A|^n\, <\, \sum_{j=0}^{g+1} |A_j| \,.
209: \end{equation} Now the modified test can be described as
210: follows:
211:
212: If for $x \in A^n\;\; |\varphi_n(x)| \leq g\:\; $ then $\hat{H}_0$
213: is rejected, if $|\varphi_n(x)| > (g+1) \:$ then $\hat{H}_0$ is
214: accepted and if $|\varphi_n(x)| = (g+1) \:$ the hypothesis
215: $\hat{H}_0$ is accepted with the probability $$ (\sum_{j=1}^{g+1}
216: |A_j|\, - \, \alpha |A|^n\,) / |A_{g+1}| $$ and rejected with the
217: probability $$ 1\:-\, (\sum_{j=1}^{g+1} |A_j|\, - \, \alpha
218: |A|^n\,) / |A_{g+1}| \,.$$ (Here we used a randomized criterion,
219: see for definition, for example, Kendall and Stuart, 1961, part
220: 22.11.) We denote this test by
221: $\Upsilon_{\alpha,\,\varphi}^{(n)}.$
222:
223: \textbf{ Claim 1}. { \it For each integer $n$ and a code
224: $\varphi$, the Type I error of the described test
225: $\Upsilon_{\alpha,\,\varphi}^{(n)}$ is equal to $\alpha .$ }
226:
227:
228: \emph{Proof} is given in Appendix.
229:
230: We can see that this criterion has the level of significance (or
231: Type I error) exactly $\alpha,$ whereas the first criterion,
232: which is based on critical value (\ref{cr}), has the level of
233: significance that could be less than $\alpha .$ In spite of this
234: drawback, the first criterion may be more useful due to its
235: simplicity. Moreover, such an approach gives a possibility to use
236: a data compression method $\psi$ for testing even in case where
237: the distribution of the length $|\psi_n(x)|, x \in A^n$ is not
238: known.
239:
240: \textbf{ Comment 2.} We have considered codes, for which
241: different words of the same length have different codewords (In
242: Information Theory sometimes such codes are called non- singular.)
243: Quite often a stronger restriction is required in Information
244: Theory. Namely, it is required that each sequence
245: $\varphi_n(x_1)\varphi_n(x_2) ...\varphi(x_r), r \geq 1,$ of
246: encoded words from the set $A^n, n\geq 1,$ can be uniquely decoded
247: into $x_1x_2 ...x_r$. Such codes are called uniquely decodable.
248: For example, let $A=\{a,b\}$, the code $\psi_1(a) = 0, \psi_1(b) =
249: 00, $ obviously, is non- singular, but is not uniquely decodable.
250: (Indeed, the word $000$ can be decoded in both $ab$ and $ba.$) It
251: is well known in Information Theory that a code $\varphi$ can be
252: uniquely decoded if the following Kraft inequality is valid:
253: \begin{equation}\label{KRAFT}
254: \Sigma_{u \in A^n}\: 2^{- |\varphi_n (u) |} \leq 1\:,
255: \end{equation}
256: see, for ex., Gallager, 1968.
257:
258: If it is known that the code is uniquely decodable, the suggested
259: critical value (\ref{cr}) can be changed. Let us define \begin{equation}\label{cr2}
260: \hat{t}_\alpha = n \log |A| - \log (1/ \alpha) \,.
261: \end{equation}
262:
263: Let, as before, $u$ be a word from $A^n$. By definition, the
264: hypothesis $\hat{H}_0$ is accepted if $ |\varphi_n (u) | >
265: \hat{t}_\alpha $ and rejected, if $ |\varphi_n (u) | \leq
266: \hat{t}_\alpha .$ We denote this test by
267: $\hat{\Gamma}_{\alpha,\varphi}^{(n)}.$
268:
269: \textbf{Claim 2.} { \it For each integer $n$ and a uniquely
270: decodable code $\varphi$, the Type I error of the described test
271: $\hat{\Gamma}_{\alpha,\varphi}^{(n)}$ is not larger than
272: $\alpha.$}
273:
274: \emph{Proof} is given in Appendix.
275:
276:
277: So, we can see from (\ref{cr}) and (\ref{cr2}) that the critical
278: value is larger, if the code is uniquely decodable. On the other
279: hand, the difference is quite small and (\ref{cr}) can be used
280: without a large loose of the test power even in a case of the
281: uniquely decodable codes.
282:
283: It should not be a surprise that the level of significance (or a
284: Type I error) does not depend on the alternative hypothesis $H_1,$
285: but, of course, the power of a test (and the Type II error) will
286: be determined by $H_1.$
287:
288: The examples of testing by real data compression methods will be
289: given in Section 4.
290:
291:
292:
293: \textbf{ 2.2. Randomness testing based on universal codes. }
294:
295:
296: We will consider the main
297: hypothesis $H_0$ that the letters of a given sequence $x_1x_2
298: ...x_t, \, x_i \in A,\, $ are independent and identically
299: distributed (i.i.d.) with equal probabilities of all $a \in A $
300: and the alternative hypothesis $H_1$ that the sequence is
301: generated by a stationary and ergodic source, which generates
302: letters from $A$ and differs from the source under $H_0$. (If $A=
303: \{0,1\}$, i.i.d. coincides with Bernoulli source.) The definition
304: of the stationary and ergodic source and the Shannon entropy of
305: such sources can be found in Billingsley, 1965, and Gallager,
306: 1968.
307:
308: We will consider statistical tests, which are based on universal
309: coding and universal prediction. First we define a universal code.
310:
311: By definition, $\varphi$ is a universal code if for each
312: stationary and ergodic source (or a process) $\pi$ the following
313: equality is valid with probability 1 (according to the measure
314: $\pi \,) $
315:
316: \begin{equation}\label{un}
317: \lim_{n \rightarrow \infty} (|\varphi_n(x_1 ... x_n)|) /
318: n = h(\pi)\,,
319: \end{equation}
320: where $h(\pi)$ is the Shannon entropy. ( Such codes exist, see
321: Ryabko, 1984.) It is well known in Information Theory that
322: $h(\pi)= \log |A|$ if $H_0$ is true, and $h(\pi)< \log |A|$ if
323: $H_1$ is true, see, for ex., Billingsley, 1965; Gallager, 1968.
324: From this property and (\ref{un}) we can easily yield the
325: following theorem.
326:
327: \textbf{Theorem 2.} { \it Let $\varphi$ be a universal code,
328: $\alpha \in (0,1)$ be a level of significance and a sequence
329: $x_1x_2 ...x_n, \, n \geq 1, \, $ be generated by a stationary
330: ergodic source $\pi$. If the described above test
331: $\Gamma_{\alpha,\,\varphi}^{(n)}$ is applied for testing $H_0$
332: (against $H_1$), then, with probability 1, the Type I error is not
333: larger than $\alpha$, and the Type II error goes to 0, when
334: $n\rightarrow \infty$. }
335:
336: So, we can see that each good universal code can be used as a
337: basis for randomness testing. But converse proposition is not
338: true. Let, for example, there be a code, whose codeword length is
339: asymptotically equal to $ (0.5+ h(\pi) / 2 ) $ for each source
340: $\pi$ (with probability 1, where, as before, $h(\pi)$ is the
341: Shannon entropy). This code is not good, because its codeword
342: length does not tend to the entropy, but, obviously, such code
343: could be used as a basis for a test of randomness. So, informally
344: speaking, the set of tests is larger than the set of universal
345: codes.
346:
347: Note that the close problems were considered by Bailey (1974), who
348: obtained many important results in this field.
349:
350: \section{Two new tests for randomness and two-faced processes }
351:
352: Firstly, we suggest two tests which are based on ideas of
353: universal coding, but they are described in such a way that can be
354: understood without any knowledge of Information Theory.
355:
356:
357: \textbf{ 3.1. The "book stack" test }
358:
359: Let, as before, there be given an alphabet $A= \{a_1, ... , a_S
360: \},$ a source, which generates letters from $A,$ and two following
361: hypotheses: the source is i.i.d. and $p(a_1)= ....= p(a_S) =
362: 1/S\:$ ($H_0$) and $H_1 = \neg H_0.$ We should test the hypotheses
363: basing on a sample $x_1 x_2 \,... \,x_n,\, n\geq 1\,,\,$ generated
364: by the source. When the "book stack" test is applied, all letters
365: from $A$ are ordered from 1 to $S$ and this order is changed after
366: observing each letter $x_t$ according to the formula
367:
368: \begin{equation}\label{nu}
369: \nu^{t+1}(a)=\cases{1,&if $x_t = a\,$;\cr
370: \nu^t(a)+1,&if $\nu^t(a) < \nu^t(x_t)$;\cr
371: \nu^t(a), &if $ \nu^t(a) > \nu^t(x_t)$\, ,}
372: \end{equation}
373: where $\nu^t$ is the order after observing $x_1 x_2 \,... \,x_t,\,
374: t = 1\,,, ...\,, n\,,$ $\nu^1$ is defined arbitrarily. (For ex.,
375: we can define $\nu^1 = \{a_1, ... , a_S \}.$) Let us explain
376: (\ref{nu}) informally. Suppose that the letters of $A$ make a
377: stack, like a stack of books and $\nu^1(a)$ is a position of $a$
378: in the stack. Let the first letter $x_1$ of the word $x_1 x_2
379: \,... \,x_n$ be $a$. If it takes $i_1-$th position in the stack
380: ($\nu^1(a)= i_1$), then take $a$ out of the stack and put it on
381: the top. (It means that the order is changed according to
382: (\ref{nu}).) Repeat the procedure with the second letter $x_2$
383: and the stack obtained, etc.
384:
385: It can help to understand the main idea of the suggested method
386: if we take into account that, if $H_1$ is true, then frequent
387: letters from $A$ (as frequently used books) will have relatively
388: small numbers (will spend more time next to the top of the stack).
389: On the other hand, if $H_0$ is true, the probability to find each
390: letter $x_i$ at each position $j$ is equal to $1/S$.
391:
392: Let us proceed with the description of the test. The set of all
393: indexes $ \{1, \ldots, S \} $ is divided into $r, r \geq 2, $
394: subsets $A_1 = \{ 1,2,\ldots, k_1 \}, $ $ A_2 = \{ k_1+1,\ldots,
395: k_2 \}, \ldots , A_r = \{ k_{r-1}+1,\ldots, k_r \}.$ Then, using
396: $x_1 x_2 \,... \,x_n$, we calculate how many $\nu^t(x_t),$ $
397: t=1,..., n,$ belong to a subset $A_k, k=1,..., r$. We define this
398: number as $n_k$ (or, more formally, $n_k = | \{ t : \nu^t(x_t) \in
399: A_k, t=1,\ldots, n \}|, k=1,..., r .$) Obviously, if $H_0$ is
400: true, the probability of the event $ \nu^t(x_t) \in A_k$ is equal
401: to $ |A_j|/S.$ Then, using a "common" chi- square test we test the
402: hypothesis $\hat{H}_0= P\{ \nu^t(x_t) \in A_k \}= |A_k|/S $
403: basing on the empirical frequencies $n_1,\ldots,n_r$, against
404: $\hat{H}_1= \neg \hat{H}_0.$ Let us recall that the value
405: \begin{equation}\label{x2}
406: x^2=\sum_{i=1}^{r}\frac{(n_i - n (|A_i|/S ) )^2}{n
407: (|A_i|/S )} \end{equation} is calculated, when chi- square test
408: is applied, see, for ex., Kendall and Stuart, 1961. It is known
409: that $x^2$ asymptotically follows the $\chi$-square distribution
410: with $(k-1)$ degrees of freedom ($\chi^2_{k-1}$) if $\hat{H}_0$ is
411: true. If the level of significance (or a Type I error)
412: of the $\chi^2$
413: test is $\alpha, \alpha \in (0,1), $ the hypothesis $\hat{H}_0$ is
414: accepted when $x^2$ from (\ref{x2}) is less than the
415: \emph{$(1-\alpha)$ -value } of the $\chi^2_{k-1}$ distribution;
416: see, for ex., Kendall, Stuart, 1961.
417:
418: We do not describe the exact rule how to construct the subsets
419: $\{A_1, A_2, $ $ \ldots, $ $ A_r \}$, but we recommend to
420: perform some experiments for finding the parameters, which make
421: the sample size minimal (or, at least, acceptable). The point is
422: that there are many cryptographic and other applications where it
423: is possible to implement some experiments for optimizing the
424: parameter values and, then, to test hypothesis basing on
425: independent data. For example, in case of testing a PRNG it is
426: possible to seek suitable parameters using a part of generated
427: sequence and then to test the PRNG using a new part of the
428: sequence.
429:
430: Let us consider a simple example. Let $A= \{a_1, \ldots , a_6 \},
431: $ $ r=2, A_1= \{a_1,a_2, a_3 \} , A_2= \{a_4, a_5, a_6 \}, $ $
432: x_1 \ldots x_8 =$ $ a_3 a_6 a_3 a_3 a_6 a_1 a_6 a_1.$ If $\nu_1=
433: 1, 2, 3, 4,$ $ 5,6 ,$ then $\nu_2= 3, 1, 2, 4, 5,6 ,$ $\nu_3= 6,
434: 3, 1, 2, 4, 5 ,$ etc., and $n_1 = 7, n_2 = 1.$ We can see that
435: the letters $ a_3 $ and $a_6$ are quite frequent and the "book
436: stack" indicates this nonuniformity quite well. (Indeed, the
437: average values of $n_1$ and $n_2$ equal $4$, whereas the real
438: values are 7 and 1, correspondingly.)
439:
440: Examples of practical applications of this test will be given in
441: Section 4, but here we make two notes. Firstly, we pay attention
442: to the complexity of this algorithm. The "naive" method of
443: transformation according to (\ref{nu}) could take the number of
444: operations proportional to $S,$ but there exist algorithms, which
445: can perform all operations in (\ref{nu}) using $O( \log S )$
446: operations. Such algorithms can be based on AVL- trees, see, for
447: ex., Aho,Hopcroft and Ulman, 1976.
448:
449: The last comment concerns with the name of the method. The "book
450: stack" structure is quite popular in Information Theory and
451: Computer Science. In Information Theory this structure was firstly
452: suggested as a basis of an universal code by Ryabko, 1980, and
453: was rediscovered by Bently, Sleator, Tarjan, Wei in 1986, and
454: Elias in 1987 (see also a comment of Ryabko (1987) about a history
455: of this code). In English language literature this code is
456: frequently called as "Move-to-Front" (MTF) scheme as it was
457: suggested by Bently, Sleator, Tarjan and Wei. Now this data
458: structure is used in a caching and many other algorithms in
459: Computer Science under the name "Move-to-Front". It is also worth
460: noting that the book stack was firstly considered by a soviet
461: mathematician M.L. Cetlin as an example of a self- adaptive
462: system in 1960's, see Rozanov, 1971.
463:
464:
465:
466:
467:
468: \textbf{ 3.2. The order test }
469:
470: This test is also based on changing the order $\nu^t(a)$ of
471: alphabet letters but the rule of the order change differs from
472: (\ref{nu}). To describe the rule we first define $
473: \lambda^{t+1}(a)$ as a count of occurrences of $a$ in the word
474: $x_1\ldots x_{t-1}x_t .$ At each moment $t$ the alphabet letters
475: are ordered according to $\nu^t$ in such a way that, by
476: definition, for each pair of letters $a$ and $b$ $\nu^t(a) \prec
477: \nu^t(b)$ if $\lambda^t(a) \leq \lambda^t(b).$ For example, if $A=
478: \{a_1, a_2, a_3 \}$ and $x_1 x_2 x_3 = a_3 a_2 a_3$, the possible
479: orders can be as follows: $\nu^1=(1, 2, 3),$ $ \nu^2=(3, 1, 2),$ $
480: \nu^3=(3, 2, 1),$ $ \nu^4=(3, 2, 1).$ In all other respects this
481: method coincides with the book stack. (The set of all indexes $
482: \{1, \ldots, S \} $ is divided into $r $ subsets,
483: etc.)
484:
485: Obviously, after observing each letter $x_t$ the value
486: $\lambda^t(x_t)$ should be increased and the order $\nu^t$ should
487: be changed. It is worth noting that there exist a data structure
488: and algorithm, which allow maintaining the alphabet letters
489: ordered in such a way that the number of operations spent is
490: constant, independently of the size of the alphabet. This data
491: structure was described by Moffat, 1999 and Ryabko, Rissanen,
492: 2003.
493:
494: \textbf{ 3.3. Two- faced processes and the choice of the block
495: length for a process testing }
496:
497: There are quite many methods for testing $H_0$ and $H_1$, where
498: the bit stream is divided into words (blocks) of the length $s, s
499: \geq 1,$ and the sequence of the blocks $x_1x_2\ldots x_s$,
500: $x_{s+1}\ldots x_{2s},\ldots $ is considered as letters, where
501: each letter belongs to the alphabet $B_s = \{ 0,1 \}^s $ and has
502: the probability $2^{-s},$ if $H_0$ is true. For instance, both
503: above described tests, methods from Ryabko, Stognienko and Shokin
504: (2003) and quite many other algorithms belong to this kind. That
505: is why the questions of choosing the block length $s$ will be
506: considered here.
507:
508: As it was mentioned in the introduction there exist two-faced
509: processes, which, on the one hand, are far from being truly
510: random, but, on the other hand, they can be distinguished from
511: truly random only in the case when the block length $s$ is large.
512: From the information theoretical point of view the two- faced
513: processes can be simply described as follows. For a two- faced
514: process, which generates letters from $ \{ 0,1 \}$, the limit
515: Shannon entropy is (much) less than 1 and, on the other hand, the
516: $s-$ order entropy ($h_s$) is maximal $(h_s =1$ bit per letter)
517: for relatively large $s.$
518:
519: We describe two families of two- faced processes $T(k, \pi)$ and
520: $\bar{T}(k, \pi)$, where $k=1,2, \ldots,\,$ and $ \pi \in (0,1)$
521: are parameters. The processes $T(k,\pi)$ and $\bar{T}(k, \pi)$ are
522: Markov chains of the connectivity (memory) $k$, which generate
523: letters from $\{0,1\}$. It is convenient to define them
524: inductively. The process $T(1,\pi)$ is defined by conditional
525: probabilities $P_{T(1, \pi)}(0/0) = \pi, P_{T(1, \pi)}(0/1) =
526: 1-\pi $ (obviously, $P_{T(1, \pi)}(1/0) =1- \pi, P_{T(1,
527: \pi)}(1/1) = \pi $). The process $\bar{T}(1,\pi)$ is defined by
528: $P_{\bar{T}(1, \pi)}(0/0) =1- \pi, P_{\bar{T}(1, \pi)}(0/1) = \pi
529: $. Assume that $T(k, \pi)$ and $\bar{T}(k, \pi)$ are defined and
530: describe $T(k+1, \pi)$ and $\bar{T}(k+1, \pi)$ as follows $$
531: P_{T(k+1, \pi)}(0/0u) = P_{T(k, \pi)}(0/u), P_{T(k+1, \pi)}(1/0u)
532: = P_{T(k, \pi)}(1/u), $$ $$ P_{T(k+1, \pi)}(0/1u) = P_{\bar{T}(k,
533: \pi)}(0/u), P_{T(k+1, \pi)}(1/1u) = P_{\bar{T}(k, \pi)}(1/u) ,$$
534: and, vice versa, $$ P_{\bar{T}(k+1, \pi)}(0/0u) = P_{\bar{T}(k,
535: \pi)}(0/u), P_{\bar{T}(k+1, \pi)}(1/0u) = P_{\bar{T}(k,
536: \pi)}(1/u), $$ $$ P_{\bar{T}(k+1, \pi)}(0/1u) = P_{T(k,
537: \pi)}(0/u), P_{\bar{T}(k+1, \pi)}(1/1u) = P_{T(k, \pi)}(1/u) $$
538: for each $u \in B_k$ (here $vu$ is a concatenation of the words
539: $v$ and $u$). For example, $$ P_{T(2,\pi)}(0/00) = \pi,
540: P_{T(2,\pi)}(0/01) = 1-\pi, P_{T(2,\pi)}(0/10) = 1-\pi,
541: P_{T(2,\pi)}(0/11) = \pi. $$ The following theorem shows that the
542: two-faced processes exist.
543:
544: \textbf{Theorem 3.} { \it For each $\pi \in (0,1) $ the s-order
545: Shannon entropy ($h_s$) of the processes $T(k, \pi)$ and
546: $\bar{T}(k, \pi)$ equals 1 bit per letter for $s=0,1,\ldots , k$
547: whereas the limit Shannon entropy ($h_\infty $) equals $ - (\pi
548: \log_2 \pi + (1- \pi) \log_2 (1-\pi) ).$ }
549:
550: The proofs of the theorem is given in Appendix, but here we
551: consider examples of "typical" sequences of the processes
552: $T(1,\pi)$ and $\bar{T}(1,\pi)$ for $\pi$, say, 1/5. Examples are:
553: $ 010101101010100101...$ and $ 000011111000111111000.... .$ We can
554: see that each sequence contains approximately one half of 1's and
555: one half of 0's. (That is why the first order Shannon entropy is
556: 1 per a letter.) On the other hand, both sequences do not look
557: like truly random, because they, obviously, have too long
558: subwords like either $101010 ..$ or $000.. 11111.. .$ (In other
559: words, the second order Shannon entropy is much less than 1 per
560: letter.) Hence, if a randomness test is based on estimation of
561: frequencies of 0's and 1's only, then such a test will not be
562: able to find deviations from randomness.
563:
564: So, if we revert to the question about the block length of tests
565: and take into account the existence of two- faced processes, it
566: seems that the block length could be taken as large as possible.
567: But it is not so. The following informal consideration could be
568: useful for choosing the block length. The point is that
569: statistical tests can be applied if words from the sequence
570:
571: \begin{equation}\label{s}
572: x_1x_2\ldots x_s, \:x_{s+1}\ldots x_{2s},\ldots, \:x_{(m-1)s+1}
573: x_{(m-1)s+2}\ldots x_{m s} \end{equation}
574: are repeated (at least a few times)
575: with high probability (here $m s $ is the sample length).
576: Otherwise, if all words in (\ref{s}) are unique (with high
577: probability) when $H_0$ is true, a sensible test cannot be
578: constructed basing on a division into $s-$letter words. So, the
579: word length $s$ should be chosen in such a way that some words
580: from the sequence (\ref{s}) are repeated with high probability,
581: when $H_0$ is true. So, now our problem can be formulated as
582: follows. There is a binary sequence $x_1x_2\ldots x_n$ generated
583: by the Bernoulli source with $P(x_i=0)= P(x_i=1)= 1/2 $ and we
584: want to find such a block length $s$ that the sequence (\ref{s})
585: with $ m= \lfloor n/s\rfloor, $ contains some repetitions (with
586: high probability). This problem is well known in the probability
587: theory and sometimes called as the birthday problem. Namely, the
588: standard statement of the problem is as follows. There are $S=
589: 2^s$ cells and $m\, (=n/s)$ pellets. Each pellet is put in one of
590: the cells with the probability $1/S$. It is known in Probability
591: Theory that, if $m = c\, \sqrt{ S}, c >0$ then the average number
592: of cells with at least two pellets equals $c^2\, (1/2 + \circ
593: (1)\,),$ where $S$ goes to $\infty \,;$ see Kolchin, Sevast'yanov
594: and Chistyakov, 1976. In our case the number of cells with at
595: least two pellets is equal to the number of the words from the
596: sequence (\ref{s}) which are met two (or more) times. Having into
597: account that $S=2^s, m= n/s,$ we obtain from $m = c\, \sqrt{ S}, c
598: >0$ an informal rule for choosing the length of words
599: in (\ref{s}): \begin{equation}\label{sn} n \asymp s 2^{s/2}
600: \end{equation} where $n$ is the length of a sample $x_1x_2
601: ... x_n,$ $s$ is the block length. If $s$ is much larger, the
602: sequence (\ref{s}) does not have repeated words (in case $H_0$ )
603: and it is difficult to build a a sensible test. On the other hand,
604: if $s$ is much smaller, large classes of the alternative
605: hypotheses cannot be tested (due to existence of the two-faced
606: processes). It is worth noting that it is impossible to have a
607: universal choice of $s,$ because it is impossible to avoid the
608: two- faced phenomenon. In other words this fact can be explained
609: basing on the following known result of Information Theory: it is
610: impossible to have guaranteed rate of code convergence
611: universally for all ergodic sources; see Bailey, 1976, Ryabko,
612: 1984. That is why, it is impossible to choose a universal length
613: $s.$ On the other hand, there are many applications where the
614: word length $s$ can be chosen experimentally. (But, of course,
615: such experiments should be performed on the independent data.)
616:
617:
618:
619: \section{The experiments }
620:
621: In this part we describe some experiments carried out to compare
622: new tests with known ones. We will compare order test, book stack
623: test, tests which are based on standard data compression methods,
624: and tests from Rukhin and others, 2001. The point is that the
625: tests from Rukhin and others are selected basing on comprehensive
626: theoretical and experimental analysis and can be considered as the
627: state-of-the-art in randomness testing. Besides, we will also test
628: the method recently published by Ryabko, Stognienko, Shokin,
629: (2004), because it was published later than the book of Rukhin
630: and others.
631:
632: We used data generated by the PRNG "RANDU" (described in
633: Dudewicz and Ralley, 1981) and random bits from "The Marsaglia
634: Random Number CDROM", see: http://stat.fsu.edu/diehard/cdrom/ ).
635: RANDU is a linear congruent generators (LCG), which is defined by
636: the following equality $$X_{n+1}=(A \: X_n+C)\: mod\, M \, ,$$
637: where $X_{n}$ is $n$-th generated number. RANDU is defined by
638: parameters $A=2^{16}+3 , C= 0 , M= 2^{31} , X_0 = 1.$ Those kinds
639: of sources of random data were chosen because random bits from
640: "The Marsaglia Random Number CDROM" are considered as good random
641: numbers, whereas it is known that RANDU is not a good PRNG. It is
642: known that the lowest digits of $X_n$ are "less random" than
643: the leading digits (Knuth, 1981), that is why in our experiments
644: with RANDU we extract an eight-bit word from each generated $X_i$
645: by formula $ \hat{X}_i = \lfloor X_i/2^{23} \rfloor .$
646:
647:
648: The behavior of the tests was investigated for files of different
649: lengths (see the tables below). We generated 100 different files
650: of each length and applied each mentioned above test to each file
651: with level of significance 0.01 (or less, see below). So, if a
652: test is applied to a truly random bit sequence, on average 1 file
653: from 100 should be rejected. All results are given in the tables,
654: where integers in boxes are the number of rejected files (from
655: 100). If a number of the rejections is not given for a certain
656: length and test, it means that the test cannot be applied for
657: files of such a length.
658:
659: The table 1 contains information about testing of sequences of
660: different lengths generated by RANDU, whereas the table 2 contains
661: results of application of all tests to 5 000 000- bit sequences
662: either generated by RANDU or taken from "The Marsaglia Random
663: Number CDROM".
664: For example, the first number of the second row of
665: the table 1 is 56. It means that there were 100 files of the
666: length $5 \: 10^4$ bits generated by PRNG RANDU. When the Order
667: test was applied, the hypothesis $H_0$ was rejected 56 times from
668: 100 (and, correspondingly, $H_0$ was accepted 44 times.) The first
669: number of the third line shows that $H_0$ was rejected 42 times,
670: when the Book stack test was applied to the same 100 files. The
671: third number of the second line shows that the hypothesis $H_0$
672: was rejected 100 times, when the Order test was applied for
673: testing of 100 $ 100000-$bit files generated by RANDU, etc.
674:
675:
676: Let us first give some comments about the tests, which are based
677: on popular data compression methods RAR and ARJ. In those cases we
678: applied each method to a file and first estimated the length of
679: compressed data. Then we use the test
680: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
681: $\Gamma_{\alpha,\,\varphi}^{(n)}$ with the critical value
682: (\ref{cr}) as follows. The alphabet size $|A|= 2^8 = 256$, $ n
683: \log |A|$ is simply the length of file (in bits) before
684: compression, (whereas $n$ is the length in bytes). So, taking
685: $\alpha = 0.01,$ from (\ref{cr}), we see that the hypothesis
686: about randomness ($H_0$) should be rejected, if the length of
687: compressed file is less than or equal to $ n \log |A| - 8$ bits.
688: (Strictly speaking, in this case $\alpha \leq 2^{-7} = 1/128.$)
689: So, taking into account that the length of computer files is
690: measured in bytes, this rule is very simple: if the $n-$byte file
691: is really compressed (i.e. the length of the encoded file is
692: $n-1$ bytes or less), this file is not random (and $H_0$ is
693: rejected). So, the following tables contain numbers of cases,
694: where files were really compressed.
695:
696:
697: Let us now give some comments about parameters of the considered
698: methods. As it was mentioned, we investigated all methods from the
699: book of Rukhin and others (2001), the test of Ryabko, Stognienko
700: and Shokin, 2004 (RSS test for short), the described above two
701: tests based on data compression algorithms, the order tests and
702: the book stack test. For some tests there are parameters, which
703: should be specified. In such cases the values of parameters are
704: given in the table in the row, which follows the test results.
705: There are some tests from the book of Rukhin and others, where
706: parameters can be chosen from a certain interval. In such cases
707: we repeated all calculations three times, taking the minimal
708: possible value of the parameter, the maximal one and the average
709: one. Then the data for the case when the number of rejections of
710: the hypothesis $H_0$ is maximal, is given in the table.
711:
712: The choice of parameters for RSS, the book stack test and the
713: order test was made on the basis of special experiments, which
714: were carried out for independent data. (Those algorithms are
715: implemented as a Java program and can be found on the internet,
716: see $ http://web.ict.nsc.ru/\: \tilde{}\: rng/ $.) In all cases
717: such experiments have shown that for all three algorithms the
718: optimal blocklength is close to the one defined by informal
719: equality (\ref{sn}).
720:
721: We can see from the tables that the new tests can detect
722: non-randomness more efficiently than the known ones. Seemingly,
723: the main reason is that RSS, book stack tests and order test deal
724: with such large blocklength as it is possible, whereas many other
725: tests are focused on other goals. The second reason could be an
726: ability for adaptation. The point is that the new tests can find
727: subwords, which are more frequent than others, and use them for
728: testing, whereas many other tests are looking for particular
729: deviations from randomness.
730:
731: In conclusion, we can say that the obtained results show that the
732: new tests, as well as the ideas of Information Theory in general,
733: can be useful tools for randomness testing.
734:
735:
736:
737: \begin{table}[h]
738: \caption{ Number of files generated by PRNG RANDU and recognized
739: as non-random for different tests and different file lengths (in
740: bits). }
741: \begin{center}
742: \begin{tabular}{|c|c|c|c|c|}
743:
744: \hline \rule{0pt}{2.8ex}Name of test/Length of file
745: &$5 \: 10^4$ &$10^5$&$ 5 \:10^5$& $10^6$\\
746: \hline \rule{0pt}{2.3ex}Order test &56&100&100&100\\
747: \rule{0pt}{2.3ex}Book stack &42&100&100&100\\
748: \cline{2-5}
749: \rule{0pt}{2.3ex}{\it parameters for both tests} &\multicolumn{4}{|c|} { s=20, $|A_1|=5\sqrt{2^{s}}$}\\
750: \hline
751: \rule{0pt}{2.3ex}RSS &4&75&100&100\\
752: \cline{4-5}
753: \rule{0pt}{2.3ex}{\it parameters} &s=16&s=17&\multicolumn{2}{|c|} {s=20}\\
754: \hline
755: \rule{0pt}{2.3ex} RAR &0&0&100&100\\
756: \rule{0pt}{2.3ex} ARJ &0&0&99&100\\
757: \hline \rule{0pt}{2.3ex}Frequency& 2&1&1&2\\ \hline
758: \rule{0pt}{2.3ex}Block Frequency &1&2&1&1\\
759: \rule{0pt}{2.3ex}{\it parameters} &M=1000&M=2000&$M=10^5$&M=20000\\
760: \hline \rule{0pt}{2.3ex}Cumulative Sums&2&1&2&1\\ \hline
761: \rule{0pt}{2.3ex}Runs&0&2&1&1\\
762: \hline
763: \rule{0pt}{2.3ex}Longest Run of Ones &0&1&0&0\\
764: \hline
765: \rule{0pt}{2.3ex}Rank &0&1&1&0\\
766: \hline \rule{0pt}{2.3ex}Discrete Fourier Transform &0&0&0&1\\
767: \hline
768: \rule{0pt}{2.3ex}NonOverlapping Templates &--&--&--&2\\
769: \rule{0pt}{2.3ex}{\it parameters} &&&&m=10\\
770: \hline\rule{0pt}{2.3ex} Overlapping Templates&--&--&--&2\\
771: \rule{0pt}{2.3ex}{\it parameters} &&&&m=10\\
772: \hline
773: \rule{0pt}{2.3ex}Universal Statistical &--&--&1&1\\
774: \rule{0pt}{2.3ex}{\it parameters} &&&L=6&L=7\\
775: \rule{0pt}{2.3ex} &&& Q=640&Q=1280\\
776: \hline
777: \rule{0pt}{2.3ex}Approximate Entropy&1&2&2&7\\
778: \rule{0pt}{2.3ex}{\it parameters} &m=5&m=11&m=13&m=14\\
779: \hline \rule{0pt}{2.3ex}Random Excursions &--&--&--&2\\ \hline
780: \rule{0pt}{2.3ex}Random Excursions Variant&--&--&--&2\\ \hline
781: \rule{0pt}{2.3ex}Serial &0&1&2&2\\
782: \rule{0pt}{2.3ex}{\it parameters} &m=6&m=14&m=16&m=8\\
783: \hline \rule{0pt}{2.3ex}Lempel-Ziv Complexity&--&--&--&1\\ \hline
784: \rule{0pt}{2.3ex}Linear Complexity &--&--&--&3\\
785: \rule{0pt}{2.3ex}{\it parameters} &&&&M=2500\\
786: \hline
787: \end{tabular}
788: \end{center}
789: \end{table}
790:
791: \begin{table}[!hbt]
792: %\refstepcounter{table}
793: \caption{ Number of $5 \,000 \,000-$ bit files
794: generated by PRNG RANDU and random, which are recognized as
795: non-random. }
796: \begin{center}
797: \begin{tabular}{|c|c|c|}
798: \hline
799: \rule{0pt}{2.8ex}Name of test/ Kind of file
800: &$ \: RANDU$ &$random $\\
801:
802:
803: \hline \rule{0pt}{2.3ex}Order test &100&3\\
804: \rule{0pt}{2.3ex}Book stack &100&0\\
805: \cline{2-3}
806:
807: \rule{0pt}{2.3ex}{\it parameters for both tests} &\multicolumn{2}{|c|} {s=24, $|A_1|=5\sqrt{2^{s}}$}\\
808: \hline
809: \rule{0pt}{2.3ex}RSS &100&1\\
810: \cline{2-3}
811:
812: \rule{0pt}{2.3ex}{\it parameters} &s=24&s=24\\
813: \hline
814: \rule{0pt}{2.3ex} RAR &100&0\\
815: \rule{0pt}{2.3ex} ARJ &100&0\\
816:
817: \hline \rule{0pt}{2.3ex}Frequency& 2&1\\
818: \hline \rule{0pt}{2.3ex}Block Frequency &2&1\\
819: \rule{0pt}{2.3ex}{\it parameters} &$M=10^6$&$M=10^5$\\
820: \hline
821:
822: \rule{0pt}{2.3ex}Cumulative Sums&3&2\\
823: \hline
824:
825: \rule{0pt}{2.3ex}Runs&2&2\\
826: \hline
827:
828: \rule{0pt}{2.3ex}Longest Run of Ones &2&0\\
829: \hline
830:
831: \rule{0pt}{2.3ex}Rank &1&1\\
832: \hline
833:
834: \rule{0pt}{2.3ex}Discrete Fourier Transform &89&9\\
835: \hline
836:
837: \rule{0pt}{2.3ex} NonOverlapping Templates&5&5\\
838: \rule{0pt}{2.3ex}{\it parameters}&m=10&m=10\\
839:
840: \hline
841:
842: \rule{0pt}{2.3ex} Overlapping Templates&4&1\\
843: \rule{0pt}{2.3ex}{\it parameters} &m=10&m=10\\
844:
845: \hline
846:
847: \rule{0pt}{2.3ex}Universal Statistical &1&2\\
848: \rule{0pt}{2.3ex}{\it parameters} &L=9&L=9\\
849: \rule{0pt}{2.3ex} &Q=5120&Q=5120\\
850:
851: \hline
852:
853: \rule{0pt}{2.3ex}Approximate Entropy&100&89\\
854: \rule{0pt}{2.3ex}{\it parameters} &m=17&m=17\\
855:
856: \hline
857:
858: \rule{0pt}{2.3ex}Random Excursions &4&3\\
859: \hline
860:
861: \rule{0pt}{2.3ex}Random Excursions Variant&3&3\\
862: \hline
863:
864: \rule{0pt}{2.3ex}Serial &100&2\\
865: \rule{0pt}{2.3ex}{\it parameters} &m=19&m=19\\
866:
867: \hline
868:
869: \rule{0pt}{2.3ex}Lempel-Ziv Complexity&0&0\\
870: \hline
871:
872: \rule{0pt}{2.3ex}Linear Complexity &4&3\\
873: \rule{0pt}{2.3ex}{\it parameters} &M=5000 & M=2500 \\
874:
875: \hline
876: \end{tabular}
877: \end{center}
878: \end{table}
879:
880:
881:
882:
883: \section{Appendix. }
884:
885: \emph{Proof} of Theorem 1. First we estimate the number of words
886: $\varphi_n(u) $ whose length is less than or equal to an integer
887: $\tau$. Obviously, at most one word can be encoded by the empty
888: codeword, at most two words by the words of the length 1, ..., at
889: most $2^i$ can be encoded by the words of length $i,$ etc. Having
890: taken into account that the codewords $\varphi_n(u) \neq
891: \varphi_n(v)$ for different $u$ and $v$, we obtain the inequality
892: $$ | \{ u: |\varphi_n(u) | \leq \tau \} | \leq \sum_{i=0}^\tau 2^i
893: = 2^{\tau+1}- 1. $$ From this inequality and (\ref{cr}) we can see
894: that the number of words from the set $ \{A^n \} ,$ whose
895: codelength is less than or equal to $t_\alpha = n \log |A| - \log
896: (1/ \alpha) - 1 ,$ is not greater than $ 2^{n \log |A| - \log (1/
897: \alpha)}.$ So, we obtained that $$ | \{ u: |\varphi_n(u) | \leq
898: t_\alpha \} | \leq \alpha |A|^n .$$ Taking into account that all
899: words from $A^n$ have equal probabilities if $H_0$ is true, we
900: obtain from the last inequality, (\ref{cr}) and the description of
901: the test $\Gamma_{\alpha,\varphi}^{(n)}$ that $$ Pr \{
902: |\varphi_n(u) | \leq t_\alpha | \} \leq (\alpha |A|^n / |A|^n ) =
903: \alpha $$ if $H_0$ is true. The theorem is proved.
904:
905: \emph{Proof} of Claim 1. The proof is based on a direct
906: calculation of the probability of rejection for a case where
907: $H_0$ is true. From the description of the test
908: $\Upsilon_{\alpha,\varphi}^{(n)}$ and definition of $g$ (see
909: (\ref{s})) we obtain the following chain of equalities.
910:
911: $$ Pr \{ H_0\: is \:rejected \,\}= Pr \{\, |\varphi_n(u) | \leq g \}
912: $$ $$+\, Pr \{ |\varphi_n(u) | = g+1 \}\: (\:
913: 1\:-\, (\sum_{j=1}^{g+1} |A_j|\, - \, \alpha
914: |A|^n\,) / |A_{g+1}|\,)\,) $$ $$= \frac{1}{A^n} \: ( \sum_{j=0}^g
915: |A_j|\:+ \: |A_{g+1}|\: (\:
916: 1\:-\, (\sum_{j=1}^{g+1} |A_j|\, - \, \alpha
917: |A|^n\,) / |A_{g+1}|\,)\,)= \alpha .$$ The claim is proved.
918:
919:
920: \emph{Proof} of Claim 2. We can think that $\hat{t}_\alpha$ in
921: (\ref{cr2}) is an integer. (Otherwise, we obtain the same test
922: taking $\lfloor\hat{t}_\alpha\rfloor$ as a new critical value of
923: the test.) From the Kraft inequality (\ref{KRAFT}) we obtain that
924: $$ 1\geq \sum_{u \in A^n } 2^{- |\varphi_n (u)|} \geq | \{u: |\,
925: \varphi_n (u)|\leq \hat{t}_\alpha \} | \: \:2^{-\hat{t}_\alpha}.
926: $$ This inequality and (\ref{cr2}) yield: $$ | \{u: |\, \varphi_n
927: (u)|\leq \hat{t}_\alpha \} | \leq \alpha |A|^n. $$ If $H_0$ is
928: true then the probability of each $u \in A^n $ equals $|A|^{-n} $
929: and from the last inequality we obtain that $$ Pr \{ |\varphi (u)
930: | \leq \hat{t}_\alpha \} = |A|^{-n} \: | \{u: |\, \varphi_n
931: (u)|\leq \hat{t}_\alpha \} | \leq \alpha , $$ if $H_0$ is true.
932: The claim is proved.
933:
934:
935: \emph{Proof} of Theorem 3. We prove the theorem for the process
936: $T(k, \pi),$ but this proof is valid for $\bar{T}(k, \pi),$ too.
937: First we show that
938: \begin{equation}\label{a3} p^*(x_1...x_d)=2^{-d}, \end{equation}
939: $ (x_1...x_{d}) \in \{ 0,1 \}^{d}, $ $d =1, ... , k,$ is a
940: stationary distribution for the processes $T(k, \pi)$ (and
941: $\bar{T}(k, \pi)$) for all $k=1,2, \ldots $ and $ \pi \in (0,1)$.
942: For any values of $k, k \geq 1,$ (\ref{a3}) will be proved if we
943: show that the system of equations $$ P_{T(k, \pi)}(x_1...x_d)=
944: P_{T(k, \pi)}(0x_1...x_{d-1})\, P_{T(k,
945: \pi)}(x_d/0x_1...x_{d-1})\: $$ $$ +\,P_{T(k,
946: \pi)}(1x_1...x_{d-1})\, P_{T(k, \pi)}(x_d/1x_1...x_{d-1}) $$ has
947: the solution $p(x_1...x_d)=2^{-d}$, $ (x_1...x_{d}) \in \{ 0,1
948: \}^{d}, $ $d =1,2, \ldots, k $. It can be easily seen for $d =
949: k,$ if we take into account that, by definition of $T(k, \pi)$
950: and $\bar{T}(k, \pi)$, the equality $P_{T(k,
951: \pi)}(x_k/0x_1...x_{k-1})\: +\, P_{T(k,
952: \pi)}(x_k/1x_1...x_{k-1})=1 $ is valid for all $ (x_1...x_{k})
953: \in \{ 0,1 \}^{k} $. From this equality and the law of total
954: probability we immediately obtain (\ref{a3}) for $d < k.$
955:
956: Let us prove the second claim of the theorem. From the definition
957: $T(k, \pi)$ and $\bar{T}(k, \pi)$ we can see that either $P_{T(k,
958: \pi)}(0/x_1...x_{k})= \pi,\, P_{T(k, \pi)}(1/x_1...x_{k})=1-\pi$
959: or $P_{T(k, \pi)}(0/x_1...x_{k})=1- \pi,\, P_{T(k,
960: \pi)}(1/x_1...x_{k})=\,\pi$. That is why $h(x_{k+1}/x_1...x_{k}) =
961: - (\pi \log_2 \pi + (1- \pi) \log_2 (1-\pi) )$ and, hence,
962: $h_\infty = - (\pi \log_2 \pi + (1- \pi) \log_2 (1-\pi) )$. The
963: theorem is proved.
964:
965: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
966: \section{Acknowledgment } The authors wish to thank one of anonymous
967: reviewers for information about a unpublished thesis of David
968: Harold Bailey.
969: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
970:
971: %**********************************************************
972: %* Bibliography *
973: %**********************************************************
974: \newpage
975: \begin{thebibliography}{5}
976:
977: \bibitem{Aho}
978: A.V.Aho,J.E. Hopcroft, J.D.Ulman.{ \it The desighn and analysis of
979: computer algorithms }, Reading, MA: Addison- Wesley, 1976.
980:
981: \bibitem{B}
982: Bailey D. H. { \it Sequential schemes for classifying and
983: predicting ergodic processes }, PhD Dissertation, Stanford
984: University, 1976.
985:
986: \bibitem{BSTW}
987: Bently J.L., Sleator D.D., Tarjan R.E., Wei V.K. {\it A
988: Locally Adaptive Data Compression Scheme.} Comm. ACM, v.29, 1986,
989: pp.320-330.
990:
991: \bibitem{Billingsley} Billingsley P., {\it Ergodic theory and information},
992: John Wiley \& Sons (1965).
993:
994: \bibitem{RANDU}
995: Dudewicz E.J. and Ralley T.G. {\it The Handbook of Random
996: Number Generation and Testing With TESTRAND Computer Code,} v. 4
997: of American Series in Mathematical and Management Sciences.
998: American Sciences Press, Inc., Columbus, Ohio, 1981.
999:
1000:
1001: \bibitem{E}
1002: Elias P. {\it Interval and Recency Rank Source Coding: Two
1003: On-Line Adaptive Variable-Length Schemes,} IEEE Trans. Inform.
1004: Theory, v.33, N 1,1987, pp.3-10.
1005:
1006:
1007: \bibitem{Ga}
1008: Gallager R.G. { \it Information Theory and Reliable Communication.
1009: } Wiley, New York,1968.
1010:
1011:
1012:
1013: \bibitem{KS}
1014: Kendall M.G., Stuart A.{ \it The advanced theory of statistics;
1015: Vol.2: Inference and relationship }. London, 1961.
1016:
1017:
1018: \bibitem{K}
1019: Knuth D.E. { \it The art of computer programming.} Vol.2.
1020: Addison Wesley, 1981.
1021:
1022: \bibitem{Ko}
1023: Kolmogorov A.N. {\it Three approaches to the quantitative
1024: definition of information. } Problems of Inform. Transmission,
1025: v.1, 1965, pp.3-11.
1026:
1027: \bibitem{Kr}
1028: Krichevsky R. {\it Universal Compression and Retrival}. Kluver
1029: Academic Publishers, 1993.
1030:
1031: \bibitem{Le}
1032: L'Ecuyer P. {\it Uniform random numbers generation.} Annals of
1033: Operation Research, 1994.
1034:
1035:
1036: \bibitem{Vi}
1037: Li M., Vitanyi P. { \it An Introduction to Kolmogorov
1038: Complexity and Its Applications}, Springer-Verlag, New York, 2nd
1039: Edition, 1997.
1040:
1041:
1042: \bibitem{M1}
1043: Marsaglia G. { \it The structure of linear congruential
1044: sequences.} In: S. K. Zaremba, editor, Applications of Number
1045: Theory to Numerical Analysis, pages 248-285. Academic Press, New
1046: York, 1972.
1047:
1048: \bibitem{M2} Marsaglia G. and Zaman A. { \it Monkey tests for random number
1049: generators.} Computers Math. Applic., 26:1-10, 1993.
1050:
1051: \bibitem{ma}
1052: Maurer U. { \it A universal statistical test for random bit
1053: generators.} Journal of Cryptology, v.5, n.2, 1992, pp.89-105.
1054:
1055: \bibitem{me}
1056: Menzes A., van Oorschot P., Vanstone S. { \it Handbook of Applied
1057: Cryptography }, CRC Press, 1996.
1058:
1059:
1060:
1061:
1062: \bibitem{mo}
1063: Moeschlin O., Grycko E., Pohl C., and Steinert F. { \it
1064: Experimental Stochastics.} Springer-Verlag, Berlin Heidelberg,
1065: 1998.
1066:
1067: \bibitem{Moffat99}
1068: Moffat A., {\it An improved data structure for cumulative
1069: probability tables,} 1999, Software -- Practice and Experience,
1070: v.29,
1071: no. 7,
1072: pp.647-659.
1073:
1074: \bibitem{RO}
1075: Rozanov Yu.A. { \it The Random Processes }, Moscow, "Nauka"
1076: ("Science"), 1971.
1077:
1078:
1079: \bibitem{rng}
1080: Rukhin A. and others. { \it A statistical test suite for random
1081: and pseudorandom number generators for cryptographic applications.
1082: } NIST Special Publication 800-22 (with revision dated
1083: May,15,2001).
1084: http://csrc.nist.gov/rng/SP800-22b.pdf
1085:
1086: \bibitem{R1}
1087: Ryabko B.Ya. {\it Information Compression by a Book Stack.}
1088: Problems of Information Transmission, v.16, N 4, 1980,
1089: pp.16-21.
1090:
1091: \bibitem{R3}
1092: Ryabko B.Ya. {\it Twice-universal coding.} Problems of
1093: Information Transmission, 1984,n 3, pp.173-177.
1094:
1095:
1096: \bibitem{R2}
1097: Ryabko B.Ya. {\it A locally adaptive data compression scheme
1098: (Letter).} Comm. ACM, v.30, N 9, 1987, p.792.
1099:
1100:
1101:
1102: \bibitem{RR}
1103: Ryabko B., Rissanen J. { \it Fast Adaptive Arithmetic Code for
1104: Large Alphabet Sources with Asymmetrical Distributions.} IEEE
1105: Communications Letters,v. 7, no. 1, 2003,pp.33- 35.
1106: \bibitem{RSS}
1107: Ryabko B. Ya., Stognienko V. S., Shokin Yu. I. { \it A new
1108: test for randomness and its application to some cryptographic
1109: problems.} Journal of Statistical Planning and Inference, 2004,
1110: (accepted; available online, see: JSPI,
1111: doi:10.1016/S0378-3758(03)00149-6 )
1112:
1113:
1114: \end{thebibliography}
1115:
1116: \end{document}
1117: