
1: \documentclass[12pt]{article}

2:

3: \begin{document}

4:

5: \title{ Fast Codes for Large Alphabets.

6:  \footnote {Supported by the

7: INTAS under the Grant no. 00-738. } }

8:

9:

10:

11: \author{ Boris Ryabko, Jaakko Astola, Karen Egiazarian.  }

12: \date{}

13: \maketitle

14:

15:

16: \begin{abstract}

17: We address the problem of constructing a fast lossless code in the

18: case when the source alphabet is large. The main idea of the new

19: scheme may be described as follows. We group letters with small

20: probabilities in subsets (acting as super letters) and use time

21: consuming coding for these subsets only, whereas letters in the

22: subsets have the same code length and therefore can be coded fast.

23: %This procedure makes the code more redundant but

24: %faster. If we denote the extra redundancy by $\delta $ and apply

25: %the proposed scheme to the arithmetic code and  $N$- symbol

26: %alphabet, we obtain the time of encoding and decoding $ c( \log

27: %\log N + \log (1/ \delta ))+ c_1 $ instead of  $ c \log N + c_2 $

28: %for a usual arithmetic code, $N \rightarrow \infty $.

29: The

30: described scheme can be applied to sources with known and unknown

31: statistics.

32:

33:

34: \end{abstract}

35:

36: \textbf{Keywords.} { \it  fast algorithms, source coding, adaptive

37: algorithm, cumulative probabilities, arithmetic coding, data

38: compression, grouped alphabet.}

39: %\end{keywords}

40:

41:

42:

43:

44: \section{Introduction.}

45:

The computational efficiency of lossless data compression for
large alphabets has long attracted the attention of researchers
because of its great practical importance. The alphabet

49: of $2^8 = 256$ symbols, which is commonly used in compressing

50: computer files, may already be treated as a large one, and with

51: adoption of the UNICODE the alphabet size will grow up to $2^{16}=

52: 65536 $.

Moreover, many data compression methods first transform the input
data by some algorithm and then compress the resulting sequence
with a lossless code. Very often the alphabet of this sequence is
very large or even infinite. For instance, the run-length code, many
implementations of Lempel-Ziv codes, grammar-based codes \cite{Ki1,Ki2} and
many methods of image compression can be described in this way.
That is why the problem of constructing high-speed codes for large
alphabets has attracted great attention from researchers. Important
results have been obtained by Moffat and Turpin

64: \cite{Moffat90,Moffat94,Moffat99,M-T1,M-T,T-M} and others

65: \cite{Jo,RyabkoDAN,Ryabko,Fenwick, R-Ri}.

66:

67: For many adaptive lossless codes the speed of coding depends substantially

68:  on the alphabet size, because of the need to maintain

69: cumulative probabilities. The speed of an obvious (or naive)

70: method of updating the cumulative probabilities is proportional to

71: the alphabet size $N$. Jones \cite{Jo} and Ryabko \cite{RyabkoDAN}

72: have independently suggested two different algorithms, which

73: perform all the necessary transitions between individual and

74: cumulative probabilities in $O(\log N)$ operations under $ (\log N

75: + \tau)$- bit words , where $\tau$ is a constant depending on

76: the redundancy required, $N$ is the alphabet size. Later many such

77: algorithms have been developed and investigated in numerous papers

78: \cite{Moffat90,Ryabko,Fenwick,Moffat94,Moffat99}.

79:

In this paper we suggest a method for speeding up codes
based on the following main idea. The letters of the alphabet are
ordered according to their probabilities (or frequencies of
occurrence), and letters with small probabilities that are close to each other
are grouped into subsets, which act as new super letters. The key
point is the following: equal probability is ascribed
to all letters in one subset and, consequently, their codewords
have the same length. This makes it possible to encode and
decode them much faster than if their codeword lengths were
different. Moreover, each subset of grouped letters
is treated as one letter of a new alphabet, whose size is much
smaller than that of the original alphabet.

94: Such a grouping can increase the redundancy of the code. It

95: turns out, however, that a large decrease in the alphabet size may cause a

relatively small increase in the redundancy. More exactly, we
suggest a method of grouping for which the number of
groups, as a function of the redundancy $\delta$, grows as $c
\log N / \delta + c_1 $ (see Theorem 3), where $N$ is the alphabet size and

100: $c, c_1$ are constants.

101: %It should be noted that, in fact, the

102: %number of the groups can be considered as the size of the new

103: %alphabet and the last formula shows that the alphabet reduction

104: %can be quite large.

105:

106: In order to explain the main idea we consider the following

107: example. Let a source generate letters $ \{ a_0,\ldots , a_4 \}$

108: with probabilities $ p(a_0) = 1/16,\, p(a_1) = 1/16, \,p(a_2) =

109: 1/8, \,p(a_3) = 1/4, \,p(a_4) = 1/2, $ correspondingly. It is easy

110: to see that the following code $$ code(a_0) = 0000, code(a_1) =

111: 0001, code(a_2) = 001, code(a_3) = 01, code(a_4) = 1 $$ has the

112: minimal average codeword length. It seems that for decoding one needs

113: to look at one bit for decoding $a_4$, two bits for

114: decoding $a_3$, 3 bits for $a_2$ and 4 bits for $a_1$ and $a_0$.

115: However,

116: consider another code $$ \widetilde{code}(a_4) = 1,

117: \widetilde{code}(a_0) = 000, \widetilde{code}(a_1) = 001,

118: \widetilde{code}(a_2) = 010, \widetilde{code}(a_3) = 011, $$ and we

119:  see that, on the one hand, its average codeword length is a

little larger than in the first code (2 bits instead of 1.875

121: bits), but, on the other hand, the decoding is simpler. In fact,

122: the decoding can be carried out as follows. If the first bit is 1,

123: the letter is $a_4$. Otherwise, read the next two bits and treat

124: them as an integer (in a binary system) denoting the

code of the letter (i.e. 00 corresponds to $a_0$, 01 corresponds to
$a_1$, etc.). This simple observation can be

127: generalized and extended for constructing a new coding scheme with the

128: property that the larger the alphabet size is,

129: the more speeding-up we get.
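The example can be checked numerically. The following Python sketch is only an illustration (the names \texttt{OPTIMAL}, \texttt{GROUPED}, \texttt{average\_length} and \texttt{decode\_grouped} are ours and do not come from the paper): it computes the average codeword length of both codes and decodes the second code by the two-step rule described above.

```python
from fractions import Fraction

# Letter probabilities from the example (a_0, ..., a_4).
P = {"a0": Fraction(1, 16), "a1": Fraction(1, 16), "a2": Fraction(1, 8),
     "a3": Fraction(1, 4), "a4": Fraction(1, 2)}

# First code (minimal average length) and second, grouped code.
OPTIMAL = {"a0": "0000", "a1": "0001", "a2": "001", "a3": "01", "a4": "1"}
GROUPED = {"a4": "1", "a0": "000", "a1": "001", "a2": "010", "a3": "011"}


def average_length(code):
    """Average codeword length in bits under the distribution P."""
    return sum(P[a] * len(w) for a, w in code.items())


def decode_grouped(bits):
    """Decode a bit string written with GROUPED.

    A leading 1 means a_4; otherwise the next two bits are read
    as a binary integer giving the index of the letter.
    """
    out, pos = [], 0
    while pos < len(bits):
        if bits[pos] == "1":
            out.append("a4")
            pos += 1
        else:
            index = int(bits[pos + 1:pos + 3], 2)
            out.append(f"a{index}")
            pos += 3
    return out
```

With exact fractions, \texttt{average\_length(OPTIMAL)} evaluates to $15/8 = 1.875$ bits and \texttt{average\_length(GROUPED)} to $2$ bits.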

130:

131: In principle, the proposed method can be  applied to the Huffman

132: code, arithmetic code, and other lossless codes for speeding them

133: up, but for the sake of simplicity, we will consider the

134: arithmetic code in the main part of the paper, whereas the Huffman

135: code  and some others will be mentioned only briefly, because, on

136: the one hand, the arithmetic code is widely used in practice and,

137: on the other hand, generalizations are obvious.

138:

139: The suggested scheme can be applied to sources with unknown

140: statistics. As we mentioned above, the alphabet letters should be

141: ordered according to their frequency of occurrences when the

142: encoding and decoding are carried out. Since the frequencies are

143: changing after coding of each message letter, the order should be

144: updated, and the time of such updating should be taken into

145: account when we estimate the speed of the coding. It turns out

that there exist an algorithm and a data structure that make it
possible to carry out the updating with a few operations per
message letter, and the number of these operations does not depend
on the alphabet size or on the probability distribution.

150:

151: The rest of the paper is organized as follows. The second part

152: contains estimations of the redundancy caused by the grouping of

153: letters, and it contains examples for several values of the

redundancy. A fast adaptive arithmetic code for the
grouped alphabet, as well as the data structure and algorithm for
easily keeping the alphabet ordered according to the frequency
of occurrence, are given in the third and
the fourth parts. The Appendix contains all the proofs.

159:

160: \section{The redundancy due to grouping. }

161:

162: First we give some definitions. Let $A = \{ a_1, a_2,\ldots, a_N

163: \}$ be an alphabet with a probability distribution $\bar{p} = \{

164: p_1, p_2,\ldots, p_N \}$ where $ p_1 \geq p_2 \geq \ldots \geq

165: p_N, N \geq 1 $. The distribution can be either known a priori or

166: it can be estimated from the occurrence counts. In the last case

167: the order of the probabilities should be updated after encoding

168: each letter, and it should be taken into account when the speed of

169: coding is estimated. The simple data structure and algorithm for

170: maintaining the order of the probabilities  will be described in

171: the fourth part, whereas here we discuss estimation of the

172: redundancy.

173:

174: Let the letters from the alphabet $A$ be grouped as follows : $A_1 =

175: \{ a_1, a_2,$ $  \ldots, a_{n_1} \},$ $A_2 = \{

176: a_{n_1+1},a_{n_1+2},\ldots, a_{n_2}  \},\ldots, A_s =  \{

177: a_{n_{s-1}+1},a_{n_{s-1}+2},\ldots, a_{n_{s}}  \} $ where $n_s =

178: N, s \geq 1 $. We define the probability distribution $\pi$ and

179: the vector $\bar{m}= (m_1,$ $ m_2,..., $ $ m_s)$ by

180: \begin{equation}\label{pi}\pi_i = \sum

181: _{a_j \in A_i} p_j

182: \end{equation}

 and $m_i = n_i - n_{i-1}, \; n_0 = 0, \; i
= 1, 2, \ldots, s $, respectively. In fact, the grouping is
defined by the vector $\bar{m}$. We intend to encode all
letters from one subset $A_i$ by codewords of equal length.

187: For this purpose we ascribe equal probabilities to the letters

188: from $A_i$ by

189: \begin{equation}\label{code}

190:  \hat{p}_j = \pi_i / m_i

191: \end{equation}

192:  if $a_j \in A_i, i = 1, 2,

193: \ldots,s.$ Such encoding causes redundancy, defined by

194: \begin{equation}\label{red}

195: r(\bar{p}, \bar{m}) = \sum_{i=1}^N p_i \log ( p_i / \hat{p}_i ).

196: \end{equation}

197: (Here and below $\log(\:)= \log_2(\:).$)

198:

199: The suggested method of grouping is based on information about the

200: order of probabilities (or their estimations). We are

201: interested in an upper bound for the redundancy (\ref{red})

202:  defined by

203: \begin{equation}\label{Red}

R( \bar{m})= \sup_{ \bar{p} \in \bar{P }_N} r(\bar{p}, \bar{m})
;  \quad \bar{P}_N = \{ \, \{ p_1, p_2,\ldots, p_N \} : p_1 \geq p_2 \geq
\ldots \geq p_N \, \}.

207: \end{equation}

208: The following theorem gives  the redundancy estimate.

209:

210: \textbf{Theorem 1.}

211:

212: {\it The following equality for the redundancy (\ref{Red}) is

213: valid.

214: \begin{equation}\label{th}

R( \bar{m})= \max_{i=1,\ldots,s}\, \max_{l=1,\ldots,m_i} l\, \log (m_i
/l)/ (n_{i-1} + l),
\end{equation}
where, as before, $\bar{m}= (m_1, m_2,\ldots,m_s), \; n_i = \sum_{j=1}^i
m_j, \; i=1, \ldots,s, $ and $n_0 = 0$. }

220:

\emph{The proof} is given in the Appendix.
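The bound of Theorem 1 is easy to evaluate numerically. The Python sketch below is an illustration under stated assumptions, not the authors' Java program: the function names are ours, and the denominator is taken as $n_{i-1} + l$, the number of letters preceding the group $A_i$ plus $l$, which is the value produced by the distribution that is uniform on the first $n_{i-1}+l$ letters and which agrees with the 256-letter examples of this section. The second function evaluates the redundancy (\ref{red}) directly, so the two can be compared on such a distribution.

```python
from math import log2


def worst_case_redundancy(m):
    """Evaluate max_i max_{1<=l<=m_i} l*log2(m_i/l) / (n_{i-1} + l)."""
    best, n_prev = 0.0, 0
    for m_i in m:
        for l in range(1, m_i + 1):
            best = max(best, l * log2(m_i / l) / (n_prev + l))
        n_prev += m_i  # n_prev becomes n_i for the next group
    return best


def redundancy(p, m):
    """Evaluate r(p, m) = sum_j p_j*log2(p_j / p_hat_j), with 0*log 0 = 0."""
    total, start = 0.0, 0
    for m_i in m:
        group = p[start:start + m_i]
        p_hat = sum(group) / m_i  # equal probability ascribed inside the group
        total += sum(x * log2(x / p_hat) for x in group if x > 0)
        start += m_i
    return total


# Twelve one-letter groups followed by one two-letter group.
grouping = [1] * 12 + [2]
# Uniform distribution on the first 13 letters, zero on the last one.
worst_p = [1 / 13] * 13 + [0.0]
```

For this grouping both functions return $1/13 \approx 0.077$, which is below $0.08$, whereas eleven one-letter groups followed by a two-letter group give $1/12 \approx 0.083$.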

222:

223: The practically interesting question is how to find a grouping

224: which minimizes the number of groups for a given  upper bound of

225: the redundancy $\delta$. Theorem 1 can be used as the basis

226: for such an algorithm. This algorithm

227: %is given in Appendix (it is also

228: is implemented as a Java program and has been used for preparation

229: of all examples given below. The program can be found on the

230: internet and used for practical needs, see

231:

\verb|http://www.ict.nsc.ru/~ryabko/GroupYourAlphabet.html|.

233:

234: Let us consider some examples of such grouping  carried

235: out by the program  mentioned.

236:

237: First we consider the Huffman code. It should be noted

238: that in the case of the Huffman code the size of each group should

239: be a power of 2, whereas it can be any integer

240: in case of an arithmetic code. This is because the length of

241: Huffman codewords must be integers whereas this limitation is

242: absent in arithmetic code.

243:

For example, let the alphabet have 256 letters and let the additional
redundancy (\ref{red}) not exceed 0.08 per letter.
(The choice of these parameters is appropriate, because an alphabet of $2^8 =
256$ symbols is commonly used in compressing computer files, and
a redundancy of 0.08 per letter corresponds to 0.01 per bit.) In this case the

249: following grouping

250: %$m[1], m[2], ..., m[s])$

251: gives the minimal number of the groups $s$. $$A_1= \{ a_1 \} ,

252: A_2= \{ a_2 \} , \ldots , A_{12}= \{ a_{12}\}, $$ $$ A_{13}= \{

253: a_{13}, a_{14}\}, A_{14}= \{ a_{15}, a_{16}\}, \ldots,A_{19}= \{

254: a_{25}, a_{26}\}, $$ $$A_{20}= \{ a_{27}, a_{28}, a_{29}, a_{30}

255: \}, \ldots, A_{26}= \{ a_{51}, a_{52}, a_{53}, a_{54} \}, $$  $$

256: A_{27}= \{ a_{55}, a_{56},\ldots, a_{62} \},\ldots, A_{32}= \{

257: a_{95},\ldots, a_{102} \}, $$ $$ A_{33}= \{ a_{103},

258: a_{104},\ldots, a_{118} \},\ldots, A_{39}= \{ a_{199},\ldots,

259: a_{214} \}, $$ $$ A_{40}= \{ a_{215}, a_{216},\ldots, a_{246} \},

260: A_{41}= \{ a_{247},\ldots, a_{278} \}. $$ We see that each

261: of the first 12 subsets contains one letter, each of the subsets

262: $A_{13}, \ldots, A_{19}$ contains two letters, etc., and the total

number of the subsets $s$ is 41. Note that the last
subset $A_{41}$ may formally contain the letters $\{ a_{247},\ldots, a_{278}
\}$ rather than only the letters $ \{ a_{247},\ldots, a_{256} \}$, since each
letter from this subset is encoded \emph{inside} the subset
by 5-bit words (because $\log 32 = 5$).

268:

269: Let us proceed with this example in order to show how such a

grouping can be used to simplify the encoding and
decoding of the Huffman code. If the
letter probabilities are known, one can calculate the probability
distribution $\pi$ by (\ref{pi}) and the Huffman code for the new
alphabet $\hat{A} = \{ A_{1}, \ldots, A_{41} \}$ with the distribution
$\pi$. If we denote a codeword of $A_i$ by $code (A_i)$ and
enumerate all letters in each subset $A_i$ from 0 to $|A_i| -1 $,
then the code of a letter $a_j \in A_i $ can be presented as the
pair of words $$code (A_i)\: \{number\, of \, a_j \, \in A_i
\},$$ where $ \{number\, of \, a_j \, \in A_i \} $ is the $\log
|A_i|$-bit binary notation of the number of $a_j$ inside $A_i$. For

281: instance, the letter $a_{103}$ is the first in the 16- letter

282: subset $A_{33}$ and $a_{246}$ is the last in the 32- letter subset

283: $A_{40}$. They will be encoded by $code( A_{33})\,0000$ and

$code(A_{40})\,11111$, respectively. It is worth noting that
$code (A_i),\, i=1,\ldots, s,$ depends on the probability
distribution whereas the second part of the codewords, $\{number\,
of \, a_j \, \in A_i \}$, does not. So, in fact, the
Huffman code should be constructed for the 41-letter alphabet
instead of the 256-letter one, whereas the encoding and decoding inside
the subsets may be implemented with few operations. Of course,

291: this scheme can be applied to a Shannon code, alphabetical code,

292: arithmetic code and many others. It is also important that the

293: decrease of the alphabet size is larger when the alphabet size is

294: large.
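The second part of the codeword can be computed without any knowledge of the probabilities. The following Python sketch illustrates this for the 41-group partition listed above; the function names \texttt{split\_letter} and \texttt{join\_letter} are hypothetical, and the Huffman codewords $code(A_i)$ themselves are not constructed here.

```python
from bisect import bisect_left
from itertools import accumulate

# Group sizes of the 41-group example: 12 singletons, 7 pairs, 7 groups
# of 4, 6 groups of 8, 7 groups of 16 and 2 groups of 32.
SIZES = [1] * 12 + [2] * 7 + [4] * 7 + [8] * 6 + [16] * 7 + [32] * 2
ENDS = list(accumulate(SIZES))  # ENDS[i - 1] = n_i, index of the last letter of A_i


def split_letter(j):
    """Map the letter index j (1-based) to (group index i, offset bits)."""
    idx = bisect_left(ENDS, j)           # 0-based group with n_{i-1} < j <= n_i
    start = ENDS[idx] - SIZES[idx]       # n_{i-1}
    offset = j - start - 1               # number of a_j inside A_i, from 0
    width = SIZES[idx].bit_length() - 1  # log2 of a power-of-two group size
    bits = format(offset, "b").zfill(width) if width else ""
    return idx + 1, bits


def join_letter(i, bits):
    """Inverse of split_letter: recover j from the group index and the bits."""
    idx = i - 1
    start = ENDS[idx] - SIZES[idx]
    return start + (int(bits, 2) if bits else 0) + 1
```

For instance, \texttt{split\_letter(103)} gives group 33 with the bits \texttt{0000}, and \texttt{split\_letter(246)} gives group 40 with the bits \texttt{11111}, as in the text.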

295:

296: Let us consider one more example of grouping, where the subset

297: sizes don't need to  be powers of two. Let, as before, the

298: alphabet have 256 letters and let the additional redundancy

(\ref{red}) not exceed 0.08 per letter. In this case the
optimal grouping is as follows.  $$ |A_1| = |A_2| =  \ldots =

301: |A_{12}| = 1, |A_{13}| = |A_{14}| = \ldots= |A_{16}|= 2, |A_{17}|=

302: |A_{18}| = 3,

303: $$

304: $$|A_{19}|= |A_{20}| = 4, |A_{21}| =5 , |A_{22}| = 6,|A_{23}| = 7,

305: |A_{24}| = 8, |A_{25}| =9,$$ $$ |A_{26}| = 11,|A_{27}| =

306: 12,|A_{28}| = 14, |A_{29}| = 16, |A_{30}| = 19, $$ $$|A_{31}| =

307: 22, |A_{32}| = 25, |A_{33}| = 29,|A_{34}| = 34,|A_{35}| = 39.$$ We

308: see that the total number of the subsets (or the size of the

309: new alphabet) is less than in the previous example (35 instead of

310: 41), because in the first example the subset sizes should be

311: powers of two, whereas there is no such limitation in the

second case. So, if an additional redundancy of
0.01 per bit is acceptable, one can use the new alphabet $ \hat{A} = \{ A_{1},
\ldots, A_{35} \} $ instead of the 256-letter alphabet and implement
the arithmetic coding in the same manner as was described for
the Huffman code. (The exact description of the method will be
given in the next part.) We will not consider further examples in
detail, but note again that the reduction in the number of
letters is greater when the alphabet size is larger. Thus, if the

320: alphabet size is $2^{16}$ and the redundancy upper bound is 0.16

321: (0.01 per bit), the number of groups $s$ is 39, and if the size is

322: $2^{20}$ then $s= 40 $ whereas the redundancy per bit is the same.

323: (Such calculations can be easily carried out by the above

324: mentioned program).

325:

326: The required grouping for decreasing the

327: alphabet size is based on the simple theorem 2, for which

328:  we need to give some definitions standard in source

329: coding.

330:

331: Let $\gamma$ be a certain method of source coding which can be

332: applied to letters from a certain alphabet $A$. If $p$ is a

333: probability distribution on $A$, then the redundancy of $\gamma$

334: and its upper bound are defined by

335: \begin{equation}\label{red1}

336: \rho(\gamma, p) = \sum_{a \in A} p(a)( |\gamma (a) |+ \log p(a)),

\quad \hat{\rho}(\gamma ) = \sup_{p} \:\rho(\gamma, p),

338: \end{equation}

339: where the supremum is taken over all distributions  $p$, $|\gamma

340: (a) |$ and $p(a)$ are the length of the code word and the

341: probability of $a \in A$, correspondingly. For example,

$\hat{\rho} $ equals 1 for the Huffman and the Shannon codes
whereas for the arithmetic code $\hat{\rho}$ can be made as small
as required by choosing some parameters, see, e.g.,

345: \cite{Ryabko-Fionov}.

346: %There are such codes, that their  redundancy

347: %depend on the alphabet size, that is why we will use the notation

348: %$\hat{\rho}(|A|)$.

349:

350: The following theorem gives a formal justification for applying

351: the above described grouping for source coding.

352:

353: \textbf{Theorem 2.} {\it Let  the redundancy of a certain code

354: $\gamma$ be not  more than some $\Delta$ for all probability

355: distributions. Then, if the alphabet is divided into subsets $A_i,

356: i=1,\ldots, s ,$ in such a way that the additional redundancy

357: (\ref{red}) equals $\delta$, and the code $\gamma$ is applied to

358: the probability distribution $\hat{p}$ defined by (\ref{code}),

359: then the total redundancy of this new code $\gamma_{gr}$ is upper

360: bounded by $\Delta+\delta$.}

361:

362: Theorem 1 gives a simple algorithm for finding the grouping

363: which gives the minimal number of the groups $s$ when the upper

364: bound for the admissible redundancy  (\ref{Red}) is given. On the

365: other hand, the simple asymptotic estimate of the number of

366: such groups and the group sizes can be interesting when the

367: number of the alphabet letters is large. The following theorem can

368: be used for this purpose.

369:

370: \textbf{Theorem 3.}

371:

372: {\it Let $\delta > 0 $ be an admissible redundancy (\ref{Red}) of

373: a grouping.

374: %Let the admissible redundancy (\ref{Red}) of  a grouping $m=

375: %(m_1, \ldots, m_s ) $ should not exceed some $\delta, \delta >0 $.

376:

377: i) If

378: \begin{equation}\label{co1}

379: \quad m_i \,\leq \,\lfloor \,\delta\, n_{i-1}\,\, e \,/ (\log e -

380: \delta \,e)\,\rfloor,

381: \end{equation}

382: then the redundancy of the grouping $(m_1, m_2, \ldots )$ does not

exceed $\delta$, where $n_i = \sum_{j=1}^i\, m_j \;$ and $e\approx
2.718\ldots$ is the base of the natural logarithm.

385:

386:

387: ii) the minimal number of groups $\:s\:$ as a function of the

388: redundancy $\delta$ is upper bounded by

389: \begin{equation}\label{co}

390: c \log N / \delta + c_1,

391: \end{equation}

392: where  $c$ and $c_1$ are constants and $N$ is the alphabet

393: size,$\:N \rightarrow \infty.$ }

394:

395: \emph{The proof} is given in Appendix.

396:

\textbf{Comment 1.} {\it The first statement of Theorem 3
gives a
construction of a $\delta$-redundant grouping $(m_1,
m_2, \ldots)$ for an infinite alphabet, because $m_i$ in (\ref{co1})
depends only on the previous $m_1, m_2, \ldots, m_{i-1}$.}

402:

403: \textbf{Comment 2.} {\it Theorem 3 is valid for grouping where the

404: subset sizes $(m_1, m_2, \ldots )$ should be powers of 2. }
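
As a numerical illustration of condition (\ref{co1}) (a worked instance added for clarity, not part of the theorem), take $\delta = 0.08$. Then $\delta e \approx 0.2175$ and $\log e - \delta e \approx 1.4427 - 0.2175 = 1.2252$, so that (\ref{co1}) reads
\[
m_i \,\leq\, \left\lfloor \frac{0.08\, e}{\log e - 0.08\, e}\; n_{i-1} \right\rfloor \approx \lfloor 0.1775\; n_{i-1} \rfloor .
\]
Since $0.1775 \cdot 11 \approx 1.95$ and $0.1775 \cdot 12 \approx 2.13$, the bound first permits a two-letter group when $n_{i-1} = 12$. This agrees with the 256-letter examples above, in which the first twelve groups contain one letter each. For smaller $n_{i-1}$ the right-hand side is less than 1; we assume here that one then takes $m_i = 1$, which adds no redundancy because a one-letter group keeps its own probability.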

405:

406: \section{The arithmetic code for grouped alphabets. }

407:

408:

409: Arithmetic coding was introduced by Rissanen \cite{Riss76} in 1976

410: and now it is  one of the most popular methods of source coding,

411: see, e.g., \cite{Moffat94}, \cite{Ryabko-Fionov}. The advantage of

412: arithmetic coding over other coding techniques is that it achieves

413: arbitrarily small coding redundancy per source symbol at less

414: computational effort than any other method.

415:

416:

417: We give first a brief description of an arithmetic code by paying

418: attention to features which determine the speed of encoding and

419: decoding. As before, consider a memoryless source generating

420: letters from the alphabet $A= \{ a_1, ..., a_{N} \}$ with unknown

421: probabilities. Let the source generate a message $x_1\ldots

422: x_{t-1}x_t\ldots $, $x_i\in A$ for all $i$, and let $ \nu^t(a)$

423: denote the occurrence count of letter $a$  in the word $x_1\ldots

424: x_{t-1}x_t $. After

425:  first $t$ letters $x_1,\ldots, x_{t-1},x_t$ have been processed

426: the following letter $ x_{t+1}$ needs to be encoded. In the most

427: popular version of the arithmetic code the  current estimated

428: probability distribution is taken as

429: \begin{equation}\label{piti}

430:  p^t(a)= (\nu^t(a)+c)/(t+Nc) , a \in A ,

431: \end{equation}

432: where $c$ is a constant (as a rule $c$ is 1 or 1/2). Let $x_{t+1}=

433: a_i$, and let the interval $[\alpha, \beta )$ represent the word

434: $x_1 \ldots x_{t-1} x_t $. Then the word $x_1 \ldots x_{t-1} x_t

435: x_{t+1}$, $x_{t+1}= a_i $ will be encoded by the interval

436: \begin{equation}\label{int}

437: [\alpha + ( \beta - \alpha)\: q^t_i,\quad  \alpha + ( \beta -

438: \alpha)\: q^t_{i+1}\: )\,,

439: \end{equation}

440:  where

441: \begin{equation}\label{qu}

442: q^t_i = \sum _{j=1}^{i-1} p^t(a_j).

443: \end{equation}

444: When the size of the alphabet $N$ is large, the calculation of

445: $q^t_i$ is the most time consuming part in the encoding process.

446: As it was mentioned in the introduction, there are fast algorithms

447: for calculation of $q^t_i$ in

448: \begin{equation}\label{time}

449: T= c_1 \log N + c_2,

450: \end{equation}

451: operations under $ (\log N + \tau)$- bit words, where $\tau$ is

452: the constant determining the redundancy of the arithmetic code.

(As a rule, this length is proportional to the length of the

454: computer word: 16 bits, 32 bits, etc.)

455:

456: We describe a new algorithm for the alphabet whose letters are

457: divided into subsets $ A_1^t,\ldots, A_s^t, $ and the same

458: probability is ascribed to all letters in the subset. Such a

459: separation of the alphabet $A$ can depend on $t$ which is why the

460: notation $A_i^t$ is used. But, on the other hand, the number of

461: the letters in each subset $A_i^t$ will not depend on $t$ which is

462: why it is denoted as $|A_i^t| = m_i$.

463:

464: In principle, the scheme for the arithmetic coding is the same as

465: in the above considered case of the Huffman code: the

466: codeword of the letter $ x_{t+1}= a_i $ consists of two parts,

467: where the first part encodes the set $A^t_k$ that contains $a_i$,

468: and the second part encodes the ordinal of the element $a_i$ in the

469: set $A^t_k$. It turns out that it is easy to encode and decode

470: letters in the sets $A^t_k$, and the time

471: consuming operations should be used to encode the sets $A^t_k$, only.

472:

473: We proceed with the formal description of the algorithm. Since the

474: probabilities of the letters in $A$ can depend on $t$ we define in

475: analogy with (\ref{pi}),(\ref{code})

476: \begin{equation}\label{PQ1}

\pi_i^t = \sum _{a_j \in A_i^t} p^t(a_j),\quad \hat{p}_i^{\,t} = \pi_i^t
/ m_i

479: \end{equation}

480: and let

481: \begin{equation}\label{PQ}

482:  Q^t_i=  \sum _{j=1}^{i-1} \pi_j^t\:.

483: \end{equation}

484:

485: The arithmetic encoding and decoding are implemented for the

486: probability distribution (\ref{PQ1}), where the probability

$\hat{p}_k^{\,t}$ is ascribed to all letters from the subset
$A^t_k$. More precisely, assume that the letters in each $A^t_k$ are
enumerated from 1 to $m_k$, and that the encoder and the decoder
know this enumeration. Let, as before, $ x_{t+1}= a_i $, and let
$a_i$ belong to $A^t_k$ for some $k$. Then the coding interval for
the word $x_1\ldots x_{t-1}x_t x_{t+1}$ is calculated as follows
\begin{equation}\label{newint}
[\alpha + ( \beta - \alpha)( Q^t_k + (\delta (a_i)-1)\,
\hat{p}_k^{\,t}\,)\, ,\quad
 \alpha + ( \beta - \alpha) ( Q^t_k + \delta (a_i)\,\hat{p}_k^{\,t})\; ),
\end{equation}
where $ \delta(a_i)$ is the ordinal of $a_i$ in the subset
$A^t_k$. It can be easily seen that this definition is equivalent
to (\ref{int}), where the probability of each letter from $A^t_k$
equals $  \hat{p}_k^{\,t}$.

502:  Indeed, let us order the letters of $A$

503: according to their count of

504:  occurrence in the word $x_1\ldots x_{t-1}x_t, $ and let the letters

505:  in  $A^t_k,k=1,2,...,s\, ,$ be ordered according to the

506:  enumeration mentioned above. We then immediately obtain

507:  (\ref{newint}) from  (\ref{int}) and (\ref{PQ1}). The additional redundancy which

508:  is caused by the replacement of the distribution (\ref{piti}) by

509:  $ \hat{p}_i^{\,t}$  can be estimated using (\ref{red}) and the theorems 1-3,

510: which is why

511:  we may

512:  concentrate our attention on the  encoding and decoding speed

513:  and the storage space needed.
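The interval computation (\ref{newint}) and the recovery of the group and of the ordinal can be sketched in Python as follows. This is a minimal illustration under several assumptions that are ours: exact rational arithmetic is used instead of fixed-length words, the group is found by a linear scan (a practical coder would use one of the fast $O(\log s)$ structures for cumulative probabilities), the code value is taken relative to the unit interval, all $\pi_k$ are positive, and the function names are hypothetical.

```python
from fractions import Fraction


def grouped_interval(alpha, beta, pi, m, k, d):
    """Sub-interval of [alpha, beta) for the d-th letter of group k.

    pi[k - 1] is the total probability of group k, m[k - 1] its size;
    k and d are 1-based, as in the text.
    """
    q_k = sum(pi[:k - 1], Fraction(0))  # cumulative probability Q_k
    p_hat = pi[k - 1] / m[k - 1]        # equal probability inside the group
    width = beta - alpha
    return (alpha + width * (q_k + (d - 1) * p_hat),
            alpha + width * (q_k + d * p_hat))


def decode_letter(value, pi, m):
    """Recover (k, d) from a code value lying in [0, 1)."""
    q = Fraction(0)
    for k, (pi_k, m_k) in enumerate(zip(pi, m), start=1):
        if value < q + pi_k:
            p_hat = pi_k / m_k
            d = int((value - q) // p_hat) + 1  # ordinals start from 1
            return k, d
        q += pi_k
    raise ValueError("value must lie in [0, 1)")


# Three groups of sizes 1, 2 and 4 with total probabilities 1/2, 1/4, 1/4.
PI = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
M = [1, 2, 4]
```

For example, the third letter of the third group is mapped from $[0,1)$ to $[7/8,\, 15/16)$, and any value in that interval is decoded back to the pair $(3, 3)$.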

514:

515: First we  compare the time needed for the calculation in

516: (\ref{int}) and (\ref{newint}). If we ignore the expressions

$(\delta (a_i)-1) \hat{p}_k^{\,t}$ and $ \delta (a_i)
\hat{p}_k^{\,t}$ for a while (here $\hat{p}_k^{\,t}$ is the common
probability of the letters of the subset $A^t_k$ containing $a_i$), we see that (\ref{newint}) can be
considered as the arithmetic encoding of the new alphabet $ \{
A^t_1, A^t_2,\ldots, A^t_s \} $. Therefore, the number of
operations for encoding by (\ref{newint}) is the same as
for arithmetic coding of an $s$-letter alphabet, which by
(\ref{time}) equals $c_1 \log s + c_2 $. The expressions $(\delta
(a_i)-1)\hat{p}_k^{\,t}$ and $ \delta (a_i) \hat{p}_k^{\,t}$
require two multiplications, and two additions are needed to
obtain the bounds of the interval in (\ref{newint}). Hence, the number

527: of operations for encoding ($T$) by (\ref{newint}) is given by

528: \begin{equation}\label{newtime}

529: T= c_1^* \log s + c_2^* ,

530: \end{equation}

where $c_1^*, c_2^*$ are constants and all operations are carried
out on words of length  $ (\log N + \tau)$ bits, as
was required for the usual arithmetic code. In case $s$ is much

534: less than $N$, the time of encoding in the new method is less than

535: the time of the usual arithmetic code, see (\ref{newtime}) and

536: (\ref{time}).

537:

We briefly describe decoding with the new method. Suppose that the
letters $x_1 \ldots x_{t-1} x_t $ have been decoded and the letter
$x_{t+1}$ is to be decoded.
Two steps are required.
First, the algorithm uses the usual
arithmetic code to find the set $A^t_k$ that contains the (unknown) letter $a_i$. Second, the
ordinal of the letter $a_i$ in $A^t_k$ is calculated as follows:
\begin{equation}\label{decode}
\delta (a_i) = \lfloor(code (x_{t+1}\ldots) - Q^t_k )/
\hat{p}_k^{\,t}\rfloor + 1,
\end{equation}
where $ code (x_{t+1}\ldots)$ is the number that encodes the word
$x_{t+1}x_{t+2}\ldots$, and the final $+1$ reflects that the ordinals
in (\ref{newint}) start from 1. It can be seen that (\ref{decode}) is the
inverse of (\ref{newint}). In order to calculate (\ref{decode})
the decoder should carry out one subtraction, one division and one increment.

553: That is why the  total number of decoding operations is  given by

554: the same formula as for the encoding, see (\ref{newtime}).

555:

556: It is worth noting that multiplications and divisions in

557: (\ref{newint}) and  (\ref{decode}) could be carried out faster if

558: the subset sizes are powers of two. But, on the other hand, in

this case the number of the subsets is larger, which is why both
versions could be useful.

561:

We have not yet estimated the time needed for maintaining the order

563: of letters from $A$ according to their frequencies (\ref{piti}).

564: The point is that the order should be updated by the encoder and

565: the decoder after encoding and decoding each letter $x_t$. It

566: turns out that it is possible  to update the order using a fixed

567: number of operations. Such a method is described in the next

568: section. Besides, we should take into account that, when $x_t$ is

569: encoded (or decoded), one frequency (\ref{piti}) should be changed

570: and at most two $\pi_i$ (\ref{PQ1}) must be recalculated. It is

571: easy to see that all these transformations can be done with no

572: more than two additions and two subtractions. Therefore, the

573: total number of operations for encoding and decoding is given by

574: (\ref{newtime}) with the new constant $c_2^*$.

575:

576: So we can see that

577: %\section{Conclusion and discussion} The main result of the paper can be formulated as follows.

578: %\begin{theorem}

579: %

580: if the arithmetic code can be applied to an $N \:- $ letter source, so

581: that the number of operations (under words of a certain length) of

582: coding is $$ T= c_1 \log N + c_2,$$ then there exists an algorithm

583: of coding, which can be applied to the grouped alphabet $

584: A_1^t,\ldots, A_s^t $ in such a way that, first, at each moment

585: $t$ the letters are ordered by decreasing

586: frequencies and, second, the number of coding operations is  $$

587: T= c_1 \log s + c_2^* $$ with words of the same length, where $

588: c_1, c_2, c_2^* $ are constants.

589:

590: %\section{Conclusion and discussion}

591: \section{ A fast algorithm for keeping the alphabet letters ordered.}

592:  In this section we describe a data structure and an algorithm, which allow

593: one to carry out all the operations for maintaining the alphabet letters

594: ordered by their frequencies, in such a way that the

595: number of such operations is constant, independently of the

596: probability distribution, the size of the alphabet, and other

597: characteristics.

598:

The suggested data structure is based on five arrays: $Fr[1 :
N]$, $SortedAlphabet[1:N]$, $InverseSort[1:N]$, $SetBegin[0:
MAX]$ and $SetEnd[0: MAX]$, where, as before, $N$ is the size of the
alphabet, $\Lambda^t_k$ is the set of the letters from $A$ whose
count of occurrence equals $k$ at the moment $t$, and $MAX$
is an upper bound for the maximal count of occurrence. (For
example, if the code uses a sliding window to adapt to the
source, $MAX$ is upper bounded by the length of the window.) At

607: each moment $t$ the array $Fr$ contains information about

608: frequencies of occurrence of the letters from $A$ in the word $

609: x_1 \ldots x_{t-1} x_t$  such that $Fr[i]= \nu^t(a_i)$. The array

610: $SortedAlphabet [1:N]$ consists of letters from $A$ ordered by the

611: frequency of occurrence. More precisely, the following property is

612: satisfied: if $i \leq j$ and $ Sorted Alphabet [i]= b $ and $

613: Sorted Alphabet [j]= c$, then $ \nu^t(b) \leq \nu^t(c)$. In

614: particular, it means that all letters from a subset $\Lambda^t_k,

615: k=0,1,...$, are situated in succession in $Sorted Alphabet [1:N]$

616: and forming a string. $SetBegin [k ]$ and $SetEnd [k]$ contain

617: information about the beginning and the end of such a string.

618:  At last, by definition,$ Inverse Sort[i]$

619: contains  an integer $j$ such that $Sorted Alphabet [j]= a_i$.

620:

Let us consider a small example. Let $N = 4$, $t = 4$ and let the
frequencies be $\nu^t(a_1)=0$, $\nu^t(a_2)=1$, $\nu^t(a_3)=2$ and
$\nu^t(a_4)=1$. Then $Fr = [0,1,2,1]$,
$SortedAlphabet = [a_1,a_4,a_2,a_3]$, $InverseSort = [1,3,4,2]$,
$SetBegin = [1,2,4]$, $SetEnd = [1,3,4]$ is one possible
configuration of the contents of the relevant arrays.
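
The arrays of such an example can be produced from the counts alone.
The following Python sketch is only an illustration under stated
assumptions: the function name \verb"build_arrays" and the parameter
\verb"max_count" (playing the role of $MAX$) are hypothetical, letters
are represented by their indices $1,\ldots,N$, and for an empty set
$\Lambda^t_k$ we assume the convention $SetBegin[k]=SetEnd[k]+1$,
which the text does not specify.

```python
def build_arrays(fr, max_count):
    """Build SortedAlphabet, InverseSort, SetBegin, SetEnd from the counts.

    fr[i-1] is the count of letter a_i. All positions stored in the
    returned arrays are 1-based, as in the text.
    """
    n = len(fr)
    # how_many[k] = number of letters whose count equals k
    how_many = [0] * (max_count + 1)
    for count in fr:
        how_many[count] += 1

    set_begin, set_end = [], []
    below = 0                      # number of letters with count < k
    for k in range(max_count + 1):
        set_begin.append(below + 1)
        below += how_many[k]
        set_end.append(below)      # number of letters with count <= k

    # Counting sort: fill the string of every set from left to right.
    next_free = list(set_begin)
    sorted_alphabet = [0] * n
    inverse_sort = [0] * n
    for letter in range(1, n + 1):
        k = fr[letter - 1]
        pos = next_free[k]
        next_free[k] += 1
        sorted_alphabet[pos - 1] = letter
        inverse_sort[letter - 1] = pos
    return sorted_alphabet, inverse_sort, set_begin, set_end


sorted_alphabet, inverse_sort, set_begin, set_end = build_arrays([0, 1, 2, 1], 2)
print(sorted_alphabet, inverse_sort, set_begin, set_end)
```

For the counts $[0,1,2,1]$ this sketch places $a_2$ before $a_4$
inside $\Lambda^t_1$, so it yields a configuration that differs from
the one listed above in the order of these two letters; both satisfy
the ordering property, since letters with equal counts may appear in
any order.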

628:

Consider next how the arrays are updated. This should be done by the
encoder (and the decoder) after encoding (and decoding) each letter,
in such a way that only a constant number of operations is needed.
Suppose we encode the letter $a_4$ and increment its occurrence
count. The arrays should be changed as follows: the processed letter
($a_4$) should be exchanged with the last letter from $\Lambda^t_k$
($\Lambda^t_1$ in our case), where $k$ is the current count of the
processed letter, and the relevant modifications should be made in
$SortedAlphabet$ and $InverseSort$. Then the processed letter should
be included in the set $\Lambda^t_{k+1}$ and excluded from the set
$\Lambda^t_k$. In fact, it is enough to change two elements in
$SetBegin$ and $SetEnd$, namely, $SetBegin[k+1]= SetBegin[k+1]-1$ and
$SetEnd[k]= SetEnd[k]-1$, and to increase the entry of $Fr$ for the
processed letter by one. (In our example, $a_4$ should be moved from
$\Lambda^t_1$ into $\Lambda^t_2$. When we carry out these
calculations the result is $Fr= [0,1,2,2]$,
$SortedAlphabet = [a_1,a_2,a_4,a_3]$, $InverseSort=[1,2,4,3]$,
$SetBegin =[1,2,3]$ and $SetEnd =[1,2,4]$.)
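
The increment step can be written down in a few lines. The Python
sketch below is a minimal illustration, not a definitive
implementation: the function name \verb"increment" is hypothetical,
letters are represented by their indices $1,\ldots,N$, positions are
kept 1-based as in the text, and it is assumed that the new count does
not exceed $MAX$.

```python
def increment(letter, fr, sorted_alphabet, inverse_sort, set_begin, set_end):
    """Raise the count of `letter` (index 1..N) by one in O(1) operations."""
    k = fr[letter - 1]                  # current count of the letter
    pos = inverse_sort[letter - 1]      # its position in SortedAlphabet
    last = set_end[k]                   # position of the last letter of Lambda_k
    other = sorted_alphabet[last - 1]   # the letter stored there

    # Exchange the processed letter with the last letter of its set.
    sorted_alphabet[pos - 1], sorted_alphabet[last - 1] = other, letter
    inverse_sort[other - 1], inverse_sort[letter - 1] = pos, last

    # Position `last` now belongs to Lambda_{k+1} instead of Lambda_k.
    set_end[k] -= 1
    set_begin[k + 1] -= 1
    fr[letter - 1] = k + 1


# The configuration from the example in the text (a_i is written as i).
fr = [0, 1, 2, 1]
sorted_alphabet = [1, 4, 2, 3]
inverse_sort = [1, 3, 4, 2]
set_begin = [1, 2, 4]
set_end = [1, 3, 4]
increment(4, fr, sorted_alphabet, inverse_sort, set_begin, set_end)
print(fr, sorted_alphabet, inverse_sort, set_begin, set_end)
```

No loop over the alphabet is involved: the update touches two entries
of $SortedAlphabet$, two of $InverseSort$, and one entry each of
$SetBegin$, $SetEnd$ and $Fr$.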

647:

648: We have considered the case when the occurrence count should be

649: incremented. Decrementing, which is used in certain schemes of the

650: adaptive arithmetic code, can be carried out in a similar manner.
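
The text does not spell out the decrement step. One way to read
``in a similar manner'' is the mirror image of the increment: exchange
the processed letter with the first letter of its set and move the
boundary in the other direction. The Python sketch below follows this
reading; the function name \verb"decrement", the index representation
of letters and the error raised for a zero count are assumptions made
for the illustration.

```python
def decrement(letter, fr, sorted_alphabet, inverse_sort, set_begin, set_end):
    """Lower the count of `letter` (index 1..N) by one in O(1) operations."""
    k = fr[letter - 1]
    if k == 0:
        raise ValueError("the count of this letter is already zero")
    pos = inverse_sort[letter - 1]      # position of the letter
    first = set_begin[k]                # position of the first letter of Lambda_k
    other = sorted_alphabet[first - 1]  # the letter stored there

    # Exchange the processed letter with the first letter of its set.
    sorted_alphabet[pos - 1], sorted_alphabet[first - 1] = other, letter
    inverse_sort[other - 1], inverse_sort[letter - 1] = pos, first

    # Position `first` now belongs to Lambda_{k-1} instead of Lambda_k.
    set_begin[k] += 1
    set_end[k - 1] += 1
    fr[letter - 1] = k - 1


# State with counts [0, 1, 2, 2]; the count of a_3 is lowered to 1.
fr = [0, 1, 2, 2]
sorted_alphabet = [1, 2, 4, 3]
inverse_sort = [1, 2, 4, 3]
set_begin = [1, 2, 3]
set_end = [1, 2, 4]
decrement(3, fr, sorted_alphabet, inverse_sort, set_begin, set_end)
print(fr, sorted_alphabet, inverse_sort, set_begin, set_end)
```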

651:

652:

653: \section{Appendix. }

654:

\textbf{The proof of Theorem 1.} It is easy to see that the set
$\bar{P}_N$ of all distributions whose probabilities are arranged in
decreasing order is convex. Indeed, each $\bar{p} = \{ p_1,
p_2,\ldots, p_N \} \in \bar{P}_N$ may be presented as a convex
combination of vectors from the set
\begin{equation}\label{q}
Q_N = \{ q_1 = (1,0,\ldots,0),\; q_2= (1/2,1/2,0,\ldots,0),\;\ldots,\;
q_N = ( 1/N, \ldots, 1/N) \}
\end{equation}
as follows:
$$ \bar{p} = \sum_{i=1}^N i\,(p_i - p_{i+1})\, q_i , $$
where $p_{N+1}= 0$. The coefficients $i\,(p_i - p_{i+1})$ are
nonnegative and their sum equals $\sum_{i=1}^N p_i = 1$.
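
As a small worked instance of this representation (the numbers are an
arbitrary choice made for illustration), take $N=3$ and
$\bar{p} = (1/2, 1/3, 1/6)$, and write $\lambda_i = i\,(p_i -
p_{i+1})$ for the weight attached to $q_i$. Then
$$ \lambda_1 = 1 \cdot (1/2 - 1/3) = 1/6, \quad
\lambda_2 = 2 \cdot (1/3 - 1/6) = 1/3, \quad
\lambda_3 = 3 \cdot (1/6 - 0) = 1/2 , $$
so that $\lambda_1 + \lambda_2 + \lambda_3 = 1$ and
$$ \frac{1}{6}\,(1,0,0) + \frac{1}{3}\,(1/2,1/2,0) +
\frac{1}{2}\,(1/3,1/3,1/3) = (1/2,\, 1/3,\, 1/6) = \bar{p} . $$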

667:

On the other hand, the redundancy (\ref{red}) is a convex function,
because direct calculation shows that its second partial derivatives
are nonnegative. Indeed, the redundancy (\ref{red}) can be
represented as follows:
$$ r(\bar{p}, \bar{m}) = \sum_{i=1}^N p_i \log p_i \: - \,
\sum_{j=1}^s \pi_j (\log \pi_j - \log m_j) = $$
$$ \sum_{i=2}^N p_i \log p_i \: - \,
\sum_{j=2}^s \pi_j (\log \pi_j - \log m_j)\, + $$
$$ \Bigl(1-\sum_{k=2}^N p_k \Bigr) \log \Bigl(1-\sum_{k=2}^N p_k \Bigr)\,
-\,\Bigl(1-\sum_{l=2}^s \pi_l \Bigr)
\Bigl( \log \Bigl(1-\sum_{l=2}^s \pi_l \Bigr) - \log m_1 \Bigr) . $$
If $a_i$ is a certain letter from $A$ and $j$ is such that
$a_i \in A_j$, then direct calculation shows that
$$ \partial r / \partial p_i = \log_2 e \,\Bigl(\,\ln p_i - \ln \pi_j -
\ln \Bigl(1 - \sum_{k=2}^N p_k \Bigr) + \ln \Bigl(1 - \sum_{l=2}^s \pi_l
\Bigr)\,\Bigr) + \mbox{const} , $$
$$ \partial^2 r / \partial p_i^2 = \log_2 e \,\bigl( (1/p_i - 1/\pi_j) +
(1/p_1 - 1/\pi_1) \bigr) , $$
where we used that $1 - \sum_{k=2}^N p_k = p_1$ and
$1 - \sum_{l=2}^s \pi_l = \pi_1$. The last value is nonnegative
because, by definition, $\pi_j = \sum_{k= n_j}^{n_{j+1}-1} p_k$, so
that $p_i$ is one of the summands of $\pi_j$, just as $p_1$ is one of
the summands of $\pi_1$.

693:

Thus, the redundancy is a convex function defined on a convex set
whose extreme points are the elements of $Q_N$ from (\ref{q}). So
$$ \sup_{ \bar{p} \in \bar{P}_N} r(\bar{p}, \bar{m}) = \max_{ q
\,\in \;Q_N} r(q, \bar{m}) . $$
Each $q \in Q_N$ can be presented as a vector
$q= (1/(n_i + l), \ldots, 1/(n_i + l), 0, \ldots, 0 )$, where
$1 \leq l \leq m_{i+1}$, $i=0, \ldots, s-1$. This representation,
the last equality, and the definitions (\ref{q}), (\ref{red}) and
(\ref{Red}) give (\ref{th}).
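
The reduction to the extreme points can be checked numerically. The
Python sketch below is an illustration only: the function names
\verb"redundancy", \verb"extreme_points" and \verb"max_redundancy"
are hypothetical, the grouping $(1,1,2,4)$ and the test distribution
are arbitrary choices, and it is assumed that the subsets are
consecutive runs of letters in the order of decreasing probability.

```python
import math


def redundancy(p, sizes):
    """r(p, m) = sum_i p_i log2 p_i - sum_j pi_j (log2 pi_j - log2 m_j).

    p     : probabilities in decreasing order
    sizes : subset sizes m_1, ..., m_s (consecutive runs of letters)
    """
    r = sum(x * math.log2(x) for x in p if x > 0)
    start = 0
    for m in sizes:
        pi = sum(p[start:start + m])      # probability pi_j of the subset
        if pi > 0:
            r -= pi * (math.log2(pi) - math.log2(m))
        start += m
    return r


def extreme_points(n):
    """The vectors q_l = (1/l, ..., 1/l, 0, ..., 0) for l = 1, ..., n."""
    return [[1.0 / l] * l + [0.0] * (n - l) for l in range(1, n + 1)]


def max_redundancy(sizes):
    """Maximum of the redundancy over the extreme points Q_N."""
    n = sum(sizes)
    return max(redundancy(q, sizes) for q in extreme_points(n))


sizes = [1, 1, 2, 4]                      # arbitrary grouping of N = 8 letters
bound = max_redundancy(sizes)
p = [0.4, 0.2, 0.15, 0.1, 0.06, 0.04, 0.03, 0.02]
print(bound, redundancy(p, sizes))
```

For any ordered distribution the second printed value should not
exceed the first one, which requires evaluating the redundancy at
only $N$ points.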

702:

\textbf{The proof of Theorem 2.} Obviously,
\begin{eqnarray}
\lefteqn{ \sum_{a \in A} p(a) ( |\gamma_{gr}(a)| + \log p(a) ) = }
\nonumber \\
& & \sum_{a \in A} p(a) ( |\gamma_{gr}(a)| + \log \hat{p}(a) ) +
\sum_{a \in A} p(a) \log ( p(a)/ \hat{p}(a) ) \label{obv}
\end{eqnarray}
and, from (\ref{pi}) and (\ref{code}), we obtain
$$ \sum_{a \in A} p(a) ( |\gamma_{gr}(a)| + \log \hat{p}(a) ) =
\sum_{i=1}^s ( |\gamma_{gr}(a)| + \log \hat{p}(a) ) \sum_{a \in A_i}
p(a) = $$
$$ \sum_{i=1}^s ( |\gamma_{gr}(a)| + \log \hat{p}(a) ) \sum_{a \in A_i}
\hat{p}(a) = \sum_{a \in A} \hat{p}(a) ( |\gamma_{gr}(a)| + \log
\hat{p}(a) ) . $$
(In the $i$th term of the sums over $i$, $a$ stands for any letter of
$A_i$, because the factor $|\gamma_{gr}(a)| + \log \hat{p}(a)$ takes
the same value for all letters of one subset.) This equality and
(\ref{obv}) give
$$ \sum_{a \in A} p(a) ( |\gamma_{gr}(a)| + \log p(a) ) = $$
$$ \sum_{a \in A} \hat{p}(a) ( |\gamma_{gr}(a)| + \log \hat{p}(a) ) +
\sum_{a \in A} p(a) \log ( p(a)/ \hat{p}(a) ) . $$
From this equality, the statement of the theorem and the definitions
(\ref{red}) and (\ref{red1}) we obtain
$$ \sum_{a \in A} p(a) ( |\gamma_{gr}(a)| + \log p(a) ) \leq \Delta +
\delta . $$
Theorem 2 is proved.

727:

\textbf{The proof of Theorem 3.} The proof is based on Theorem 1.
From (\ref{th}) we obtain the following obvious inequality:
\begin{equation}\label{cr}
R( \bar{m})\leq \max_{i=1,\ldots,s} \; \max_{l=1,\ldots,m_i} l\, \log
(m_i /l)/ n_i .
\end{equation}
Direct calculation shows that
$$ \partial ( l\, \log (m_i /l)/n_i)/\partial l = \log_2 e \,(\ln
(m_i/l) - 1 )/n_i , $$
$$ \partial^2 ( l\, \log (m_i /l)/n_i)/\partial l^2 = - \log_2 e/(l
\,n_i) < 0 $$
and, consequently, the maximum of the function
$l\, \log (m_i /l)/n_i$ is equal to $m_i\log e / (e \,n_i)$ and is
attained at $l= m_i/e$. So,
$$ \max_{l=1,\ldots,m_i} l\, \log (m_i/l)/ n_i \leq m_i\log e / (e\,
n_i) $$
and from (\ref{cr}) we obtain
\begin{equation}\label{cr1}
R( \bar{m})\leq \max_{i=1,\ldots,s} m_i\log e / (e \,n_i).
\end{equation}

That is why, if
\begin{equation}\label{cr2}
m_i \leq \delta \,e \,n_i/ \log e
\end{equation}
for all $i$, then $R( \bar{m})\leq \delta$. By definition (see the
statement of the theorem), $n_i = n_{i-1} + m_i$, and we obtain from
(\ref{cr2}) the first claim of the theorem. Taking into account that
$n_{s-1} < N \leq n_s$ and (\ref{cr1}), (\ref{cr2}), we can see that
if
$$ N = \acute{c}_1 (1+\delta e/ \log e)^s + \acute{c}_2 , $$
then $R( \bar{m})\leq \delta$, where $\acute{c}_1$ and $\acute{c}_2$
are constants and $N \rightarrow \infty$. Taking the logarithm, which
gives $s \ln (1+\delta e/ \log e) = \ln N + O(1)$, and applying the
well-known estimate $\ln (1+\varepsilon ) \approx \varepsilon$ for
$\varepsilon \approx 0$, we obtain (\ref{co}). The theorem is proved.
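
The two elementary facts used in this proof are easy to check
numerically: the maximum of $l \log (m/l)$ over integer $l$ never
exceeds $m \log e / e$, and a subset size satisfying (\ref{cr2})
keeps the corresponding term of (\ref{cr1}) below $\delta$. In the
Python sketch below the function names and the sample values
$n_i = 1000$, $\delta = 0.01$ are hypothetical choices made for
illustration, and logarithms are taken to base 2.

```python
import math


def exact_max(m):
    """Maximum over integer l = 1..m of l * log2(m / l)."""
    return max(l * math.log2(m / l) for l in range(1, m + 1))


def continuous_bound(m):
    """m * log2(e) / e, the value of l * log2(m / l) at l = m / e."""
    return m * math.log2(math.e) / math.e


def largest_admissible_size(n_i, delta):
    """Largest integer m_i with m_i <= delta * e * n_i / log2(e)."""
    return math.floor(delta * math.e * n_i / math.log2(math.e))


n_i, delta = 1000, 0.01                  # sample values, chosen arbitrarily
m_i = largest_admissible_size(n_i, delta)
print(m_i, exact_max(m_i) / n_i, continuous_bound(m_i) / n_i)
```

Both printed ratios should stay below the chosen $\delta$, the first
one being the exact term of (\ref{cr}) and the second one its upper
estimate from (\ref{cr1}).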

763:

764:

765: %\section{Appendix 2. The program for grouping.}

766: %\noindent {\bf begin\  }

767:

768:

769:  %{\bf end}

770:

771: %\vskip .1in

772:

773:

774:

775:

776:

777:

778:

779:

780:

781: %**********************************************************

782: %*                   Bibliography                         *

783: %**********************************************************

784: \newpage

785: \begin{thebibliography}{5}

786:

\bibitem{Aho}
A.V. Aho, J.E. Hopcroft, J.D. Ullman. {\it The design and analysis of
computer algorithms}, Reading, MA: Addison-Wesley, 1976.

790:

791: \bibitem{Fenwick}

792: P. Fenwick, ``A new data structure for cumulative probability

793: tables,'' {\it Software -- Practice and Experience,} vol. 24, no.

794: 3, pp. 327--336, March 1994. Errata published in vol. 24, no. 7,

795: p. 667, July 1994.

796:

\bibitem{Jo}
D. W. Jones, ``Application of splay trees to data compression,''
{\it Communications of the ACM,} vol. 31, no. 8, pp. 996--1007, 1988.

801:

802:

803: \bibitem{Ki1}

804: Kieffer, J.C.; Yang, E.H. Grammar-based codes: a new class of

805: universal lossless source codes. {\it IEEE Trans. Inform. Theory},

806: v.46 (2000), no. 3, 737--754.

807:

808: \bibitem{Ki2}

809: Kieffer, J.C.; Yang, E.H.; Nelson, G.J.; Cosman, P. Universal

810: lossless compression via multilevel pattern matching.{\it IEEE

811: Trans. Inform. Theory,} v.46 (2000), no. 4, 1227--1245.

812:

\bibitem{Moffat90}
A. Moffat, ``Linear time adaptive arithmetic coding,'' {\it IEEE
Trans. Inform. Theory,} vol. 36, no. 2, pp. 401--406, 1990.

817:

818:

819:

\bibitem{Moffat99}
A. Moffat, ``An improved data structure for cumulative probability
tables,'' {\it Software -- Practice and Experience,} vol. 29, no. 7,
pp. 647--659, 1999.

826:

827:

\bibitem{Moffat94}
A. Moffat, R. Neal, I. Witten, ``Arithmetic coding revisited,'' {\it
ACM Transactions on Information Systems,} vol. 16, no. 3, pp.
256--294, July 1998.

831:

832: \bibitem{T-M}

833: Moffat, A.; Turpin, A. On the implementation of minimum redundancy

834: prefix codes, {\it  IEEE Transactions on Communications,} v.45,

835: no. 10, pp. 1200 - 1207, 1997.

836:

837: \bibitem{M-T1}

838: A.Moffat,A.Turpin,  Efficient Construction of Minimum-Redundancy

839: Codes for Large Alphabets. {\it IEEE Trans. Inform. Theory,} vol.

840: IT-44, no. 4, pp. 1650--1657, July 1998.

841:

842:

843: \bibitem{Riss76}

844: J.Rissanen, ``Generalized Kraft inequality and arithmetic

845: coding,'' {\it IBM J. Res. Dev.,} vol. 20, pp. 198--203, May 1976.

846:

847:

848:

849: \bibitem{RyabkoDAN}

850: Ryabko, B. Ya. A fast sequential code. {\it  Dokl. Akad. Nauk SSSR

851: } v.306 (1989), no. 3, pp.548--552 (Russian); translation in {\it

852: Soviet Math. Dokl.}, v. 39 (1989), no. 3, pp. 533--537.

853:

854:

855:

856: \bibitem{Ryabko}

857: B.Ryabko, A fast on-line adaptive code, {\it IEEE Trans. Inform.

858: Theory,} vol. IT-38, no. 4, pp. 1400--1404, 1992.

859:

860: %\bibitem{R-R}

861: %D.B. Ryabko, B.Ya. Ryabko. ''The program for grouping

862: %letters'',2002.

863: %In: http://www.ict.nsc.ru/$\sim$ryabko /

864: %GroupYourAlphabet.html

865:

866:

\bibitem{Ryabko-Fionov}
B. Ryabko, A. Fionov, ``Fast and space-efficient adaptive arithmetic
coding,'' in {\it Cryptography and Coding, 7th IMA International
Conference, Cirencester, UK, December 1999, Proceedings}, LNCS
1746, pp. 270--279.

872:

\bibitem{R-Ri}
B. Ryabko, J. Rissanen, ``Fast adaptive arithmetic code for large
alphabet sources with asymmetrical distributions,'' {\it IEEE
Communications Letters,} 2002 (accepted for publication).

See also B. Ryabko, J. Rissanen, ``Fast adaptive arithmetic code for
large alphabet sources with asymmetrical distributions,'' {\it
Proceedings of the IEEE International Symposium on Information
Theory, 2002, Lausanne, Switzerland,} p. 319.

883:

884:

\bibitem{M-T}
A. Turpin, A. Moffat, ``On-line adaptive canonical prefix coding
with bounded compression loss,'' {\it IEEE Trans. Inform. Theory,}
vol. IT-47, no. 1, pp. 88--98, 2001.

889:

890:

891: \end{thebibliography}

892: \newpage

893:

894: \noindent{\bf Authors:}

895:

896: \noindent B.Ya. Ryabko\\

897:  Professor.\\

898:  Siberian State University of Telecommunication and Computer Science\\

899: Kirov Street, 86\\

900: 630102 Novosibirsk, Russia.

901:  \vskip .05in

902: \noindent e-mail: \verb"ryabko@neic.nsk.su" \\

903: URL: \verb"www.ict.nsc.ru/~ryabko"

904:

905: \vskip .1in

906:

907:

908: \noindent J. Astola \\

909: Professor.\\

910: Tampere University of Technology\\

911: P.O.B. 553, FIN- 33101 Tampere,\\ Finland.

912:  \vskip .05in \noindent

913: e-mail: \verb"jta@cs.tut.fi"

914:

915: \vskip .1in

916:

917:

\noindent K. Egiazarian \\

919: Professor.\\

920: Tampere University of Technology\\

921: P.O.B. 553, FIN- 33101 Tampere, \\ Finland.

922:  \vskip .05in

923: \noindent e-mail: \verb"karen@cs.tut.fi"

924:

925:

926:

927: \vskip .1in

928: \noindent{\bf  Address for Correspondence:}\\

\noindent Prof. B. Ryabko \\

930:   Siberian State University of Telecommunication and

931: Computer

932: Science\\

933: Kirov Street, 86\\

934: 630102 Novosibirsk, Russia.\\

935: \noindent e-mail: \verb"ryabko@neic.nsk.su" \\

936:

937:

938: %\fi

939: \end{document}

940: