
1: % 25 Nov 07

2:

3: % Additional files: inftree3d.eps, R0.eps, R1.eps, Rlin.eps,

4: %                   inf_humb3d.eps, ga2.eps, radrat.eps,

5: %                   [IT_inf.bbl], (IEEEbib.bst, IEEEtran.cls)

6:

7: \documentclass[10pt]{IEEEtran}

8: \usepackage{cite,graphicx,psfrag,amsmath,amssymb,subfigure,url,supertabular,color}

9: \newtheorem{theorem}{Theorem}

10: \newtheorem{corollary}{Corollary}

11: \newtheorem{lemma}{Lemma}

12: \newtheorem{defi}{Definition}

13:

14: \def\CampCost{L}

15: \def\definedas{\triangleq}

16: \def\order{O}

17: \def\s{\mbox{'s}}

18: \def\boldp{p}

19: \def\bigp{P}

20: \def\boldw{w}

21: \def\bigw{W}

22: \def\kval{k}

23: \def\kvals{k}

24: \def\len{n}

25: \def\biglen{N}

26: \def\E{{\mathbb E}}

27: \def\P{{\mathbb P}}

28: \def\R{{\mathbb R}}

29: \def\Rp{{{\mathbb R}_+}}

30: \def\W{{\bigw}}

31: \def\X{{\mathcal X}}

32: \def\Z{{\mathbb Z}}

33: \def\lg{{\log_2}}

34:

35: \newcommand{\defn}[0]{\it}

36: \hyphenation{szpan-kow-ski}

37:

38: \begin{document}

39: \bibliographystyle{IEEEtran} \title{Optimal Prefix Codes for Infinite

40: Alphabets with Nonlinear Costs}

41: \author{Michael~B.~Baer,~\IEEEmembership{Member,~IEEE}%

42: \thanks{This work was supported in part by the National Science

43: Foundation (NSF) under Grant CCR-9973134 and the Multidisciplinary

44: University Research Initiative (MURI) under Grant DAAD-19-99-1-0215.

45: Part of this work was performed while the author was at Stanford

46: University.  This material was presented in part at the IEEE

47: International Symposium on Information Theory, Seattle, Washington,

48: USA, July 2006 and at the IEEE International Symposium on Information Theory,

49: Nice, France, June 2007.}%

50: \thanks{The author is with Ocarina Networks, Inc., 42 Airport Parkway, San Jose, CA  95110-1009  USA (e-mail:{\color{white}{i}}calbear{\color{black}{@}}{\bf \tiny \.{1}}eee.org).}

51: \thanks{This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.}}

52: \markboth{IEEE Transactions on Information Theory}{Optimal Prefix Codes for Infinite Alphabets with Nonlinear Costs}

53: %\pubid{0000--0000/00\$00.00~\copyright~2007 IEEE}

54: \maketitle

55:

56: \begin{abstract}

57: Let $\bigp = \{\boldp(i)\}$ be a measure of strictly positive

58: probabilities on the set of nonnegative integers.  Although the

59: countable number of inputs prevents usage of the Huffman algorithm,

60: there are nontrivial $\bigp$ for which known methods find a source

61: code that is optimal in the sense of minimizing expected codeword

62: length.  For some applications, however, a source code should instead

63: minimize one of a family of nonlinear objective functions,

64: $\beta$-exponential means, those of the form $\log_a \sum_i \boldp(i)

65: a^{\len(i)}$, where $\len(i)$ is the length of the $i$th codeword and

66: $a$ is a positive constant.  Applications of such minimizations

67: include a novel problem of maximizing the chance of message receipt in

68: single-shot communications ($a<1$) and a previously known problem of

69: minimizing the chance of buffer overflow in a queueing system ($a>1$).

70: This paper introduces methods for finding codes optimal for

71: such exponential means.  One method applies to geometric

72: distributions, while another applies to distributions with lighter

73: tails.  The latter algorithm is applied to Poisson distributions and

74: both are extended to alphabetic codes, as well as to minimizing

75: maximum pointwise redundancy.  The aforementioned application of

76: minimizing the chance of buffer overflow is also considered.

77: \end{abstract}

78:

79: \begin{keywords}

80: Communication networks, generalized entropies, generalized means,

81: Golomb codes, Huffman algorithm, optimal prefix codes, queueing,

82: worst case minimax redundancy.

83: \end{keywords}

84:

85: \IEEEpeerreviewmaketitle

86:

87: \section{Introduction, Motivation, and Main Results}

88: \label{intro}

89:

90: If probabilities are known, optimal lossless source coding of

91: individual symbols (and blocks of symbols) is usually done using David

92: Huffman's famous algorithm\cite{Huff}.  There are, however, cases that

93: this algorithm does not solve.  Problems with an

94: infinite number of possible inputs --- e.g., geometrically-distributed

95: variables --- are not covered.  Also, in some instances, the

96: optimality criterion --- or {\defn penalty} --- is not the linear

97: penalty of expected length.  Both variants of the problem have been

98: considered in the literature, but not simultaneously.  This paper

99: discusses cases that combine an infinite alphabet with a nonlinear penalty.

100:

101: An infinite-alphabet source emits symbols drawn from the alphabet

102: $\X_\infty = \{0, 1, 2, \ldots \}$.  (More generally, we use $\X$ to

103: denote an input alphabet whether infinite or finite.)  Let $\bigp =

104: \{\boldp(i)\}$ be the sequence of probabilities for each symbol, so

105: that the probability of symbol $i$ is $\boldp(i) > 0$.  The source

106: symbols are coded into binary codewords.  The codeword $c(i) \in

107: \{0,1\}^*$ in code $C$, corresponding to input symbol~$i$, has length

108: $\len(i)$, thus defining length distribution~$\biglen$.  Such codes

109: are called {\defn integer codes} (as in, e.g., \cite{YaQi}).

110:

111: Perhaps the most well-known integer codes are the codes derived by

112: Golomb for geometric distributions\cite{Golo,GaVV}, and many other

113: types of integer codes have been considered by others\cite{Abr01}.

114: There are many reasons for using such integer codes rather than codes

115: for finite alphabets, such as Huffman codes.  The most obvious use is

116: for cases with no upper bound --- or at least no known upper bound ---

117: on the number of possible items.  In addition, for many cases it is

118: far easier to come up with a general code for integers rather than a

119: Huffman code for a large but finite number of inputs.  Similarly, it

120: is often faster to encode and decode using such well-structured codes.

121: For these reasons, integer codes and variants of them are widely used

122: in image and video compression standards\cite{WSBL, WSS}, as well as

123: for compressing text, audio, and numerical data.

124:

125: To date, the literature on integer codes has considered only finding

126: efficient uniquely decipherable codes with respect to minimizing

127: expected codeword length $\sum_i \boldp(i) \len(i)$.  Other utility

128: functions, however, have been considered for finite-alphabet codes.

129: Campbell~\cite{Camp} introduced a problem in which the penalty to

130: minimize, given some continuous (strictly) monotonic increasing {\defn

131: cost function} $\varphi(x):\Rp \rightarrow \Rp$, is

132: $$

133: \CampCost(\bigp,\biglen,\varphi) = \varphi^{-1}\left(\sum_i \boldp(i)

134: \varphi(\len(i))\right)

135: $$ and specifically considered the exponential subcases with exponent $a>1$:

136: \begin{equation}

137: \CampCost_a(\bigp,\biglen) \definedas \log_a \sum_i \boldp(i) a^{\len(i)}

138: \label{ExpCost}

139: \end{equation}

140: that is, $\varphi(x) = a^x$.  Note that minimizing penalty $\CampCost$

141: is also an interesting problem for $0<a<1$ and approaches the standard

142: penalty $\sum_i \boldp(i) \len(i)$ for $a \rightarrow 1$\cite{Camp}.

143: While $\varphi(x)$ decreases for $a<1$, one can map decreasing

144: $\varphi$ to a corresponding increasing function $\tilde{\varphi}(l) =

145: \varphi_{\max} - \varphi(l)$ (e.g., for $\varphi_{\max} = 1$) without

146: changing the penalty value.  Thus this problem, equivalent to

147: maximizing $\sum_i \boldp(i) a^{\len(i)}$, is a subset of those

148: considered by Campbell.  All penalties of the form (\ref{ExpCost}) are

149: called $\beta$-exponential means, where $\beta = \lg

150: a$\cite[p.~158]{AcDa}.
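
As a brief sketch of why the limit $a \rightarrow 1$ recovers the linear
penalty (assuming a finite alphabet or, more generally, that the sum may be
differentiated term by term near $a=1$), write $S(a) = \sum_i \boldp(i)
a^{\len(i)}$, so that $S(1)=1$; the symbol $S(a)$ is introduced only for
this illustration.  By l'H\^{o}pital's rule,
\begin{eqnarray*}
\lim_{a \rightarrow 1} \log_a S(a) &=& \lim_{a \rightarrow 1} \frac{\ln S(a)}{\ln a} \\
&=& \lim_{a \rightarrow 1} \frac{S'(a)/S(a)}{1/a} \\
&=& \lim_{a \rightarrow 1} \frac{a \sum_i \boldp(i) \len(i) a^{\len(i)-1}}{S(a)}
= \sum_i \boldp(i) \len(i) .
\end{eqnarray*}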

151:

152: Campbell noted certain properties for $\beta$-exponential means, but

153: did not consider applications for these means.  Applications were

154: later found for the problem with $a>1$ \cite{Jeli,Humb2,BlMc};

155: these applications all relate to a buffer overflow problem

156: discussed in Section~\ref{application}.

157:

158: Here we introduce a novel application for problems with $a<1$.

159: Consider a situation related by Alfred R\'{e}nyi, an ancient scenario

160: in which a rebel fortress was besieged by Romans.  The rebels' only

161: hope was the knowledge gathered by a mute, illiterate spy, one who

162: could only nod and shake his head \cite[pp.~13-14]{Reny}.  This

163: apocryphal tale --- based upon a historical siege --- is the premise

164: behind the Hungarian version of the spoken parlor game Twenty

165: Questions.  A modern parallel in the 21\textsuperscript{st} century

166: occurred when Russian forces gained the knowledge needed to defeat

167: hostage-takers by asking hostages ``yes'' or ``no'' questions over

168: mobile phones\cite{MSN,Tar}.

169:

170: R\'{e}nyi presented this problem in narrative form in order to

171: motivate the relation between Shannon entropy and binary prefix

172: coding.  Note however that Twenty Questions, traditional prefix

173: coding, and the siege scenario actually have three different

174: objectives.  In Twenty Questions, the goal is to be able to determine

175: the symbol (i.e., the item or message) by asking at most twenty

176: questions.  In prefix coding, the goal is to minimize the expected

177: number of questions --- or, equivalently, bits --- necessary to

178: determine the message.  For the siege scenario, the goal is survival;

179: that is, assuming partial information is not useful, the besieged

180: would wish to maximize the probability that the message is

181: successfully transmitted within a certain window of opportunity.  When

182: this window closes --- e.g., when the fortress falls --- the

183: information becomes worthless.  An analogous situation occurs when a

184: wireless device is losing power or is temporarily within range of a

185: base station; one can safely assume that the channel, when available,

186: will transmit at the lowest (constant) bitrate, and will be lost after a

187: nondeterministic time period.

188:

189: Assume that the duration of the window of opportunity is independent

190: of the communicated message and is memoryless, the latter being a

191: common assumption --- due to both its accuracy and expedience --- of

192: such stochastic phenomena.  Memorylessness implies that the window

193: duration is distributed exponentially.  Therefore, quantizing time in

194: terms of the number of bits $T$ that we can send within our window,

195: $$\P(T = t) = (1-a)a^t, ~ t = 0, 1, 2, \ldots $$ with known positive parameter

196: $a<1$.  We then wish to maximize the probability of success, i.e., the

197: probability that the message length does not exceed the quantized

198: window length:

199: \begin{eqnarray*}

200: \P[\len(X) \leq T] &=& \sum_{t=0}^\infty \P(T=t) \cdot \P[\len(X) \leq t] \\

201: &=& \sum_{t=0}^\infty (1-a)a^t \cdot \sum_{i \in \X} p(i) 1_{\len(i) \leq t} \\

202: &=& \sum_{i \in \X} p(i) \cdot (1-a) \sum_{t=\len(i)}^\infty a^t \\

203: &=& \sum_{i \in \X} p(i) a^{\len(i)} \cdot (1-a) \sum_{t=0}^\infty a^t \\

204: &=& \sum_{i \in \X} p(i) a^{\len(i)}

205: \end{eqnarray*}

206: where $1_{\len(i) \leq t}$ is $1$ if $\len(i) \leq t$, $0$ otherwise.

207: Because $\log_a$ is decreasing for $a<1$, maximizing this probability is equivalent to minimizing (\ref{ExpCost}).

208:

209: Note that this problem can be constrained or otherwise modified for

210: the application in question.  For example, in some cases, we might

211: need some extra time to send the first bit, or, alternatively, the

212: window of opportunity might be of at least a certain duration,

213: increasing or reducing the probability that no bits can be sent,

214: respectively.  Thus we might have

215: $$ \P(T = t) = \left\{

216: \begin{array}{ll}

217: t_0,& t = 0 \\

218: (1-t_0)(1-a)a^{t-1},& t = 1, 2, \ldots

219: \end{array}

220: \right.

221: $$ for some~$t_0 \in (0,1)$.  In this case,

222: $$\P[\len(X) \leq T] = \frac{(1-t_0)}{a} \sum_{i \in \X} p(i)

223: a^{\len(i)}$$ and the maximizing code is identical to that of the more

224: straightforward case.  Likewise, if we need to send multiple messages,

225: the same code maximizes the expected number of independent messages we can

226: send within the window, due to the memoryless property.
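
The expression for the modified window can be checked with a short
computation, sketched here under the assumption that every codeword has
length $\len(i) \geq 1$ (which holds for any prefix code on two or more
symbols):
\begin{eqnarray*}
\P[\len(X) \leq T] &=& \sum_{i \in \X} p(i) \sum_{t=\len(i)}^\infty (1-t_0)(1-a)a^{t-1} \\
&=& \sum_{i \in \X} p(i) (1-t_0) a^{\len(i)-1} \cdot (1-a) \sum_{t=0}^\infty a^t \\
&=& \frac{1-t_0}{a} \sum_{i \in \X} p(i) a^{\len(i)} .
\end{eqnarray*}
The factor $(1-t_0)/a$ does not depend on the code, so the maximizing
lengths are unchanged.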

227:

228: We must be careful regarding the meaning of an ``optimal code'' when

229: there are an infinite number of possible codes under consideration.

230: One might ask whether there must exist an optimal code or if there can

231: be an infinite sequence of codes of decreasing penalty without any

232: code achieving the limit penalty value.  Fortunately the answer is the

233: former, the proof being a special case of Theorem~2 in~\cite{Baer06}

234: (a generalization of the result for the expected-length

235: penalty\cite{LTZ}).  The question is then how to find one of these

236: optimal source codes given parameter $a$ and probability

237: measure~$\bigp$.

238:

239: As in the linear case, a general solution for (\ref{ExpCost}) is not

240: known for general $\bigp$ over a countably infinite number of events,

241: but methods and properties for finite numbers of events --- discussed

242: in the next section --- can be used to find optimal codes for certain

243: common infinite-item distributions.  In Section~\ref{geometric}, we

244: consider geometric distributions and find that Golomb codes are

245: optimal, although the optimal Golomb code for a given probability mass

246: function varies according to $a$.  The main result of

247: Section~\ref{geometric} is that, for $\boldp_\theta(i) =

248: (1-\theta)\theta^i$ and $a \in \Rp$, G$\kval$, the Golomb code with

249: parameter $\kval$, is optimal, where $$\kval = \max\left(1,

250: \left\lceil -\log_\theta a -\log_\theta (1+\theta)

251: \right\rceil\right).$$ In Section~\ref{other}, we consider

252: distributions that are relatively light-tailed, that is, that decline

253: faster than certain geometric distributions.  If there is a

254: nonnegative integer $r$ such that for all $j>r$ and $i<j$,

255: $$\boldp(i) \geq \max\left(\boldp(j), \sum_{k=j+1}^\infty \boldp(k)

256: a^{k-j}\right)$$ then an optimal binary prefix code tree exists which

257: consists of a unary code tree appended to a leaf of a finite code

258: tree.  A specific case of this is the Poisson distribution,

259: $\boldp_\lambda(i)=\lambda^i e^{-\lambda}/i!$, where $e$ is the base

260: of the natural logarithm ($e \approx 2.71828$).  We show that in this

261: case the aforementioned $r$ is given by $r = \max(\lceil 2 a \lambda

262: \rceil - 2, \lceil e \lambda \rceil - 1)$.  An application, that of

263: minimizing probability of buffer overflow, as in~\cite{Humb2}, is

264: considered in Section~\ref{application}, where we show that the

265: algorithm developed in \cite{Humb2} readily extends to coding

266: geometric and light-tailed distributions.  Section~\ref{nonexp}

267: discusses the maximum pointwise redundancy penalty, which has a

268: similar solution for light-tailed distributions and for which the

269: Golomb code G$\kval$ with $\kval = \lceil -1/\lg \theta \rceil$ is

270: optimal for geometric distributions.  We conclude with some

271: remarks on possible extensions to this work.

272:

273: Throughout the following, a set or sequence of items $x(i)$ is

274: represented by its uppercase counterpart, $X$.  A glossary of terms is

275: given in Appendix~\ref{glossary}.

276:

277: \section{Background: Finite Alphabets}

278: \label{background}

279:

280: If a finite number of events comprise $\bigp$ (i.e., $|\X|<\infty$),

281: the exponential penalty (\ref{ExpCost}) is minimized using an

282: algorithm found independently by Hu {\it et al.}~\cite[p.~254]{HKT},

283: Parker \cite[p.~485]{Park}, and Humblet

284: \cite[p.~25]{Humb0},\cite[p.~231]{Humb2}, although only the last of

285: these considered $a < 1$.  (The simultaneity of these lines of

286: research was likely due to the appearance of the first paper on

287: adapting the Huffman algorithm to a nonlinear penalty, $\max_i

288: (\boldp(i) + \len(i))$ for given $\boldp(i) \in \Rp$, in

289: 1976\cite{Golu}.)  We will use this finite-alphabet

290: exponential-penalty algorithm in the sections that follow in order to

291: prove optimality for infinite distributions, so let us reproduce the

292: algorithm here:

293:

294: \textbf{Procedure for Exponential Huffman Coding (finite alphabets):}

295: This procedure finds the optimal code

296: whether $a>1$ (a minimization of the average of a growing exponential)

297: or $a<1$ (a maximization of the average of a decaying exponential).

298: Note that it minimizes (\ref{ExpCost}), even if the ``probabilities''

299: do not add to $1$.  We refer to such arbitrary positive inputs as

300: {\defn weights}, denoted by $\boldw(i)$ instead of~$\boldp(i)$:

301:

302: \begin{enumerate}

303: \item Each item $i$ has weight $\boldw(i) \in \bigw_{\X}$, where $\X$

304: is the (finite) alphabet and $\bigw_{\X} = \{w(i)\}$ is the set of all

305: such weights.  Assume each item $i$ has codeword $c(i)$, to be

306: determined later.

307: \item Combine the items with the two smallest weights $\boldw(j)$ and

308:   $\boldw(k)$ into one compound item with the combined weight

309:   $\tilde{\boldw}(j) = a \cdot (\boldw(j) + \boldw(k))$.  This item

310:   has codeword $\tilde{c}(j)$, to be determined later, while item $j$ is

311:   assigned codeword $c(j) = \tilde{c}(j)0$ and $k$ codeword $c(k) =

312:   \tilde{c}(j)1$.  Since these have been assigned in terms of

313:   $\tilde{c}(j)$, replace $\boldw(j)$ and $\boldw(k)$ with

314:   $\tilde{\boldw}(j)$ in $\bigw_\X$ to form $\bigw_{\tilde{\X}}$.

315: \item Repeat procedure, now with the remaining codewords (reduced in

316:   number by $1$) and corresponding weights, until only one

317:   item is left.  The weight of this item is $\sum_i \boldw(i)

318:   a^{\len(i)}$.  All codewords are now defined by assigning the null

319:   string to this trivial item.

320: \end{enumerate}

321: This algorithm assigns a weight to each node of

322: the resulting implied code tree by having each item represented by a

323: node with its parent representing the items combined into its subtree,

324: as in Fig.~\ref{buildgolo}: If a node is a leaf, its weight is given

325: by the associated probability; otherwise its weight is defined

326: recursively as $a$ times the sum of its children's weights.  This concept is

327: useful in visualizing both the coding procedure and its output.
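
As an illustrative sketch (the weights and the value of $a$ are chosen here
for the example and do not come from the cited sources), consider weights
$(0.4, 0.3, 0.2, 0.1)$ with $a = 2$.  The procedure performs three merges:
\begin{eqnarray*}
2 \cdot (0.1 + 0.2) &=& 0.6 \\
2 \cdot (0.3 + 0.4) &=& 1.4 \\
2 \cdot (0.6 + 1.4) &=& 4
\end{eqnarray*}
so every codeword has length $2$ and the root weight is $\sum_i \boldw(i)
2^{\len(i)} = 4$, that is, $\CampCost_2 = \log_2 4 = 2$.  By comparison, the
linear Huffman code for these weights has lengths $(1,2,3,3)$, for which
$\sum_i \boldw(i) 2^{\len(i)} = 0.8 + 1.2 + 1.6 + 0.8 = 4.4$, a larger
penalty.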

328:

329: Van Leeuwen implemented the Huffman algorithm in time linear in the input

330: size, given sorted weights, in \cite{Leeu}, and this implementation was

331: extended to the exponential problem in \cite{Baer05} as follows:

332:

333: \textbf{Two-Queue Implementation of Exponential Huffman Coding:}

334: The two-queue method of implementing the Huffman algorithm puts

335: nodes/items in two queues, the first of which is initialized with the

336: input items (eventual leaf nodes) arranged from head to tail in order

337: of nondecreasing weight, and the second of which is initially empty.

338: At any given step, a node with lowest weight among all nodes in both

339: queues is at the head of one of the two queues, and thus two

340: lowest-weighted nodes can be combined in constant time.  This compound

341: node is then inserted into (the tail of) the second queue, and the

342: algorithm progresses until only one node is left.  This node is the

343: root of the coding tree and is obtained in linear time.
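
A short trace may help to visualize the two queues.  The following sketch
uses weights $(0.1, 0.15, 0.2, 0.25, 0.3)$ and $a = 0.8$, both chosen only
for illustration; each row shows the queues (head at left) after the
indicated merge:
$$
\begin{array}{lll}
\hline
\mbox{merge} & \mbox{first queue} & \mbox{second queue} \\
\hline
\mbox{(initial)} & 0.1,~0.15,~0.2,~0.25,~0.3 & \mbox{(empty)} \\
0.8 \cdot (0.1 + 0.15) = 0.2 & 0.2,~0.25,~0.3 & 0.2 \\
0.8 \cdot (0.2 + 0.2) = 0.32 & 0.25,~0.3 & 0.32 \\
0.8 \cdot (0.25 + 0.3) = 0.44 & \mbox{(empty)} & 0.32,~0.44 \\
0.8 \cdot (0.32 + 0.44) = 0.608 & \mbox{(empty)} & 0.608 \\
\hline
\end{array}
$$
At every step the two smallest weights are found among the first two items
of each queue, and the second queue remains in nondecreasing order.  The
final weight agrees with $\sum_i \boldw(i) a^{\len(i)} = 0.8^2 (0.2 + 0.25 +
0.3) + 0.8^3 (0.1 + 0.15) = 0.608$.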

344:

345: The presentation of the algorithm in \cite{Baer05} did not include a

346: formal proof, so we find it useful to present one here:

347:

348: \begin{lemma}

349: The two-queue method using the exponential combining rule

350: results in an optimal exponential Huffman code given a finite number

351: of input items.

352: \label{twoqueue}

353: \end{lemma}

354:

355: \begin{proof}

356: The method is clearly a valid implementation of the exponential

357: Huffman algorithm so long as both queues' sets of nodes remain in

358: nondecreasing order.  This is clearly satisfied prior to the first

359: combination step.  Here we show that, if nodes are in order at all

360: points prior to a given combination step, they must be in order at the

361: end of that step as well, inductively proving the correctness of the

362: algorithm.  It is obvious that order is preserved in the single-item

363: queue, since nodes are only removed from it, not added to it.  In the

364: compound-node queue, order is only a concern if there is already at

365: least one node in it at the beginning of this step, a step that

366: combines nodes we call node $i_{-1}$ and node $i_{-2}$.  If so, the

367: item at the tail of the compound-node queue at the beginning of the

368: step was two separate items, $i_{-3}$ and $i_{-4}$, at the beginning

369: of the prior step.  At the beginning of this prior step, all four

370: items must have been distinct --- i.e., corresponding to distinct sets

371: of (possibly combined) leaf nodes --- and, because the algorithm

372: chooses the smallest two nodes to combine, neither $i_{-3}$ nor

373: $i_{-4}$ can have a greater weight than either $i_{-1}$ or $i_{-2}$.

374: Thus --- since $a\cdot(\boldw(i_{-3})+\boldw(i_{-4})) \leq

375: a\cdot(\boldw(i_{-1})+\boldw(i_{-2}))$ and the node with weight

376: $a\cdot(\boldw(i_{-3})+\boldw(i_{-4}))$ is the compound node with the

377: largest weight in the compound-node queue at the beginning of the step

378: in question --- the queues remain properly ordered at the end of the

379: step in question.

380: \end{proof}

381:

382: If $a < 0.5$, the compound-node queue will never have more than one

383: item.  At each step after the first, the sole compound item will be

384: removed from its queue since its weight is less than the larger of the

385: weights of the two nodes combined to create it, which in turn

386: is no greater than the weight of any node in the single-item queue.

387: It is replaced by the new (sole) compound item.  This extends to $a =

388: 0.5$ if we prefer to merge combined nodes over single items of the

389: same weight.  Thus, any finite input distribution can be optimally

390: coded for $a \leq 0.5$ using a {\defn truncated unary code}, a

391: truncated version of the {\defn unary code}, the latter of which has

392: codewords of the form $\{1^j0 : j \geq 0\}$.  The truncated unary code

393: has the same codewords as the unary code except for the longest

394: codeword, which is of the form $\{1^{|\X|-1}\}$.  This results from

395: each compound node being formed using at least one single item (leaf).

396: Taking limits, informally speaking, results in a unary limit code.

397: Formally, this is a straightforward corollary of Theorem~\ref{tailthm}

398: in Section~\ref{other}.
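
For instance (an illustrative case, not taken from the references), with
weights $(0.4, 0.3, 0.2, 0.1)$ and $a = 0.4$ the merges are
\begin{eqnarray*}
0.4 \cdot (0.1 + 0.2) &=& 0.12 \\
0.4 \cdot (0.12 + 0.3) &=& 0.168 \\
0.4 \cdot (0.168 + 0.4) &=& 0.2272 .
\end{eqnarray*}
Each compound item is smaller than every remaining single item, so it is
merged again immediately, and the resulting lengths $(1, 2, 3, 3)$ are those
of the truncated unary code $\{0, 10, 110, 111\}$.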

399:

400: If $a>0.5$, a code with finite penalty exists if and only if R\'{e}nyi

401: entropy of order $\alpha(a) = {(1+\lg a)}^{-1}$ is finite, as shown in

402: \cite{Baer06}.  It was Campbell who first noted the connection between

403: the optimal code's penalty, $\CampCost_a(\bigp,\biglen^*)$, and

404: R\'{e}nyi entropy

405: \begin{eqnarray*}

406: H_{\alpha}(\bigp) &\definedas& \frac{1}{1-\alpha} \lg \sum_{i \in \X}

407: \boldp(i)^\alpha \\

408: \Rightarrow H_{\alpha(a)}(\bigp) &=& \frac{1+\lg a}{\lg a}

409: \lg \sum_{i \in \X} \boldp(i)^{(1+\lg a)^{-1}} .

410: \end{eqnarray*}

411: This relationship is

412: $$H_{\alpha(a)}(\bigp) \leq \CampCost_a(\bigp,\biglen^*) <

413: H_{\alpha(a)}(\bigp) + 1$$ which should not be surprising given the

414: similar relationship between Huffman-optimal codes and Shannon

415: entropy\cite{Shan}, which corresponds to $a \rightarrow 1$ ($\alpha

416: \rightarrow 1$)\cite{Ren2,Camp}; due to this correspondence, Shannon

417: entropy is sometimes expressed as $H_1(\bigp)$.
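
As a numerical sanity check of these bounds (a sketch for an example
distribution chosen here), let $\bigp = (0.4, 0.3, 0.2, 0.1)$ and $a = 2$,
so that $\alpha(a) = 1/2$.  Then
\begin{eqnarray*}
H_{1/2}(\bigp) &=& 2 \lg \left( \sqrt{0.4} + \sqrt{0.3} + \sqrt{0.2} + \sqrt{0.1} \right) \\
&\approx& 2 \lg 1.9436 \approx 1.917 .
\end{eqnarray*}
Applying the finite-alphabet procedure of this section to these weights
yields four codewords of length $2$, for which $\CampCost_2 = \log_2 \left(
\sum_i \boldp(i) 2^2 \right) = 2$.  This value lies between $H_{1/2}(\bigp)
\approx 1.917$ and $H_{1/2}(\bigp) + 1 \approx 2.917$, as the relationship
requires.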

418:

419: \section{Geometric Distribution with Exponential Penalty}

420: \label{geometric}

421:

422: Consider the geometric distribution $$\boldp_\theta(i) =

423: (1-\theta)\theta^i$$ for parameter $\theta \in (0,1)$.  This

424: distribution arises in run-length coding among other

425: circumstances\cite{Golo,GaVV}.

426:

427: For the traditional linear penalty, a Golomb code with

428: parameter~$\kval$ --- or G$\kval$ --- is optimal for $\theta^\kval +

429: \theta^{\kval+1} \leq 1 < \theta^{\kval-1} + \theta^\kval$.  Such a

430: code consists of a unary prefix followed by a binary suffix, the

431: latter taking one of $\kval$ possible values.  If $\kval$ is a power

432: of two, all binary suffix possibilities have the same length;

433: otherwise, their lengths $\sigma(i)$ differ by at most $1$ and $\sum_i

434: 2^{-\sigma(i)}=1$.  Binary codes such as these suffix codes are called

435: {\defn complete} codes.  This defines the Golomb code; for example,

436: the Golomb code for $\kval = 3$ is:

437: \begin{center}

438: $$

439: \begin{array}{rll}

440: \hline

441: \hline

442: i&\boldp(i)&c(i) \\

443: \hline

444: 0&1-\theta&0~0 \\

445: 1&(1-\theta)\theta&0~10 \\

446: 2&(1-\theta)\theta^2&0~11 \\

447: 3&(1-\theta)\theta^3&10~0 \\

448: 4&(1-\theta)\theta^4&10~10 \\

449: 5&(1-\theta)\theta^5&10~11 \\

450: 6&(1-\theta)\theta^6&110~0 \\

451: 7&(1-\theta)\theta^7&110~10 \\

452: 8&(1-\theta)\theta^8&110~11 \\

453: 9&(1-\theta)\theta^9&1110~0 \\

454: \vdots&\qquad \vdots&\qquad \vdots \\

455: \hline

456: \end{array}

457: $$

458: \end{center}

459: where the space in the code separates the unary prefix from the complete

460: suffix.  In general, codeword $j$ for G$\kval$ is of the form

461: $\{1^{\lfloor j/\kval \rfloor} 0 b(j \bmod \kval,\kval) : j \geq 0\}$,

462: where $b(j \bmod \kval, \kval)$ is a complete binary code for the $(j -

463: \kval \lfloor j/\kval \rfloor+1)$th of $\kval$ items.
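
As a worked instance of this form, take $\kval = 3$ and $j = 7$.  Then
$\lfloor j/\kval \rfloor = 2$ and $j \bmod \kval = 1$, so the unary prefix
is $110$ and the suffix is the second of the three complete-code suffixes.
With the suffix assignment $b(0,3)=0$, $b(1,3)=10$, $b(2,3)=11$ used in the
table above, the codeword is $110~10$, of length $5$, matching the table
entry for $i=7$.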

464:

465: It turns out that such codes are optimal for the exponential penalty:

466: \begin{theorem}

467: For $a \in \Rp$, if

468: \begin{equation}

469: \theta^\kval + \theta^{\kval+1} \leq \frac{1}{a} < \theta^{\kval-1} +

470: \theta^\kval

471: \label{ineq}

472: \end{equation}

473: for $\kval \geq 1$, then the Golomb code G$\kval$ is the optimal code

474: for $\bigp_\theta$.  If no such $\kval$ exists, the unary code G$1$ is

475: optimal.

476: \label{optgeo}

477: \end{theorem}

478:

479: \textit{Remark:} This rule for finding an optimal Golomb G$\kval$ code

480: is equivalent to

481: $$\kval = \max\left(1, \left\lceil -\log_\theta a -\log_\theta

482: (1+\theta) \right\rceil\right).$$ This is a generalization of the

483: traditional linear result, which corresponds to $a \rightarrow 1$.

484: Cases in which the left inequality is an equality have multiple

485: solutions, as with linear coding; see, e.g., \cite[p.~289]{Goli2}.

486: The proof of the optimality of the Golomb code for exponential

487: penalties is somewhat similar to that of \cite{GaVV}, although it must

488: be significantly modified due to the nonlinearity involved.
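
To illustrate the rule (with parameter values chosen here purely as
examples), take $\theta = 0.9$, for which $-\log_\theta (1+\theta) \approx
6.092$:
$$
\begin{array}{rrr}
\hline
a & -\log_\theta a - \log_\theta(1+\theta) & \kval \\
\hline
0.5 & -0.487 & 1 \\
0.7 & 2.707 & 3 \\
2 & 12.671 & 13 \\
\hline
\end{array}
$$
For $a = 0.7$, multiplying (\ref{ineq}) through by $a$ gives $0.7 (\theta^3
+ \theta^4) \approx 0.970 \leq 1 < 0.7 (\theta^2 + \theta^3) \approx 1.077$,
confirming $\kval = 3$.  In the linear limit $a \rightarrow 1$ the same
$\theta$ gives $\kval = 7$.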

489:

490: Before proving Theorem~\ref{optgeo}, we need the following lemma:

491:

492: \begin{lemma}

493: Consider a Huffman combining procedure, such as the exponential

494: Huffman coding procedure, implemented using the two-queue method presented in the previous section just prior to Lemma~\ref{twoqueue}.  Now consider a step at which the first (single-item)

495: queue is empty, so that remaining are only compound items, that is,

496: items representing internal nodes rather than leaves in the final

497: Huffman coding tree.  Then, in this final tree, the nodes corresponding to these compound items will be on

498: levels differing by at most one; that is, the nodes will form a

499: complete tree.  Furthermore, if $n$ is the number of items remaining

500: at this point, all items that finish at level $\lceil \lg n \rceil$

501: appear closer to the head of the (second, nonempty) queue than any

502: item at level $\lceil \lg n \rceil - 1$ (if any).

503: \label{thelemma}

504: \end{lemma}

505:

506: \begin{proof}[Lemma~\ref{thelemma}]

507: We use an inductive proof, in which the base cases of one and two

508: compound items (i.e., internal nodes) are trivial.  Suppose the lemma is

509: true for every case with $n-1$ items for $n>2$, that is, that all

510: nodes are at levels $\lfloor \lg (n-1) \rfloor$ or $\lceil \lg (n-1)

511: \rceil$, with the latter items closer to the head of the queue than

512: the former.  Consider now a case with $n$ nodes.  The first step of

513: coding is to merge two nodes, resulting in a combined item that is

514: placed at the end of the combined-item queue.  Because it is at the

515: end of the queue in the reduced problem of size $n-1$, this combined node is at level

516: $\lfloor \lg (n-1) \rfloor$ in the final tree, and its children are at

517: level $1+\lfloor \lg (n-1) \rfloor = \lceil \lg n \rceil$.  If $n$ is

518: a power of two, the remaining items end up on level $\lg n = \lceil \lg

519: (n-1) \rceil$, satisfying this lemma.  If $n-1$ is a

520: power of two, they end up on level $\lg (n-1) = \lfloor \lg n \rfloor$,

521: also satisfying the lemma.  Otherwise, there is at least one item ending up at

522: level $\lceil \lg n \rceil = \lceil \lg (n-1) \rceil$ near the head of

523: the queue, followed by the remaining items, which end up at level

524: $\lfloor \lg n \rfloor = \lfloor \lg (n-1) \rfloor$.  In any case, the

525: lemma is satisfied for $n$ items, and thus, inductively, for any number of items.

526: \end{proof}
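
The lemma also determines how many items finish at each level.  As a sketch,
let $x$ denote the number of the $n$ items finishing at level $\lceil \lg n
\rceil$ (the symbol $x$ is introduced only for this illustration).
Completeness means that the Kraft sum equals one:
$$ x \, 2^{-\lceil \lg n \rceil} + (n - x) \, 2^{-(\lceil \lg n \rceil - 1)}
= 1 \quad \Rightarrow \quad x = 2n - 2^{\lceil \lg n \rceil} . $$
For example, with $n = 5$ the first $x = 2$ items in the queue finish at
level $3$ and the remaining three finish at level $2$; these are the suffix
lengths of the Golomb code with $\kval = 5$.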

527:

528: This lemma applies to any problem in which a two-queue Huffman algorithm provides an optimal solution, including the original Huffman

529: problem and the tree-height problem of \cite{Park}.  Here we apply the lemma to the exponential Huffman algorithm to prove Theorem~\ref{optgeo}:

530:

531: \begin{figure*}

532: \psfrag{  0}{\mbox{\tiny $w(0)$}}

533: \psfrag{  1}{\mbox{\tiny $w(1)$}}

534: \psfrag{  2}{\mbox{\tiny $w(2)$}}

535: \psfrag{  3}{\mbox{\tiny $w(3)$}}

536: \psfrag{  4}{\mbox{\tiny $w(4)$}}

537: \psfrag{  5}{\mbox{\tiny $w(5)$}}

538: \psfrag{  6}{\mbox{\tiny $w(6)$}}

539: \psfrag{  7}{\mbox{\tiny $w(7)$}}

540: \psfrag{  8}{\mbox{\tiny $w(8)$}}

541: \psfrag{  9}{\mbox{\tiny $w(9)$}}

542: \psfrag{  10}{\mbox{\tiny $w(10)$}}

543: \psfrag{  11}{\mbox{\tiny $w(11)$}}

544: \psfrag{  12}{\mbox{\tiny $w(12)$}}

545: \psfrag{  13}{\mbox{\tiny $w(13)$}}

546: \psfrag{  14}{\mbox{\tiny $w(14)$}}

547: \psfrag{  15}{\mbox{\tiny $w(15)$}}

548: \psfrag{  16}{\mbox{\tiny $w(16)$}}

549: \psfrag{  17}{\mbox{\tiny $w(17)$}}

550: \psfrag{  18}{\mbox{\tiny $w(18)$}}

551: \psfrag{  19}{\mbox{\tiny $w(19)$}}

552: \psfrag{  20}{\mbox{\tiny $w(20)$}}

553: \psfrag{  21}{\mbox{\tiny $w(21)$}}

554: \psfrag{  22}{\mbox{\tiny $w(22)$}}

555: \begin{center}

556: \resizebox{14cm}{!}{\includegraphics{inftree3d.eps}}

557: \caption{Formation of a Golomb code using a code for an $m$-reduced

558: source.  In this illustration, $m=17$ and $\kval=5$, and smaller weights are pictorially lower.  Weights are merged bottom-up, in a manner consistent with the exponential Huffman algorithm, first in separate (truncated) unary subtrees, then in a (five-leaf) complete tree.}

559: \label{buildgolo}

560: \end{center}

561: \end{figure*}

562:

563: \begin{proof}[Theorem~\ref{optgeo}]

564: We start with an optimal exponential Huffman code for a sequence of

565: similar finite weight distributions.  These finite weight

566: distributions, called {\defn $m$-reduced geometric sources} $\bigw_m$,

567: are defined as:

568: $$

569: \boldw_m(i) \definedas \left\{

570: \begin{array}{ll}

571: (1-\theta)\theta^i,& 0 \leq i \leq m \\

572: \displaystyle

573: \frac{(1-\theta)a\theta^i}{1-a\theta^\kval},& m < i \leq m + \kval .\\

574: \end{array}

575: \right.

576: $$ where $\kval$ is as given in the statement of the theorem, or $1$

577: if no such $\kval$ exists.

578:

579: Weights $\boldw_m(0)$ through $\boldw_m(m)$ are decreasing, as are

580: $\boldw_m(m+1)$ through $\boldw_m(m+\kval)$.  Thus we can combine the

581: nodes with weights $\boldw_{m}(m)$ and $\boldw_m(m+\kval)$ if

582: $$\frac{(1-\theta)a\theta^{m+\kval}}{1-a\theta^\kval} \leq

583: (1-\theta)\theta^{m-1}$$

584: and

585: $$\frac{(1-\theta)a\theta^{m+\kval-1}}{1-a\theta^\kval} >

586: (1-\theta)\theta^m \mbox{ or } \kval=1.$$ These conditions

587: are equivalent to the left and right sides, respectively, of

588: (\ref{ineq}).  Thus the combined item is

589: $$\boldw_{m-1}(m) = \frac{(1-\theta)a\theta^m}{1-a\theta^\kval}$$

590: and the code is reduced to the $\bigw_{m-1}$ case.
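
To spell out the algebra behind these two conditions and the combined weight, the following sketch assumes that the exponential Huffman algorithm replaces two merged weights $w'$ and $w''$ with $a(w'+w'')$, and that $a\theta^\kval<1$ so that the denominators are positive; the symbols $w'$ and $w''$ are used only for this illustration.  Dividing the first condition by $(1-\theta)\theta^{m-1}$ and the second by $(1-\theta)\theta^m$, then clearing the denominator, gives
$$
a\theta^{\kval+1} \leq 1-a\theta^\kval
\quad \mbox{and} \quad
a\theta^{\kval-1} > 1-a\theta^\kval ,
$$
that is, $a(\theta^\kval+\theta^{\kval+1}) \leq 1 < a(\theta^{\kval-1}+\theta^\kval)$.  Under the assumed merge rule, the combined weight is
\begin{eqnarray*}
a\left[\boldw_m(m)+\boldw_m(m+\kval)\right]
&=& a(1-\theta)\theta^m\left[1+\frac{a\theta^\kval}{1-a\theta^\kval}\right] \\
&=& \frac{(1-\theta)a\theta^m}{1-a\theta^\kval} .
\end{eqnarray*}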

591:

592: After merging the two smallest weights for $m=0$, the reduced source

593: is $$\boldw_{-1}(i) = \frac{(1-\theta)a\theta^i}{1-a\theta^\kval}, ~ 0

594: \leq i \leq \kval-1 .$$ For $\kval=1$ (including all instances of the

595: degenerate $a \leq 0.5$ case and all instances in which (\ref{ineq})

596: cannot be satisfied), this proves that the optimal tree is the

597: truncated unary tree.  Considering now only $\kval>1$ for $m \geq

598: \kval-1$, the two-queue algorithm assures that, when the problem is

599: reduced to weights $\{\boldw_{-1}(i)\}$, all corresponding nodes are

600: in the combined-item queue.  Lemma~\ref{thelemma} thus proves that

601: these nodes form a complete code.  The overall optimal tree for any

602: $m$-reduced code with $m \geq \kval-1$ is then a truncated Golomb

603: tree, as pictorially represented in Fig.~\ref{buildgolo}, where $m=17$

604: and $\kval=5$.  Note that $m+1$ is the number of leaves in common with

605: what we call the ``Golomb tree,'' the tree we show to be optimal for

606: the original geometric source.  The number of remaining leaves in the

607: truncated tree is~$\kval$, which is thus the number of distinct unary

608: subtrees in the Golomb tree.

609:

610: Fig.~\ref{buildgolo} represents both the truncated and full Golomb

611: trees, along with how to merge the weights.  Squares represent items

612: to code, while circles represent other nodes of the tree.  Smaller

613: weights are below larger ones, so that items are merged as pictured.

614: Rounded squares are items $m+1$ through $m+\kval$, the items which are

615: replaced in the Golomb tree by unary subtrees, that is, subtrees

616: representing the unary code.  Other squares are items $0$ through $m$,

617: those corresponding to single items in the integer code.  White

618: circles are the leaves used for the complete tree.

619:

620: \begin{figure*}[ht]

621: \psfrag{L-Ha}{$\bar{R}_a(\biglen_{\theta,a}^*,\bigp_\theta)$}

622: \psfrag{a}{$a$}

623: \psfrag{ag}{\mbox {\huge $a$}}

624: \psfrag{Theta}{$\theta$}

625:      \centering

626:      \subfigure[$a>1$]

627: 	       { \label{apos} \includegraphics[width=.45\textwidth]{R0.eps} }

628:      \subfigure[$a<1$]

629: 	       { \label{aneg} \includegraphics[width=.45\textwidth]{R1.eps} }

630:      \caption{Redundancy of the optimal code for the geometric

631:      distribution with the exponential penalty (parameter $a$).

632:      $\bar{R}_a(\biglen_{\theta,a}^*,\bigp_\theta) =

633:      \CampCost_a(\bigp_\theta,\biglen_{\theta,a}^*) - H_{\alpha(a)}(\bigp_\theta)$,

634:      where $\alpha(a) = (1+\lg a)^{-1}$, $\bigp_\theta$ is the

635:      geometric probability sequence implied by $\theta$, and

636:      $\biglen_{\theta,a}^*$ is the optimal length sequence for

637:      distribution $\bigp_\theta$ and parameter $a$.}

638:      \label{aall}

639: \end{figure*}

640:

641: It is equivalent to follow the complete portion of the code with the

642: unary portion --- as in the exponential Huffman tree in

643: Fig.~\ref{buildgolo} --- or to reorder the bits and follow the unary

644: portion with the complete portion --- as in the Golomb

645: code\cite{Golo}.  The latter is more often used in practice and has

646: the advantage of being alphabetic, that is, $i>j$ if and only if

647: $c(i)$ is lexicographically after $c(j)$.

648:

649: The truncated Golomb tree for any $m \geq \kval-1$ represents a code

650: that has the same penalty for the $m$-reduced distribution as does the

651: Golomb code with the corresponding geometric distribution.  We now

652: show that this is the minimum penalty for any code with this geometric

653: distribution.

654:

655: Let $\biglen_{\theta,a}^*$ (or $\biglen^*$ if there is no ambiguity)

656: be codeword lengths that minimize the penalty for the geometric

657: distribution (which, as we noted, exist as shown in Theorem~2

658: of~\cite{Baer06}).  Let $\biglen_m$ be codeword lengths for the

659: $m$-reduced distribution found earlier; that is, $\len_m(i)$ is the

660: Golomb length for $i \leq m$ and $\len_m(i) = \len_m(i-\kval)$ for the

661: remaining values.  Finally, let $\biglen_{\infty}$ be the lengths of

662: the code implied by $m \rightarrow \infty$, that is, the lengths of

663: the Golomb code G$\kval$.  Then

664: \begin{equation}

665: \begin{array}{rcl}

666: \displaystyle

667: \log_a \sum_{i=0}^\infty \boldp(i) a^{\len^*(i)} &\leq&

668: \displaystyle

669: \log_a \sum_{i=0}^\infty \boldp(i) a^{\len_{\infty}(i)} \\

670: &=&

671: \displaystyle

672: \log_a \sum_{i=0}^{m+\kval} \boldw_m(i) a^{\len_m(i)} \\

673: &\leq&

674: \displaystyle

675: \log_a \sum_{i=0}^{m+\kval} \boldw_m(i) a^{\len^*(i)}

676: \end{array}

677: \label{fininf}

678: \end{equation}

679: where the inequalities are due to the optimality of the respective

680: codes and the facts that $\boldw_m(i)=\boldp(i)$ for $i \leq m$ and

681: $$\boldw_m(i)=\sum_{j=0}^\infty (1-\theta)\theta^{i+j\kval}a^{j+1} =

682: \sum_{j=0}^\infty a^{j+1} \boldp(i+j\kval)$$

683: for $i \in (m,m+\kval]$.  The difference between the arguments of the

684: logarithms in the first and the last of the expressions in (\ref{fininf}) is

685: $$

686: \begin{array}{l}

687: \displaystyle

688: \sum_{i=0}^\infty \boldp(i) a^{\len^*(i)} - \sum_{i=0}^{m+\kval}

689: \boldw_m(i) a^{\len^*(i)} \\

690: \displaystyle

691: \qquad ~ =

692: \sum_{i=m+1}^\infty \boldp(i) a^{\len^*(i)}

693: - \sum_{i=m+1}^{m+\kval} \boldw_m(i) a^{\len^*(i)} .

694: \end{array}

695: $$ As $m \rightarrow \infty$ for $m \geq \kval-1$, the sums on the

696: right-hand side approach~$0$; the first is the difference between a

697: limit (an infinite sum) and its approaching sequence of finite sums,

698: all upper bounded in~(\ref{fininf}), and each of the terms in the

699: second summation is upper-bounded by a multiplicative constant of the

700: corresponding term in the first.  (In the latter finite summation,

701: terms are $0$ for $i>m+\kval$.)  Their difference therefore also

702: approaches zero, so the summations on the left-hand side approach

703: equality, as do those in (\ref{fininf}), and the Golomb code must be

704: optimal.

705: \end{proof}

706:

707: It is equivalent for the bits of the unary portion to be complemented,

708: that is, to use $\{0^{\lfloor j/\kval \rfloor} 1 b(j \bmod

709: \kval,\kval) : j \geq 0\}$ (as in \cite{GaVV}) instead of

710: $\{1^{\lfloor j/\kval \rfloor} 0 b(j \bmod \kval,\kval) : j \geq 0\}$

711: (as in \cite{Golo}).  It is also worth noting that Golomb originally

712: proposed his code in the context of a spy reporting run lengths; this

713: is similar to R\'{e}nyi's context for communications, related in

714: Section~\ref{intro} as a motivation for the nonlinear penalty with

715: $a<1$.

716:

717: A little algebra reveals that, for a distribution $\bigp_\theta$ and a Golomb

718: code with parameter $\kval$ (lengths $\biglen_\kval$),

719: \begin{equation}

720: \begin{array}{rcl}

721: \CampCost_a(\bigp_\theta,\biglen_\kval) &=& \displaystyle

722: \log_a \sum_{i=0}^\infty

723: (1-\theta)\theta^i a^{(\left\lceil\frac{i+1-z}{\kval} \right\rceil + g)} \\

724: \displaystyle

725: &=& g + {\log}_a

726: \left(1+\frac{(a-1)\theta^z}{1-a\theta^\kval}\right)

727: \end{array}

728: \label{geosum}

729: \end{equation}

730: where

731: $g=\lfloor \log_2 \kval \rfloor + 1$ and $z = 2^g - \kval$.

732: Therefore, Theorem~\ref{optgeo} provides the $\kval$ that minimizes

733: (\ref{geosum}).  If $a>0.5$, the corresponding R\'{e}nyi entropy is

734: \begin{equation}

735: H_{\alpha(a)}(\bigp_\theta) = \log_a

736: \frac{1-\theta}{(1-\theta^{\alpha(a)})^{1/\alpha(a)}}

737: \label{geoent}

738: \end{equation}

739: where we recall that $\alpha(a) = (1 +

740: \lg a)^{-1}$.  (Again, $a \leq 0.5$ is degenerate, an

741: optimal code being unary with no corresponding R\'{e}nyi entropy.)
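
The algebra behind (\ref{geosum}) can be sketched as follows, assuming $a\theta^\kval<1$ so that the series converges, and writing $\len_\kval(i)$ for the $i$th element of $\biglen_\kval$ (notation used only for this illustration).  The first $z$ items have length $g$, and, for each $j \geq 1$, the $\kval$ items $i$ with $z+(j-1)\kval \leq i < z+j\kval$ have length $g+j$ and total probability $\theta^{z+(j-1)\kval}(1-\theta^\kval)$.  Hence
\begin{eqnarray*}
\sum_{i=0}^\infty (1-\theta)\theta^i a^{\len_\kval(i)}
&=& a^g\left[(1-\theta^z) + \theta^z(1-\theta^\kval)\sum_{j=1}^\infty a^j \theta^{(j-1)\kval}\right] \\
&=& a^g\left[1-\theta^z + \frac{a\theta^z(1-\theta^\kval)}{1-a\theta^\kval}\right] \\
&=& a^g\left[1+\frac{(a-1)\theta^z}{1-a\theta^\kval}\right] ,
\end{eqnarray*}
and taking $\log_a$ yields the second line of (\ref{geosum}).  For example, $\kval=3$ gives $g=2$ and $z=1$, so the lengths are $2, 3, 3, 3, 4, 4, 4, \ldots$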

742:

743: In evaluating the effectiveness of the optimal code, one might use the

744: following definition of {\defn average pointwise redundancy} (or just

745: {\defn redundancy}): $$\bar{R}_a(\biglen, \bigp) \definedas

746: \CampCost_a (\bigp,\biglen) - H_{\alpha(a)}(\bigp) .$$

747: For nondegenerate values, we can plot the $\bar{R}_a(\biglen_{\theta,a}^*,

748: \bigp_\theta)$ obtained from the minimization.  This is done for $a>1$

749: and $a<1$ in Fig.~\ref{aall}.  Note that as $a \rightarrow 1$, the

750: plot approaches the redundancy plot for the linear case, e.g.,

751: \cite{GaVV}, reproduced as Fig.~\ref{shannon}.

752:

753: In many potential applications of nonlinear penalties --- such as the

754: aforementioned for $a>1$\cite{Jeli,Humb2,BlMc} and $a<1$

755: (Section~\ref{intro}) --- $a$ is very close to~$1$.  Since the preceding

756: analysis shows that the Golomb code that is optimal for given $a$

757: and $\theta$ is optimal not only for these particular values, but for

758: a range of $a$ (fixing $\theta$) and a range of $\theta$ (fixing $a$),

759: the Golomb code for the traditional linear penalty is, in some sense,

760: much more robust and general than previously appreciated.

761:

762: \begin{figure}[t]

763: \psfrag{L-H}{\mbox{\huge $\bar{R}_1(\biglen_{\theta,1}^*,\bigp_\theta)$}}

764: \psfrag{THETA}{\mbox{\huge $\theta$}}

765:      \centering

766:      \resizebox{8cm}{!}{\includegraphics{Rlin.eps}}

767:      \caption{Redundancy of the optimal code for the geometric

768:      distribution with the traditional linear penalty.}

769:      \label{shannon}

770: \end{figure}

771:

772: \section{Other Infinite Sources}

773: \label{other}

774:

775: Abrahams noted that, in the linear case, slight deviation from the

776: geometric distribution in some cases does not change the optimal

777: code\cite[Proposition~(2)]{Abr1}.  Other extensions of and deviations

778: from the geometric distribution have also been

779: considered\cite{MSW,GoMa,BCSV}, including optimal codes for nonbinary

780: alphabets\cite{Abr1,GoMa}.  Many of these approaches can be adapted to

781: the nonlinear penalties considered here.  However, in this section we

782: instead consider another type of probability distribution for binary

783: coding, the type with a light tail.

784:

785: Humblet's approach\cite{Humb1}, later extended in \cite{KHN}, uses the

786: fact that there is an optimal code tree with a unary subtree for any

787: probability distribution with a relatively light tail, one for which

788: there is an $r$ such that, for all $j>r$ and $i<j$, $\boldp(i) \geq

789: \boldp(j)$ and $\boldp(i) \geq \sum_{k=j+1}^\infty \boldp(k)$.  Due to

790: the additive nature of Huffman coding, items beyond $r$ form the unary

791: subtree, while the remaining tree can be coded via the Huffman

792: algorithm.  Once again, this has to be modified for exponential

793: penalties.

794:

795: \begin{figure*}

796: \psfrag{  0x}{\mbox{\tiny $p(0)$}}

797: \psfrag{  1x}{\mbox{\tiny $p(1)$}}

798: \psfrag{  2x}{\mbox{\tiny $p(2)$}}

799: \psfrag{  3x}{\mbox{\tiny $p(3)$}}

800: \psfrag{  4x}{\mbox{\tiny $p(4)$}}

801: \psfrag{  5x}{\mbox{\tiny $p(5)$}}

802: \psfrag{  6x}{\mbox{\tiny $p(6)$}}

803: \psfrag{  7x}{\mbox{\tiny $p(7)$}}

804: \psfrag{  8x}{\mbox{\tiny $p(8)$}}

805: \psfrag{  9x}{\mbox{\tiny $p(9)$}}

806: \psfrag{  10x}{\mbox{\tiny $p(10)$}}

807: \psfrag{  11x}{\mbox{\tiny $p(11)$}}

808: \psfrag{  12x}{\mbox{\tiny $p(12)$}}

809: \psfrag{  13r}{\mbox{\tiny $w(13)$}}

810: \begin{center}

811: \resizebox{14cm}{!}{\includegraphics{inf_humb3d.eps}}

812: \caption{Formation of a unary-ended infinite code using a Huffman-like

813: code.  (Smaller weights are pictorially lower.)  Weights are merged

814: bottom-up, in a manner consistent with the exponential Huffman

815: algorithm, first in the (truncated) unary subtree, then as in the

816: exponential Huffman algorithm.}

817: \label{buildhumb}

818: \end{center}

819: \end{figure*}

820:

821: We wish to show that the optimal code can be obtained when there is a

822: nonnegative integer $r$ such that, for all $j>r$ and $i<j$, $$\boldp(i)

823: \geq \max\left(\boldp(j), \sum_{k=j+1}^\infty \boldp(k) a^{k-j}\right).$$

824: The optimal code is obtained by considering the reduced alphabet

825: consisting of symbols $0,1,\ldots,r+1$ with weights

826: \begin{equation}

827: \boldw(i) = \left\{

828: \begin{array}{ll}

829: \boldp(i),& i \leq r \\

830: \sum_{k=r+1}^\infty \boldp(k) a^{k-r},& i = r+1 . \\

831: \end{array}

832: \right.

833: \label{weights}

834: \end{equation}

835: Apply exponential Huffman coding to this reduced set of weights.  For

836: items $0$ through $r$, the Huffman codewords for the reduced and the

837: infinite alphabets are identical.  Each other item $i>r$ has a

838: codeword consisting of the reduced codeword for item $r+1$ (which,

839: without loss of generality, consists of all $1\s$) followed by the

840: unary code for $i-r-1$, that is, $i-r-1$ ones followed by a zero.  We

841: call such codes {\defn unary-ended}.  A pictorial example is shown in

842: Fig.~\ref{buildhumb} for a problem instance for which $r=12$.

843:

844: \begin{theorem}

845: Let $\boldp(\cdot)$ be a probability measure on the set of nonnegative

846: integers, and let $a$ be the parameter of the penalty to be optimized.

847: If there is a nonnegative integer $r$

848: such that for all $j>r$ and $i<j$,

849: \begin{equation}

850: \boldp(i) \geq \boldp(j)

851: \label{cond1}

852: \end{equation}

853: and

854: \begin{equation}

855: \boldp(i) \geq \sum_{k=j+1}^\infty \boldp(k) a^{k-j}

856: \label{cond2}

857: \end{equation}

858: then there exists a minimum-penalty binary prefix code with every

859: codeword~$j>r$ consisting of $j-x$ $1\s$ followed by one $0$ for some

860: fixed nonnegative integer~$x$.

861: \label{tailthm}

862: \end{theorem}

863:

864: \begin{proof}

865: The idea here is similar to that for geometric distributions, to show

866: a sequence of finite codes which in some sense converges to the

867: optimal code for the infinite alphabet.  In this case we consider the

868: infinite sequence of codes implicit in the above; for a given $m \geq -1$, the

869: corresponding codeword weights are

870: $$

871: \boldw_m(i) = \left\{

872: \begin{array}{ll}

873: \boldp(i),& i < r+m+2 \\

874: \sum_{k=r+m+2}^\infty \boldp(k) a^{k-r-m-1},& i = r+m+2. \\

875: \end{array}

876: \right.

877: $$  It is obvious that an optimal code for

878: each $m$-reduced distribution is identical to the proposed code for

879: the infinite alphabet, except for the item $r+m+2$, which is the

880: code tree sibling of item $r+m+1$.

881:

882: For $a<1$, we show, as in the geometric case, that the difference

883: between the penalties for the optimal and the proposed codes

884: approaches~$0$.  In this case, the equivalent of

885: inequality~(\ref{fininf}) is

886: \begin{equation}

887: \begin{array}{rcl}

888: \displaystyle

889: \log_a \sum_{i=0}^\infty \boldp(i) a^{\len^*(i)} &\leq&

890: \displaystyle

891: \log_a \sum_{i=0}^\infty \boldp(i) a^{\len_{\infty}(i)} \\

892: &=&

893: \displaystyle

894: \log_a \sum_{i=0}^{r+m+2} \boldw_m(i) a^{\len_m(i)} \\

895: &\leq&

896: \displaystyle

897: \log_a \sum_{i=0}^{r+m+2} \boldw_m(i) a^{\len^*(i)}

898: \end{array}

899: \label{fininf2}

900: \end{equation}

901: where in this case $\len_\infty(i)$ denotes the length of codeword $i$ of the proposed

902: code, $\len_m(i) = \len_\infty(i)$ for $i<r+m+2$ and $\len_m(i) =

903: \len_\infty(i-1)$ for $i=r+m+2$, and, again, $\len^*(\cdot)$ denotes the

904: lengths of codewords in an optimal code.  The corresponding difference

905: between the arguments of the logarithms in the first and the last expressions of

906: (\ref{fininf2}) is

907: \begin{equation}

908: \begin{array}{l}

909: \displaystyle

910: \sum_{i=0}^\infty \boldp(i) a^{\len^*(i)} -

911: \sum_{i=0}^{r+m+2} \boldw_m(i) a^{\len^*(i)} \\

912: \displaystyle

913: \qquad =

914: \sum_{i=r+m+2}^\infty

915: \boldp(i) a^{\len^*(i)} - \boldw_m(r+m+2)

916: a^{\len^*(r+m+2)}.

917: \end{array}

918: \label{fininf3}

919: \end{equation}

920: As $m \rightarrow \infty$, both terms in the difference

921: on the second line of (\ref{fininf3}) clearly approach $0$, so the

922: terms in~(\ref{fininf2}) approach equality, showing the proposed code to

923: be optimal.

924:

925: For $a>1$, the same method will work, but it is not so obvious that

926: the terms in the difference on the second line of (\ref{fininf3})

927: approach~$0$.  Let us first find an upper bound for $\boldw_m(r+m+2)$

928: in terms of $\boldp(r+m+2)$:

929: \begin{eqnarray*}

930: \boldw_m(r+m+2)

931: &=& a\boldp(r+m+2)+a^2\boldp(r+m+3)+\\

932: && \displaystyle\qquad \sum_{i=r+m+4}^\infty \boldp(i) a^{i-r-m-1} \\

933: &\leq& (a^2+a)\boldp(r+m+2)+a^2\boldp(r+m+3) \\

934: &\leq& (2a^2+a)\boldp(r+m+2)

935: \end{eqnarray*}

936: where the first equality is due to the definition of

937: $\boldw_m(\cdot)$, the first inequality due to (\ref{cond2}), and the

938: second inequality due to (\ref{cond1}).  Thus $\boldw_m(r+m+2)$ has an

939: upper bound of $(2a^2+a)\boldp(r+m+2)$ for all $m \geq -1$.  In

940: addition, since the proposed code has a finite penalty --- identical

941: to that of any reduced code --- the optimal code has a finite penalty,

942: and the sequence of its terms --- each one of which has the form

943: $\boldp(r+m+2) a^{\len^*(r+m+2)}$ --- approaches $0$ as $m$ increases.

944: Thus $\boldw_m(r+m+2) a^{\len^*(r+m+2)}$ approaches $0$ as well.  Due

945: to the optimality of $\len^*(\cdot)$, $\boldw_m(r+m+2)

946: a^{\len^*(r+m+2)}$ serves as an upper bound for $\sum_{i=r+m+2}^\infty

947: \boldp(i) a^{\len^*(i)}$, and thus both terms approach~$0$.  As with

948: $a<1$, then, the terms in~(\ref{fininf2}) approach equality for $m

949: \rightarrow \infty$, showing the proposed code to be optimal.

950: \end{proof}

951:

952: The rate at which $\boldp(\cdot)$ must decrease in order to satisfy

953: condition~(\ref{cond2}) clearly depends on $a$.  One simple sufficient

954: condition --- provable via induction --- is that it satisfy $\boldp(i)

955: \geq a \boldp(i+1) + a \boldp(i+2)$ for large $i$.  A less general

956: condition is that $\boldp(i)$ eventually decrease at least as quickly

957: as $g^i$ where $g = (\sqrt{1+4/a}-1)/2$, the same ratio needed for a

958: unary geometric code for $\theta=g$, as in~(\ref{ineq}).  The ratio

959: $g$ is plotted in Fig.~\ref{ga}.
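
To see how the two sufficient conditions relate, read ``at least as quickly as $g^i$'' as $\boldp(i+1) \leq g \boldp(i)$ for all large $i$ (this reading is an assumption made for illustration).  Since $g$ is the positive root of $g^2+g=1/a$,
$$
a\boldp(i+1)+a\boldp(i+2) \leq a(g+g^2)\boldp(i) = \boldp(i) ,
$$
so the first condition follows from the second.  At $a=1$, $g=(\sqrt{5}-1)/2 \approx 0.618$.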

960:

961: \begin{figure}[t]

962: \psfrag{a}{$a$}

963: \psfrag{ag}{\mbox {\huge $a$}}

964: \psfrag{g}{\mbox {\huge $g$}}

965:      \centering

966:      \resizebox{8cm}{!}{\includegraphics{ga2.eps}}

967:      \caption{Ratio $g$, probability distribution fall-off sufficient

968:      for the optimality of a unary-ended code.  Note that

969:      $1/g = \Phi \definedas \frac{1}{2}(1+\sqrt{5})$,

970:      the golden ratio, at $a=1$.}

971:      \label{ga}

972: \end{figure}

973:

974: For $a \rightarrow 1$, these conditions approach those derived in

975: \cite{Humb1}.  The stronger results of \cite{KHN} do not easily extend

976: here due to the nonadditivity of the exponential penalty.  An attempt

977: at such an extension in \cite[pp.~103--105]{Baer} gives no criteria

978: for success, so that, while one could produce certain codewords for

979: certain codes, one might fail in producing other codewords for the

980: same codes or for other codes.  Thus this extension is not truly a

981: workable algorithm.

982:

983: Consider the example of optimal codes for the Poisson distribution,

984: $$\boldp_\lambda(i)=\frac{\lambda^i e^{-\lambda}}{i!} . $$ How does

985: one find a suitable value for $r$ (as in Section~\ref{other}) in such

986: a case?  It has been shown that $r \geq \lceil e \lambda \rceil - 1$

987: yields $\boldp(i) \geq \boldp(j)$ for all $j>r$ and $i<j$, satisfying

988: the first condition of Theorem~\ref{tailthm} \cite{Humb1}.  Moreover,

989: if, in addition, $j \geq \lceil 2 a \lambda \rceil - 1$ (and thus $j >

990: a \lambda - 1$), then

991: \begin{eqnarray*}

992: \sum_{k=1}^\infty \boldp(j+k)a^k

993: &=& \frac{e^{-\lambda}\lambda^j}{j!}\left[

994: \frac{a \lambda}{j+1} + \frac{a^2 \lambda^2}{(j+1)(j+2)} + \cdots \right] \\

995: &<& \boldp(j) \left[\frac{a \lambda}{j+1} + \frac{a^2 \lambda^2}{(j+1)^2} + \cdots \right] \\

996: &=& \boldp(j) \frac{\frac{a \lambda}{j+1}}{1-\frac{a \lambda}{j+1}} \\

997: &\leq& \boldp(j) \\

998: &\leq& \boldp(i) .

999: \end{eqnarray*}

1000: Thus, since we consider $j > r$, $r = \max(\lceil 2 a \lambda \rceil -

1001: 2, \lceil e \lambda \rceil - 1)$ is sufficient to establish an $r$

1002: such that the above method yields the optimal infinite-alphabet code.

1003:

1004: In order to find the optimal reduced code, use

1005: $$\boldw_{-1}(r+1)=\sum_{k=r+1}^\infty \boldp(k) a^{k-r} = a^{-r}e^{\lambda(a-1)} - \sum_{k=0}^r \boldp(k) a^{k-r} .

1006: $$  For example, consider the Poisson distribution with $\lambda = 1$.  We

1007: code this for both $a=1$ and $a=2$.  For both values, $r = 2$, so both

1008: are easy to code.  For $a=1$, $\boldw_{-1}(3) = 1 - 2.5 e^{-1} \approx

1009: 0.0803 \ldots$, while, for $a=2$, $\boldw_{-1}(3) = 0.25 e - 1.25

1010: e^{-1} \approx 0.2197 \ldots$.  After using the appropriate Huffman

1011: procedure on each reduced source of $4$ weights, we find that the

1012: optimal code for $a=1$ has lengths $\biglen = \{1, 2, 3, 4, 5, 6, \ldots\}$

1013: --- those of the unary code --- while the optimal code for $a=2$ has

1014: lengths $\biglen = \{2, 2, 2, 3, 4, 5, \ldots\}$.
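
The merges behind these lengths can be traced by hand, assuming that the exponential Huffman algorithm replaces the two smallest weights $w'$ and $w''$ with $a(w'+w'')$ (with $a=1$ giving ordinary Huffman coding); the symbols $w'$ and $w''$ are used only for this illustration, and the values below are rounded to four places.  The reduced weights are $\boldp(0)=\boldp(1)=e^{-1} \approx 0.3679$, $\boldp(2)=e^{-1}/2 \approx 0.1839$, and $\boldw_{-1}(3)$.  For $a=1$, the merges are
$$
0.0803+0.1839 = 0.2642, \quad 0.2642+0.3679 = 0.6321, \quad 0.3679+0.6321 = 1 ,
$$
giving reduced lengths $\{1,2,3,3\}$.  For $a=2$, the merges are
$$
2(0.1839+0.2197) \approx 0.8073, \quad 2(0.3679+0.3679) \approx 1.4715,
$$
followed by the merge of these two combined items, giving reduced lengths $\{2,2,2,2\}$.  In both cases item $i>2$ takes the reduced codeword for item~$3$ followed by $i-3$ ones and a zero, so its length is the reduced length of item~$3$ plus $i-2$.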

1015:

1016: It is worthwhile to note that these techniques are easily extensible

1017: to finding an optimal alphabetic code --- that is, one with $c(i)\s$

1018: arranged in lexicographical order --- for $a>1$.  One needs only to

1019: find the optimal alphabetic code for the reduced code with weights

1020: given in equation~(\ref{weights}), as in \cite{HKT}, with codewords

1021: for $i>r$ consisting of the reduced code's codeword for $r+1$ followed

1022: by $i-r-1$ ones and one zero.  As previously mentioned, Golomb codes

1023: are also alphabetic and thus are optimal alphabetic codes for the

1024: geometric distribution.

1025:

1026: \section{Application: Buffer Overflow}

1027: \label{application}

1028:

1029: The application of the exponential penalty in \cite{Humb2} concerns

1030: minimizing the probability of a buffer overflowing.  It requires that

1031: each candidate code for overall optimality be an optimal code

1032: for one of a series of exponential parameters ($a\s$ where $a>1$).  An

1033: iterative approach yields a final output code by noting that, for the

1034: overall utility function, each candidate code is no worse than

1035: its predecessor, and there are a finite number of possible candidate

1036: codes.  Therefore, eventually a candidate code yields the same value

1037: as the prior candidate code, and this can be shown to be the optimal

1038: code.  This application of exponential Huffman coding can, using the

1039: above techniques, be extended to infinite alphabets.

1040:

1041: In the application, integers with a known distribution $\bigp$ arrive

1042: with independent intermission times having a known probability density

1043: function.  Encoded bits are sent at a given rate, with bits to be sent

1044: waiting in a buffer of fixed size.  Constant $b$ represents the buffer

1045: size in bits, random variable $T$ represents the source integer

1046: intermission time, measured in units of

1047: encoded bit transmission time, and function $A(s)$ is the

1048: Laplace-Stieltjes transform of $T$, $\E[e^{-sT}]$.

1049:

1050: When the integers are coded using $\biglen = \{\len(i)\}$, the

1051: probability per input integer of buffer overflow is of the order of

1052: $e^{-s^*b}$, where $s^*$ is the largest $s$ such that

1053: $$

1054: f(\biglen,s) \leq 1

1055: $$

1056: where

1057: \begin{equation}

1058: f(\biglen,s) \definedas A(s) \sum_{i=0}^\infty

1059: \boldp(i)e^{s\len(i)} .

1060: \label{buff}

1061: \end{equation}

1062:

1063: The previously known algorithm to maximize $s^*$ is as follows:

1064:

1065: {\bf Procedure for Finding Code with Largest $s^*$} \cite{Humb2}

1066:

1067: \begin{enumerate}

1068: \item Choose any $s_0 \in \Rp$.

1069: \item $j \leftarrow 0$.

1070: \item $j \leftarrow j+1$.

1071: \item Find codeword lengths $\biglen_j$ minimizing $\sum_i \boldp(i) e^{s_{j-1}

1072: \len(i)}$.

1073: \item Compute $s_j \definedas \max\{s \in \R : f(\biglen_j,s) \leq 1\}$.

1074: \item If $s_j \neq s_{j-1}$ then go to step 3; otherwise stop.

1075: \end{enumerate}

1076:

1077: We can use the above methods in order to accomplish step 4, but we

1078: still need to examine how to modify steps 1 and 5 for an infinite

1079: input alphabet.

1080:

1081: First note that, unlike in the finite case, $s^*<\infty$, that is, there

1082: always exists an $s^* \in \Rp$ such that, for all $s>s^*$,

1083: $f(\biglen,s) > 1$.  For any stable system, the buffer cannot receive

1084: integers more quickly than it can transmit bits, so $\P[T \geq 1]$ is

1085: positive.  Thus the Laplace-Stieltjes transform

1086: $A(s)$ exceeds $c_1 e^{-s}$ for some constant $c_1>0$.  Also, without

1087: loss of generality, we can assume that $\boldp(i)$ is monotonic

1088: nonincreasing and an optimal $\len(i)$ is monotonic nondecreasing.

1089: This monotonicity means that $\len(i) \geq \lg i$, and there is no

1090: exponential base $a_0$ and offset constant $c_2$ for which

1091: $\sum_{i=0}^\infty \boldp(i) e^{s\len(i)} \leq a_0^{s+c_2}$ for all~$s

1092: \in \Rp$.  Thus the summation in~(\ref{buff}) must increase

1093: superexponentially, and, multiplying the $A(s)$ and summation terms,

1094: there is an $s^*$ such that $f(\biglen,s)>1$ for all $s>s^*$.

1095:

1096: For step 1, the initial guess proposed in \cite{Humb2} is an upper

1097: bound for all possible values of $s^*$.  The R\'{e}nyi entropy of

1098: $\bigp$ is used to find an initial guess using

1099: \begin{equation}

1100: A(s) \left(\sum_{i=0}^\infty \boldp(i)^{\frac{1}{1+\lg e^s}}\right)^{1+\lg e^s}

1101: \leq A(s) \sum_{i=0}^\infty \boldp(i)e^{s\len(i)},

1102: \label{humbbound}

1103: \end{equation}

1104: and choosing $s_0$ as the largest $s$ such that the left term of

1105: (\ref{humbbound}) is no greater than one.  Thus, $s_0 \geq s^*$ for any

1106: value of $s^*$ corresponding to step 5.

1107:

1108: This technique is well-suited to a geometric distribution, for which

1109: entropy has the closed form shown in equation (\ref{geoent}), so

1110: $$A(s) \cdot \frac{1-\theta}{\left(1-\theta^{(1+\lg

1111: e^s)^{-1}}\right)^{1+\lg e^s}} \leq f(\biglen,s).$$ However, a general

1112: distribution with a light tail, such as the Poisson distribution,

1113: might have no closed form for this bound.  One solution to this is to

1114: use more relaxed lower bounds on the sum --- such as using a partial

1115: sum with a fixed number of terms --- yielding looser upper bounds

1116: for~$s^*$.  Another approach would be to note that, because of the

1117: light tail, the infinite sum can usually be quickly calculated to the

1118: precision of the architecture used.  Note, however, that no matter

1119: what the technique, the bound must be chosen so that $s_0$ is a

1120: real number and not infinity.  Partial sums may be refined to accomplish

1121: this.

1122:

1123: In calculating $f(\biglen,s)$ for use in step 5, the geometric

1124: distribution has the closed-form value for $f$ obtainable from

1125: equation (\ref{geosum}), while the other distributions must instead

1126: rely on approximations of~$f$.  As before, this is easily done due to

1127: the light tail of the distribution.  Alternatively, a partial sum and

1128: a geometric approximation can be used to bound $f(\biglen,s)$ and thus

1129: $s^*$, and these two bounds used to find two codes.  If the two codes

1130: are identical, the algorithm may proceed; otherwise, we must roll back

1131: to the summation and improve the bounds until the codes are identical.

1132:

1133: These variations make the steps of the algorithm possible, but the

1134: algorithm itself must also be proven correct with the variations.

1135:

1136: \begin{theorem}

1137: Given a geometric distribution or an input distribution satisfying the

1138: conditions of Theorem~\ref{tailthm} for $a=e^{s_0}$, where $s_0$ is an

1139: upper-bound on $s^*$, the above Procedure for Finding Code

1140: with Largest $s^*$ terminates with an optimal code.

1141: \label{qthm}

1142: \end{theorem}

1143:

1144: \begin{proof}

1145: It suffices to bound the number of codes that can be generated in the course of running the

1146: algorithm, since this guarantees that the algorithm

1147: terminates.  Optimality for the algorithm then follows as for the

1148: finite case~\cite{Humb2}.  As in the finite case, $s_{j+1} \geq s_j$

1149: for $j \geq 1$ (but not $j=0$) due to step 5 [$f(\biglen_j,s_j) \leq

1150: 1$], step 4 [$f(\biglen_{j+1},s_j) \leq f(\biglen_j,s_j)$], and the

1151: definition of $s_{j+1}$.
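
Written out with the set notation of step~5, the chain of reasoning is
$$
f(\biglen_{j+1},s_j) \leq f(\biglen_j,s_j) \leq 1
\Longrightarrow
s_j \in \{s \in \R : f(\biglen_{j+1},s) \leq 1\}
\Longrightarrow
s_{j+1} \geq s_j ,
$$
where the last implication uses $s_{j+1} = \max\{s \in \R : f(\biglen_{j+1},s) \leq 1\}$.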

1152:

1153: In the case of a geometric distribution,

1154: each $\biglen_j$ is a Golomb code G$\kval_j$ for some positive

1155: integer~$\kval_j$.  Clearly, if we choose $s_0$ as detailed above, it

1156: is the greatest value of $s_j$, being either optimal or unachievable

1157: due to its derivation as a bound of the problem.  Since

1158: $\mbox{G}\kval_i$ (with lengths $\biglen_i$) is the optimal code for

1159: the problem with exponential base $a=e^{s_{i-1}}$, (\ref{ineq}) means

1160: that $\theta^{\kval_i} + \theta^{\kval_i+1} \leq e^{-s_{i-1}} <

1161: \theta^{\kval_i-1} + \theta^{\kval_i}$, and thus

1162: $$(1+\theta)\theta^{\kval_1} \leq e^{-s_0} \leq e^{-s_{j-1}} <

1163: (1+\theta)\theta^{\kval_j-1}$$ and, since $\theta < 1$, we have

1164: $\kval_j-1 < \kval_1$ (or, equivalently, $\kval_j \leq \kval_1$) for all

1165: $j \geq 1$.  Therefore, there are only $\kval_1$ possible codes the

1166: algorithm can generate.

1167:

1168: In the case of a distribution with a lighter tail, the minimum $r$ of

1169: Theorem~\ref{tailthm} increases with each iteration after the first,

1170: and the first $r_1$ (corresponding to $s_0$) upper bounds the

1171: remaining $r_i$.  Thus all candidate codes can be specified by their

1172: first $r_1$ codeword lengths, none of which is greater than $r_1$.

1173: The number of codes is then bounded for both cases, and the algorithm

1174: terminates with the optimal code.

1175: \end{proof}

1176:

1177: \section{Redundancy penalties}

1178: \label{nonexp}

1179:

1180: It is natural to ask whether the above results can be extended to

1181: other penalties.  One penalty discussed in the literature is that of

1182: maximal pointwise redundancy~\cite{DrSz}, which is

1183: $$R^*(\biglen,\bigp) \definedas \sup_{i \in \X} [\len(i)+\lg

1184: \boldp(i)]$$ where we use $\sup$ when we are not assured the existence

1185: of a maximum.  This can be shown to be a limit of the exponential

1186: case, as in \cite{Baer05}, allowing us to analyze its minimization

1187: using the same techniques as exponential Huffman coding.  This limit

1188: can be shown by defining {\defn $d$th exponential redundancy} as

1189: follows:

1190: \begin{eqnarray*}

1191: R_d(\biglen,\bigp) &\definedas&

1192: \frac{1}{d} \lg \sum_{i \in \X} \boldp(i) 2^{d\left(\len(i)+\lg \boldp(i)\right)} \\

1193:  &=& \frac{1}{d} \lg

1194: \sum_{i \in \X} \boldp(i)^{1+d} 2^{d\len(i)}.

1195: \end{eqnarray*}
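
The limiting behavior of $R_d$ can be checked with a short sandwich argument, sketched here under the assumption that $R^*(\biglen,\bigp)$ is finite. Write $\rho(i) \definedas \len(i)+\lg \boldp(i)$ (a symbol introduced only for this sketch). Every term satisfies $2^{d\rho(i)} \leq 2^{dR^*(\biglen,\bigp)}$, while the sum is at least any single term, so for every $j \in \X$ with $\boldp(j)>0$ and every $d>0$,
$$\rho(j) + \frac{\lg \boldp(j)}{d} \leq R_d(\biglen,\bigp) \leq R^*(\biglen,\bigp).$$
Letting $d \rightarrow \infty$ and then taking the supremum over $j$ squeezes $R_d(\biglen,\bigp)$ to $R^*(\biglen,\bigp)$.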

1196: Thus $R^*(\biglen,\bigp) = \lim_{d \rightarrow \infty}

1197: R_d(\biglen,\bigp)$, and the above methods should apply in the limit.

1198: In particular:

1199:

1200: \begin{theorem}

1201: The Golomb code G$\kval$ for $\kval = \lceil -1/\lg \theta \rceil$

1202: is optimal for minimizing maximal pointwise redundancy for $\bigp_\theta$.

1203: \label{optmmr}

1204: \end{theorem}

1205:

1206: \begin{figure*}%[htp]

1207: \psfrag{DABRRR}{$R_d(\biglen_{\theta,a,d}^*,\bigp_\theta)$}

1208: \psfrag{Theta}{$\theta$}

1209: \psfrag{theta}{$\theta$}

1210: \psfrag{THETA}{\mbox{\huge $\theta$}}

1211:      \centering

1212: \begin{picture}(0,0)%

1213: \includegraphics{radrat.eps}%

1214: \end{picture}%

1215: \setlength{\unitlength}{1865sp}%

1216: %

1217: \begingroup\makeatletter\ifx\SetFigFont\undefined%

1218: \gdef\SetFigFont#1#2#3#4#5{%

1219:   \reset@font\fontsize{#1}{#2pt}%

1220:   \fontfamily{#3}\fontseries{#4}\fontshape{#5}%

1221:   \selectfont}%

1222: \fi\endgroup%

1223: \begin{picture}(16332,6837)(-8,-6007)

1224: \put(4051,-5911){\makebox(0,0)[b]{\smash{{\SetFigFont{8}{9.6}{\familydefault}{\mddefault}{\updefault}{(a) $\theta \in (0.5,1)$}%

1225: }}}}

1226: \put(12511,-5911){\makebox(0,0)[b]{\smash{{\SetFigFont{8}{9.6}{\familydefault}{\mddefault}{\updefault}{(b) $\theta \in (2^{-0.1},2^{-0.001})$, with $x$-axis $\propto \lg (- 1/{\lg \theta})$}%

1227: }}}}

1228: \end{picture}%

1229:      \caption{Maximal pointwise redundancy of the optimal maximal

1230:      redundancy code for the geometric distribution, solid

1231:      (with discontinuities represented by dashed); optimal $d$th exponential

1232:      redundancy for the geometric distribution, dotted for

1233:      $d \in \{1,2,4,16,256,65536\}$, from lowest to highest.}

1234:      \label{mmr}

1235: \end{figure*}

1236:

1237: \begin{proof}

1238:

1239: {\it Case 1:} Consider first when $-1/\lg \theta$ is not an integer.

1240: We show that $\kval = \lceil -1/\lg \theta\rceil$ is optimal by

1241: finding a $D$ such that, for all $d > D$, the optimal code for the

1242: $d$th exponential redundancy penalty is G$\kval$.  For

1243: a fixed $d$, (\ref{ineq}) implies that such a code should satisfy

1244: \begin{equation}

1245: (\theta^{1+d})^\kval + (\theta^{1+d})^{\kval+1} \leq \frac{1}{2^d} <

1246: (\theta^{1+d})^{\kval-1} + (\theta^{1+d})^\kval,

1247: \label{dineq}

1248: \end{equation}

1249: and thus we wish to show that this holds for all $d > D$.  Consider

1250: $\kvals = \lceil -1/\lg \theta\rceil$.  Clearly, $\kvals >

1251: -1/\lg \theta$, or, equivalently,

1252: \begin{equation}

1253: \theta^\kvals < \frac{1}{2}.

1254: \label{mmr1}

1255: \end{equation}

1256: Now consider $$D=-1+\frac{1}{1+(\kvals-1)\lg \theta}.$$ Since $\kvals-1 < -1/\lg \theta$, we have

1257: $(\kvals-1)\lg \theta \in (-1,0]$ and therefore $D \geq 0.$ Taken

1258: together with the fact that $\theta \in (0,1)$, (\ref{mmr1}) yields

1259: $\theta^{d\kval} < 2^{-d}$ and $(1+\theta^{1+d})\theta^\kval <

1260: 2\theta^\kval < 1$.  Multiplication yields the left-hand side of

1261: (\ref{dineq}) for any $d > D$.  For any such $d$, algebra easily shows

1262: that we also have the inequality $(2\theta^{\kvals-1})^{1+d} \geq 2$,

1263: yielding

1264: \begin{eqnarray*}

1265: \left[(\theta^{1+d})^{\kvals-1}+(\theta^{1+d})^{\kvals}\right]2^d

1266: &=& \frac{1}{2}(2\theta^{\kvals-1})^{1+d} +

1267: \frac{1}{2}(2\theta^{\kvals})^{1+d} \\

1268: &=& \frac{1}{2}(2\theta^{\kvals})^{1+d}

1269: (\theta^{-1-d}+1) \\

1270: &=& \frac{1}{2}(2\theta^{\kvals-1})^{1+d}

1271: (1+\theta^{1+d}) \\

1272: &>& 1 .

1273: \end{eqnarray*}

1274: This is equivalent to the right-hand side of

1275: inequality~(\ref{dineq}) for the values implied by the definition of

1276: $R_d(\biglen,\bigp)$.  Then G$\kvals$ is an optimal code for

1277: $d > D$, and thus for the limit case of maximal pointwise redundancy.
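
As a numerical illustration (the value $\theta=0.9$ is chosen here only as an example), $\lg \theta \approx -0.15200$, so $-1/\lg\theta \approx 6.579$ and $\kval = 7$. Then
$$D = -1+\frac{1}{1+6\lg 0.9} \approx -1 + \frac{1}{0.08798} \approx 10.37,$$
so the argument above guarantees that G$7$ satisfies (\ref{dineq}) for every $d > 10.37$. For instance, at $d=11$ the three quantities in (\ref{dineq}) are approximately
$$1.84 \times 10^{-4} \leq 4.88 \times 10^{-4} < 6.51 \times 10^{-4}.$$
The threshold $D$ is only a sufficient bound; the inequality may also hold for smaller $d$.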

1278:

1279: {\it Case 2:} Now consider when $-1/\lg \theta$ is an integer.  It

1280: should be noted that, for the traditional (linear) penalty, these are

1281: precisely the $\kval$ values that Golomb considered in his original

1282: paper~\cite{Golo} and that they are local infima for the minimum

1283: maximal pointwise redundancy function in~$\theta$, as in

1284: Fig.~\ref{mmr}.  Here we show they are local minima.

1285:

1286: Since $\theta=0.5$ is a dyadic probability distribution and thus

1287: trivial, we can assume that $\theta > 0.5$.  We wish to show that

1288: optimality is preserved in these right limits of Case~1.  Note that,

1289: for each $i$ with fixed $\biglen$,

1290: $$\lim_{\theta' \uparrow \theta} \left[\len(i) + \lg

1291: \boldp_{\theta'}(i) \right] = \len(i) + \lg \boldp_{\theta}(i).$$ This

1292: is of particular interest for the value of $i$ maximizing pointwise

1293: redundancy for G$\kval$ at $\theta'$, where $\theta' \in

1294: (\theta^{1/\lg 2\theta}, \theta)$, allowing us to use the right limit

1295: of $\theta$.  Let $i^{**} \definedas 2^{\lceil \lg \kval \rceil}-\kval$, the

1296: smallest $i$ which has codeword length exceeding the codeword length

1297: for item~$0$.  Clearly the pointwise redundancy for this value is

1298: greater than that for all items with $i<i^{**}$, since they are one

1299: bit shorter but not more than twice as likely.  Similarly, items in

1300: $(i^{**},\kval)$ have identical length but lower probability, and thus

1301: smaller redundancy.  For items with $i \geq \kval$, note that the

1302: redundancy of items in the sequence $\{j, j+\kval, j+2\kval, \ldots\}$

1303: for any $j$ must be nonincreasing because the difference in redundancy

1304: is constant yet redundancy is upper-bounded by the maximum.  Thus

1305: $i^{**}$ maximizes pointwise redundancy for G$\kval$ at $\theta'$.

1306:

1307: We know the pointwise redundancy of $i^{**}$ for G$\kval$

1308: at $\theta$, although we have yet to show that $i^{**}$ yields the

1309: maximal pointwise redundancy for G$\kval$ at $\theta$ or that G$\kval$

1310: minimizes maximal pointwise redundancy.  However, for any code,

1311: including the optimal code, as a result of pointwise continuity,

1312: \begin{eqnarray*}

1313: \sup_{i \in \X_\infty} [\len(i)+\lg \boldp_\theta(i)] &\geq& \len(i^{**}) + \lg

1314: \boldp_\theta(i^{**}) \\ &=& \lim_{\theta' \uparrow \theta} [\len(i^{**}) +

1315: \lg \boldp_{\theta'}(i^{**})] .

1316: \end{eqnarray*}

1317: From the above discussion, it is clear that the right-hand side is

1318: minimized by the Golomb code with $\kval=-1/\lg \theta$, so, because

1319: the left-hand side achieves the same value with this code, the left-hand

1320: side is indeed minimized by G$\kval$.  Thus this code

1321: minimizes maximal pointwise redundancy for~$\theta$.  The

1322: corresponding maximal pointwise redundancy is

1323: $$

1324: \begin{array}{l}

1325: \max_i [\len_\theta^{**}(i)+\lg \boldp_\theta(i)] \\

1326: \begin{array}{rcl}

1327: &=& \len_\theta^{**}(2^{\lceil \lg \kval \rceil}-\kval) +\lg

1328: \boldp_\theta(2^{\lceil \lg \kval \rceil}-\kval) \\

1329: &=& \lceil

1330: \lg \kval \rceil + 1 + \lg(1-\theta) + (2^{\lceil \lg \kval

1331: \rceil}-\kval) \lg \theta

1332: \end{array}

1333: \end{array}

1334: $$

1335: where $\biglen_\theta^{**} = \{\len_\theta^{**}(i)\}$ is defined as

1336: the lengths of a code minimizing maximal pointwise redundancy.  Note

1337: that this is the redundancy for all items $i=2^{\lceil \lg \kval

1338: \rceil}+ j \kval$ with integer $j \geq -1$.

1339: \end{proof}
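
As a worked instance of Case~2 (with $\theta = 2^{-1/3} \approx 0.7937$ picked purely for illustration), $\kval = 3$ and $i^{**} = 2^2 - 3 = 1$. By the definition of G$\kval$, the codeword lengths of G$3$ are $2,3,3,3,4,4,4,5,\ldots$ for items $0,1,2,3,4,5,6,7,\ldots$, and $\lg(1-\theta) \approx -2.2772$, so
\begin{eqnarray*}
\len(0)+\lg \boldp_\theta(0) &\approx& 2 - 2.2772 = -0.2772, \\
\len(1)+\lg \boldp_\theta(1) &\approx& 3 - 2.2772 - \frac{1}{3} \approx 0.3895, \\
\len(2)+\lg \boldp_\theta(2) &\approx& 3 - 2.2772 - \frac{2}{3} \approx 0.0561, \\
\len(4)+\lg \boldp_\theta(4) &\approx& 4 - 2.2772 - \frac{4}{3} \approx 0.3895 .
\end{eqnarray*}
The maximum, about $0.3895$, matches $\lceil \lg 3\rceil + 1 + \lg(1-\theta) + \lg \theta$ and recurs at items $1, 4, 7, \ldots$, that is, at $i = 2^{\lceil \lg \kval \rceil} + j\kval$ for integer $j \geq -1$.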

1340:

1341: It is worthwhile to observe the behavior of maximal pointwise

1342: redundancy in a fixed (not necessarily optimal) Golomb code with

1343: length distribution $\biglen_\kval$.  The maximal pointwise redundancy

1344: $$R^*(\biglen_\kval,\bigp_\theta) = \sup_{i \in \X_\infty}

1345: [\len_\kval(i)+\lg \boldp_\theta(i)]$$ decreases with increasing

1346: $\theta$ (and the code is optimal for $\theta \in (2^{-1/(\kval-1)},

1347: 2^{-1/\kval}]$) until $\theta$ exceeds $2^{-1/\kval}$, after which

1348: there is no maximum, that is, pointwise redundancy is unbounded.  This

1349: explains the discontinuous behavior of minimum maximal redundancy for

1350: an optimal code as a function of $\theta$, illustrated in

1351: Fig.~\ref{mmr}, where each continuous segment corresponds to an

1352: optimal code for $\theta \in (2^{-1/(\kval-1)}, 2^{-1/\kval}]$.
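
The simplest case, $\kval=1$, makes this concrete. The code G$1$ is the unary code with $\len_1(i) = i+1$, so
$$\len_1(i)+\lg \boldp_\theta(i) = 1 + \lg(1-\theta) + i\,(1+\lg \theta).$$
For $\theta \leq 1/2$ the coefficient $1+\lg\theta$ is nonpositive, the supremum is attained at $i=0$, and $R^*(\biglen_1,\bigp_\theta) = 1+\lg(1-\theta)$, which decreases in $\theta$ and reaches $0$ at $\theta = 1/2$. For $\theta > 1/2$ the coefficient is positive and the pointwise redundancy grows without bound in $i$.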

1353:

1354: Note also the oscillating behavior as $\theta \uparrow 1$.  We show in

1355: Appendix~\ref{maxred} that $\liminf_{\theta \uparrow 1}

1356: R^*(\biglen_\theta^{**},\bigp_\theta) = 1-\lg \lg e$ and $\limsup_{\theta \uparrow 1}

1357: R^*(\biglen_\theta^{**},\bigp_\theta) = 2 -

1358: \lg e$, and we characterize this oscillating behavior.  This technique

1359: is extensible to other redundancy scenarios of the kind introduced

1360: in~\cite{Baer05}.

1361:

1362: For distributions with light tails, one can use a technique much like

1363: the technique of Theorem~\ref{tailthm} in Section~\ref{other}.  First

1364: note that this requires, as a necessary step, the ability to construct

1365: a minimum maximal pointwise redundancy code for finite alphabets.

1366: This can be done either with the method in \cite{DrSz} or any of those

1367: in \cite{Baer05}, the simplest of which uses a variant of the

1368: tree-height problem~\cite{Park}, solved via a different extension of

1369: Huffman coding.  Simply put, the weight combining rule, rather than

1370: $\boldw(j) + \boldw(k)$ or $a \cdot (\boldw(j) + \boldw(k))$, is

1371: \begin{equation}

1372: \tilde{\boldw}(j) = 2\cdot\max(\boldw(j),\boldw(k)).

1373: \label{maxrule}

1374: \end{equation}

1375: This rule is used to create an optimal code with lengths

1376: $\biglen^{(r)}$ for $\bigw^{(r)} \definedas \{\boldp(0), \boldp(1),

1377: \ldots, \boldp(r), 2\boldp(r+1)\}$, assuming a unary subtree for items

1378: with index $i > r$ (and no other items) is part of an optimal code

1379: tree.  As in the coding method corresponding to Theorem~\ref{tailthm},

1380: the codewords for items $0$ through $r$ of this reduced code are

1381: identical to those of the infinite alphabet.  Each other item $i>r$

1382: has a codeword consisting of the reduced codeword for $r+1$ followed

1383: by the unary code for $i-r-1$, that is, $i-r-1$ ones followed by a

1384: zero.
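
A small finite example may help fix ideas. The weights below are hypothetical, and the sketch assumes that, as in ordinary Huffman coding, the two smallest weights are merged at each step. For $\bigw = \{0.4, 0.3, 0.2, 0.1\}$, rule (\ref{maxrule}) gives
\begin{eqnarray*}
\{0.4, 0.3, 0.2, 0.1\} &\rightarrow& \{0.4, 0.3, 2\cdot\max(0.2,0.1)\} \\
&=& \{0.4, 0.3, 0.4\} \\
&\rightarrow& \{0.4, 2\cdot\max(0.3,0.4)\} = \{0.4, 0.8\} \\
&\rightarrow& \{2\cdot\max(0.4,0.8)\} = \{1.6\}.
\end{eqnarray*}
If the tie in the second step is broken by merging $0.3$ with the combined node, the lengths are $1,2,3,3$ and the pointwise redundancies are $1+\lg 0.4$, $2+\lg 0.3$, $3+\lg 0.2$, and $3+\lg 0.1$. Their maximum is $3+\lg 0.2 = \lg 1.6 \approx 0.678$, the logarithm of the root weight. No code does better here: a smaller maximum would force lengths of at most $1,2,2,3$, which violates the Kraft inequality.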

1385:

1386: A sufficient condition for using this method is finding an $r$ such that

1387: $$\mbox{for all } i<r,~\boldp(i) \geq \boldp(r)$$

1388: and

1389: $$\mbox{for all } j \geq r,~\boldp(j) \geq 2 \boldp(j+1).$$ For

1390: such~$j$, pointwise redundancy is nonincreasing along a unary subtree,

1391: as

1392: \begin{eqnarray*}

1393: \len(j) + \lg \boldp(j) &=& \len(j+1) + \lg (\boldp(j)/2) \\

1394: &\geq& \len(j+1) + \lg \boldp(j+1).

1395: \end{eqnarray*}

1396: The aforementioned coding method works because, for each $j$, an

1397: optimal subtree consisting of the items with index $i\geq j$ has

1398: $\len(i) = \len(j) - j + i$; this subtree is optimal because the

1399: weight of the root node of {\it any} subtree cannot be less than

1400: $2\boldp(j)$.  A formal proof, similar to that of

1401: Theorem~\ref{tailthm}, is omitted in the interest of space.

1402:

1403: For a Poisson random variable, $r = \lceil e \lambda \rceil - 1$

1404: satisfies this condition, since, for $i < r \leq j$, $\boldp(i) \geq

1405: \boldp(r)$ (as in \cite{Humb1}), and

1406: $$\boldp(j) = \frac{j+1}{\lambda} \boldp(j+1) \geq \frac{r+1}{\lambda}

1407: \boldp(j+1) \geq e\boldp(j+1) > 2\boldp(j+1).$$  Thus such a random

1408: variable can be coded in this manner.
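
For a concrete instance (the choice $\lambda = 1$ is only illustrative), $r = \lceil e \rceil - 1 = 2$ and
$$\bigw^{(2)} = \{\boldp(0), \boldp(1), \boldp(2), 2\boldp(3)\} = e^{-1}\left\{1, 1, \frac{1}{2}, \frac{1}{3}\right\}.$$
Assuming the two smallest weights are merged at each step, rule (\ref{maxrule}) first merges $e^{-1}/2$ and $e^{-1}/3$ into $e^{-1}$, then two of the three weights $e^{-1}$ into $2e^{-1}$, and finally reaches the root weight $4e^{-1}$. One resulting assignment gives all four reduced items length $2$, so items $0$, $1$, and $2$ have length $2$ and each item $i \geq 3$ has length $i$. The maximal pointwise redundancy is then $2 + \lg e^{-1} = 2 - \lg e \approx 0.557$, attained at items $0$ and $1$. This cannot be improved, since a smaller value would require $\len(0) = \len(1) = 1$, leaving no codewords for the remaining items.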

1409:

1410: Note that other sufficient conditions can be obtained through

1411: alternative methods.  One simple rule is that any distribution for which $p(i)

1412: \leq 2^{-i}p(0)$ for all $i > 0$ will necessarily have $\len(0) + \lg

1413: p(0)$ minimized by letting $\len(0)=1$, and this will be the maximum

1414: redundancy if $\len(i)=i+1$ in general.  For example, a unary tree

1415: optimizes $\bigp = \{0.6, 0.15, 0.15, 0.0375, 0.0375, \ldots\}$, since

1416: $\lg 1.2 \approx 0.263$ is a lower bound on maximal pointwise

1417: redundancy for any code given $p(0)=0.6$, and this bound is achieved

1418: for the unary code.  If viewed as a rule for a unary subtree, this is

1419: looser than the above, since, unlike linear and exponential penalties,

1420: not all subtrees of the subtree need be optimal.  Other relaxations

1421: can be obtained, although, as they are usually not needed, we do not

1422: discuss them here.
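
The unary example can be checked directly. The unary code assigns item $i$ a codeword of $i$ ones followed by a zero, so $\len(i) = i+1$. Assuming the listed values continue as $p(2m-1) = p(2m) = 0.6 \cdot 4^{-m}$ for $m \geq 1$ (a closed form inferred from the values shown, not stated in the text),
\begin{eqnarray*}
\len(0) + \lg p(0) &=& 1 + \lg 0.6 = \lg 1.2 \approx 0.263, \\
\len(2m) + \lg p(2m) &=& 2m + 1 + \lg 0.6 - 2m = \lg 1.2, \\
\len(2m-1) + \lg p(2m-1) &=& 2m + \lg 0.6 - 2m \approx -0.737 .
\end{eqnarray*}
The maximum is $\lg 1.2$, and no code can do better because item $0$ alone has redundancy at least $1 + \lg 0.6$ whenever $\len(0) \geq 1$.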

1423:

1424: \section{Conclusion}

1425:

1426: The aforementioned methods for coding integers are applicable to

1427: geometric and light-tailed distributions with exponential and related

1428: penalties.  Although they are not direct applications of Huffman

1429: coding, per se, these methods are derived from the properties of

1430: generalizations of the Huffman algorithm.  This allows examination of

1431: subtrees of a proposed optimal code independently of the rest of the

1432: code tree, and thus specification of finite codes which in some sense

1433: converge to the optimal integer code.  Different penalties --- e.g.,

1434: $\varphi(x) = x^2$, implying the minimization of $\sqrt{\sum_i

1435: \boldp(i) \len(i)^2}$ --- do not share this independence property, as

1436: an optimal code tree with optimal subtrees need not exist.  Thus

1437: finding an optimal code for such penalties is more difficult.  There

1438: should, however, be cases in which this is possible for convex

1439: $\varphi$ which grow more slowly than some exponential.

1440:

1441: Another extension of this work would be to find coding algorithms for

1442: other probability mass functions under the nonlinear penalties already

1443: considered, e.g., to attempt to use the techniques of

1444: \cite[pp.~103--105]{Baer} for a more reliable algorithm.  Other

1445: possible extensions and generalizations involve variants of geometric

1446: probability distributions; in addition to the one we mentioned that is

1447: analogous to Proposition~(2) in \cite{Abr1}, there are others in

1448: \cite{MSW, GoMa, BCSV}.  Extending these methods to nonbinary codes

1449: should also be feasible, following the approaches in \cite{Abr1} and

1450: \cite{KHN}.  Finally, as a nonalgorithmic result, it might be

1451: worthwhile to characterize {\it all} optimal codes --- not merely

1452: finding {\it an} optimal code --- as in \cite[p.~289]{Goli2}.

1453:

1454: \section*{Acknowledgments}

1455:

1456: The author wishes to thank the anonymous reviewers, David

1457: Morgenthaler, and Andrew Brzezinski for their suggestions in improving

1458: the rigor and clarity of this paper.

1459:

1460: \appendices

1461: \section{Optimal Maximal Redundancy Golomb Codes for Large~$\theta$}

1462: \label{maxred}

1463:

1464: Let us calculate optimal maximal redundancy as a function of $\theta

1465: \geq 0.5$:

1466: $$

1467: \begin{array}{rcl}

1468: R^*(\biglen_\theta^{**},\bigp_\theta)

1469: &=& \max_i \len_\theta^{**}(i) + \lg \boldp_\theta(i) \\

1470: &=& 1 +

1471: \left\lceil \lg \lceil - \frac{1}{\lg \theta}

1472: \rceil \right\rceil + \\

1473: &&\lg (1-\theta) + \\

1474: &&\left(2^{\left\lceil \lg \lceil - \frac{1}{\lg \theta}

1475: \rceil \right\rceil} - \left\lceil - \frac{1}{\lg \theta} \right\rceil

1476: \right)\lg \theta \\

1477: &=& 1 - \left\lceil -\frac{1}{\lg \theta} \right\rceil \lg \theta + \\

1478: &&\lg \left( - \frac{1-\theta}{\lg \theta} \right) - \\

1479: && 2^{\left\lceil \lg \left(- \frac{1}{\lg \theta}\right)

1480: \right\rceil - \lg (- \frac{1}{\lg \theta})} + \\

1481: &&\left\lceil \lg \left(- \frac{1}{\lg \theta}\right)

1482: \right\rceil - \lg \left(- \frac{1}{\lg \theta}\right) \\

1483: &=& 2 + \lg \left( - \frac{1-\theta}{\lg \theta} \right) -

1484: \left\lceil -\frac{1}{\lg \theta} \right\rceil \lg \theta - \\

1485: && 2^{1-\langle \lg (- \frac{1}{\lg \theta})

1486: \rangle} - \left\langle \lg \left(- \frac{1}{\lg \theta}\right)

1487: \right\rangle,

1488: \end{array}

1489: $$

1490: where $\langle x \rangle$ denotes the fractional

1491: part of $x$, i.e., $\langle x \rangle \definedas x - \lfloor x \rfloor$, since

1492: $$\left\lceil \lg \lceil - \frac{1}{\lg \theta} \rceil \right\rceil =

1493: \left\lceil \lg \left( - \frac{1}{\lg \theta} \right) \right\rceil$$

1494: for $\theta > 0.25$ (and thus for $\theta \geq 0.5$).  Using the

1495: Taylor series expansion about $\theta = 1$, we find

1496: $$

1497: \lg \left( - \frac{1-\theta}{\lg \theta} \right) = - \lg \lg e -

1498: (\lg \sqrt{e})(1-\theta)+\order((1-\theta)^2)

1499: $$

1500: where $e$ is the base of the natural logarithm.

1501: Additionally,

1502: $$-\left\lceil-\frac{1}{\lg \theta} \right\rceil \lg \theta = 1 +

1503: \order(1-\theta).$$ Note that this actually oscillates between $1$ and

1504: $1+(1-\theta)\lg e$ in the limit, so this first-order asymptotic term

1505: cannot be improved upon.  However, the remaining terms

1506: \begin{equation}

1507: 2-2^{1-\langle \lg (- \frac{1}{\lg \theta})

1508: \rangle} - \left\langle \lg \left(- \frac{1}{\lg \theta}\right)\right\rangle

1509: \label{osc}

1510: \end{equation}

1511: oscillate in the zero-order term.  Assigning $x = \langle \lg (-

1512: 1/\lg \theta)\rangle$, we find that (\ref{osc}) achieves

1513: its minimum value, $0$, at $0$ and $1$.  The maximum point is easily

1514: found via a first derivative test.  This point is achieved at $x=1-\lg

1515: \lg e$, at which point (\ref{osc}) achieves the maximum value $1-\lg

1516: e+\lg \lg e$.
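
The first derivative test can be spelled out as follows, writing $g(x) \definedas 2 - 2^{1-x} - x$ for (\ref{osc}) on $x \in [0,1]$ (the name $g$ is used only in this sketch):
\begin{eqnarray*}
g'(x) &=& 2^{1-x}\ln 2 - 1, \\
g'(x) = 0 &\Leftrightarrow& 2^{1-x} = \lg e \\
&\Leftrightarrow& x = 1 - \lg \lg e \approx 0.4712, \\
g''(x) &=& -2^{1-x}(\ln 2)^2 < 0, \\
g(1 - \lg \lg e) &=& 2 - \lg e - (1 - \lg\lg e) \\
&=& 1 - \lg e + \lg \lg e \approx 0.0861 .
\end{eqnarray*}
Since $g$ is concave with $g(0)=g(1)=0$, the stationary point is the unique maximum.
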

1517: Thus, gathering all terms,

1518: $$\liminf_{\theta \uparrow 1} R^*(\biglen_\theta^{**},\bigp_\theta) = 1-\lg

1519: \lg e = 0.4712336270 \ldots,$$ $$\limsup_{\theta \uparrow 1}

1520: R^*(\biglen_\theta^{**},\bigp_\theta) = 2 - \lg e = 0.5573049591 \ldots,$$

1521: and, overall,

1522: \begin{eqnarray*}

1523: R^*(\biglen_\theta^{**},\bigp_\theta)

1524: &=& 3 - \lg \lg e - \\

1525: && 2^{1-\langle \lg (- \frac{1}{\lg \theta})

1526: \rangle} - \left\langle \lg \left(- \frac{1}{\lg \theta}\right)

1527: \right\rangle+ \\

1528: && \order(1-\theta).

1529: \end{eqnarray*}

1530: This oscillating behavior is similar to that of the average redundancy

1531: of a complete tree, as in \cite{Gall} and \cite[p.~192]{Knu3}.

1532: Contrast this with the periodicity of the minimum {\it average}

1533: redundancy for a Golomb code~\cite{Szpa}:

1534: \begin{eqnarray*}

1535: \bar{R}(\biglen_{\theta,1}^*,\bigp_\theta)

1536: &=& 1 - \lg \lg e - \lg e + \\

1537: &&

1538: 2^{2-2^{1-\langle \lg (-\frac{1}{\lg \theta}) \rangle}}

1539: - \left\langle \lg \left(-\frac{1}{\lg

1540:   \theta}\right) \right\rangle + \\

1541: &&\order(1-\theta)

1542: \end{eqnarray*}

1543: where $\biglen_{\theta,1}^*$ is the optimal code for the traditional

1544: (linear) penalty.

1545:

1546: \section{Glossary of Terms}

1547: \label{glossary}

1548:

1549: \tablefirsthead{\hline \multicolumn{1}{c}{Notation}

1550:                      & \multicolumn{1}{l}{~Meaning} \\ \hline }

1551: \tablehead{\hline \multicolumn{2}{l}{\small \sl ~~continued}\\

1552:            \hline \multicolumn{1}{c}{Notation}

1553:                      & \multicolumn{1}{l}{~Meaning} \\ \hline }

1554: \tabletail{}

1555: \tablelasttail{}

1556: \begin{supertabular}{l|l}

1557: $a$ & Base of exponential penalty \\

1558: $b(x,k)$ & $(x+1)$th codeword of complete binary code \\

1559: & with $k$ items (i.e., the order-preserving \\

1560: & [alphabetic] code having the first $2^{\lceil \lg k \rceil}-k$ \\

1561: & items with length $\lfloor \lg k \rfloor$ and the last \\

1562: & $2k - 2^{\lceil \lg k \rceil}$ items with length $\lceil \lg k \rceil$) \\

1563: $c(i)$ & Codeword (for symbol) $i$ \\

1564: $C$ & Code $\{c(i)\}$ \\

1565: $e$ & Base of the natural logarithm ($e \approx 2.71828$) \\

1566: G$\kval$ & Golomb code with parameter $\kval$, one of the \\

1567: &form $\{1^{\lfloor j/\kval \rfloor} 0 b(j \bmod \kval, \kval) : j \geq 0\}$ \\

1568: $H_{\alpha}(\bigp)$ & R\'{e}nyi entropy $(1-\alpha)^{-1} \lg \sum_{i \in \X}

1569: \boldp(i)^{\alpha}$ \\

1570: & (or, if $\alpha \in \{0,1,\infty\}$, the limit of this) \\

1571: $i^{**}$ & Index of the codeword that, among a \\

1572: & given code's inputs $i \in \X$, maximizes \\

1573: & pointwise redundancy, $\len(i)+\lg \boldp(i)$ \\

1574: $j \bmod k$ & $j-k \lfloor j/k \rfloor$ \\

1575: $\CampCost_a (\bigp,\biglen)$ & Penalty $\log_a \sum_{i\in \X} \boldp(i) a^{\len(i)}$ \\

1576: $\len(i)$ & Length of codeword (for symbol) $i$ \\

1577: $\biglen$ & $\{\len(i)\}$, the lengths for a given code \\

1578: $\len^{(r)}(i)$ & Length of codeword $i$ of an optimal code \\

1579: & minimizing maximum redundancy for $\bigw^{(r)}$ \\

1580: $\biglen^{(r)}$ & $\{\len^{(r)}(i)\}$, the lengths of an optimal code \\

1581: & minimizing maximum redundancy for $\bigw^{(r)}$  \\

1582: $\len^*(i)$ & Length of codeword $i$ of an optimal code \\

1583: & for an exponential penalty, $\CampCost$ \\

1584: $~~(\len_{\theta,a}^*(i))$ & ~~(...if $\theta$ and $a$ are specified) \\

1585: $\biglen^*$ & $\{\len^*(i)\}$, the lengths of an optimal code \\

1586: $~~(\biglen_{\theta,a}^*)$ & ~~(...if $\theta$ and $a$ are specified) \\

1587: $\len_{\theta,a,d}^*(i)$ & Length of codeword $i$ of an optimal code \\

1588: & minimizing $d$th exponential redundancy \\

1589: $\biglen_{\theta,a,d}^*$ & $\{\len_{\theta,a,d}^*(i)\}$, the lengths of an optimal code\\

1590: & minimizing $d$th exponential redundancy \\

1591: $\len^{**}(i)$ & Length of codeword $i$ of an optimal code \\

1592: & minimizing maximum redundancy \\

1593: $\biglen^{**}$ & $\{\len^{**}(i)\}$, the lengths of an optimal code \\

1594: & minimizing maximum redundancy \\

1595: $\order(\cdot)$ & Asymptotic order of growth of $\cdot$ \\

1596: $\boldp(i)$ & Probability of input symbol $i$ \\

1597: $~~(\boldp_\theta(i))$ & ~~(...for geometric dist\textsuperscript{r} with parameter $\theta$) \\

1598: $~~(\boldp_\lambda(i))$ & ~~(...for Poisson dist\textsuperscript{r} with parameter~$\lambda$) \\

1599: $\bigp$ & $\{\boldp(i)\}$, the input probability mass function \\

1600: $~~(\bigp_\theta)$ & ~~(...for geometric dist\textsuperscript{r} with parameter $\theta$) \\

1601: $~~(\bigp_\lambda)$ & ~~(...for Poisson dist\textsuperscript{r} with parameter~$\lambda$) \\

1602: $\bar{R}_a(\biglen, \bigp)$&$\CampCost_a (\bigp,\biglen) - H_{\alpha(a)}(\bigp)$, the average \\

1603: &pointwise redundancy \\

1604: $R_d(\biglen, \bigp)$&$d^{-1} \lg \sum_{i \in \X} \boldp(i) 2^{d\left(\len(i)+\lg \boldp(i)\right)}$, \\

1605: &the $d$th exponential redundancy \\

1606: $R^*(\biglen, \bigp)$&$\max_{i \in \X} [\len(i)+\lg \boldp(i)]$, the maximum \\

1607: & pointwise redundancy \\

1608: $\R$ & The set of real numbers \\

1609: $\Rp$ & The set of positive real numbers \\

1610: $s_0$ & Upper bound on $s^*$ \\

1611: $s^*$ & $\ln a$ for $a$ corresponding to optimal coding \\

1612: & for buffer overflow \\

1613: $\boldw(i)$ & Weight (for symbol) $i$ \\

1614: $\bigw$ & $\{\boldw(i)\}$, the set of weights \\

1615: $\boldw^{(r)}(i)$ & $\boldp(i)$ for $i \leq r$, $2\boldp(r+1)$ for $i=r+1$ \\

1616: $\bigw^{(r)}$ & $\{\boldp(0), \boldp(1), \ldots, \boldp(r), 2\boldp(r+1)\}$ \\

1617: $\X$ & Input alphabet (usually $\X_\infty = \{0, 1, \ldots \}$) \\

1618: $\alpha(a)$ & $1/(1+\lg a)$ (parameter for R\'{e}nyi entropy) \\

1619: $\theta$ & Geometric dist\textsuperscript{r} parameter ($\boldp_\theta(i) = (1-\theta)\theta^i$) \\

1620: $\lambda$ & Poisson dist\textsuperscript{r} parameter ($\boldp_\lambda(i)=\lambda^i e^{-\lambda}/i!$) \\

1621: $\Phi$ & Golden ratio, $\frac{1}{2}(1+\sqrt{5})$ \\

1622: \end{supertabular}

1623:

1624: \ifx \cyr \undefined \let \cyr = \relax \fi

1625: \begin{thebibliography}{10}

1626: \providecommand{\url}[1]{#1}

1627: \csname url@rmstyle\endcsname

1628: \providecommand{\newblock}{\relax}

1629: \providecommand{\bibinfo}[2]{#2}

1630: \providecommand\BIBentrySTDinterwordspacing{\spaceskip=0pt\relax}

1631: \providecommand\BIBentryALTinterwordstretchfactor{4}

1632: \providecommand\BIBentryALTinterwordspacing{\spaceskip=\fontdimen2\font plus

1633: \BIBentryALTinterwordstretchfactor\fontdimen3\font minus

1634:   \fontdimen4\font\relax}

1635: \providecommand\BIBforeignlanguage[2]{{%

1636: \expandafter\ifx\csname l@#1\endcsname\relax

1637: \typeout{** WARNING: IEEEtran.bst: No hyphenation pattern has been}%

1638: \typeout{** loaded for the language `#1'. Using the pattern for}%

1639: \typeout{** the default language instead.}%

1640: \else

1641: \language=\csname l@#1\endcsname

1642: \fi

1643: #2}}

1644:

1645: \bibitem{Huff}

1646: D.~A. Huffman, ``A method for the construction of minimum-redundancy codes,''

1647:   \emph{Proc. IRE}, vol.~40, no.~9, pp. 1098--1101, Sept. 1952.

1648:

1649: \bibitem{YaQi}

1650: S.~Yang and P.~Qiu, ``Efficient integer coding for arbitrary probability

1651:   distributions,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-52, no.~8, pp.

1652:   3764--3772, Aug. 2006.

1653:

1654: \bibitem{Golo}

1655: S.~W. Golomb, ``Run-length encodings,'' \emph{IEEE Trans. Inf. Theory}, vol.

1656:   IT-12, no.~3, pp. 399--401, July 1966.

1657:

1658: \bibitem{GaVV}

1659: R.~G. Gallager and D.~C. {van Voorhis}, ``Optimal source codes for

1660:   geometrically distributed integer alphabets,'' \emph{IEEE Trans. Inf.

1661:   Theory}, vol. IT-21, no.~2, pp. 228--230, Mar. 1975.

1662:

1663: \bibitem{Abr01}

1664: J.~Abrahams, ``Code and parse trees for lossless source encoding,''

1665:   \emph{Communications in Information and Systems}, vol.~1, no.~2, pp.

1666:   113--146, Apr. 2001.

1667:

1668: \bibitem{WSBL}

1669: T.~Wiegand, G.~J. Sullivan, G.~Bj{\o}ntegaard, and A.~Luthra, ``Overview of the

1670:   {H.264/AVC} video coding standard,'' \emph{IEEE Trans. Circuits and Systems

1671:   for Video Technology}, vol.~13, no.~7, pp. 560--576, July 2003.

1672:

1673: \bibitem{WSS}

1674: M.~Weinberger, G.~Seroussi, and G.~Sapiro, ``The {LOCO-I} lossless image

1675:   compression algorithm: Principles and standardization into {JPEG-LS},''

1676:   \emph{IEEE Trans. Image Processing}, vol.~9, no.~8, pp. 1309--1324, Aug.

1677:   2000, originally as Hewlett-Packard Laboratories Technical Report No.

1678:   HPL-98-193R1, November 1998, revised October 1999. Available from

1679:   \url{http://www.hpl.hp.com/loco/}.

1680:

1681: \bibitem{Camp}

1682: L.~L. Campbell, ``Definition of entropy by means of a coding problem,''

1683:   \emph{Z. Wahrscheinlichkeitstheorie und verwandte Gebiete}, vol.~6, pp.

1684:   113--118, 1966.

1685:

1686: \bibitem{AcDa}

1687: J.~Acz{\'{e}}l and Z.~Dar{\'{o}}czy, \emph{On Measures of Information and Their

1688:   Characterizations}.\hskip 1em plus 0.5em minus 0.4em\relax New York, NY:

1689:   Academic, 1975.

1690:

1691: \bibitem{Jeli}

1692: F.~Jelinek, ``Buffer overflow in variable length coding of fixed rate

1693:   sources,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-14, no.~3, pp. 490--501,

1694:   May 1968.

1695:

1696: \bibitem{Humb2}

1697: P.~A. Humblet, ``Generalization of {Huffman} coding to minimize the probability

1698:   of buffer overflow,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-27, no.~2, pp.

1699:   230--232, Mar. 1981.

1700:

1701: \bibitem{BlMc}

1702: A.~C. Blumer and R.~J. McEliece, ``The {R\'{e}nyi} redundancy of generalized

1703:   {Huffman} codes,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-34, no.~5, pp.

1704:   1242--1249, Sept. 1988.

1705:

1706: \bibitem{Reny}

1707: A.~R{\'{e}}nyi, \emph{A Diary on Information Theory}.\hskip 1em plus 0.5em

1708:   minus 0.4em\relax New York, NY: John Wiley {\&} Sons Inc., 1987, original

1709:   publication: {\it Napl\`{o} az inform\'{a}ci\'{o}elm\'{e}letr\H{o}l},

1710:   Gondolat, Budapest, Hungary, 1976.

1711:

1712: \bibitem{MSN}

1713: P.~Mendenhall. (2002, Oct. 26) Cell phones were rebels' downfall. MSNBC News.

1714:

1715: \bibitem{Tar}

1716: J.~Taranto. (2002, Oct. 28) {Best of the Web Today}. OpinionJournal, from {The

1717:   Wall Street Journal} Editorial Page. Available from

1718:   \url{http://www.opinionjournal.com/best/?id=110002538}.

1719:

1720: \bibitem{Baer06}

1721: M.~B. Baer, ``Source coding for quasiarithmetic penalties,'' \emph{IEEE Trans.

1722:   Inf. Theory}, vol. IT-52, no.~10, pp. 4380--4393, Oct. 2006.

1723:

1724: \bibitem{LTZ}

1725: T.~Linder, V.~Tarokh, and K.~Zeger, ``Existence of optimal prefix codes for

1726:   infinite source alphabets,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-43,

1727:   no.~6, pp. 2026--2028, Nov. 1997.

1728:

1729: \bibitem{HKT}

1730: T.~C. Hu, D.~J. Kleitman, and J.~K. Tamaki, ``Binary trees optimum under

1731:   various criteria,'' \emph{SIAM J. Appl. Math.}, vol.~37, no.~2, pp. 246--256,

1732:   Apr. 1979.

1733:

1734: \bibitem{Park}

1735: D.~S. Parker, Jr., ``Conditions for optimality of the {Huffman} algorithm,''

1736:   \emph{SIAM J. Comput.}, vol.~9, no.~3, pp. 470--489, Aug. 1980.

1737:

1738: \bibitem{Humb0}

1739: P.~A. Humblet, ``Source coding for communication concentrators,'' Ph.D.

1740:   dissertation, Massachusetts Institute of Technology, 1978.

1741:

1742: \bibitem{Golu}

1743: M.~C. Golumbic, ``Combinatorial merging,'' \emph{IEEE Trans. Comput.}, vol.

1744:   C-25, no.~11, pp. 1164--1167, Nov. 1976.

1745:

1746: \bibitem{Leeu}

1747: J.~{van Leeuwen}, ``On the construction of {Huffman} trees,'' in \emph{Proc.

1748:   3rd Int. Colloquium on Automata, Languages, and Programming}, July 1976, pp.

1749:   382--410.

1750:

1751: \bibitem{Baer05}

1752: M.~B. Baer, ``A general framework for codes involving redundancy

1753:   minimization,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-52, no.~1, pp.

1754:   344--349, Jan. 2006.

1755:

1756: \bibitem{Shan}

1757: C.~E. Shannon, ``A mathematical theory of communication,'' \emph{Bell Syst.

1758:   Tech. J.}, vol.~27, pp. 379--423, July 1948.

1759:

1760: \bibitem{Ren2}

1761: A.~R{\'{e}}nyi, ``On measures of entropy and information,'' in \emph{Proc. 4th

1762:   Berkeley Symposium on Mathematical Statistics and Probability}, vol.~1, 1961,

1763:   pp. 547--561.

1764:

1765: \bibitem{Goli2}

1766: M.~J. Golin, ``A combinatorial approach to {Golomb} forests,''

1767:   \emph{Theoretical Computer Science}, vol. 263, no. 1--2, pp. 283--304, July

1768:   2001.

1769:

1770: \bibitem{Abr1}

1771: J.~Abrahams, ``Huffman-type codes for infinite source distributions,''

1772:   \emph{Journal of the Franklin Institute}, vol. 331B, no.~3, pp. 265--271, May

1773:   1994.

1774:

1775: \bibitem{MSW}

1776: N.~Merhav, G.~Seroussi, and M.~Weinberger, ``Optimal prefix codes for sources

1777:   with two-sided geometric distributions,'' \emph{IEEE Trans. Inf. Theory},

1778:   vol. IT-46, no.~2, pp. 121--135, Mar. 2000.

1779:

1780: \bibitem{GoMa}

1781: M.~J. Golin and K.~K. Ma, ``Algorithms for constructing infinite {Huffman}

1782:   codes,'' Hong Kong University of Science {\&} Technology Theoretical Computer

1783:   Science Center, Tech. Rep. HKUST-TCSC-2004-07, Aug. 2004, available from

1784:   \url{http://www.cs.ust.hk/tcsc/RR/index_7.html}.

1785:

1786: \bibitem{BCSV}

1787: F.~Bassino, J.~Cl\'{e}ment, G.~Seroussi, and A.~Viola, ``Optimal prefix codes

1788:   for two-dimensional geometric distributions,'' in \emph{Proc., IEEE Data

1789:   Compression Conf.}, Mar. 28--30, 2006, pp. 113--122.

1790:

1791: \bibitem{Humb1}

1792: P.~A. Humblet, ``Optimal source coding for a class of integer alphabets,''

1793:   \emph{IEEE Trans. Inf. Theory}, vol. IT-24, no.~1, pp. 110--112, Jan. 1978.

1794:

1795: \bibitem{KHN}

1796: A.~Kato, T.~S. Han, and H.~Nagaoka, ``{Huffman} coding with an infinite

1797:   alphabet,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-42, no.~3, pp. 977--984,

1798:   May 1996.

1799:

1800: \bibitem{Baer}

1801: M.~B. Baer, ``Coding for general penalties,'' Ph.D. dissertation, Stanford

1802:   University, 2003.

1803:

1804: \bibitem{DrSz}

1805: M.~Drmota and W.~Szpankowski, ``Precise minimax redundancy and regret,''

1806:   \emph{IEEE Trans. Inf. Theory}, vol. IT-50, no.~11, pp. 2686--2707, Nov.

1807:   2004.

1808:

1809: \bibitem{Gall}

1810: R.~G. Gallager, ``Variations on a theme by {Huffman},'' \emph{IEEE Trans. Inf.

1811:   Theory}, vol. IT-24, no.~6, pp. 668--674, Nov. 1978.

1812:

1813: \bibitem{Knu3}

1814: D.~E. Knuth, \emph{The Art of Computer Programming, Vol. 3: Sorting and

1815:   Searching}, 2nd~ed.\hskip 1em plus 0.5em minus 0.4em\relax Reading, MA:

1816:   Addison-Wesley, 1998.

1817:

1818: \bibitem{Szpa}

1819: W.~Szpankowski, ``Asymptotic redundancy of {Huffman} (and other) block codes,''

1820:   \emph{IEEE Trans. Inf. Theory}, vol. IT-46, no.~7, pp. 2434--2443, Nov. 2000.

1821:

1822: \end{thebibliography}

1823:

1824: \end{document}

1825: