% cs0511003/IT_inf.tex
1: % 25 Nov 07
2: 
3: % Additional files: inftree3d.eps, R0.eps, R1.eps, Rlin.eps, 
4: %                   inf_humb3d.eps, ga2.eps, radrat.eps, 
5: %                   [IT_inf.bbl], (IEEEbib.bst, IEEEtran.cls)
6: 
7: \documentclass[10pt]{IEEEtran}
8: \usepackage{cite,graphicx,psfrag,amsmath,amssymb,subfigure,url,supertabular,color}
9: \newtheorem{theorem}{Theorem}
10: \newtheorem{corollary}{Corollary}
11: \newtheorem{lemma}{Lemma}
12: \newtheorem{defi}{Definition}
13: 
14: \def\CampCost{L}
15: \def\definedas{\triangleq}
16: \def\order{O}
17: \def\s{\mbox{'s}}
18: \def\boldp{p}
19: \def\bigp{P}
20: \def\boldw{w}
21: \def\bigw{W}
22: \def\kval{k}
23: \def\kvals{k}
24: \def\len{n}
25: \def\biglen{N}
26: \def\E{{\mathbb E}}
27: \def\P{{\mathbb P}}
28: \def\R{{\mathbb R}}
29: \def\Rp{{{\mathbb R}_+}}
30: \def\W{{\bigw}}
31: \def\X{{\mathcal X}}
32: \def\Z{{\mathbb Z}}
33: \def\lg{{\log_2}}
34: 
35: \newcommand{\defn}[0]{\it}
36: \hyphenation{szpan-kow-ski}
37: 
38: \begin{document}
39: \bibliographystyle{IEEEtran} \title{Optimal Prefix Codes for Infinite
40: Alphabets with Nonlinear Costs}
41: \author{Michael~B.~Baer,~\IEEEmembership{Member,~IEEE}% 
42: \thanks{This work was supported in part by the National Science
43: Foundation (NSF) under Grant CCR-9973134 and the Multidisciplinary
44: University Research Initiative (MURI) under Grant DAAD-19-99-1-0215.
45: Part of this work was performed while the author was at Stanford
46: University.  This material was presented in part at the IEEE
47: International Symposium on Information Theory, Seattle, Washington,
48: USA, July 2006 and at the IEEE International Symposium on Information Theory,
Nice, France, June 2007.}%
50: \thanks{The author is with Ocarina Networks, Inc., 42 Airport Parkway, San Jose, CA  95110-1009  USA (e-mail:{\color{white}{i}}calbear{\color{black}{@}}{\bf \tiny \.{1}}eee.org).}
51: \thanks{This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.}}
52: \markboth{IEEE Transactions on Information Theory}{Optimal Prefix Codes for Infinite Alphabets with Nonlinear Costs}
53: %\pubid{0000--0000/00\$00.00~\copyright~2007 IEEE}
54: \maketitle
55: 
56: \begin{abstract}
57: Let $\bigp = \{\boldp(i)\}$ be a measure of strictly positive
58: probabilities on the set of nonnegative integers.  Although the
59: countable number of inputs prevents usage of the Huffman algorithm,
60: there are nontrivial $\bigp$ for which known methods find a source
61: code that is optimal in the sense of minimizing expected codeword
62: length.  For some applications, however, a source code should instead
63: minimize one of a family of nonlinear objective functions,
64: $\beta$-exponential means, those of the form $\log_a \sum_i \boldp(i)
65: a^{\len(i)}$, where $\len(i)$ is the length of the $i$th codeword and
66: $a$ is a positive constant.  Applications of such minimizations
67: include a novel problem of maximizing the chance of message receipt in
68: single-shot communications ($a<1$) and a previously known problem of
69: minimizing the chance of buffer overflow in a queueing system ($a>1$).
70: This paper introduces methods for finding codes optimal for
71: such exponential means.  One method applies to geometric
72: distributions, while another applies to distributions with lighter
73: tails.  The latter algorithm is applied to Poisson distributions and
74: both are extended to alphabetic codes, as well as to minimizing
75: maximum pointwise redundancy.  The aforementioned application of
76: minimizing the chance of buffer overflow is also considered.
77: \end{abstract}
78: 
79: \begin{keywords}
80: Communication networks, generalized entropies, generalized means,
81: Golomb codes, Huffman algorithm, optimal prefix codes, queueing,
82: worst case minimax redundancy.
83: \end{keywords} 
84: 
85: \IEEEpeerreviewmaketitle
86: 
87: \section{Introduction, Motivation, and Main Results} 
88: \label{intro} 
89: 
90: If probabilities are known, optimal lossless source coding of
91: individual symbols (and blocks of symbols) is usually done using David
92: Huffman's famous algorithm\cite{Huff}.  There are, however, cases that
93: this algorithm does not solve.  Problems with an
94: infinite number of possible inputs --- e.g., geometrically-distributed
95: variables --- are not covered.  Also, in some instances, the
96: optimality criterion --- or {\defn penalty} --- is not the linear
97: penalty of expected length.  Both variants of the problem have been
98: considered in the literature, but not simultaneously.  This paper 
99: discusses cases which are both infinite and nonlinear.
100: 
101: An infinite-alphabet source emits symbols drawn from the alphabet
102: $\X_\infty = \{0, 1, 2, \ldots \}$.  (More generally, we use $\X$ to
103: denote an input alphabet whether infinite or finite.)  Let $\bigp =
104: \{\boldp(i)\}$ be the sequence of probabilities for each symbol, so
105: that the probability of symbol $i$ is $\boldp(i) > 0$.  The source
106: symbols are coded into binary codewords.  The codeword $c(i) \in
107: \{0,1\}^*$ in code $C$, corresponding to input symbol~$i$, has length
108: $\len(i)$, thus defining length distribution~$\biglen$.  Such codes
109: are called {\defn integer codes} (as in, e.g., \cite{YaQi}).
110: 
Perhaps the best-known integer codes are those derived by
112: Golomb for geometric distributions\cite{Golo,GaVV}, and many other
113: types of integer codes have been considered by others\cite{Abr01}.
114: There are many reasons for using such integer codes rather than codes
115: for finite alphabets, such as Huffman codes.  The most obvious use is
116: for cases with no upper bound --- or at least no known upper bound ---
117: on the number of possible items.  In addition, for many cases it is
118: far easier to come up with a general code for integers rather than a
119: Huffman code for a large but finite number of inputs.  Similarly, it
120: is often faster to encode and decode using such well-structured codes.
121: For these reasons, integer codes and variants of them are widely used
122: in image and video compression standards\cite{WSBL, WSS}, as well as
123: for compressing text, audio, and numerical data.
124: 
125: To date, the literature on integer codes has considered only finding
126: efficient uniquely decipherable codes with respect to minimizing
127: expected codeword length $\sum_i \boldp(i) \len(i)$.  Other utility
128: functions, however, have been considered for finite-alphabet codes.
129: Campbell~\cite{Camp} introduced a problem in which the penalty to
130: minimize, given some continuous (strictly) monotonic increasing {\defn
131: cost function} $\varphi(x):\Rp \rightarrow \Rp$, is 
132: $$
133: \CampCost(\bigp,\biglen,\varphi) = \varphi^{-1}\left(\sum_i \boldp(i)
134: \varphi(\len(i))\right) 
$$ and specifically considered the exponential subcases with base $a>1$:
136: \begin{equation} 
137: \CampCost_a(\bigp,\biglen) \definedas \log_a \sum_i \boldp(i) a^{\len(i)} 
138: \label{ExpCost} 
139: \end{equation} 
140: that is, $\varphi(x) = a^x$.  Note that minimizing penalty $\CampCost$
141: is also an interesting problem for $0<a<1$ and approaches the standard
142: penalty $\sum_i \boldp(i) \len(i)$ for $a \rightarrow 1$\cite{Camp}.
143: While $\varphi(x)$ decreases for $a<1$, one can map decreasing
144: $\varphi$ to a corresponding increasing function $\tilde{\varphi}(l) =
145: \varphi_{\max} - \varphi(l)$ (e.g., for $\varphi_{\max} = 1$) without
146: changing the penalty value.  Thus this problem, equivalent to
147: maximizing $\sum_i \boldp(i) a^{\len(i)}$, is a subset of those
148: considered by Campbell.  All penalties of the form (\ref{ExpCost}) are
149: called $\beta$-exponential means, where $\beta = \lg
150: a$\cite[p.~158]{AcDa}.
151: 
152: Campbell noted certain properties for $\beta$-exponential means, but
153: did not consider applications for these means.  Applications were
154: later found for the problem with $a>1$ \cite{Jeli,Humb2,BlMc};
155: these applications all relate to a buffer overflow problem
156: discussed in Section~\ref{application}.
157: 
158: Here we introduce a novel application for problems of the form $a<1$.
159: Consider a situation related by Alfred R\'{e}nyi, an ancient scenario
160: in which a rebel fortress was besieged by Romans.  The rebels' only
161: hope was the knowledge gathered by a mute, illiterate spy, one who
could only nod and shake his head \cite[pp.~13--14]{Reny}.  This
163: apocryphal tale --- based upon a historical siege --- is the premise
164: behind the Hungarian version of the spoken parlor game Twenty
165: Questions.  A modern parallel in the 21\textsuperscript{st} century
166: occurred when Russian forces gained the knowledge needed to defeat
167: hostage-takers by asking hostages ``yes'' or ``no'' questions over
168: mobile phones\cite{MSN,Tar}.
169: 
170: R\'{e}nyi presented this problem in narrative form in order to
171: motivate the relation between Shannon entropy and binary prefix
172: coding.  Note however that Twenty Questions, traditional prefix
173: coding, and the siege scenario actually have three different
174: objectives.  In Twenty Questions, the goal is to be able to determine
175: the symbol (i.e., the item or message) by asking at most twenty
176: questions.  In prefix coding, the goal is to minimize the expected
177: number of questions --- or, equivalently, bits --- necessary to
178: determine the message.  For the siege scenario, the goal is survival;
179: that is, assuming partial information is not useful, the besieged
180: would wish to maximize the probability that the message is
181: successfully transmitted within a certain window of opportunity.  When
182: this window closes --- e.g., when the fortress falls --- the
183: information becomes worthless.  An analogous situation occurs when a
184: wireless device is losing power or is temporarily within range of a
185: base station; one can safely assume that the channel, when available,
186: will transmit at the lowest (constant) bitrate, and will be lost after a
187: nondeterministic time period.
188: 
189: Assume that the duration of the window of opportunity is independent
190: of the communicated message and is memoryless, the latter being a
191: common assumption --- due to both its accuracy and expedience --- of
192: such stochastic phenomena.  Memorylessness implies that the window
193: duration is distributed exponentially.  Therefore, quantizing time in
194: terms of the number of bits $T$ that we can send within our window,
195: $$\P(T = t) = (1-a)a^t, ~ t = 0, 1, 2, \ldots $$ with known positive parameter
196: $a<1$.  We then wish to maximize the probability of success, i.e., the
197: probability that the message length does not exceed the quantized
198: window length:
199: \begin{eqnarray*}
200: \P[\len(X) \leq T] &=& \sum_{t=0}^\infty \P(T=t) \cdot \P[\len(X) \leq t] \\
201: &=& \sum_{t=0}^\infty (1-a)a^t \cdot \sum_{i \in \X} p(i) 1_{\len(i) \leq t} \\
202: &=& \sum_{i \in \X} p(i) \cdot (1-a) \sum_{t=\len(i)}^\infty a^t \\
203: &=& \sum_{i \in \X} p(i) a^{\len(i)} \cdot (1-a) \sum_{t=0}^\infty a^t \\
204: &=& \sum_{i \in \X} p(i) a^{\len(i)}
205: \end{eqnarray*}
where $1_{\len(i) \leq t}$ is $1$ if $\len(i) \leq t$ and $0$ otherwise.
207: Minimizing (\ref{ExpCost}) is an equivalent objective.
208: 
209: Note that this problem can be constrained or otherwise modified for
210: the application in question.  For example, in some cases, we might
211: need some extra time to send the first bit, or, alternatively, the
212: window of opportunity might be of at least a certain duration,
213: increasing or reducing the probability that no bits can be sent,
214: respectively.  Thus we might have
215: $$ \P(T = t) = \left\{
216: \begin{array}{ll}
217: t_0,& t = 0 \\
218: (1-t_0)(1-a)a^{t-1},& t = 1, 2, \ldots 
219: \end{array}
220: \right.
221: $$ for some~$t_0 \in (0,1)$.  In this case, 
222: $$\P[\len(X) \leq T] = \frac{(1-t_0)}{a} \sum_{i \in \X} p(i)
223: a^{\len(i)}$$ and the maximizing code is identical to that of the more
224: straightforward case.  Likewise, if we need to send multiple messages,
225: the same code maximizes the expected number of independent messages we can
226: send within the window, due to the memoryless property.
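
To see where the factor $(1-t_0)/a$ in the modified-window expression
comes from, the calculation can be sketched as follows.  It uses the
fact that every codeword has length $\len(i) \geq 1$ (true of any
prefix code with more than one codeword), so the $t=0$ term never
contributes:
\begin{eqnarray*}
\P[\len(X) \leq T] &=& \sum_{i \in \X} p(i) \sum_{t=\len(i)}^\infty (1-t_0)(1-a)a^{t-1} \\
&=& \sum_{i \in \X} p(i) \cdot (1-t_0) a^{\len(i)-1} \cdot (1-a) \sum_{t=0}^\infty a^t \\
&=& \frac{1-t_0}{a} \sum_{i \in \X} p(i) a^{\len(i)} .
\end{eqnarray*}
The leading factor is a positive constant that does not depend on the
code, so it does not affect which code is optimal.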
227: 
228: We must be careful regarding the meaning of an ``optimal code'' when
229: there are an infinite number of possible codes under consideration.
230: One might ask whether there must exist an optimal code or if there can
231: be an infinite sequence of codes of decreasing penalty without any
232: code achieving the limit penalty value.  Fortunately the answer is the
233: former, the proof being a special case of Theorem~2 in~\cite{Baer06}
234: (a generalization of the result for the expected-length
235: penalty\cite{LTZ}).  The question is then how to find one of these
236: optimal source codes given parameter $a$ and probability
237: measure~$\bigp$.
238: 
239: As in the linear case, a general solution for (\ref{ExpCost}) is not
240: known for general $\bigp$ over a countably infinite number of events,
241: but methods and properties for finite numbers of events --- discussed
242: in the next section --- can be used to find optimal codes for certain
243: common infinite-item distributions.  In Section~\ref{geometric}, we
244: consider geometric distributions and find that Golomb codes are
245: optimal, although the optimal Golomb code for a given probability mass
246: function varies according to $a$.  The main result of
247: Section~\ref{geometric} is that, for $\boldp_\theta(i) =
248: (1-\theta)\theta^i$ and $a \in \Rp$, G$\kval$, the Golomb code with
249: parameter $\kval$, is optimal, where $$\kval = \max\left(1,
250: \left\lceil -\log_\theta a -\log_\theta (1+\theta)
251: \right\rceil\right).$$ In Section~\ref{other}, we consider
252: distributions that are relatively light-tailed, that is, that decline
253: faster than certain geometric distributions.  If there is a
254: nonnegative integer $r$ such that for all $j>r$ and $i<j$,
255: $$\boldp(i) \geq \max\left(\boldp(j), \sum_{k=j+1}^\infty \boldp(k)
256: a^{k-j}\right)$$ then an optimal binary prefix code tree exists which
257: consists of a unary code tree appended to a leaf of a finite code
258: tree.  A specific case of this is the Poisson distribution,
259: $\boldp_\lambda(i)=\lambda^i e^{-\lambda}/i!$, where $e$ is the base
260: of the natural logarithm ($e \approx 2.71828$).  We show that in this
261: case the aforementioned $r$ is given by $r = \max(\lceil 2 a \lambda
262: \rceil - 2, \lceil e \lambda \rceil - 1)$.  An application, that of
263: minimizing probability of buffer overflow, as in~\cite{Humb2}, is
264: considered in Section~\ref{application}, where we show that the
265: algorithm developed in \cite{Humb2} readily extends to coding
266: geometric and light-tailed distributions.  Section~\ref{nonexp}
267: discusses the maximum pointwise redundancy penalty, which has a
268: similar solution for light-tailed distributions and for which the
269: Golomb code G$\kval$ with $\kval = \lceil -1/\lg \theta \rceil$ is
optimal for geometric distributions.  We conclude with some
271: remarks on possible extensions to this work.
272: 
273: Throughout the following, a set or sequence of items $x(i)$ is
274: represented by its uppercase counterpart, $X$.  A glossary of terms is
275: given in Appendix~\ref{glossary}.
276: 
277: \section{Background: Finite Alphabets}
278: \label{background} 
279:  
280: If a finite number of events comprise $\bigp$ (i.e., $|\X|<\infty$),
281: the exponential penalty (\ref{ExpCost}) is minimized using an
282: algorithm found independently by Hu {\it et al.}~\cite[p.~254]{HKT},
283: Parker \cite[p.~485]{Park}, and Humblet
284: \cite[p.~25]{Humb0},\cite[p.~231]{Humb2}, although only the last of
285: these considered $a < 1$.  (The simultaneity of these lines of
286: research was likely due to the appearance of the first paper on
287: adapting the Huffman algorithm to a nonlinear penalty, $\max_i
288: (\boldp(i) + \len(i))$ for given $\boldp(i) \in \Rp$, in
289: 1976\cite{Golu}.)  We will use this finite-alphabet
290: exponential-penalty algorithm in the sections that follow in order to
prove optimality for infinite distributions, so let us reproduce the
292: algorithm here:
293: 
294: \textbf{Procedure for Exponential Huffman Coding (finite alphabets):} 
295: This procedure finds the optimal code
296: whether $a>1$ (a minimization of the average of a growing exponential)
297: or $a<1$ (a maximization of the average of a decaying exponential).
298: Note that it minimizes (\ref{ExpCost}), even if the ``probabilities''
299: do not add to $1$.  We refer to such arbitrary positive inputs as
300: {\defn weights}, denoted by $\boldw(i)$ instead of~$\boldp(i)$:
301: 
302: \begin{enumerate}
303: \item Each item $i$ has weight $\boldw(i) \in \bigw_{\X}$, where $\X$
304: is the (finite) alphabet and $\bigw_{\X} = \{w(i)\}$ is the set of all
305: such weights.  Assume each item $i$ has codeword $c(i)$, to be
306: determined later.
307: \item Combine the items with the two smallest weights $\boldw(j)$ and
308:   $\boldw(k)$ into one compound item with the combined weight
309:   $\tilde{\boldw}(j) = a \cdot (\boldw(j) + \boldw(k))$.  This item
310:   has codeword $\tilde{c}(j)$, to be determined later, while item $j$ is
311:   assigned codeword $c(j) = \tilde{c}(j)0$ and $k$ codeword $c(k) =
312:   \tilde{c}(j)1$.  Since these have been assigned in terms of
313:   $\tilde{c}(j)$, replace $\boldw(j)$ and $\boldw(k)$ with
314:   $\tilde{\boldw}(j)$ in $\bigw_\X$ to form $\bigw_{\tilde{\X}}$.
\item Repeat the procedure, now with the remaining codewords (reduced in
316:   number by $1$) and corresponding weights, until only one
317:   item is left.  The weight of this item is $\sum_i \boldw(i)
318:   a^{\len(i)}$.  All codewords are now defined by assigning the null
319:   string to this trivial item.
320: \end{enumerate}
321: This algorithm assigns a weight to each node of
322: the resulting implied code tree by having each item represented by a
323: node with its parent representing the items combined into its subtree,
324: as in Fig.~\ref{buildgolo}: If a node is a leaf, its weight is given
325: by the associated probability; otherwise its weight is defined
recursively as $a$ times the sum of the weights of its children.  This concept is
327: useful in visualizing both the coding procedure and its output.
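
As a small illustration of the procedure (the weights and the value
of $a$ are chosen here only for concreteness and are not taken from
the cited works), consider weights $0.4$, $0.3$, $0.2$, $0.1$ with
$a = 0.6$:
\begin{center}
$$
\begin{array}{lll}
\hline
\mbox{step} & \mbox{weights merged} & \mbox{weights after merge} \\
\hline
1 & 0.1,~0.2 & 0.4,~0.3,~0.18 \\
2 & 0.18,~0.3 & 0.4,~0.288 \\
3 & 0.288,~0.4 & 0.4128 \\
\hline
\end{array}
$$
\end{center}
The resulting codeword lengths are $1$, $2$, $3$, and $3$, and the
final weight agrees with $\sum_i \boldw(i) a^{\len(i)} = 0.4(0.6) +
0.3(0.6)^2 + 0.2(0.6)^3 + 0.1(0.6)^3 = 0.4128$.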
328: 
Van Leeuwen implemented the Huffman algorithm in time linear in the input
size, given sorted weights, in \cite{Leeu}, and this implementation was
331: extended to the exponential problem in \cite{Baer05} as follows:
332: 
333: \textbf{Two-Queue Implementation of Exponential Huffman Coding:}
334: The two-queue method of implementing the Huffman algorithm puts
335: nodes/items in two queues, the first of which is initialized with the
336: input items (eventual leaf nodes) arranged from head to tail in order
337: of nondecreasing weight, and the second of which is initially empty.
338: At any given step, a node with lowest weight among all nodes in both
339: queues is at the head of one of the two queues, and thus two
340: lowest-weighted nodes can be combined in constant time.  This compound
341: node is then inserted into (the tail of) the second queue, and the
342: algorithm progresses until only one node is left.  This node is the
343: root of the coding tree and is obtained in linear time.
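
The following trace sketches the two-queue method on an example of
our own choosing, weights $0.1$, $0.2$, $0.3$, $0.4$ with $a = 2$;
queue contents are listed from head to tail:
\begin{center}
$$
\begin{array}{llll}
\hline
\mbox{step} & \mbox{merged} & \mbox{first queue} & \mbox{second queue} \\
\hline
0 & & 0.1,~0.2,~0.3,~0.4 & \mbox{(empty)} \\
1 & 0.1,~0.2 & 0.3,~0.4 & 0.6 \\
2 & 0.3,~0.4 & \mbox{(empty)} & 0.6,~1.4 \\
3 & 0.6,~1.4 & \mbox{(empty)} & 4 \\
\hline
\end{array}
$$
\end{center}
At each step only the heads of the two queues are compared: in step
2, for example, $0.3$ is taken first (since $0.3 < 0.6$), then $0.4$
(since $0.4 < 0.6$).  The second queue stays in nondecreasing order,
all four leaves end at depth $2$, and the root weight is $4 = \sum_i
\boldw(i) 2^{\len(i)}$.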
344: 
345: The presentation of the algorithm in \cite{Baer05} did not include a
346: formal proof, so we find it useful to present one here:
347: 
348: \begin{lemma}
349: The two-queue method using the exponential combining rule
350: results in an optimal exponential Huffman code given a finite number
351: of input items.  
352: \label{twoqueue}
353: \end{lemma}
354: 
355: \begin{proof}
356: The method is clearly a valid implementation of the exponential
357: Huffman algorithm so long as both queues' sets of nodes remain in
358: nondecreasing order.  This is clearly satisfied prior to the first
359: combination step.  Here we show that, if nodes are in order at all
360: points prior to a given combination step, they must be in order at the
361: end of that step as well, inductively proving the correctness of the
362: algorithm.  It is obvious that order is preserved in the single-item
363: queue, since nodes are only removed from it, not added to it.  In the
364: compound-node queue, order is only a concern if there is already at
365: least one node in it at the beginning of this step, a step that
366: combines nodes we call node $i_{-1}$ and node $i_{-2}$.  If so, the
367: item at the tail of the compound-node queue at the beginning of the
368: step was two separate items, $i_{-3}$ and $i_{-4}$, at the beginning
369: of the prior step.  At the beginning of this prior step, all four
370: items must have been distinct --- i.e., corresponding to distinct sets
371: of (possibly combined) leaf nodes --- and, because the algorithm
372: chooses the smallest two nodes to combine, neither $i_{-3}$ nor
373: $i_{-4}$ can have a greater weight than either $i_{-1}$ or $i_{-2}$.
374: Thus --- since $a\cdot(\boldw(i_{-3})+\boldw(i_{-4})) \leq
375: a\cdot(\boldw(i_{-1})+\boldw(i_{-2}))$ and the node with weight
376: $a\cdot(\boldw(i_{-3})+\boldw(i_{-4}))$ is the compound node with the
377: largest weight in the compound-node queue at the beginning of the step
378: in question --- the queues remain properly ordered at the end of the
379: step in question.
380: \end{proof}
381: 
If $a < 0.5$, the compound-node queue will never have more than one
item.  At each step after the first, the sole compound item will be
removed from its queue, since its weight, $a$ times the sum of the two
weights combined to create it, is less than the larger of those two
weights, which in turn is no greater than the weight of any node
remaining in the single-item queue.
387: It is replaced by the new (sole) compound item.  This extends to $a =
388: 0.5$ if we prefer to merge combined nodes over single items of the
389: same weight.  Thus, any finite input distribution can be optimally
390: coded for $a \leq 0.5$ using a {\defn truncated unary code}, a
391: truncated version of the {\defn unary code}, the latter of which has
392: codewords of the form $\{1^j0 : j \geq 0\}$.  The truncated unary code
has the same codewords as the unary code except for the longest
394: codeword, which is of the form $\{1^{|\X|-1}\}$.  This results from
395: each compound node being formed using at least one single item (leaf).
396: Taking limits, informally speaking, results in a unary limit code.
397: Formally, this is a straightforward corollary of Theorem~\ref{tailthm}
398: in Section~\ref{other}.
399: 
If $a>0.5$, a code with finite penalty exists if and only if the R\'{e}nyi
401: entropy of order $\alpha(a) = {(1+\lg a)}^{-1}$ is finite, as shown in
402: \cite{Baer06}.  It was Campbell who first noted the connection between
403: the optimal code's penalty, $\CampCost_a(\bigp,\biglen^*)$, and
404: R\'{e}nyi entropy
405: \begin{eqnarray*}
406: H_{\alpha}(\bigp) &\definedas& \frac{1}{1-\alpha} \lg \sum_{i \in \X} 
407: \boldp(i)^\alpha \\
408: \Rightarrow H_{\alpha(a)}(\bigp) &=& \frac{1+\lg a}{\lg a} 
409: \lg \sum_{i \in \X} \boldp(i)^{(1+\lg a)^{-1}} .
410: \end{eqnarray*}
411: This relationship is
412: $$H_{\alpha(a)}(\bigp) \leq \CampCost_a(\bigp,\biglen^*) <
413: H_{\alpha(a)}(\bigp) + 1$$ which should not be surprising given the
414: similar relationship between Huffman-optimal codes and Shannon
415: entropy\cite{Shan}, which corresponds to $a \rightarrow 1$ ($\alpha
416: \rightarrow 1$)\cite{Ren2,Camp}; due to this correspondence, Shannon
417: entropy is sometimes expressed as $H_1(\bigp)$.
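
As a numerical sanity check of these bounds (an example of our own,
with rounded values), let $\bigp = \{0.4, 0.3, 0.2, 0.1\}$ and $a =
2$, so that $\alpha(a) = 1/2$.  The exponential Huffman procedure of
this section merges $0.1$ and $0.2$ into $0.6$, then $0.3$ and $0.4$
into $1.4$, and finally these two into $4$, so every codeword has
length $2$ and $\CampCost_2(\bigp,\biglen^*) = \lg 4 = 2$.  Meanwhile,
$$H_{1/2}(\bigp) = 2 \lg \left(\sqrt{0.4}+\sqrt{0.3}+\sqrt{0.2}+\sqrt{0.1}\right)
\approx 2 \lg 1.9436 \approx 1.917$$
so that $1.917 \leq 2 < 2.917$, consistent with the stated relationship.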
418: 
419: \section{Geometric Distribution with Exponential Penalty}
420: \label{geometric}
421: 
422: Consider the geometric distribution $$\boldp_\theta(i) =
423: (1-\theta)\theta^i$$ for parameter $\theta \in (0,1)$.  This
424: distribution arises in run-length coding among other
425: circumstances\cite{Golo,GaVV}.
426: 
427: For the traditional linear penalty, a Golomb code with
428: parameter~$\kval$ --- or G$\kval$ --- is optimal for $\theta^\kval +
429: \theta^{\kval+1} \leq 1 < \theta^{\kval-1} + \theta^\kval$.  Such a
430: code consists of a unary prefix followed by a binary suffix, the
431: latter taking one of $\kval$ possible values.  If $\kval$ is a power
432: of two, all binary suffix possibilities have the same length;
433: otherwise, their lengths $\sigma(i)$ differ by at most $1$ and $\sum_i
434: 2^{-\sigma(i)}=1$.  Binary codes such as these suffix codes are called
435: {\defn complete} codes.  This defines the Golomb code; for example,
436: the Golomb code for $\kval = 3$ is:
437: \begin{center}
438: $$
439: \begin{array}{rll}
440: \hline
441: \hline
442: i&\boldp(i)&c(i) \\
443: \hline
444: 0&1-\theta&0~0 \\
445: 1&(1-\theta)\theta&0~10 \\
446: 2&(1-\theta)\theta^2&0~11 \\
447: 3&(1-\theta)\theta^3&10~0 \\
448: 4&(1-\theta)\theta^4&10~10 \\
449: 5&(1-\theta)\theta^5&10~11 \\
450: 6&(1-\theta)\theta^6&110~0 \\
451: 7&(1-\theta)\theta^7&110~10 \\
452: 8&(1-\theta)\theta^8&110~11 \\
453: 9&(1-\theta)\theta^9&1110~0 \\
454: \vdots&\qquad \vdots&\qquad \vdots \\
455: \hline
456: \end{array}
457: $$
458: \end{center}
459: where the space in the code separates the unary prefix from the complete
460: suffix.  In general, codeword $j$ for G$\kval$ is of the form
461: $\{1^{\lfloor j/\kval \rfloor} 0 b(j \bmod \kval,\kval) : j \geq 0\}$,
462: where $b(j \bmod \kval, \kval)$ is a complete binary code for the $(j -
463: \kval \lfloor j/\kval \rfloor+1)$th of $\kval$ items.
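
For instance, reading the general form against the table above for
$\kval = 3$ (where the suffix code is $b(0,3) = 0$, $b(1,3) = 10$,
and $b(2,3) = 11$), item $j = 7$ has $\lfloor 7/3 \rfloor = 2$ and $7
\bmod 3 = 1$, giving
$$c(7) = 1^2 \, 0 \, b(1,3) = 110~10$$
which has length $\len(7) = 5$.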
464: 
465: It turns out that such codes are optimal for the exponential penalty:
466: \begin{theorem}
467: For $a \in \Rp$, if
468: \begin{equation}
469: \theta^\kval + \theta^{\kval+1} \leq \frac{1}{a} < \theta^{\kval-1} +
470: \theta^\kval
471: \label{ineq}
472: \end{equation}
473: for $\kval \geq 1$, then the Golomb code G$\kval$ is the optimal code
474: for $\bigp_\theta$.  If no such $\kval$ exists, the unary code G$1$ is
475: optimal.
476: \label{optgeo}
477: \end{theorem}
478: 
479: \textit{Remark:} This rule for finding an optimal Golomb G$\kval$ code
480: is equivalent to
481: $$\kval = \max\left(1, \left\lceil -\log_\theta a -\log_\theta
482: (1+\theta) \right\rceil\right).$$ This is a generalization of the
483: traditional linear result, which corresponds to $a \rightarrow 1$.
484: Cases in which the left inequality is an equality have multiple
485: solutions, as with linear coding; see, e.g., \cite[p.~289]{Goli2}.
486: The proof of the optimality of the Golomb code for exponential
487: penalties is somewhat similar to that of \cite{GaVV}, although it must
488: be significantly modified due to the nonlinearity involved.
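
To illustrate how the optimal parameter shifts with $a$ (the
numerical values are chosen here purely for illustration), let
$\theta = 0.8$, for which $\theta + \theta^2 = 1.44$, $\theta^2 +
\theta^3 = 1.152$, $\theta^3 + \theta^4 = 0.9216$, and $\theta^4 +
\theta^5 = 0.73728$.  Applying (\ref{ineq}) then gives:
\begin{center}
$$
\begin{array}{lll}
\hline
a & \mbox{instance of (\ref{ineq})} & \kval \\
\hline
0.8 & 1.152 \leq 1.25 < 1.44 & 2 \\
a \rightarrow 1 & 0.9216 \leq 1 < 1.152 & 3 \\
1.2 & 0.73728 \leq 0.8333\ldots < 0.9216 & 4 \\
\hline
\end{array}
$$
\end{center}
Thus a larger $a$, which penalizes long codewords more heavily,
selects a Golomb code with a larger parameter, that is, a flatter code.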
489: 
490: Before proving Theorem~\ref{optgeo}, we need the following lemma:
491: 
492: \begin{lemma}
493: Consider a Huffman combining procedure, such as the exponential
494: Huffman coding procedure, implemented using the two-queue method presented in the previous section just prior to Lemma~\ref{twoqueue}.  Now consider a step at which the first (single-item)
495: queue is empty, so that remaining are only compound items, that is,
496: items representing internal nodes rather than leaves in the final
497: Huffman coding tree.  Then, in this final tree, the nodes corresponding to these compound items will be on
498: levels differing by at most one; that is, the nodes will form a
499: complete tree.  Furthermore, if $n$ is the number of items remaining
500: at this point, all items that finish at level $\lceil \lg n \rceil$
501: appear closer to the head of the (second, nonempty) queue than any
502: item at level $\lceil \lg n \rceil - 1$ (if any).
503: \label{thelemma}
504: \end{lemma}
505: 
506: \begin{proof}[Lemma~\ref{thelemma}]
507: We use an inductive proof, in which the base cases of one and two
508: compound items (i.e., internal nodes) are trivial.  Suppose the lemma is
509: true for every case with $n-1$ items for $n>2$, that is, that all
510: nodes are at levels $\lfloor \lg (n-1) \rfloor$ or $\lceil \lg (n-1)
511: \rceil$, with the latter items closer to the head of the queue than
512: the former.  Consider now a case with $n$ nodes.  The first step of
513: coding is to merge two nodes, resulting in a combined item that is
514: placed at the end of the combined-item queue.  Because it is at the
515: end of the queue in the reduced problem of size $n-1$, this combined node is at level
516: $\lfloor \lg (n-1) \rfloor$ in the final tree, and its children are at
517: level $1+\lfloor \lg (n-1) \rfloor = \lceil \lg n \rceil$.  If $n$ is
518: a power of two, the remaining items end up on level $\lg n = \lceil \lg
519: (n-1) \rceil$, satisfying this lemma.  If $n-1$ is a
520: power of two, they end up on level $\lg (n-1) = \lfloor \lg n \rfloor$,
521: also satisfying the lemma.  Otherwise, there is at least one item ending up at
522: level $\lceil \lg n \rceil = \lceil \lg (n-1) \rceil$ near the head of
523: the queue, followed by the remaining items, which end up at level
524: $\lfloor \lg n \rfloor = \lfloor \lg (n-1) \rfloor$.  In any case, the
525: lemma is satisfied for $n$ items, and thus, inductively, for any number of items.
526: \end{proof}
527: 
528: This lemma applies to any problem in which a two-queue Huffman algorithm provides an optimal solution, including the original Huffman
529: problem and the tree-height problem of \cite{Park}.  Here we apply the lemma to the exponential Huffman algorithm to prove Theorem~\ref{optgeo}:
530: 
531: \begin{figure*}
532: \psfrag{  0}{\mbox{\tiny $w(0)$}}
533: \psfrag{  1}{\mbox{\tiny $w(1)$}}
534: \psfrag{  2}{\mbox{\tiny $w(2)$}}
535: \psfrag{  3}{\mbox{\tiny $w(3)$}}
536: \psfrag{  4}{\mbox{\tiny $w(4)$}}
537: \psfrag{  5}{\mbox{\tiny $w(5)$}}
538: \psfrag{  6}{\mbox{\tiny $w(6)$}}
539: \psfrag{  7}{\mbox{\tiny $w(7)$}}
540: \psfrag{  8}{\mbox{\tiny $w(8)$}}
541: \psfrag{  9}{\mbox{\tiny $w(9)$}}
542: \psfrag{  10}{\mbox{\tiny $w(10)$}}
543: \psfrag{  11}{\mbox{\tiny $w(11)$}}
544: \psfrag{  12}{\mbox{\tiny $w(12)$}}
545: \psfrag{  13}{\mbox{\tiny $w(13)$}}
546: \psfrag{  14}{\mbox{\tiny $w(14)$}}
547: \psfrag{  15}{\mbox{\tiny $w(15)$}}
548: \psfrag{  16}{\mbox{\tiny $w(16)$}}
549: \psfrag{  17}{\mbox{\tiny $w(17)$}}
550: \psfrag{  18}{\mbox{\tiny $w(18)$}}
551: \psfrag{  19}{\mbox{\tiny $w(19)$}}
552: \psfrag{  20}{\mbox{\tiny $w(20)$}}
553: \psfrag{  21}{\mbox{\tiny $w(21)$}}
554: \psfrag{  22}{\mbox{\tiny $w(22)$}}
555: \begin{center}
556: \resizebox{14cm}{!}{\includegraphics{inftree3d.eps}}
557: \caption{Formation of a Golomb code using a code for an $m$-reduced
558: source.  In this illustration, $m=17$ and $\kval=5$, and smaller weights are pictorially lower.  Weights are merged bottom-up, in a manner consistent with the exponential Huffman algorithm, first in separate (truncated) unary subtrees, then in a (five-leaf) complete tree.}
559: \label{buildgolo}
560: \end{center}
561: \end{figure*}
562: 
563: \begin{proof}[Theorem~\ref{optgeo}]
564: We start with an optimal exponential Huffman code for a sequence of
565: similar finite weight distributions.  These finite weight
566: distributions, called {\defn $m$-reduced geometric sources} $\bigw_m$,
567: are defined as:
568: $$
569: \boldw_m(i) \definedas \left\{
570: \begin{array}{ll}
571: (1-\theta)\theta^i,& 0 \leq i \leq m \\
572: \displaystyle
\frac{(1-\theta)a\theta^i}{1-a\theta^\kval},& m < i \leq m + \kval
574: \end{array}
575: \right.
576: $$ where $\kval$ is as given in the statement of the theorem, or $1$
577: if no such $\kval$ exists.  
578: 
579: Weights $\boldw_m(0)$ through $\boldw_m(m)$ are decreasing, as are
580: $\boldw_m(m+1)$ through $\boldw_m(m+\kval)$.  Thus we can combine the
581: nodes with weights $\boldw_{m}(m)$ and $\boldw_m(m+\kval)$ if
582: $$\frac{(1-\theta)a\theta^{m+\kval}}{1-a\theta^\kval} \leq 
583: (1-\theta)\theta^{m-1}$$
584: and
585: $$\frac{(1-\theta)a\theta^{m+\kval-1}}{1-a\theta^\kval} >
586: (1-\theta)\theta^m \mbox{ or } \kval=1.$$ These conditions
587: are equivalent to the left and right sides, respectively, of
588: (\ref{ineq}).  Thus the combined item is
589: $$\boldw_{m-1}(m) = \frac{(1-\theta)a\theta^m}{1-a\theta^\kval}$$ 
590: and the code is reduced to the $\bigw_{m-1}$ case.
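
As a check on this step, assume the exponential Huffman combining rule replaces two weights $w$ and $w'$ by $a(w+w')$.  The merged weight can then be written out as
$$
\begin{array}{rcl}
a\left[\boldw_m(m)+\boldw_m(m+\kval)\right]
&=& \displaystyle a(1-\theta)\theta^m\left[1+\frac{a\theta^\kval}{1-a\theta^\kval}\right] \\
&=& \displaystyle \frac{(1-\theta)a\theta^m}{1-a\theta^\kval} .
\end{array}
$$
Similarly, since $1-a\theta^\kval>0$, multiplying the first condition by $(1-a\theta^\kval)/[(1-\theta)\theta^{m-1}]$ gives $a\theta^{\kval+1} \leq 1-a\theta^\kval$, that is, $\theta^\kval+\theta^{\kval+1} \leq 1/a$, while the same manipulation of the second condition gives $\theta^{\kval-1}+\theta^\kval > 1/a$.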
591: 
592: After merging the two smallest weights for $m=0$, the reduced source
593: is $$\boldw_{-1}(i) = \frac{(1-\theta)a\theta^i}{1-a\theta^\kval}, ~ 0
594: \leq i \leq \kval-1 .$$ For $\kval=1$ (including all instances of the
595: degenerate $a \leq 0.5$ case and all instances in which (\ref{ineq})
596: cannot be satisfied), this proves that the optimal tree is the
597: truncated unary tree.  Considering now only $\kval>1$ for $m \geq
598: \kval-1$, the two-queue algorithm assures that, when the problem is
599: reduced to weights $\{\boldw_{-1}(i)\}$, all corresponding nodes are
600: in the combined-item queue.  Lemma~\ref{thelemma} thus proves that
601: these nodes form a complete code.  The overall optimal tree for any
602: $m$-reduced code with $m \geq \kval-1$ is then a truncated Golomb
603: tree, as pictorially represented in Fig.~\ref{buildgolo}, where $m=17$
604: and $\kval=5$.  Note that $m+1$ is the number of leaves in common with
605: what we call the ``Golomb tree,'' the tree we show to be optimal for
606: the original geometric source.  The number of remaining leaves in the
607: truncated tree is~$\kval$, which is thus the number of distinct unary
608: subtrees in the Golomb tree.
609: 
610: Fig.~\ref{buildgolo} represents both the truncated and full Golomb
611: trees, along with how to merge the weights.  Squares represent items
612: to code, while circles represent other nodes of the tree.  Smaller
613: weights are below larger ones, so that items are merged as pictured.
614: Rounded squares are items $m+1$ through $m+\kval$, the items which are
615: replaced in the Golomb tree by unary subtrees, that is, subtrees
616: representing the unary code.  Other squares are items $0$ through $m$,
617: those corresponding to single items in the integer code.  White
618: circles are the leaves used for the complete tree.
619: 
620: \begin{figure*}[ht]
621: \psfrag{L-Ha}{$\bar{R}_a(\biglen_{\theta,a}^*,\bigp_\theta)$}
622: \psfrag{a}{$a$}
623: \psfrag{ag}{\mbox {\huge $a$}}
624: \psfrag{Theta}{$\theta$}
625:      \centering
626:      \subfigure[$a>1$]
627: 	       { \label{apos} \includegraphics[width=.45\textwidth]{R0.eps} }
628:      \subfigure[$a<1$]
629: 	       { \label{aneg} \includegraphics[width=.45\textwidth]{R1.eps} }
630:      \caption{Redundancy of the optimal code for the geometric
631:      distribution with the exponential penalty (parameter $a$).
632:      $\bar{R}_a(\biglen_{\theta,a}^*,\bigp_\theta) =
633:      \CampCost_a(\bigp_\theta,\biglen_{\theta,a}^*) - H_{\alpha(a)}(\bigp_\theta)$,
634:      where $\alpha(a) = (1+\lg a)^{-1}$, $\bigp_\theta$ is the
635:      geometric probability sequence implied by $\theta$, and
636:      $\biglen_{\theta,a}^*$ is the optimal length sequence for
637:      distribution $\bigp_\theta$ and parameter $a$.}
638:      \label{aall}
639: \end{figure*}
640: 
641: It is equivalent to follow the complete portion of the code with the
642: unary portion --- as in the exponential Huffman tree in
643: Fig.~\ref{buildgolo} --- or to reorder the bits and follow the unary
644: portion with the complete portion --- as in the Golomb
645: code\cite{Golo}.  The latter is more often used in practice and has
646: the advantage of being alphabetic, that is, $i>j$ if and only if
647: $c(i)$ is lexicographically after $c(j)$.
648: 
649: The truncated Golomb tree for any $m \geq \kval-1$ represents a code
650: that has the same penalty for the $m$-reduced distribution as does the
651: Golomb code with the corresponding geometric distribution.  We now
652: show that this is the minimum penalty for any code with this geometric
653: distribution.
654: 
655: Let $\biglen_{\theta,a}^*$ (or $\biglen^*$ if there is no ambiguity)
656: be codeword lengths that minimize the penalty for the geometric
657: distribution (which, as we noted, exist as shown in Theorem~2
658: of~\cite{Baer06}).  Let $\biglen_m$ be codeword lengths for the
659: $m$-reduced distribution found earlier; that is, $\len_m(i)$ is the
660: Golomb length for $i \leq m$ and $\len_m(i) = \len_m(i-\kval)$ for the
661: remaining values.  Finally, let $\biglen_{\infty}$ be the lengths of
662: the code implied by $m \rightarrow \infty$, that is, the lengths of
663: the Golomb code G$\kval$.  Then
664: \begin{equation}
665: \begin{array}{rcl}
666: \displaystyle
667: \log_a \sum_{i=0}^\infty \boldp(i) a^{\len^*(i)} &\leq&
668: \displaystyle
669: \log_a \sum_{i=0}^\infty \boldp(i) a^{\len_{\infty}(i)} \\
670: &=&
671: \displaystyle
672: \log_a \sum_{i=0}^{m+\kval} \boldw_m(i) a^{\len_m(i)} \\
673: &\leq&
674: \displaystyle
675: \log_a \sum_{i=0}^{m+\kval} \boldw_m(i) a^{\len^*(i)} 
676: \end{array}
677: \label{fininf}
678: \end{equation}
679: where the inequalities are due to the optimality of the respective
680: codes and the facts that $\boldw_m(i)=\boldp(i)$ for $i \leq m$ and
681: $$\boldw_m(i)=\sum_{j=0}^\infty (1-\theta)\theta^{i+j\kval}a^{j+1} =
682: \sum_{j=0}^\infty a^{j+1} \boldp(i+j\kval)$$
for $i \in (m,m+\kval]$.  The difference between the arguments of the
logarithms in the first and the last of the expressions in (\ref{fininf}) is
685: $$
686: \begin{array}{l}
687: \displaystyle
688: \sum_{i=0}^\infty \boldp(i) a^{\len^*(i)} - \sum_{i=0}^{m+\kval}
689: \boldw_m(i) a^{\len^*(i)} \\
690: \displaystyle
691: \qquad ~ = 
692: \sum_{i=m+1}^\infty \boldp(i) a^{\len^*(i)}
693: - \sum_{i=m+1}^{m+\kval} \boldw_m(i) a^{\len^*(i)} .
694: \end{array}
695: $$ As $m \rightarrow \infty$ for $m \geq \kval-1$, the sums on the
696: right-hand side approach~$0$; the first is the difference between a
697: limit (an infinite sum) and its approaching sequence of finite sums,
698: all upper bounded in~(\ref{fininf}), and each of the terms in the
699: second summation is upper-bounded by a multiplicative constant of the
700: corresponding term in the first.  (In the latter finite summation,
701: terms are $0$ for $i>m+\kval$.)  Their difference therefore also
702: approaches zero, so the summations on the left-hand side approach
703: equality, as do those in (\ref{fininf}), and the Golomb code must be
704: optimal.
705: \end{proof}
706: 
707: It is equivalent for the bits of the unary portion to be complemented,
708: that is, to use $\{0^{\lfloor j/\kval \rfloor} 1 b(j \bmod
709: \kval,\kval) : j \geq 0\}$ (as in \cite{GaVV}) instead of
710: $\{1^{\lfloor j/\kval \rfloor} 0 b(j \bmod \kval,\kval) : j \geq 0\}$
711: (as in \cite{Golo}).  It is also worth noting that Golomb originally
712: proposed his code in the context of a spy reporting run lengths; this
713: is similar to R\'{e}nyi's context for communications, related in
714: Section~\ref{intro} as a motivation for the nonlinear penalty with
715: $a<1$.
716: 
717: A little algebra reveals that, for a distribution $\bigp_\theta$ and a Golomb
718: code with parameter $\kval$ (lengths $\biglen_\kval$), 
719: \begin{equation}
720: \begin{array}{rcl}
721: \CampCost_a(\bigp_\theta,\biglen_\kval) &=& \displaystyle
722: \log_a \sum_{i=0}^\infty
723: (1-\theta)\theta^i a^{(\left\lceil\frac{i+1-z}{\kval} \right\rceil + g)} \\ 
724: \displaystyle
725: &=& g + {\log}_a
726: \left(1+\frac{(a-1)\theta^z}{1-a\theta^\kval}\right) 
727: \end{array}
728: \label{geosum}
729: \end{equation}
730: where
731: $g=\lfloor \log_2 \kval \rfloor + 1$ and $z = 2^g - \kval$.  
732: Therefore, Theorem~\ref{optgeo} provides the $\kval$ that minimizes
733: (\ref{geosum}).  If $a>0.5$, the corresponding R\'{e}nyi entropy is
734: \begin{equation}
735: H_{\alpha(a)}(\bigp_\theta) = \log_a
736: \frac{1-\theta}{(1-\theta^{\alpha(a)})^{1/\alpha(a)}}
737: \label{geoent}
738: \end{equation}
739: where we recall that $\alpha(a) = (1 +
740: \lg a)^{-1}$.  (Again, $a \leq 0.5$ is degenerate, an
741: optimal code being unary with no corresponding R\'{e}nyi entropy.)
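
As a sanity check of (\ref{geosum}), consider the unary code ($\kval=1$), for which $g=1$, $z=1$, and $\len(i)=i+1$.  Summing the geometric series directly, and assuming $a\theta<1$ so that the sum converges,
$$
\log_a \sum_{i=0}^\infty (1-\theta)\theta^i a^{i+1}
= \log_a \frac{a(1-\theta)}{1-a\theta}
= 1+\log_a \frac{1-\theta}{1-a\theta},
$$
which agrees with the closed form, since
$1+\frac{(a-1)\theta}{1-a\theta} = \frac{1-\theta}{1-a\theta}$.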
742: 
743: In evaluating the effectiveness of the optimal code, one might use the
744: following definition of {\defn average pointwise redundancy} (or just
745: {\defn redundancy}): $$\bar{R}_a(\biglen, \bigp) \definedas
746: \CampCost_a (\bigp,\biglen) - H_{\alpha(a)}(\bigp) .$$
747: For nondegenerate values, we can plot the $\bar{R}_a(\biglen_{\theta,a}^*,
748: \bigp_\theta)$ obtained from the minimization.  This is done for $a>1$
749: and $a<1$ in Fig.~\ref{aall}.  Note that as $a \rightarrow 1$, the
750: plot approaches the redundancy plot for the linear case, e.g.,
751: \cite{GaVV}, reproduced as Fig.~\ref{shannon}.
752: 
753: In many potential applications of nonlinear penalties --- such as the
754: aforementioned for $a>1$\cite{Jeli,Humb2,BlMc} and $a<1$
755: (Section~\ref{intro}) --- $a$ is very close to~$1$.  Since the preceding
756: analysis shows that the Golomb code that is optimal for given $a$
757: and $\theta$ is optimal not only for these particular values, but for
758: a range of $a$ (fixing $\theta$) and a range of $\theta$ (fixing $a$),
759: the Golomb code for the traditional linear penalty is, in some sense,
760: much more robust and general than previously appreciated.
761: 
762: \begin{figure}[t]
763: \psfrag{L-H}{\mbox{\huge $\bar{R}_1(\biglen_{\theta,1}^*,\bigp_\theta)$}}
764: \psfrag{THETA}{\mbox{\huge $\theta$}}
765:      \centering
766:      \resizebox{8cm}{!}{\includegraphics{Rlin.eps}}
767:      \caption{Redundancy of the optimal code for the geometric
768:      distribution with the traditional linear penalty.}
769:      \label{shannon}
770: \end{figure}
771: 
772: \section{Other Infinite Sources}
773: \label{other}
774: 
775: Abrahams noted that, in the linear case, slight deviation from the
776: geometric distribution in some cases does not change the optimal
code\cite[Proposition~(2)]{Abr1}.  Other extensions of and deviations
from the geometric distribution have also been
779: considered\cite{MSW,GoMa,BCSV}, including optimal codes for nonbinary
780: alphabets\cite{Abr1,GoMa}.  Many of these approaches can be adapted to
781: the nonlinear penalties considered here.  However, in this section we
782: instead consider another type of probability distribution for binary
783: coding, the type with a light tail.
784: 
785: Humblet's approach\cite{Humb1}, later extended in \cite{KHN}, uses the
786: fact that there is an optimal code tree with a unary subtree for any
787: probability distribution with a relatively light tail, one for which
788: there is an $r$ such that, for all $j>r$ and $i<j$, $\boldp(i) \geq
789: \boldp(j)$ and $\boldp(i) \geq \sum_{k=j+1}^\infty \boldp(k)$.  Due to
790: the additive nature of Huffman coding, items beyond $r$ form the unary
791: subtree, while the remaining tree can be coded via the Huffman
792: algorithm.  Once again, this has to be modified for exponential
793: penalties.
794: 
795: \begin{figure*}
796: \psfrag{  0x}{\mbox{\tiny $p(0)$}}
797: \psfrag{  1x}{\mbox{\tiny $p(1)$}}
798: \psfrag{  2x}{\mbox{\tiny $p(2)$}}
799: \psfrag{  3x}{\mbox{\tiny $p(3)$}}
800: \psfrag{  4x}{\mbox{\tiny $p(4)$}}
801: \psfrag{  5x}{\mbox{\tiny $p(5)$}}
802: \psfrag{  6x}{\mbox{\tiny $p(6)$}}
803: \psfrag{  7x}{\mbox{\tiny $p(7)$}}
804: \psfrag{  8x}{\mbox{\tiny $p(8)$}}
805: \psfrag{  9x}{\mbox{\tiny $p(9)$}}
806: \psfrag{  10x}{\mbox{\tiny $p(10)$}}
807: \psfrag{  11x}{\mbox{\tiny $p(11)$}}
808: \psfrag{  12x}{\mbox{\tiny $p(12)$}}
809: \psfrag{  13r}{\mbox{\tiny $w(13)$}}
810: \begin{center}
811: \resizebox{14cm}{!}{\includegraphics{inf_humb3d.eps}}
812: \caption{Formation of a unary-ended infinite code using a Huffman-like
813: code.  (Smaller weights are pictorially lower.)  Weights are merged
814: bottom-up, in a manner consistent with the exponential Huffman
815: algorithm, first in the (truncated) unary subtree, then as in the
816: exponential Huffman algorithm.}
817: \label{buildhumb}
818: \end{center}
819: \end{figure*}
820: 
821: We wish to show that the optimal code can be obtained when there is a
822: nonnegative integer $r$ such that, for all $j>r$ and $i<j$, $$\boldp(i)
823: \geq \max\left(\boldp(j), \sum_{k=j+1}^\infty \boldp(k) a^{k-j}\right).$$
824: The optimal code is obtained by considering the reduced alphabet
825: consisting of symbols $0,1,\ldots,r+1$ with weights
826: \begin{equation}
827: \boldw(i) = \left\{
828: \begin{array}{ll}
829: \boldp(i),& i \leq r \\
830: \sum_{k=r+1}^\infty \boldp(k) a^{k-r},& i = r+1 . \\
831: \end{array}
832: \right.
833: \label{weights}
834: \end{equation}
835: Apply exponential Huffman coding to this reduced set of weights.  For
836: items $0$ through $r$, the Huffman codewords for the reduced and the
837: infinite alphabets are identical.  Each other item $i>r$ has a
838: codeword consisting of the reduced codeword for item $r+1$ (which,
839: without loss of generality, consists of all $1\s$) followed by the
840: unary code for $i-r-1$, that is, $i-r-1$ ones followed by a zero.  We
841: call such codes {\defn unary-ended}.  A pictorial example is shown in
842: Fig.~\ref{buildhumb} for a problem instance for which $r=12$.
843: 
844: \begin{theorem}
845: Let $\boldp(\cdot)$ be a probability measure on the set of nonnegative
846: integers, and let $a$ be the parameter of the penalty to be optimized.
847: If there is a nonnegative integer $r$
848: such that for all $j>r$ and $i<j$,
849: \begin{equation}
850: \boldp(i) \geq \boldp(j)
851: \label{cond1}
852: \end{equation}
853: and
854: \begin{equation}
855: \boldp(i) \geq \sum_{k=j+1}^\infty \boldp(k) a^{k-j}
856: \label{cond2}
857: \end{equation}
858: then there exists a minimum-penalty binary prefix code with every
859: codeword~$j>r$ consisting of $j-x$ $1\s$ followed by one $0$ for some
860: fixed nonnegative integer~$x$.
861: \label{tailthm}
862: \end{theorem}
863: 
864: \begin{proof}
The idea here is similar to that for geometric distributions: we exhibit
a sequence of finite codes that in some sense converges to the
867: optimal code for the infinite alphabet.  In this case we consider the
868: infinite sequence of codes implicit in the above; for a given $m \geq -1$, the
869: corresponding codeword weights are
870: $$
871: \boldw_m(i) = \left\{
872: \begin{array}{ll}
873: \boldp(i),& i < r+m+2 \\
874: \sum_{k=r+m+2}^\infty \boldp(k) a^{k-r-m-1},& i = r+m+2. \\
875: \end{array}
876: \right.
877: $$  It is obvious that an optimal code for
878: each $m$-reduced distribution is identical to the proposed code for
879: the infinite alphabet, except for the item $r+m+2$, which is the
880: code tree sibling of item $r+m+1$.  
881: 
882: For $a<1$, we show, as in the geometric case, that the difference
883: between the penalties for the optimal and the proposed codes
884: approaches~$0$.  In this case, the equivalent of
885: inequality~(\ref{fininf}) is
886: \begin{equation}
887: \begin{array}{rcl}
888: \displaystyle
889: \log_a \sum_{i=0}^\infty \boldp(i) a^{\len^*(i)} &\leq&
890: \displaystyle
891: \log_a \sum_{i=0}^\infty \boldp(i) a^{\len_{\infty}(i)} \\
892: &=&
893: \displaystyle
894: \log_a \sum_{i=0}^{r+m+2} \boldw_m(i) a^{\len_m(i)} \\
895: &\leq&
896: \displaystyle
897: \log_a \sum_{i=0}^{r+m+2} \boldw_m(i) a^{\len^*(i)} 
898: \end{array}
899: \label{fininf2}
900: \end{equation}
where in this case $\len_\infty(i)$ denotes the length of the codeword
for item $i$ in the proposed
code, $\len_m(i) = \len_\infty(i)$ for $i<r+m+2$ and $\len_m(i) =
\len_\infty(i-1)$ for $i=r+m+2$, and, again, $\len^*(\cdot)$ denotes the
lengths of codewords in an optimal code.  The corresponding difference
between the arguments of the logarithms in the first and the last
expressions of (\ref{fininf2}) is
907: \begin{equation}
908: \begin{array}{l}
909: \displaystyle
910: \sum_{i=0}^\infty \boldp(i) a^{\len^*(i)} - 
911: \sum_{i=0}^{r+m+2} \boldw_m(i) a^{\len^*(i)} \\
912: \displaystyle
913: \qquad = 
914: \sum_{i=r+m+2}^\infty 
915: \boldp(i) a^{\len^*(i)} - \boldw_m(r+m+2) 
916: a^{\len^*(r+m+2)}. 
917: \end{array}
918: \label{fininf3}
919: \end{equation}
920: As $m \rightarrow \infty$, both terms in the difference
921: on the second line of (\ref{fininf3}) clearly approach $0$, so the
922: terms in~(\ref{fininf2}) approach equality, showing the proposed code to
923: be optimal.
924: 
925: For $a>1$, the same method will work, but it is not so obvious that
926: the terms in the difference on the second line of (\ref{fininf3})
927: approach~$0$.  Let us first find an upper bound for $\boldw_m(r+m+2)$
928: in terms of $\boldp(r+m+2)$:
929: \begin{eqnarray*}
930: \boldw_m(r+m+2) 
931: &=& a\boldp(r+m+2)+a^2\boldp(r+m+3)+\\
932: && \displaystyle\qquad \sum_{i=r+m+4}^\infty \boldp(i) a^{i-r-m-1} \\
933: &\leq& (a^2+a)\boldp(r+m+2)+a^2\boldp(r+m+3) \\
934: &\leq& (2a^2+a)\boldp(r+m+2)
935: \end{eqnarray*}
936: where the first equality is due to the definition of
937: $\boldw_m(\cdot)$, the first inequality due to (\ref{cond2}), and the
938: second inequality due to (\ref{cond1}).  Thus $\boldw_m(r+m+2)$ has an
939: upper bound of $(2a^2+a)\boldp(r+m+2)$ for all $m \geq -1$.  In
940: addition, since the proposed code has a finite penalty --- identical
941: to that of any reduced code --- the optimal code has a finite penalty,
942: and the sequence of its terms --- each one of which has the form
943: $\boldp(r+m+2) a^{\len^*(r+m+2)}$ --- approaches $0$ as $m$ increases.
944: Thus $\boldw_m(r+m+2) a^{\len^*(r+m+2)}$ approaches $0$ as well.  Due
945: to the optimality of $\len^*(\cdot)$, $\boldw_m(r+m+2)
946: a^{\len^*(r+m+2)}$ serves as an upper bound for $\sum_{i=r+m+2}^\infty
947: \boldp(i) a^{\len^*(i)}$, and thus both terms approach~$0$.  As with
948: $a<1$, then, the terms in~(\ref{fininf2}) approach equality for $m
949: \rightarrow \infty$, showing the proposed code to be optimal.
950: \end{proof}
951: 
952: The rate at which $\boldp(\cdot)$ must decrease in order to satisfy
953: condition~(\ref{cond2}) clearly depends on $a$.  One simple sufficient
954: condition --- provable via induction --- is that it satisfy $\boldp(i)
955: \geq a \boldp(i+1) + a \boldp(i+2)$ for large $i$.  A less general
956: condition is that $\boldp(i)$ eventually decrease at least as quickly
957: as $g^i$ where $g = (\sqrt{1+4/a}-1)/2$, the same ratio needed for a
958: unary geometric code for $\theta=g$, as in~(\ref{ineq}).  The ratio
959: $g$ is plotted in Fig.~\ref{ga}.
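
To see where $g$ comes from (a short sketch, reading ``at least as quickly as $g^i$'' as $\boldp(i+1) \leq g\,\boldp(i)$ for large $i$), note that $g$ is the positive root of $a g^2 + a g = 1$, so that
$$
a\boldp(i+1)+a\boldp(i+2) \leq a(g+g^2)\boldp(i) = \boldp(i),
$$
which is the sufficient condition above.  For example, $g = (\sqrt{5}-1)/2 \approx 0.618$ at $a=1$ and $g=(\sqrt{3}-1)/2 \approx 0.366$ at $a=2$.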
960: 
961: \begin{figure}[t]
962: \psfrag{a}{$a$}
963: \psfrag{ag}{\mbox {\huge $a$}}
964: \psfrag{g}{\mbox {\huge $g$}}
965:      \centering
966:      \resizebox{8cm}{!}{\includegraphics{ga2.eps}}
967:      \caption{Ratio $g$, probability distribution fall-off sufficient
968:      for the optimality of a unary-ended code.  Note that 
969:      $1/g = \Phi \definedas \frac{1}{2}(1+\sqrt{5})$,
970:      the golden ratio, at $a=1$.}
971:      \label{ga}
972: \end{figure}
973: 
974: For $a \rightarrow 1$, these conditions approach those derived in
975: \cite{Humb1}.  The stronger results of \cite{KHN} do not easily extend
976: here due to the nonadditivity of the exponential penalty.  An attempt
977: at such an extension in \cite[pp.~103--105]{Baer} gives no criteria
978: for success, so that, while one could produce certain codewords for
979: certain codes, one might fail in producing other codewords for the
980: same codes or for other codes.  Thus this extension is not truly a
981: workable algorithm.
982: 
983: Consider the example of optimal codes for the Poisson distribution,
984: $$\boldp_\lambda(i)=\frac{\lambda^i e^{-\lambda}}{i!} . $$ How does
985: one find a suitable value for $r$ (as in Section~\ref{other}) in such
986: a case?  It has been shown that $r \geq \lceil e \lambda \rceil - 1$
987: yields $\boldp(i) \geq \boldp(j)$ for all $j>r$ and $i<j$, satisfying
988: the first condition of Theorem~\ref{tailthm} \cite{Humb1}.  Moreover,
989: if, in addition, $j \geq \lceil 2 a \lambda \rceil - 1$ (and thus $j >
990: a \lambda - 1$), then
991: \begin{eqnarray*}
992: \sum_{k=1}^\infty \boldp(j+k)a^k 
993: &=& \frac{e^{-\lambda}\lambda^j}{j!}\left[
994: \frac{a \lambda}{j+1} + \frac{a^2 \lambda^2}{(j+1)(j+2)} + \cdots \right] \\
995: &<& \boldp(j) \left[\frac{a \lambda}{j+1} + \frac{a^2 \lambda^2}{(j+1)^2} + \cdots \right] \\
996: &=& \boldp(j) \frac{\frac{a \lambda}{j+1}}{1-\frac{a \lambda}{j+1}} \\
997: &\leq& \boldp(j) \\
998: &\leq& \boldp(i) .
999: \end{eqnarray*}
1000: Thus, since we consider $j > r$, $r = \max(\lceil 2 a \lambda \rceil -
1001: 2, \lceil e \lambda \rceil - 1)$ is sufficient to establish an $r$
1002: such that the above method yields the optimal infinite-alphabet code.
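
For instance, with the illustrative values $\lambda=3$ and $a=3/2$ (chosen here only to exercise the formula), $2a\lambda = 9$ and $e\lambda \approx 8.15$, so
$$
r = \max(\lceil 9 \rceil - 2, \lceil 8.15\ldots \rceil - 1) = \max(7,8) = 8 .
$$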
1003: 
1004: In order to find the optimal reduced code, use
1005: $$\boldw_{-1}(r+1)=\sum_{k=r+1}^\infty \boldp(k) a^{k-r} = a^{-r}e^{\lambda(a-1)} - \sum_{k=0}^r \boldp(k) a^{k-r} .
1006: $$  For example, consider the Poisson distribution with $\lambda = 1$.  We
1007: code this for both $a=1$ and $a=2$.  For both values, $r = 2$, so both
1008: are easy to code.  For $a=1$, $\boldw_{-1}(3) = 1 - 2.5 e^{-1} \approx
1009: 0.0803 \ldots$, while, for $a=2$, $\boldw_{-1}(3) = 0.25 e - 1.25
1010: e^{-1} \approx 0.2197 \ldots$.  After using the appropriate Huffman
1011: procedure on each reduced source of $4$ weights, we find that the
1012: optimal code for $a=1$ has lengths $\biglen = \{1, 2, 3, 4, 5, 6, \ldots\}$
1013: --- those of the unary code --- while the optimal code for $a=2$ has
1014: lengths $\biglen = \{2, 2, 2, 3, 4, 5, \ldots\}$.
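
As an illustrative check of the $a=2$ case (a sketch assuming the exponential Huffman rule that replaces the two smallest weights $w$ and $w'$ by $a(w+w')$), the reduced weights are $\boldp(0)=\boldp(1)\approx 0.3679$, $\boldp(2)\approx 0.1839$, and $\boldw_{-1}(3)\approx 0.2197$, and the three merges give
$$
\begin{array}{rcl}
2(0.1839+0.2197) &\approx& 0.807, \\
2(0.3679+0.3679) &\approx& 1.472, \\
2(0.807+1.472) &\approx& 4.558 .
\end{array}
$$
All four reduced items thus have length $2$, item $3$ gains one bit from the unary suffix, and the resulting penalty is $\lg 4.558 \approx 2.19$.  For $a=1$ the merges are instead $0.0803+0.1839 \approx 0.2642$, $0.2642+0.3679 \approx 0.6321$, and $0.6321+0.3679 = 1$, giving reduced lengths $\{1,2,3,3\}$ and hence the unary code once the suffix bit is added.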
1015: 
1016: It is worthwhile to note that these techniques are easily extensible
1017: to finding an optimal alphabetic code --- that is, one with $c(i)\s$
1018: arranged in lexicographical order --- for $a>1$.  One needs only to
1019: find the optimal alphabetic code for the reduced code with weights
1020: given in equation~(\ref{weights}), as in \cite{HKT}, with codewords
1021: for $i>r$ consisting of the reduced code's codeword for $r+1$ followed
1022: by $i-r-1$ ones and one zero.  As previously mentioned, Golomb codes
1023: are also alphabetic and thus are optimal alphabetic codes for the
1024: geometric distribution.
1025: 
1026: \section{Application: Buffer Overflow}
1027: \label{application}
1028: 
1029: The application of the exponential penalty in \cite{Humb2} concerns
1030: minimizing the probability of a buffer overflowing.  It requires that
1031: each candidate code for overall optimality be an optimal code
1032: for one of a series of exponential parameters ($a\s$ where $a>1$).  An
1033: iterative approach yields a final output code by noting that, for the
1034: overall utility function, each candidate code is no worse than 
1035: its predecessor, and there are a finite number of possible candidate
1036: codes.  Therefore, eventually a candidate code yields the same value
1037: as the prior candidate code, and this can be shown to be the optimal
1038: code.  This application of exponential Huffman coding can, using the
1039: above techniques, be extended to infinite alphabets.
1040: 
1041: In the application, integers with a known distribution $\bigp$ arrive
1042: with independent intermission times having a known probability density
1043: function.  Encoded bits are sent at a given rate, with bits to be sent
waiting in a buffer of fixed size.  Constant $b$ represents the buffer
size in bits, random variable $T$ represents the source integer
intermission time measured in units of
encoded bit transmission time, and function $A(s)$ is the
Laplace-Stieltjes transform of the distribution of $T$, $\E[e^{-sT}]$.  
1049: 
1050: When the integers are coded using $\biglen = \{\len(i)\}$, the
1051: probability per input integer of buffer overflow is of the order of
1052: $e^{-s^*b}$, where $s^*$ is the largest $s$ such that
1053: $$
1054: f(\biglen,s) \leq 1
1055: $$
1056: where
1057: \begin{equation}
1058: f(\biglen,s) \definedas A(s) \sum_{i=0}^\infty
1059: \boldp(i)e^{s\len(i)} .
1060: \label{buff}
1061: \end{equation}
1062: 
1063: The previously known algorithm to maximize $s^*$ is as follows:
1064: 
1065: {\bf Procedure for Finding Code with Largest $s^*$} \cite{Humb2}
1066: 
1067: \begin{enumerate}
1068: \item Choose any $s_0 \in \Rp$.
1069: \item $j \leftarrow 0$.
1070: \item $j \leftarrow j+1$.
1071: \item Find codeword lengths $\biglen_j$ minimizing $\sum_i \boldp(i) e^{s_{j-1}
1072: \len(i)}$.
1073: \item Compute $s_j \definedas \max\{s \in \R : f(\biglen_j,s) \leq 1\}$.
1074: \item If $s_j \neq s_{j-1}$ then go to step 3; otherwise stop.
1075: \end{enumerate}
1076: 
1077: We can use the above methods in order to accomplish step 4, but we
1078: still need to examine how to modify steps 1 and 5 for an infinite
1079: input alphabet.
1080: 
1081: First note that, unlike in the finite case, $s^*<\infty$, that is, there
1082: always exists an $s^* \in \Rp$ such that, for all $s>s^*$,
$f(\biglen,s) > 1$.  For any stable system, the buffer cannot receive
integers more quickly than it can transmit bits, so the probability
$\P[T \geq 1]$ is positive.  Thus the Laplace-Stieltjes transform
1086: $A(s)$ exceeds $c_1 e^{-s}$ for some constant $c_1>0$.  Also, without
1087: loss of generality, we can assume that $\boldp(i)$ is monotonic
1088: nonincreasing and an optimal $\len(i)$ is monotonic nondecreasing.
1089: This monotonicity means that $\len(i) \geq \lg i$, and there is no
1090: exponential base $a_0$ and offset constant $c_2$ for which
1091: $\sum_{i=0}^\infty \boldp(i) e^{s\len(i)} \leq a_0^{s+c_2}$ for all~$s
1092: \in \Rp$.  Thus the summation in~(\ref{buff}) must increase
superexponentially, and, multiplying the $A(s)$ and summation terms,
there is an $s^* \in \Rp$ such that $f(\biglen,s)>1$ for all $s>s^*$.
1095: 
1096: For step 1, the initial guess proposed in \cite{Humb2} is an upper
1097: bound for all possible values of $s^*$.  The R\'{e}nyi entropy of
1098: $\bigp$ is used to find an initial guess using
1099: \begin{equation}
1100: A(s) \left(\sum_{i=0}^\infty \boldp(i)^{\frac{1}{1+\lg e^s}}\right)^{1+\lg e^s}
1101: \leq A(s) \sum_{i=0}^\infty \boldp(i)e^{s\len(i)},
1102: \label{humbbound}
1103: \end{equation}
1104: and choosing $s_0$ as the largest $s$ such that the left term of
1105: (\ref{humbbound}) is no greater than one.  Thus, $s_0 \geq s^*$ for any
1106: value of $s^*$ corresponding to step 5.
1107: 
This technique is well-suited to a geometric distribution, for which
R\'{e}nyi entropy has the closed form shown in equation (\ref{geoent}), so
1110: $$A(s) \cdot \frac{1-\theta}{\left(1-\theta^{(1+\lg
1111: e^s)^{-1}}\right)^{1+\lg e^s}} \leq f(\biglen,s).$$ However, a general
1112: distribution with a light tail, such as the Poisson distribution,
1113: might have no closed form for this bound.  One solution to this is to
1114: use more relaxed lower bounds on the sum --- such as using a partial
1115: sum with a fixed number of terms --- yielding looser upper bounds
1116: for~$s^*$.  Another approach would be to note that, because of the
1117: light tail, the infinite sum can usually be quickly calculated to the
1118: precision of the architecture used.  Note, however, that no matter
what the technique, the bound must be chosen so that $s_0$ is a
1120: real number and not infinity.  Partial sums may be refined to accomplish
1121: this.
1122: 
1123: In calculating $f(\biglen,s)$ for use in step 5, the geometric
1124: distribution has the closed-form value for $f$ obtainable from
1125: equation (\ref{geosum}), while the other distributions must instead
1126: rely on approximations of~$f$.  As before, this is easily done due to
1127: the light tail of the distribution.  Alternatively, a partial sum and
1128: a geometric approximation can be used to bound $f(\biglen,s)$ and thus
1129: $s^*$, and these two bounds used to find two codes.  If the two codes
1130: are identical, the algorithm may proceed; otherwise, we must roll back
1131: to the summation and improve the bounds until the codes are identical.
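
For the geometric case, the closed form can be sketched by substituting $a=e^s$ into (\ref{geosum}).  Assuming $e^s\theta^\kval<1$, so that the sum converges, a Golomb code with parameter $\kval$ (lengths $\biglen_\kval$) gives
$$
f(\biglen_\kval,s) = A(s)\, e^{sg}
\left(1+\frac{(e^s-1)\theta^z}{1-e^s\theta^\kval}\right),
$$
with $g$ and $z$ as defined after (\ref{geosum}).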
1132: 
1133: These variations make the steps of the algorithm possible, but the
1134: algorithm itself must also be proven correct with the variations.
1135: 
1136: \begin{theorem}
1137: Given a geometric distribution or an input distribution satisfying the
1138: conditions of Theorem~\ref{tailthm} for $a=e^{s_0}$, where $s_0$ is an
upper bound on $s^*$, the above Procedure for Finding Code
1140: with Largest $s^*$ terminates with an optimal code.
1141: \label{qthm}
1142: \end{theorem}
1143: 
1144: \begin{proof}
We show that the number of codes that can be generated in the course of
running the algorithm is bounded, so that the algorithm is guaranteed to
terminate.  Optimality for the algorithm then follows as for the
1148: finite case~\cite{Humb2}.  As in the finite case, $s_{j+1} \geq s_j$
1149: for $j \geq 1$ (but not $j=0$) due to step 5 [$f(\biglen_j,s_j) \leq
1150: 1$], step 4 [$f(\biglen_{j+1},s_j) \leq f(\biglen_j,s_j)$], and the
1151: definition of $s_{j+1}$.  
1152: 
1153: In the case of a geometric distribution,
1154: each $\biglen_j$ is a Golomb code G$\kval_j$ for some positive
1155: integer~$\kval_j$.  Clearly, if we choose $s_0$ as detailed above, it
1156: is the greatest value of $s_j$, being either optimal or unachievable
1157: due to its derivation as a bound of the problem.  Since
$\mbox{G}\kval_i$ (with lengths $\biglen_i$) is the optimal code for
exponential base $a=e^{s_{i-1}}$, (\ref{ineq}) means
1160: that $\theta^{\kval_i} + \theta^{\kval_i+1} \leq e^{-s_{i-1}} <
1161: \theta^{\kval_i-1} + \theta^{\kval_i}$, and thus
1162: $$(1+\theta)\theta^{\kval_1} \leq e^{-s_0} \leq e^{-s_{j-1}} <
1163: (1+\theta)\theta^{\kval_j-1}$$ and, since $\theta < 1$, we have
1164: $\kval_j-1 < \kval_1$ (or, equivalently, $\kval_j \leq \kval_1$) for all
1165: $j \geq 1$.  Therefore, there are only $\kval_1$ possible codes the
1166: algorithm can generate.  
1167: 
1168: In the case of a distribution with a lighter tail, the minimum $r$ of
1169: Theorem~\ref{tailthm} increases with each iteration after the first,
1170: and the first $r_1$ (corresponding to $s_0$) upper bounds the
1171: remaining $r_i$.  Thus all candidate codes can be specified by their
1172: first $r_1$ codeword lengths, none of which is greater than $r_1$.
1173: The number of codes is then bounded for both cases, and the algorithm
1174: terminates with the optimal code.
1175: \end{proof}
1176: 
\section{Redundancy Penalties}
1178: \label{nonexp}
1179: 
1180: It is natural to ask whether the above results can be extended to
1181: other penalties.  One penalty discussed in the literature is that of
1182: maximal pointwise redundancy\cite{DrSz}, which is
1183: $$R^*(\biglen,\bigp) \definedas \sup_{i \in \X} [\len(i)+\lg
1184: \boldp(i)]$$ where we use $\sup$ when we are not assured the existence
1185: of a maximum.  This can be shown to be a limit of the exponential
1186: case, as in \cite{Baer05}, allowing us to analyze its minimization
1187: using the same techniques as exponential Huffman coding.  This limit
1188: can be shown by defining {\defn $d$th exponential redundancy} as
1189: follows:
1190: \begin{eqnarray*}
1191: R_d(\biglen,\bigp) &\definedas&
1192: \frac{1}{d} \lg \sum_{i \in \X} \boldp(i) 2^{d\left(\len(i)+\lg \boldp(i)\right)} \\
1193:  &=& \frac{1}{d} \lg
1194: \sum_{i \in \X} \boldp(i)^{1+d} 2^{d\len(i)}.
1195: \end{eqnarray*}
1196: Thus $R^*(\biglen,\bigp) = \lim_{d \rightarrow \infty}
1197: R_d(\biglen,\bigp)$, and the above methods should apply in the limit.
1198: In particular:
1199: 
1200: \begin{theorem}
1201: The Golomb code G$\kval$ for $\kval = \lceil -1/\lg \theta \rceil$ 
1202: is optimal for minimizing maximal pointwise redundancy for $\bigp_\theta$. 
1203: \label{optmmr}
1204: \end{theorem}
1205: 
1206: \begin{figure*}%[htp]
1207: \psfrag{DABRRR}{$R_d(\biglen_{\theta,a,d}^*,\bigp_\theta)$}
1208: \psfrag{Theta}{$\theta$}
1209: \psfrag{theta}{$\theta$}
1210: \psfrag{THETA}{\mbox{\huge $\theta$}}
1211:      \centering
1212: \begin{picture}(0,0)%
1213: \includegraphics{radrat.eps}%
1214: \end{picture}%
1215: \setlength{\unitlength}{1865sp}%
1216: %
1217: \begingroup\makeatletter\ifx\SetFigFont\undefined%
1218: \gdef\SetFigFont#1#2#3#4#5{%
1219:   \reset@font\fontsize{#1}{#2pt}%
1220:   \fontfamily{#3}\fontseries{#4}\fontshape{#5}%
1221:   \selectfont}%
1222: \fi\endgroup%
1223: \begin{picture}(16332,6837)(-8,-6007)
1224: \put(4051,-5911){\makebox(0,0)[b]{\smash{{\SetFigFont{8}{9.6}{\familydefault}{\mddefault}{\updefault}{(a) $\theta \in (0.5,1)$}%
1225: }}}}
1226: \put(12511,-5911){\makebox(0,0)[b]{\smash{{\SetFigFont{8}{9.6}{\familydefault}{\mddefault}{\updefault}{(b) $\theta \in (2^{-0.1},2^{-0.001})$, with $x$-axis $\propto \lg (- 1/{\lg \theta})$}%
1227: }}}}
1228: \end{picture}%
1229:      \caption{Maximal pointwise redundancy of the optimal maximal
1230:      redundancy code for the geometric distribution, solid
1231:      (with discontinuities represented by dashed); optimal $d$th exponential
     redundancy for the geometric distribution, dotted for
     $d \in \{1,2,4,16,256,65536\}$, from lowest to highest.}
1234:      \label{mmr}
1235: \end{figure*}
1236: 
1237: \begin{proof}
1238: 
1239: {\it Case 1:} Consider first when $-1/\lg \theta$ is not an integer.
1240: We show that $\kval = \lceil -1/\lg \theta\rceil$ is optimal by
1241: finding a $D$ such that, for all $d > D$, the optimal code for the
1242: $d$th exponential redundancy penalty is G$\kval$.  For
1243: a fixed $d$, (\ref{ineq}) implies that such a code should satisfy
1244: \begin{equation}
1245: (\theta^{1+d})^\kval + (\theta^{1+d})^{\kval+1} \leq \frac{1}{2^d} <
1246: (\theta^{1+d})^{\kval-1} + (\theta^{1+d})^\kval,
1247: \label{dineq}
1248: \end{equation}
1249: and thus we wish to show that this holds for all $d > D$.  Consider
1250: $\kvals = \lceil -1/\lg \theta\rceil$.  Clearly, $\kvals >
1251: -1/\lg \theta$, or, equivalently, 
1252: \begin{equation}
1253: \theta^\kvals < \frac{1}{2}.
1254: \label{mmr1}
1255: \end{equation}
Now consider $$D=-1+\frac{1}{1+(\kvals-1)\lg \theta}.$$ Because
$\kvals-1 < -1/\lg \theta$, we have $(\kvals-1)\lg \theta \in (-1,0]$
and therefore $D \geq 0$.  Taken
together with the fact that $\theta \in (0,1)$, (\ref{mmr1}) yields
$\theta^{d\kvals} < 2^{-d}$ and $(1+\theta^{1+d})\theta^\kvals <
2\theta^\kvals < 1$.  Multiplying these two inequalities yields the
left-hand side of (\ref{dineq}) for any $d > D$.  For any such $d$, we
have $(1+d)[1+(\kvals-1)\lg \theta] > 1$, which, after exponentiation,
is the inequality $(2\theta^{\kvals-1})^{1+d} > 2$,
yielding
1264: \begin{eqnarray*}
1265: \left[(\theta^{1+d})^{\kvals-1}+(\theta^{1+d})^{\kvals}\right]2^d 
1266: &=& \frac{1}{2}(2\theta^{\kvals-1})^{1+d} + 
1267: \frac{1}{2}(2\theta^{\kvals})^{1+d} \\
1268: &=& \frac{1}{2}(2\theta^{\kvals})^{1+d}
1269: (\theta^{-1-d}+1) \\
1270: &=& \frac{1}{2}(2\theta^{\kvals-1})^{1+d}
1271: (1+\theta^{1+d}) \\
1272: &>& 1 .
1273: \end{eqnarray*}
1274: This is equivalent to the right-hand side of
1275: inequality~(\ref{dineq}) for the values implied by the definition of
1276: $R_d(\biglen,\bigp)$.  Then G$\kvals$ is an optimal code for
1277: $d > D$, and thus for the limit case of maximal pointwise redundancy.
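
As an illustration (the value $\theta = 0.9$ is chosen here only as an
example), $-1/\lg 0.9 \approx 6.58$, so $\kvals = 7$ and
$$D = -1 + \frac{1}{1+6 \lg 0.9} \approx -1 + \frac{1}{0.0880} \approx
10.4.$$
G$7$ is therefore optimal for $d$th exponential redundancy whenever $d
> D$.  This threshold is only sufficient: a direct check of
(\ref{dineq}) at $d=1$, where $\theta^{1+d} = 0.81$, gives $0.81^7 +
0.81^8 \approx 0.414 \leq 0.5 < 0.511 \approx 0.81^6 + 0.81^7$, so the
condition already holds there.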
1278: 
1279: {\it Case 2:} Now consider when $-1/\lg \theta$ is an integer.  It
1280: should be noted that, for the traditional (linear) penalty, these are
1281: precisely the $\kval$ values that Golomb considered in his original
1282: paper\cite{Golo} and that they are local infima for the minimum
1283: maximal pointwise redundancy function in~$\theta$, as in
1284: Fig.~\ref{mmr}.  Here we show they are local minima.
1285: 
1286: Since $\theta=0.5$ is a dyadic probability distribution and thus
trivial, we can assume that $\theta > 0.5$.  We wish to show that
optimality is preserved at these right endpoints of the intervals of
$\theta$ covered by Case~1.  Note that,
1289: for each $i$ with fixed $\biglen$,
1290: $$\lim_{\theta' \uparrow \theta} \left[\len(i) + \lg
1291: \boldp_{\theta'}(i) \right] = \len(i) + \lg \boldp_{\theta}(i).$$ This
is of particular interest for the value of $i$ maximizing pointwise
redundancy for G$\kval$ at $\theta'$, where $\theta' \in
(\theta^{1/\lg (2\theta)}, \theta)$ is the interval on which Case~1
shows G$\kval$ to be optimal, allowing us to take the limit as
$\theta' \uparrow \theta$.  Let $i^{**} \definedas 2^{\lceil \lg \kval \rceil}-\kval$, the
1296: smallest $i$ which has codeword length exceeding the codeword length
1297: for item~$0$.  Clearly the pointwise redundancy for this value is
1298: greater than that for all items with $i<i^{**}$, since they are one
1299: bit shorter but not more than twice as likely.  Similarly, items in
1300: $(i^{**},\kval)$ have identical length but lower probability, and thus
smaller redundancy.  For items with $i \geq \kval$, note that the
redundancy of items in the sequence $\{j, j+\kval, j+2\kval, \ldots\}$
for any $j$ must be nonincreasing because the difference in redundancy
between consecutive items of the sequence is the constant $1+\kval \lg
\theta'$, which is negative since $\theta' < 2^{-1/\kval}$.  Thus
1305: $i^{**}$ maximizes pointwise redundancy for G$\kval$ at $\theta'$.
1306: 
1307: We know the pointwise redundancy of $i^{**}$ for G$\kval$ 
1308: at $\theta$, although we have yet to show that $i^{**}$ yields the
1309: maximal pointwise redundancy for G$\kval$ at $\theta$ or that G$\kval$ 
1310: minimizes maximal pointwise redundancy.  However, for any code,
1311: including the optimal code, as a result of pointwise continuity,
1312: \begin{eqnarray*}  
1313: \sup_{i \in \X_\infty} [\len(i)+\lg \boldp_\theta(i)] &\geq& \len(i^{**}) + \lg
1314: \boldp_\theta(i^{**}) \\ &=& \lim_{\theta' \uparrow \theta} [\len(i^{**}) +
1315: \lg \boldp_{\theta'}(i^{**})] .
1316: \end{eqnarray*}
1317: From the above discussion, it is clear that the right-hand side is
1318: minimized by the Golomb code with $\kval=-1/\lg \theta$, so, because
the left-hand side achieves the same value with this code, the left-hand
1320: side is indeed minimized by G$\kval$.  Thus this code
1321: minimizes maximal pointwise redundancy for~$\theta$.  The
1322: corresponding maximal pointwise redundancy is
1323: $$ 
1324: \begin{array}{l}
1325: \max_i [\len_\theta^{**}(i)+\lg \boldp_\theta(i)] \\
1326: \begin{array}{rcl}
1327: &=& \len_\theta^{**}(2^{\lceil \lg \kval \rceil}-\kval) +\lg
1328: \boldp_\theta(2^{\lceil \lg \kval \rceil}-\kval) \\ 
1329: &=& \lceil
1330: \lg \kval \rceil + 1 + \lg(1-\theta) + (2^{\lceil \lg \kval
1331: \rceil}-\kval) \lg \theta 
1332: \end{array}
1333: \end{array}
1334: $$
1335: where $\biglen_\theta^{**} = \{\len_\theta^{**}(i)\}$ is defined as
1336: the lengths of a code minimizing maximal pointwise redundancy.  Note
1337: that this is the redundancy for all items $i=2^{\lceil \lg \kval
1338: \rceil}+ j \kval$ with integer $j \geq -1$.
1339: \end{proof}
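
To illustrate the closing formula of the proof with a concrete value
(chosen here only as an example), take $\theta = 2^{-1/3} \approx
0.7937$, so that $\kval = 3$, $\lceil \lg \kval \rceil = 2$, and
$i^{**} = 2^2 - 3 = 1$.  Then
$$\len_\theta^{**}(1) + \lg \boldp_\theta(1) = 2 + 1 + \lg
(1-2^{-1/3}) - \frac{1}{3} \approx 3 - 2.2772 - 0.3333 \approx
0.3895.$$
Items $4, 7, 10, \ldots$ attain the same value, since each step of
$\kval = 3$ items adds one bit of codeword length while adding $3 \lg
\theta = -1$ to $\lg \boldp_\theta(i)$.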
1340: 
1341: It is worthwhile to observe the behavior of maximal pointwise
1342: redundancy in a fixed (not necessarily optimal) Golomb code with
1343: length distribution $\biglen_\kval$.  The maximal pointwise redundancy
$$R^*(\biglen_\kval,\bigp_\theta) = \sup_{i \in \X_\infty}
[\len_\kval(i)+\lg \boldp_\theta(i)]$$ decreases with increasing
$\theta$ (with G$\kval$ being an optimal code for $\theta \in
(2^{-1/(\kval-1)}, 2^{-1/\kval}]$) until $\theta$ exceeds
$2^{-1/\kval}$, after which
there is no maximum, that is, pointwise redundancy is unbounded.  This
1349: explains the discontinuous behavior of minimum maximal redundancy for
1350: an optimal code as a function of $\theta$, illustrated in
1351: Fig.~\ref{mmr}, where each continuous segment corresponds to an
1352: optimal code for $\theta \in (2^{-1/(\kval-1)}, 2^{-1/\kval}]$.
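
The loss of boundedness can be seen directly.  In G$\kval$, item
$i+\kval$ has a codeword one bit longer than that of item $i$, so
$$[\len_\kval(i+\kval)+\lg \boldp_\theta(i+\kval)] -
[\len_\kval(i)+\lg \boldp_\theta(i)] = 1 + \kval \lg \theta,$$
which is positive precisely when $\theta > 2^{-1/\kval}$.  In that
case pointwise redundancy grows without bound along each sequence
$\{j, j+\kval, j+2\kval, \ldots\}$, whereas for $\theta \leq
2^{-1/\kval}$ it is nonincreasing along each such sequence and the
supremum is attained among the first $\kval$ items.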
1353: 
1354: Note also the oscillating behavior as $\theta \uparrow 1$.  We show in
1355: Appendix~\ref{maxred} that $\lim \inf_{\theta \uparrow 1}
1356: R^*(\biglen_\theta^{**},\bigp_\theta) = 1-\lg \lg e$ and $\lim
1357: \sup_{\theta \uparrow 1} R^*(\biglen_\theta^{**},\bigp_\theta) = 2 -
1358: \lg e$, and we characterize this oscillating behavior.  This technique
1359: is extensible to other redundancy scenarios of the kind introduced
1360: in~\cite{Baer05}.
1361: 
1362: For distributions with light tails, one can use a technique much like
1363: the technique of Theorem~\ref{tailthm} in Section~\ref{other}.  First
1364: note that this requires, as a necessary step, the ability to construct
1365: a minimum maximal pointwise redundancy code for finite alphabets.
1366: This can be done either with the method in \cite{DrSz} or any of those
1367: in \cite{Baer05}, the simplest of which uses a variant of the
1368: tree-height problem\cite{Park}, solved via a different extension of
1369: Huffman coding.  Simply put, the weight combining rule, rather than
1370: $\boldw(j) + \boldw(k)$ or $a \cdot (\boldw(j) + \boldw(k))$, is
1371: \begin{equation}
1372: \tilde{\boldw}(j) = 2\cdot\max(\boldw(j),\boldw(k)).
1373: \label{maxrule}
1374: \end{equation}
1375: This rule is used to create an optimal code with lengths
1376: $\biglen^{(r)}$ for $\bigw^{(r)} \definedas \{\boldp(0), \boldp(1),
1377: \ldots, \boldp(r), 2\boldp(r+1)\}$, assuming a unary subtree for items
1378: with index $i\geq r$ (and no other items) is part of an optimal code
1379: tree.  As in the coding method corresponding to Theorem~\ref{tailthm},
1380: the codewords for items $0$ through $r$ of this reduced code are
1381: identical to those of the infinite alphabet.  Each other item $i>r$
1382: has a codeword consisting of the reduced codeword for $r+1$ followed
1383: by the unary code for $i-r-1$, that is, $i-r-1$ ones followed by a
1384: zero.
1385: 
1386: A sufficient condition for using this method is finding an $r$ such that
1387: $$\mbox{for all } i<r,~\boldp(i) \geq \boldp(r)$$
1388: and 
1389: $$\mbox{for all } j \geq r,~\boldp(j) \geq 2 \boldp(j+1).$$ For
1390: such~$j$, pointwise redundancy is nonincreasing along a unary subtree,
1391: as 
1392: \begin{eqnarray*}
1393: \len(j) + \lg \boldp(j) &=& \len(j+1) + \lg (\boldp(j)/2) \\
1394: &\geq& \len(j+1) + \lg \boldp(j+1).
1395: \end{eqnarray*}
The aforementioned coding method works because, for each such $j$, an
optimal subtree consisting of the items with index $i\geq j$
has $\len(i) = \len(j) - j + i$; this subtree is optimal because the
1399: weight of the root node of {\it any} subtree cannot be less than
1400: $2\boldp(j)$.  A formal proof, similar to that of
1401: Theorem~\ref{tailthm}, is omitted in the interest of space.
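
The weight claim can be sketched as follows, under the reading
(assumed here) that repeated application of rule~(\ref{maxrule})
assigns a node the largest value of $2^\ell \boldp(i)$ over the leaves
$i$ lying $\ell$ levels below it.  In the unary subtree for items $i
\geq j$, item $i$ lies $i-j+1$ levels below the subtree root, so the
root weight is
$$\sup_{i \geq j} 2^{i-j+1} \boldp(i) = 2\boldp(j),$$
because $\boldp(i) \geq 2\boldp(i+1)$ makes $2^{i-j+1}\boldp(i)$
nonincreasing in $i$.  Any other subtree containing item $j$ and at
least one further item places item $j$ at least one level below its
root, and so has weight at least $2\boldp(j)$.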
1402: 
1403: For a Poisson random variable, $r = \lceil e \lambda \rceil - 1$
1404: satisfies this condition, since, for $i < r \leq j$, $\boldp(i) \geq
1405: \boldp(r)$ (as in \cite{Humb1}), and
1406: $$\boldp(j) = \frac{j+1}{\lambda} \boldp(j+1) \geq \frac{r+1}{\lambda}
1407: \boldp(j+1) \geq e\boldp(j+1) > 2\boldp(j+1).$$  Thus such a random
1408: variable can be coded in this manner.
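
As a worked illustration (the parameter $\lambda = 1$ is chosen here
only as an example), $r = \lceil e \rceil - 1 = 2$ and
$$\bigw^{(2)} = \{\boldp(0), \boldp(1), \boldp(2), 2\boldp(3)\} =
e^{-1}\left\{1, 1, \frac{1}{2}, \frac{1}{3}\right\}.$$
Rule~(\ref{maxrule}) first merges the two smallest weights into
$2\max(e^{-1}/2, e^{-1}/3) = e^{-1}$, leaving three equal weights;
merging any two of these gives $2e^{-1}$, and the final merge gives a
root weight of $4e^{-1}$.  Breaking the tie by merging items $0$ and
$1$ yields reduced lengths $\biglen^{(2)} = \{2,2,2,2\}$, so that
items $0$, $1$, and $2$ have two-bit codewords and each item $i \geq
3$ has length $2 + (i-3) + 1 = i$.  The resulting maximal pointwise
redundancy is
$$\len(0) + \lg \boldp(0) = 2 - \lg e \approx 0.557,$$
which here equals $\lg$ of the root weight.  No code can do better for
this distribution, since items $0$ and $1$ cannot both have one-bit
codewords in a prefix code for an infinite alphabet.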
1409: 
1410: Note that other sufficient conditions can be obtained through
alternative methods.  One simple rule is that any distribution for
which $\boldp(i) \leq 2^{-i}\boldp(0)$ for all $i > 0$ will
necessarily have $\len(0) + \lg \boldp(0)$ minimized by letting
$\len(0)=1$, and this will be the maximum
redundancy if $\len(i)=i+1$ in general, that is, for the unary code.
For example, a unary tree
optimizes $\bigp = \{0.6, 0.15, 0.15, 0.0375, 0.0375, \ldots\}$, since
$\lg 1.2 \approx 0.263$ is a lower bound on maximal pointwise
redundancy for any code given $\boldp(0)=0.6$, and this bound is achieved
for the unary code.  If viewed as a rule for a unary subtree, this is
1419: looser than the above, since, unlike linear and exponential penalties,
1420: not all subtrees of the subtree need be optimal.  Other relaxations
1421: can be obtained, although, as they are usually not needed, we do not
1422: discuss them here.
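
The example above can be checked term by term, assuming its pattern
continues with each pair of probabilities equal to one quarter of the
previous pair.  With the unary lengths $\len(i) = i+1$ and $\lg 0.6
\approx -0.737$, the pointwise redundancies $\len(i) + \lg \boldp(i)$
of items $0$ through $4$ are
$$1 + \lg 0.6,~~ 2 + \lg 0.15,~~ 3 + \lg 0.15,~~ 4 + \lg 0.0375,~~ 5 +
\lg 0.0375,$$
that is, approximately $0.263$, $-0.737$, $0.263$, $-0.737$, and
$0.263$.  The value $\lg 1.2$ is thus attained at every even-indexed
item and never exceeded.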
1423: 
1424: \section{Conclusion}
1425: 
1426: The aforementioned methods for coding integers are applicable to
1427: geometric and light-tailed distributions with exponential and related
1428: penalties.  Although they are not direct applications of Huffman
1429: coding, per se, these methods are derived from the properties of
1430: generalizations of the Huffman algorithm.  This allows examination of
1431: subtrees of a proposed optimal code independently of the rest of the
1432: code tree, and thus specification of finite codes which in some sense
1433: converge to the optimal integer code.  Different penalties --- e.g.,
1434: $\varphi(x) = x^2$, implying the minimization of $\sqrt{\sum_i
1435: \boldp(i) \len(i)^2}$ --- do not share this independence property, as
1436: an optimal code tree with optimal subtrees need not exist.  Thus
1437: finding an optimal code for such penalties is more difficult.  There
should, however, be cases in which this is possible for convex
$\varphi$ that grow more slowly than some exponential.
1440: 
1441: Another extension of this work would be to find coding algorithms for
1442: other probability mass functions under the nonlinear penalties already
1443: considered, e.g., to attempt to use the techniques of
1444: \cite[pp.~103--105]{Baer} for a more reliable algorithm.  Other
1445: possible extensions and generalizations involve variants of geometric
1446: probability distributions; in addition to the one we mentioned that is
1447: analogous to Proposition~(2) in \cite{Abr1}, there are others in
1448: \cite{MSW, GoMa, BCSV}.  Extending these methods to nonbinary codes
1449: should also be feasible, following the approaches in \cite{Abr1} and
1450: \cite{KHN}.  Finally, as a nonalgorithmic result, it might be
1451: worthwhile to characterize {\it all} optimal codes --- not merely
1452: finding {\it an} optimal code --- as in \cite[p.~289]{Goli2}.
1453: 
1454: \section*{Acknowledgments}
1455: 
1456: The author wishes to thank the anonymous reviewers, David
1457: Morgenthaler, and Andrew Brzezinski for their suggestions in improving
1458: the rigor and clarity of this paper.
1459: 
1460: \appendices
1461: \section{Optimal Maximal Redundancy Golomb Codes for Large~$\theta$}
1462: \label{maxred}
1463: 
1464: Let us calculate optimal maximal redundancy as a function of $\theta
1465: \geq 0.5$:
1466: $$
1467: \begin{array}{rcl}
1468: R^*(\biglen_\theta^{**},\bigp_\theta) 
&=& \max_i [\len_\theta^{**}(i) + \lg \boldp_\theta(i)] \\
1470: &=& 1 + 
1471: \left\lceil \lg \lceil - \frac{1}{\lg \theta} 
1472: \rceil \right\rceil + \\
1473: &&\lg (1-\theta) + \\
1474: &&\left(2^{\left\lceil \lg \lceil - \frac{1}{\lg \theta}
1475: \rceil \right\rceil} - \left\lceil - \frac{1}{\lg \theta} \right\rceil
1476: \right)\lg \theta \\
1477: &=& 1 - \left\lceil -\frac{1}{\lg \theta} \right\rceil \lg \theta + \\
1478: &&\lg \left( - \frac{1-\theta}{\lg \theta} \right) - \\
1479: && 2^{\left\lceil \lg \left(- \frac{1}{\lg \theta}\right)
1480: \right\rceil - \lg (- \frac{1}{\lg \theta})} + \\
1481: &&\left\lceil \lg \left(- \frac{1}{\lg \theta}\right)
1482: \right\rceil - \lg \left(- \frac{1}{\lg \theta}\right) \\
1483: &=& 2 + \lg \left( - \frac{1-\theta}{\lg \theta} \right) - 
1484: \left\lceil -\frac{1}{\lg \theta} \right\rceil \lg \theta - \\
1485: && 2^{1-\langle \lg (- \frac{1}{\lg \theta})
1486: \rangle} - \left\langle \lg \left(- \frac{1}{\lg \theta}\right)
1487: \right\rangle,
1488: \end{array}
1489: $$
1490: where $\langle x \rangle$ denotes the fractional
1491: part of $x$, i.e., $\langle x \rangle \definedas x - \lfloor x \rfloor$, since
1492: $$\left\lceil \lg \lceil - \frac{1}{\lg \theta} \rceil \right\rceil =
1493: \left\lceil \lg \left( - \frac{1}{\lg \theta} \right) \right\rceil$$
1494: for $\theta > 0.25$ (and thus for $\theta \geq 0.5$).  Using the
1495: Taylor series expansion about $\theta = 1$, we find
1496: $$ 
1497: \lg \left( - \frac{1-\theta}{\lg \theta} \right) = - \lg \lg e -
1498: (\lg \sqrt{e})(1-\theta)+\order((1-\theta)^2)
1499: $$
1500: where $e$ is the base of the natural logarithm.
1501: Additionally,
1502: $$-\left\lceil-\frac{1}{\lg \theta} \right\rceil \lg \theta = 1 +
1503: \order(1-\theta).$$ Note that this actually oscillates between $1$ and
1504: $1+(1-\theta)\lg e$ in the limit, so this first-order asymptotic term
1505: cannot be improved upon.  However, the remaining terms
1506: \begin{equation}
1507: 2-2^{1-\langle \lg (- \frac{1}{\lg \theta})
1508: \rangle} - \left\langle \lg \left(- \frac{1}{\lg \theta}\right)\right\rangle 
1509: \label{osc}
1510: \end{equation}
1511: oscillate in the zero-order term.  Assigning $x = \langle \lg (-
1512: 1/\lg \theta)\rangle$, we find that (\ref{osc}) achieves
1513: its minimum value, $0$, at $0$ and $1$.  The maximum point is easily
1514: found via a first derivative test.  This point is achieved at $x=1-\lg
1515: \lg e$, at which point (\ref{osc}) achieves the maximum value $1-\lg
1516: e+\lg \lg e$.  
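Explicitly, writing $f(x) \definedas 2 - 2^{1-x} - x$ (a name
introduced here only for this calculation),
$$f'(x) = 2^{1-x} \ln 2 - 1 = 0 \iff 2^{1-x} = \lg e \iff x = 1 - \lg
\lg e \approx 0.4712,$$
and $f''(x) = -2^{1-x} (\ln 2)^2 < 0$, so this stationary point is the
maximum, with $f(1-\lg \lg e) = 2 - \lg e - (1 - \lg \lg e) \approx
0.0861$.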
1517: Thus, gathering all terms, 
1518: $$\lim \inf_{\theta \uparrow 1} R^*(\biglen_\theta^{**},\bigp_\theta) = 1-\lg
1519: \lg e = 0.4712336270 \ldots,$$ $$\lim \sup_{\theta \uparrow 1}
1520: R^*(\biglen_\theta^{**},\bigp_\theta) = 2 - \lg e = 0.5573049591 \ldots,$$ 
1521: and, overall,
1522: \begin{eqnarray*}
1523: R^*(\biglen_\theta^{**},\bigp_\theta)
1524: &=& 3 - \lg \lg e - \\
1525: && 2^{1-\langle \lg (- \frac{1}{\lg \theta})
1526: \rangle} - \left\langle \lg \left(- \frac{1}{\lg \theta}\right)
1527: \right\rangle+ \\
1528: && \order(1-\theta).
1529: \end{eqnarray*}
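As a consistency check on the lower envelope (a sketch using the
subsequence $\theta = 2^{-1/2^m}$ for integer $m$, chosen here only
for illustration), we have $\kval = 2^m$ and $i^{**} = 0$, so the
maximal pointwise redundancy is exactly $1 + m + \lg
(1-2^{-2^{-m}})$.  Since $1 - 2^{-2^{-m}} = 2^{-m} (\ln 2) (1 +
\order(2^{-m}))$, this equals
$$1 + \lg \ln 2 + \order(2^{-m}) = 1 - \lg \lg e + \order(2^{-m}),$$
in agreement with the $\lim \inf$ above and with (\ref{osc}) vanishing
when the fractional part is zero.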
1530: This oscillating behavior is similar to that of the average redundancy
1531: of a complete tree, as in \cite{Gall} and \cite[p.~192]{Knu3}.
1532: Contrast this with the periodicity of the minimum {\it average}
redundancy for a Golomb code\cite{Szpa}:
1534: \begin{eqnarray*}
1535: \bar{R}(\biglen_{\theta,1}^*,\bigp_\theta) 
1536: &=& 1 - \lg \lg e - \lg e + \\
1537: && 
1538: 2^{2-2^{1-\langle \lg (-\frac{1}{\lg \theta}) \rangle}} 
1539: - \left\langle \lg \left(-\frac{1}{\lg
1540:   \theta}\right) \right\rangle + \\
1541: &&\order(1-\theta)
1542: \end{eqnarray*}
1543: where $\biglen_{\theta,1}^*$ is the optimal code for the traditional
1544: (linear) penalty.
1545: 
1546: \section{Glossary of Terms}
1547: \label{glossary}
1548: 
1549: \tablefirsthead{\hline \multicolumn{1}{c}{Notation}
1550:                      & \multicolumn{1}{l}{~Meaning} \\ \hline }
1551: \tablehead{\hline \multicolumn{2}{l}{\small \sl ~~continued}\\
1552:            \hline \multicolumn{1}{c}{Notation}
1553:                      & \multicolumn{1}{l}{~Meaning} \\ \hline }
1554: \tabletail{}
1555: \tablelasttail{}
1556: \begin{supertabular}{l|l}
1557: $a$ & Base of exponential penalty \\
1558: $b(x,k)$ & $(x+1)$th codeword of complete binary code \\
1559: & with $k$ items (i.e., the order-preserving \\
1560: & [alphabetic] code having the first $2^{\lceil \lg k \rceil}-k$ \\
1561: & items with length $\lfloor \lg k \rfloor$ and the last \\
1562: & $2k - 2^{\lceil \lg k \rceil}$ items with length $\lceil \lg k \rceil$) \\
1563: $c(i)$ & Codeword (for symbol) $i$ \\
1564: $C$ & Code $\{c(i)\}$ \\
1565: $e$ & Base of the natural logarithm ($e \approx 2.71828$) \\
1566: G$\kval$ & Golomb code with parameter $\kval$, one of the \\
1567: &form $\{1^{\lfloor j/\kval \rfloor} 0 b(j \bmod \kval, \kval) : j \geq 0\}$ \\
1568: $H_{\alpha}(\bigp)$ & R\'{e}nyi entropy $(1-\alpha)^{-1} \lg \sum_{i \in \X} 
1569: \boldp(i)^{\alpha}$ \\
1570: & (or, if $\alpha \in \{0,1,\infty\}$, the limit of this) \\
1571: $i^{**}$ & Index of the codeword that, among a \\
1572: & given code's inputs $i \in \X$, maximizes \\
1573: & pointwise redundancy, $\len(i)+\lg \boldp(i)$ \\
1574: $j \bmod k$ & $j-k \lfloor j/k \rfloor$ \\
1575: $\CampCost_a (\bigp,\biglen)$ & Penalty $\log_a \sum_{i\in \X} \boldp(i) a^{\len(i)}$ \\
1576: $\len(i)$ & Length of codeword (for symbol) $i$ \\
1577: $\biglen$ & $\{\len(i)\}$, the lengths for a given code \\
1578: $\len^{(r)}(i)$ & Length of codeword $i$ of an optimal code \\
1579: & minimizing maximum redundancy for $\bigw^{(r)}$ \\ 
1580: $\biglen^{(r)}$ & $\{\len^{(r)}(i)\}$, the lengths of an optimal code \\
1581: & minimizing maximum redundancy for $\bigw^{(r)}$  \\
1582: $\len^*(i)$ & Length of codeword $i$ of an optimal code \\
1583: & for an exponential penalty, $\CampCost$ \\
1584: $~~(\len_{\theta,a}^*(i))$ & ~~(...if $\theta$ and $a$ are specified) \\
1585: $\biglen^*$ & $\{\len^*(i)\}$, the lengths of an optimal code \\
1586: $~~(\biglen_{\theta,a}^*)$ & ~~(...if $\theta$ and $a$ are specified) \\
1587: $\len_{\theta,a,d}^*(i)$ & Length of codeword $i$ of an optimal code \\
1588: & minimizing $d$th exponential redundancy \\
1589: $\biglen_{\theta,a,d}^*$ & $\{\len_{\theta,a,d}^*(i)\}$, the lengths of an optimal code\\
1590: & minimizing $d$th exponential redundancy \\
1591: $\len^{**}(i)$ & Length of codeword $i$ of an optimal code \\
1592: & minimizing maximum redundancy \\
1593: $\biglen^{**}$ & $\{\len^{**}(i)\}$, the lengths of an optimal code \\
1594: & minimizing maximum redundancy \\
$\order(\cdot)$ & Asymptotic order of $\cdot$ (big-$O$ notation) \\
1596: $\boldp(i)$ & Probability of input symbol $i$ \\
1597: $~~(\boldp_\theta(i))$ & ~~(...for geometric dist\textsuperscript{r} with parameter $\theta$) \\
1598: $~~(\boldp_\lambda(i))$ & ~~(...for Poisson dist\textsuperscript{r} with parameter~$\lambda$) \\
1599: $\bigp$ & $\{\boldp(i)\}$, the input probability mass function \\
1600: $~~(\bigp_\theta)$ & ~~(...for geometric dist\textsuperscript{r} with parameter $\theta$) \\
1601: $~~(\bigp_\lambda)$ & ~~(...for Poisson dist\textsuperscript{r} with parameter~$\lambda$) \\
1602: $\bar{R}_a(\biglen, \bigp)$&$\CampCost_a (\bigp,\biglen) - H_{\alpha(a)}(\bigp)$, the average \\
1603: &pointwise redundancy \\
1604: $R_d(\biglen, \bigp)$&$d^{-1} \lg \sum_{i \in \X} \boldp(i) 2^{d\left(\len(i)+\lg \boldp(i)\right)}$, \\
1605: &the $d$th exponential redundancy \\
$R^*(\biglen, \bigp)$&$\sup_{i \in \X} [\len(i)+\lg \boldp(i)]$, the maximal \\
& pointwise redundancy \\
1608: $\R$ & The set of real numbers \\
1609: $\Rp$ & The set of positive real numbers \\
1610: $s_0$ & Upper bound on $s^*$ \\
1611: $s^*$ & $\ln a$ for $a$ corresponding to optimal coding \\
1612: & for buffer overflow \\
1613: $\boldw(i)$ & Weight (for symbol) $i$ \\
1614: $\bigw$ & $\{\boldw(i)\}$, the set of weights \\
1615: $\boldw^{(r)}(i)$ & $\boldp(i)$ for $i \leq r$, $2\boldp(r+1)$ for $i=r+1$ \\
1616: $\bigw^{(r)}$ & $\{\boldp(0), \boldp(1), \ldots, \boldp(r), 2\boldp(r+1)\}$ \\
1617: $\X$ & Input alphabet (usually $\X_\infty = \{0, 1, \ldots \}$) \\
1618: $\alpha(a)$ & $1/(1+\lg a)$ (parameter for R\'{e}nyi entropy) \\
1619: $\theta$ & Geometric dist\textsuperscript{r} parameter ($\boldp_\theta(i) = (1-\theta)\theta^i$) \\
1620: $\lambda$ & Poisson dist\textsuperscript{r} parameter ($\boldp_\lambda(i)=\lambda^i e^{-\lambda}/i!$) \\
1621: $\Phi$ & Golden ratio, $\frac{1}{2}(1+\sqrt{5})$ \\
1622: \end{supertabular}
1623: 
1624: \ifx \cyr \undefined \let \cyr = \relax \fi
1625: \begin{thebibliography}{10}
1626: \providecommand{\url}[1]{#1}
1627: \csname url@rmstyle\endcsname
1628: \providecommand{\newblock}{\relax}
1629: \providecommand{\bibinfo}[2]{#2}
1630: \providecommand\BIBentrySTDinterwordspacing{\spaceskip=0pt\relax}
1631: \providecommand\BIBentryALTinterwordstretchfactor{4}
1632: \providecommand\BIBentryALTinterwordspacing{\spaceskip=\fontdimen2\font plus
1633: \BIBentryALTinterwordstretchfactor\fontdimen3\font minus
1634:   \fontdimen4\font\relax}
1635: \providecommand\BIBforeignlanguage[2]{{%
1636: \expandafter\ifx\csname l@#1\endcsname\relax
1637: \typeout{** WARNING: IEEEtran.bst: No hyphenation pattern has been}%
1638: \typeout{** loaded for the language `#1'. Using the pattern for}%
1639: \typeout{** the default language instead.}%
1640: \else
1641: \language=\csname l@#1\endcsname
1642: \fi
1643: #2}}
1644: 
1645: \bibitem{Huff}
1646: D.~A. Huffman, ``A method for the construction of minimum-redundancy codes,''
1647:   \emph{Proc. IRE}, vol.~40, no.~9, pp. 1098--1101, Sept. 1952.
1648: 
1649: \bibitem{YaQi}
1650: S.~Yang and P.~Qiu, ``Efficient integer coding for arbitrary probability
1651:   distributions,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-52, no.~8, pp.
1652:   3764--3772, Aug. 2006.
1653: 
1654: \bibitem{Golo}
1655: S.~W. Golomb, ``Run-length encodings,'' \emph{IEEE Trans. Inf. Theory}, vol.
1656:   IT-12, no.~3, pp. 399--401, July 1966.
1657: 
1658: \bibitem{GaVV}
1659: R.~G. Gallager and D.~C. {van Voorhis}, ``Optimal source codes for
1660:   geometrically distributed integer alphabets,'' \emph{IEEE Trans. Inf.
1661:   Theory}, vol. IT-21, no.~2, pp. 228--230, Mar. 1975.
1662: 
1663: \bibitem{Abr01}
1664: J.~Abrahams, ``Code and parse trees for lossless source encoding,''
1665:   \emph{Communications in Information and Systems}, vol.~1, no.~2, pp.
1666:   113--146, Apr. 2001.
1667: 
1668: \bibitem{WSBL}
1669: T.~Wiegand, G.~J. Sullivan, G.~Bj{\o}ntegaard, and A.~Luthra, ``Overview of the
1670:   {H.264/AVC} video coding standard,'' \emph{IEEE Trans. Circuits and Systems
1671:   for Video Technology}, vol.~13, no.~7, pp. 560--576, July 2003.
1672: 
1673: \bibitem{WSS}
1674: M.~Weinberger, G.~Seroussi, and G.~Sapiro, ``The {LOCO-I} lossless image
1675:   compression algorithm: Principles and standardization into {JPEG-LS},''
1676:   \emph{IEEE Trans. Image Processing}, vol.~9, no.~8, pp. 1309--1324, Aug.
1677:   2000, originally as Hewlett-Packard Laboratories Technical Report No.
1678:   HPL-98-193R1, November 1998, revised October 1999. Available from
1679:   \url{http://www.hpl.hp.com/loco/}.
1680: 
1681: \bibitem{Camp}
1682: L.~L. Campbell, ``Definition of entropy by means of a coding problem,''
1683:   \emph{Z. Wahrscheinlichkeitstheorie und verwandte Gebiete}, vol.~6, pp.
1684:   113--118, 1966.
1685: 
1686: \bibitem{AcDa}
1687: J.~Acz{\'{e}}l and Z.~Dar{\'{o}}czy, \emph{On Measures of Information and Their
1688:   Characterizations}.\hskip 1em plus 0.5em minus 0.4em\relax New York, NY:
1689:   Academic, 1975.
1690: 
1691: \bibitem{Jeli}
1692: F.~Jelinek, ``Buffer overflow in variable length coding of fixed rate
1693:   sources,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-14, no.~3, pp. 490--501,
1694:   May 1968.
1695: 
1696: \bibitem{Humb2}
1697: P.~A. Humblet, ``Generalization of {Huffman} coding to minimize the probability
1698:   of buffer overflow,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-27, no.~2, pp.
1699:   230--232, Mar. 1981.
1700: 
1701: \bibitem{BlMc}
1702: A.~C. Blumer and R.~J. McEliece, ``The {R\'{e}nyi} redundancy of generalized
1703:   {Huffman} codes,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-34, no.~5, pp.
1704:   1242--1249, Sept. 1988.
1705: 
1706: \bibitem{Reny}
1707: A.~R{\'{e}}nyi, \emph{A Diary on Information Theory}.\hskip 1em plus 0.5em
1708:   minus 0.4em\relax New York, NY: John Wiley {\&} Sons Inc., 1987, original
1709:   publication: {\it Napl\`{o} az inform\'{a}ci\'{o}elm\'{e}letr\H{o}l},
1710:   Gondolat, Budapest, Hungary, 1976.
1711: 
1712: \bibitem{MSN}
1713: P.~Mendenhall. (2002, Oct. 26) Cell phones were rebels' downfall. MSNBC News.
1714: 
1715: \bibitem{Tar}
1716: J.~Taranto. (2002, Oct. 28) {Best of the Web Today}. OpinionJournal, from {The
1717:   Wall Street Journal} Editorial Page. Available from
1718:   \url{http://www.opinionjournal.com/best/?id=110002538}.
1719: 
1720: \bibitem{Baer06}
1721: M.~B. Baer, ``Source coding for quasiarithmetic penalties,'' \emph{IEEE Trans.
1722:   Inf. Theory}, vol. IT-52, no.~10, pp. 4380--4393, Oct. 2006.
1723: 
1724: \bibitem{LTZ}
1725: T.~Linder, V.~Tarokh, and K.~Zeger, ``Existence of optimal prefix codes for
1726:   infinite source alphabets,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-43,
1727:   no.~6, pp. 2026--2028, Nov. 1997.
1728: 
1729: \bibitem{HKT}
1730: T.~C. Hu, D.~J. Kleitman, and J.~K. Tamaki, ``Binary trees optimum under
1731:   various criteria,'' \emph{SIAM J. Appl. Math.}, vol.~37, no.~2, pp. 246--256,
1732:   Apr. 1979.
1733: 
1734: \bibitem{Park}
1735: D.~S. Parker, Jr., ``Conditions for optimality of the {Huffman} algorithm,''
1736:   \emph{SIAM J. Comput.}, vol.~9, no.~3, pp. 470--489, Aug. 1980.
1737: 
1738: \bibitem{Humb0}
1739: P.~A. Humblet, ``Source coding for communication concentrators,'' Ph.D.
1740:   dissertation, Massachusetts Institute of Technology, 1978.
1741: 
1742: \bibitem{Golu}
1743: M.~C. Golumbic, ``Combinatorial merging,'' \emph{IEEE Trans. Comput.}, vol.
1744:   C-25, no.~11, pp. 1164--1167, Nov. 1976.
1745: 
1746: \bibitem{Leeu}
1747: J.~{van Leeuwen}, ``On the construction of {Huffman} trees,'' in \emph{Proc.
1748:   3rd Int. Colloquium on Automata, Languages, and Programming}, July 1976, pp.
1749:   382--410.
1750: 
1751: \bibitem{Baer05}
1752: M.~B. Baer, ``A general framework for codes involving redundancy
1753:   minimization,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-52, no.~1, pp.
1754:   344--349, Jan. 2006.
1755: 
1756: \bibitem{Shan}
1757: C.~E. Shannon, ``A mathematical theory of communication,'' \emph{Bell Syst.
1758:   Tech. J.}, vol.~27, pp. 379--423, July 1948.
1759: 
1760: \bibitem{Ren2}
1761: A.~R{\'{e}}nyi, ``On measures of entropy and information,'' in \emph{Proc. 4th
1762:   Berkeley Symposium on Mathematical Statistics and Probability}, vol.~1, 1961,
1763:   pp. 547--561.
1764: 
1765: \bibitem{Goli2}
1766: M.~J. Golin, ``A combinatorial approach to {Golomb} forests,''
1767:   \emph{Theoretical Computer Science}, vol. 263, no. 1--2, pp. 283--304, July
1768:   2001.
1769: 
1770: \bibitem{Abr1}
1771: J.~Abrahams, ``Huffman-type codes for infinite source distributions,''
1772:   \emph{Journal of the Franklin Institute}, vol. 331B, no.~3, pp. 265--271, May
1773:   1994.
1774: 
1775: \bibitem{MSW}
1776: N.~Merhav, G.~Seroussi, and M.~Weinberger, ``Optimal prefix codes for sources
1777:   with two-sided geometric distributions,'' \emph{IEEE Trans. Inf. Theory},
1778:   vol. IT-46, no.~2, pp. 121--135, Mar. 2000.
1779: 
1780: \bibitem{GoMa}
1781: M.~J. Golin and K.~K. Ma, ``Algorithms for constructing infinite {Huffman}
1782:   codes,'' Hong Kong University of Science {\&} Technology Theoretical Computer
1783:   Science Center, Tech. Rep. HKUST-TCSC-2004-07, Aug. 2004, available from
1784:   \url{http://www.cs.ust.hk/tcsc/RR/index_7.html}.
1785: 
1786: \bibitem{BCSV}
1787: F.~Bassino, J.~Cl\'{e}ment, G.~Seroussi, and A.~Viola, ``Optimal prefix codes
1788:   for two-dimensional geometric distributions,'' in \emph{Proc., IEEE Data
1789:   Compression Conf.}, Mar. 28--30, 2006, pp. 113--122.
1790: 
1791: \bibitem{Humb1}
1792: P.~A. Humblet, ``Optimal source coding for a class of integer alphabets,''
1793:   \emph{IEEE Trans. Inf. Theory}, vol. IT-24, no.~1, pp. 110--112, Jan. 1978.
1794: 
1795: \bibitem{KHN}
1796: A.~Kato, T.~S. Han, and H.~Nagaoka, ``{Huffman} coding with an infinite
1797:   alphabet,'' \emph{IEEE Trans. Inf. Theory}, vol. IT-42, no.~3, pp. 977--984,
1798:   May 1996.
1799: 
1800: \bibitem{Baer}
1801: M.~B. Baer, ``Coding for general penalties,'' Ph.D. dissertation, Stanford
1802:   University, 2003.
1803: 
1804: \bibitem{DrSz}
1805: M.~Drmota and W.~Szpankowski, ``Precise minimax redundancy and regret,''
1806:   \emph{IEEE Trans. Inf. Theory}, vol. IT-50, no.~11, pp. 2686--2707, Nov.
1807:   2004.
1808: 
1809: \bibitem{Gall}
1810: R.~G. Gallager, ``Variations on a theme by {Huffman},'' \emph{IEEE Trans. Inf.
1811:   Theory}, vol. IT-24, no.~6, pp. 668--674, Nov. 1978.
1812: 
1813: \bibitem{Knu3}
1814: D.~E. Knuth, \emph{The Art of Computer Programming, Vol. 3: Sorting and
1815:   Searching}, 2nd~ed.\hskip 1em plus 0.5em minus 0.4em\relax Reading, MA:
1816:   Addison-Wesley, 1998.
1817: 
1818: \bibitem{Szpa}
1819: W.~Szpankowski, ``Asymptotic redundancy of {Huffman} (and other) block codes,''
1820:   \emph{IEEE Trans. Inf. Theory}, vol. IT-46, no.~7, pp. 2434--2443, Nov. 2000.
1821: 
1822: \end{thebibliography}
1823: 
1824: \end{document}
1825: