cs0502032/full.tex
1: \documentclass[11pt,letterpaper]{article}
2: \usepackage{fullpage, comment}
3: \usepackage{amsmath, amsthm}
4: 
5: \newtheorem{theorem}{Theorem}
6: \newtheorem{lemma}[theorem]{Lemma}
7: 
8: % Avoid line breaks before citations (\cite) and references (\ref)
9: \let\latexcite=\cite
10: \def\cite{\nolinebreak\latexcite}
11: \let\latexref=\ref
12: \def\ref{\nolinebreak\latexref}
13: 
14: 
15: \let\epsilon=\varepsilon
16: 
17: \newenvironment{description*}%
18:   {\vspace{-2ex}
19:    \begin{description}%
20:     \setlength{\itemsep}{-1ex}%
21:     \setlength{\parsep}{0pt}}%
22:   {\end{description}}
23: 
24: \newenvironment{itemize*}%
25:   {\vspace{-2ex}
26:    \begin{itemize}%
27:     \setlength{\itemsep}{-1ex}%
28:     \setlength{\parsep}{0pt}}%
29:   {\end{itemize}
30:    \vspace{-1ex}}
31: 
32: \newcommand{\func}[1] {\texttt{#1}}
33: 
34: \begin{document}
35: 
36: \title{On Dynamic Range Reporting in One Dimension}
37: 
38: \author{
39:         Christian Worm Mortensen%
40: \footnote{Part of this work was done while
41:   the author was visiting the Max-Planck-Institut f\"ur Informatik,
42:   Saarbr\"ucken, as a Marie Curie doctoral fellow.} \\
43: \small  IT U. Copenhagen \\
44:         \texttt{cworm@itu.dk}
45: \and 
46:         Rasmus Pagh \\
47: \small  IT U. Copenhagen \\
48:         \texttt{pagh@itu.dk}
49: \and
50:         Mihai P\v{a}tra\c{s}cu \\
51: \small  MIT \\
52:         \texttt{mip@mit.edu}
53: }
54: 
55: \maketitle
56: 
57: 
58: \begin{abstract}
59:   We consider the problem of maintaining a dynamic set of integers and
60:   answering queries of the form: report a point (equivalently, all
61:   points) in a given interval. Range searching is a natural and
62:   fundamental variant of integer search, and can be solved using
63:   predecessor search. However, for a RAM with $w$-bit words, we show
64:   how to perform updates in $O(\lg w)$ time and answer queries in
65:   $O(\lg\lg w)$ time. The update time is identical to the van Emde
66:   Boas structure, but the query time is exponentially faster. Existing
67:   lower bounds show that achieving our query time for predecessor
68:   search requires doubly-exponentially slower updates. We present some
69:   arguments supporting the conjecture that our solution is optimal.
70: 
71:   Our solution is based on a new and interesting recursion idea which
72:   is ``more extreme'' than the van Emde Boas recursion. Whereas van
73:   Emde Boas uses a simple recursion (repeated halving) on each path in
74:   a trie, we use a nontrivial, van Emde Boas-like recursion on every
75:   such path. Despite this, our algorithm is quite clean when seen from
76:   the right angle. To achieve linear space for our data structure, we
77:   solve a problem which is of independent interest. We develop the
78:   first scheme for dynamic perfect hashing requiring sublinear
79:   space. This gives a dynamic Bloomier filter (an approximate storage
80:   scheme for sparse vectors) which uses low space. We strengthen
81:   previous lower bounds to show that these results are optimal.
82: \end{abstract}
83: 
84: 
85: \section{Introduction}
86: 
87: Our problem is to maintain a set $S$ under insertions and deletions of
88: values, and a range reporting query. The query $\func{findany}(a,b)$
89: should return an arbitrary value in $S \cap [a,b]$, or report that $S
90: \cap [a,b] = \emptyset$. This is a form of existential range query.
91: In fact, since we only consider update times above the predecessor
92: bound, updates can maintain a linked list of the values in $S$ in
93: increasing order. Given a value $x \in S \cap [a,b]$, one can traverse
94: this list in both directions starting from $x$ and list all values in
95: the interval $[a,b]$ in constant time per value.  Thus, the
96: $\func{findany}$ query is equivalent to one-dimensional range
97: reporting.
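
For instance (the values here are chosen purely for illustration), suppose $S = \{3, 7, 12, 20\}$ and the query interval is $[5,15]$. A call to $\func{findany}(5,15)$ may return either $7$ or $12$; say it returns $12$. Walking left in the sorted list reaches $7$ and then stops at $3 < 5$, while walking right stops immediately at $20 > 15$. The reported set is $\{7, 12\}$, at a cost of one list step per reported value plus one terminating step in each direction.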
98: 
99: The model in which we study this problem is the word RAM. We assume
100: the elements of $S$ are integers that fit in a word, and let $w$ be
101: the number of bits in a word (thus, the ``universe size'' is $u =
102: 2^w$). We let $n = |S|$. Our data structure will use Las Vegas
103: randomization (through hashing), and the bounds stated will hold with
104: high probability in $n$.
105: 
106: Range reporting is a very natural problem, and its higher-dimensional
107: versions have been studied for decades. In one dimension, the problem
108: is easily solved using predecessor search. The predecessor problem has
109: also been studied intensively, and the known bounds are now tight in
110: almost all cases \cite{beame02predecessor}. Another well-studied
111: problem related to ours is the lookup problem (usually solved by
112: hashing), which asks to find a key in a set of values. Our problem is
113: more general than the lookup problem, and less general than the
114: predecessor problem. While these two problems are often dubbed ``the
115: integer search problems'', we feel range reporting is an equally
116: natural and fundamental incarnation of this idea, and deserves similar
117: attention.
118: 
119: The first to ask whether or not range reporting is as hard as finding
120: predecessors were Miltersen et al.~in STOC'95
121: \cite{miltersen99asymmetric}. For the static case, they gave a data
122: structure with space $O(nw)$ and constant query time, which cannot be
123: achieved for the predecessor problem with polynomial space. An even
124: more surprising result from STOC'01 is due to Alstrup, Brodal and
125: Rauhe \cite{alstrup01range}, who gave an optimal solution for the
126: static case, achieving linear space and constant query time. In the
127: dynamic case, however, no solution better than the predecessor problem
128: was known. For this problem, the fastest known solution in terms of
129: $w$ is the classic van Emde Boas structure \cite{veb77predecessor},
130: which achieves $O(\lg w)$ time per operation.
131: 
132: For the range reporting problem, we show how to perform updates in
133: $O(\lg w)$ time, while supporting queries in $O(\lg\lg w)$ time. The
134: space usage is optimal, i.e.~$O(n)$ words. The update time is
135: identical to the one given by the van Emde Boas structure, but the
136: query time is exponentially faster. In contrast, Beame and Fich
137: \cite[Theorem 3.7]{beame02predecessor} show that achieving any query
138: time that is $o(\lg w / \lg\lg w)$ for the predecessor problem
139: requires update time $\Omega(2^{w^{1 - \epsilon}})$, which is
140: doubly-exponentially slower than our update time. We also give an
141: interesting tradeoff between update and query times; see Theorem
142: \ref{thm:range} below.
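
To make the two comparisons concrete, the following calculation (a sketch that ignores constant factors) spells out what ``exponentially'' and ``doubly-exponentially'' mean here:
\begin{eqnarray*}
  \textrm{query} &:& \lg w \ \textrm{(van Emde Boas)} \quad \textrm{versus} \quad \lg\lg w = \lg(\lg w) \ \textrm{(this paper)} \\
  \textrm{update} &:& \lg\lg \left( 2^{w^{1-\epsilon}} \right) = (1-\epsilon) \lg w = \Theta(\lg w)
\end{eqnarray*}
That is, our query time is the logarithm of the van Emde Boas query time, and our update time is, up to constants, the iterated logarithm $\lg\lg$ of the update time that the Beame and Fich bound forces on predecessor search with query time $o(\lg w / \lg\lg w)$.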
143: 
144: Our solution incorporates some basic ideas from the previous solutions
145: to static range reporting in one dimension
146: \cite{miltersen99asymmetric, alstrup01range}. However, it brings two
147: important technical contributions. First, we develop a new and
148: interesting recursion idea which is more advanced than van Emde Boas
149: recursion (but, nonetheless, not technically involved). We describe
150: this idea by first considering a simpler problem, the bit-probe
151: complexity of the greater-than function. Then, the solution for
152: dynamic range reporting is obtained by using the recursion for this
153: simpler problem, on \emph{every path} of a binary trie of depth
154: $w$. This should be contrasted to the van Emde Boas structure, which
155: uses a very simple recursion idea (repeated halving) on every
156: root-to-leaf path of the trie. The van Emde Boas recursion is
157: fundamental in the modern world of data structures, and has found many
158: unrelated applications (e.g.~exponential trees, integer sorting,
159: cache-oblivious layouts, interpolation search trees). It will be
160: interesting to see if our recursion scheme has a similar impact. 
161: 
162: The second important contribution of this paper is needed to achieve
163: linear space for our data structure. We develop a scheme for dynamic
164: perfect hashing, which requires sublinear space. This can be used to
165: store a sparse vector in small space, if we are only interested in
166: obtaining correct results when querying non-null positions (the
167: Bloomier filter problem). We also prove that our solution is
168: optimal. To our knowledge, this solves the last important theoretical
169: problem connected to Bloom filters. The stringent space requirements
170: that our data structure can meet are important in data-stream
171: algorithms and database systems. We mention one application below, but
172: believe others exist as well.
173: 
174: 
175: \subsection{Data-Stream Perfect Hashing and Bloomier Filters}
176: 
177: The Bloom filter is a classic data structure for testing membership in
178: a set. If a constant rate of false-positives is allowed, the space
179: \emph{in bits} can be made essentially linear in the size of the
180: set. Optimal bounds for this problem are obtained in
181: \cite{pagh05bloom}. Bloomier filters, an extension of the classical
182: Bloom filter with a catchy name, were defined and analyzed in the
183: static case by Chazelle et al.~\cite{chazelle04bloom}. The problem is
184: to represent a vector $V[0..u-1]$ with elements from $\{ 0, \dots, 2^r
185: - 1\}$ which is nonzero in only $n$ places (assume $n \ll u$, so the
186: vector is sparse). Thus, we have a sparse set as before, but with
187: values associated to the elements.  The information theoretic lower
188: bound for representing such a vector is $\Omega(n\cdot r + \lg
189: \binom{u}{n}) \approx \Omega(n (r + \lg u))$ bits. However, if we only
190: want correct answers when $V[x] \ne 0$, we can obtain a space usage of
191: roughly $O(nr)$ bits in the static case.
192: 
193: For the dynamic problem, where the values of $V$ can change
194: arbitrarily at any point, achieving such low space is impossible
195: regardless of the query and update times. Chazelle et
196: al.~\cite{chazelle04bloom} proved that $\Omega(n(r + \min(\lg\lg
197: \frac{u}{n^3}, \lg n)))$ bits are needed. No non-trivial upper bound
198: was known. We give matching lower and upper bounds:
199: 
200: \begin{theorem} \label{thm:bloomlb}
201: The randomized space complexity of maintaining a dynamic Bloomier
202: filter for $r\geq 2$ is $\Theta(n(r + \lg\lg \frac{u}{n}))$ bits in
203: expectation. The upper bound is achieved by a RAM data structure that
204: allows access to elements of the vector in worst-case constant time,
205: and supports updates in amortized expected $O(1)$ time.
206: \end{theorem}
207: 
208: To detect whether $V[x] = 0$ with probability of correctness at least
209: $1-\epsilon$, one can use a Bloom filter on top. This requires space
210: $\Theta(n\lg( 1/\epsilon ))$, and also works in the dynamic case
211: \cite{pagh05bloom}. Note that even for $\epsilon = 1$, randomization
212: is essential, since any deterministic solution must use $\Omega(n
213: \lg(u/n))$ bits of space, i.e.~it must essentially store the set of
214: nonzero entries in the vector.
215: 
216: With marginally more space, $O(n(r + \lg\lg u))$, we can make the
217: space and update bounds hold with high probability. To do that, we
218: analyze a harder problem, namely maintaining a perfect hash function
219: dynamically using low space. The problem is to maintain a set $S$ of
220: keys from $\{0, \dots, u-1\}$ under insertions and deletions, and be
221: able to evaluate a perfect hash function (i.e.~a one-to-one function)
222: from $S$ to a small range. An element needs to maintain the same hash
223: value while it is in $S$. However, if an element is deleted and
224: subsequently reinserted, its hash value may change.
225: 
226: \begin{theorem} \label{thm:hash}
227: We can maintain a perfect hash function from a set $S \subset \{ 0,
228: \dots, u-1 \}$ with $|S| \leq n$ to a range of size $n + o(n)$, under
229: $n^{O(1)}$ insertions and deletions, using $O(n\lg\lg u)$ bits of
230: space w.h.p., plus a constant number of machine words. The function
231: can be evaluated in worst-case constant time, and updates take
232: constant time w.h.p.
233: \end{theorem}
234: 
235: This is the first dynamic perfect hash function that uses less space
236: than needed to store $S$ ($\lg \binom{u}{n}$ bits). Our space usage
237: is close to optimal, since the problem is harder than dynamic Bloomier
238: filtering. These operating conditions are typical of data-stream
239: computation, where one needs to support a stream of updates and
240: queries, but does not have space to hold the entire state of the data
241: structure. Quite remarkably, our solution can achieve this goal
242: without introducing errors (we use only Las Vegas randomization).
243: 
244: We mention an independent application of Theorem \ref{thm:hash}.
245: In a database we can maintain an index of a relation under insertions
246: of tuples, using internal memory per tuple which is logarithmic in the
247: length of the key for the tuple. If tuples have fixed length, they can
248: be placed directly in the hash table, and need only be moved if the
249: capacity of the hash table is exceeded.
250: 
251: 
252: \subsection{Tradeoffs and the scheme of things} \label{scheme}
253: 
254: We begin with a discussion of the greater-than problem. Consider an
255: infinite memory of bits, initialized to zero. Our problem has two
256: stages. In the update stage, the algorithm is given a number $a \in
257: [0..n-1]$. After seeing $a$, the algorithm is allowed to flip $O(T_u)$
258: bits in the memory. In the query stage, the algorithm is given a
259: number $b \in [0..n-1]$. Now the algorithm may inspect $O(T_q)$ bits,
260: and must decide whether or not $b > a$. The problem was previously
261: studied by Fredman \cite{fredman82sums}, who showed that $\max(T_u,
262: T_q) = \Omega(\lg n / \lg\lg n)$. It is quite tempting to believe that
263: one cannot improve past the trivial upper bound $T_u = T_q = O(\lg
264: n)$, since, in some sense, this is the complexity of ``writing down''
265: $a$. However, as we show in this paper, Fredman's bound is optimal, in
266: the sense that it is a point on our tradeoff curve. We give upper and
267: lower bounds that completely characterize the possible asymptotic
268: tradeoffs:
269: 
270: \begin{theorem} \label{thm:bitgt}
271:   The bit-probe complexity of the greater-than function satisfies the
272:   tight tradeoffs:
273:   
274:   \vspace{-4ex}
275:   \begin{eqnarray*}
276:    T_q \geq \lg\lg n,\ T_u \leq \lg n &:& T_u = \Theta(\lg_{T_q} n) \\
277:    T_q \leq \lg\lg n,\ T_u \geq \lg n &:& 2^{T_q} = \Theta(\lg_{T_u} n) \\
278:   \end{eqnarray*}
279: \end{theorem}
280: \vspace{-3ex}
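
Two points on this curve may help the reader; the following is a sketch of the arithmetic, up to constant factors. Setting $T_u = T_q = T$ in the first branch gives
\[ T = \Theta(\lg_T n) = \Theta\left( \frac{\lg n}{\lg T} \right)
   \quad\Longrightarrow\quad T \lg T = \Theta(\lg n)
   \quad\Longrightarrow\quad T = \Theta\left( \frac{\lg n}{\lg\lg n} \right), \]
which is exactly Fredman's bound. At the other extreme, a constant $T_q$ in the second branch forces $\lg_{T_u} n = \Theta(1)$, i.e.~$\lg T_u = \Theta(\lg n)$, so $T_u = n^{\Theta(1)}$.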
281: 
282: As mentioned already, for dynamic range reporting we use the same
283: recursion idea as in the greater-than algorithm, except that we apply
284: this recursion to every root-to-leaf path of a binary trie of depth
285: $w$. Quite remarkably, these structures can be made to overlap
286: in-as-much as the paths overlap, so only one update suffices for all
287: paths going through a node. Due to this close relation, we view the
288: lower bounds for the greater-than function as giving an indication
289: that our range reporting data structure is likewise optimal. In any
290: case, the lower bounds show that markedly different ideas would be
291: necessary to improve our solution for range reporting.
292: 
293: Let $T_{pred}$ be the time needed by one update and one query in the
294: dynamic predecessor problem. The following theorem summarizes our
295: results for dynamic range reporting:
296: 
297: \begin{theorem} \label{thm:range}
298:   There is a data structure for the dynamic range reporting problem,
299:   which uses $O(n)$ space and supports updates in time $O(T_u)$, and
300:   queries in time $O(T_q)$, for all $T_u, T_q$ satisfying:
301: 
302:   \vspace{-3ex}
303:   \begin{eqnarray*}
304:     T_q \geq \lg\lg w,\ \frac{\lg w}{\lg\lg w} \leq T_u \leq \lg w
305:       &:& T_u = O(\lg_{T_q} w) + T_{pred} \\
306:     T_q \leq \lg\lg w,\phantom{\ \ \frac{\lg w}{\lg\lg w} \leq} 
307:       T_u \geq \lg w &:& 2^{T_q} = O(\lg_{T_u} w) \\
308:   \end{eqnarray*}
309: \end{theorem}
310: \vspace{-3ex}
311: 
312: Notice that the most appealing point of the tradeoff is the cross-over
313: of the two curves: $T_u = O(\lg w)$ and $T_q = O(\lg\lg w)$ (and
314: indeed, this has been the focus of our discussion). Another
315: interesting point is at constant query time. In this case, our data
316: structure needs $O(w^{\epsilon})$ update time. Thus, our data
317: structure can be used as an optimal static data structure, which is
318: constructed in time $O(n w^{\epsilon})$, improving on the construction
319: time of $O(n \sqrt{w})$ given by Alstrup et al.~\cite{alstrup01range}.
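
The constant-query point can be checked directly against the second branch of Theorem~\ref{thm:range}. As a sketch, take $T_u = w^{\epsilon}$ for a fixed constant $\epsilon > 0$; then
\[ \lg_{T_u} w = \frac{\lg w}{\epsilon \lg w} = \frac{1}{\epsilon},
   \qquad\textrm{so}\qquad 2^{T_q} = O(\lg_{T_u} w)
   \ \textrm{ holds with } \ T_q = \lg \frac{1}{\epsilon} = O(1). \]
Building the static structure by $n$ successive insertions then costs $n \cdot O(w^{\epsilon}) = O(n w^{\epsilon})$ in total.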
320: 
321: The first branch of our tradeoff is not interesting with $T_{pred} =
322: \Theta(\lg w)$. However, it is generally believed that one can achieve
323: $T_{pred} = \Theta( \lg w / \lg\lg w)$, matching the optimal bound for
324: the static case. If this is true, the $T_{pred}$ term can be ignored.
325: In this case, we observe a very interesting relation between our
326: problem and the predecessor problem. When $T_u = T_q$, the bounds we
327: achieve are identical to the ones for the predecessor problem, i.e.
328: $T_u = T_q = O(\lg w / \lg\lg w)$. However, if we are interested in
329: the possible tradeoffs, the gap between range reporting and the
330: predecessor problem quickly becomes huge. The same situation appears
331: to be true for deterministic dictionaries with linear space, though
332: the known tradeoffs are not as general as ours. We set forth the bold
333: conjecture (the proof of which requires many missing pieces) that all
334: three search problems are united by an optimal time of $\Theta(\lg w /
335: \lg\lg w)$ at this point of their tradeoff curves.
336: 
337: We can achieve bounds in terms of $n$, rather than $w$, by the classic
338: trick of using our structure for small $w$ and a fusion tree structure
339: \cite{fredman93fusion} for large $w$. In particular, we can achieve
340: $T_q = O(\lg\lg n)$ and $T_u = O\left( \frac{\lg n}{\lg\lg n}
341: \right)$. Compared with the optimal bound for the predecessor problem
342: of $\Theta\left( \sqrt{\frac{\lg n}{\lg\lg n}} \right)$, our data
343: structure improves the query time exponentially by sacrificing the
344: update time quadratically.
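
To see the ``exponential'' and ``quadratic'' claims, write $P = \sqrt{\lg n / \lg\lg n}$ for the predecessor bound (the symbol $P$ is introduced here only for this calculation). Then
\begin{eqnarray*}
  T_u &=& O\left( \frac{\lg n}{\lg\lg n} \right) = O(P^2) \\
  \lg P &=& \frac{1}{2} \left( \lg\lg n - \lg\lg\lg n \right) = \Theta(\lg\lg n),
    \quad\textrm{so}\quad T_q = O(\lg\lg n) = O(\lg P)
\end{eqnarray*}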
345: 
346: \begin{comment}
347:   Specifically, we use \cite{exptrees}, which gives a
348:   bound of $O(\lg_w n + \lg\lg n)$. We obtain the interesting tradeoff
349:   $T_u \cdot T_q \lg T_q = O(\lg n)$ for $\Omega(\lg\lg n) < T_q <
350:   O\left( \sqrt{\frac{\lg n}{\lg\lg n}} \right)$.
351: \end{comment}
352: 
353: 
354: \section{Data-Stream Perfect Hashing}
355: 
356: We denote by $S$ the set of values that we need to hash at the present
357: time. Our data structure has the following parts:
358: 
359: \begin{itemize}
360: \item A hash function $\rho: \{0,\dots,u-1 \} \rightarrow
361:   \{0,1\}^{v}$, where $v = O(\lg n)$, from a family of universal hash
362:   functions with small representations (for example, the one
363:   from \cite{dietzfel96universal}).
364: 
365: \item A hash function $\phi: \{0,1\}^{v} \rightarrow \{1,\dots,r\}$,
366:   where $r=\lceil n/\lg^2 n \rceil$, taken from Siegel's class of
367:   highly independent hash functions \cite{siegel04hash}. 
368:   % note: used to be thm 3 in \cite{ncstrl.nyu_cs//TR1995-684}
369:   
370: \item An array of hash functions $h_1,\dots,h_r: \{0,1\}^v \rightarrow
371:   \{0,1\}^s$, where $s=\lceil (6+2c)\lg\lg u \rceil$, chosen
372:   independently from a family of universal hash functions; $c$ is a
373:   constant specified below.
374:   
375: \item A high performance dictionary \cite{dietzfel90highperf} for a
376:   subset $S'$ of the keys in $S$. The dictionary should have a
377:   capacity of $O(\lceil n/\lg u \rceil)$ keys (but might expand
378:   further). Along with the dictionary we store a linked list of length
379:   $O(\lceil n/\lg u \rceil)$, specifying certain vacant positions in
380:   the hash table.
381:   
382: \item An array of dictionaries $D_1,\dots,D_r$, where $D_i$ is a
383:   dictionary that holds $h_i(\rho(k))$ for each key $k\in S \setminus
384:   S'$ with $\phi(\rho(k))=i$. A unique value in $\{0,\dots,j-1\}$,
385:   where $j=(1+o(1))\lg^2 n$, is associated with each key in $D_i$. A
386:   bit vector of $j$ bits and an additional string of $\lg n$ bits is
387:   used to keep track of which associated values are in use. We will
388:   return to the exact choice of $j$ and the implementation of the
389:   dictionaries.
390: \end{itemize}
391: 
392: The main idea is that all dictionaries in the construction assign to
393: each of their keys a unique value within a subinterval of $[1 .. m]$, where $m = n + o(n)$ is the size of the range.
394: Each of the dictionaries $D_1, \dots, D_r$ is responsible for an
395: interval of size $j$, and the high performance dictionary is
396: responsible for an interval of size $O(n/\lg u) = o(n)$.
397: 
398: The hash function $\rho$ is used to reduce the key length to $v$. The
399: constant in $v = O(\lg n)$ can be chosen such that with high
400: probability, over a polynomially bounded sequence of updates, $\rho$
401: will never map two elements of $S$ to the same value (the conflicts,
402: if they occur, end up in $S'$ and are handled by the high performance
403: dictionary).
404: 
405: When inserting a new value $k$, the new key is included in $S'$ if
406: either:
407: 
408: \begin{itemize}
409: \item There are $j$ keys in $D_i$, where $i=\phi(\rho(k))$, or
410:   
411: \item There exists a key $k'\in S$ where
412:   $\phi(\rho(k))=\phi(\rho(k'))=i$ and $h_i(\rho(k))=h_i(\rho(k'))$.
413: \end{itemize}
414: 
415: Otherwise $k$ is associated with the key $h_i(\rho(k))$ in $D_i$.
416: Deletion of a key $k$ is done in $S'$ if $k\in S'$, and otherwise the
417: associated key in the appropriate $D_i$ is deleted.
418: 
419: To evaluate the perfect hash function on a key $k$ we first see
420: whether $k$ is in the high performance dictionary. If so, we return
421: the value associated with $k$. Otherwise we compute $i=\phi(\rho(k))$
422: and look up the value $\Delta$ associated with the key $h_i(\rho(k))$
423: in $D_i$. Then we return $(i-1)j+\Delta$, i.e., position $\Delta$
424: within the $i$-th interval.
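
As a sanity check on the size of the range, the following sketch uses the value $j = \lceil \lg^2 n + \lg^{5/3} n \rceil$ fixed in the analysis. Since $0 \le \Delta \le j-1$, the values $(i-1)j + \Delta$ for bucket $i$ occupy exactly the $j$ consecutive positions $(i-1)j, \dots, ij - 1$, so distinct buckets never collide. The total number of positions used by $D_1, \dots, D_r$ is
\[ r \cdot j \le \left( \frac{n}{\lg^2 n} + 1 \right) \left( \lg^2 n + \lg^{5/3} n + 1 \right)
   = n + O\left( \frac{n}{\lg^{1/3} n} \right) = n + o(n), \]
and the high performance dictionary adds a further $O(n / \lg u) = o(n)$ positions, consistent with the range of size $n + o(n)$ claimed in Theorem~\ref{thm:hash}.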
425: 
426: Since $D_1,\dots,D_r$ store keys and associated values of $O(\lg\lg
427: u)$ bits, they can be efficiently implemented as constant depth search
428: trees of degree $w^{\Omega(1)}$, where each internal node resides in a
429: single machine word. This yields constant time for dictionary
430: insertions and lookups, with an optimal space usage of $O(\lg^2
431: n\lg\lg u)$ bits for each dictionary.  We do not go into details of
432: the implementation as they are standard; refer to \cite{hagerup98ram}
433: for explanation of the required word-level parallelism techniques.
434: 
435: What remains to describe is how the dictionaries keep track of vacant
436: positions in the hash table in constant time per insertion and
437: deletion. The high performance dictionary simply keeps a linked list
438: of all vacant positions in its interval. Each of $D_1,\dots,D_r$
439: maintains a bit vector indicating vacant positions, and an additional
440: $O(\lg n)$ summary bits, each storing the OR of an interval of size
441: $O(\lg n)$. This can be maintained in constant time per operation,
442: employing standard techniques.
443: 
444: Only $o(n)$ preprocessing is necessary for the data structure
445: (essentially to build tables needed for the word-level parallelism).
446: The major part of the data structure is initialized lazily.
447: 
448: 
449: \subsection{Analysis}
450: 
451: Since evaluation of all involved hash functions and lookup in the
452: dictionaries takes constant time, evaluation of the perfect hash
453: function is done in constant time. As we will see below, the high
454: performance dictionary is empty with high probability unless $n/\lg u
455: > \sqrt{n}$. This means that it always uses constant time per
456: update with high probability in $n$. All other operations done for
457: update are easily seen to require constant time w.h.p.
458: 
459: We now consider the space usage of our scheme. The function $\rho$ can
460: be represented in $O(w)$ bits. Siegel's highly independent hash
461: function uses $o(n)$ bits of space. The hash functions $h_1,\dots,h_r$
462: use $O(\lg n + \lg\lg u)$ bits each, and $o(n\lg\lg u)$ bits in
463: total.  The main space bottleneck is the space for $D_1,\dots,D_r$,
464: which sums to $O(n\lg\lg u)$.
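
These totals follow from $r = \lceil n / \lg^2 n \rceil$. Spelled out (a sketch, with the per-dictionary bound $O(\lg^2 n \lg\lg u)$ taken from the implementation described earlier):
\begin{eqnarray*}
  h_1, \dots, h_r &:& r \cdot O(\lg n + \lg\lg u)
     = O\left( \frac{n}{\lg n} + \frac{n \lg\lg u}{\lg^2 n} \right) = o(n \lg\lg u) \\
  D_1, \dots, D_r &:& r \cdot O(\lg^2 n \, \lg\lg u) = O(n \lg\lg u)
\end{eqnarray*}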
465: 
466: Finally, we show that the space used by the high performance
467: dictionary is $O(n)$ bits w.h.p. This is done by showing that each of
468: the following hold with high probability throughout a polynomial
469: sequence of operations:
470: 
471: \begin{itemize*}
472: \item[1.] The function $\rho$ is one-to-one on $S$.
473:   
474: \item[2.] There is no $i$ such that $S_i = \{ k \in S \mid
475:   \phi(\rho(k))=i \}$ has more than $j$ elements.
476: 
477: \item[3.] The set $S'$ has $O(\lceil n/\lg u \rceil)$ elements.
478: \end{itemize*}
479: 
480: That 1.~holds with high probability is well known. To show 2.~we use
481: the fact that, with high probability, Siegel's hash function is
482: independent on every set of $n^{\Omega(1)}$ keys. We may thus employ
483: Chernoff bounds for random variables with limited independence to
484: bound the probability that any $i$ has $|S_i| > j$, conditioned on the
485: fact that 1.~holds. Specifically, we can use \cite[Theorem
486: 5.I.b]{schmidt95chernoff} to argue that for any $i$, the probability
487: that $|S_{i}| > j$ for $j = \lceil \lg^2 n + \lg^{5/3} n \rceil$ is
488: $n^{-\omega(1)}$, which is negligible. On the assumption that 1.~and
489: 2.~hold, we finally consider~3. We note that every key $k'\in S'$ is
490: involved in an $h_i$-collision in $S_i$ for $i=\phi(\rho(k'))$,
491: i.e.~there exists $k''\in S_i \setminus \{k'\}$ where
492: $h_i(k')=h_i(k'')$. By universality, for any $i$ the expected number
493: of $h_i$-collisions in $S_i$ is $O(\lg^4 n / (\lg u)^{6+2c}) = O((\lg
494: u)^{-(2+2c)})$.  Thus the probability of one or more collisions is
495: $O((\lg u)^{-(2+2c)})$.  For $\lg u \geq \sqrt{n}$ this means that
496: there are no keys in $S'$ with high probability. Specifically, $c$ may
497: be chosen as the sum of the constants in the exponents of the length
498: of the operation sequence and the desired high probability bound. For
499: the case $\lg u < \sqrt{n}$ we note that the expected number of
500: elements in $S'$ is certainly $O(n/\lg u)$. To see that this also
501: holds with high probability, note that the event that one or more keys
502: from $S_i$ end up in $S'$ is independent among the $i$'s. Thus we can
503: use Chernoff bounds to get that the deviation from the expectation is
504: small with high probability.
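
The collision bound can be unpacked as follows. This is a sketch: it assumes, as is standard for universal families, that two distinct inputs collide under $h_i$ with probability at most $2^{-s}$, and it uses $\lg n \le \lg u$. Conditioned on 1.~and 2., the set $S_i$ contains at most $\binom{j}{2} = O(\lg^4 n)$ pairs, and $2^{-s} \le (\lg u)^{-(6+2c)}$ by the choice of $s$, so
\[ E\left[ \textrm{number of $h_i$-collisions in } S_i \right] \le \binom{j}{2} \, 2^{-s}
   = O\left( \frac{\lg^4 n}{(\lg u)^{6+2c}} \right) = O\left( (\lg u)^{-(2+2c)} \right). \]
When $\lg u \ge \sqrt{n}$, a union bound over the $r \le n$ buckets gives
\[ \Pr[S' \ne \emptyset] = O\left( n \cdot (\lg u)^{-(2+2c)} \right)
   = O\left( n \cdot n^{-(1+c)} \right) = O(n^{-c}) \]
at any fixed point in the operation sequence.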
505: 
506: 
507: \section{Lower Bound for Bloomier Filters}
508: 
509: For the purpose of the lower bound, we consider the following two-set
510: distinction problem, following \cite{chazelle04bloom}. The problem has
511: the following stages:
512: 
513: \begin{enumerate}
514: \item[0.] a random string $R$ is drawn, which will be available to the
515:   data structure throughout its operation. This is equivalent to
516:   drawing a deterministic algorithm from a given distribution, and is
517:   more general than assuming each stage has its own random coins (we
518:   are giving the data structure free storage for its random bits).
519: 
520: \item the data structure is given $A \subset [u], |A| \le n$. It must
521:   produce a representation $f_R(A)$, which for any $A$ has size at
522:   most $S$ bits, in expectation over all choices of $R$. Here $S$ is a
523:   function of $n$ and $u$, which is the target of our lower bound.
524: 
525: \item the data structure is given $B \subset [u]$, such that $|B| \le
526:   n, A \cap B = \emptyset$. Based on the old state $f_R(A)$, it must
527:   produce $g_R(B, f_R(A))$ with expected size at most $S$ bits.
528: 
529: \item the data structure is given $x \in [u]$ and its previously
530:   generated state, i.e.~$f_R(A)$ and $g_R(B, f_R(A))$. Now it must
531:   answer as follows with no error allowed: if $x \in A$, it must
532:   answer ``A''; if $x \in B$, it must answer ``B''; if $x \notin A
533:   \cup B$, it can answer either ``A'' or ``B''. Let $h_R(x,f,g)$
534:   be the answer computed by the data structure, when the previous
535:   state is $f$ and $g$.
536: 
537: \end{enumerate}
538: 
539: It is easy to see that a solution for dynamic Bloomier filters
540: supporting ternary associated data, using expected space $o(n\lg\lg
541: \frac{u}{n})$, yields a solution to the two-set distinction problem
542: with $S = o(n\lg\lg \frac{u}{n})$. We will prove such a solution does
543: not exist.
544: 
545: Since a solution to the distinction problem is not allowed to make an
546: error we can assume w.l.o.g.~that step 3 is implemented as follows. If
547: there exist appropriate $A, B \subset [u]$, with $x \in A$ such that
548: $f_R(A) = f_0$ and $g_R(B, f_0) = g_0$, then $h_R(x, f_0, g_0)$ must
549: be ``A''. Similarly, if there exists a plausible scenario with $x \in
550: B$, the answer must be ``B''. Otherwise, the answer can be arbitrary.
551: 
552: Assume that the inputs $A \times B$ are drawn from a given
553: distribution. We argue that if the expected sizes of $f$ and $g$ are
554: allowed to be at most $2S$, the data structure need not be
555: randomized. This uses a bicriteria minimax principle. We have
556: $E_{R,A,B}\left[ \frac{|f|}{S} + \frac{|g|}{S} \right] \leq 2$, where
557: $|f|, |g|$ denote the length of the representations. Then, there
558: exists a random string $R_0$ such that $E_{A,B} \left[ \frac{|f|}{S} +
559: \frac{|g|}{S} \right] \leq 2$. Since $|f|, |g| \geq 0$, this implies
560: $E_{A,B}[|f|] \leq 2S, E_{A,B}[|g|] \leq 2S$. The data structure can
561: simply use the deterministic sequence $R_0$ as its random bits; we
562: drop the subscript from $f_R, g_R$ when talking about this
563: deterministic data structure.
564: 
565: 
566: \subsection{Lower Bound for Two-Set Distinction}
567: 
568: Assume $u = \omega(n)$, since a lower bound of $\Omega(n)$ is trivial
569: for universe $u \ge 2n$. Break the universe into $n$ equal parts $U_1,
570: \dots, U_n$; w.l.o.g.~assume $n$ divides $u$, so $|U_i| =
571: \frac{u}{n}$. The hard input distribution chooses $A$ uniformly at
572: random from $U_1 \times \dots \times U_n$. We write $A = \{ a_1,
573: \dots, a_n \}$, where $a_i$ is a random variable drawn from
574: $U_i$. Then, $B'$ is chosen uniformly at random from the same product
575: space; again $B' = \{b_1, \dots, b_n\}, b_i \gets U_i$. We let $B = B'
576: \setminus A$. Note that $E[|B|] = n \cdot \Pr[a_1 \ne b_1] = (1 -
577: \frac{n}{u}) \cdot n = (1 - o(1)) \cdot n$.
578: 
579: Let $A_i^p$ be the plausible values of $a_i$ after we see $f(A)$; that
580: is, $A_i^p$ contains all $a \in U_i$ for which there exists a valid
581: $A'$ with $a \in A'$ and $f(A') = f(A)$. Intuitively speaking, if
582: $f(A)$ has expected size $o(n \lg\lg \frac{u}{n})$, it contains on
583: average $o(\lg\lg \frac{u}{n})$ bits of information about each
584: $a_i$. This is much smaller than the range of $a_i$, which is
585: $\frac{u}{n}$, so we would expect that the average $|A_i^p|$ is quite
586: large, around $\frac{u}{n} / (\lg \frac{u}{n})^{o(1)}$. This intuition
587: is formalized in the following lemma:
588: 
589: \begin{lemma}
590: With probability at least a half over a uniform choice of $A$ and $i$,
591: we have $|A_i^p| \geq \frac{u/n}{2^{O(S/n)}}$.
592: \end{lemma}
593: 
594: \begin{proof}
595: The Kolmogorov complexity of $A$ is $n\lg \frac{u}{n} - O(1)$; no
596: encoding for $A$ can have an expected size less than this quantity.
597: We propose an encoding for $A$ consisting of two parts: first, we
598: include $f(A)$; second, for each $i$ we include the index of $a_i$ in
599: $A_i^p$, using $\lceil \lg|A_i^p| \rceil$ bits. This is easily
600: decodable. We first generate all possible $A'$ with $f(A') = f(A)$,
601: and thus obtain the sets $A_i^p$. Then, we extract from each plausible
602: set the element with the given index. The expected size of the
603: encoding is $2S + \sum_i E_{A}[\lg |A_i^p|] + O(n)$, which must be
604: $\ge n\lg \frac{u}{n} - O(1)$. This implies $\lg \frac{u}{n} -
605: E_{i,A}[\lg |A_i^p|] \le \frac{2S}{n} + O(1)$. By Markov's inequality,
606: with probability at least a half over $i$ and $A$, $\lg \frac{u}{n} -
607: \lg |A_i^p| \le \frac{4S}{n} + O(1)$, so $\lg |A_i^p| \ge \lg
608: \frac{u}{n} - O(\frac{S}{n})$.
609: \end{proof}
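
To spell out the application of Markov's inequality, write $X = \lg \frac{u}{n} - \lg |A_i^p|$ (the symbol $X$ is introduced here only for this remark). Since $A_i^p \subseteq U_i$, we have $X \ge 0$, and the encoding bound gives $E_{i,A}[X] \le \frac{2S}{n} + O(1)$. Then
\[
\Pr_{i,A}\left[ X \ge 2 \, E_{i,A}[X] \right] \le \frac{1}{2},
\]
so with probability at least a half, $X < 2 \, E_{i,A}[X] \le \frac{4S}{n} + O(1)$, which is the bound used in the proof.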
610: 
611: We now make a crucial observation which justifies our interest in
612: $A_i^p$. Assume that $b_i \in A_i^p$. In this case, the data structure
613: must be able to determine $b_i$ from $f(A)$ and $g(B,f(A))$. Indeed,
614: suppose we compute $h(x,f,g)$ for all $x \in A_i^p$. If the data
615: structure does not answer ``B'' when $x = b_i$, it is obviously
616: incorrect. On the other hand, if it answers ``B'' for both $x = b_i$
617: and some other $x' \in A_i^p$, it also makes an error. Since $x'$ is
618: plausible, there exists $A'$ with $x' \in A'$ such that $f(A') =
619: f(A)$. Then, we can run the data structure with $A'$ as the first set
620: and $B$ as the second set. Since $f(A') = f(A)$, the data structure
621: will behave exactly the same, and will incorrectly answer ``B'' for
622: $x'$.
623: 
624: To draw our conclusion, we consider another encoding argument, this
625: time in connection to the set $B'$. The Kolmogorov complexity of $B'$
626: is $n \lg \frac{u}{n} - O(1)$. Consider a randomized encoding,
627: depending on a set $A$ drawn at random. First, we encode an $n$-bit
628: vector specifying which indices $i$ have $a_i = b_i$. It remains to
629: encode $B' \setminus A = B$. We encode another $n$-bit vector,
630: specifying for which positions $i$ we have $b_i \in A_i^p$. For each
631: $b_i \notin A_i^p$, we simply encode $b_i$ using $\lceil \lg
632: \frac{u}{n} \rceil$ bits. Finally, we include in the encoding $g(B,
633: f(A))$. As explained already, this is enough to recover all $b_i$
634: which are in $A_i^p$. Note that we do not need to encode $f(A)$, since
635: this depends only on our random coins, and the decoding algorithm can
636: reconstruct it.
637: 
638: The expected size of this encoding will be $O(n + S) + n\cdot
639: \Pr_{A,B',i} [b_i \notin A_i^p] \cdot \lg \frac{u}{n}$. We know that
640: with probability a half over $A$ and $i$, we have $|A_i^p| \geq
641: \frac{u/n}{2^{O(S/n)}}$. Thus, $\Pr_{A,B',i} [b_i \in A_i^p] \geq
642: \frac{1}{2} \cdot 2^{-O(S/n)}$. Thus, the expected size of the
643: encoding is at most $O(n + S) + (1 - 2^{-O(S/n)}) \cdot n \lg
644: \frac{u}{n}$. Note that by the minimax principle, randomness in the
645: encoding is unessential and we can always fix $A$ guaranteeing the
646: same encoding size, in expectation over $B$. We now get the bound:
647: 
648: \begin{eqnarray*}
649: & & O(n + S) + (1 - 2^{-O(S/n)}) \cdot n \lg \frac{u}{n} \geq n \lg
650: \frac{u}{n} - O(1) \\ 
651: & \Rightarrow & O\left( \frac{S}{n} \right) \geq 2^{-O(S/n)} \lg
652: \frac{u}{n} - O(1) \Rightarrow 2^{O(S/n)} O(S / n) \geq \lg
653: \frac{u}{n} \Rightarrow \frac{S}{n} = \Omega \left( \lg\lg \frac{u}{n}
654: \right)
655: \end{eqnarray*}
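
One way to verify the last implication is to name the hidden constants. Write $\sigma = S/n$ and $L = \lg \frac{u}{n}$, and suppose the second-to-last inequality holds in the form $2^{c_1 \sigma} (c_2 \sigma + c_3) \ge L$ for constants $c_1, c_2, c_3 > 0$ (the names $\sigma$, $L$, $c_1$, $c_2$, $c_3$ are introduced only for this remark). Since $x \le 2^x$ for all $x \ge 0$, we get
\[
L \le 2^{c_1 \sigma} \cdot 2^{c_2 \sigma + c_3} = 2^{(c_1 + c_2) \sigma + c_3}
\quad \Rightarrow \quad
\sigma \ge \frac{\lg L - c_3}{c_1 + c_2} = \Omega\left( \lg\lg \frac{u}{n} \right).
\]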
656: 
657: 
658: \section{A Space-Optimal Bloomier Filter}
659: 
660: It was shown in \cite{carter78bloom} that the approximate membership
661: problem (i.e., the problem solved by Bloom filters) can be solved
662: optimally using a reduction to the exact membership problem. The
663: reduction uses universal hashing.  In this section we extend this idea
664: to achieve optimal dynamic Bloomier filters.
665: 
666: Recall that Bloomier filters encode sparse vectors with entries from
667: $\{0,\dots,2^r - 1\}$.  Let $S\subseteq [u]$ be the set of at most $n$
668: indexes of nonzero entries in the vector $V$.  The data structure must
669: encode a vector $V'$ that agrees with $V$ on indexes in $S$, and such
670: that for any $x\not\in S$, $\Pr[V'[x]\neq 0]\leq \epsilon$, where
671: $\epsilon > 0$ is the error probability of the Bloomier
672: filter. Updates to $V$ are done using the following operations:
673: \begin{itemize}
674: \item {\sc Insert($x$, $a$)}. Set $V[x]:=a$, where $a\neq 0$.
675: \item {\sc Delete($x$)}. Set $V[x]:=0$.
676: \end{itemize}
677: 
678: The data structure assumes that only valid updates are performed,
679: i.e., that inserts are done only in situations where $V[x]=0$ and
680: deletions are done only when $V[x]\neq 0$.
681: 
682: \begin{theorem}\label{thm:filter}
683: Let positive integers $n$ and $r$, and $\epsilon > 0$ be given. On a
684: RAM with word length $w$ we can maintain a Bloomier filter $V'$ for a
685: vector $V$ of length $u=2^{O(w)}$ with at most $n$ nonzero entries
686: from $\{0,\dots,2^r - 1\}$, such that:
687: 
688: \begin{itemize}
689: \item {\sc Insert} and {\sc Delete} can be done in amortized
690:   expected constant time. The data structure assumes all updates are
691:   valid.
692: 
693: \item Computing $V'[x]$ on input $x$ takes worst case constant
694:   time. If $V[x]\neq 0$ the answer is always $V[x]$. If $V[x]=0$ the
695:   answer is $0$ with probability at least $1-\epsilon$.
696: 
697: \item The expected space usage is $O(n(\lg\lg(u/n) + \lg(1/\epsilon) +
698:   r))$ bits.
699: \end{itemize}
700: \end{theorem}
701: 
702: 
703: \subsection{The Data Structure}
704: 
705: Assume without loss of generality that $u\geq 2n$ and that
706: $\epsilon\geq n/u$.  Let $v=\max(n \log(u/n), n/\epsilon)$, and choose
707: $h: \{0,\dots,u-1\} \rightarrow \{0,\dots,v-1\}$ as a random function
708: from a universal class of hash functions. The data structure maintains
709: information about a minimal set $S'$ such that $h$ is 1-1 on $S
710: \setminus S'$. Specifically, it consists of two parts:
711: 
712: \begin{enumerate}
713: \item A dictionary for the set $S'$, with corresponding values of $V$
714:   as associated information.
715: 
716: \item A dictionary for the set $h(S\backslash S')$, where the element
717:   $h(x)$, $x\in S\backslash S'$, has $V[x]$ as associated information.
718: \end{enumerate}
719: 
720: Both dictionaries should be succinct, i.e., use space close to the
721: information theoretic lower bound.  Raman and Rao
722: \cite{raman03succinct} have described such a dictionary using space
723: that is $1+o(1)$ times the minimum possible, while supporting lookups
724: in $O(1)$ time and updates in expected amortized $O(1)$ time.
725: 
726: To compute $V'[x]$ we first check whether $x\in S'$, in which case
727: $V'[x]$ is stored in the first dictionary. If this is not the case, we
728: check whether $h(x)\in h(S\backslash S')$.  If this is the case we
729: return the information associated with $h(x)$ in the second
730: dictionary.  Otherwise, we return '0'.
731: 
732: {\sc Insert($x$, $a$)}. First determine whether $h(x)\in h(S\backslash
733: S')$, in which case we add $x$ to the set $S'$, inserting $x$ in the
734: first dictionary.  Otherwise we add $h(x)$ to the second
735: dictionary. In both cases, we associate $a$ with the inserted element.
736: 
737: {\sc Delete($x$)} proceeds by deleting $x$ from the first dictionary
738: if $x\in S'$, and otherwise deleting $h(x)$ from the second
739: dictionary.
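
The three operations above can be sketched as follows. This is only an illustration under stated assumptions: ordinary Python dictionaries stand in for the succinct dictionaries (so the space bound is not reproduced), the hash function is the standard family $x \mapsto ((\alpha x + \beta) \bmod p) \bmod v$ with a fixed prime $p$, and the names \texttt{BloomierSketch}, \texttt{exact} and \texttt{hashed} are hypothetical.

```python
import math
import random

class BloomierSketch:
    """Two-dictionary Bloomier filter sketch (plain dicts, not succinct)."""

    P = (1 << 61) - 1  # a prime; the sketch assumes the universe size u is below P

    def __init__(self, n, u, eps, seed=0):
        rng = random.Random(seed)
        # v = max(n log(u/n), n/eps), as in the text.
        self.v = max(math.ceil(n * math.log2(u / n)), math.ceil(n / eps))
        self.alpha = rng.randrange(1, self.P)
        self.beta = rng.randrange(self.P)
        self.exact = {}    # first dictionary: x -> V[x] for x in S'
        self.hashed = {}   # second dictionary: h(x) -> V[x] for x in S \ S'

    def h(self, x):
        return ((self.alpha * x + self.beta) % self.P) % self.v

    def insert(self, x, val):
        # Valid updates only: V[x] == 0 before the call, and val != 0.
        if self.h(x) in self.hashed:
            self.exact[x] = val            # collision: x joins S'
        else:
            self.hashed[self.h(x)] = val

    def delete(self, x):
        # Valid updates only: V[x] != 0 before the call.
        if x in self.exact:
            del self.exact[x]
        else:
            del self.hashed[self.h(x)]

    def lookup(self, x):
        if x in self.exact:
            return self.exact[x]
        return self.hashed.get(self.h(x), 0)
```

For keys outside $S$, \texttt{lookup} may return a nonzero value whenever the key collides with an entry of the second dictionary; this is the permitted one-sided error.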
740: 
741: 
742: \subsection{Analysis}
743: 
744: It is easy to see that the data structure always returns correct
745: function values on elements in $S$, given that all updates are
746: valid. When computing $V'[x]$ for $x\not\in S$ we get a nonzero result
747: if and only if there exists $x'\in S$ such that $h(x)=h(x')$. Since
748: $h$ was chosen from a universal family, this happens with probability
749: at most $n/v \leq \epsilon$.
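
Written out, this is a union bound over the elements of $S$, using that a universal family satisfies $\Pr[h(x) = h(x')] \le 1/v$ for $x \ne x'$:
\[
\Pr[V'[x] \ne 0] \le \sum_{x' \in S} \Pr[h(x) = h(x')] \le \frac{|S|}{v} \le \frac{n}{v} \le \epsilon ,
\]
where the last step uses $v \ge n/\epsilon$.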
750: 
751: It remains to analyze the space usage. Using once again that $h$ was
752: chosen from a universal family, the expected size of $S'$ is
753: $O(n/\log(u/n))$. This implies that the expected number of bits
754: necessary to store the set $S'$ is $\log\binom{u}{O(n/\log(u/n))} =
755: O(n)$, using convexity of the function $x\mapsto \binom{u}{x}$ in the
756: interval $0\dots u/2$. In particular, the first dictionary achieves an
757: expected space usage of $O(n)$ bits.  The information theoretical
758: minimum space for the set $h(S\backslash S')$ is bounded by
759: $\log\binom{v}{n} = O(n \log(v/n)) = O(n \log\log(u/n) +
760: n\log(1/\epsilon))$ bits, matching the lower bound.  We have disregarded
761: the space for the universal hash function, which is $O(\log u)$ bits.
762: However, this can be reduced to $O(\log n + \log\log u)$ bits, which
763: is vanishing, by using slightly weaker universal functions and
764: doubling the size $v$ of the range. Specifically, $2$-universal
765: functions suffice; see \cite{pagh00dispers} for a construction. Using
766: such a family requires preprocessing time $(\log u)^{O(1)}$, expected.
767: 
768: 
769: \section{Upper Bounds for the Greater-Than Problem}
770: 
771: We start with a simple upper bound of $T_u = O(\lg n), T_q = O(\lg\lg
772: n)$. Our upper bound uses a trie structure. We consider a balanced
773: tree with branching factor 2, and with $n$ leaves. Every possible
774: value of the update parameter $a$ is represented by a root-to-leaf
775: path. In the update stage, we mark this root-to-leaf path, taking time
776: $O(\lg n)$. In the query stage, we want to find the point where $b$'s
777: path in the trie would diverge from $a$'s path. This uses binary
778: search on the $\lg n$ levels, as follows. To test if the paths diverge
779: on a level, we examine the node on that level on $b$'s path.  If the
780: node is marked, the paths diverge below; otherwise they diverge
781: above. Once we have found the divergence point, we know that the
782: larger of $a$ and $b$ is the one following the right child of the
783: lowest common ancestor.
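
The simple upper bound can be illustrated by the following sketch. It assumes $n$ is a power of two and stores the marks in a hash set keyed by (level, prefix); the class name \texttt{GreaterThanTrie} and this representation are illustrative choices, not prescribed by the argument.

```python
class GreaterThanTrie:
    def __init__(self, depth):
        self.depth = depth            # n = 2**depth leaves
        self.marked = set()           # marked nodes as (level, prefix)

    def update(self, a):
        # Mark every node on a's root-to-leaf path: O(lg n) time.
        for level in range(self.depth + 1):
            self.marked.add((level, a >> (self.depth - level)))

    def query(self, b):
        """Return -1, 0, or 1 according to whether b < a, b == a, or b > a."""
        # Invariant: b's node at level lo is marked, its node at level hi is not
        # (hi = depth + 1 is a virtual level below the leaves).
        lo, hi = 0, self.depth + 1
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if (mid, b >> (self.depth - mid)) in self.marked:
                lo = mid
            else:
                hi = mid
        if lo == self.depth:
            return 0
        # Level lo holds the lowest common ancestor; b's child bit decides.
        bit = (b >> (self.depth - lo - 1)) & 1
        return 1 if bit else -1
```

The loop probes $O(\lg\lg n)$ levels, since the marked nodes on $b$'s path form a prefix of the levels.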
784: 
785: For the full tradeoff, we consider a balanced tree with branching
786: factor $B$. In the update stage, we need to mark a root-to-leaf path,
787: taking time $O(\lg_B n)$. In the query stage, we first find the point
788: where $b$'s path in the trie would diverge from $a$'s path. This uses
789: binary search on the $\lg_B n$ levels, so it takes time $O(\lg\lg_B
790: n)$. Now we know the level where the paths of $a$ and $b$ diverge. The
791: nodes on that level from the paths of $a$ and $b$ must be siblings in
792: the tree. To test whether $b > a$, we must find the relative order of
793: the two sibling nodes. There are two strategies for this, giving the
794: two branches of the tradeoff curve. To achieve small update time, we
795: can do all work at query time. We simply test all siblings to the left
796: of $b$'s path on the level of divergence. If we find a marked one,
797: then $a$'s path goes to the left of $b$'s path, so $a < b$; otherwise
798: $a > b$. This strategy gives $T_u = O(\lg_B n)$ and $T_q = O(\lg(\lg_B
799: n) + B)$, for any $B \geq 2$. For $T_q > \Omega(\lg\lg n)$, we have
800: $T_q = \Theta(B)$, so we have achieved the tradeoff $T_u = O(\lg_{T_q}
801: n)$.
802: 
803: The second strategy is to do all work at update time. For every node
804: on $a$'s path, we mark all left siblings of the node as such. Then to
805: determine if $b$'s path is to the left or to the right of $a$'s path,
806: we can simply query the node on $b$'s path just below the divergence
807: point, and see if it is marked as a left sibling. This strategy gives
808: $T_u = O(B \lg_B n)$ and $T_q = O(\lg(\lg_B n))$. For small enough $B$
809: (say $B = O(\lg n)$), this strategy gives $T_q = O(\lg\lg n)$
810: regardless of $B$ and $T_u$. For $B = \Omega(\lg n)$, we have $\lg B =
811: \Theta(\lg T_u)$. Therefore, we can express our tradeoff as: $2^{T_q}
812: = O(\lg_{T_u} n)$.
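
The second strategy can be sketched in the same style. The sketch assumes $n = B^d$ for an integer depth $d$ and again keeps marks in hash sets; the names \texttt{LeftSiblingTrie}, \texttt{on\_path} and \texttt{left\_of\_path} are hypothetical.

```python
class LeftSiblingTrie:
    def __init__(self, B, depth):
        self.B, self.depth = B, depth  # n = B**depth leaves
        self.on_path = set()           # nodes on a's path, as (level, prefix)
        self.left_of_path = set()      # left siblings of nodes on a's path

    def _node(self, x, level):
        return x // self.B ** (self.depth - level)

    def update(self, a):
        self.on_path.add((0, 0))
        for level in range(1, self.depth + 1):
            node = self._node(a, level)
            self.on_path.add((level, node))
            first_sibling = node - node % self.B
            for sib in range(first_sibling, node):    # at most B - 1 marks
                self.left_of_path.add((level, sib))

    def query(self, b):
        """Return -1, 0, or 1 according to whether b < a, b == a, or b > a."""
        lo, hi = 0, self.depth + 1
        while hi - lo > 1:                 # binary search over the levels
            mid = (lo + hi) // 2
            if (mid, self._node(b, mid)) in self.on_path:
                lo = mid
            else:
                hi = mid
        if lo == self.depth:
            return 0
        # Node on b's path just below the divergence point.
        below = (lo + 1, self._node(b, lo + 1))
        return -1 if below in self.left_of_path else 1
```

Here the update performs $O(B)$ work per level, while the query needs only one extra probe after the binary search.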
813: 
814: 
815: \section{Dynamic Range Reporting}
816: 
817: We begin with the case $T_u = O(\lg w), T_q = O(\lg\lg w)$. Let $S$ be
818: the current set of values stored by the data structure.  Without loss
819: of generality, assume $w$ is a power of two.  For an arbitrary $t \in
820: [0, \lg w]$, we define the trie of order $t$, denoted $T_t$, to be the
821: trie of depth $w / 2^t$ and alphabet of $2^t$ \emph{bits}, which
822: represents all numbers in $S$. We call $T_0$ the \emph{primary trie}
823: (this is the classic binary trie with elements from $S$). Observe that
824: we can assign distinct names of $O(w)$ bits to all nodes in all
825: tries. We call \emph{active paths} the paths in the tries which
826: correspond to elements of $S$. A node $v$ from $T_t$ corresponds to a
827: subtree of depth $2^t$ in the primary trie; we denote the root of this
828: subtree by $r_0(v)$. A node from $T_t$ corresponds to a 2-level
829: subtree in $T_{t-1}$; we call such a subtree a \emph{natural
830: subtree}. Alternatively, a 2-level subtree of any trie is natural iff
831: its root is at an even depth.
832: 
833: A root-to-leaf path in the primary trie is seen as the leaves of the
834: tree used for the greater-than problem. The paths from the primary
835: trie are broken into chunks of length $2^t$ in the trie of order
836: $t$. So $T_t$ is similar to the $t$-th level (counted bottom-up) of
837: the greater-than tree. Indeed, every node on the $t$-th level of that
838: tree held information about a subtree with $2^t$ leaves; here one edge
839: in $T_t$ summarizes a segment of length $2^t$ bits.  Also, a natural
840: subtree corresponds to two siblings in the greater-than structure. On
841: the next level, the two siblings are contracted into a node; in the
842: trie of higher order, a natural subtree is also contracted into a
843: node. It will be very useful for the reader to hold these parallels in
844: mind, and realize that the data structure from this section is
845: implementing the old recursion idea \emph{on every path}.
846: 
847: The root-to-leaf paths corresponding to the values in $S$ determine at
848: most $n-1$ branching nodes in any trie. By convention, we always
849: consider roots to be branching nodes. For every branching node from
850: $T_0$, we consider the extreme points of the interval spanned by the
851: node's subtree. By doubling the universe size, we can assume these are
852: never elements of $S$ (alternatively, such extreme points are formal
853: rationals like $x + \frac{1}{2}$). We define $\overline{S}$ to be the
854: union of $S$ and the two special values for each branching node in the
855: primary trie; observe that $|\overline{S}| = O(n)$. We are interested
856: in holding $\overline{S}$ for navigation purposes: it gives a way to
857: find in constant time the maximum and minimum element from $S$ that
858: fits under a branching node (because these two values should be the
859: elements from $S$ closest to the special values for the branching
860: node).
861: 
862: \smallskip
863: 
864: Our data structure has the following components:
865: 
866: \begin{itemize}
867: \item[1.] a linked list with all elements of $S$ in increasing order,
868:   and a predecessor structure for $S$.
869:   
870: \item[2.] a linked list with all elements of $\overline{S}$ in
871:   increasing order, accompanied by a navigation structure which
872:   enables us to find in constant time the largest value from $S$
873:   smaller than a given value from $\overline{S} \setminus S$.  We also
874:   hold a predecessor structure for $\overline{S}$.
875:   
876: \item[3.] every branching node from the primary trie holds pointers to
877:   its lowest branching ancestor, and the two branching descendants
878:   (the highest branching nodes from the left and right subtrees; we
879:   consider leaves associated with elements from $S$ as branching
880:   descendants). We also hold pointers to the two extreme values
881:   associated with the node in the list in item 2. Finally, we hold a
882:   hash table with these branching nodes.
883: 
884: \item[4.] for each $t$, and every node $v$ in $T_t$, which is either a
885:   branching node or a child of a branching node on an active path, we
886:   hold the depth of the lowest branching ancestor of $r_0(v)$, using a
887:   Bloomier filter.
888: \end{itemize}
889: 
890: We begin by showing that this data structure takes linear space. Items
891: 1-3 handle $O(n)$ elements, and have constant overhead per element.
892: We show below that the navigation structure from 2.~can be implemented
893: in linear space. The predecessor structure should also use linear
894: space; for van Emde Boas, this can be achieved through hashing
895: \cite{willard83predecessor}.
896: 
897: In item 4., there are $O(n)$ branching nodes per trie. In addition,
898: there are $O(n)$ children of branching nodes which are on active
899: paths. Thus, we consider $O(n\lg w)$ nodes in total, and hold $O(\lg
900: w)$ bits of information for each (a depth). Using our solution for the
901: Bloomier filter, this takes $O(n(\lg w)^2 + w)$ bits, which is $o(n)$
902: words. Note that storing the depth of the branching ancestor is just a
903: trick to reduce space. Once we have a node in $T_0$ and we know the
904: depth of its branching ancestor, we can calculate the ancestor in
905: $O(1)$ time (just ignore the bits below the depth of the ancestor). So
906: in essence these are ``compressed pointers'' to the ancestors.
907: 
908: We now sketch the navigation structure from item 2. Observe that the
909: longest run in the list of elements from $\overline{S} \setminus S$
910: can have length at most $2w$. Indeed, the leftmost and rightmost
911: extreme values for the branching nodes form a parenthesis structure;
912: the maximum depth is $w$, corresponding to the maximum depth in the
913: trie. Between an open and a closed parenthesis, there must be at least
914: one element from $S$, so the longest uninterrupted sequence of
915: parentheses can be $w$ closed parentheses and $w$ open parentheses.
916: 
917: The implementation of the navigation structure uses classic ideas. We
918: bucket $\Theta(\sqrt{w})$ consecutive elements from the list, and then
919: we bucket $\Theta(\sqrt{w})$ buckets. Each bucket holds a summary
920: word, with a bit for each element indicating whether it is in $S$ or
921: not; second-order buckets hold bits saying whether first order buckets
922: contain at least one element from $S$ or not. There is also an array
923: with pointers to the elements or first order buckets. By shifting, we
924: can always insert another summary bit in constant time when something
925: is added. However, we cannot insert something in the array in constant
926: time; to fix that, we insert elements in the array on the next
927: available position, and hold the correct permutation packed in a word
928: (using $O(\sqrt{w} \lg w)$ bits). To find an element from $S$, we need
929: to walk $O(1)$ buckets. The time is $O(1)$ per traversed bucket, since
930: we can use the classic constant-time subroutine for finding the most
931: significant bit \cite{fredman93fusion}.
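
The per-bucket step can be sketched as a single word operation. The sketch assumes that bit $j$ of the summary word is set exactly when the $j$-th element of the bucket is in $S$, with positions increasing along the list; Python's \texttt{int.bit\_length} stands in for the constant-time most-significant-bit subroutine, and the function name is hypothetical.

```python
def nearest_set_bit_below(summary, i):
    """Largest j < i with bit j of summary set, or -1 if there is none."""
    masked = summary & ((1 << i) - 1)   # keep only positions 0 .. i-1
    return masked.bit_length() - 1      # index of the most significant bit
```

A result of $-1$ means the bucket holds no element of $S$ before position $i$, so the search moves on to the summary of the enclosing second-order bucket.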
932: 
933: We also describe a useful subroutine, $\func{test-branching}(v)$,
934: which tests whether a node $v$ from some $T_t$ is a branching node. To
935: do that, we query the structure in item 4.~to find the lowest
936: branching ancestor of $r_0(v)$. This value is defined if $v$ is a
937: branching node, but the Bloomier filter may return an arbitrary result
938: otherwise. We look up the purported ancestor in the structure of item
939: 3. If the node is not a branching node, the value in the Bloomier filter
940: for $v$ was bogus, so $v$ is not a branching node. Otherwise, we
941: inspect the two branching descendants of this node. If $v$ is a
942: branching node, one of these two descendants must be mapped to $v$ in
943: the trie of order $t$, which can be tested easily.
944: 
945: 
946: \subsection{Implementation of Updates}
947: 
948: We only discuss insertions; deletions follow parallel steps
949: uneventfully. We first insert the new element in $S$ and
950: $\overline{S}$ using the predecessor structures. Inserting a new
951: element creates exactly one branching node $v$ in the primary trie.
952: This node can be determined by examining the predecessor and successor
953: in $S$. Indeed, the lowest common ancestor in the primary trie can be
954: determined by taking an xor of the two values, finding the most
955: significant bit, and then masking everything below that bit from the
956: original values \cite{alstrup01range}.
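
The xor computation can be sketched as follows. As an assumption of this sketch, a node of the primary trie is represented by the pair (depth, prefix), where the prefix consists of the top bits shared by all leaves below the node; shifting the low bits away plays the role of the masking step.

```python
def lowest_common_ancestor(x, y, w):
    """Return (depth, prefix) of the LCA of leaves x != y in the binary trie."""
    diff = x ^ y
    msb = diff.bit_length() - 1        # highest bit position where x and y differ
    depth = w - 1 - msb                # number of leading bits shared by x and y
    prefix = x >> (msb + 1)            # drop the differing bit and everything below
    return depth, prefix
```

For example, with $w = 8$, the values $10110100_2$ and $10111000_2$ share their first four bits, so the lowest common ancestor is the node at depth 4 with prefix $1011_2$.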
957: 
958: We calculate the extreme values for the new branching node $v$, and
959: insert them in $\overline{S}$ using the predecessor structure. Finding
960: the branching ancestor of $v$ is equivalent to finding the enclosing
961: parentheses for the pair of parentheses which was just inserted. But
962: $\overline{S}$ has a special structure: a pair of parentheses always
963: encloses two subexpressions, which are either values from $S$, or a
964: parenthesized expression (i.e., the branching nodes from $T_0$ form a
965: binary tree structure). So one of the enclosing parentheses is either
966: immediately to the left, or immediately to the right of the new
967: pair. We can traverse a link from there to find the branching
968: ancestor. Once we have this ancestor, it is easy to update the local
969: structure of the branching nodes from item 3. Until now, the time is
970: dominated by the predecessor structure.
971: 
972: It remains to update the structure in item 4. For each $t > 0$, we can
973: either create a new branching node in $T_t$, or the branching node
974: existed already (this is possible for $t > 0$ because nodes have many
975: children). We first test whether the branching node existed or not
976: (using the $\func{test-branching}$ subroutine). If we need to
977: introduce a branching node, we simply add a new entry in the
978: Bloomier filter with the depth of the branching ancestor of $v$. It
979: remains to consider active children of branching nodes, for which we
980: must store the depth of $v$. If we have just introduced a branching
981: node, it has exactly two active children (if there exist more than two
982: children on active paths, the node was a branching node before). These
983: children are determined by looking at the branching descendants of
984: $v$; these give the two active paths going into $v$. Both descendants
985: are mapped to active children of the new branching node from $T_t$. If
986: the branching node already existed, we must add one active child,
987: which is simply the child that the path to the newly inserted value
988: follows. Thus, to update item 4., we spend constant time per $T_t$. In
989: total, the running time of an update is $T_{pred} + O(\lg w) = O(\lg
990: w)$.
991: 
992: \subsection{Implementation of Queries}
993: 
994: Remember that a query receives an interval $[a,b]$ and must return a
995: value in $S \cap [a,b]$, if one exists. We begin by finding the node
996: $v$ which is the lowest common ancestor of $a$ and $b$ in the primary
997: trie; this takes constant time \cite{alstrup01range}. Note that $v$
998: spans an interval which includes $[a,b]$. The easiest case is when $v$
999: is a branching node; this can be recognized by a lookup in the hash
1000: table from item 3. If so, we find the two branching descendants of
1001: $v$; call the left one $v_L$ and the right one $v_R$. Then, if $S \cap
1002: [a,b] \ne \emptyset$, either the rightmost value from $S$ that fits
1003: under $v_L$ or the leftmost value from $S$ that fits under $v_R$
1004: must be in the interval $[a,b]$. This is so because $[a,b]$ straddles
1005: the middle point of the interval spanned by $v$. The two values
1006: mentioned above are the two values from $S$ closest (on both sides) to
1007: this middle point, so if $[a,b]$ is non-empty, it must contain one of
1008: these two. To find these two values, we follow a pointer from $v_L$ to
1009: its right extreme point in $\overline{S}$. Then, we use the navigation
1010: structure from item 2., and find the predecessor from $S$ of this
1011: value in constant time. The leftmost value under $v_R$ is the next
1012: element from $S$. Altogether, the case when $v$ is a branching node
1013: takes constant time.
1014: 
1015: Now we must handle the case when $v$ is not a branching node. If $S
1016: \cap [a,b] \ne \emptyset$, it must be the case that $v$ is on an
1017: active path. Below we describe how to find the lowest branching
1018: ancestor of $v$, \emph{assuming that $v$ is on an active path}. If
1019: this assumption is violated, the value returned can be arbitrary. Once
1020: we have the branching ancestor of $v$, we find the branching
1021: descendant $w$ which is in $v$'s subtree. Now it is easy to see, by
1022: the same reasoning as above, that if $[a,b] \cap S \ne \emptyset$
1023: either the leftmost or the rightmost value from $S$ which is under $w$
1024: must be in $[a,b]$. These two values are found in constant time using
1025: the navigation structure from item 2., as described above. So if
1026: $[a,b] \cap S \ne \emptyset$, we can find an element inside $[a,b]$.
1027: If neither of these two elements is in $[a,b]$, it must be the case that
1028: $[a,b] \cap S$ is empty, because the algorithm works correctly when $[a,b]
1029: \cap S \ne \emptyset$.
1030: 
1031: It remains to show how to find $v$'s branching ancestor, assuming $v$
1032: is on an active path, but is not a branching node. If for some $t >
1033: 0$, $v$ is mapped to a branching node in $T_t$, it will also be mapped
1034: to a branching node in tries of higher order. We are interested in the
1035: smallest $t$ for which this happens. We find this $t$ by binary
1036: search, taking time $O(\lg\lg w)$. For some proposed $t$, we check
1037: whether the node to which $v$ is mapped in $T_t$ is a branching node
1038: (using the $\func{test-branching}$ subroutine). If it is, we continue
1039: searching below; otherwise, we continue above.
1040: 
1041: Suppose we found the smallest $t$ for which $v$ is mapped to a
1042: branching node. In $T_{t-1}$, $v$ is mapped to some $z$ which is
1043: \emph{not} a branching node. Finding the lowest branching ancestor of
1044: $v$ is identical to finding the lowest branching ancestor of $r_0(z)$
1045: in the primary trie (since $z$ is not a branching node, there is no
1046: branching node in the primary trie in the subtree corresponding to
1047: $z$). Since in $T_t$ $z$ gets mapped to a branching node, its natural
1048: subtree in $T_{t-1}$ must contain at least one branching node. We have
1049: two cases: either $z$ is the root or a leaf of the natural subtree
1050: (remember that a natural subtree has two levels). These can be
1051: distinguished based on the parity of $z$'s depth. If $z$ is a leaf,
1052: the root must be a branching node (because there is at least another
1053: active leaf). But then $z$ is an active child of a branching node, so
1054: item 4.~tells us the branching ancestor of $r_0(z)$. Now consider the
1055: case when $z$ is the root of the natural subtree. Then $z$ is above
1056: any branching node in its natural subtree, so to find the branching
1057: ancestor of $r_0(z)$ we can find the branching ancestor of the node
1058: from $T_t$ to which the natural subtree is mapped. But this is a
1059: branching node, so the structure in item 4.~gives the desired
1060: branching ancestor. To summarize, the only super-constant cost is the
1061: binary search for $t$, which takes $O(\lg\lg w)$ time.
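
The binary search for $t$ can be illustrated by the sketch below. It is a brute-force model rather than the data structure itself: \texttt{is\_branching} scans $S$ directly and merely stands in for $\func{test-branching}$, nodes of the primary trie are represented as (depth, prefix) pairs, and the sketch assumes $|S| \ge 2$ and a non-leaf node on an active path. All function names are hypothetical.

```python
def node_in_order_t(depth, prefix, t):
    """Map a primary-trie node (depth, prefix) to the node of T_t containing it."""
    big_depth = depth >> t                  # depth in T_t
    root_depth = big_depth << t             # depth of r_0 in the primary trie
    return big_depth, prefix >> (depth - root_depth)

def is_branching(S, w, t, big_depth, root_prefix):
    """Brute-force stand-in for test-branching on a node of T_t."""
    root_depth = big_depth << t
    chunk = 1 << t                          # one edge of T_t covers 2**t bits
    children = set()
    for x in S:
        if x >> (w - root_depth) == root_prefix:
            children.add(x >> (w - root_depth - chunk))
    return len(children) >= 2

def smallest_branching_order(S, w, depth, prefix):
    """Binary search for the smallest t at which the node maps to a branching node."""
    lo, hi = 0, w.bit_length() - 1          # hi = lg w; T_{lg w} consists of the root
    while lo < hi:
        mid = (lo + hi) // 2
        if is_branching(S, w, mid, *node_in_order_t(depth, prefix, mid)):
            hi = mid                        # branching here, so search below
        else:
            lo = mid + 1                    # not branching, so search above
    return lo
```

The search is valid because being mapped to a branching node is monotone in $t$; it performs $O(\lg\lg w)$ probes.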
1062: 
1063: 
1064: \section{Tradeoffs from Dynamic Range Reporting}
1065: 
1066: Fix a value $B \in [2,\sqrt{w}]$; varying $B$ will give our tradeoff
1067: curve.  For an arbitrary $t \in [0, \lg_B w]$, we define the trie of
1068: order $t$ to be the trie of depth $w / B^t$ and alphabet of $B^t$
1069: bits, which represents all numbers in $S$. We call the trie for $t =
1070: 0$ the primary trie. A node $v$ in a trie of order $t$ is represented
1071: by a subtree of depth $B^t$ in the primary trie; we say that the root
1072: of this subtree ``corresponds to'' the node $v$. A node from a trie of
1073: order $t$ is represented by a subtree of depth $B$ in the trie of
1074: order $t-1$; we call such a subtree a ``natural depth-$B$
1075: subtree''. Alternatively, a depth-$B$ subtree is natural if it starts
1076: at a depth divisible by $B$.
1077: 
1078: The root-to-leaf paths from the primary trie are broken into chunks of
1079: length $B^t$ in the trie of order $t$. A trie of order $t$ is similar
1080: to the $t$-th level (counted bottom-up) of the tree used for the
1081: greater-than problem, since a path in the primary trie is seen as the
1082: leaves of that tree. Indeed, every node on the $t$-th level of that
1083: tree held information about a subtree with $B^t$ leaves; here one edge
1084: in a trie of order $t$ summarizes a segment of length $B^t$ bits.
1085: Also, a natural depth-$B$ subtree corresponds to $B$ siblings in the
1086: old structure. On the next level, the $B$ siblings are contracted into
1087: a node; in the trie of higher order, a natural depth-$B$ subtree is
1088: also contracted into a node. 
1089: 
1090: Our data structure has the following new components:
1091: 
1092: \begin{itemize}
1093:   
1094: \item[5A.] choose this for the first branch of the tradeoff (faster
1095:   updates, slower queries): hold the same information as in 4.~for
1096:   each $t$, and every node $v$ in the trie of order $t$ which is not a
1097:   branching node, is on an active path, and is the child of a
1098:   branching node in the trie of order $t$.
1099:   
1100: \item[5B.] choose this for the second branch of the tradeoff: hold the
1101:   same information as above for each $t$, and every node $v$ which is
1102:   not a branching node, is on an active path, and has a branching
1103:   ancestor in the same natural depth-$B$ subtree.
1104: \end{itemize}
1105: 
1106: In item 5A., notice that for every $t$ there are at most $2n - 2$
1107: children of branching nodes which are on active paths. We store $O(\lg
1108: w)$ bits for each, and there are $O(\lg_B w)$ values of $t$, so we can
1109: store this in a Bloomier filter with $o(n)$ words of space. In item
1110: 5B., the number of interesting nodes blows up by at most $B$ compared
1111: to 5A., and since $B \leq \sqrt{w}$, we are still using $o(n)$ words
1112: of space.
1113: 
1114: \paragraph{Updates.}
1115: For each $t > 0$, we can either create a new branching node in the
1116: trie of order $t$, or the branching node existed already. We first
1117: test whether the branching node existed or not. If we just introduced
1118: a branching node, it has at most two children which are not branching
1119: nodes and are on active paths (if more than two such children exist,
1120: the node was a branching node before). If the branching node was old,
1121: we might have added one such child. These children are determined by
1122: looking at the branching descendants of $v$ (these give the two active
1123: paths going into $v$, one or both of which are new active paths going
1124: into the node in the subtrie of order $t$). For such children, we add
1125: the depth of $v$ in the structure from item 5A. If we are in case 5B,
1126: we follow both paths either until we find a branching node, or the
1127: border of the natural depth-$B$ subtree. For each of these $O(B)$
1128: positions, we add the depth of $v$ in item 5B. To summarize, the
1129: running time is $O(T_{pred} + \lg_B w)$ if we need to update 5A., and
1130: $O(T_{pred} + B \lg_B w)$ if we need to update 5B.
1131: 
1132: 
1133: \paragraph{Queries.}
1134: We need to show how to find $v$'s branching ancestor, assuming $v$ is
1135: on an active path, but is not a branching node. For some $t > 0$, and
1136: all $t$'s above that value, $v$ will be mapped in the trie of order
1137: $t$ to some branching node. That is the smallest $t$ such that the
1138: depth-$B^t$ natural subtree containing $v$ contains some branching
1139: node. We find this $t$ by binary search, taking time $O(\lg(\lg_B
1140: w))$. For some proposed $t$, we check if the node to which $v$ is
1141: mapped is a branching node in the trie of order $t$ (using the
1142: subroutine described above). If it is, we continue searching below;
1143: otherwise, we continue above.
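
The binary search can be summarized as follows (a sketch; the names
$P(t)$ and $t^\star$ are introduced here only for illustration). Let
$P(t)$ denote the statement that $v$ is mapped to a branching node in
the trie of order $t$. By the observation above, $P$ is monotone:
\[
  P(t) \textrm{ is false for } t < t^\star
  \quad \textrm{and true for } t \geq t^\star,
\]
where $t^\star$ is the value we seek. Since $t$ ranges over $O(\lg_B
w)$ values, binary search evaluates $P$ at $O(\lg(\lg_B w))$ orders,
and the stated bound follows provided each evaluation takes constant
time. For instance, with the purely illustrative parameters $w = 64$
and $B = 2$, there are $\lg_B w = 6$ candidate orders, so $\lceil \lg 6
\rceil = 3$ evaluations of $P$ suffice.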
1144: 
1145: Say we found the smallest $t$ for which $v$ is mapped to a branching
1146: node. In the trie of order $t-1$, $v$ is mapped to some $w$ which is
1147: not a branching node. Finding the lowest branching ancestor of $v$ is
1148: identical to finding the lowest branching ancestor of the node
corresponding to $w$ in the primary trie (since $w$ is not a
branching node, there is no branching node in the primary trie in the
subtree represented by $w$). In the trie of order $t$, $w$ gets mapped
to a branching node, so the natural depth-$B$ subtree of $w$ contains
at least one branching node. Then either: (1) there is some branching
1154: node above $w$ in its natural depth-$B$ subtree, or (2) $w$ is on the
1155: active path going to the root of this natural subtree (it is above any
1156: branching node).
1157: 
1158: We first deal with case (2). If $w$ is above any branching node in its
1159: natural subtree, to find $w$'s branching ancestor we can find the
1160: branching ancestor of the node from the trie of order $t$, to which
1161: this subtree is mapped. But this is a branching node, so the structure
1162: in item 4.~gives the branching ancestor $z$. We can test that we are
indeed in case (2), and not case (1), by looking at the two branching
descendants of $z$, and checking that one of them is strictly under
$v$.
1166: 
1167: Now we deal with case (1). If we have the structure 5B., this is
1168: trivial. Because $w$ is on an active path and has a branching ancestor
1169: in its natural depth-$B$ subtree, it records the depth of the
1170: branching ancestor of the node corresponding to $w$ in the primary
1171: trie. So in this case, the only super-constant cost is the binary
1172: search for $t$, which is $O(\lg(\lg_B w))$. If we only have the
1173: structure 5A., we need to walk up the trie of order $t-1$ starting
1174: from $w$. When we reach the child of the branching node above $w$, the
1175: branching node from the primary trie is recorded in item 5A. Since the
1176: branching node is in the same natural depth-$B$ subtree as $w$, we
reach this point after $O(B)$ steps. One last detail is that we do not
actually know when we have reached the child of a branching node
(because the Bloomier filter from item 5A.~can return arbitrary
results for nodes not satisfying this property). To cope with this, at
each level we optimistically assume that we have reached the
destination: we query the structure in item 5A., find the purported
branching ancestor, and check that it really is the lowest branching
ancestor of $v$. This check takes constant time; if the result is
wrong, we continue walking up the trie. Overall, with the structure of
5A.~we need query time $O(\lg(\lg_B w) + B)$.
1187: 
1188: We have shown how to achieve the same running times (as functions of
1189: $B$) as in the case of the greater-than function. The same calculation
1190: establishes our tradeoff curve.
1191: 
1192: 
1193: \section{Lower Bounds for the Greater-Than Problem}
1194: 
A lower bound for the first branch of the tradeoff can be obtained
based on Fredman's proof idea \cite{fredman82sums}. We omit the
details for now. To get a lower bound for the second case ($T_q <
O(\lg\lg n)$), we use the sunflower lemma of Erd\H{o}s and Rado. A
sunflower is a collection of sets (called petals) such that the
intersection of any two of the sets is equal to the intersection of
all the sets.
1202: 
1203: \begin{lemma}[Sunflower Lemma]
1204:   Consider a collection of $n$ sets, of cardinalities at most $s$. If
1205:   $n > (p-1)^{s+1} s!$, the collection contains as a subcollection a
1206:   sunflower with $p$ petals.
1207: \end{lemma}
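
As a small illustration (the sets below are chosen only as an
example), the collection
\[
  \{1,2,3\}, \quad \{1,2,4\}, \quad \{1,2,5\}
\]
is a sunflower with $p = 3$ petals: any two of the sets intersect in
$\{1,2\}$, which is also the intersection of all three. Instantiating
the lemma with $s = 2$ and $p = 3$ gives $(p-1)^{s+1} s! = 2^3 \cdot 2
= 16$, so any collection of $17$ distinct sets of cardinality at most
$2$ contains a sunflower with $3$ petals.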
1208: 
For every query parameter in $[0,n-1]$, the algorithm performs at most
$T_q$ probes to the memory. Thus, there are $2^{T_q}$ possible
execution paths, and at most $2^{T_q} - 1$ bit cells are probed on at
least one execution path. This gives $n$ sets of cells of sizes at
most $s = O(2^{T_q})$; we call these sets query schemes. By the
sunflower lemma, we can find a sunflower with $p$ petals if $p$
satisfies $n > (p-1)^{s+1} s!$; taking logarithms, this holds whenever
$\lg n > \Theta(s (\lg p + \lg s))$. If $T_q < (1-\epsilon) \lg\lg n$,
we have $s\lg s = o(\lg n)$, so our condition becomes $\lg n >
\Theta(s \lg p)$. So we can find a sunflower with $p$ petals such that
$\lg p = \Omega((\lg n)/s)$. Let $P$ be the set of query parameters
whose query schemes are these $p$ petals.
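
The logarithmic form of the sunflower condition can be checked as
follows (a sketch, with constants left implicit). For $s \geq 1$,
\begin{align*}
  \lg \left( (p-1)^{s+1} s! \right)
    &= (s+1) \lg (p-1) + \lg s! \\
    &\leq 2s \lg p + s \lg s
     = O\left( s (\lg p + \lg s) \right),
\end{align*}
so it suffices that $\lg n$ exceeds a suitable constant times $s (\lg p
+ \lg s)$. Moreover, if $T_q < (1-\epsilon) \lg\lg n$, then $s =
O(2^{T_q}) = O((\lg n)^{1-\epsilon})$ and $\lg s = O(\lg\lg n)$, hence
\[
  s \lg s = O\left( (\lg n)^{1-\epsilon} \lg\lg n \right) = o(\lg n).
\]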
1221: 
The center of the sunflower (the intersection of all sets) obviously
has size at most $s$. Now consider the update schemes for the numbers
in $P$. We can always find $T \subset P$ such that $|T| \geq |P| /
2^s$ and the update schemes for all numbers in $T$ look identical if
we only inspect the center of the sunflower. Thus $\lg |T| \geq \lg |P|
- s = \Omega(\frac{\lg n}{s}) - s$. If $T_q < (\frac{1}{2} - \epsilon)
\lg\lg n$, we have $s = o(\frac{\lg n}{s})$ (recall that $s =
O(2^{T_q})$), so we obtain $\lg |T| = \Omega(\frac{\lg n}{s})$.
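
In more detail (a sketch of the counting, with constants left
implicit): the center consists of at most $s$ bit cells, so there are
at most $2^s$ possible assignments to it, and by the pigeonhole
principle some assignment is produced by the updates of at least $|P| /
2^s$ numbers in $P$. The term $s$ is of lower order exactly when $s^2 =
o(\lg n)$. Since $s = O(2^{T_q})$, this holds as soon as $2^{T_q} \leq
(\lg n)^{1/2 - \epsilon}$:
\[
  s^2 = O\left( (\lg n)^{1 - 2\epsilon} \right) = o(\lg n)
  \quad \Longrightarrow \quad
  s = o\left( \frac{\lg n}{s} \right).
\]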
1230: 
1231: Now we restrict our attention to numbers in $T$ for both the update
1232: and query value. The cells in the center of the sunflower are thus
1233: fixed. Define the natural result of a certain query to be the result
1234: (greater than vs. not greater than) of the query if all bit cells read
1235: by the query outside the center of the sunflower are zero. Now pick a
1236: random $x \in T$. For some $y$ in the middle third of $T$ (when
1237: considering the elements of $T$ in increasing order), we have $\Pr[y
1238: \leq x] \geq \frac{1}{3}, \Pr[y > x] \geq \frac{1}{3}$, so no matter
1239: what the natural result of querying $y$ is, it is wrong with
probability at least $\frac{1}{3}$. So for a random $x$, the expected
fraction of natural results that are wrong is at least $\frac{1}{9}$.
Consider an explicit $x$ for which at least a $\frac{1}{9}$ fraction is
wrong. The update scheme for $x$ must set
1243: sufficiently many cells to change these natural results. But these
1244: cells can only be in the petals of the queries whose natural results
1245: are wrong, and the petals are disjoint except for the center, which is
1246: fixed. So the update scheme must set at least one cell for every
1247: natural result which is wrong. Hence $T_u \geq |T|/9 \Rightarrow \lg
1248: T_u = \Omega(\lg |T|) = \Omega(\frac{\lg n}{s}) = \Omega(\frac{\lg
1249:   n}{2^{T_q}}) \Rightarrow 2^{T_q} = \Omega(\lg_{T_u} n)$.
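
The final chain of implications can be unpacked as follows (a sketch,
with hidden constants left implicit and $|T|$ assumed large). Taking
logarithms in $T_u \geq |T|/9$ and using $s = O(2^{T_q})$,
\begin{align*}
  \lg T_u &\geq \lg |T| - \lg 9
           = \Omega\left( \frac{\lg n}{s} \right)
           = \Omega\left( \frac{\lg n}{2^{T_q}} \right), \\
  2^{T_q} &= \Omega\left( \frac{\lg n}{\lg T_u} \right)
           = \Omega(\lg_{T_u} n).
\end{align*}
For example, in the regime $T_q < (\frac{1}{2} - \epsilon) \lg\lg n$
we have $2^{T_q} < (\lg n)^{1/2 - \epsilon}$, so the first line gives
$\lg T_u = \Omega((\lg n)^{1/2 + \epsilon})$; in particular, the
update time cannot be polylogarithmic in $n$ in this regime.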
1250: 
1251: 
1252: 
1253: \paragraph*{Acknowledgement.}  
1254: The authors would like to thank Gerth Brodal for discussions in the
1255: early stages of this work, in particular on how the results could be
1256: extended to dynamic range counting.
1257: 
1258: 
1259: \bibliographystyle{plain}
1260: \bibliography{../general}
1261: 
1262: %\begin{thebibliography}{10}
1263: %  \setlength{\parsep}{0pt}
1264: %  \setlength{\itemsep}{0pt}
1265: 
1266: \end{document}
1267: 
1268: % cite bloom using:
1269: % @Article{Bloom:1970:STT,
1270: % author =       "Burton H. Bloom",
1271: % title = "Space\slash Time Trade-offs in Hash Coding with Allowable Errors",
1272: % journal =      "Communications of the {ACM}",
1273: % volume =       "13",
1274: % number =       "7",
1275: % pages =        "422--426",
1276: % year =         "1970",
1277: %}
1278: