1: \begin{definition}[$L_p$ prefix encoding]
2: Let $1 < p \le 2$.
3: Consider a word $S$ of length $b$ over an alphabet of size~$\sigma$. Define $q_0 = b$. For $k = 0, \ldots, \ceil{\log (b \sigma^p)}$, let $q_k \le q_{k-1}$ be the leftmost position such that the $p$'th moment of the difference between $S[q_k, b]$ and $P[1,b-q_k+1]$, i.e., $\norm{S[q_k, b]-P[1,b-q_k+1]}_p^p$, is at most~$2^k$.
4: 
5: Further, divide $S[q_k, b]$ into $\Theta(1/\eps^{p})$ blocks such that each block is either a single character, or the $p$'th moment of the difference between the block and the corresponding subword of $P[1,b-q_k+1]$ is at most $\eps^{p} \cdot 2^{k}$. Let $q_k = q_k^0 \le q_k^1 \le \ldots \le q_k^{\ell_k} = b$ be the block borders. We choose $q_k^1, q_k^2, \ldots, q_k^{\ell_k}$ from left to right, and each position $q_k^i$ is chosen to be the rightmost possible.
6: 
7: The \emph{$L_p$ prefix encoding} of $S$ is defined to contain sorted lists of the positions $q_k$ and $q_k^i$, the characters $S[q_k^i]$, and sketches for $(1\pm C_p \cdot\eps / p)$-approximating the $p$'th norm of $S[q_k^i, b]$, for all $k$ and $i$, where $C_p$ is as in Observation~\ref{obs:norm_moment} (see also Theorem~\ref{th:psketch}).
8: \end{definition}
9:
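
\medskip\noindent\textbf{Illustration.}
The following toy instance is only a sketch of how the definition can be read. The concrete words, the choices $p = 2$ and $\eps = 1/2$, and the shorthand $m(q)$ are assumptions made for this example and are not part of the definition. Let $b = 4$, $S = 3\,1\,2\,1$ and $P[1,4] = 1\,2\,2\,3$, and write $m(q) = \norm{S[q, b]-P[1,b-q+1]}_p^p$. Then
\[
\begin{array}{rcl}
m(4) &=& (1-1)^2 = 0, \\
m(3) &=& (2-1)^2 + (1-2)^2 = 2, \\
m(2) &=& (1-1)^2 + (2-2)^2 + (1-2)^2 = 1, \\
m(1) &=& (3-1)^2 + (1-2)^2 + (2-2)^2 + (1-3)^2 = 9.
\end{array}
\]
Note that $m(q)$ need not be monotone in $q$ (here $m(3) > m(2)$), which is why the definition asks for the \emph{leftmost} position. For the threshold $2^1 = 2$ the positions $2, 3, 4$ qualify, so $q_1 = 2$. The thresholds $2^2 = 4$ and $2^3 = 8$ still exclude position $1$, so $q_2 = q_3 = 2$, whereas $2^4 = 16 \ge m(1)$ gives $q_4 = 1$.

For $k = 1$ the block threshold is $\eps^{p} \cdot 2^{k} = \frac{1}{4} \cdot 2 = \frac{1}{2}$. The characters of $S[2,4] = 1\,2\,1$ differ from those of $P[1,3] = 1\,2\,2$ by $0, 0, 1$. Extending the first block greedily to the right, $S[2,3]$ has moment $0 \le \frac{1}{2}$, while $S[2,4]$ has moment $1 > \frac{1}{2}$, so the first block is $S[2,3]$. The remaining character $S[4]$ exceeds the threshold on its own and therefore forms a single-character block.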