1: \documentclass{article}
2:
3: \usepackage{graphicx}
4: \usepackage{amssymb,amsmath}
5:
6: \newcommand{\de}{{\rm d}}
7: \newtheorem{theorem}{Theorem}
8:
9:
10: \title{Nonparametric estimation of distribution and density functions in
presence of missing data: an IFS approach}
11:
12: \author{Stefano M. Iacus\footnote{Department of Economics, Via Conservatorio 7, I-20122 Milan - Italy, email: stefano.iacus@unimi.it} \and Davide La Torre}
13:
14: \DeclareGraphicsExtensions{.eps}
15:
16:
17: \begin{document}
18:
19: \maketitle
20:
21: \begin{abstract}
22: % Text of abstract
In this paper we consider a class of nonparametric estimators of a distribution function $F$, with compact support, based on the theory of iterated function systems (IFSs). The estimator
of $F$ is thought of as the fixed point of a
contractive operator $T$ defined in terms of a vector of parameters $p$ and a family of affine maps $\mathcal W$,
both of which may depend on the sample $(X_1, X_2, \ldots, X_n)$.
Given $\mathcal W$, the problem consists in finding
a vector $p$ such that the fixed point of $T$ is ``sufficiently
near'' to $F$. It turns out that this is a quadratic constrained optimization problem that we propose to solve by penalization techniques. If $F$ has a density $f$, we can also provide an estimator of $f$ based on Fourier techniques. IFS estimators for $F$ are asymptotically equivalent to the empirical distribution function (e.d.f.) estimator. We study the relative efficiency of the IFS estimators with respect to the e.d.f. for small samples via a Monte Carlo approach.
30:
For well behaved distribution functions $F$ and for a particular family of so-called wavelet maps,
the IFS estimators can be dramatically better than the e.d.f. (or the kernel estimator for density estimation) in the presence of missing
data, i.e. when it is only possible to observe data on subsets of the whole support of $F$.
34:
35: This research has also produced a free package for the R statistical environment which is ready to be used in applications.
36: \end{abstract}
37:
38: \textbf{key words:} iterated function systems, distribution function estimation,
39: nonparametric estimation, missing data, density estimation.\\
40:
41:
42:
43: % main text
44: \section{Introduction}
45: Let $X_1, X_2, \ldots, X_n$ be an i.i.d. sample drawn from a
46: random variable $X$ with unknown distribution function
47: $F$ with compact support $[\alpha,\beta]$. The empirical
48: distribution function (e.d.f.) $$ \hat F_n(x) =
49: \frac{1}{n}\sum_{i=1}^n \chi(X_i\leq x) $$ is one commonly used
estimator of the unknown distribution function $F$ (here $\chi$ is the indicator function). The e.d.f. has
an impressive set of good statistical properties; for instance, it is
first order efficient in the minimax sense (see \cite{dkw}, \cite{beran}, \cite{levit},
\cite{millar}, and \cite{gilevit}). More recently, other,
second order efficient estimators have been proposed in the
literature for special classes of distribution functions $F$.
Golubev and Levit \cite{gl96a,gl96b} and \cite{efro} are two such
examples. It is rather curious that a step-wise function can be
58: such a good estimator and, in fact, \cite{efro} shows that,
59: for the class of analytic functions, for small sample sizes, the
60: e.d.f. is not the best estimator. In this paper we study the
61: properties of a new class of distribution function estimators
62: based on iterated function systems (IFSs) introduced by the
authors in a previous work \cite{iaclat}. IFSs were
introduced in \cite{hutch} and \cite{bd}. The main idea of the method is to
regard the estimator of $F$ as the fixed point of a contraction $T$
on a complete metric space. The operator $T$ is defined in terms
of a family of affine maps $\mathcal W$ and a vector of parameters $p$. For a given family $\mathcal W$, $T$ depends only on the choice of $p$. The idea,
known as the {\it inverse approach} (see Section \ref{sec2}), is to
determine $p$ by solving a constrained quadratic optimization
problem built in terms of sample moments. In this paper
this optimization problem is solved by a penalization method.
Because the maps are affine, the Fourier transform of $F$ is easy to derive and, when a density exists, so is an explicit formula
for the density of $F$ via the inverse Fourier transform.
In this way, given $\mathcal W$ and $p$, we have at the same time estimators for the distribution, characteristic and density functions.
75:
76: The paper is organized as follows. In Section \ref{sec2} the
77: inverse approach is presented and a penalization method is proposed in
78: order to solve a quadratic optimization problem. We also discuss
79: the choice of the family of maps $\mathcal W$. In Section
80: \ref{sec3} numerical results and comparisons with classical
81: estimators are shown for small samples via Monte Carlo Analysis.
82:
Finally we show an application of these estimators in a setting where the empirical distribution function (or the kernel density estimator for the density) cannot be applied reliably. We consider situations of missing data in which, for example, the data can only be observed on some windows of the support of $F$. This can be the case in directional data analysis when, for technical or physical reasons, instruments are not able to collect data in some ranges of angles, say $A$ and $B$, with $A, B\subseteq [0,2\pi]$. For $x$ in $A$ or $B$ the e.d.f. will be constant and, at the same time, the kernel density estimator will estimate a multimodal distribution for these data.
In this case we show examples in which the IFS estimator does its job remarkably well.
85:
86: Tables and figures can be found at the end of the paper after the references.
87:
88: \section{An IFS estimator}\label{sec2}
The theory of distribution function approximation via IFSs that we use to derive estimators is due to \cite{fv95}. Results in this section, except where explicitly mentioned, are from the cited authors. Let $\mathcal M(X)$ be the set of probability measures
on $\mathcal B(X)$, the $\sigma$-algebra of Borel subsets of $X$,
where $(X,d)$ is a compact metric space (in our case
$X=[\alpha,\beta]$ and $d$ is the Euclidean metric).
93:
In the IFS literature the following {\sl Hutchinson} metric plays
a crucial role: $$ d_H(\mu,\nu) = \sup_{f\in {\rm Lip}(X)} \left\{
\int_X f \de \mu - \int_X f \de \nu \right\}, \quad \mu,\nu \in
\mathcal M(X), $$ where $$ {\rm Lip}(X) = \{ f: X\to \mathbb R,
|f(x)-f(y)|\leq d(x,y), x,y \in X\}. $$ Then $(\mathcal M(X),d_H)$
is a complete metric space \cite[see][]{hutch}.
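As a brief illustration (a standard fact, stated here as an aside and not taken from the cited reference), on the real line the Hutchinson metric coincides, by Kantorovich--Rubinstein duality, with the $L^1$ distance between distribution functions. Writing $F_\mu$ and $F_\nu$ for the distribution functions of $\mu,\nu\in\mathcal M([\alpha,\beta])$,
$$ d_H(\mu,\nu) = \int_\alpha^\beta \left| F_\mu(x)-F_\nu(x)\right| \de x\,. $$
For instance, for two point masses $\delta_s$ and $\delta_t$ with $\alpha\leq s<t\leq\beta$, the two distribution functions differ by $1$ exactly on $[s,t)$, so that $d_H(\delta_s,\delta_t)=t-s$; the supremum in the definition is attained by $f(x)=-x$.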
100:
We denote by $({\bf w},{\bf p})$ an {\sl $N$-maps
contractive IFS on $X$ with probabilities} or simply an {\sl
$N$-maps IFS}, that is, a set of $N$ affine contraction maps,
${\bf w} = (w_1,w_2,\ldots,w_N)$, $$w_i(x) = a_i + b_i \, x,\quad {\rm with}\,\, |b_i|<1,\quad
b_i,a_i\in\mathbb R,\quad i=1,2,\ldots, N, $$ with
associated probabilities ${\bf p} = (p_1,p_2,\ldots,p_N)$,
$p_i\geq 0$, and $\sum_{i=1}^N p_i =1$. The IFS has a
contractivity factor defined as $$ c = \max_{1\leq i\leq N}
|b_i| <1. $$ Consider the following (usually called {\sl
Markov}) operator $M : \mathcal M(X)\to \mathcal M(X)$ defined as
111:
112: \begin{equation}
113: M\mu = \sum_{i=1}^N p_i \mu \circ w_i^{-1},\quad \mu \in \mathcal
114: M(X),
115: \end{equation}
116:
where $w_i^{-1}$ is the inverse function of $w_i$ and $\circ$
stands for the composition. In \cite{hutch} it was shown
that $M$ is a contraction mapping on $(\mathcal M(X),d_H)$, i.e.
for all $\mu,\nu\in \mathcal M(X)$, $d_H(M\mu,M\nu)\leq c\,
d_H(\mu,\nu)$. Thus, by the Banach fixed point theorem, there exists a unique measure
$\bar\mu\in\mathcal M(X)$, the {\sl invariant measure} of the IFS,
such that $M\bar\mu = \bar \mu$. Associated to
each measure $\mu\in \mathcal M(X)$, there exists a distribution
function $F$. In terms of $F$ the previous operator $M$ can be
rewritten as
127: $$ TF(x)=\left\{
128: \begin{array}{ll} 0 & \ \ \mbox{if $x\le \alpha$}\\
129: \\
130: \sum\limits_{i=1}^N p_i F(w_i^{-1}(x)) & \ \ \mbox{if $\alpha<x<\beta$}\\
131: \\
132: 1 & \ \ \mbox{if $x\ge \beta$}\\
133:
134: \end{array}
135: \right. $$
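To make the action of $T$ concrete, the following minimal Python sketch applies $T$ repeatedly to a starting distribution function. It is only an illustration under stated assumptions: the function names \texttt{clip\_cdf}, \texttt{apply\_T} and \texttt{iterate\_T} are hypothetical (they are not taken from any existing package), all slopes are assumed positive ($b_i>0$), and the toy IFS $w_1(x)=x/2$, $w_2(x)=x/2+1/2$ with ${\bf p}=(1/2,1/2)$ on $[0,1]$ is chosen because its invariant distribution function is the uniform one, $F(x)=x$.

```python
def clip_cdf(F0, alpha=0.0, beta=1.0):
    """Extend F0 to the whole line: 0 left of alpha, 1 right of beta."""
    def F(x):
        if x <= alpha:
            return 0.0
        if x >= beta:
            return 1.0
        return F0(x)
    return F


def apply_T(F, maps, probs):
    """Return TF, where TF(x) = sum_i p_i F(w_i^{-1}(x)) and w_i(x) = a_i + b_i x.

    `maps` is a list of (a_i, b_i) pairs with b_i > 0 (assumption of this sketch).
    """
    def TF(x):
        return sum(p * F((x - a) / b) for (a, b), p in zip(maps, probs))
    return TF


def iterate_T(F, maps, probs, n_iter):
    """Apply T n_iter times; evaluating the result costs about N**n_iter calls."""
    for _ in range(n_iter):
        F = apply_T(F, maps, probs)
    return F


# Toy IFS on [0, 1] whose invariant distribution function is F(x) = x.
maps = [(0.0, 0.5), (0.5, 0.5)]
probs = [0.5, 0.5]
start = clip_cdf(lambda x: x * x)      # deliberately wrong starting guess
approx = iterate_T(start, maps, probs, 10)
errors = [abs(approx(x) - x) for x in (0.1, 0.3, 0.5, 0.9)]
```

In this toy case each application of $T$ halves the sup distance to the fixed point, so the entries of \texttt{errors} are small after ten iterations. The recursive evaluation is exponential in the number of iterations and is meant for illustration only; a grid-based evaluation would be preferable for many maps.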
136:
137: \subsection{Minimization approach}
138:
For affine IFSs there exists a simple and useful relation between
the moments of probability measures in $\mathcal M(X)$. Given an
$N$-maps IFS $({\bf w},{\bf p})$ with associated Markov operator
$M$, and given a measure $\mu\in\mathcal M(X)$, then, for any
continuous function $f:X\to\mathbb R$,
144:
145: \begin{equation}
146: \int_X f(x) \de \nu(x) = \int_X f(x) \de(M\mu)(x) = \sum_{i=1}^N
147: p_i \int_X (f\circ w_i)(x)\de \mu(x)\,, \label{eq1}
148: \end{equation}
149:
where $\nu = M\mu$. In our case $X = [\alpha,\beta]\subset\mathbb R$, so we
readily have a relation involving the moments of $\mu$ and $\nu$.
152: Let
153: \begin{equation}
154: g_k = \int_X x^k \de\mu,\quad h_k = \int_X x^k \de \nu,\quad
155: k=0,1,2,\ldots,
156: \end{equation}
157: be the moments of the two measures, with $g_0 = h_0 = 1$. Then, by
158: \eqref{eq1}, with $f(x) = x^k$, we have $$ h_k = \sum_{j=0}^k
159: \binom{k}{j} \left\{\sum_{i=1}^N p_i b_i^j a_i^{k-j}
160: \right\} g_j,\quad k=1,2,\ldots,\,. $$
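As a quick check of this relation (a worked instance added for illustration), the first two cases read
$$ h_1 = \sum_{i=1}^N p_i\,(a_i + b_i\, g_1), \qquad
h_2 = \sum_{i=1}^N p_i\,(a_i^2 + 2 a_i b_i\, g_1 + b_i^2\, g_2), $$
that is, $h_k=\sum_{i=1}^N p_i \int_X (a_i+b_i x)^k \de\mu(x)$ expanded with the binomial theorem. In particular, each $h_k$ is linear in ${\bf p}$ and involves only the moments $g_0,\ldots,g_k$ of $\mu$.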
161: Set $X=[\alpha,\beta]$ and let $\mu$ and $\mu^{(j)} \in \mathcal M(X)$,
162: $j=1,2,\ldots$ with associated moments of any order $g_k$ and $$
163: g_k^{(j)} = \int_X x^k \de \mu^{(j)}\,. $$ Then, the following
164: statements are equivalent (as $j\to\infty$ and $\forall k\geq
165: 0$):
166: \begin{enumerate}
167: \item $g_k^{(j)}\to g_k$,
168: \item $\forall f\in {\bf C}(X)$, $\int_X f \de\mu^{(j)} \to \int_X f\de\mu\,,$ (weak* convergence),
169: \item $d_H(\mu^{(j)},\mu)\to0$.
170: \end{enumerate}
171:
(here ${\bf C}(X)$ is the space of continuous functions on $X$).
This result gives a way to find an appropriate set of maps and
probabilities by solving the so-called moment matching problem.
With the solution in hand, given the convergence of the moments,
we also have the convergence of the measures, and then the
stationary measure of $M$ approximates with given precision (in a
sense specified by the collage theorem below) the target measure
$\mu$ \cite[see][]{bd}.

The next result, called the {\sl collage} theorem, is a standard
product of IFS theory and is a consequence of the Banach fixed point theorem.
183:
{\bf (Collage theorem): }
Let $(Y,d_Y)$ be a complete metric space. Given $y\in Y$,
suppose that there exists a contractive map $f$ on $Y$ with
contractivity factor $0\leq c<1$ such that
$d_Y(y,f(y))<\varepsilon$. If $\bar y$ is the fixed point of $f$,
i.e. $f(\bar y) = \bar y$, then $d_Y(\bar y,y) <
\frac{\varepsilon}{1-c}$.
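
The bound follows in one line from the triangle inequality and the contractivity of $f$ (a short derivation, included here for completeness). Since $\bar y = f(\bar y)$,
$$ d_Y(\bar y,y) \leq d_Y(f(\bar y),f(y)) + d_Y(f(y),y) \leq c\, d_Y(\bar y,y) + d_Y(f(y),y)\,, $$
so that $(1-c)\, d_Y(\bar y,y) \leq d_Y(y,f(y)) < \varepsilon$.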
191:
192:
So if one wishes to approximate a function $y$ with the fixed
point $\bar y$ of an unknown contractive map $f$, it suffices
to solve the inverse problem of finding an $f$ that minimizes
the collage distance $d_Y(y,f(y))$.
197:
The main result of Forte and Vrscay that we use to build one
of the IFS estimators is that the inverse problem can be reduced
to minimizing a suitable quadratic form in the $p_i$, given
a set of affine maps $w_i$ and the sequence of moments $g_k$ of
the target measure. Let $$\Pi^N = \left\{ {\bf p} =
(p_1,p_2,\ldots,p_N) : p_i\geq 0, \sum_{i=1}^N p_i = 1 \right\} $$
be the simplex of probabilities. Let ${\bf w} =
(w_1,w_2,\ldots,w_N)$, $N=1,2,\ldots$, be subsets of $\mathcal W =
\{w_1,w_2,\ldots\}$, the infinite set of affine contractive maps on
$X=[\alpha,\beta]$, and let ${\bf g}$ be the set of the moments of any order of
$\mu\in\mathcal M(X)$. Denote by $M$ the Markov operator of the
$N$-maps IFS $({\bf w},{\bf p})$ and by $\nu_N = M\mu$, with
associated moment vector of any order ${\bf h}_N$. The collage
distance between the moment vectors of $\mu$ and $\nu_N$, $$
\Delta({\bf p}) = ||{\bf g}-{\bf h}_N||_{\bar l^2} : \Pi^N \to
\mathbb R, $$ is a continuous function and attains an absolute
minimum value $\Delta^N_{\min}$ on $\Pi^N$, where $$ ||{\bf
x}||^2_{\bar l^2} = x_0^2 + \sum_{k=1}^\infty \frac{x_k^2}{k^2}\,.$$
Moreover,
$\Delta^N_{\min} \to 0$ as $N\to\infty$.
Thus, the collage distance can be made arbitrarily small by
choosing a suitable number of maps and probabilities.
220:
The above inverse problem can be posed as a
quadratic programming problem in the following notation: $$ S({\bf p})
= (\Delta({\bf p}))^2 = \sum_{k=1}^\infty \frac{(h_k-g_k)^2}{k^2},
$$ $$D(X) = \{{\bf g} = (g_0,g_1,\ldots) : g_k = \int_X x^k
\de\mu, k=0,1,\ldots, \mu\in\mathcal M(X)\}. $$
226:
227: Then by \eqref{eq1} there exists a linear operator $A :D(X)\to
228: D(X)$ associated to $M$ such that ${\bf h}_N = A {\bf g}$. In
229: particular
230: \begin{equation}
231: h_k = \sum_{i=1}^N A_{ki} p_i,\quad k=1,2,\ldots
232: \quad\text{where} \quad
A_{ki} = \sum_{j=0}^k \binom{k}{j} b_i^j a_i^{k-j} g_j
234: \label{eqAik}
235: \end{equation}
Thus $$ S({\bf p}) = {\bf p}^t Q {\bf p} + {\bf B}^t {\bf p}
+C,\leqno{({\bf QP})} $$
238: $$\text{where}\quad Q=[q_{ij}],\quad q_{ij} = \sum_{k=1}^\infty
239: \frac{A_{ki} A_{kj}}{k^2},\quad i,j = 1,2,\ldots, N, $$
240: \begin{equation}
241: B_i =
242: -2\sum_{k=1}^\infty \frac{g_k}{k^2} A_{ki},\quad i=1,2,\ldots, N
243: \quad \text{and}\quad C =\sum_{k=1}^\infty \frac{g_k^2}{k^2}\,.
244: \label{eqBiC}
245: \end{equation}
The series above are convergent as $0\leq A_{ki}\leq 1$, and the
minimum can be found by minimizing the quadratic form on the
simplex $\Pi^N$.
249:
The estimator is then built by replacing the moments of the target measure with the empirical moments and by truncating the above series to finite sums.
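
A minimal Python sketch of the truncated quadratic form may help fix the notation. It is an illustration only: the helper names \texttt{collage\_matrix}, \texttt{quadratic\_form} and \texttt{S}, as well as the truncation order \texttt{M}, are assumptions of this sketch. The moments passed in may be the true $g_k$ or the sample moments. The toy check uses the uniform law on $[0,1]$, for which $g_k = 1/(k+1)$, with the two maps $x/2$ and $x/2+1/2$; in that case $S(1/2,1/2)=0$ because the uniform law is invariant.

```python
from math import comb


def collage_matrix(maps, g, M):
    """A[k-1][i] = sum_{j=0}^{k} C(k,j) b_i^j a_i^(k-j) g_j for k = 1..M.

    `maps` is a list of (a_i, b_i); `g` holds the moments g_0..g_M.
    """
    return [[sum(comb(k, j) * b**j * a**(k - j) * g[j] for j in range(k + 1))
             for (a, b) in maps]
            for k in range(1, M + 1)]


def quadratic_form(maps, g, M):
    """Return (Q, B, C) of S(p) = p'Qp + B'p + C, truncated at order M."""
    A = collage_matrix(maps, g, M)
    N = len(maps)
    Q = [[sum(A[k - 1][i] * A[k - 1][j] / k**2 for k in range(1, M + 1))
          for j in range(N)] for i in range(N)]
    B = [-2.0 * sum(g[k] * A[k - 1][i] / k**2 for k in range(1, M + 1))
         for i in range(N)]
    C = sum(g[k]**2 / k**2 for k in range(1, M + 1))
    return Q, B, C


def S(p, Q, B, C):
    """Evaluate the quadratic form at the probability vector p."""
    N = len(p)
    quad = sum(p[i] * Q[i][j] * p[j] for i in range(N) for j in range(N))
    return quad + sum(B[i] * p[i] for i in range(N)) + C


# Toy check: uniform target on [0, 1], two maps, truncation order M = 8.
M = 8
g = [1.0 / (k + 1) for k in range(M + 1)]
maps = [(0.0, 0.5), (0.5, 0.5)]
Q, B, C = quadratic_form(maps, g, M)
value_at_half = S([0.5, 0.5], Q, B, C)
value_elsewhere = S([0.9, 0.1], Q, B, C)
```

Here \texttt{value\_at\_half} vanishes up to rounding, while any other point of the simplex gives a strictly larger value.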
251:
252: \subsection{Numerical solutions}
When practical cases are considered, in particular concerning
estimation, the previous series have to be truncated, and this
implies that the matrix $Q$ is not assured to be positive definite. Standard
numerical procedures for the minimization of constrained quadratic
optimization problems that rely on positive definite quadratic forms
therefore cannot be used in this context. One approach to this problem
is to build the following penalized function $L_\lambda$
260:
261: $$ L_\lambda({\bf p})={\bf p}^t Q {\bf p} + {\bf B}^t {\bf p}
262: +C+\lambda\left(1-\sum_{i=1}^N p_i\right)^2 $$
263:
264: and then to study the following problem
265:
266: $$ \min L_\lambda({\bf p}), \ \ 0\le p_i\le 1 \leqno{(LOP)} $$
267:
268: It is trivial that an optimizer ${\bf p^*}$ of (LOP) such that
269: $\sum_{i=1}^N p_i^*=1$ is also an optimizer for the problem
270:
271: $$ \min S({\bf p}), \ \ {\bf p}\in \Pi^N \leqno{(OP)} $$
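
The penalized problem (LOP) can be illustrated with a few lines of Python. The sketch below does not use a quasi-Newton method; it substitutes a plain cyclic coordinate descent with clipping to $[0,1]$, which is enough for a small convex example. The function name \texttt{solve\_lop}, the value $\lambda=10$ and the toy data $Q=I$, ${\bf B}=(-0.6,-1.4)$, i.e. $S({\bf p})=(p_1-0.3)^2+(p_2-0.7)^2$ up to the constant $C$, are assumptions made for the illustration.

```python
def solve_lop(Q, B, lam, n_sweeps=500):
    """Minimise p'Qp + B'p + lam*(1 - sum(p))**2 over the box 0 <= p_i <= 1.

    Cyclic coordinate descent: each coordinate is set to its exact
    one-dimensional minimiser and then clipped to [0, 1]. Assumes that
    Q[i][i] + lam > 0 for every i.
    """
    N = len(B)
    # Expanding the penalty turns the quadratic part into Q + lam * (all-ones
    # matrix) and the linear part into B - 2*lam.
    H = [[Q[i][j] + lam for j in range(N)] for i in range(N)]
    c = [B[i] - 2.0 * lam for i in range(N)]
    p = [1.0 / N] * N                      # feasible starting point
    for _ in range(n_sweeps):
        for i in range(N):
            off = sum((H[i][j] + H[j][i]) * p[j] for j in range(N) if j != i)
            p_i = -(c[i] + off) / (2.0 * H[i][i])
            p[i] = min(1.0, max(0.0, p_i))
    return p


# Toy problem: S(p) = (p1 - 0.3)^2 + (p2 - 0.7)^2 (constant C omitted).
Q = [[1.0, 0.0], [0.0, 1.0]]
B = [-0.6, -1.4]
p_star = solve_lop(Q, B, lam=10.0)
```

In this toy case the unconstrained minimizer of $S$ already lies on the simplex, so the penalty term vanishes at the solution and \texttt{p\_star} sums to one. In general the sum is only approximately one for finite $\lambda$.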
272:
To solve (LOP) numerically, we have used the L-BFGS-B method of \cite{byrd}, which allows one to minimize a nonlinear function with box
constraints, i.e. when each variable can be given a lower and/or
upper bound. The initial value of this procedure must satisfy the
constraints. L-BFGS-B is a limited-memory modification of the BFGS
quasi-Newton method (also known as a variable metric algorithm).
279:
280:
281: \subsection{The choice of affine maps}
282:
As we are mostly concerned with estimation, we briefly discuss the
problem of choosing the maps. In \cite{fv95} the
following two sets of wavelet-type maps are proposed. Fix an
index $i^*\in\mathbb N$ and define
287:
288: $$ \gamma_{ij} = \frac{x-\alpha+(j-1)(\beta-\alpha)}{2^i}+\alpha,\quad i=1,2,\ldots,
289: i^* \quad j = 1,2,\ldots, 2^i $$ and $$ \eta_{ij} =
\frac{x-\alpha+(j-1)(\beta-\alpha)}{i}+\alpha,\quad i=2,\ldots, i^* \quad j = 2,\ldots,
291: i\,. $$ Then set $N = \sum\limits_{i=1}^{i^*} 2^i$ or
$N=i^*(i^*-1)/2$, respectively. To choose the maps, consider the
natural ordering of the maps $\gamma_{ij}$ (respectively $\eta_{ij}$) and operate as follows
294: $$ \mathcal W_1 =\{ w_1 = \gamma_{11}, w_2 = \gamma_{12}, w_3 =
295: \gamma_{21}, \ldots, w_6 =\gamma_{24}, w_7 = \gamma_{31}, \ldots,
296: w_{N}=\gamma_{i^*2^{i^*}}\} $$ and $$\mathcal W_2 =\{ w_1 =
297: \eta_{22}, w_2 = \eta_{32}, w_3 = \eta_{33}, w_4 =\eta_{42},
298: \ldots, w_6 = \eta_{44}, \ldots, w_{N}=\eta_{i^*i^*}\} $$
299: respectively. In \cite{iaclat} we proposed the
300: following quantile based maps $$\mathcal Q_1 =\{ w_i (x) =
301: (q_{i+1}-q_i) x + q_i, i=1,2,\ldots, N\}$$ where $q_i =
302: F^{-1}(u_i)$, and $0=u_1 < u_2 < \ldots < u_{N} < u_{N+1} = 1$ are
303: $N+1$ equally spaced points on $[0,1]$.
With these maps, it has been shown that there is no need to use a moment matching approach. In particular, with $p_i=1/N$, the IFS turns out to be a smoother of the e.d.f., and so it has nice small sample and asymptotic statistical properties (see the cited reference), even for distribution functions $F$ with non-compact support.
Here we also mix the quantile information with the moment matching idea. To distinguish the two cases (fixed $p_i=1/N$, or $p$ the solution of $({\bf QP})$) we use the notation $\mathcal Q_1$ and $\mathcal Q_2$, respectively, later on.
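
For concreteness, the wavelet-type family $\mathcal W_1$ defined above can be generated as follows. This is a minimal Python sketch in which each map is stored as a pair $(a,b)$ with $w(x)=a+b\,x$; the function name \texttt{wavelet\_maps\_w1} and the example interval $[-1,3]$ are hypothetical choices made for the illustration.

```python
def wavelet_maps_w1(i_star, alpha=0.0, beta=1.0):
    """Return the W_1 family as (a, b) pairs with w(x) = a + b*x.

    gamma_ij(x) = (x - alpha + (j-1)(beta-alpha)) / 2**i + alpha,
    listed in the natural order i = 1..i_star, j = 1..2**i.
    """
    maps = []
    for i in range(1, i_star + 1):
        b = 1.0 / 2**i
        for j in range(1, 2**i + 1):
            a = alpha + ((j - 1) * (beta - alpha) - alpha) * b
            maps.append((a, b))
    return maps


maps = wavelet_maps_w1(5, alpha=-1.0, beta=3.0)
n_maps = len(maps)                       # 2 + 4 + 8 + 16 + 32 maps
# Image of the end points alpha and beta under each map.
images = [(a + b * -1.0, a + b * 3.0) for a, b in maps]
```

Each map sends $[\alpha,\beta]$ onto one of the $2^i$ equal subintervals at level $i$, so every pair in \texttt{images} lies inside $[\alpha,\beta]$, and the largest slope, hence the contractivity factor, is $1/2$.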
306:
307:
308: \subsection{Fourier analysis results}
We recall from \cite{fv98} some results that are rather straightforward to prove but essential to us, since we use them in density estimation and in particular in the presence of missing
data. Their simplicity is due to the maps being affine. We assume without loss of generality that the support of the measures is $X= [0,1]$.
311:
312: Given a measure $\mu\in\mathcal M(X)$, the Fourier transform (FT) $\phi : \mathbb R \to \mathbb C$, where $\mathbb C$ is the complex space, is defined by the relation
313: $$
314: \phi(t) = \int_X e^{-itx} \de \mu(x),\quad t\in\mathbb R\,,
315: $$
316: with the well known properties $\phi(0) =1$ and $|\phi(t)|\leq 1$, $\forall\, t\in\mathbb R$.
It can be shown that the space of characteristic functions ${\mathcal FT}(X)$ can be made into a complete metric space with a suitable metric.
Thus, given an $N$-maps affine IFS $({\bf w},{\bf p})$, it is possible to define a new linear operator $B: {\mathcal FT}(X)\to {\mathcal FT}(X)$ whose unique fixed point
reads as
320: $$
321: \bar\phi(t) = \sum_{k=1}^N p_k e^{-i t a_k} \bar\phi(b_k t),\quad t\in\mathbb R\,.
322: $$
323: This $\bar\phi(t)$ is the FT of the fixed point of the $N$-maps affine IFS$({\bf w},{\bf p})$.
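As a sanity check of this fixed point equation (a worked example added for illustration), take the two maps $w_1(x)=x/2$ and $w_2(x)=x/2+1/2$ with $p_1=p_2=1/2$, whose invariant measure is the uniform law on $[0,1]$. The FT of the uniform law is $\phi(t)=(1-e^{-it})/(it)$ and indeed
$$
\frac12\left(1+e^{-it/2}\right)\phi(t/2)
= \frac12\left(1+e^{-it/2}\right)\frac{1-e^{-it/2}}{it/2}
= \frac{1-e^{-it}}{it} = \phi(t)\,.
$$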
Now suppose that the target distribution $F$ admits a density $f$. It is possible to write the density $f$ via a Fourier expansion \cite[see e.g.][]{tarter}. In fact,
325: $$
326: \phi(t) =\int_0^1 f(x) e^{-itx} \de x = \int_0^1 e^{-itx} \de F(x)
327: $$
328: thus
329: $$
330: f(x) = \frac{1}{2\pi}\sum_{k=-\infty}^{+\infty} B_k e^{ikx}
331: \quad\text{where}\quad B_k = \phi(k)\,.
332: $$
333:
334: \section{Relative efficiency and estimation in presence of missing data}\label{sec3}
Suppose we have an i.i.d. sample of $n$ observations with common unknown distribution function $F$ with compact support $[\alpha,\beta]$, which has all the moments up to order $M$. An IFS estimator of $F$ is the fixed point of the operator $T$, where the $N$ maps are chosen in advance and the $p_i$ are the solution of the ({\bf QP}) quadratic programming problem in which, in the expressions for $A_{ki}$, $B_i$ and $C$ given in equations \eqref{eqAik}
and \eqref{eqBiC}, we replace the true moments $g_k$ with the sample moments $m_k$, $k=0,1,\ldots, M$, for a fixed $M$, and we keep only the first $M$ terms of the series involved.
337:
Given the solution of ({\bf QP}), we have an estimator for $F$ and an estimator for the characteristic function of $F$, say $\hat \phi$. If $F$ possesses a density $f$, we also have a (Fourier) density estimator for $f$,
339: $$
340: \begin{aligned}
341: \hat f(x) &= \frac{1}{2\pi}\sum_{k=-m}^{+m} \hat B_k e^{ikx}\\
342: &=
\frac{1}{2\pi} + \frac{1}{\pi} \sum_{k=1}^m \biggl\{{\rm Re}(\hat B_k)\,\cos(kx) -
344: {\rm Im}(\hat B_k)\,\sin(kx)\biggr\}
345: \end{aligned}
346: $$
where $\hat B_k = \hat \phi(k)$ and $m$, the number of Fourier terms, is chosen in the usual way,
i.e.
$$
\text{if} \ \left | \hat B_{m+1} \right |^2 \ \text{and}\ \left | \hat B_{m+2} \right |^2 < \frac{2}{n+1}
\quad\text{then use the first $m$ coefficients}
$$
\cite[see again][]{tarter}.
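
The Fourier density estimator and the stopping rule can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the rule is read as ``take the smallest $m$ satisfying the condition'' (an assumption of the sketch, as is the cap \texttt{m\_max}), and the FT of the uniform law on $[0,1]$ is used as a stand-in for an estimated $\hat\phi$.

```python
import cmath
import math


def choose_m(phi_hat, n, m_max=200):
    """Smallest m with |B_{m+1}|^2 and |B_{m+2}|^2 both below 2/(n+1).

    Taking the *smallest* such m, and capping the search at m_max,
    are assumptions of this sketch.
    """
    threshold = 2.0 / (n + 1)
    for m in range(m_max + 1):
        if (abs(phi_hat(m + 1))**2 < threshold
                and abs(phi_hat(m + 2))**2 < threshold):
            return m
    return m_max


def density_real(x, phi_hat, m):
    """Real form: 1/(2 pi) + (1/pi) sum_k [Re(B_k) cos(kx) - Im(B_k) sin(kx)]."""
    total = 0.0
    for k in range(1, m + 1):
        Bk = phi_hat(k)
        total += Bk.real * math.cos(k * x) - Bk.imag * math.sin(k * x)
    return 1.0 / (2.0 * math.pi) + total / math.pi


def density_complex(x, phi_hat, m):
    """Complex form: (1/(2 pi)) sum_{k=-m}^{m} B_k exp(ikx), B_{-k} = conj(B_k)."""
    total = 1.0 + 0.0j                      # B_0 = phi(0) = 1
    for k in range(1, m + 1):
        Bk = phi_hat(k)
        total += (Bk * cmath.exp(1j * k * x)
                  + Bk.conjugate() * cmath.exp(-1j * k * x))
    return total / (2.0 * math.pi)


def phi_uniform(t):
    """FT of the uniform law on [0, 1], used here as a stand-in for phi-hat."""
    return (1.0 - cmath.exp(-1j * t)) / (1j * t)


m = choose_m(phi_uniform, n=50)
f_real = density_real(0.4, phi_uniform, m)
f_cplx = density_complex(0.4, phi_uniform, m)
```

The two evaluations agree up to rounding, which checks the passage from the complex sum to the cosine and sine form; it relies on $\hat B_0=1$ and $\hat B_{-k}=\overline{\hat B_k}$.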
Tables \ref{tab:a} and \ref{tab:b} show comparisons between the empirical distribution function $\hat F_n$ and the IFS estimator, say $\hat T_N$, for some target distributions $F$, in terms of average mean square error (AMSE) and sup-norm (SUP) distance.
These tables report a Monte Carlo analysis in which 100 simulations have been run for each target distribution.
The tables report the average ratio of the sup norm (and AMSE) of the IFS estimator over the corresponding sup norm (respectively AMSE) of the empirical distribution function.
357:
The IFS estimator based on the maps $\mathcal W_1$ performs well for symmetric bell-shaped distributions and for distributions with not too heavy tails (see also Figure \ref{fig:beta22}).
The asymptotic equivalence of the IFS estimators to the e.d.f. when quantile maps are used is also evident.
Note that for $\mathcal W_1$ we have used 62 maps, for $\mathcal W_2$ 28 maps, and $n/2$ quantiles for the quantile maps $\mathcal Q_1$ and $\mathcal Q_2$.
For wavelet-type maps an adjustment can thus be made by choosing the number of maps as a function of the sample size $n$.
362:
363:
364:
365: \subsection{What if data are missing?}
Suppose now that, for some reason, the $n$ sample observations from $F$ are in fact a subset of a larger sample of unknown size. In practice we do not observe the data on the whole support $[\alpha,\beta]$ of $F$ but only on some windows. This sample reduction is due to some sort of censoring. So we are in the presence of missing data where we do not know how many data are missing or where exactly they went missing, i.e. we are not in a classical censoring setup. A motivation for this scheme of (non-)observation is the following: suppose one wants to estimate the distribution of the wind direction registered by some instruments, in degrees $(0,360)$. For some reason, data from the angles $(15,37)$ and $(62,79)$ are missing because of technical failures or physical obstacles. In this case the empirical distribution function will be flat on these windows and a kernel density estimator will probably show multimodal behaviour.
367:
Heuristically, this is due to the fact that quantile estimation is inappropriate in this context. At the same time, moment estimation tends to be more robust, in particular if the distribution is symmetric.
We only report a graphical example of what can happen. Figure \ref{fig:missing} concerns a sample from a Beta(2,2) distribution when only the observations in $(.1, .15)\cup (.37, .43) \cup (.7, .8)$ are available to the observer, all the others being truncated by the instrument (we have chosen these intervals at random). The IFS estimator with $\mathcal W_1$ maps seems to be able to reconstruct the underlying distribution and density function, while, for obvious reasons, both the e.d.f. and the kernel estimators fail. In this example the relative efficiency (IFS/EDF) is 7\% for the AMSE and 23\% for the SUP-norm, which is dramatically better than expected!
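
The failure of the e.d.f. in this setting is easy to reproduce. The following Python sketch, in which the seed, the size of the underlying sample and the helper name \texttt{edf} are arbitrary assumptions, draws from a Beta(2,2) law, keeps only the observations falling in the three windows, and checks that the resulting e.d.f. is constant on a gap between windows.

```python
import random


def edf(sample):
    """Return the empirical distribution function of `sample`."""
    data = sorted(sample)
    n = len(data)

    def F_hat(x):
        return sum(1 for v in data if v <= x) / n
    return F_hat


random.seed(1)                                    # arbitrary seed
windows = [(0.10, 0.15), (0.37, 0.43), (0.70, 0.80)]
full = [random.betavariate(2, 2) for _ in range(2000)]
observed = [v for v in full if any(lo < v < hi for lo, hi in windows)]

F_obs = edf(observed)
# No observation can fall in the gap (0.15, 0.37), so the e.d.f. of the
# observed data is constant there, unlike the true Beta(2,2) law.
flat_gap = F_obs(0.20) == F_obs(0.35)
```

The true distribution function increases by roughly $0.22$ between $0.20$ and $0.35$, so the flat stretch is a pure artefact of the observation scheme.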
370:
371:
372: \subsection{Algorithm flow for estimation}
\begin{enumerate}
\item calculate the sample moments;
\item choose the family of maps $\mathcal W$;
\item build the quadratic form and minimize it with respect to $p$;
\item to estimate $F$ at a point $x$, take any distribution function, for example the uniform over $[\alpha,\beta]$, and start iterating $T$;
\item stop after a few iterations (normally 5 are enough);
\item the approximate ``fixed point'' of $T$ evaluated at $x$ is the estimate of $F(x)$.
\end{enumerate}
If the support of $F$ is not known, one can use the range of the sample, but the resulting IFS estimator will then approximate a distribution function which has exactly that support.
If any hint on the shape of the distribution $F$ is available, use it to choose the maps.
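
A rough a priori bound indicates why a handful of iterations of $T$ can suffice; it is added here as an illustration and is a direct consequence of the contraction property of $M$ rather than a result quoted from the references. If $\mu_0$ is the starting measure and $\bar\mu$ the invariant one, then
$$ d_H(M^k\mu_0,\bar\mu) \leq c^k\, d_H(\mu_0,\bar\mu) \leq c^k(\beta-\alpha)\,, $$
since the Hutchinson distance between two probability measures on $[\alpha,\beta]$ cannot exceed $\beta-\alpha$. For the family $\mathcal W_1$ the contractivity factor is $c=1/2$, so $k=5$ iterations already give a bound of $(\beta-\alpha)/32$. This concerns only the numerical approximation of the fixed point, not the statistical error of the estimator.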
383:
384:
All the examples, tables and graphics have been produced with software developed by the authors. In particular, a package called \texttt{ifs} is freely available for the R statistical environment \cite{R} on CRAN (Comprehensive R Archive Network), \texttt{http://cran.R-project.org}, under the \textsl{contributed} section.
386:
387: \section*{Conclusions}
This kind of approach seems suitable for nonparametric inference when data are missing or sample sizes are small.
Note that with this method it is only possible to work with distributions with compact support. Moreover, knowledge of the support itself is needed.
Nevertheless, it seems a promising approach, and the use of different sets of maps merits further investigation.
391:
392: % The Appendices part is started with the command \appendix;
393: % appendix sections are then done as normal sections
394: % \appendix
395:
396: % \section{}
397: % \label{}
398:
399: % Bibliographic references with the natbib package:
400: % Parenthetical: \cite{Bai92} produces (Bailyn 1992).
401: % Textual: \cite{Bai95} produces Bailyn et al. (1995).
402: % An affix and part of a reference:
403: % \cite[e.g.][Ch. 2]{Bar76}
404: % produces (e.g. Barnes et al. 1976, Ch. 2).
405:
406:
407: \begin{thebibliography}{}
408:
409: % \bibitem[Names(Year)]{label} or \bibitem[Names(Year)Long names]{label}.
410: % (\harvarditem{Name}{Year}{label} is also supported.)
411: % Text of bibliographic item
412:
413: \bibitem{bd} Barnsley, M.F., Demko, S., ``Iterated function systems and the
414: global construction of fractals'', {\sl Proc. Roy. Soc. London, Ser A}, {\bf 399}, 243-275, 1985.
415: \bibitem{beran} Beran, R., ``Estimating a distribution function'', {\sl Ann. Statist.}, 5, 400-404, 1977.
416: \bibitem{byrd} Byrd, R. H., Lu, P., Nocedal, J. and Zhu, C. ``A limited memory algorithm for bound constrained optimization'', {\sl SIAM J. Scientific
417: Computing}, 16, 1190-1208, 1995.
\bibitem{dkw} Dvoretzky, A., Kiefer, J. and Wolfowitz, J., ``Asymptotic minimax character of the sample distribution function and of the classical multinomial estimators'', {\sl Ann. Math. Statist.}, 27, 642-669, 1956.
419: \bibitem{efro} Efromovich, S., ``Second order efficient estimating a smooth distribution function and its applications'', {\sl Meth. Comp. App. Probab.}, 3, 179-198, 2001.
\bibitem{fv95} Forte, B., Vrscay, E.R., ``Solving the inverse problem for function/image approximation using iterated function systems, I. Theoretical basis'', {\sl Fractals}, {\bf 2}, 3, 325-334, 1995.
421: \bibitem{fv98} Forte, B., Vrscay, E.R., ``Inverse problem methods for generalized fractal transforms'', in
422: {\sl Fractal Image Encoding and Analysis}, NATO ASI Series F, Vol. 159, ed. Y. Fisher, Springer Verlag, Heidelberg, 1998.
423: \bibitem{gilevit} Gill, R. D., Levit, B. Y., ``Applications of the van Trees inequality: A Bayesian Cram\'er-Rao bound'', {\sl Bernoulli}, 1, 59-79, 1995.
424: \bibitem{gl96a} Golubev, G. K., Levit, B. Y., ``On the second order minimax estimation of distribution functions'', {\sl Math. Methods. Statist.}, 5, 1-31, 1996a.
425: \bibitem{gl96b} Golubev, G. K., Levit, B. Y., ``Asymptotic efficient estimation for analytic distributions'', {\sl Math. Methods. Statist.}, 5, 357-368, 1996b.
426: \bibitem{hutch} Hutchinson, J., ``Fractals and self-similarity'', {\sl Indiana Univ.
427: J. Math.}, {\bf 30}, 5, 713-747, 1981.
428: \bibitem{iaclat} Iacus, S.M., La Torre, D., ``Approximating distribution functions by iterated function systems
429: and applications'', {\sl Proceedings of the S.I.M.A.I. Conference},
430: Chia Laguna, Italy, May 2002 (CDROM). Submitted.
431: \bibitem{R} Ihaka, R., Gentleman, R., ``R: A Language for Data
432: Analysis and Graphics'', {\em Journal of Computational and Graphical
433: Statistics}, 5, 299-314, 1996.
434: \bibitem{levit} Levit, B.Y., ``Infinite-dimensional information inequalities'', {\sl Theory Probab. Applic.}, 23, 371-377, 1978.
\bibitem{millar} Millar, P.W., ``Asymptotic minimax theorems for sample distribution functions'', {\sl Z. Wahrsch. Verw. Gebiete}, 48, 233-252, 1979.
\bibitem{tarter} Tarter, M.E. and Lock, M.D., {\sl Model free curve estimation}, Chapman \& Hall, New York, 1993.
437: \end{thebibliography}
438:
439:
440: \eject
441:
442: \begin{table}
443: {\scriptsize
444: \begin{tabular}{c c c}
445: parameters & AMSE & SUP \\
446: \begin{tabular}{c|c}
447: $n$ & law\\
448: \hline
449: 10 & beta(.9,.1)\\
450: 10 & beta(.1,.9)\\
451: 10 & beta(.1,.1)\\
452: 10 & beta(\,2,\,2)\\
453: 10&beta(\,5,\,5)\\
454: 10&beta(\,3,\,5)\\
455: 10&beta(\,5,\,3)\\
456: 10&beta(\,1,\,1)
457: \end{tabular}&
458: \begin{tabular}{c|c|c|c}
459: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\
460: \hline
461: 81.08 & 77.05 & 203.53 & 149.68\\
462: 211.78 & 2024.68 & 203.39 & 258.88\\
463: 118.27 & 416.17 & 182.88 & 104.07\\
464: 56.47 & 80.53 & 67.68 & 112.46\\
465: 52.77 &57.90 &110.35 & 152.29\\
466: 55.95 & 71.07 & 99.92 & 142.52\\
467: 52.50 & 57.34 & 91.75 & 131.37\\
468: 73.35 & 119.04 & 79.01 & 102.04\\
469: \end{tabular}
470: &
471: \begin{tabular}{c|c|c|c}
472: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\
473: \hline
474: 85.76 & 75.44 & 110.11 &110.81\\
475: 175.32 & 441.32 & 114.51 & 161.55\\
476: 114.87 & 192.94 & 119.57 & 106.56\\
477: 53.31 & 69.24 & 70.36 & 98.21\\
478: 53.99 & 54.83 & 81.61 & 125.67\\
479: 51.93 & 60.58 & 81.72 & 116.79\\
480: 51.74 & 52.47 & 77.97 & 109.84\\
481: 65.63 & 90.40 & 70.89 & 90.85\\
482: \end{tabular}
483: \end{tabular}
484: \par
485: \vspace{12pt}
486: \par
487: \begin{tabular}{c c c}
488: parameters & AMSE & SUP \\
489: \begin{tabular}{c|c}
490: $n$ & law\\
491: \hline
492: 20 & beta(.9,.1)\\
493: 20 & beta(.1,.9)\\
494: 20 & beta(.1,.1)\\
495: 20 & beta(\,2,\,2)\\
496: 20&beta(\,5,\,5)\\
497: 20&beta(\,3,\,5)\\
498: 20&beta(\,5,\,3)\\
499: 20&beta(\,1,\,1)
500: \end{tabular}&
501: \begin{tabular}{c|c|c|c}
502: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\
503: \hline
504: 94.69 &85.25 &201.85 &169.78\\
505: 388.83 & 4183.36 & 203.70 & 195.36\\
506: 154.1 & 690.08 & 125.35 & 97.53\\
507: 61.46 & 93.37 & 85.46 & 95.49\\
508: 54.31 & 52.89 & 105.84 & 131.84\\
509: 60.42 & 67.33 & 93.30 &118.51\\
510: 53.82 & 57.72 & 92.26 & 114.84\\
511: 95.93 & 89.79 & 71.66 & 154.54\\
\end{tabular}
512: &
513: \begin{tabular}{c|c|c|c}
514: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\
515: \hline
516: 90.30 & 79.92 & 105.02 & 123.28\\
517: 257.13 & 612.55 & 109.10 & 122.99\\
518: 139.65 & 255.26 & 103.56 & 99.28\\
519: 55.34 & 73.95 & 84.42 & 91.38\\
520: 53.76 & 48.73 & 85.85 & 106.27\\
521: 55.98 & 60.88 & 85.39 & 101.16\\
522: 53.46 & 52.20 & 85.23 & 102.85\\
523: 63.20 & 106.95 & 81.56 & 82.54\\
\end{tabular}
524: \end{tabular}
525: \par
526: \vspace{12pt}
527: \par
528: \begin{tabular}{c c c}
529: parameters & AMSE & SUP \\
530: \begin{tabular}{c|c}
531: $n$ & law\\
532: \hline
533: 30 & beta(.9,.1)\\
534: 30 & beta(.1,.9)\\
535: 30 & beta(.1,.1)\\
536: 30 & beta(\,2,\,2)\\
537: 30&beta(\,5,\,5)\\
538: 30&beta(\,3,\,5)\\
539: 30&beta(\,5,\,3)\\
540: 30&beta(\,1,\,1)\\
541: \end{tabular}&
542: \begin{tabular}{c|c|c|c}
543: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\
544: \hline
545: 107.46 & 90.27 & 195.39 & 143.00\\
546: 540.73 & 6462.03 & 190.82 & 213.45\\
547: 112.66 & 97.04 & 233.50 & 1342.44\\
548: 60.30 & 92.92 & 88.90 & 96.88\\
549: 62.04 & 56.07 & 100.26 & 121.41\\
70.31 & 76.90 & 93.02 & 108.76\\
550: 55.78 & 56.85 & 92.10 & 102.02\\
551: 71.88 & 211.28 & 94.36 & 88.17\\
552: \end{tabular}
553: &
554: \begin{tabular}{c|c|c|c}
555: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\
556: \hline
557: 101.83 & 81.05 & 108.59 & 109.85\\
558: 107.80 & 137.26 & 314.53 & 759.57\\
559: 186.70 & 356.91 & 103.39 & 99.98\\
560: 53.71 & 72.06 & 84.92 & 89.11\\
561: 60.08 & 51.82 &89.26 & 100.16\\
562: 61.68 & 66.29 & 86.36 & 95.24\\
563: 55.56& 51.21 & 88.20 & 94.75\\
564: 63.15 & 121.23 & 83.74 & 83.40\\
565: \end{tabular}
566: \end{tabular}
567: }
\caption{Relative efficiency of the IFS estimators, for the different sets of maps ${\mathcal W_1}$, ${\mathcal W_2}$, ${\mathcal Q_1}$ and ${\mathcal Q_2}$, with respect to the empirical distribution function (i.e. IFS/EDF, in percent, so that values below 100 favor the IFS estimator). Based on 100 Monte Carlo simulations for each distribution. Small sample sizes.}
569: \label{tab:a}
570: \end{table}
571:
572:
573:
574: \begin{table}
575: {\scriptsize
576: \begin{tabular}{c c c}
577: parameters & AMSE & SUP \\
578: \begin{tabular}{c|c}
579: $n$ & law\\
580: \hline
581: 50 & beta(.9,.1)\\
582: 50 & beta(.1,.9)\\
583: 50 & beta(.1,.1)\\
584: 50 & beta(\,2,\,2)\\
585: 50&beta(\,5,\,5)\\
586: 50&beta(\,3,\,5)\\
587: 50&beta(\,5,\,3)\\
588: 50&beta(\,1,\,1)
589: \end{tabular}&
590: \begin{tabular}{c|c|c|c}
591: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\
592: \hline
593: 132.67 & 115.10 & 163.33 & 129.24\\
1044.12 & 12573.16 & 181.99 & 180.42\\
306.49 & 1917.23 & 105.68 & 97.27\\
63.03 & 106.56 & 95.35 & 95.66\\
68.94 & 60.19 & 102.22 & 114.92\\
594: 79.98 & 93.80 & 96.20 & 102.32\\
595: 63.13 & 62.21 & 93.59 & 98.47\\
596: 73.47 & 304.41 & 97.24 & 92.19\\
\end{tabular}
597: &
598: \begin{tabular}{c|c|c|c}
599: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\
600: \hline
601: 109.18& 88.77 & 103.37 & 101.90\\
602: 421.49 & 991.33 & 104.37 & 123.39\\
214.27 & 430.63 & 100.13 & 98.04\\
603: 58.39 & 80.00 & 89.42 & 89.36\\
604: 66.77 & 55.49 & 91.86 & 97.40\\
605: 66.76 & 77.57 & 91.39 & 93.76\\
606: 62.04 & 55.95 & 90.66 & 93.19\\
607: 62.69 & 150.39 & 87.38 & 86.30\\
\end{tabular}
608: \end{tabular}
609: \par
610: \vspace{12pt}
611: \par
612: \begin{tabular}{c c c}
613: parameters & AMSE & SUP \\
614: \begin{tabular}{c|c}
615: $n$ & law\\
616: \hline
617: 100 & beta(.9,.1)\\
618: 100 & beta(.1,.9)\\
619: 100 & beta(.1,.1)\\
620: 100 & beta(\,2,\,2)\\
621: 100&beta(\,5,\,5)\\
622: 100&beta(\,3,\,5)\\
623: 100&beta(\,5,\,3)\\
624: 100&beta(\,1,\,1)
625: \end{tabular}&
626: \begin{tabular}{c|c|c|c}
627: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\
628: \hline
629: 195.54 & 158.80 & 140.55 & 108.27\\
630: 1557.30 & 20324.60 & 135.45 & 125.94\\
631: 554.11 & 3918.62 & 102.67 & 98.29\\
632: 61.63 & 165.60 & 95.58 & 97.46\\
633: 87.97 & 67.79 & 99.28 & 108.21\\
634: 111.30 & 134.54&100.68&103.31\\
635: 61.03 & 57.19 & 97.28 & 101.32\\
636: 67.91 & 558.50 & 97.71 & 94.87\\
637: \end{tabular}
638: &
639: \begin{tabular}{c|c|c|c}
640: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\
641: \hline
642: 138.93 & 105.31 & 102.25 & 99.07\\
643: 536.84 & 1267.81 & 103.87 & 106.05\\
644: 304.59 & 625.75 & 99.10 & 98.04\\
645: 57.18 & 98.50 & 92.11 & 93.09\\
646: 78.94 &60.96 &94.83 &96.52\\
647: 79.59 & 100.20 & 95.35 & 95.72\\
648: 65.97 & 55.08 & 94.14 & 95.42\\
649: 58.71 & 201.10 & 90.83 & 89.97\\
650: \end{tabular}
651: \end{tabular}
652: \par
653: \vspace{12pt}
654: \par
655: \begin{tabular}{c c c}
656: parameters & AMSE & SUP \\
657: \begin{tabular}{c|c}
658: $n$ & law\\
659: \hline
660: 250 & beta(.9,.1)\\
661: 250 & beta(.1,.9)\\
662: 250 & beta(.1,.1)\\
663: 250 & beta(\,2,\,2)\\
664: 250&beta(\,5,\,5)\\
665: 250&beta(\,3,\,5)\\
666: 250&beta(\,5,\,3)\\
667: 250&beta(\,1,\,1)\\
668: \end{tabular}&
669: \begin{tabular}{c|c|c|c}
670: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\
671: \hline
672: 338.72 & 255.23 & 115.25 & 101.55\\
673: 3979.61 & 50448.13 &117.81 & 105.37\\
674: 1345.72 & 10051.20 & 100.60 & 98.97\\
675: 79.01 & 275.93 & 98.59 & 98.30\\
676: 163.68 & 99.35 &99.07 & 100.54\\
677: 212.17 & 228.58 & 99.45 & 99.69\\
678: 91.32 & 73.31 & 99.05 &99.20\\
679: 69.03 & 1165.61 & 99.47 & 98.46\\
\end{tabular}
680: &
681: \begin{tabular}{c|c|c|c}
682: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\
683: \hline
684: 180.29 & 131.97 & 100.68 & 99.43\\
685: 874.65 & 2045.15 & 100.82 & 99.73\\
686: 480.12 & 977.30 & 99.16 & 98.73\\
687: 67.14 & 132.87 & 95.50 & 95.24\\
688: 111.38 & 78.48 & 96.40 & 96.83\\
689: 113.70 & 142.21 & 96.57 & 96.32\\
690: 88.87 & 67.13 & 96.84 & 97.24\\
691: 61.07 & 293.58 & 94.88 & 94.55\\
\end{tabular}
692: \end{tabular}
693: }
\caption{Relative efficiency of the IFS estimators, for the different sets of maps ${\mathcal W_1}$, ${\mathcal W_2}$, ${\mathcal Q_1}$ and ${\mathcal Q_2}$, with respect to the empirical distribution function (i.e. IFS/EDF, in percent, so that values below 100 favor the IFS estimator). Based on 100 Monte Carlo simulations for each distribution. Moderate to large sample sizes.}
695: \label{tab:b}
696: \end{table}
697:
698: \begin{figure}
699: \includegraphics{miss1}
700: \includegraphics{miss2}
\caption{Data from a Beta(2,2) distribution when only the observations in $(.1, .15)\cup (.37, .43) \cup (.7, .8)$ are available to the observer, all the others being truncated by the instrument.
The observations are marked as vertical ticks. The IFS estimator with $\mathcal W_1$ maps seems able to reconstruct the underlying distribution and density functions whilst, for obvious reasons, both the e.d.f. and the kernel estimators fail. Notice that the arbitrary choice of the observation window can be changed without substantial loss or gain. In this example the relative efficiency (IFS/EDF) is 7\% for the AMSE and 23\% for the SUP-norm.}
703: \label{fig:missing}
704: \end{figure}
705:
706: \begin{figure}
707: \includegraphics{sup22}
708: \includegraphics{mse22}
\caption{Relative efficiency of the IFS estimator for the different sets of maps ${\mathcal W_1}$, ${\mathcal W_2}$, ${\mathcal Q_1}$ and ${\mathcal Q_2}$ with respect to the empirical distribution function. Based on 100 Monte Carlo simulations. SUP-norm (top), AMSE (bottom).}
710: \label{fig:beta22}
711: \end{figure}
712:
713: \begin{figure}
714: \includegraphics{sup53}
715: \includegraphics{mse53}
\caption{Relative efficiency of the IFS estimator for the different sets of maps ${\mathcal W_1}$, ${\mathcal W_2}$, ${\mathcal Q_1}$ and ${\mathcal Q_2}$ with respect to the empirical distribution function. Based on 100 Monte Carlo simulations. SUP-norm (top), AMSE (bottom).}
717: \label{fig:beta53}
718: \end{figure}
719:
720: \begin{figure}
721: \includegraphics{sup19}
722: \includegraphics{mse19}
\caption{Relative efficiency of the IFS estimator for the different sets of maps ${\mathcal W_1}$, ${\mathcal W_2}$, ${\mathcal Q_1}$ and ${\mathcal Q_2}$ with respect to the empirical distribution function. Based on 100 Monte Carlo simulations. SUP-norm (top), AMSE (bottom).}
724: \label{fig:beta19}
725: \end{figure}
726:
727: \end{document}
728:
729: