
\documentclass{article}

\usepackage{graphicx}
\usepackage{amssymb,amsmath}

\newcommand{\de}{{\rm d}}
\newtheorem{theorem}{Theorem}

\title{Nonparametric estimation of distribution and density functions in
  presence of missing data: an IFS approach}

\author{Stefano M. Iacus\footnote{Department of Economics, Via Conservatorio 7, I-20122 Milan - Italy, email: stefano.iacus@unimi.it} \and Davide La Torre}

\DeclareGraphicsExtensions{.eps}

\begin{document}

\maketitle

\begin{abstract}
In this paper we consider a class of nonparametric estimators of a distribution function $F$ with compact support, based on the theory of iterated function systems (IFSs). The estimator of $F$ is thought of as the fixed point of a contractive operator $T$ defined in terms of a vector of parameters $p$ and a family of affine maps $\mathcal W$, both of which may depend on the sample $(X_1, X_2, \ldots, X_n)$. Given $\mathcal W$, the problem consists in finding a vector $p$ such that the fixed point of $T$ is ``sufficiently near'' to $F$. This turns out to be a constrained quadratic optimization problem, which we propose to solve by penalization techniques. If $F$ has a density $f$, we can also provide an estimator of $f$ based on Fourier techniques. IFS estimators of $F$ are asymptotically equivalent to the empirical distribution function (e.d.f.) estimator. We study the relative efficiency of the IFS estimators with respect to the e.d.f. for small samples via a Monte Carlo approach.

For well-behaved distribution functions $F$ and for a particular family of so-called wavelet maps, the IFS estimators can be dramatically better than the e.d.f. (or the kernel estimator, for density estimation) in the presence of missing data, i.e. when it is only possible to observe data on subsets of the whole support of $F$.

This research has also produced a free package for the R statistical environment, which is ready to be used in applications.
\end{abstract}

\textbf{Key words:} iterated function systems, distribution function estimation,
nonparametric estimation, missing data, density estimation.

% main text

\section{Introduction}
Let $X_1, X_2, \ldots, X_n$ be an i.i.d. sample drawn from a random variable $X$ with unknown distribution function $F$ with compact support $[\alpha,\beta]$. The empirical distribution function (e.d.f.)
$$ \hat F_n(x) = \frac{1}{n}\sum_{i=1}^n \chi(X_i\leq x) $$
is a commonly used estimator of the unknown distribution function $F$ (here $\chi$ is the indicator function). The e.d.f. has an impressive set of good statistical properties; for instance, it is first order efficient in the minimax sense (see \cite{dkw}, \cite{beran}, \cite{levit}, \cite{millar}, and \cite{gilevit}). More recently, second order efficient estimators have been proposed in the literature for special classes of distribution functions $F$; Golubev and Levit \cite{gl96a,gl96b} and \cite{efro} are two such examples. It is rather curious that a stepwise function can be such a good estimator and, in fact, \cite{efro} shows that, for the class of analytic functions and for small sample sizes, the e.d.f. is not the best estimator. In this paper we study the properties of a new class of distribution function estimators based on iterated function systems (IFSs), introduced by the authors in a previous work \cite{iaclat}. IFSs were introduced in \cite{hutch} and \cite{bd}. The main idea of the method is to view the estimator of $F$ as the fixed point of a contraction $T$ on a complete metric space. The operator $T$ is defined in terms of a family of affine maps $\mathcal W$ and a vector of parameters $p$. For a given family $\mathcal W$, $T$ depends only on the choice of $p$. The idea, known as the {\it inverse approach} (see Section \ref{sec2}), is to determine $p$ by solving a constrained quadratic optimization problem built in terms of sample moments. In this paper this optimization problem is solved by a penalization method. Because the maps are affine, it is easy to derive the Fourier transform of $F$ and, when it exists, an explicit formula for the density of $F$ via the inverse Fourier transform. In this way, given $\mathcal W$ and $p$, we obtain at the same time estimators of the distribution, characteristic and density functions.

The paper is organized as follows. In Section \ref{sec2} the inverse approach is presented and a penalization method is proposed to solve the quadratic optimization problem. We also discuss the choice of the family of maps $\mathcal W$. In Section \ref{sec3} numerical results and comparisons with classical estimators are shown for small samples via Monte Carlo analysis.

Finally, we show an application of these estimators to a setting in which the empirical distribution function (or the kernel estimator, for the density) cannot be applied. We consider situations of missing data in which, for example, the data can only be observed on some windows of the support of $F$. This can be the case in directional data analysis when, for technical or physical reasons, instruments are not able to collect data in some ranges of angles, say $A$ and $B$, with $A, B\subseteq [0,2\pi]$. For $x$ in $A$ or $B$ the e.d.f. will be constant and, at the same time, the kernel density estimator will estimate a multimodal distribution for these data. In this case we show examples in which the IFS estimator performs remarkably well.

Tables and figures can be found at the end of the paper after the references.

\section{An IFS estimator}\label{sec2}
The theory of distribution function approximation via IFSs that we use to derive estimators is due to \cite{fv95}. The results in this section, except where explicitly stated, are from the cited authors. Let $\mathcal M(X)$ be the set of probability measures on $\mathcal B(X)$, the $\sigma$-algebra of Borel subsets of $X$, where $(X,d)$ is a compact metric space (in our case $X=[\alpha,\beta]$ and $d$ is the Euclidean metric).

In the IFS literature the following {\sl Hutchinson} metric plays a crucial role:
$$ d_H(\mu,\nu) = \sup_{f\in {\rm Lip}(X)} \left\{ \int_X f \de \mu - \int_X f \de \nu \right\}, \quad \mu,\nu \in \mathcal M(X), $$
where
$$ {\rm Lip}(X) = \{ f: X\to \mathbb R,\ |f(x)-f(y)|\leq d(x,y),\ x,y \in X\}. $$
With this metric, $(\mathcal M(X),d_H)$ is a complete metric space \cite[see][]{hutch}.

We denote by $({\bf w},{\bf p})$ an {\sl $N$-maps contractive IFS on $X$ with probabilities}, or simply an {\sl $N$-maps IFS}, that is, a set of $N$ affine contraction maps ${\bf w} = (w_1,w_2,\ldots,w_N)$,
$$ w_i(x) = a_i + b_i \, x,\quad {\rm with}\,\, |b_i|<1,\quad b_i,a_i\in\mathbb R,\quad i=1,2,\ldots, N, $$
with associated probabilities ${\bf p} = (p_1,p_2,\ldots,p_N)$, $p_i\geq 0$, and $\sum_{i=1}^N p_i =1$. The IFS has a contractivity factor defined as
$$ c = \max_{1\leq i\leq N} |b_i| <1. $$
Consider the following (usually called {\sl Markov}) operator $M : \mathcal M(X)\to \mathcal M(X)$ defined as
\begin{equation}
M\mu = \sum_{i=1}^N p_i \, \mu \circ w_i^{-1},\quad \mu \in \mathcal M(X),
\end{equation}
where $w_i^{-1}$ is the inverse function of $w_i$ and $\circ$ stands for composition. In \cite{hutch} it was shown that $M$ is a contraction mapping on $(\mathcal M(X),d_H)$, i.e. for all $\mu,\nu\in \mathcal M(X)$, $d_H(M\mu,M\nu)\leq c\, d_H(\mu,\nu)$. Thus, by the Banach fixed point theorem, there exists a unique measure $\bar\mu\in\mathcal M(X)$, the {\sl invariant measure} of the IFS, such that $M\bar\mu = \bar \mu$. Associated with each measure $\mu\in \mathcal M(X)$ there is a distribution function $F$. In terms of $F$, the previous operator $M$ can be rewritten as
$$ TF(x)=\left\{
\begin{array}{ll}
0 & \ \ \mbox{if $x\le \alpha$}\\[4pt]
\sum\limits_{i=1}^N p_i F(w_i^{-1}(x)) & \ \ \mbox{if $\alpha<x<\beta$}\\[4pt]
1 & \ \ \mbox{if $x\ge \beta$}
\end{array}
\right. $$
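
As a simple illustration (a sketch under assumptions that are not taken from the text above: $X=[0,1]$, $N=2$, the maps $w_1(x)=x/2$ and $w_2(x)=(x+1)/2$, and $p_1=p_2=1/2$), one can check directly that the uniform distribution function is the fixed point of $T$. Here $F$ is extended by $F(x)=0$ for $x\le 0$ and $F(x)=1$ for $x\ge 1$, and $w_1^{-1}(x)=2x$, $w_2^{-1}(x)=2x-1$. For $F(x)=x$ on $[0,1]$ and $0<x<1$,
$$ TF(x) = \tfrac12 F(2x) + \tfrac12 F(2x-1) =
\begin{cases}
\tfrac12\,(2x) + 0 = x, & 0<x<\tfrac12,\\[2pt]
\tfrac12 + \tfrac12\,(2x-1) = x, & \tfrac12\le x<1,
\end{cases} $$
so that $TF=F$ and, by uniqueness of the fixed point, the invariant measure of this particular IFS is the uniform distribution on $[0,1]$.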

\subsection{Minimization approach}

For affine IFSs there exists a simple and useful relation between the moments of probability measures in $\mathcal M(X)$. Given an $N$-maps IFS $({\bf w},{\bf p})$ with associated Markov operator $M$, and given a measure $\mu\in\mathcal M(X)$, then, for any continuous function $f:X\to\mathbb R$,
\begin{equation}
\int_X f(x) \de \nu(x) = \int_X f(x) \de(M\mu)(x) = \sum_{i=1}^N p_i \int_X (f\circ w_i)(x)\de \mu(x)\,, \label{eq1}
\end{equation}
where $\nu = M\mu$. In our case $X = [\alpha,\beta]\subset\mathbb R$, so we readily obtain a relation between the moments of $\mu$ and $\nu$. Let
\begin{equation}
g_k = \int_X x^k \de\mu,\quad h_k = \int_X x^k \de \nu,\quad k=0,1,2,\ldots,
\end{equation}
be the moments of the two measures, with $g_0 = h_0 = 1$. Then, by \eqref{eq1} with $f(x) = x^k$, we have
$$ h_k = \sum_{j=0}^k \binom{k}{j} \left\{\sum_{i=1}^N p_i b_i^j a_i^{k-j} \right\} g_j,\quad k=1,2,\ldots\,. $$
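
To make this relation concrete, the first two cases follow from expanding $(a_i+b_i x)^k$ with the binomial theorem (this is only a worked instance of the formula above):
$$ h_1 = \sum_{i=1}^N p_i \left( a_i + b_i\, g_1 \right), \qquad
h_2 = \sum_{i=1}^N p_i \left( a_i^2 + 2 a_i b_i\, g_1 + b_i^2\, g_2 \right). $$
For instance, taking as an illustrative assumption $X=[0,1]$, $w_1(x)=x/2$, $w_2(x)=(x+1)/2$, $p_1=p_2=1/2$ and $\mu$ uniform on $[0,1]$ (so that $g_1=1/2$ and $g_2=1/3$), one gets
$$ h_1 = \tfrac12\cdot\tfrac14 + \tfrac12\cdot\tfrac34 = \tfrac12, \qquad
h_2 = \tfrac12\cdot\tfrac1{12} + \tfrac12\cdot\tfrac7{12} = \tfrac13, $$
i.e. $h_1=g_1$ and $h_2=g_2$, consistent with the fact that the uniform distribution is invariant for this particular IFS.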

Set $X=[\alpha,\beta]$ and let $\mu$ and $\mu^{(j)} \in \mathcal M(X)$, $j=1,2,\ldots$, with associated moments of any order $g_k$ and
$$ g_k^{(j)} = \int_X x^k \de \mu^{(j)}\,. $$
Then the following statements are equivalent (as $j\to\infty$ and $\forall k\geq 0$):
\begin{enumerate}
\item $g_k^{(j)}\to g_k$,
\item $\forall f\in {\bf C}(X)$, $\int_X f \de\mu^{(j)} \to \int_X f\de\mu$ (weak* convergence),
\item $d_H(\mu^{(j)},\mu)\to0$
\end{enumerate}
(here ${\bf C}(X)$ is the space of continuous functions on $X$). This result gives a way to find an appropriate set of maps and probabilities by solving the so-called moment matching problem. With the solution in hand, the convergence of the moments implies the convergence of the measures, and then the stationary measure of $M$ approximates the target measure $\mu$ with given precision (in a sense specified by the collage theorem below) \cite[see][]{bd}.

The next result, called the {\sl collage} theorem, is a standard result of IFS theory and is a consequence of the Banach fixed point theorem.

{\bf (Collage theorem):}
Let $(Y,d_Y)$ be a complete metric space. Given $y\in Y$, suppose that there exists a contractive map $f$ on $Y$ with contractivity factor $0\leq c<1$ such that $d_Y(y,f(y))<\varepsilon$. If $\bar y$ is the fixed point of $f$, i.e. $f(\bar y) = \bar y$, then $d_Y(\bar y,y) < \frac{\varepsilon}{1-c}$.
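
The bound follows in one line from the triangle inequality and the contractivity of $f$; the short argument is recalled here for convenience (it is the standard one and uses only the notation of the statement above):
$$ d_Y(\bar y,y) \le d_Y(\bar y,f(y)) + d_Y(f(y),y) = d_Y(f(\bar y),f(y)) + d_Y(f(y),y) \le c\, d_Y(\bar y,y) + d_Y(y,f(y)), $$
so that $(1-c)\,d_Y(\bar y,y)\le d_Y(y,f(y))<\varepsilon$. For example, with $c=1/2$ a collage distance smaller than $\varepsilon$ guarantees that the fixed point lies within $2\varepsilon$ of $y$.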

So if one wishes to approximate a function $y$ with the fixed point $\bar y$ of an unknown contractive map $f$, it suffices to solve the inverse problem of finding an $f$ which minimizes the collage distance $d_Y(y,f(y))$.

The main result of Forte and Vrscay \cite{fv95} that we will use to build one of the IFS estimators is that the inverse problem can be reduced to the minimization of a suitable quadratic form in the $p_i$, given a set of affine maps $w_i$ and the sequence of moments $g_k$ of the target measure. Let
$$ \Pi^N = \left\{ {\bf p} = (p_1,p_2,\ldots,p_N) : p_i\geq 0, \sum_{i=1}^N p_i = 1 \right\} $$
be the simplex of probabilities. Let ${\bf w} = (w_1,w_2,\ldots,w_N)$, $N=1,2,\ldots$, be subsets of $\mathcal W = \{w_1,w_2,\ldots\}$, the infinite set of affine contractive maps on $X=[\alpha,\beta]$, and let ${\bf g}$ be the sequence of moments of any order of $\mu\in\mathcal M(X)$. Denote by $M$ the Markov operator of the $N$-maps IFS $({\bf w},{\bf p})$ and let $\nu_N = M\mu$, with associated moment vector ${\bf h}_N$. The collage distance between the moment vectors of $\mu$ and $\nu_N$,
$$ \Delta({\bf p}) = ||{\bf g}-{\bf h}_N||_{\bar l^2} : \Pi^N \to \mathbb R, $$
is a continuous function and attains an absolute minimum value $\Delta^N_{\min}$ on $\Pi^N$, where
$$ ||{\bf x}||^2_{\bar l^2} = x_0^2 + \sum_{k=1}^\infty \frac{x_k^2}{k^2}\,. $$
Moreover, $\Delta^N_{\min} \to 0$ as $N\to\infty$. Thus, the collage distance can be made arbitrarily small by choosing a suitable number of maps and probabilities.

The above inverse problem can be posed as a quadratic programming problem using the following notation:
$$ S({\bf p}) = (\Delta({\bf p}))^2 = \sum_{k=1}^\infty \frac{(h_k-g_k)^2}{k^2}, $$
$$ D(X) = \left\{{\bf g} = (g_0,g_1,\ldots) : g_k = \int_X x^k \de\mu,\ k=0,1,\ldots,\ \mu\in\mathcal M(X)\right\}. $$
Then, by \eqref{eq1}, there exists a linear operator $A :D(X)\to D(X)$ associated with $M$ such that ${\bf h}_N = A {\bf g}$. In particular,
\begin{equation}
h_k = \sum_{i=1}^N A_{ki} p_i,\quad k=1,2,\ldots
\quad\text{where} \quad
A_{ki} = \sum_{j=0}^k \binom{k}{j} b_i^j a_i^{k-j} g_j\,.
\label{eqAik}
\end{equation}
Thus
$$ S({\bf p}) = {\bf p}^t Q {\bf p} + {\bf B}^t {\bf p} +C,\leqno{({\bf QP})} $$
$$ \text{where}\quad Q=[q_{ij}],\quad q_{ij} = \sum_{k=1}^\infty \frac{A_{ki} A_{kj}}{k^2},\quad i,j = 1,2,\ldots, N, $$
\begin{equation}
B_i = -2\sum_{k=1}^\infty \frac{g_k}{k^2} A_{ki},\quad i=1,2,\ldots, N
\quad \text{and}\quad C =\sum_{k=1}^\infty \frac{g_k^2}{k^2}\,.
\label{eqBiC}
\end{equation}
The series above are convergent as $0\leq A_{ki}\leq 1$, and the minimum can be found by minimizing the quadratic form on the simplex $\Pi^N$.
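
For completeness, the quadratic form for $S({\bf p})$ is obtained by substituting $h_k=\sum_{i} A_{ki}p_i$ from \eqref{eqAik} into the definition of $S({\bf p})$ and expanding the square; the following lines only spell out this step with the symbols already defined:
$$
\begin{aligned}
S({\bf p}) &= \sum_{k=1}^\infty \frac{1}{k^2}\Bigl(\sum_{i=1}^N A_{ki}p_i - g_k\Bigr)^2\\
&= \sum_{i=1}^N\sum_{j=1}^N p_i p_j \sum_{k=1}^\infty \frac{A_{ki}A_{kj}}{k^2}
 \;-\; 2\sum_{i=1}^N p_i \sum_{k=1}^\infty \frac{g_k A_{ki}}{k^2}
 \;+\; \sum_{k=1}^\infty \frac{g_k^2}{k^2}\\
&= {\bf p}^t Q\,{\bf p} + {\bf B}^t{\bf p} + C .
\end{aligned}
$$
In particular $Q$ is symmetric and ${\bf p}^tQ\,{\bf p}=\sum_{k} k^{-2}\bigl(\sum_i A_{ki}p_i\bigr)^2\ge 0$, so $Q$ is positive semidefinite.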

The estimator is then built by replacing the moments of the target measure with the empirical moments and by truncating the series above to finite sums.

\subsection{Numerical solutions}
In practical cases, in particular in estimation problems, the previous series have to be truncated, and this implies that the matrix $Q$ is no longer guaranteed to be positive definite. Standard numerical procedures for constrained quadratic optimization problems involving positive definite quadratic forms therefore cannot be used in this context. One approach to this problem is to build the following penalized function $L_\lambda$,
$$ L_\lambda({\bf p})={\bf p}^t Q {\bf p} + {\bf B}^t {\bf p} +C+\lambda\left(1-\sum_{i=1}^N p_i\right)^2, $$
and then to study the problem
$$ \min L_\lambda({\bf p}), \ \ 0\le p_i\le 1. \leqno{(LOP)} $$
Clearly, an optimizer ${\bf p^*}$ of (LOP) such that $\sum_{i=1}^N p_i^*=1$ is also an optimizer for the problem
$$ \min S({\bf p}), \ \ {\bf p}\in \Pi^N. \leqno{(OP)} $$
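
The last claim can be checked directly; the short argument below uses only the definitions of $L_\lambda$ and $S$ given above. On the simplex the penalty term vanishes, so $L_\lambda({\bf p})=S({\bf p})$ for every ${\bf p}\in\Pi^N$, and $\Pi^N$ is contained in the box $\{0\le p_i\le 1\}$. Hence, if ${\bf p^*}$ solves (LOP) and $\sum_i p_i^*=1$, then for every ${\bf p}\in\Pi^N$
$$ S({\bf p^*}) = L_\lambda({\bf p^*}) \le L_\lambda({\bf p}) = S({\bf p}), $$
that is, ${\bf p^*}$ solves (OP). When the minimizer of (LOP) does not satisfy the constraint exactly, the residual $\bigl|1-\sum_i p_i^*\bigr|$ can serve as a diagnostic, and a larger value of $\lambda$ (a tuning choice that the text does not prescribe) penalizes the violation more heavily.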

To solve (LOP) numerically, we have used the L-BFGS-B method of \cite{byrd}, which minimizes a nonlinear function subject to box constraints, i.e. when each variable can be given a lower and/or an upper bound. The initial value of this procedure must satisfy the constraints. The method uses a limited-memory modification of BFGS, which is a quasi-Newton method (also known as a variable metric algorithm).

\subsection{The choice of affine maps}

As we are mostly concerned with estimation, we briefly discuss the problem of choosing the maps. In \cite{fv95} the following two sets of wavelet-type maps are proposed. Fix an index $i^*\in\mathbb N$ and define
$$ \gamma_{ij}(x) = \frac{x-\alpha+(j-1)(\beta-\alpha)}{2^i}+\alpha,\quad i=1,2,\ldots, i^*, \quad j = 1,2,\ldots, 2^i $$
and
$$ \eta_{ij}(x) = \frac{x-\alpha+(j-1)(\beta-\alpha)}{i},\quad i=2,\ldots, i^*, \quad j = 2,\ldots, i\,. $$
Then set $N = \sum\limits_{i=1}^{i^*} 2^i$ or $N=i^*(i^*-1)/2$, respectively. To choose the maps, consider the natural ordering of the maps $\gamma_{ij}$ and $\eta_{ij}$ and operate as follows:
$$ \mathcal W_1 =\{ w_1 = \gamma_{11}, w_2 = \gamma_{12}, w_3 = \gamma_{21}, \ldots, w_6 =\gamma_{24}, w_7 = \gamma_{31}, \ldots, w_{N}=\gamma_{i^*2^{i^*}}\} $$
and
$$ \mathcal W_2 =\{ w_1 = \eta_{22}, w_2 = \eta_{32}, w_3 = \eta_{33}, w_4 =\eta_{42}, \ldots, w_6 = \eta_{44}, \ldots, w_{N}=\eta_{i^*i^*}\}, $$
respectively. In \cite{iaclat} we proposed the following quantile-based maps
$$ \mathcal Q_1 =\{ w_i (x) = (q_{i+1}-q_i) x + q_i,\ i=1,2,\ldots, N\}, $$
where $q_i = F^{-1}(u_i)$ and $0=u_1 < u_2 < \ldots < u_{N} < u_{N+1} = 1$ are $N+1$ equally spaced points on $[0,1]$. With these maps, it has been shown that there is no need to use a moment matching approach. In particular, with $p_i=1/N$, the IFS turns out to be a smoother of the e.d.f. and so it has nice small sample and asymptotic statistical properties (see the cited reference), even for distribution functions $F$ with non-compact support. Here we will also combine the quantile information with the moment matching idea. To distinguish the two cases (fixed $p_i=1/N$, or $p$ solution of $({\bf QP})$) we will use the notation $\mathcal Q_1$ and $\mathcal Q_2$, respectively, later on.
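
As a concrete instance of the first wavelet-type family (a worked illustration only; the choices $[\alpha,\beta]=[0,1]$ and $i^*=2$ are assumptions made here for brevity), the definition of $\gamma_{ij}$ gives $N=2+4=6$ maps:
$$ \gamma_{11}(x)=\frac{x}{2},\quad \gamma_{12}(x)=\frac{x+1}{2},\quad
\gamma_{21}(x)=\frac{x}{4},\quad \gamma_{22}(x)=\frac{x+1}{4},\quad
\gamma_{23}(x)=\frac{x+2}{4},\quad \gamma_{24}(x)=\frac{x+3}{4}. $$
Each $\gamma_{ij}$ maps $[0,1]$ onto the dyadic interval $[(j-1)/2^i,\,j/2^i]$ and has slope $b=2^{-i}$, so the contractivity factor of $\mathcal W_1$ is $c=1/2$. The counting formulas give, for example, $N=62$ for $\mathcal W_1$ with $i^*=5$ and $N=28$ for $\mathcal W_2$ with $i^*=8$.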

\subsection{Fourier analysis results}
We recall from \cite{fv98} some results that are rather straightforward to prove but essential for us, since we will use them in density estimation, in particular in the presence of missing data. Their simplicity is due to the affine form of the maps. We assume, without loss of generality, that the support of the measures is $X= [0,1]$.

Given a measure $\mu\in\mathcal M(X)$, the Fourier transform (FT) $\phi : \mathbb R \to \mathbb C$, where $\mathbb C$ is the complex plane, is defined by the relation
$$
\phi(t) = \int_X e^{-itx} \de \mu(x),\quad t\in\mathbb R\,,
$$
with the well-known properties $\phi(0) =1$ and $|\phi(t)|\leq 1$, $\forall\, t\in\mathbb R$. It can be shown that the space of characteristic functions ${\mathcal FT}(X)$ can be made into a complete metric space with a suitable metric. Thus, given an $N$-maps affine IFS $({\bf w},{\bf p})$, it is possible to define a new linear operator $B: {\mathcal FT}(X)\to {\mathcal FT}(X)$ whose unique fixed point satisfies
$$
\bar\phi(t) = \sum_{k=1}^N p_k e^{-i t a_k} \bar\phi(b_k t),\quad t\in\mathbb R\,.
$$
This $\bar\phi(t)$ is the FT of the fixed point of the $N$-maps affine IFS $({\bf w},{\bf p})$.
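
The form of this relation can be seen by applying \eqref{eq1} to $f(x)=e^{-itx}$ (treating real and imaginary parts separately); the computation below is a sketch that uses only the affine form $w_k(x)=a_k+b_kx$ of the maps. If $\phi$ is the FT of $\mu$ and $\nu=M\mu$, then
$$ \int_X e^{-itx}\de\nu(x) = \sum_{k=1}^N p_k \int_X e^{-it(a_k+b_k x)}\de\mu(x) = \sum_{k=1}^N p_k\, e^{-ita_k}\,\phi(b_k t), $$
and taking $\mu=\nu=\bar\mu$ gives the fixed-point equation above. Since $|b_k|<1$, the right-hand side evaluates $\bar\phi$ only at points closer to the origin than $t$, where $\bar\phi(0)=1$ is known.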

Now suppose that the target distribution $F$ admits a density $f$. It is then possible to write the density $f$ via a Fourier expansion \cite[see e.g.][]{tarter}. In fact,
$$
\phi(t) =\int_0^1 f(x)  e^{-itx}  \de x = \int_0^1 e^{-itx} \de F(x)
$$
and thus
$$
f(x) = \frac{1}{2\pi}\sum_{k=-\infty}^{+\infty} B_k e^{ikx}
\quad\text{where}\quad B_k  = \phi(k)\,.
$$

\section{Relative efficiency and estimation in presence of missing data}\label{sec3}
Suppose we have an i.i.d. sample of $n$ observations with common unknown distribution function $F$ with compact support $[\alpha,\beta]$, which has all moments up to order $M$. An IFS estimator of $F$ is the fixed point of the operator $T$, where the $N$ maps are chosen in advance and the $p_i$ are the solution of the quadratic programming problem ({\bf QP}) in which, in the expressions for $A_{ki}$, $B_i$ and $C$ in equations \eqref{eqAik} and \eqref{eqBiC}, we replace the true moments $g_k$ with the sample moments $m_k$, $k=0,1,\ldots, M$, for a fixed $M$, and we consider only the first $M$ terms of the series involved.

Given the solution of ({\bf QP}), we have an estimator of $F$ and an estimator of the characteristic function of $F$, say $\hat \phi$. If $F$ possesses a density $f$, then we also have a (Fourier) density estimator of $f$,
$$
\begin{aligned}
\hat f(x) &=  \frac{1}{2\pi}\sum_{k=-m}^{+m} \hat B_k e^{ikx}\\
&=
\frac{1}{2\pi} + \frac{1}{\pi} \sum_{k=1}^m \biggl\{{\rm Re}(\hat B_k)\,\cos(kx) -
 {\rm Im}(\hat B_k)\,\sin(kx)\biggr\}
\end{aligned}
$$
where $\hat B_k = \hat \phi(k)$ and $m$, the number of Fourier terms, is chosen in the usual way, i.e.
$$
\text{if} \ \left | \hat B_{m+1} \right |^2 \ \text{and}\ \left | \hat B_{m+2} \right |^2 < \frac{2}{n+1}
\quad\text{then use the first $m$ coefficients}
$$
\cite[see again][]{tarter}.
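
The real form of the density estimator $\hat f$ follows from two elementary facts, recalled here as a short check (it assumes only that $\hat\phi$ is the FT of a probability measure): $\hat B_0=\hat\phi(0)=1$ and $\hat B_{-k}=\overline{\hat B_k}$. Pairing the terms with indices $k$ and $-k$,
$$ \hat B_k e^{ikx} + \hat B_{-k} e^{-ikx} = 2\,{\rm Re}\bigl(\hat B_k e^{ikx}\bigr)
 = 2\bigl\{{\rm Re}(\hat B_k)\cos(kx) - {\rm Im}(\hat B_k)\sin(kx)\bigr\}, $$
so that the factor $\frac{1}{2\pi}$ in front of the symmetric sum becomes $\frac{1}{\pi}$ for the terms with $k\ge 1$, while the term $k=0$ contributes $\frac{1}{2\pi}$. In particular, $\hat f$ is real valued.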

Tables \ref{tab:a} and \ref{tab:b} show comparisons between the empirical distribution function $\hat F_n$ and the IFS estimator, say $\hat T_N$, for some target distributions $F$, in terms of average mean square error (AMSE) and sup-norm (SUP) distance. These tables contain the results of a Monte Carlo analysis in which 100 simulations have been run for each target distribution. The tables report the average ratio of the sup-norm distance (respectively AMSE) of the IFS estimator to the corresponding sup-norm distance (respectively AMSE) of the empirical distribution function.

One can notice that the IFS estimator based on the maps $\mathcal W_1$ has good properties for symmetric bell-shaped distributions and for distributions with not too heavy tails (see also Figure \ref{fig:beta22}). The asymptotic equivalence of the IFS estimator to the e.d.f. when quantile maps are used is also evident. Note that we have decided to use 62 maps for $\mathcal W_1$, 28 maps for $\mathcal W_2$, and $n/2$ quantiles for the quantile maps $\mathcal Q_1$ and $\mathcal Q_2$. It is therefore evident that, for wavelet-type maps, an adjustment can be made by choosing a suitable number of maps in terms of the sample size $n$.

\subsection{What if data are missing?}
Suppose now that, for some reason, the $n$ sample observations from $F$ are in fact a subset of a larger sample of unknown size. In practice, we do not observe the data on the whole support $[\alpha,\beta]$ of $F$, but only on some windows. This sample reduction is due to some sort of censoring. We are thus in the presence of missing data, but we do not know how many observations are missing or exactly where they are missing, i.e. we are not in a classical censoring setup. A motivation for this scheme of (non-)observation is the following: suppose one wants to estimate the distribution of the angle of the wind registered by some instruments, in degrees $(0,360)$. For some reason, data from the angles $(15,37)$ and $(62,79)$ are missing because of technical failures or physical obstacles. In this case the empirical distribution function will be flat on these windows and a kernel density estimator will probably show a bimodal behaviour.

Heuristically, this is due to the fact that quantile estimation is inappropriate in this context. At the same time, moment estimation tends to be more robust, in particular if the distribution is symmetric. We only report a graphical example of what can happen. Figure \ref{fig:missing} refers to a sample from a Beta(2,2) distribution when only the observations in $(.1, .15)\cup (.37, .43) \cup (.7, .8)$ are available to the observer, all the others being truncated by the instrument (these intervals were chosen arbitrarily). The IFS estimator with the $\mathcal W_1$ maps seems to be able to reconstruct the underlying distribution and density functions, while, for obvious reasons, both the e.d.f. and the kernel estimators fail. In this example the relative efficiency (IFS/EDF) is 7\% for the AMSE and 23\% for the SUP-norm, which is dramatically better than expected.

\subsection{Algorithm flow for estimation}
\begin{enumerate}
\item calculate the sample moments;
\item choose the family of maps $\mathcal W$;
\item build the quadratic form and minimize it with respect to $p$;
\item to estimate $F$ at a point $x$, take any distribution function, for example the uniform distribution over $[\alpha,\beta]$, and start iterating $T$;
\item stop after a few iterations (normally 5 are enough);
\item the ``fixed point'' of $T$ evaluated at $x$ is the estimate of $F(x)$.
\end{enumerate}
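
As a small worked illustration of the last three steps (all numerical choices here are assumptions made for the example, not values from the simulations: $X=[0,1]$, the two maps $w_1(x)=x/2$ and $w_2(x)=(x+1)/2$, and $p=(1/3,2/3)$), start from the uniform distribution function $F_0(x)=x$. One application of $T$ gives, for $0<x<1$,
$$ TF_0(x) = \tfrac13 F_0(2x) + \tfrac23 F_0(2x-1) =
\begin{cases}
\tfrac23\,x, & 0<x<\tfrac12,\\[2pt]
\tfrac13 + \tfrac23\,(2x-1), & \tfrac12\le x<1,
\end{cases} $$
so that $TF_0(1/2)=1/3$, which already coincides with the value $\bar F(1/2)=p_1\bar F(1)+p_2\bar F(0)=1/3$ of the fixed point $\bar F$. More generally, since $M$ is a contraction with factor $c$, the $r$-th iterate of a starting measure $\mu_0$ satisfies
$$ d_H(M^r\mu_0,\bar\mu)\le c^r\, d_H(\mu_0,\bar\mu), $$
and for maps with $c=1/2$ five iterations reduce the Hutchinson distance to the fixed point by a factor of at least $2^{-5}=1/32$.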

If the support of $F$ is not known, one can use the range of the sample, but the resulting IFS estimator will then try to approximate a distribution function which has exactly that support. If any hint about the shape of the distribution $F$ is available, use it to choose the maps.

All the examples, tables and graphics have been produced with software developed by the authors. In particular, a package called \texttt{ifs} is freely available for the R statistical environment \cite{R} on CRAN (Comprehensive R Archive Network), \texttt{http://cran.R-project.org}, under the \textsl{contributed} section.

\section*{Conclusions}
It seems that this kind of approach can be used to make nonparametric inference when data are missing or sample sizes are small. Note that with this method it is only possible to work with distributions with compact support. Moreover, knowledge of the support itself is needed. Nevertheless, it seems a promising approach, and the use of different sets of maps merits further investigation.


\begin{thebibliography}{}

\bibitem{bd} Barnsley, M.F., Demko, S., ``Iterated function systems and the global construction of fractals'', {\sl Proc. Roy. Soc. London, Ser. A}, {\bf 399}, 243--275, 1985.
\bibitem{beran} Beran, R., ``Estimating a distribution function'', {\sl Ann. Statist.}, {\bf 5}, 400--404, 1977.
\bibitem{byrd} Byrd, R.H., Lu, P., Nocedal, J., Zhu, C., ``A limited memory algorithm for bound constrained optimization'', {\sl SIAM J. Scientific Computing}, {\bf 16}, 1190--1208, 1995.
\bibitem{dkw} Dvoretzky, A., Kiefer, J., Wolfowitz, J., ``Asymptotic minimax character of the sample distribution function and of the classical multinomial estimators'', {\sl Ann. Math. Statist.}, {\bf 27}, 642--669, 1956.
\bibitem{efro} Efromovich, S., ``Second order efficient estimating a smooth distribution function and its applications'', {\sl Meth. Comp. App. Probab.}, {\bf 3}, 179--198, 2001.
\bibitem{fv95} Forte, B., Vrscay, E.R., ``Solving the inverse problem for function/image approximation using iterated function systems, I. Theoretical basis'', {\sl Fractals}, {\bf 2}(3), 325--334, 1995.
\bibitem{fv98} Forte, B., Vrscay, E.R., ``Inverse problem methods for generalized fractal transforms'', in {\sl Fractal Image Encoding and Analysis}, NATO ASI Series F, Vol. 159, ed. Y. Fisher, Springer Verlag, Heidelberg, 1998.
\bibitem{gilevit} Gill, R.D., Levit, B.Y., ``Applications of the van Trees inequality: A Bayesian Cram\'er-Rao bound'', {\sl Bernoulli}, {\bf 1}, 59--79, 1995.
\bibitem{gl96a} Golubev, G.K., Levit, B.Y., ``On the second order minimax estimation of distribution functions'', {\sl Math. Methods Statist.}, {\bf 5}, 1--31, 1996a.
\bibitem{gl96b} Golubev, G.K., Levit, B.Y., ``Asymptotic efficient estimation for analytic distributions'', {\sl Math. Methods Statist.}, {\bf 5}, 357--368, 1996b.
\bibitem{hutch} Hutchinson, J., ``Fractals and self-similarity'', {\sl Indiana Univ. J. Math.}, {\bf 30}(5), 713--747, 1981.
\bibitem{iaclat} Iacus, S.M., La Torre, D., ``Approximating distribution functions by iterated function systems and applications'', {\sl Proceedings of the S.I.M.A.I. Conference}, Chia Laguna, Italy, May 2002 (CDROM). Submitted.
\bibitem{R} Ihaka, R., Gentleman, R., ``R: A Language for Data Analysis and Graphics'', {\sl Journal of Computational and Graphical Statistics}, {\bf 5}, 299--314, 1996.
\bibitem{levit} Levit, B.Y., ``Infinite-dimensional information inequalities'', {\sl Theory Probab. Applic.}, {\bf 23}, 371--377, 1978.
\bibitem{millar} Millar, P.W., ``Asymptotic minimax theorems for sample distribution functions'', {\sl Z. Wahrsch. Verw. Gebiete}, {\bf 48}, 233--252, 1979.
\bibitem{tarter} Tarter, M.E., Lock, M.D., {\sl Model free curve estimation}, Chapman \& Hall, New York, 1993.
\end{thebibliography}

\eject

442: \begin{table}

443: {\scriptsize

444: \begin{tabular}{c c c}

445: parameters & AMSE & SUP \\

446: \begin{tabular}{c|c}

447: $n$ & law\\

448: \hline

449: 10 & beta(.9,.1)\\

450:  10    & beta(.1,.9)\\

451:   10  & beta(.1,.1)\\

452:  10 & beta(\,2,\,2)\\

453: 10&beta(\,5,\,5)\\

454: 10&beta(\,3,\,5)\\

455: 10&beta(\,5,\,3)\\

456: 10&beta(\,1,\,1)

457: \end{tabular}&

458: \begin{tabular}{c|c|c|c}

459: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\

460: \hline

461: 81.08 & 77.05 & 203.53 & 149.68\\

462: 211.78 & 2024.68 & 203.39 & 258.88\\

463: 118.27 & 416.17 & 182.88 & 104.07\\

464: 56.47 & 80.53 & 67.68 & 112.46\\

465: 52.77 &57.90 &110.35 & 152.29\\

466: 55.95 & 71.07 & 99.92 & 142.52\\

467: 52.50 & 57.34 & 91.75 & 131.37\\

468: 73.35 & 119.04 & 79.01 & 102.04\\

469: \end{tabular}

470: &

471: \begin{tabular}{c|c|c|c}

472: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\

473: \hline

474: 85.76 & 75.44 & 110.11 &110.81\\

475: 175.32 & 441.32 & 114.51 & 161.55\\

476: 114.87 & 192.94 & 119.57 & 106.56\\

477: 53.31 & 69.24 & 70.36 & 98.21\\

478: 53.99 & 54.83 & 81.61 & 125.67\\

479: 51.93 & 60.58 & 81.72 & 116.79\\

480: 51.74 & 52.47 & 77.97 & 109.84\\

481: 65.63 & 90.40 & 70.89 & 90.85\\

482: \end{tabular}

483: \end{tabular}

484: \par

485: \vspace{12pt}

486: \par

487: \begin{tabular}{c c c}

488: parameters & AMSE & SUP \\

489: \begin{tabular}{c|c}

490: $n$ & law\\

491: \hline

492: 20 & beta(.9,.1)\\

493:  20    & beta(.1,.9)\\

494:   20  & beta(.1,.1)\\

495:  20 & beta(\,2,\,2)\\

496: 20&beta(\,5,\,5)\\

497: 20&beta(\,3,\,5)\\

498: 20&beta(\,5,\,3)\\

499: 20&beta(\,1,\,1)

500: \end{tabular}&

501: \begin{tabular}{c|c|c|c}

502: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\

503: \hline

504: 94.69 &85.25 &201.85 &169.78\\

505: 388.83 & 4183.36 & 203.70 & 195.36\\

506: 154.1 & 690.08 & 125.35 & 97.53\\

507: 61.46 & 93.37 &  85.46 &  95.49\\

508: 54.31 & 52.89 & 105.84 & 131.84\\

509: 60.42 & 67.33 & 93.30 &118.51\\

510: 53.82 & 57.72 & 92.26 & 114.84\\

511: 95.93 & 89.79 & 71.66 &  154.54\\ 
\end{tabular}

512: &

513: \begin{tabular}{c|c|c|c}

514: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\

515: \hline

516: 90.30 & 79.92 & 105.02 & 123.28\\

517: 257.13 & 612.55 & 109.10 & 122.99\\

518: 139.65 & 255.26 & 103.56 & 99.28\\

519: 55.34 & 73.95 & 84.42 & 91.38\\

520: 53.76 & 48.73 & 85.85 & 106.27\\

521: 55.98 & 60.88 & 85.39 & 101.16\\

522: 53.46 & 52.20 & 85.23 & 102.85\\

523: 63.20 & 106.95 & 81.56 & 82.54\\
\end{tabular}

524: \end{tabular}

525: \par

526: \vspace{12pt}

527: \par

528: \begin{tabular}{c c c}

529: parameters & AMSE & SUP \\

530: \begin{tabular}{c|c}

531: $n$ & law\\

532: \hline

533: 30 & beta(.9,.1)\\

534:  30    & beta(.1,.9)\\

535:   30  & beta(.1,.1)\\

536:  30 & beta(\,2,\,2)\\

537: 30&beta(\,5,\,5)\\

538: 30&beta(\,3,\,5)\\

539: 30&beta(\,5,\,3)\\

540: 30&beta(\,1,\,1)\\

541: \end{tabular}&

542: \begin{tabular}{c|c|c|c}

543: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\

544: \hline

545: 107.46 & 90.27 & 195.39 & 143.00\\

546: 540.73 & 6462.03 & 190.82 & 213.45\\

547: 112.66 & 97.04 &  233.50 & 1342.44\\

548: 60.30 & 92.92 & 88.90 & 96.88\\

549: 62.04 & 56.07 & 100.26 & 121.41\\
70.31 & 76.90 & 93.02 & 108.76\\

550: 55.78 & 56.85 & 92.10 & 102.02\\

551: 71.88 & 211.28 & 94.36 &  88.17\\

552: \end{tabular}

553: &

554: \begin{tabular}{c|c|c|c}

555: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\

556: \hline

557: 101.83 & 81.05 & 108.59 & 109.85\\

558: 107.80 & 137.26 & 314.53 & 759.57\\

559: 186.70 & 356.91 & 103.39 & 99.98\\

560: 53.71 & 72.06 &  84.92 & 89.11\\

561: 60.08 & 51.82 &89.26 & 100.16\\

562: 61.68 & 66.29 & 86.36 & 95.24\\

563: 55.56& 51.21 & 88.20 & 94.75\\

564: 63.15 & 121.23 & 83.74 & 83.40\\

565: \end{tabular}

566: \end{tabular}

567: }

568: \caption{Relative efficiency of IFS estimators with different sets of maps ${\mathcal W_1}$, ${\mathcal W_2}$, ${\mathcal Q_1}$ and ${\mathcal Q_2}$ with respect to the empirical distribution function (i.e. IFS/EDF). Based on 100 Monte Carlo simulations for each distribution. Small sample sizes.}

569: \label{tab:a}

570: \end{table}

571:

572:

573:

574: \begin{table}

575: {\scriptsize

576: \begin{tabular}{c c c}

577: parameters & AMSE & SUP \\

578: \begin{tabular}{c|c}

579: $n$ & law\\

580: \hline

581: 50 & beta(.9,.1)\\

582:  50    & beta(.1,.9)\\

583:   50  & beta(.1,.1)\\

584:  50 & beta(\,2,\,2)\\

585: 50&beta(\,5,\,5)\\

586: 50&beta(\,3,\,5)\\

587: 50&beta(\,5,\,3)\\

588: 50&beta(\,1,\,1)

589: \end{tabular}&

590: \begin{tabular}{c|c|c|c}

591: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\

592: \hline

593: 132.67 & 115.10 & 163.33 & 129.24\\
1044.12 & 12573.16 & 181.99 & 180.42\\
306.49 & 1917.23 & 105.68 & 97.27\\
63.03 & 106.56 & 95.35 & 95.66\\
68.94 & 60.19 & 102.22 & 114.92\\

594: 79.98 & 93.80 & 96.20 & 102.32\\

595: 63.13 & 62.21 & 93.59 & 98.47\\

596: 73.47 & 304.41 & 97.24 & 92.19\\
\end{tabular}

597: &

598: \begin{tabular}{c|c|c|c}

599: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\

600: \hline

601: 109.18& 88.77 & 103.37 & 101.90\\

602: 421.49 & 991.33 & 104.37 & 123.39\\
214.27 & 430.63 & 100.13 &  98.04\\

603: 58.39 & 80.00 & 89.42 & 89.36\\

604: 66.77 & 55.49 & 91.86 & 97.40\\

605: 66.76 & 77.57 & 91.39 & 93.76\\

606: 62.04 & 55.95 & 90.66 & 93.19\\

607: 62.69 & 150.39 & 87.38 & 86.30\\
\end{tabular}

608: \end{tabular}

609: \par

610: \vspace{12pt}

611: \par

612: \begin{tabular}{c c c}

613: parameters & AMSE & SUP \\

614: \begin{tabular}{c|c}

615: $n$ & law\\

616: \hline

617: 100 & beta(.9,.1)\\

618:  100    & beta(.1,.9)\\

619:   100  & beta(.1,.1)\\

620:  100 & beta(\,2,\,2)\\

621: 100&beta(\,5,\,5)\\

622: 100&beta(\,3,\,5)\\

623: 100&beta(\,5,\,3)\\

624: 100&beta(\,1,\,1)

625: \end{tabular}&

626: \begin{tabular}{c|c|c|c}

627: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\

628: \hline

629: 195.54 & 158.80 & 140.55 & 108.27\\

630: 1557.30 & 20324.60 & 135.45 & 125.94\\

631: 554.11 & 3918.62 & 102.67 & 98.29\\

632: 61.63 & 165.60 & 95.58 & 97.46\\

633: 87.97 & 67.79 & 99.28 & 108.21\\

634: 111.30 & 134.54&100.68&103.31\\

635: 61.03 & 57.19 & 97.28 & 101.32\\

636: 67.91 & 558.50 & 97.71 & 94.87\\

637:  \end{tabular}

638: &

639: \begin{tabular}{c|c|c|c}

640: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\

641: \hline

642: 138.93 & 105.31 & 102.25 & 99.07\\

643: 536.84 & 1267.81 & 103.87 & 106.05\\

644: 304.59 & 625.75 & 99.10 & 98.04\\

645: 57.18 & 98.50 & 92.11 & 93.09\\

646: 78.94 &60.96 &94.83 &96.52\\

647: 79.59 & 100.20 & 95.35 & 95.72\\

648: 65.97 & 55.08 & 94.14 & 95.42\\

649: 58.71 & 201.10 & 90.83 & 89.97\\

650:  \end{tabular}

651: \end{tabular}

652: \par

653: \vspace{12pt}

654: \par

655: \begin{tabular}{c c c}

656: parameters & AMSE & SUP \\

657: \begin{tabular}{c|c}

658: $n$ & law\\

659: \hline

660: 250 & beta(.9,.1)\\

661:  250    & beta(.1,.9)\\

662:   250  & beta(.1,.1)\\

663:  250 & beta(\,2,\,2)\\

664: 250&beta(\,5,\,5)\\

665: 250&beta(\,3,\,5)\\

666: 250&beta(\,5,\,3)\\

667: 250&beta(\,1,\,1)\\

668: \end{tabular}&

669: \begin{tabular}{c|c|c|c}

670: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\

671: \hline

672: 338.72 & 255.23 & 115.25 & 101.55\\

673: 3979.61 & 50448.13 &117.81 & 105.37\\

674: 1345.72 & 10051.20 &  100.60 & 98.97\\

675: 79.01 & 275.93 & 98.59 & 98.30\\

676: 163.68 & 99.35 &99.07 & 100.54\\

677: 212.17 & 228.58 & 99.45 & 99.69\\

678: 91.32 & 73.31 & 99.05 &99.20\\

679: 69.03 & 1165.61 & 99.47 & 98.46\\ 
 \end{tabular}

680: &

681: \begin{tabular}{c|c|c|c}

682: ${\mathcal W_1} $ & ${\mathcal W_2} $ & ${\mathcal Q_1} $ & ${\mathcal Q_2} $\\

683: \hline

684: 180.29 & 131.97 &  100.68 &  99.43\\

685: 874.65 & 2045.15 & 100.82 & 99.73\\

686: 480.12 & 977.30 & 99.16 & 98.73\\

687: 67.14 & 132.87 & 95.50 & 95.24\\

688: 111.38 & 78.48 & 96.40 & 96.83\\

689: 113.70 & 142.21 & 96.57 & 96.32\\

690: 88.87 & 67.13 & 96.84 & 97.24\\

691: 61.07 & 293.58 & 94.88 & 94.55\\
 \end{tabular}

692: \end{tabular}

693: }

694: \caption{Relative efficiency of IFS estimators with different sets of maps ${\mathcal W_1}$, ${\mathcal W_2}$, ${\mathcal Q_1}$ and ${\mathcal Q_2}$ with respect to the empirical distribution function (i.e. IFS/EDF). Based on 100 Monte Carlo simulations for each distribution. Moderate to large sample sizes.}

695: \label{tab:b}

696: \end{table}

697:

698: \begin{figure}

699: \includegraphics{miss1}

700: \includegraphics{miss2}

701: \caption{Data from a Beta(2,2) distribution when only the observations in $(.1, .15)\cup (.37, .43) \cup (.7, .8)$ are available to the observer, all the others being truncated by the instrument.

702: The observations are marked as vertical ticks. The IFS estimator with $\mathcal W_1$ maps seems to be able to reconstruct the underlying distribution and density functions, while, for obvious reasons, both the e.d.f. and the kernel estimators fail. Notice that the arbitrary choice of the window of observation can be changed without substantial loss or gain. In this example the relative efficiency (IFS/EDF) is 7\% for the AMSE and 23\% for the SUP-norm.}

703: \label{fig:missing}

704: \end{figure}

705:

706: \begin{figure}

707: \includegraphics{sup22}

708: \includegraphics{mse22}

709: \caption{Relative efficiency of the IFS estimator for different sets of maps ${\mathcal W_1}$, ${\mathcal W_2}$, ${\mathcal Q_1}$ and ${\mathcal Q_2}$ with respect to the empirical distribution function. Based on 100 Monte Carlo simulations. SUP-norm top, AMSE bottom.}

710: \label{fig:beta22}

711: \end{figure}

712:

713: \begin{figure}

714: \includegraphics{sup53}

715: \includegraphics{mse53}

716: \caption{Relative efficiency of the IFS estimator for different sets of maps ${\mathcal W_1}$, ${\mathcal W_2}$, ${\mathcal Q_1}$ and ${\mathcal Q_2}$ with respect to the empirical distribution function. Based on 100 Monte Carlo simulations. SUP-norm top, AMSE bottom.}

717: \label{fig:beta53}

718: \end{figure}

719:

720: \begin{figure}

721: \includegraphics{sup19}

722: \includegraphics{mse19}

723: \caption{Relative efficiency of the IFS estimator for different sets of maps ${\mathcal W_1}$, ${\mathcal W_2}$, ${\mathcal Q_1}$ and ${\mathcal Q_2}$ with respect to the empirical distribution function. Based on 100 Monte Carlo simulations. SUP-norm top, AMSE bottom.}

724: \label{fig:beta19}

725: \end{figure}

726:

727: \end{document}

728:

729: