1: % LaTeX-2e document, 30pp, 2 figures
2:
3: %\documentclass[12pt]{article}
4: \documentclass[12pt]{amsart}
5:
6: \title{A Statistical Approach to Persistent Homology}
7: % \author{Peter
8: % Bubenik\thanks{Cleveland State University, Department of
9: % Mathematics, 2121 Euclid Ave. RT 1515, Cleveland OH 44115-2214,
10: % USA, Email: p.bubenik@csuohio.edu. This research was partially
11: % funded by the Swiss National Science Foundation grant
12: % 200020-105383.} \ and Peter T. Kim\thanks{Department of
13: % Mathematics and Statistics, University of Guelph, Guelph, Ontario
14: % N1G 2W1 Canada, Email: pkim@uoguelph.ca. This research was
15: % partially funded by NSERC grant OGP46204.}}
16: \author{Peter Bubenik}
17: \address{Cleveland State University, Department of Mathematics, 2121 Euclid Ave. RT 1515, Cleveland OH 44115-2214, USA}
18: \email{p.bubenik@csuohio.edu}
19: \thanks{This research was partially funded by the
20: Swiss National Science Foundation grant 200020-105383.}
21: \author{Peter T. Kim}
22: \address{Department of Mathematics and Statistics, University of Guelph,
23: Guelph, Ontario N1G 2W1 Canada}
24: \email{pkim@uoguelph.ca}
25: \thanks{This research was partially funded by NSERC grant OGP46204.}
26: \date{\today}
27:
28: \usepackage{amsmath}
29: \usepackage{amsthm}
30: \usepackage{amssymb}
31: \usepackage{ifpdf}
32:
33: \ifpdf
34: \usepackage[pdftex]{graphicx}
35: \DeclareGraphicsExtensions{.pdf}
36: \usepackage{hyperref}
37: \else
38: \usepackage[dvips]{graphicx}
39: \DeclareGraphicsExtensions{.eps}
40: \usepackage[dvipdfm]{hyperref}
41: \fi
42:
43: % use pdfsync package to switch between pdf output and emacs input:
44: %\usepackage{pdfsync}
45:
46: \newtheorem{thm}{Theorem}[section]
47: \newtheorem{lemma}[thm]{Lemma}
48: \newtheorem{prop}[thm]{Proposition}
49: \newtheorem{claim}[thm]{Claim}
50: \newtheorem{cor}[thm]{Corollary}
51: \newtheorem{conj}[thm]{Conjecture}
52:
53: \theoremstyle{definition}
54: \newtheorem{defn}[thm]{Definition}
55: \newtheorem{eg}[thm]{Example}
56:
57: \theoremstyle{remark}
58: \newtheorem{rem}[thm]{Remark}
59: \newtheorem{notn}[thm]{Notation}
60: \newtheorem{goal}[thm]{Goal}
61: \newtheorem{question}[thm]{Question}
62:
63: \renewcommand{\theequation}{\thesection.\arabic{equation}}
64:
65: \numberwithin{equation}{section}
66:
67: \newcommand{\beq}{\begin{equation}}
68: \newcommand{\eeq}{\end{equation}}
69: \newcommand {\abs}[1] {\lvert#1\rvert}
70: \newcommand {\M} {\ensuremath {\mathcal{M}} }
71: \newcommand {\N} {\ensuremath {\mathbb{N}} }
72: \newcommand {\Z} {\ensuremath {\mathbb{Z}} }
73: \newcommand {\Q} {\ensuremath {\mathbb{Q}} }
74: \newcommand {\R} {\ensuremath {\mathbb{R}} }
75: \newcommand {\eR} {\ensuremath {\overline{\mathbb{R}}} }
76: \newcommand {\RP} {\ensuremath {\mathbb{RP}} }
77: \newcommand {\F} {\ensuremath {\mathbb{F}} }
78: \newcommand {\Rn} {\ensuremath {\mathbb{R}^n} }
79: \newcommand {\Rinfty} {\ensuremath {\mathbb{R}^{\infty}} }
80: \newcommand {\isom} {\ensuremath {\cong} }
81: \newcommand {\tensor} {\ensuremath {\otimes} }
82: \newcommand {\incl} {\ensuremath {\hookrightarrow} }
83: \newcommand {\injects} {\ensuremath {\hookrightarrow} }
84: \newcommand {\onto} {\ensuremath {\twoheadrightarrow} }
85: \newcommand {\isomto} {\ensuremath {\xrightarrow{\isom}} }
86: \newcommand {\xto}[1] {\ensuremath {\xrightarrow{#1}} }
87: \newcommand {\Iff} { if and only if }
88: \newcommand {\opensubset} {\stackrel{\subset}{{\scriptscriptstyle
89: \open}}}
90: \newcommand {\BR} {\mathcal{B}_{\R}}
91: \newcommand {\BRn} {\mathcal{B}_{\Rn}}
92: \newcommand {\maxf} {\max(f_{\kappa})}
93: \newcommand {\minf} {\min(f_{\kappa})}
94: \newcommand {\cF} {{\mathcal F}}
95: \newcommand {\cR} {{\mathcal R}}
96: \newcommand {\cC} {{\mathcal {C}}}
97: \newcommand {\E} {{\mathcal E}}
98: \newcommand {\CX} {C_*(X)}
99: \newcommand {\Cd} {(C,d)}
100: \newcommand {\homoteq} {\approx}
101:
102:
103: \DeclareMathOperator{\Id}{Id}
104: \DeclareMathOperator{\im}{im}
105: \DeclareMathOperator{\Const}{Const}
106: \DeclareMathOperator{\tr}{tr}
107: \DeclareMathOperator{\diag}{diag}
108:
109: \begin{document}
110:
111: \maketitle
112:
113: % to add whitespace between all lines:
114: %\baselineskip = 20pt plus 3pt minus 3pt
115:
116: \begin{abstract}
117: Assume that a finite set of points is randomly sampled from a
118: subspace of a metric space. Recent advances in computational
119: topology have provided several approaches to recovering the
120: geometric and topological properties of the underlying space. In
121: this paper we take a statistical approach to this problem. We assume
122: that the data is randomly sampled from an unknown probability
123: distribution. We define two filtered complexes with which we can
124: calculate the persistent homology of a probability distribution.
125: Using statistical estimators for samples from certain families of
126: distributions, we show that we can recover the persistent homology
127: of the underlying distribution.
128: \end{abstract}
129:
130:
131: \section{Introduction}
132:
133: There is growing interest in characterizing topological features of
134: data sets. Given a finite set, sometimes called \emph{point cloud
135: data (PCD)}, that is randomly sampled from a subspace $X$ of some metric
136: space, one hopes to recover geometric and topological properties of
137: $X$. Using random samples, P. Niyogi, S. Smale and S. Weinberger
138: \cite{niyogiSmaleWeinberger} show how to recover the homology of
139: certain submanifolds. In \cite{chazalCohen-SteinerLieutier} the homotopy-type of certain compact
140: subsets is recovered.
141:
142: A finer descriptor, developed by H. Edelsbrunner, D. Letscher, A.
143: Zomorodian and G. Carlsson, is that of \emph{persistent homology}
144: \cite{edelsbrunnerLetscherZomorodian, zomorodianCarlsson:computingPH}. While it
145: is not a homotopy invariant, it is stable under small
146: changes~\cite{cohen-steinerEdelsbrunnerHarer}. Using the PCD and the
147: metric, one can construct a filtered simplicial complex which
148: approximates the unknown space
149: $X$~\cite{deSilvaCarlsson,czcg:persistenceBarcodesForShapes}.
This leads naturally to a spectral sequence. What is unusual is that
the homology at the start of the spectral sequence is uninteresting,
and so is what it converges to. Nevertheless, the intermediate
homology, called \emph{persistent homology}, is of interest. It can be
154: described using \emph{barcodes}, which are analogues of the Betti
155: numbers.
156:
157: The aim of this paper is to take a statistical approach to these
158: ideas. We assume that the data is sampled from a manifold with respect
159: to a probability distribution. Given such a distribution, we construct
160: two filtered chain complexes: the \emph{Morse complex}, and the
161: \emph{\v{C}ech complex}. For most of the distributions we consider,
162: these complexes are related by Alexander duality. Using persistent
163: homology, one can calculate the corresponding Betti barcodes, which
164: provide a topological description of the distribution. In the case of
the \v{C}ech complex we define a Betti--$0$ function. We apply these
166: methods to several parametric families of distributions: the von
167: Mises, von Mises-Fisher, Watson and Bingham distributions on $S^{p-1}$
168: and the matrix von Mises distribution on $SO(3)$.
169:
170: Given a sample, it is assumed that the underlying distribution is
171: unknown, but that it is one of a parametrized family. We use
172: statistical techniques to estimate the parameter. These are then
173: used to estimate the barcodes. As a result, we prove that we can
174: recover the persistent homology of the underlying distribution.
175:
176: \begin{thm}
177: Let $x_1, \ldots, x_n$ be a sample from $S^{p-1}$ according to the
178: von Mises--Fisher distribution with fixed concentration parameter
179: $\kappa \geq 0$. Given the sample, let $\hat{\kappa}$ be the maximum
180: likelihood estimator for $\kappa$ (which is given by formula
181: \eqref{est-kappa}). Let $\beta_{\kappa}$ and $\beta_{\hat{\kappa}}$
182: denote the Betti barcodes for the persistent homology of the
183: densities associated with $\kappa$ and $\hat{\kappa}$ using either
184: the Morse or the \v{C}ech filtration. Finally let $E(\cdot)$
185: denote the expectation, and $\mathcal{D}$ denote the barcode
186: metric (see Definition~\ref{def:barcodeMetric}). Then,
187: \begin{equation*}
188: E (\mathcal{D}(\beta_{\hat{\kappa}},\beta_{\kappa})) \leq C(\kappa) n^{-1/2},
189: \end{equation*}
190: as $n \to \infty$, for some constant $C(\kappa)$.
191: \end{thm}
192:
193:
194: We also show that the classical theory of spacings \cite{pyke:spacings}
195: can be used to calculate the exact expectations of the Betti barcodes
196: for samples from the uniform distribution on $S^1$ together with their
197: asymptotic behavior.
198:
As part of our results, we show that the Morse filtrations of our
200: distributions each correspond to a relative CW-structure for the
201: underlying spaces. The von Mises and von Mises-Fisher distributions
202: correspond to the decomposition $S^{p-1} \approx * \cup_* D^{p-1}$,
203: the Watson distribution corresponds to $S^{p-1} \approx S^{p-2}
204: \cup_{\Id \amalg -\Id} (D^{p-1} \amalg D^{p-1})$, and the Bingham
205: distribution corresponds to $S^{p-1} \approx * \cup_{\Id \amalg -\Id}
206: (D^1 \amalg D^1) \cup_{\Id \amalg -\Id} (D^2 \amalg D^2) \cup \ldots
207: \cup_{\Id \amalg -\Id} (D^{p-1} \amalg D^{p-1})$. Finally, the Morse
208: filtration on the matrix von Mises distribution on $SO(3)$ corresponds
209: to the decomposition $\RP^2 \cup_f D^3$ where $f:S^2 \to \RP^2$
210: identifies antipodal points. Interestingly, the last decomposition is
211: obtained by using the Hopf fibration $S^0 \to S^3 \to \RP^3$.
212:
213:
214: A summary of the paper goes as follows. In Section \ref{notation}, we
215: go over the background and notation used in this paper. We review
216: both the statistical and the topological terminologies. In
217: Section \ref{sectionPriorWork} we discuss filtrations and persistent
218: homology and
219: %in Section \ref{sectionPersistentHofD}
220: we develop two filtrations for densities. In Section
221: \ref{sectionBettiBofS} we use the theory of spacings to give exact
222: estimates of the persistent homology of uniform samples on $S^1$. In
223: Section \ref{sectionBarcodesOfDensities} we calculate the persistent
224: homology of some standard parametric families of densities on
225: $S^{p-1}$ and $SO(3)$. In Section \ref{statestimation} we use maximum
226: likelihood estimators to recover the persistent homology of the underlying
227: density.
228:
229: \section{Background and notation}
230: \label{notation}
231:
232: In an attempt to make this article accessible to a broad audience, we
233: define some of the basic statistical and topological terms we will be
234: using.
235:
236:
237: \subsection{Statistics}
238:
239: Given a manifold $\M$ with Radon measure $\nu$, a \emph{density} is a
240: function $f: \M \to [0,\infty]$ such that $f d\nu$ is a
241: \emph{probability distribution} on $\M$ with $\int_\M f d\nu = 1$.
242: A common statistical example is to take $\M = \R^p$, and $d\nu$ to
be the $p$-dimensional Lebesgue measure. A density in this case would
244: be a nonnegative function that integrates to unity. We can also take
245: $\M = S^{p-1}$, the $(p-1)$-dimensional unit sphere, with $d\nu$ being
246: the $(p-1)$-dimensional spherical measure. In this case a density is
247: referred to as a \emph{directional density}. For $\M$ a compact
248: connected orientable Riemannian manifold, $d\nu$ would be the measure
249: induced by the Riemannian structure.
250:
251: In statistics, we think of a family of probability densities parametrized
252: accordingly
253: \begin{equation} \label{density_par}
254: \left\{ f_{\vartheta} : \vartheta \in \Theta\right\} \ \ ,
255: \end{equation}
256: where $\vartheta$ is called a \emph{parameter} and $\Theta$ is called the
257: \emph{parameter space}. The parameter space $\Theta$ can be quite general
258: and if it is some subset of a finite-dimensional vector space, then (\ref{density_par})
259: is referred to as a \emph{parametric} family of densities, otherwise it is
260: known as a \emph{nonparametric} family of densities. Subsequent to this,
261: the corresponding statistical
262: problem will be referred to as either a parametric statistical procedure, or, a
263: nonparametric statistical procedure, depending on whether we are dealing with
264: a parametric, or nonparametric family of densities, respectively.
265:
266: Some parametric examples are in order. Let $\M = \R^p$ and consider
267: the normal family of location scale probability densities,
\begin{equation} \label{normal} f_{\mu, \sigma}(x) = (2 \pi \sigma^2)^{-p/2} \exp
\left\{ -\tfrac {\|x-\mu\|^2}{2\sigma^2} \right\} \ \ , \end{equation} where
$\mu, x \in \R^p$ and $\sigma^2 \in (0,\infty)$. Letting $\vartheta =
(\mu , \sigma^2)$, we note that this parametric problem has $\Theta =
\R^p \times (0,\infty )$ as its parameter space.
273:
274: If we take $\M=S^{p-1}$, a well known example of a directional
275: density, and one that will be used in this paper is given by
276: \begin{equation} \label{vmf} f_{\mu,\kappa}(x) = c(\kappa) \exp\left\{\kappa x^t
277: \mu\right\}, \end{equation} where $\mu , x \in S^{p-1}$, $\kappa \in
[0,\infty)$, $c(\kappa)$ is the normalizing constant and superscript
``$t$'' denotes transpose. The distribution arising from
280: $f_{\mu,\kappa}$ is called the \emph{von Mises-Fisher distribution}
281: where this parametric problem has $\Theta = S^{p-1} \times [0,\infty
282: )$ as its parameter space.
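
For orientation, the normalizing constant in \eqref{vmf} can be made explicit in low dimensions. The following computation is only an illustration; it assumes that $p=3$, that $d\nu$ is the usual (unnormalized) surface measure on $S^2$, and that $\kappa > 0$. Writing $t = x^t\mu$ and integrating over the circles of constant $t$,
\begin{equation*}
\frac{1}{c(\kappa)} = \int_{S^2} \exp\left\{\kappa x^t \mu\right\} d\nu(x)
= 2\pi \int_{-1}^{1} e^{\kappa t} \, dt
= \frac{4\pi \sinh \kappa}{\kappa},
\end{equation*}
so that $c(\kappa) = \kappa / (4\pi \sinh\kappa)$ under these assumptions. Letting $\kappa \to 0$ gives the constant $1/(4\pi)$, which is the uniform density on $S^2$.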
283:
284: Somewhat related to the above is the situation where $\M = SO(p)$, the
285: space of $p \times p$ rotation matrices. Let \begin{equation} \label{mvmf}
286: f_{\mu,\kappa}(x) = c(\kappa) \exp\left\{\kappa {\rm tr}\, x^t
287: \mu\right\}, \end{equation} where $\mu , x \in SO(p)$, $\kappa \in [0,\infty)$
288: and $c(\kappa)$ is the normalizing constant. The distribution arising
289: from $f_{\mu,\kappa}$ is called the \emph{matrix von Mises-Fisher
290: distribution} where this parametric problem has $\Theta = SO(p)
291: \times [0,\infty )$ as its parameter space.
292:
A \emph{sample} $X_1, X_2, \ldots, X_N$ is a sequence of independent
294: and identically distributed random quantities on $\M$
295: drawn according to the density $f_{\vartheta}$ for some fixed but unknown
296: $\vartheta \in \Theta$. The parameter of interest would be the fixed but unknown
297: parameter $\vartheta$, or, more generally, some transformation $\tau(\vartheta)$
298: thereof. Statistically, we want to find an estimator
299: ${\tilde \tau} = {\tilde \tau}(X_1, \ldots , X_N)$ of
300: $\tau(\vartheta)$. Given some metric $\gamma$ on $\tau(\Theta)$, the
301: performance of the estimator is evaluated relative to this metric in
302: expectation with respect to the joint probability density of the sample,
303: \begin{equation} \label{expectation}
304: E_{\vartheta}\gamma\left({\tilde \tau}, \tau \right)
305: = \int_{\M}\cdots \int_{\M}\gamma\left({\tilde \tau}, \tau\right)f_{\vartheta}
306: \cdots f_{\vartheta} d\nu \cdots d\nu \ \ ,
307: \end{equation}
where the above represents an $N$-fold integration and
309: $\vartheta \in \Theta$.
310: Thus the relative merit of one estimator over another estimator can be evaluated
311: using (\ref{expectation}) in a statistical decision theory context, see~\cite{berger:statisticalDecisionTheory}.
312:
313: There are a wide variety of different distributions for a given
314: manifold, as well as sample spaces that are different manifolds.
315: References that discuss these topics can be found in the books by
316: Mardia and Jupp~\cite{mardiaJupp:book} and
317: Chikuse~\cite{chikuse:book}. Furthermore, although nonparametric
statistical procedures on compact Riemannian manifolds are available~\cite{hendriks, efromovich, angersKim, kimKoo},
319: %Hendriks (1990), Efromovich (2000) and Angers and Kim (2005),
320: in this paper we will deal with parametric statistical procedures.
321:
322: \subsection{Topology} \label{sectionBackgroundTopology}
323:
324:
325: Let $R$ be a commutative ring with identity. (In fact, we will only be
326: interested in cases where $R$ is a field, in which case $R$-modules are
327: vector spaces and $R$-module morphisms are linear maps of vector
328: spaces.)
329: \begin{defn}
330: A \emph{chain
331: complex} over $R$ is a sequence of $R$-modules $\{C_i\}_{i \in \Z}$
332: together with $R$-module morphisms $d_i: C_i \to C_{i-1}$ called
333: \emph{differentials} such that $d_i \circ d_{i+1} = 0$. This condition
334: is often abbreviated to $d^2=0$. The elements of $C_n$ are called
335: \emph{$n$-chains}. This chain complex is denoted by $(C,d)$.
336: \end{defn}
337:
338: \begin{defn}
339: An (abstract) \emph{simplicial complex} $K$ is a set of finite,
340: ordered subsets of an ordered set $\bar{K}$, such that
341: \begin{itemize}
342: \item the ordering of the subsets is compatible with the ordering of
343: $\bar{K}$, and
344: \item if $\alpha \in K$ then any nonempty subset of $\alpha$ is also
345: an element of $K$.
346: \end{itemize}
347: The elements of $K$ with $n+1$ elements are called $n$-simplices and
348: denoted $K_n$.
349: \end{defn}
350:
351: \begin{defn} \label{defn:chainComplexOnK} Given a simplicial complex
352: $K$, the \emph{chain complex} on $K$, denoted $(C_*(K),d)$ is
353: defined as follows. Let $C_n(K)$ be the free $R$-module with basis $K_n$.
354: We define the differential on $K_n$ and extend it to $C_n(K)$ by
355: linearity. For $[v_0, \ldots, v_n] \in K_n$ define
356: \[ d[v_0, \ldots, v_n] = \sum_i (-1)^i [v_0, \ldots, \hat{v}_i, \ldots, v_n],
357: \]
358: where $\hat{v}_i$ denotes that the element $v_i$ is omitted from the sequence.
359: \end{defn}
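
\begin{eg}
As a quick check of Definition~\ref{defn:chainComplexOnK}, suppose that $[v_0,v_1,v_2]$ is a $2$-simplex of $K$. Then
\[
d[v_0,v_1,v_2] = [v_1,v_2] - [v_0,v_2] + [v_0,v_1],
\]
and applying $d$ once more gives
\[
d^2[v_0,v_1,v_2] = ([v_2]-[v_1]) - ([v_2]-[v_0]) + ([v_1]-[v_0]) = 0,
\]
in agreement with the condition $d^2=0$.
\end{eg}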
360:
361: For $n\geq 0$, the \emph{standard $n$-simplex} is the $n$-dimensional
362: polytope in $\R^{n+1}$, denoted $\Delta^n$, whose vertices are given
363: by the standard basis vectors $e_0,\ldots ,e_n$. It is just
364: the convex hull of the standard basis vectors; that is \begin{equation}
365: \label{simplex}
366: \Delta^n = \left\{x = \sum_{i=0}^n a_i e_i \ \left| \ \forall i \ a_i
367: \geq 0 \text{ and } \sum_{i=0}^n a_i = 1 \right. \right\}.
368: \end{equation}
For $0 \leq i \leq n+1$ there are inclusion maps
\begin{equation}
\label{inclusion}
\delta_i: \Delta^n \to \Delta^{n+1}
\end{equation}
(called the $i$-th face inclusions), given
by $\delta_i(x_0,\ldots, x_n) = (x_0,\ldots, x_{i-1}, 0, x_{i}, \ldots,
x_n)$.
377:
378: \begin{defn} \label{defn:singularChainComplex}
379: Let $X$ be a topological space.
380: For $n\geq 0$, let $C_n(X)$ be the free $R$-module generated by the
381: set of continuous maps $\{\phi: \Delta^n \to X\}$.
382: For $n<0$, let $C_n(X) = 0$.
383: For $\phi: \Delta^n \to X$ let
384: \begin{equation} \label{boundarymaps}
385: d(\phi) = \sum_{i=0}^n (-1)^i \ \phi \circ \delta_i \ \in C_{n-1}(X).
386: \end{equation}
387: Extend this by linearity to an $R$-module morphism
388: $d: C_n(X) \to C_{n-1}(X)$.
389: One can check that $d^2=0$ so this defines a differential and $C_*(X) =
390: (\{C_n(X)\}_{n \in \Z}, d)$ is a chain complex,
391: called the \emph{singular chain complex}.
392: \end{defn}
393:
394: \begin{defn} \label{defn:homology} Given a chain complex $(C,d)$, let
395: $Z_k$ be the submodule given by $\{x \in C_k \ | \ dx = 0\}$ called
396: the \emph{$k$-cycles}, and let $B_k$ be the submodule given by $\{ x
397: \in C_k \ | \ \exists y \in C_{k+1} \text{ such that } dy = x\}$,
398: called the \emph{$k$-boundaries}. Since $d^2=0$, $d(dy)=0$ and thus
399: $B_k \subset Z_k$. The \emph{$k$-th homology} of $(C,d)$, denoted
400: $H_k(C,d)$ is given by the $R$-module $Z_k/ B_k$. The homologies
401: $\{H_k(C,d)\}_{k \in \Z}$ form a chain complex with differential $0$
402: denoted $H_*(C,d)$ and called the homology of $(C,d)$. If $R$ is a
403: principal ideal domain (for example, if $R$ is a field) and
404: $H_k(C,d)$ is finitely generated, then $H_k(C,d)$ is the direct sum
405: of a free group and a finite number of finite cyclic groups. The
406: \emph{$k$-th Betti number} $\beta_k(C,d)$ is the rank of the free
407: group. If $R$ is a field, then $\beta_k(C,d)$ equals the dimension
408: of the vector space $H_k(C,d)$. If $X$ is a topological space then
409: $H_*(X)$ denotes the homology of the singular chain complex on $X$.
410: \end{defn}
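
\begin{eg}
To illustrate Definition~\ref{defn:homology}, suppose (purely as an example) that $K$ is the boundary of a triangle: the simplicial complex with vertices $v_0 < v_1 < v_2$ and edges $[v_0,v_1]$, $[v_0,v_2]$ and $[v_1,v_2]$, and take $R$ to be a field. In the chain complex $(C_*(K),d)$ of Definition~\ref{defn:chainComplexOnK}, the differential $d: C_1(K) \to C_0(K)$ has rank $2$. Hence $B_0$ is two-dimensional and $Z_1$ is one-dimensional, spanned by $[v_1,v_2] - [v_0,v_2] + [v_0,v_1]$. Since $C_2(K) = 0$ we have $B_1 = 0$, and since $Z_0 = C_0(K)$ is three-dimensional,
\[
\beta_0 = 3 - 2 = 1 \quad \text{and} \quad \beta_1 = 1 - 0 = 1,
\]
as one expects, since the geometric realization of $K$ is a circle.
\end{eg}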
411:
412: \begin{defn}
413: Two spaces $X$ and $Y$ are said to be homotopy equivalent (written
414: $X \homoteq Y$) if there are maps $f:X \to Y$ and $g:Y \to X$ such
415: that $g \circ f$ is homotopic to the identity map on $X$ and $f
416: \circ g$ is homotopic to the identity map on $Y$.
417: \end{defn}
418:
419: \begin{rem} \label{rem:contractible} If $X \homoteq Y$ then $H_*(X)
420: \isom H_*(Y)$. So if $X$ is a \emph{contractible space} (that is, a
421: space which is homotopy equivalent to a point), then $H_0(X) \isom
422: R$ and $H_k(X) = 0$ for $k \geq 1$.
423: \end{rem}
424:
425: \section{Filtrations and persistent homology} \label{sectionPriorWork}
426:
427: From now on, we will assume that the ground ring is a field $\F$.
428:
429: \subsection{Persistent homology} \label{sectionPersistentHomology}
430:
431: In Definition~\ref{defn:homology} we showed how to calculate the
432: homology of a chain complex. Given some additional information on the
433: chain complex, we will calculate homology in a more sophisticated way.
434: Namely, we will show how to calculate the \emph{persistent homology}
435: of a \emph{filtered chain complex}. This will detect homology classes
436: which persist through a range of values in the filtration.
437:
Let $\eR$ denote the totally ordered set of extended real numbers $\eR = \R \cup \{-\infty, \infty\}$. Then an increasing
\emph{$\eR$-filtration} on a chain complex $(C,d)$ is a family of
chain complexes $\{\cF_r(C,d)\}_{r \in \eR}$ such that each $\cF_r(C,d)$ is
a chain subcomplex of $(C,d)$ and $\cF_r(C,d) \subset \cF_{r'}(C,d)$
whenever $r \leq r'$ in $\eR$. A chain complex together with an
$\eR$-filtration is called an \emph{$\eR$-filtered chain complex}.
444:
For a filtered chain complex, the inclusions $\cF_j(C,d) \to
\cF_{j+l}(C,d)$ induce maps
\[
H_k(\cF_j(C,d)) \to H_k(\cF_{j+l}(C,d)).
\]
The image of this map is called the $l$-persistent $k$-th homology of
$\cF_j(C,d)$.
452:
453: Let $Z^i_k = Z_k(\cF_i(C,d))$ and let $B_k^i = B_k(\cF_i(C,d))$.
454: Assume $\alpha \in Z^i_k$. Then $\alpha$ represents a homology class
455: $[\alpha]$ in $H_*(\cF_i(C,d))$. Furthermore since $Z^i_k \subset
456: Z^{i'}_k$ for all $i'\geq i$, $\alpha$ also represents a homology
457: class in $H_*(\cF_{i'}(C,d))$, which we again denote $[\alpha]$. One
458: possibility is that $[\alpha]\neq 0$ in $H_k(\cF_i(C,d))$ but
459: $[\alpha]= 0$ in $H_k(\cF_{i'}(C,d))$ for some $i'>i$.
460:
461: Assume $\Cd$ is a chain complex with an $\eR$-filtration
462: ${\cF}_r(\Cd)$ such that
463: \begin{equation} \label{eqnFiltrnCndn}
464: \bigcup_{r \in \eR} {\cF}_r\Cd
465: = \Cd \text{ and } \bigcap_{r \in \eR} {\cF}_r\Cd = 0.
466: \end{equation}
467: Equivalently, $\cF_{\infty}\Cd = \Cd$ and $\cF_{-\infty}\Cd = 0$.
468:
469:
470: \begin{lemma} \label{lemmar} Let $\Cd$ be a filtered chain complex
471: satisfying \eqref{eqnFiltrnCndn}. For any $n$-chain $\alpha \in
472: \Cd$, there is some smallest $r \in \eR$ such that $\alpha \notin
473: {\cF}_{r'}\Cd$ for all $r' < r$ and $\alpha \in {\cF}_{r''}\Cd$ for
474: all $r'' > r$.
475: \end{lemma}
476:
477: \begin{proof}
478: This follows from the definition of an $\eR$-filtration, the
479: assumption \eqref{eqnFiltrnCndn}, and the linear ordering of $\eR$.
480: \end{proof}
481:
482: \begin{lemma} \label{lemmaHomologyInterval}
483: For any $n$-cycle $\alpha \in Z_n$, the set of all $r\in \eR$ such
that $0 \neq [\alpha] \in H_n({\cF}_r\Cd)$ is either empty or is
485: an interval.
486: \end{lemma}
487:
488: \begin{proof}
489: Let $\alpha \in Z_n$, and let $r_1$ be the corresponding value given by Lemma~\ref{lemmar}.
490:
If there is some $\beta \in C_{n+1}$ such that $d\beta = \alpha$, then let $r_2$ be the value given by Lemma~\ref{lemmar} for $\beta$. Since $\beta \in {\cF}_j\Cd$ implies that $d\beta \in {\cF}_j\Cd$, it follows that $r_2 \geq r_1$. Thus $\alpha$ represents a nonzero homology class in ${\cF}_r\Cd$ exactly when $r$ is in the (possibly empty) interval beginning at $r_1$ and ending at $r_2$. This interval contains $r_1$ if and only if $\alpha \in {\cF}_{r_1}\Cd$, and it does not contain $r_2$ if and only if $\beta \in {\cF}_{r_2}\Cd$.
492:
If $\alpha$ is not an $n$-boundary then $\alpha$ represents a nonzero homology class in ${\cF}_r\Cd$ exactly when $r$ is in the interval $\{x \ | \ x \geq r_1\}$ or $\{x \ | \ x > r_1\}$ beginning at $r_1$. Again this interval contains $r_1$ if and only if $\alpha \in {\cF}_{r_1}\Cd$.
494: \end{proof}
495:
496: \begin{defn}
497: For $\alpha \in Z_k$ define the \emph{persistence $k$-homology
498: interval} represented by $\alpha$ to be the interval given by
499: Lemma~\ref{lemmaHomologyInterval}. Denote it by $I_{\alpha}$.
500: \end{defn}
501:
502: \begin{defn} \label{defn:barcode} Define a \emph{Betti--$k$ barcode}
503: to be a set of intervals\footnote{In
504: Section~\ref{sectionPersistentHofD} we will see that using the
505: \v{C}ech filtration, the Betti--$0$ barcode of manifolds will have
506: uncountably many intervals, so we will define a more appropriate
507: descriptor, the Betti--$0$ function. In
508: Section~\ref{sectionBettiBofS} it will also be useful to convert
509: finite Betti barcodes to functions so that we can analyze limiting
510: and asymptotic behavior.} $\{J_{\alpha}\}_{\alpha \in S \subset
511: Z_k}$ such that
512: \begin{itemize}
513: \item $J_{\alpha}$ is a subinterval of $I_{\alpha}$, and
514: \item for all $r \in \eR$, $\{[\alpha] \ | \ \alpha \in S, \ r \in
515: J_{\alpha}\}$ is an $\F$-basis for $H_k({\cF}_r\Cd)$.
516: \end{itemize}
517: We will sometimes use $\beta_k$ to denote a Betti--$k$ barcode.
518: \end{defn}
519:
520:
521: The set of barcodes has a
522: metric~\cite{czcg:persistenceBarcodesForShapes} defined as follows.
523:
524: \begin{defn} \label{def:barcodeMetric} Given an interval $J$, let
525: $\ell(J)$ denote its length. Given two intervals $J$ and
526: $J'$, the \emph{symmetric difference}, $\Delta(J,J')$, between them
527: is the one-dimensional measure of $J \cup J' - J \cap J'$. Given two
528: barcodes $\{J_{\alpha}\}_{\alpha \in S}$ and
529: $\{J'_{\alpha'}\}_{\alpha' \in S'}$, a \emph{partial matching}, $M$,
530: between the two sets is a subset of $S\times S'$ where each $\alpha$
531: and $\alpha'$ appears at most once. Define
532: \begin{equation*}
533: \mathcal{D}(\{J_{\alpha}\}_{\alpha \in S},
534: \{J'_{\alpha'}\}_{\alpha' \in S'}) = \min_M
535: \left( \sum_{(\alpha,\alpha') \in M}
536: \Delta(J_{\alpha},J'_{\alpha'}) + \sum_{\alpha \notin M_1}
537: \ell(J_{\alpha}) + \sum_{\alpha' \notin M_2} \ell(J'_{\alpha'}) \right),
538: \end{equation*}
where the minimum is taken over all partial matchings, and $M_1$ and
$M_2$ are the projections of $M$ to $S$ and $S'$, respectively.
541: This defines a quasi-metric (since its value may be infinite). If
542: desired, it can be converted into a metric.
543: \end{defn}
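
\begin{eg}
As a small worked instance of Definition~\ref{def:barcodeMetric}, with intervals chosen only for illustration, consider the barcodes $\{J_1, J_2\} = \{[0,1), [0,3)\}$ and $\{J'_1\} = \{[0,\tfrac{5}{2})\}$. There are three partial matchings. The empty matching costs $1 + 3 + \tfrac{5}{2} = \tfrac{13}{2}$. Matching $J_1$ with $J'_1$ costs $\Delta(J_1,J'_1) + \ell(J_2) = \tfrac{3}{2} + 3 = \tfrac{9}{2}$. Matching $J_2$ with $J'_1$ costs $\Delta(J_2,J'_1) + \ell(J_1) = \tfrac{1}{2} + 1 = \tfrac{3}{2}$. Hence
\[
\mathcal{D}(\{J_1,J_2\},\{J'_1\}) = \tfrac{3}{2}.
\]
\end{eg}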
544:
545:
546:
547:
548: \subsection{Persistent homology from point cloud data}
549: \label{sectionPersistentHfPCD}
550: Let $(\M,\rho)$ be a manifold with a metric $\rho$.
551: Let $X = \{x_1, x_2, \ldots, x_n\} \subset \M$.
552: $X$ is called \emph{point cloud data}.
553: One would like to be able to obtain information on $\M$ from $X$.
554: If $X$ contains sufficiently many uniformly distributed points one may be
555: able to construct a complex from $X$ that in some sense reconstructs $\M$.
556:
557: One such construction is the following $\eR$-filtered simplicial
558: complex called the \v{C}ech complex. Recall that we are working over
559: a ground field $\F$. Let $\cC_*(X)$ be the largest simplicial complex
560: on the ordered vertex set $X$. That is $\cC_0(X) = X$ and for $k\geq
561: 1$, $\cC_k(X)$ consists of the ordered subsets of $X$ with $k+1$
562: elements. Now filter this simplicial complex (along $\eR$) as
563: follows. Given $r<0$, define $\cF^{\check{C}}_r(\cC_n(X))=0$ for all
564: $n$. Let $B_r(x)$ denote the ball of radius $r$ centered at $x$. For
565: $r \geq 0$ and $k\geq 1$, define $\cF^{\check{C}}_r(\cC_k(X))$ to be
566: the $\F$-vector space whose basis is the $k$-simplices $[x_{i_0},
\ldots, x_{i_k}]$ such that $\bigcap_{j=0}^k B_r(x_{i_j}) \neq \emptyset$. We
568: remark that there are fast algorithms for computing
569: $\cF^{\check{C}}_r(\cC_k(X))$.\footnote{The balls of radius $r$
570: centered at the points $\{x_{i_j}\}$ have nonempty intersection if
571: and only if there is a ball of radius $r$ containing the points
572: $\{x_{i_j}\}$. There are fast algorithms for the smallest enclosing
ball problem~\cite{fischerGaertnerKutz, gaertner:www}.}
574: $\cF^{\check{C}}_r(\cC_*(X))$ is called the $r$-\v{C}ech complex. It
575: is the \emph{nerve} of the collection of balls $\{B_r(x_i)\}_{i=1}^n$,
576: and its geometric realization is homotopy equivalent to the union of
577: these balls.
578:
579: A related construction is the Rips complex. For each $r$, the $r$-Rips
580: complex, $\cF^R_r(\cC_*(X))$, is the largest simplicial complex
581: containing $\cF^{\check{C}}_r(\cC_1(X))$. That is, $\cF^R_r(\cC_*(X))$
582: is the $\F$-vector space whose basis is the set of $k$-simplices
583: $[x_{i_0}, \ldots, x_{i_k}]$ such that $\rho(x_{i_j}, x_{i_{\ell}})
584: \leq r$ for all pairs $0 \leq j, \ell \leq k$.
585:
Using either of these filtered simplicial complexes, one obtains a filtered
587: chain complex as follows. Let $\Delta_*(\cC_*(X))$ be the chain
588: complex on $\cC_*(X)$. Filter this over $\eR$ by letting
589: \[
590: \cF_r(\Delta_*(\cC_*(X))) = \Delta_*(\cF_r(\cC_*(X))) \text{, where }
591: \cF_r = \cF^{\check{C}}_r \text{ or } \cF^R_r.
592: \]
593: To simplify the notation, we write $\Delta_k(X) := \Delta_k(\cC_*(X))$.
594: We remark that these filtrations satisfy \eqref{eqnFiltrnCndn}:
595: \[
596: \bigcup_{r \in \eR} {\cF}_r(\Delta_*(X)) = \Delta_*(X) \text{ and }
597: \bigcap_{r \in \eR} {\cF}_r(\Delta_*(X)) = 0.
598: \]
599: Let $\alpha$ be an $n$-chain.
600: By Lemma~\ref{lemmar} we know that there is some $r \in \eR$ such that
601: $\alpha \notin \cF_{r'}(\Delta_n(X))$ for all $r'<r$ and $\alpha \in
602: \cF_{r''}(\Delta_n(X))$ for all $r'' > r$.
603: In fact,
604:
605: \begin{lemma} \label{lemmaRipsr}
606: Consider an $n$-chain, $\alpha = \sum_{i=1}^m \alpha_i
607: (x_{i_0},\ldots, x_{i_n})$. For the \v{C}ech filtration let
608: \[
609: r = \max_{i=1 \ldots m} \min \{ r_i \ | \ \exists x \text{ such that }
610: B_{r_i}(x) \ni x_{i_0}, \ldots x_{i_n} \},
611: \]
612: and for the Rips filtration let
613: \[
614: r = \max_{i=1\ldots m} \max_{j\neq k}
615: \rho(x_{i_j},x_{i_k}) \ \ .
616: \]
617: Then $\alpha \notin \cF_{r'}(\Delta_n(X))$ for all $r'<r$ and $\alpha \in
618: \cF_{r''}(\Delta_n(X))$ for all $r''\geq r$.
619: \end{lemma}
620:
621: If $\alpha$ is an $n$-cycle then by Lemma~\ref{lemmaHomologyInterval}
622: there is a (possibly empty) persistence $n$-homology interval
623: corresponding to $\alpha$.
Applying Lemma~\ref{lemmaRipsr} to $\alpha$ and, if there is some
$\beta \in \Delta_{n+1}(X)$ such that $d\beta = \alpha$, applying
Lemma~\ref{lemmaRipsr} to $\beta$, we get the following.
627:
628: \begin{lemma} \label{lemmaRipsHomologyInterval}
629: Given an $n$-cycle $\alpha$, the persistence $n$-homology interval
630: associated to $\alpha$ is either empty or has the form $[r_1,r_2)$
631: or $[r_1,\infty]$.
632: \end{lemma}
633:
634: %[*** PB - mention Delaunay triangulations, Voronoi diagrams,
635: % \v{C}ech complexes and $\alpha$-shape complexes.]
636:
637: %\section{Chain complexes filtered by densities and their
638: %corresponding persistent homologies}
639:
640:
641:
642: \subsection{Persistent homology of densities}
643: \label{sectionPersistentHofD}
644:
645: Let $f_{\vartheta}$ be a probability density on a manifold $\M$ for
646: some $\vartheta \in \Theta$. We will use $f_{\vartheta}$ to define two
647: increasing $\eR$-filtrations on $C_*(\M)$, the singular chain complex on
648: $\M$ (see Definition~\ref{defn:singularChainComplex}).
649:
650: \subsubsection{The Morse filtration} \label{section:morse}
651:
652: For $r \in \eR$, the \emph{excursion sets}
653: \begin{equation} \label{eqnMr}
654: \M_{\leq r} = \{ x \in \M \ | \ f_{\vartheta}(x) \leq r\},
655: \end{equation}
656: (used in Morse theory~\cite{milnor:morseTheory}) filter
657: $\M$ over $\eR$.
658: Hence they also provide an $\eR$-filtration of the singular chain
659: complex $C_*(\M)$,
660: \[
661: \cF^M_r(C_*(\M)) = C_*(\M_{\leq r}),
662: \]
663: which we call the \emph{Morse filtration}.
664: We remark that for all $k$,
665: \[
666: H_k(\cF^M_r C_*(\M)) = H_k(\M_{\leq r}).
667: \]
668:
669: \subsubsection{The \v{C}ech filtration} \label{section:rips}
670:
671: There is a dual increasing filtration to the Morse filtration which uses superlevel sets instead of sublevel sets. We modify this filtration slightly so that it mirrors the filtration on the \v{C}ech complex defined in Section~\ref{sectionPersistentHfPCD}, and we will call it the \emph{\v{C}ech filtration}. We do this since the filtrations on the \v{C}ech complex and the related Rips complex are the main filtrations used in computations of persistent homology.
672:
Notice that in the \v{C}ech complex filtration all of the points in $X$, even distant outliers, appear when $r=0$. Accordingly, the \v{C}ech filtration starts with all of the points of $\M$ and the discrete topology, and then progressively connects the regions with decreasing density.
674:
675: For $r<0$ and all $k$, define $\cF^{\check{C}}_r(C_k(\M)) = 0$.
676: For $r\geq 0$, let $\cF^{\check{C}}_r(C_0(\M)) = C_0(\M)$.
677: Assume $k\geq 1$.
678: Let
679: \[
680: \Const_k = \{\phi:\Delta^k \to \M \ | \ \phi \text{ is constant} \}
681: \subset C_k(\M).
682: \]
683: For $0 \leq s \leq \infty$, let
684: \begin{equation} \label{eqnM1r}
685: \M_{\geq s} = \left\{m \in \M \ | \ f_{\vartheta}(m) \geq s \right\}.
686: \end{equation}
687: For $r \geq 0$, let
688: \begin{equation} \label{eqnFr}
689: \cF^{\check{C}}_r(C_k(\M)) = {\rm Const}_k + C_k(\M_{\geq \frac{1}{r}}).
690: \end{equation}
691: From this filtered chain complex we can calculate persistence $k$-homology intervals and Betti--$k$ barcodes just as in Section~\ref{sectionPersistentHfPCD}.
692:
693: \begin{lemma} \label{lemmaHkFr}
694: For $k\geq 1$, \[H_k(\cF^{\check{C}}_r(C_*(\M))) \isom H_k(\M_{\geq \frac{1}{r}}) \ \ .\]
695: \end{lemma}
696:
697: %\noindent{\bf Proof:} Follows immediately from the definition of
698: %$\cF^{\check{C}}_r(C_*(M))$. $\Box$
699:
700: \begin{proof}
701: By definition, $Z_k({\cF^{\check{C}}}_r C_*(\M)) = {\rm Const}_k +
702: Z_k C_*(\M_{\geq \frac{1}{r}})$, and $B_k({\cF^{\check{C}}}_r
703: C_*(\M)) = {\rm Const}_k + B_k C_*(\M_{\geq \frac{1}{r}})$. So
704: \[
H_k({\cF^{\check{C}}}_r C_*(\M)) \isom Z_k(C_*(\M_{\geq \frac{1}{r}})) /
B_k(C_*(\M_{\geq \frac{1}{r}})) = H_k(\M_{\geq \frac{1}{r}}).
707: \]
708: \end{proof}
709:
710: Let $r \geq 0$. Recall the notation of
711: Section~\ref{sectionPersistentHomology}: $Z^r_k =
Z_k(\cF^{\check{C}}_r(C_*(\M)))$ and $B^r_k =
B_k(\cF^{\check{C}}_r(C_*(\M)))$. To start, $Z^r_0 = \F[\M]$. Then
714: $\cF^{\check{C}}_r(C_1(\M)) = \F[\{ \phi:\Delta^1 \to \M \ | \ \phi
715: \text{ is constant, or } \im{\phi} \subset \M_{\geq \frac{1}{r}}\}]$.
716:
For two points $x,y \in \M$, there is some map $\phi:\Delta^1 \to \M$
such that $\phi(0)=x$, $\phi(1)=y$ and $\im(\phi) \subset \M_{\geq
\frac{1}{r}}$ (in which case $d\phi = y-x$) if and only if $x$ and
$y$ are in the same path component of $\M_{\geq \frac{1}{r}}$. Thus
721: \[
722: H_0(\cF^{\check{C}}_r(C_*(\M))) \isom \F [ \M / \sim ],
723: \]
724: where $x \sim y$ if and only if $x$ and $y$ are in the same path
725: component of $\M_{\geq \frac{1}{r}}$.
726:
727:
728: In the case where $\M_{\geq \frac{1}{r}}$ is path-connected,
729: $H_0(\cF^{\check{C}}_r(C_*(\M))) \isom \F [ \M / \M_{\geq \frac{1}{r}}
730: ]$. In particular $H_0(\cF^{\check{C}}_0(C_*(\M))) \isom
731: \F[\M/\M_{\geq \infty}]$. Since $f_{\vartheta}$ is a probability
density, $\M_{\geq \infty}$ has measure $0$. Therefore almost all $m
\in \M$ represent a distinct homology class in
$H_0(\cF^{\check{C}}_0(C_*(\M)))$ and there are uncountably many
$0$-homology intervals. As a result the Betti--$0$ barcode is not a
736: good descriptor. In this section, we will describe how the
737: $0$-homology intervals can be used to describe a \emph{Betti--$0$
738: function}, in the case where the density $f_{\vartheta}$ satisfies a
739: continuity condition.
740:
741: More generally, as long as $\M - \M_{\geq \frac{1}{r}}$ is uncountable
742: and $\M_{\geq \frac{1}{r}}$ has countably many path components, then
743: almost all homology classes in $H_0(\cF^{\check{C}}_r(C_*(\M)))$ have a unique
744: representative. In this case we use this as justification to consider
745: only those homology classes with a unique representative.
746:
747:
748: Assume that for all $r$, $\M - \M_{\geq \frac{1}{r}}$ is uncountable
749: and $\M_{\geq \frac{1}{r}}$ has countably many path components, and
750: that the following continuity condition holds for all $m \in \M$:
\begin{equation} \label{eqnContinuityCndn} \forall \epsilon > 0, \
  \exists \text{ injective } \phi: [0,1] \to \M \text{ s.t. } \phi(0)
  = m \text{ and } f_{\vartheta}(\phi(t)) > f_{\vartheta}(m)-\epsilon
  \text{ for all } t \in [0,1].
\end{equation}
755: This condition holds if $f_{\vartheta}$ is continuous.
756:
757: \begin{lemma}
  Each $m \in \M$ is the unique representative for $[m]$ for exactly
  those values of $r$ in either $\left[0, \tfrac{1}{f_{\vartheta}(m)}\right)$
  or $\left[0, \tfrac{1}{f_{\vartheta}(m)}\right]$.
761: \end{lemma}
762:
763: \begin{proof}
  Let $m \in \M$. Since $dm=0$, $m\in Z^r_0$ for $r\geq 0$. Let $[m]
  \in H_0(\cF^{\check{C}}_r(C_*(\M)))$ denote the homology class
  represented by $m$.
767: % [*** PB - turn this into a lemma]
768: By definition $m \in \M_{\geq \frac{1}{r}}$ if and only if $r \geq
769: \frac{1}{f_{\vartheta}(m)}$. Thus $m$ is the unique representative
770: for $[m]$ for $r < \frac{1}{f_{\vartheta}(m)}$. By assumption, for
  any $\epsilon > 0$ there is an injective map $\phi: [0,1] \to \M$
772: such that $\phi(0) = m$ and $f_{\vartheta}(\phi(t)) >
773: f_{\vartheta}(m)-\epsilon$. Then $\phi \in
774: \cF^{\check{C}}_r(C_1(\M))$ where $r =
775: \frac{1}{f_{\vartheta}(m)-\epsilon}$. This implies that for any
776: $\epsilon > 0$ there is a non-constant continuous map $\phi:
777: \Delta^1 \to \M$ with $\phi(0)=m$ such that $\phi \in
778: \cF^{\check{C}}_{\frac{1}{f_{\vartheta}(m)} + \epsilon}(C_1(\M))$.
779: Hence $m$ is not a unique representative for $[m]$ for $r >
780: \frac{1}{f_{\vartheta}(m)}$. Therefore $m$ is a unique
781: representative for $[m]$ for either $r \in
782: \left[0,\frac{1}{f_{\vartheta}(m)}\right)$ or $r \in
783: \left[0,\frac{1}{f_{\vartheta}(m)}\right]$.
784: \end{proof}
785:
786: Before we formally define the Betti--$0$ function, we give the
787: following intuitive picture. We draw each of our intervals
788: $\left[0,\frac{1}{f_{\vartheta}(m)}\right]$ or
789: $\left[0,\frac{1}{f_{\vartheta}(m)}\right)$ vertically starting at
$r=0$ and ending at $r=\frac{1}{f_{\vartheta}(m)}$. Furthermore we order the
791: intervals from left to right according to their length. In fact we
792: draw all of the intervals between $x=0$ and $x=1$, where the $x$-axis
793: is scaled according to the probability distribution
794: $f_{\vartheta}d\nu$. The increasing curve traced by the tips of the
795: intervals will be called the Betti--$0$ function.
796:
797: \begin{defn} \label{defn:betti0function} Formally, define the
798: \emph{Betti--$0$ function} $\beta_0:(0,1] \times \Theta \to
799: [0,\infty]$ as follows.\footnote{While our definition of $\beta_0$
800: below \eqref{bb-0} is valid for $x=0$, we get
801: $\beta_0(0,\vartheta) \equiv 0$. This does not provide any
802: information, and is furthermore inappropriate in cases such as the
803: von Mises distribution with $\kappa=0$ (see
804: Section~\ref{sectionVonMises} below) where $\beta_0(x,\vartheta)$
805: is constant and nonzero for $x>0$.}
806: %Recall that $\M_{\geq \frac{1}{r}}$ is defined in~\eqref{eqnM1r}.
807: For $r \in [0,\infty]$, let
808: \begin{equation} \label{eqngtheta} g_{\vartheta}(r) = \int_{\M_{\geq
809: \frac{1}{r}}} f_{\vartheta} d\nu.
810: \end{equation}
811: Since $f_{\vartheta}$ is a probability density, $g_{\vartheta}$ is
812: an increasing function $g_{\vartheta}: [0,\infty] \to [0,1]$ for
813: each fixed ${\vartheta} \in \Theta$. Also recall that $\M_{\geq
814: \infty}$ has measure $0$ and by definition $\M_{\geq 0} = \M$. So
815: $g_{\vartheta}(0)=0$ and $g_{\vartheta}(\infty)=1$. For $0 < x \leq
816: 1$, let
817: \begin{equation} \label{bb-0}
818: \beta_0(x,{\vartheta}) = \inf_{g_{\vartheta}(r) \geq x} r \ \ .
819: \end{equation}
820: If $g_{\vartheta}$ is continuous and strictly increasing,\footnote{In this case we can define
821: $\beta_0(x,\vartheta)$ for $x \in [0,1]$.} then
822: \begin{equation} \label{bb-0c}
823: \beta_0(x,{\vartheta}) = g_{\vartheta}^{-1}(x) \ \ ,
824: \end{equation}
825: for $\vartheta \in \Theta$. That is,
  $\beta_0(x,\vartheta)$ is the unique value of $r$ such that $\int_{\M_{\geq
  \frac{1}{r}}} f_{\vartheta} d\nu = x$.
828: \end{defn}
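
\begin{eg}
As a sanity check on Definition~\ref{defn:betti0function}, consider the following toy case, which is a hypothetical illustration chosen only because every quantity can be written in closed form: $\M = (0,1)$, $\nu$ is Lebesgue measure, and $f(t) = 2t$. For $r \leq \frac{1}{2}$ the set $\M_{\geq \frac{1}{r}}$ is empty, and for $r > \frac{1}{2}$ it equals $[\frac{1}{2r},1)$. Hence
\[
g(r) = \int_{\frac{1}{2r}}^{1} 2t \, dt = 1 - \frac{1}{4r^2} \quad \text{for } r \geq \tfrac{1}{2},
\]
and $g(r) = 0$ for $r < \frac{1}{2}$. Solving $g(r) = x$ for $0 < x < 1$ gives
\[
\beta_0(x) = \frac{1}{2\sqrt{1-x}},
\]
and $\beta_0(1) = \infty$. As $x \to 0$, $\beta_0(x) \to \frac{1}{2} = \frac{1}{\sup f}$, and $\int_0^1 \beta_0(x)\, dx = 1 = \nu(\M)$.
\end{eg}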
829:
830:
831:
832: \subsubsection{Alexander duality}
833:
The Morse and \v{C}ech filtrations on $S^{p-1}$ are related by
835: Alexander duality. Let $f$ be a density on $S^{p-1}$. Assume that $r
836: \in \im (f)$ and that $r < \sup (f)$. Then $S^{p-1}_{f\leq r}$ is a
837: proper, nonempty subset of $S^{p-1}$. Assume that $S^{p-1}_{f \leq
838: r}$ is compact and a neighborhood retract.
839:
840: \begin{thm}[Alexander duality for the Morse and \v{C}ech filtrations on $S^{p-1}$]
Let $\tilde{H}$ denote reduced homology, let $\F$ be a field, and let $s=\frac{1}{r}$. Then
\[
\tilde{H}_i(S^{p-1}_{f > \frac{1}{s}}; \F) \isom 
\tilde{H}^{p-2-i}(S^{p-1}_{f \leq r}; \F) \isom 
\tilde{H}_{p-2-i}(S^{p-1}_{f\leq r}; \F).
\]
847: \end{thm}
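
\begin{eg}
To illustrate the theorem in the lowest-dimensional case, take $p=2$ and suppose, as an assumption made only for this example, that $f$ is a bimodal density on $S^1$ such as the Watson density of Section~\ref{sectionWatson} with $\kappa>0$, and that $\min f < r < \max f$. Then $S^1_{f \leq r}$ is a disjoint union of two closed arcs and its complement $S^1_{f>r}$ is a disjoint union of two open arcs. With $i=0$ we have $p-2-i = 0$, and both sides of the isomorphism are one-dimensional:
\[
\tilde{H}_0(S^1_{f>r};\F) \isom \F \isom \tilde{H}^0(S^1_{f \leq r};\F).
\]
For $i \geq 1$ both sides vanish.
\end{eg}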
848:
849:
850:
851: \section{Expected barcodes of PCD} \label{sectionBettiBofS}
852:
853: \subsection{Betti barcodes of uniform samples on $S^1$}
854: \label{sectionBettiUniform}
855:
856: Let $f$ be the uniform density on $S^1$. Let $X = \{X_1, \ldots X_n\}
857: \subset S^1$ be a sample drawn according to $f$. $X$ is called the
858: point cloud data. In this section we consider the Betti barcodes
obtained for the persistent homology of $\cF^R_*(\Delta_*(X))$, the
Rips complex on $X$ (see Section~\ref{sectionPersistentHfPCD}). The
861: metric we use on $S^1$ is $\frac{1}{2\pi}$ times the shortest arc length
862: between two points (we have normalized so that the total length of $S^1$ is one).
863:
864: Before we continue, we introduce some notation.
Choose $\alpha$ such that $X_1 = e^{2\pi i \alpha}$.
866: For $k = 2, \ldots n$ choose $U_k \in [0,1]$ such that
867: \[
868: X_k = e^{2\pi i (\alpha + U_k)}.
869: \]
870: We remark that each $U_k$ is uniformly distributed on $[0,1]$. Now
871: reorder the $\{U_k\}$ to obtain the order statistic\footnote{Equality
872: among any of the terms occurs with probability zero.}:
873: \[
874: 0 < U_{n:1} < U_{n:2} < \ldots < U_{n:n-1} < 1.
875: \]
876: Let $U_{n:0} = 0$ and $U_{n:n} = 1$.
877: Reorder the $\{X_k\}$ as $\{X_{n:k}\}$ to correspond with the $\{U_{n:k}\}$.
878: Then for $1 \leq k \leq n$ define
879: \[
880: S_k = U_{n:k} - U_{n:k-1}.
881: \]
882: The set $S = \{S_1, \ldots S_n\}$ is called the set of
883: spacings~\cite{pyke:spacings}.
884: We remark that if $U_k = U_{n:j}$ with $1\leq j \leq n-1$ and take the
885: usual orientation of $S^1$, then the
886: distances from $X_k$ to its nearest backward neighbor and nearest
887: forward neighbor are $S_j$ and $S_{j+1}$, respectively.
888: Also the distance from $X_1$ to its neighbors is $S_n$ and $S_1$.
889: It is well known (for example, \cite{devroye}) that
890: \begin{lemma} \label{lemma:spacingsDistribution}
891: $(S_1,\ldots,S_n)$ is uniformly distributed on the standard
892: $(n-1)$-simplex $\{(x_1,\ldots x_n) | x_i\geq 0, \sum_{i=1}^n x_i =
893: 1\}$.
894: It follows that
895: \[
896: P[S_1>a_1; \cdots; S_n>a_n] =
897: \begin{cases}
898: (1-\sum_{i=1}^n a_i)^{n-1}& \text{if } \sum_{i=1}^n a_i < 1,\\
899: 0& \text{otherwise.}
900: \end{cases}
901: \]
902: and
903: \begin{equation} \label{eqnProbSimplex}
904: \text{(Whitworth, 1897)} \quad P(S_{n:n} > x) = \sum_{\substack{k \geq 1 \\ kx < 1}} (-1)^{k+1} (1-kx)^{n-1} \binom{n}{k}, \quad \forall x > 0.
905: \end{equation}
906: \end{lemma}
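
\begin{eg}
As a check of \eqref{eqnProbSimplex}, take $n=3$. For $\frac{1}{2} \leq x < 1$ only $k=1$ contributes and $P(S_{3:3}>x) = 3(1-x)^2$. For $\frac{1}{3} \leq x < \frac{1}{2}$ the terms $k=1,2$ contribute and
\[
P(S_{3:3} > x) = 3(1-x)^2 - 3(1-2x)^2 .
\]
For $0 < x < \frac{1}{3}$ all three terms contribute and
\[
P(S_{3:3} > x) = 3(1-x)^2 - 3(1-2x)^2 + (1-3x)^2 = 1,
\]
as it must be, since the largest of three spacings summing to $1$ is at least $\frac{1}{3}$.
\end{eg}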
907:
908: Finally, order the spacings to obtain
909: \[
0 < S_{n:1} < S_{n:2} < \ldots < S_{n:n} < 1.
911: \]
912:
913: Now we are ready to calculate the homology in degree $0$.
914: Recall that $\beta_0(\cF^R_r(\Delta_*(X)))$ equals the dimension of
$H_0(\cF^R_r(\Delta_*(X)))$, which equals the number of path components of
916: $\cF^R_r(\Delta_*(X))$.
917: Recall that $\cF^R_r(\Delta_0(X))$ is the empty set for $r<0$ and is the set
918: $X$ for $r \geq 0$.
919: So at $r=0$, there are (almost surely) exactly $n$ distinct homology
920: classes in $H_0(\cF^R_r(\Delta_*(X)))$.
921: Each homology class $[X_k]$ will no longer have a distinct
922: representative when the distance from $X_k$ to one of its neighbors is
923: equal to $r$.
That is, each time $r$ passes one of the $S_k$ the dimension of
$H_0(\cF^R_r(\Delta_*(X)))$ decreases by one.
Therefore, setting $S_{n:0} = 0$, for $k = 0, \ldots, {n-2}$,
927: \[
928: r \in \left[ S_{n:k}, S_{n:k+1} \right) \implies
929: \beta_0(\cF^R_r(\Delta_*(X))) = n-k.
930: \]
931: When $r \geq S_{n:n-1}$, $\cF^R_r(\Delta_*(X))$ is path connected so
932: $\beta_0(\cF^R_r(\Delta_*(X))) = 1$.
933: Translating this, we see that the Betti--$0$ barcode is the collection
934: of homology intervals
935: \[
936: [0, S_{n:k}) \text{ for $k = 1, \ldots {n-1}$ and $[0,\infty]$}.
937: \]
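
\begin{eg}
For a concrete illustration (the numbers are hypothetical and chosen for convenience), let $n=4$ and suppose that $\{U_2,U_3,U_4\} = \{0.1, 0.4, 0.8\}$. Then the spacings are $S_1 = 0.1$, $S_2 = 0.3$, $S_3 = 0.4$ and $S_4 = 0.2$, with order statistic $0.1 < 0.2 < 0.3 < 0.4$. Hence
\[
\beta_0(\cF^R_r(\Delta_*(X))) =
\begin{cases}
4 & \text{if } 0 \leq r < 0.1,\\
3 & \text{if } 0.1 \leq r < 0.2,\\
2 & \text{if } 0.2 \leq r < 0.3,\\
1 & \text{if } r \geq 0.3,
\end{cases}
\]
and the Betti--$0$ barcode consists of the intervals $[0,0.1)$, $[0,0.2)$, $[0,0.3)$ and $[0,\infty]$.
\end{eg}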
938:
939: Finally, let us consider the homology in degree $1$.
940: Let
941: \[
942: \alpha =
943: (X_{n:1}, X_{n:2}) + \ldots + (X_{n:n-1}, X_{n:n}) + (X_{n:n},
944: X_{n:1}).
945: \]
946: This is a $1$-cycle in $\Delta_*(X)$.
947:
948: \begin{lemma}
If $S_{n:n} < \frac{1}{2}$ then the Betti--$1$ barcode is the single
(possibly empty) persistence homology interval
\[
I_{\alpha} = [S_{n:n}, R), \quad \text{where } R \in [\tfrac{1}{3}, \tfrac{1}{2}],
\]
954: otherwise it is empty.
955: \end{lemma}
956:
957: \begin{rem}
If the largest spacing $S_{n:n}$ is greater than or equal to
$\frac{1}{2}$ then all of the points $X_1, \ldots X_n$ are
960: concentrated on a semicircle, and $\cF^R_r(\Delta_*(X))$ does
961: not contain any non-trivial $1$-cycles. By \eqref{eqnProbSimplex},
962: $P[S_{n:n} > \frac{1}{2}] = \frac{n}{2^{n-1}}$.
963: \end{rem}
964:
965: \begin{proof}
Assume that $S_{n:n} < \frac{1}{2}$. If $r \geq S_{n:n}$,
967: then $\alpha \in \cF^R_r(\Delta_1(X))$. We claim that by
968: using the definition of the Rips filtration and the geometry of
969: $S^1$, $\alpha$ becomes a boundary at some $R \in [\frac{1}{3},
970: \frac{1}{2}]$. Since half the perimeter of $S^1$ is $\frac{1}{2}$, when $r\geq
971: \frac{1}{2}$, $(X_i,X_j) \in \cF^R_r(\Delta_1(X))$ for all $X_i,X_j
972: \in X$. Thus when $r \geq \frac{1}{2}$ then $\cF^R_r(\Delta_*(X)) =
973: \Delta_*(X)$ which is the full $(n-1)$-simplex on the
974: vertices $X_1, \ldots X_n$. In particular if $r \geq \frac{1}{2}$, then
975: $\alpha$ is a boundary.
976:
977: Since $S_{n:n} < \frac{1}{2}$, the geometric realization of $\alpha$
978: is a $n$-gon containing the center of $S^1$. Thus if there is some
979: $\beta = \sum \beta_{ijk}(X_i,X_j,X_k) \in
980: \cF^R_r(\Delta_2(X))$ such that $d\beta=\alpha$ then for some
981: $(X_i,X_j,X_k) \in \cF^R_r(\Delta_2(X))$ the geometric realization of
982: $(X_i,X_j,X_k)$ contains the center of $S^1$. The smallest $r$ for
983: which this can happen is $\frac{1}{3}$. So if $r <
984: \frac{1}{3}$ then $\alpha$ cannot be a boundary.
985:
Thus $\alpha$ becomes a boundary when $r=R$ for some $R \in
[\frac{1}{3},\frac{1}{2}]$. If $S_{n:n} \geq \frac{1}{3}$ it is possible
that $R = S_{n:n}$, in which case $\alpha$ does not represent a
non-trivial homology class in any $\cF^R_r(\Delta_*(X))$.
990: \end{proof}
991:
992: \begin{rem}
993: If $S_{n:n} < \frac{1}{3}$ then the Betti--$1$ barcode is a single
994: non-empty persistence homology interval.
995: Using \eqref{eqnProbSimplex}, $P[S_{n:n} \geq \frac{1}{3}] <
996: n\left(\frac{2}{3}\right)^{n-1}$.
997: \end{rem}
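
\begin{eg}
The following two configurations are hypothetical and are chosen by hand to illustrate both cases of the lemma. First let $n=5$ with all spacings equal to $\frac{1}{5}$ (an event of probability zero, used here only for convenience). Every pairwise distance is either $0.2$ or $0.4$. For $0.2 \leq r < 0.4$ the only $1$-simplices in $\cF^R_r(\Delta_*(X))$ join neighboring points and there are no $2$-simplices, so $\alpha$ is a non-trivial cycle. At $r = 0.4$ every pair of points is joined, so $\cF^R_r(\Delta_*(X)) = \Delta_*(X)$ and $\alpha$ is a boundary. The Betti--$1$ barcode is therefore the single interval $[0.2, 0.4)$, with $R = 0.4 \in [\frac{1}{3},\frac{1}{2}]$.

Next let $n=4$ with the points at positions $0$, $0.1$, $0.4$ and $0.8$ on the circle of length one, so that $S_{4:4} = 0.4 \geq \frac{1}{3}$. Every pairwise distance is at most $0.4$, so at $r = S_{4:4}$, when $\alpha$ first appears, the complex is already the full simplex. Hence $R = S_{4:4}$ and the interval $I_{\alpha}$ is empty.
\end{eg}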
998:
999:
1000: \subsection{Expected values of the Betti barcodes}
1001:
1002: Let $U_1, \ldots U_{n-1}$ be a sample from the uniform distribution on
1003: $[0,1]$. Let $0 < U_{n:1} < U_{n:2} < \ldots < U_{n:n-1} < 1$ be the
1004: corresponding order statistic.\footnote{We use $n$ here to match the
1005: notation of Section~\ref{sectionBettiUniform} where $\{U_1, \ldots,
  U_{n-1}\}$ is derived from $\{X_1,\ldots, X_n\} \subset S^1$.} Define
1007: $U_{n:0} = 0$ and $U_{n:n} = 1$. For $k = 1, \ldots n$, let $S_k =
1008: U_{n:k} - U_{n:k-1}$. Recall (Lemma~\ref{lemma:spacingsDistribution})
1009: that the set of spacings $S = \{S_1, \ldots S_n\}$ is uniformly
1010: distributed on the standard $(n-1)$-simplex.
1011:
1012: Let $0 < S_{n:1} < \ldots < S_{n:n} < 1$ be the order statistic for
1013: the spacings.
1014: Then one can show~\cite[21.1.15]{shorackWellner} that
1015:
1016: \begin{prop}
For $1\leq i \leq n$ the expected value of the $i$-th smallest spacing is given by
\[
E S_{n:i} = \frac{1}{n} \sum_{j=1}^i \frac{1}{n+1-j} = \frac{1}{n} \sum_{j=n+1-i}^n \frac{1}{j}.
\]
1021: \end{prop}
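
\begin{eg}
For instance, for $n=3$ the formula gives
\[
E S_{3:1} = \tfrac{1}{3}\cdot\tfrac{1}{3} = \tfrac{1}{9}, \quad
E S_{3:2} = \tfrac{1}{3}\left(\tfrac{1}{3}+\tfrac{1}{2}\right) = \tfrac{5}{18}, \quad
E S_{3:3} = \tfrac{1}{3}\left(\tfrac{1}{3}+\tfrac{1}{2}+1\right) = \tfrac{11}{18}.
\]
These sum to $1$, as they must since $S_{3:1}+S_{3:2}+S_{3:3} = 1$.
\end{eg}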
1022:
1023: So the expected Betti--$0$ barcode is the collection of intervals
1024: \[
1025: \left\{ \left[0, \frac{1}{n} \sum_{j=1}^i \frac{1}{n+1-j}\right) \right\}_{i
1026: \in \{1, \ldots, n-1\}} \cup \{ [0, \infty] \},
1027: \]
1028: and the expected Betti--$1$ barcode is
1029: \[
1030: \left\{ \left[ \frac{1}{n} \sum_{j=1}^n \frac{1}{n+1-j}, \infty\right] \right\}.
1031: \]
1032:
1033: To obtain the Betti--$0$ function from the Betti--$0$ barcode let
1034: \[
1035: _n \tilde{\beta}_0(x,0) = E S_{n:\lceil (n-1)x \rceil}.
1036: \]
1037: The Betti--$0$ function is a normalized version of this $ \ _n \beta_0
1038: (x,0) = c_n \ _n \tilde{\beta}_0 (x,0) $ so that $\int_0^1 \ _n
1039: \beta_0 (x,0) dx = 1$. (In fact, $c_n =
1040: \frac{n-1}{1-ES_{n:n}}$, which for large values of $n$ is
1041: approximately equal to $n$.) Thus,
1042: \[
1043: \ _n \beta_0 (x,0) =
1044: \frac{c_n}{n} \sum_{j=1}^{\lceil (n-1)x \rceil} \frac{1}{n+1-j} = \frac{c_n}{n} \sum_{j=n+1-\lceil (n-1)x \rceil}^{n} \frac{1}{j}
1045: \]
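
\begin{rem}
One way to see the stated value of $c_n$ is the following short computation. Since $\lceil (n-1)x \rceil = i$ for $x \in \left(\frac{i-1}{n-1}, \frac{i}{n-1}\right]$ and $\sum_{i=1}^n S_{n:i} = 1$,
\[
\int_0^1 \ _n \tilde{\beta}_0(x,0)\, dx = \frac{1}{n-1} \sum_{i=1}^{n-1} E S_{n:i} = \frac{1 - E S_{n:n}}{n-1},
\]
so the normalization $\int_0^1 \ _n \beta_0(x,0)\, dx = 1$ forces $c_n = \frac{n-1}{1-ES_{n:n}}$. For example, for $n=3$ the formula for $E S_{n:i}$ gives $E S_{3:3} = \frac{11}{18}$ and hence $c_3 = \frac{36}{7}$.
\end{rem}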
1046:
1047:
1048: \begin{prop}
1049: For $0<x<1$, as $n \to \infty$,
1050: \[
1051: \ _n \beta_0 (x,0) \to - \ln (1-x).
1052: \]
1053: \end{prop}
1054:
1055: \begin{proof}
1056: By the definition of $c_n$, $\lim_{n\to \infty}\frac{c_n}{n} = 1$.
1057: The result then follows from the observation that
1058: \[
1059: \frac{1}{n} + \int_k^n \frac{1}{x} dx < \sum_{j=k}^n \frac{1}{j} < \frac{1}{k} + \int_k^n \frac{1}{x} dx
1060: \]
1061: and the fact that
1062: \[
1063: \lim_{n \to \infty} \ln \left( \frac{n}{n+1-\lceil (n-1)x \rceil} \right) = - \ln (1-x).
1064: \]
1065: \end{proof}
1066:
1067: \begin{figure}
1068: \begin{center}
1069: \includegraphics[width=7cm,keepaspectratio=true]{bettiGraph}
1070: \end{center}
1071: \caption{Graphs of the expected Betti $0$-function for $n=10,100$ and $f(x)=-\ln(1-x)$.}
1072: \label{figure:bettiGraph}
1073: \end{figure}
1074:
In Figure~\ref{figure:bettiGraph}, we graph the expected Betti--$0$ functions $y= \ _{10}\beta_0(x,0)$ and $y= \ _{100}\beta_0(x,0)$ and the limiting function $y=-\ln(1-x)$. For comparison, we also graph $y=1$, the limiting function one would obtain if the spacings became relatively equal in the limit.
1076:
1077:
1078:
1079: \section{Barcodes of certain parametric densities} \label{sectionBarcodesOfDensities}
1080:
1081: \subsection{The von Mises distribution} \label{sectionVonMises}
1082: Let $\M = S^1 = \{e^{i\theta} \ | \ \theta \in [-\pi,\pi)\} \subset \R^2$.
1083: We will use this parametrization to identify $\theta \in [-\pi,\pi)$
1084: with an element of $S^1$.
1085: Consider the von Mises density on $S^1$ with respect to the uniform measure,
1086: \[ f_{\mu,\kappa}(\theta) = \tfrac{1}{I_0(\kappa)}e^{\kappa \cos(\theta -
1087: \mu)}, \quad \theta \in [-\pi,\pi)
1088: \]
1089: where $\mu \in [-\pi,\pi)$, $\kappa \in [0,\infty)$ and
$I_0(x)$ is the modified Bessel function of the first kind and 
order $0$, where the general $\nu$-th order modified Bessel function of
the first kind is
1093: \begin{equation}
1094: \label{bessel}
1095: I_{\nu}(\kappa)= \tfrac{(\kappa/2)^{\nu}}{\Gamma\left(\nu + \frac{1}{2}\right)\Gamma\left(\frac{1}{2}\right)}
1096: \int_{-1}^1e^{\kappa t}(1-t^2)^{\nu-\frac{1}{2}}dt \ \ ,
1097: \end{equation}
1098: and $\Gamma (\cdot)$ denotes the gamma function.
1099:
1100: Our homologies will be independent of $\mu$, so assume
1101: that $\mu=0$ and so in this case the parameter $\vartheta = \kappa$.
1102:
1103: We will filter the chain complex on $S^1$ using both the \v{C}ech and
1104: Morse filtrations.
1105: Recall that by \eqref{eqnM1r} and \eqref{eqnMr},
1106: $S^1_{\geq \frac{1}{r}} = \{\theta \in S^1 \ | \ f_{\kappa}(\theta)
1107: \geq \frac{1}{r} \}$ and
1108: $S^1_{\leq r} = \{\theta \in S^1 \ | \ f_{\kappa}(\theta)
1109: \leq r \}$.
1110: Choose $\alpha_{r,\kappa} \in [-\pi,\pi)$ such that
1111: \[
1112: f_{\kappa}(\alpha_{r,\kappa}) = r.
1113: \]
Specifically, let
$
\alpha_{r,\kappa} = \cos^{-1}(\frac{1}{\kappa} \ln
(\frac{r}{c(\kappa)})),
$
where $c(\kappa) = \frac{1}{I_0(\kappa)}$ is the normalizing constant of $f_{\kappa}$.
1119: Our calculations of the persistent homology will follow from the
1120: following straightforward result.
1121:
1122: \begin{lemma} \label{lemmaS1}
For $0 \leq r < \frac{1}{\max f_{\kappa}}$, $S^1_{\geq \frac{1}{r}} = \emptyset$,
and for $r < \min f_{\kappa}$, $S^1_{\leq r} = \emptyset$.
1125: For $\frac{1}{\max f_{\kappa}} \leq r < \frac{1}{\min f_{\kappa}}$,
1126: \[
1127: S^1_{\geq \frac{1}{r}} = \{ \theta \ | \ -\alpha_{\frac{1}{r},\kappa} \leq \theta \leq
1128: \alpha_{\frac{1}{r},\kappa} \}.
1129: \]
1130: For $\min f_{\kappa} \leq r < \max f_{\kappa}$,
1131: \[
1132: S^1_{\leq r} = \{ \theta \ | \ \alpha_{r,\kappa} \leq \theta \leq 2\pi
1133: - \alpha_{r,\kappa} \}.
1134: \]
1135: For $r \geq \frac{1}{\min f_{\kappa}}$, $S^1_{\geq \frac{1}{r}} = S^1$,
1136: and for $r \geq \max f_{\kappa}$, $S^1_{\leq r} = S^1$.
1137: \end{lemma}
1138:
1139: Since its analysis is simpler, we start with the Morse filtration on
1140: $S^1$. By Lemma~\ref{lemmaS1}, $S^1_{\leq r}$ is empty if $r < \min
1141: f_{\kappa}$, it is contractible (see Remark~\ref{rem:contractible}) if
1142: $\min f_{\kappa} \leq r < \max f_{\kappa}$ and it is equal to $S^1$ if
$r \geq \max f_{\kappa}$. It follows that the Betti--$0$ barcode for
1144: the Morse filtration is the single interval
1145: \[
1146: \left[ \min f_{\kappa}, \infty \right] = \left[\tfrac{1}{ I_0(\kappa)
1147: e^{\kappa}}, \infty \right],
1148: \]
1149: the Betti--$1$ barcode is the single interval
1150: \[
1151: \left[ \max f_{\kappa}, \infty \right] = \left[\tfrac{e^{\kappa}}{
1152: I_0(\kappa)}, \infty \right],
1153: \]
1154: and all other Betti--$k$ barcodes are empty.
1155:
1156: Now consider the \v{C}ech filtration on $S^1$.
1157: We will derive a formula for the Betti--$0$ function, $\beta_0(x, \kappa)$, and
1158: calculate the Betti--$k$ barcodes for $k>0$.
1159:
1160: If $\kappa=0$ then $f_0 = 1$.
1161: So for $r<1$, $S^1_{\geq \frac{1}{r}} = \emptyset$, and for $r\geq 1$, $S^1_{\geq \frac{1}{r}} = S^1$.
1162: By definition~\eqref{eqngtheta},
1163: \[
1164: g_{\kappa}(r) = \begin{cases}
1165: 0 & \text{if $r < 1$},\\
1166: 1 & \text{if $r \geq 1$}.
1167: \end{cases}
1168: \]
1169: So by definition \eqref{bb-0}, $\beta_0(x,0) = 1$.
1170:
1171: For $\kappa>0$, let $\minf = \frac{1}{ I_0(\kappa)}e^{-\kappa}$ and
1172: $\maxf=\frac{1}{ I_0(\kappa)}e^{\kappa}$.
1173: For $r<\frac{1}{\maxf}$, $S^1_{\geq \frac{1}{r}} = \emptyset$, and for
1174: $r\geq\frac{1}{\minf}$, $S^1_{\geq \frac{1}{r}} = S^1$.
1175: For $\frac{1}{\maxf} \leq r < \frac{1}{\minf}$, since $f_{\kappa}$ is even and
1176: decreasing for $\theta>0$,
1177: \[
1178: S^1_{\geq \frac{1}{r}} = \{ \theta \ | \ -\alpha_{r,\kappa} \leq \theta \leq \alpha_{r,\kappa}\},
1179: \]
1180: where $\alpha_{r,\kappa} \in (0,\pi)$ and $f_{\kappa}(\alpha_{r,\kappa}) = \frac{1}{r}$.
1181:
1182: Let $x \in [0,1]$ and assume that $\beta_0(x,\kappa)=r$.
Since $\kappa > 0$, $g_{\kappa}(r) = \int_{S^1_{\geq \frac{1}{r}}}f_{\kappa}(\theta)d\theta$ is continuous and strictly increasing. So,
1184: \[
1185: x = \int_{S^1_{\geq \frac{1}{r}}} f_{\kappa}(\theta)d\theta .
1186: \]
1187: Define $\alpha_{r,\kappa} \in [0,\pi]$ by the condition that $f_{\kappa}(\alpha_{r,\kappa}) =
1188: \frac{1}{r}$.
1189: So
1190: \begin{equation} \label{eqn:r}
1191: r = \frac{1}{f_{\kappa}(\alpha_{r,\kappa})}.
1192: \end{equation}
1193: For $\psi \in [0,\pi]$, let
1194: \[
1195: F_{\kappa}(\psi) = \int_0^{\psi}f_{\kappa}(\theta) d\theta.
1196: \]
1197: Then
1198: \begin{equation} \label{eqn:x}
1199: x = \int_{S^1_{\geq \frac{1}{r}}} f_{\kappa} d\nu = \int_{-\alpha_{r,\kappa}}^{\alpha_{r,\kappa}} f_{\kappa}(\theta)
1200: d\theta = 2 F_{\kappa}(\alpha_{r,\kappa}).
1201: \end{equation}
1202: Since $F_{\kappa}$ is strictly increasing, it is invertible. So
1203: $\alpha_{r,\kappa} = F_{\kappa}^{-1}(\frac{x}{2})$. Thus
\begin{equation} \label{betti-0}
\beta_0(x,\kappa) = r =
  \frac{1}{f_{\kappa}(F_{\kappa}^{-1}(\frac{x}{2}))}.
\end{equation}
Since $f_{\kappa}$ and $F_{\kappa}$ are smooth, by the inverse function
theorem, so is $F_{\kappa}^{-1}$. So
1208: \[
1209: \beta_0(x,\kappa) = (F_{\kappa}^{-1})'\left(\frac{x}{2}\right).
1210: \]
1211: We remark that as $\kappa \rightarrow 0$, $\beta_0(x,\kappa)
1212: \rightarrow 1 = \beta_0(x,0)$.
1213: We can also describe the graph of $r=\beta_0(x,\kappa)$ parametrically
1214: by combining \eqref{eqn:r} and \eqref{eqn:x} (see Figure~\ref{figure:vonMisesBetti0}):
1215: \begin{equation} \label{eqn:vMh}
1216: h_{\kappa}(t) = \left( 2 F_{\kappa}(t), \frac{1}{f_{\kappa}(t)} \right), t \in [0,\pi].
1217: \end{equation}
1218: \begin{figure}
1219: \begin{center}
1220: \includegraphics[width=7cm,keepaspectratio=true]{vonMisesBetti0_3d_hue}
1221: \end{center}
1222: \caption{Graph of the Betti $0$-function of the von Mises density for
1223: a range of concentration parameters}
1224: \label{figure:vonMisesBetti0}
1225: \end{figure}
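
\begin{eg}
As a consistency check on \eqref{eqn:vMh}, assume $\kappa > 0$ and evaluate $h_{\kappa}$ at the endpoints of $[0,\pi]$. Since $f_{\kappa}$ is even and integrates to $1$ over $S^1$, we have $2F_{\kappa}(\pi) = 1$, so
\[
h_{\kappa}(0) = \left(0, I_0(\kappa) e^{-\kappa}\right) = \left(0, \tfrac{1}{\maxf}\right)
\quad \text{and} \quad
h_{\kappa}(\pi) = \left(1, I_0(\kappa) e^{\kappa}\right) = \left(1, \tfrac{1}{\minf}\right).
\]
Thus the graph of $\beta_0(\cdot,\kappa)$ rises from $\frac{1}{\maxf}$ to $\frac{1}{\minf}$, where the latter is the smallest value of $r$ for which $S^1_{\geq \frac{1}{r}} = S^1$.
\end{eg}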
1226:
1227:
1228: For $k\geq 1$, recall that
1229: \[
1230: \cF^{\check{C}}_r(C_k(S^1)) = \Const_k + C_k(S^1_{\geq \frac{1}{r}}).
1231: \]
1232: Also recall that for $r<\frac{1}{\maxf}$, $S^1_{\geq \frac{1}{r}} = \emptyset$, for
1233: $\frac{1}{\maxf} \leq r < \frac{1}{\minf}$, $S^1_{\geq \frac{1}{r}}$ is the arc from
1234: $-\alpha_{r,\kappa}$ to $\alpha_{r,\kappa}$ where $f_{\kappa}(\alpha_{r,\kappa}) = \frac{1}{r}$ and for $r\geq
1235: \frac{1}{\minf}$, $S^1_{\geq \frac{1}{r}} = S^1$.
1236: It follows that for $k\geq 1$,
1237: \[
1238: H_k(\cF^{\check{C}}_r(C_*(S^1))) = \begin{cases}
1239: \F & \text{for $k=1$ and $r\geq \frac{1}{\minf}$},\\
1240: 0 & \text{otherwise.}
1241: \end{cases}
1242: \]
1243: Therefore the Betti--$1$ barcode has the single interval
1244: \begin{equation}
1245: \label{1-betti}
1246: \left[\tfrac{1}{\minf},\infty\right] = \left[
1247: I_0(\kappa)e^{\kappa},\infty\right]
1248: \end{equation}
1249: and for $k>1$ the Betti--$k$ barcode is
1250: empty.
1251: %We remark that this generalizes the result for $\kappa=0$.
1252:
1253: \subsection {The von Mises-Fisher distribution}
1254: Now consider
1255: $\M=S^{p-1}$, $p \geq 3$ and the unimodal von Mises-Fisher density
1256: given by
1257: \[
1258: f_{\mu,\kappa}(x) = c(\kappa) \exp\left\{\kappa x^t \mu\right\}, \quad x \in S^{p-1}
1259: \]
1260: where $\kappa \in [0,\infty)$, $\mu \in S^{p-1}$, and
1261: \begin{equation} \label{normalizing}
1262: c(\kappa)=\left(\frac{\kappa}{2}\right)^{\frac{p}{2}-1}
1263: \frac{1}{\Gamma(\frac{p}{2}) I_{\frac{p}{2} -1}(\kappa )}
1264: \end{equation}
1265: is the normalizing constant with respect to the uniform measure. This
1266: is also known as the Langevin distribution. Note that the minimum and
1267: maximum of $f$ also do not depend on $\mu$: $\minf =
1268: c(\kappa)e^{-\kappa}$ and $\maxf = c(\kappa)e^{\kappa}$. In fact, by
1269: symmetry the homologies will not depend on $\mu$. Hence once again
1270: take $\vartheta = \kappa$.
1271:
1272: Consider the Morse filtration (defined in Section~\ref{section:morse})
on $S^{p-1}$. If $r< \minf$ then $S^{p-1}_{\leq r} = \emptyset$ and if $r
\geq \maxf$ then $S^{p-1}_{\leq r} = S^{p-1}$. For $\minf \leq r <
\maxf$,
1276: \[
1277: S^{p-1}_{\leq r} = \{ x\in S^{p-1} | x^t\mu \leq a_{r,\kappa}\},
1278: \]
1279: where $a_{r,\kappa} = \frac{1}{\kappa} \ln
1280: \left(\frac{r}{c(\kappa)}\right) \in [-1,1]$. So $S^{p-1}_{\leq r}$
1281: is the closure of $S^{p-1}$ minus a right circular cone with vertex
1282: $0$ and centered at $\mu$. In particular, $S^{p-1}_{\leq r}$ is
1283: contractible (see Remark~\ref{rem:contractible}) so
1284: $H_0(\cF^M_r(C_*(S^{p-1}))) = \F$ and for $k\geq 1$,
1285: $H_k(\cF^M_r(C_*(S^{p-1}))) = 0$.
1286:
Thus the Betti--$0$ barcode is the single interval $[\minf, \infty]$,
the Betti--$(p-1)$ barcode is the single interval $[\maxf, \infty]$ and all
other barcodes are empty.
1290:
1291: Consider the \v{C}ech filtration (defined in
1292: Section~\ref{section:rips}) on $S^{p-1}$.
1293: %If $r < \frac{1}{\maxf}$ then $S^{p-1}_{\geq \frac{1}{r}} = \emptyset$ and if $r\geq \frac{1}{\minf}$ then $S^{p-1}_{\geq \frac{1}{r}} = S^{p-1}$.
1294: For $\frac{1}{\max(f_{\kappa})} \leq r < \frac{1}{\min(f_{\kappa})}$,
1295: \[
1296: S^{p-1}_{\geq \frac{1}{r}} = \{x \in S^{p-1} \ | \ x^t\mu \geq a_{\frac{1}{r},\kappa}\}.
1297: \]
1298: So $S^{p-1}_{\geq \frac{1}{r}}$ is the intersection of $S^{p-1}$ and a right circular cone with
1299: vertex $0$ and centered at $\mu$.
1300: In particular for
1301: $\frac{1}{\max(f_{\kappa})} \leq r < \frac{1}{\min(f_{\kappa})}$,
1302: $S^{p-1}_{\geq \frac{1}{r}}$ is contractible, so for $k\geq 1$, $H_k(S^{p-1}_{\geq \frac{1}{r}})=0$.
1303:
1304: Assume $\kappa = 0$. Then $f_0 = c(0)$, and
1305: \[
1306: S^{p-1}_{\geq \frac{1}{r}} = \begin{cases}
\emptyset & \text{if $r < \frac{1}{c(0)}$},\\
1308: S^{p-1} & \text{if $r \geq \frac{1}{c(0)}$}.
1309: \end{cases}
1310: \]
1311: Thus
1312: \[
1313: g_{\kappa}(r) = \begin{cases}
1314: 0 & \text{if $r < \frac{1}{c(0)}$},\\
1315: 1 & \text{if $r \geq \frac{1}{c(0)}$}.
1316: \end{cases}
1317: \]
1318: Therefore $\beta_0(x,0) := \inf_{g_{\kappa}(r) \geq x} r = \frac{1}{c(0)}$.
1319:
1320: Assume $\kappa > 0$. Then for $k=0$,
1321: \begin{equation} \label{eqn:g_kappa}
1322: x = g_{\kappa}(r) =
1323: \int_{S^{p-1}_{\geq \frac{1}{r}}} f_{\kappa} =
1324: c(\kappa) \frac{s_{p-2}}{s_{p-1}} \int_0^{\arccos
1325: \left(-\frac{\ln(rc(\kappa))}{\kappa}\right)} e^{\kappa \cos
1326: \theta} \sin^{p-2}\theta \ d\theta \ \ ,
1327: \end{equation}
1328: where $s_{p-1} = \frac{2\pi^{\frac{p}{2}}}{\Gamma\left(\frac{p}{2}\right)}$.
1329: When $\kappa > 0$, $g_{\kappa}(r)$ is continuous and strictly increasing. Hence
1330: \begin{equation}
1331: \label{0-betti-sphere}
1332: \beta_0(x , \kappa ) = g_{\kappa}^{-1}(x)
1333: \end{equation}
1334: for $x \in [0,1]$ and $\kappa > 0$. As we did for the von Mises
1335: distribution~\eqref{eqn:vMh}, we can describe the graph of $r=\beta_0(x,\kappa)$ more
1336: explicitly using a parametric equation:
1337: \begin{equation} \label{eqn:vMFh}
1338: h_{\kappa}(t) = \left( c(\kappa) \frac{s_{p-2}}{s_{p-1}} \int_0^t e^{\kappa \cos \theta} \sin^{p-2}\theta \ d \theta, \frac{e^{-\kappa \cos t}}{c(\kappa)} \right), \quad t \in [0,\pi].
1339: \end{equation}
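
As an illustration (a sketch for the special case $p=3$, using $\frac{s_1}{s_2} = \frac{1}{2}$ and the closed form $c(\kappa) = \frac{\kappa}{\sinh \kappa}$ of the normalizing constant on $S^2$), the integral in~\eqref{eqn:vMFh} can be evaluated explicitly:
\[
h_{\kappa}(t) = \left( \frac{e^{\kappa} - e^{\kappa \cos t}}{2 \sinh \kappa}, \ \frac{\sinh \kappa}{\kappa} \, e^{-\kappa \cos t} \right), \quad t \in [0,\pi].
\]
The endpoints are $h_{\kappa}(0) = \left(0, \frac{e^{-\kappa}}{c(\kappa)}\right)$ and $h_{\kappa}(\pi) = \left(1, \frac{e^{\kappa}}{c(\kappa)}\right)$, so the curve runs from $r = \frac{1}{\max(f_{\kappa})}$ to $r = \frac{1}{\min(f_{\kappa})}$, as one would expect.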
1340:
1341: For $k\geq 1$, by Lemma~\ref{lemmaHkFr},
1342: \[ H_k(\cF^{\check{C}}_r(C_*(S^{p-1}))) = H_k(S^{p-1}_{\geq \frac{1}{r}}) = \begin{cases}
1343: \F & \text{ if } k=p-1 \text{ and } r \geq \frac{1}{\minf},\\
1344: 0 & \text{otherwise.}
1345: \end{cases}
1346: \]
1347: Therefore for $k\geq 1$ the Betti--$k$ barcode has the single interval:
1348: \begin{equation} \label{betti-k}
1349: \left[\tfrac{1}{\minf},\infty\right] = \left[\tfrac{e^{\kappa}}{c(\kappa)},\infty\right]
1350: \end{equation}
1351: for $k=p-1$ and is empty otherwise.
1352:
1353:
1354: \subsection{The Watson distribution} \label{sectionWatson}
1355:
1356: Let $\M = S^{p-1}$ and consider the following bimodal distribution
1357: \begin{equation} \label{eqnWatson}
1358: f_{\mu,\kappa}(x) = d(\kappa) \exp \{ \kappa (x^{t}\mu)^2 \},
1359: \end{equation}
1360: where $\kappa\geq 0$ and $x,\mu \in S^{p-1}$,
1361: called the \emph{Watson distribution}.
1362: We remark that this density is rotationally symmetric, where $\mu$ is
1363: the axis of rotation.
1364: The minimum and maximum densities are given by
1365: \[
1366: \min f = d(\kappa), \quad \max f = d(\kappa) e^{\kappa}.
1367: \]
1368: The maximum is achieved at $x=\pm \mu$ and the minimum is achieved at
1369: all $x$ such that $x^t \mu = 0$.
1370:
1371: Using the Morse filtration we get the following Betti barcodes.
For $p=2$, we remark that for $r < \min f$, $S^1_{\leq r} = \emptyset$.
1373: For $r = \min f$, $S^1_{\leq r}$ is two points.
1374: As $r$ increases, these points become two arcs of increasing size,
1375: which connect when $r = \max f$.
1376: So the Betti--$0$ barcode consists of the two homology intervals
1377: $[\min f, \infty] $ and $[\min f, \max f)$, and the Betti--$1$ barcode
1378: has the single interval $[\max f, \infty]$.
1379: All other Betti barcodes are empty.
1380:
1381: For $p>2$, we observe similar behavior.
When $r < \min f$, $S^{p-1}_{\leq r} = \emptyset$.
For $r = \min f$, $S^{p-1}_{\leq r}$ is the equator, which is homeomorphic
to $S^{p-2}$.
1385: As $r$ increases, the equator expands until it reaches the poles when
1386: $r = \max f$.
1387: So the Betti--$0$, Betti--$(p-2)$ and Betti--$(p-1)$ barcodes each
1388: consist of a single homology interval:
1389: $[\min f, \infty]$, $[\min f, \max f)$, and $[\max f, \infty]$,
1390: respectively.
1391: All other Betti barcodes are empty.
1392:
1393: Using the \v{C}ech filtration, $S^{p-1}_{\geq \frac{1}{r}}$ is either
1394: empty, or consists of two contractible components, or is all of $S^{p-1}$.
1395: So the Betti--$(p-1)$ barcode is the single
1396: homology interval $[\frac{1}{\min f}, \infty]$ and the Betti--$k$
1397: barcodes for all other $k\geq 1$ are empty.
1398: The Betti--$0$ function is given by $\beta_0(x,\kappa) =
1399: g^{-1}_{\kappa}(x)$, where
1400: \[
1401: g_{\kappa}(r) = \int_{S^{p-1}_{\geq \frac{1}{r}}} f_{\kappa} = 2
1402: \frac{s_{p-2}}{s_{p-1}} \int_0^{\alpha_{\kappa}(r)} d(\kappa)
1403: e^{\kappa \cos^2(\theta)} \sin^{p-2}(\theta) d\theta,
1404: \]
1405: with $\alpha_{\kappa}(r) =
1406: \cos^{-1}\left(\sqrt{-\frac{1}{\kappa}\ln(d(\kappa)r)}\right)$ and
1407: $s_{p-1} = \frac{2\pi^{p/2}}{\Gamma(p/2)}$.
1408: As with the von Mises \eqref{eqn:vMh} and von Mises-Fisher distributions \eqref{eqn:vMFh}, the Betti--$0$ function can also be described parametrically.
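
For instance, arguing as for~\eqref{eqn:vMFh} and assuming $\kappa > 0$, one candidate parametrization (a sketch obtained by substituting $t = \alpha_{\kappa}(r)$, not a formula quoted from elsewhere) is
\[
h_{\kappa}(t) = \left( 2 d(\kappa) \frac{s_{p-2}}{s_{p-1}} \int_0^t e^{\kappa \cos^2 \theta} \sin^{p-2}\theta \ d\theta, \ \frac{e^{-\kappa \cos^2 t}}{d(\kappa)} \right), \quad t \in \left[0,\tfrac{\pi}{2}\right].
\]
Here the second coordinate is $r = 1/f_{\mu,\kappa}$ evaluated at angle $t$ from $\mu$, so it runs from $\frac{1}{\max f}$ at $t=0$ to $\frac{1}{\min f}$ at $t = \frac{\pi}{2}$.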
1409:
1410:
1411: \subsection{The Bingham distribution}
1412:
1413: Again let $\M = S^{p-1}$ with the probability density
1414: \[
1415: f_{K}(x) = d(K) \exp \{ x^t K x \}
1416: \]
where $x \in S^{p-1} \subset \R^p$ and $K$ is a symmetric $p \times p$
matrix.
We remark that $f_{K}(x) = d(K) \exp \{ \tr (K x x^t) \}$.
Also, by a change of coordinates we can write $K = \diag (k_1, \ldots,
k_p)$, where $k_p \geq \ldots \geq k_1$ are the eigenvalues of $K$.
1422: Let $v_i$ be the eigenvector associated to $k_i$.
1423:
1424: Assume that $k_p > \ldots > k_1 > 0$.
1425: Then the minimum and maximum values of $f_K$ are given by
1426: \[
1427: \min f_K = d(K) e^{k_1}, \quad \max f_K = d(K) e^{k_p},
1428: \]
1429: and are attained at $\pm v_1$ and $\pm v_p$.
1430:
1431: The Betti--$k$ barcodes (for $k\geq 1$) when $p=2$ are the same as for
1432: the Watson distribution. When $p\geq 3$, the Bingham distribution
1433: differs significantly from the Watson distribution. For example,
1434: the minimum of the function is attained at only $\pm v_1$ which is
1435: certainly not homeomorphic to $S^{p-2}$.
1436:
1437: Consider the Morse filtration. We can calculate the Betti--$k$
barcodes inductively. If we consider $v_p$ to be the north pole, then
there is a homotopy from $S^{p-1} - \{v_p, -v_p\}$ to $S^{p-2}$ which
collapses the sphere with its poles removed onto the equator. When $r <
d(K)e^{k_p}$, by the symmetry of $f_K$ this homotopy also gives a homotopy
from $S^{p-1}_{\leq r}$ to $S^{p-2}_{\leq r}$, where the filtration on
$S^{p-2}$ is the Morse filtration associated to the Bingham
distribution with $K = \diag (k_1, \ldots, k_{p-1})$.
1445:
1446: As a result, the Betti--$0$ barcode is given by the two homology
1447: intervals $[d(K)e^{k_1}, \infty]$ and $[d(K)e^{k_1},
1448: d(K)e^{k_{2}})$.
For $1 \leq i \leq p-2$, the Betti--$i$ barcode is given by the
interval $[d(K)e^{k_{i+1}}, d(K)e^{k_{i+2}})$.
1451: Finally, the Betti--$(p-1)$ barcode is given by the interval
1452: $[d(K)e^{k_p}, \infty]$.
1453:
We remark that this barcode corresponds to the cellular construction of
1455: $S^{p-1}$ that repeatedly attaches northern and southern hemispheres
1456: of increasing dimension.
1457:
1458: For the \v{C}ech filtration we can use the same argument starting with
1459: $v_1$. The Betti--$0$ barcode is given by the two homology intervals
1460: $\frac{1}{d(K)} \left[e^{-k_p}, \infty\right]$ and $\frac{1}{d(K)} \left[ e^{-k_p}, e^{-k_{p-1}} \right)$. For $1 \leq i \leq
1461: p-2$, the Betti--$i$ barcode is given by the single interval $\frac{1}{d(K)}
1462: \left[e^{-k_{p-i}}, e^{-k_{p-i-1}} \right)$.
1463: The Betti--$(p-1)$ barcode is given by the single interval $\frac{1}{d(K)} \left[ e^{-k_1}, \infty \right]$.
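
To make the indexing concrete, consider the case $p=3$ with $k_3 > k_2 > k_1 > 0$ (a worked instance of the formulas above, included only as a check), and write $m_i = d(K)e^{k_i}$, a shorthand used only in this example. On $S^2$ the density $f_K$ has minima at $\pm v_1$, saddle points at $\pm v_2$ and maxima at $\pm v_3$. For the Morse filtration, $S^2_{\leq r}$ has two contractible components for $m_1 \leq r < m_2$, is homotopy equivalent to a circle for $m_2 \leq r < m_3$, and is all of $S^2$ for $r \geq m_3$. The \v{C}ech filtration behaves in the same way with the roles of $v_1$ and $v_3$ exchanged. The barcodes are therefore
\[
\begin{array}{l|ll}
 & \text{Morse} & \text{\v{C}ech} \\ \hline
\text{Betti--$0$} & [m_1,\infty], \ [m_1,m_2) & \left[\tfrac{1}{m_3},\infty\right], \ \left[\tfrac{1}{m_3},\tfrac{1}{m_2}\right) \\
\text{Betti--$1$} & [m_2,m_3) & \left[\tfrac{1}{m_2},\tfrac{1}{m_1}\right) \\
\text{Betti--$2$} & [m_3,\infty] & \left[\tfrac{1}{m_1},\infty\right]
\end{array}
\]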
1464:
1465: We remark that the correspondence between the two sets of barcodes is
1466: a manifestation of Alexander duality.
1467:
1468:
1469:
1470:
1471: \subsection{The matrix von Mises distribution and a Hopf fibration}
1472: The Lie group of rotations of $\R^3$, $SO(3)$, can be given the matrix
1473: von Mises density
1474: \begin{equation} \label{eqnMatrixVonMises}
1475: f_{A,\kappa}(X) = c(\kappa) \exp \left\{ \kappa \tr (X^t A) \right\},
1476: \end{equation}
1477: where $A \in SO(3)$ and $\kappa > 0$ is a concentration parameter.
1478: We determine the Morse and \v{C}ech filtrations of $SO(3)$ via the
1479: Hopf fibration $S^3 \to \RP^3$.
1480:
1481: The special orthogonal group $SO(3)$ is diffeomorphic to the real
1482: projective space $\RP^3$. The map $S^3 \to \RP^3$ which identifies
1483: each point on the sphere with the one-dimensional subspace on which
1484: it lies is a Hopf fibration whose fiber is $S^0 = \{-1,1\}$. Thus,
1485: $S^3$ is a double-cover of $SO(3)$ (and since $S^3$ is
1486: simply-connected, it is the universal cover).
1487:
1488: If we represent $S^3$ with the unit quaternions and $\RP^3$ with
1489: $SO(3)$, then the Hopf fibration above is represented by the
1490: Cayley-Klein map $\rho: S^3 \to SO(3)$:
1491: \[ \rho \left(
1492: \begin{array}{c}
1493: p_1 \\
1494: p_2 \\
1495: p_3 \\
1496: p_4
1497: \end{array} \right) = I + 2 p_1B + 2B^2 \text{, where } B = \left(
1498: \begin{array}{ccc}
1499: 0 & -p_4 & p_3 \\
1500: p_4 & 0 & -p_2 \\
1501: -p_3 & p_2 & 0
1502: \end{array} \right).
1503: \]
1504: We can use this map to relate the matrix von Mises
1505: density~\eqref{eqnMatrixVonMises} on $SO(3)$ to the Watson
1506: density~\eqref{eqnWatson} on $S^3$ by making the following
1507: observation.
1508: If $P = \rho(p)$ and $Q = \rho(q)$, then
1509: \[
1510: \tr(P^tQ) = 4 (p^t q)^2 - 1.
1511: \]
1512: Then if $\rho(a) = A$,
1513: \[
\rho^{-1} \{ X \in SO(3) \ | \ f_{A,\kappa}(X) = r\} = \{ x \in S^3 \ | \
1515: f_{a,4\kappa}(x) = kr\} %\frac{d(4\kappa)e^{\kappa}r}{c(\kappa)} \}.
1516: \text{, where } k = \frac{d(4\kappa)e^{\kappa}}{c(\kappa)}.
1517: \]
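
To see where the constant $k$ comes from (a short verification, using only the trace identity above and the definitions~\eqref{eqnMatrixVonMises} and~\eqref{eqnWatson}), let $X = \rho(x)$ and $A = \rho(a)$. Then
\[
f_{A,\kappa}(X) = c(\kappa) \exp\left\{ \kappa \left( 4 (x^t a)^2 - 1 \right) \right\}
= c(\kappa) e^{-\kappa} \exp\left\{ 4\kappa (x^t a)^2 \right\}
= \frac{c(\kappa)}{d(4\kappa) e^{\kappa}} \, f_{a,4\kappa}(x)
= \frac{1}{k} \, f_{a,4\kappa}(x).
\]
As a sanity check of the trace identity, $q = \pm p$ gives $\tr(P^tP) = \tr(I) = 3$, while $p^t q = 0$ gives $\tr(P^tQ) = -1$, which is the trace of a rotation by the angle $\pi$.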
1518: It follows that
1519: \[
1520: \rho^{-1}(SO(3)_{\leq r}) = S^3_{\leq kr} \text{ and }
1521: \rho^{-1}(SO(3)_{\geq \frac{1}{r}}) = S^3_{\geq k\frac{1}{r}},
1522: % \text{, where } k = \frac{d(4\kappa)e^{\kappa}}{c(\kappa)}
1523: \]
1524: where the filtration on $S^3$ is with respect to Watson density $f_{a,4\kappa}$.
1525:
Recall (Section~\ref{sectionWatson}) that for $\frac{1}{\max f} \leq
\frac{r}{k} < \frac{1}{\min f}$, $S^3_{\geq \frac{k}{r}}$ consists of two
1528: contractible components. The Hopf fibration $S^3 \to \RP^3$ and
1529: equivalently the map $\rho: S^3 \to SO(3)$ identify these two
1530: components. So $SO(3)_{\geq \frac{1}{r}}$ is contractible. Therefore,
1531: for the \v{C}ech filtration the Betti--$3$ barcode is the single homology
1532: interval $[\frac{1}{\min f}, \infty)$ and all other Betti--$k$ barcodes
1533: for $k\geq 1$ are empty. The Betti--$0$ function is identical to the
1534: one for the Watson density on $S^3$.
1535:
1536: For $\min f \leq kr < \max f$, $S^3_{\leq kr}$ is homotopy equivalent
1537: (via a projection onto its equator) to $S^2$. The Hopf fibration $S^3
1538: \to \RP^3$ restricted to the equator gives the Hopf fibration and
double cover $S^2 \to \RP^2$. The homotopy equivalence $S^3_{\leq kr}
1540: \homoteq S^2$ induces a homotopy equivalence $SO(3)_{\leq r} \homoteq
1541: \RP^2$. Thus for the Morse filtration, the Betti--$0$ and Betti--$3$
1542: barcodes are the single homology intervals $[\min f, \infty)$ and
1543: $[\max f, \infty)$ and all Betti--$k$ barcodes for $k>3$ are empty.
1544: However, since the fundamental group and integral homology group of
1545: degree one of $\RP^2$ are the cyclic group of order two, the Betti--$1$
1546: and Betti--$2$ barcodes depend on the choice of the field of
1547: coefficients $\F$. If $\F$ is a field of characteristic $0$ (e.g. the
rationals) then both are empty. However, if $\F$ is a field of
characteristic two (e.g.\ $\Z/2\Z$), then both are the single homology
interval $[\min f, \infty)$, since in this case the inclusion $\RP^2 \to \RP^3$ induces isomorphisms on homology in degrees one and two.
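
For reference, the coefficient dependence can be read off from the standard homology of $\RP^2$ (a well-known computation, recorded here only for convenience):
\[
H_j(\RP^2;\Z/2\Z) \cong \Z/2\Z \ \text{ for } j = 0,1,2, \qquad
H_j(\RP^2;\mathbb{Q}) \cong \begin{cases} \mathbb{Q} & \text{if } j=0,\\ 0 & \text{if } j=1,2. \end{cases}
\]
So, for $\min f \leq r < \max f$, the Betti numbers of $SO(3)_{\leq r}$ in degrees $0$, $1$, $2$ are $(1,1,1)$ over $\Z/2\Z$ and $(1,0,0)$ over $\mathbb{Q}$.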
1551:
1552:
1553:
1554:
1555:
1556: \section{Statistical estimation of the Betti barcodes}
1557: \label{statestimation}
1558:
In this section we estimate the persistent homology of various
densities using statistics computed from samples drawn from them.
1561:
1562:
\subsection{The von Mises and von Mises--Fisher distributions}
1564: For point cloud data $x_1, \ldots , x_n$ on $S^{p-1}$ sampled from the
1565: von Mises-Fisher distribution (\ref{vmf}): $f_{\mu,\kappa}(x) =
1566: c(\kappa)\exp\{\kappa x^t \mu \}$, we will give the statistical
1567: estimators for the (unknown) parameters. We will show that these can
1568: be used to obtain good estimates of the persistent homology of the
1569: underlying distribution.
1570:
1571: Letting $\bar x = \frac{1}{n} \sum_{i=1}^nx_i$ denote the sample mean,
1572: consider the decomposition
1573: \[ {\bar x} = \|{\bar x}\|\left(\tfrac{{\bar x}}{ \|{\bar x}\|}\right)
1574: \ \ .
1575: \]
The statistical estimator for $\mu$ is ${\bar x}/\|{\bar x}\|$, while
the statistical estimator for $\kappa$ is obtained~\cite[Section
10.3.1]{mardiaJupp:book} by solving $A_p(\hat \kappa) = \|{\bar
x}\|$, where $A_p(\lambda) =
\tfrac{I_{p/2}(\lambda)}{I_{p/2-1}(\lambda)}$ and $I_\nu(\lambda)$
is the modified Bessel function of the first kind of order $\nu$.
1582: Hence,
1583: \begin{equation}
1584: \label{est-kappa} {\hat \kappa} = A_p^{-1}(\|{\bar x}\|).
1585: \end{equation}
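
For example, when $p=3$ the ratio of Bessel functions has the closed form $A_3(\kappa) = \coth \kappa - \frac{1}{\kappa}$, so~\eqref{est-kappa} amounts to solving
\[
\coth \hat{\kappa} - \frac{1}{\hat{\kappa}} = \|\bar{x}\|
\]
numerically for $\hat{\kappa}$. As a rough guide (standard expansions of $\coth$, stated here as an aid rather than as part of the estimation theory), $A_3(\kappa) = \frac{\kappa}{3} + O(\kappa^3)$ as $\kappa \to 0$ and $A_3(\kappa) = 1 - \frac{1}{\kappa} + O(e^{-2\kappa})$ as $\kappa \to \infty$. Thus $\hat{\kappa} \approx 3\|\bar{x}\|$ for nearly uniform samples and $\hat{\kappa} \approx (1 - \|\bar{x}\|)^{-1}$ for highly concentrated samples. For instance, $A_3(1) = \coth 1 - 1 \approx 0.313$, so a sample with $\|\bar{x}\| \approx 0.313$ yields $\hat{\kappa} \approx 1$.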
1586:
A large-sample asymptotic normality result for
(\ref{est-kappa}) is~\cite[Section 10.3.1]{mardiaJupp:book}
1589: \begin{equation}
1590: \label{asymp-mse-kappa} \sqrt{n}\left( {\hat \kappa} - \kappa
1591: \right) \rightsquigarrow N\left(0, A_p'(\kappa)^{-1}\right),
1592: \end{equation}
1593: as $n \rightarrow \infty$, where $\rightsquigarrow$ means
1594: convergence in distribution and $N(0,\sigma^2)$ stands for a
1595: normally distributed random variable with mean 0 and variance
1596: $\sigma^2 > 0$. \iffalse A large sample calculation for
1597: (\ref{est-kappa}) is~\cite[Section 10.3.1]{mardiaJupp:book}
1598: \begin{equation}
1599: \label{asymp-mse-kappa} E\left( {\hat \kappa} - \kappa \right) =
1600: \frac{(p-1)A_p'(\kappa) - \kappa A_p''(\kappa)}{2\kappa
1601: A_p'(\kappa)^2} \frac{1}{n} + O\left(\frac{1}{n^2}\right),
1602: \end{equation}
1603: as $n \rightarrow \infty$.\fi
1604: Using this estimate of $\kappa$ we
obtain estimates for the $\beta_k$ barcodes for the Morse and
1606: \v{C}ech filtrations. For the Morse filtration, we estimate the
1607: $\beta_0$ barcode and $\beta_{p-1}$ barcode to be
1608: $[c(\hat{\kappa})e^{-\hat{\kappa}},\infty]$ and
1609: $[c(\hat{\kappa})e^{\hat{\kappa}},\infty]$, respectively. For the
1610: \v{C}ech filtration, we estimate the $\beta_{p-1}$ barcode to be
1611: $[\frac{e^{\hat{\kappa}}}{c(\hat{\kappa})},\infty]$.
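
For example, on $S^2$ (that is, $p=3$), where $c(\kappa) = \frac{\kappa}{\sinh \kappa}$, these plug-in estimates take the explicit form (a direct substitution, shown only as an illustration)
\[
c(\hat{\kappa})e^{-\hat{\kappa}} = \frac{2\hat{\kappa}}{e^{2\hat{\kappa}}-1}, \qquad
c(\hat{\kappa})e^{\hat{\kappa}} = \frac{2\hat{\kappa}}{1-e^{-2\hat{\kappa}}}, \qquad
\frac{e^{\hat{\kappa}}}{c(\hat{\kappa})} = \frac{e^{2\hat{\kappa}}-1}{2\hat{\kappa}}.
\]
All three endpoints depend smoothly on $\hat{\kappa} > 0$, which is the property used to bound the estimation error.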
1612:
1613: Recall that the space of barcodes has a metric $\mathcal{D}$ (see
1614: Definition~\ref{def:barcodeMetric}). Let $\beta_i^{M}(f)$ and
1615: $\beta_i^{\check{C}}(f)$ denote the Betti--$i$ barcode for the density
1616: $f$ using the Morse and \v{C}ech filtrations. Then the expectations
1617: of the distance from the estimated persistent homology to the
1618: persistent homology of the underlying density can be bounded as follows.
1619:
1620: \begin{thm}
1621: For the von Mises--Fisher distribution on $S^{p-1}$ and
1622: $\kappa \in [\kappa_0, \kappa_1]$, where $0 < \kappa_0 \leq
1623: \kappa_1 < \infty$,
1624: \begin{equation*}
1625: E (\mathcal{D}(\beta_i^M (f_{\hat{\kappa}}),\beta_i^M(f_{\kappa}))) \leq C(\kappa) n^{-1/2}
1626: \end{equation*}
1627: as $n \to \infty$ for all $i$, and
1628: \begin{equation*}
1629: E (\mathcal{D}(\beta_i^{\check{C}} (f_{\hat{\kappa}}), \beta_i^{\check{C}}(f_{\kappa}))) \leq C(\kappa) n^{-1/2}
1630: \end{equation*}
1631: as $n \to \infty$ for all $i \geq 1$, for some constant $C(\kappa)$.
1632: \end{thm}
1633:
1634: \begin{proof}
1635: Since the barcodes have a particularly simple form, we only need to
1636: know the barcode metric for the following case:
1637: \begin{equation*}
1638: \mathcal{D} ( \{ [a,\infty] \}, \{ [b,\infty] \} ) = |a-b|.
1639: \end{equation*}
1640: Using our previous calculations of the Betti barcodes, we have:
1641: \begin{eqnarray*}
1642: \mathcal{D} ( \beta_0^M (f_{\hat{\kappa}}), \beta_0^M(f_{\kappa}) )
1643: & = & |
1644: c(\hat{\kappa}) e^{-\hat{\kappa}} - c(\kappa) e^{-\kappa} | \\
1645: \mathcal{D} ( \beta_{p-1}^M (f_{\hat{\kappa}}),
1646: \beta_{p-1}^M(f_{\kappa}) ) & = & |
1647: c(\hat{\kappa}) e^{\hat{\kappa}} - c(\kappa) e^{\kappa} | \\
1648: \mathcal{D} ( \beta_{p-1}^{\check{C}} (f_{\hat{\kappa}}),
1649: \beta_{p-1}^{\check{C}}(f_{\kappa}) ) & = & |
1650: c(\hat{\kappa})^{-1} e^{\hat{\kappa}} - c(\kappa)^{-1} e^{\kappa} |.
1651: \end{eqnarray*}
1652:
1653: We note that the normalizing constant can be re-expressed as
1654: \[
1655: c(\kappa) = \frac{B\left(\frac{p-1}{2},\frac12\right)}{\int_{-1}^1
1656: e^{\kappa t}(1-t^2)^{\frac{p-3}{2}}dt} \ \ ,
1657: \]
1658: where $B(\cdot,\cdot)$ is the beta function.
1659: Furthermore,
1660: \[
1661: c'(\kappa) = -B\left(\frac{p-1}{2},\frac12\right)\frac{\int_{-1}^1
1662: e^{\kappa t}t(1-t^2)^{\frac{p-3}{2}}dt}{\left(\int_{-1}^1
1663: e^{\kappa t}(1-t^2)^{\frac{p-3}{2}}dt\right)^2} \ \
1664: \]
1665: and
1666: \[
A_p'(\kappa) = 1 - A_p(\kappa)^2 - \frac{p-1}{\kappa}A_p(\kappa) \ \ .
\]
1670:
For $0 < \kappa_0 \leq \kappa_1 < \infty$ and $\kappa \in
1672: \left[\kappa_0 , \kappa_1\right]$, we observe $0 < c(\kappa) ,
1673: |c'(\kappa)|, A_p'(\kappa) < \infty$, and by the mean value theorem,
1674: \[E | c(\hat{\kappa}) e^{ \hat{\kappa}} - c(\kappa) e^{\kappa}|
1675: = E | (c(\kappa^*)+c'(\kappa^*)) e^{{\kappa}^*} ({\hat
1676: \kappa}-\kappa)| \ \ ,
1677: \]
1678: where $\kappa^*$ is a value between $\hat \kappa$ and $\kappa$.
1679: Consequently,
1680: \begin{eqnarray*} E | c(\hat{\kappa}) e^{
1681: \hat{\kappa}} - c(\kappa) e^{\kappa}| &\leq& \bar{C}(\kappa)
1682: \left\{E|{\hat \kappa}-\kappa|^2\right\}^{1/2} \\
1683: &\leq& C(\kappa)n^{-1/2}
1684: \end{eqnarray*}
1685: where the first inequality is by the H\"{o}lder inequality, and the
1686: second is by~\eqref{asymp-mse-kappa}.
1687:
1688: Similarly,
1689: \[E | c(\hat{\kappa}) e^{ -\hat{\kappa}} - c(\kappa) e^{-\kappa}|
1690: = E | (c'(\kappa^*)-c(\kappa^*)) e^{-{\kappa}^*} ({\hat
1691: \kappa}-\kappa)| \ \ ,
1692: \]
1693: and
1694: \[E \left| \frac{e^{ \hat{\kappa}}}{c(\hat{\kappa})} - \frac{e^{\kappa}}{c(\kappa)}\right|
1695: = E \left| \left(\frac{c(\kappa^*)-c'(\kappa^*)}{c(\kappa^*)^2}\right) e^{{\kappa}^*} ({\hat
1696: \kappa}-\kappa)\right| \ \ . \qedhere
1697: \]
1698: \end{proof}
1699:
1700: Expressing the estimated $\beta_0$-function is more challenging. For
1701: the case of the sphere $S^2$, an exact expression can be obtained.
1702: One can calculate that $c(\kappa) = \frac{\kappa}{\sinh(\kappa)}$, and
1703: from \eqref{eqn:g_kappa},
1704: \[
g_{\kappa}(r) = \frac{e^{\kappa}}{2\sinh(\kappa)} - \frac{1}{2\kappa
r},
\]
from which we use (\ref{0-betti-sphere}) to obtain
1709: \begin{equation} \label{0-betti-2}
1710: \beta_0(x , \kappa ) = \frac{e^{2\kappa}-1}{2\kappa[(1-x)e^{2\kappa}+x]}
1711: \end{equation}
1712: for $x \in (0,1]$ and $\kappa > 0$. Notice that $\beta_0(x , \kappa )
1713: \rightarrow 1$ as $\kappa \rightarrow 0$ and $\beta_0(x , \kappa )
1714: \rightarrow 0$ as $\kappa \rightarrow \infty$, for all $x \in (0,1)$.
Furthermore, for $p=3$ the function $A_p$ in (\ref{est-kappa}) has the closed form~\cite[9.3.9]{mardiaJupp:book}
1716: \begin{equation}
1717: \label{A_3}
1718: A_3(\kappa) = \coth \kappa - \tfrac 1 {\kappa} \ \ . \end{equation}
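
As a consistency check on~\eqref{0-betti-2} (a short verification, using $c(\kappa) = \frac{\kappa}{\sinh \kappa}$), the endpoint values are
\[
\lim_{x \to 0^+} \beta_0(x,\kappa) = \frac{1 - e^{-2\kappa}}{2\kappa} = \frac{e^{-\kappa}}{c(\kappa)}, \qquad
\beta_0(1,\kappa) = \frac{e^{2\kappa}-1}{2\kappa} = \frac{e^{\kappa}}{c(\kappa)},
\]
which are the reciprocals of the maximum and the minimum of the density, respectively. The second value agrees with the left endpoint of the Betti--$2$ interval in~\eqref{betti-k}.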
1719:
1720: We have the following:
1721: \begin{thm} For the von Mises-Fisher distribution on $S^2$, and fixed
1722: $\kappa > 0$,
1723: \[
1724: E \left|\left| \beta_0(x,{\hat \kappa}) -
1725: \beta_0(x,\kappa) \right|\right|_{\infty} \leq C(\kappa) n^{-1} \ \ ,
1726: \]
1727: as $n \rightarrow \infty$.
1728: \end{thm}
1729:
1730: \begin{proof}
1731: By the mean value theorem,
1732: \begin{equation} \label{eqn:mvt} \beta_0(x,{\hat \kappa})-
1733: \beta_0(x,\kappa) = \frac{\partial}{\partial
1734: \kappa} \beta_0(x,\tilde{\kappa}) ({\hat \kappa} - \kappa)
1735: \ ,
1736: \end{equation}
1737: where $\tilde{\kappa}$ is between $\hat{\kappa}$ and $\kappa$.
1738: One can calculate that
1739: \[
1740: \frac {\partial}{\partial {\kappa}}\beta_0(x,\kappa) =
1741: \frac{-(1-x)e^{4\kappa} + (1+2\kappa
1742: -2x)e^{2\kappa} + x}
1743: {2\kappa^2\left[(1-x)e^{2\kappa}+x\right]^2} \ \ .
1744: \]
1745: Recall that the domain of $\beta_0(x,\kappa)$ is $(0,1]$. For $x
1746: \in (0,1]$, $\left| \frac{\partial}{\partial \kappa}
1747: \beta_0(x,\kappa) \right|$ is bounded: for instance,
1748: \begin{equation} \label{eqn:ddkappaBound} \left| \frac
1749: {\partial}{\partial {\kappa}}\beta_0(x,\kappa)\right| \leq
1750: \frac{e^{4\kappa} + (1+2\kappa)e^{2\kappa}+1}{2\kappa^2} \ \ .
1751: \end{equation}
1752: Combining \eqref{eqn:mvt}, \eqref{eqn:ddkappaBound}, \eqref{asymp-mse-kappa} and
1753: \eqref{A_3} produces the desired result.
1754: \end{proof}
1755:
1756:
1757: \subsection{The Watson distribution}
1758:
1759: Recall that the Watson distribution on $S^{p-1}$ is given by
1760: \begin{equation} \label{eqn:watson}
1761: f_{\mu,\kappa}(x) = d(\kappa) \exp \{ \kappa (x^t \mu)^2 \} \text{, where }
1762: \mu \in S^{p-1} \text{ and } \kappa > 0.
1763: \end{equation}
1764: Let us parametrize $\mu$ using the spherical angles: $\mu = \mu(\phi)$, where $\phi = (\phi_1, \ldots, \phi_{p-1})^t$.
Let $X_1, \ldots, X_n$ be a random sample from the Watson distribution.
1766:
1767: If we take the sample to be fixed and the underlying parameters to be unknown, then the log-likelihood function of \eqref{eqn:watson} is given by:
1768: \begin{equation*}
1769: \ell(\phi,\kappa) = n \log d(\kappa) + \kappa \sum_{j=1}^n (X_j^t \mu(\phi))^2.
1770: \end{equation*}
1771: The maximum likelihood estimation of $\mu$ and $\kappa$ comes from the estimating equation:
1772: \begin{equation} \label{eqn:gradient}
1773: \nabla_{\phi,\kappa} \ell(\phi,\kappa) = 0,
1774: \end{equation}
1775: where $\nabla_{\phi,\kappa}$ denotes the gradient.
1776: Let $\hat{\phi}$ and $\hat{\kappa}$ be the solutions to \eqref{eqn:gradient}, which are the maximum likelihood estimators.
Then the standard theory of maximum likelihood estimators~\cite[pp.~294--296]{coxHinkley:book} shows that the large sample asymptotics satisfy:
1778: \begin{equation}
1779: \label{eqn:asymptotics}
1780: \sqrt{n}\left[ \left( \begin{array}{c} \hat{\phi} \\ \hat{\kappa} \end{array} \right) - \left( \begin{array}{c} {\phi} \\ {\kappa} \end{array} \right) \right] \to_d N_p(0,I(\phi,\kappa)^{-1})
1781: \end{equation}
1782: as $n \to \infty$, where ``$\to_d$'' means convergence in
1783: distribution, $I(\phi,\kappa)$ is the Fisher information
1784: matrix\footnote{The Fisher information matrix is defined to be
1785: $I(\phi,\kappa) = -E\nabla^2_{\phi,\kappa} \ell(\phi,\kappa)$, where
1786: $\nabla^2_{\phi,\kappa}$ is the $p \times p$ Hessian matrix.} and
1787: $N_p$ stands for the $p$-dimensional normal distribution with given
1788: mean and covariance.
1789: It turns out that in the case of the Watson distribution,
1790: \begin{equation*}
1791: I(\phi,\kappa) = \left[ \begin{array}{ccc} * & \vline & 0 \\ \hline 0 & \vline & -\frac{\partial^2}{\partial \kappa^2} \log d(\kappa) \end{array} \right].
1792: \end{equation*}
1793: Consequently, from~\eqref{eqn:asymptotics}, we have that %marginally
1794: \begin{equation*}
1795: \sqrt{n} (\hat{\kappa} -\kappa) \to_d N_1 \left( 0, - \left( \frac{\partial^2}{\partial \kappa^2} \log d(\kappa) \right)^{-1} \right),
1796: \end{equation*}
1797: as $n \to \infty$.
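
To make the last two steps more explicit (a sketch using only the standard exponential-family structure of~\eqref{eqn:watson} in $\kappa$ for fixed $\mu$), write $T(x) = (x^t\mu)^2$. Since $d(\kappa)^{-1} = \int_{S^{p-1}} e^{\kappa T(x)}\,dx$, where the integral is taken with respect to the measure on $S^{p-1}$ against which $f_{\mu,\kappa}$ is a density, differentiating under the integral sign gives
\[
-\frac{\partial}{\partial \kappa} \log d(\kappa) = E\, T(X), \qquad
-\frac{\partial^2}{\partial \kappa^2} \log d(\kappa) = \operatorname{Var} T(X),
\]
with expectation and variance taken under $f_{\mu,\kappa}$. Hence the $\kappa$-component of~\eqref{eqn:gradient} reads
\[
\frac{1}{n} \sum_{j=1}^n \left(X_j^t \mu(\hat{\phi})\right)^2 = -\frac{\partial}{\partial \kappa} \log d(\kappa) \Big|_{\kappa = \hat{\kappa}},
\]
and the asymptotic variance of $\sqrt{n}(\hat{\kappa} - \kappa)$ is $1/\operatorname{Var}\left((X^t\mu)^2\right)$.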
1798:
1799:
1800:
1801: \bibliographystyle{halpha}
1802: \bibliography{my}
1803:
1804:
1805: \end{document}
1806:
1807: