0607:math0607634/ds.tex

1: % LaTeX-2e document, 30pp, 2 figures

2:

3: %\documentclass[12pt]{article}

4: \documentclass[12pt]{amsart}

5:

6: \title{A Statistical Approach to Persistent Homology}

7: % \author{Peter

8: %   Bubenik\thanks{Cleveland State University, Department of

9: %     Mathematics, 2121 Euclid Ave. RT 1515, Cleveland OH 44115-2214,

10: %     USA, Email: p.bubenik@csuohio.edu. This research was partially

11: %     funded by the Swiss National Science Foundation grant

12: %     200020-105383.} \ and Peter T. Kim\thanks{Department of

13: %     Mathematics and Statistics, University of Guelph, Guelph, Ontario

14: %     N1G 2W1 Canada, Email: pkim@uoguelph.ca.  This research was

15: %     partially funded by NSERC grant OGP46204.}}

16: \author{Peter Bubenik}

17: \address{Cleveland State University, Department of Mathematics, 2121 Euclid Ave. RT 1515, Cleveland OH 44115-2214, USA}

18: \email{p.bubenik@csuohio.edu}

19: \thanks{This research was partially funded by the

20:     Swiss National Science Foundation grant 200020-105383.}

21: \author{Peter T. Kim}

22: \address{Department of Mathematics and Statistics, University of Guelph,

23: Guelph, Ontario N1G 2W1 Canada}

24: \email{pkim@uoguelph.ca}

25: \thanks{This research was partially funded by NSERC grant OGP46204.}

26: \date{\today}

27:

28: \usepackage{amsmath}

29: \usepackage{amsthm}

30: \usepackage{amssymb}

31: \usepackage{ifpdf}

32:

33: \ifpdf

34: \usepackage[pdftex]{graphicx}

35: \DeclareGraphicsExtensions{.pdf}

36: \usepackage{hyperref}

37: \else

38: \usepackage[dvips]{graphicx}

39: \DeclareGraphicsExtensions{.eps}

40: \usepackage[dvipdfm]{hyperref}

41: \fi

42:

43: % use pdfsync package to switch between pdf output and emacs input:

44: %\usepackage{pdfsync}

45:

46: \newtheorem{thm}{Theorem}[section]

47: \newtheorem{lemma}[thm]{Lemma}

48: \newtheorem{prop}[thm]{Proposition}

49: \newtheorem{claim}[thm]{Claim}

50: \newtheorem{cor}[thm]{Corollary}

51: \newtheorem{conj}[thm]{Conjecture}

52:

53: \theoremstyle{definition}

54: \newtheorem{defn}[thm]{Definition}

55: \newtheorem{eg}[thm]{Example}

56:

57: \theoremstyle{remark}

58: \newtheorem{rem}[thm]{Remark}

59: \newtheorem{notn}[thm]{Notation}

60: \newtheorem{goal}[thm]{Goal}

61: \newtheorem{question}[thm]{Question}

62:

63: \renewcommand{\theequation}{\thesection.\arabic{equation}}

64:

65: \numberwithin{equation}{section}

66:

67: \newcommand{\beq}{\begin{equation}}

68: \newcommand{\eeq}{\end{equation}}

69: \newcommand {\abs}[1] {\lvert#1\rvert}

70: \newcommand {\M} {\ensuremath {\mathcal{M}} }

71: \newcommand {\N} {\ensuremath {\mathbb{N}} }

72: \newcommand {\Z} {\ensuremath {\mathbb{Z}} }

73: \newcommand {\Q} {\ensuremath {\mathbb{Q}} }

74: \newcommand {\R} {\ensuremath {\mathbb{R}} }

75: \newcommand {\eR} {\ensuremath {\overline{\mathbb{R}}} }

76: \newcommand {\RP} {\ensuremath {\mathbb{RP}} }

77: \newcommand {\F} {\ensuremath {\mathbb{F}} }

78: \newcommand {\Rn} {\ensuremath {\mathbb{R}^n} }

79: \newcommand {\Rinfty} {\ensuremath {\mathbb{R}^{\infty}} }

80: \newcommand {\isom} {\ensuremath {\cong} }

81: \newcommand {\tensor} {\ensuremath {\otimes} }

82: \newcommand {\incl} {\ensuremath {\hookrightarrow} }

83: \newcommand {\injects} {\ensuremath {\hookrightarrow} }

84: \newcommand {\onto} {\ensuremath {\twoheadrightarrow} }

85: \newcommand {\isomto} {\ensuremath {\xrightarrow{\isom}} }

86: \newcommand {\xto}[1] {\ensuremath {\xrightarrow{#1}} }

87: \newcommand {\Iff} { if and only if }

88: \newcommand {\opensubset} {\stackrel{\subset}{{\scriptscriptstyle

89:       \open}}}

90: \newcommand {\BR} {\mathcal{B}_{\R}}

91: \newcommand {\BRn} {\mathcal{B}_{\Rn}}

92: \newcommand {\maxf} {\max(f_{\kappa})}

93: \newcommand {\minf} {\min(f_{\kappa})}

94: \newcommand {\cF} {{\mathcal F}}

95: \newcommand {\cR} {{\mathcal R}}

96: \newcommand {\cC} {{\mathcal {C}}}

97: \newcommand {\E} {{\mathcal E}}

98: \newcommand {\CX} {C_*(X)}

99: \newcommand {\Cd} {(C,d)}

100: \newcommand {\homoteq} {\approx}

101:

102:

103: \DeclareMathOperator{\Id}{Id}

104: \DeclareMathOperator{\im}{im}

105: \DeclareMathOperator{\Const}{Const}

106: \DeclareMathOperator{\tr}{tr}

107: \DeclareMathOperator{\diag}{diag}

108:

109: \begin{document}

110:

111: \maketitle

112:

113: % to add whitespace between all lines:

114: %\baselineskip = 20pt plus 3pt minus 3pt

115:

116: \begin{abstract}

117:   Assume that a finite set of points is randomly sampled from a

118:   subspace of a metric space.  Recent advances in computational

119:   topology have provided several approaches to recovering the

120:   geometric and topological properties of the underlying space.  In

121:   this paper we take a statistical approach to this problem. We assume

122:   that the data is randomly sampled from an unknown probability

123:   distribution.  We define two filtered complexes with which we can

124:   calculate the persistent homology of a probability distribution.

125:   Using statistical estimators for samples from certain families of

126:   distributions, we show that we can recover the persistent homology

127:   of the underlying distribution.

128: \end{abstract}

129:

130:

131: \section{Introduction}

132:

133: There is growing interest in characterizing topological features of

134: data sets.  Given a finite set, sometimes called \emph{point cloud

135:   data (PCD)}, that is randomly sampled from a subspace $X$ of some metric

136: space, one hopes to recover geometric and topological properties of

137: $X$.  Using random samples, P. Niyogi, S. Smale and S. Weinberger

138: \cite{niyogiSmaleWeinberger} show how to recover the homology of

139: certain submanifolds. In \cite{chazalCohen-SteinerLieutier} the homotopy-type of certain compact

140: subsets is recovered.

141:

142: A finer descriptor, developed by H. Edelsbrunner, D. Letscher, A.

143: Zomorodian and G. Carlsson, is that of \emph{persistent homology}

144: \cite{edelsbrunnerLetscherZomorodian, zomorodianCarlsson:computingPH}.  While it

145: is not a homotopy invariant, it is stable under small

146: changes~\cite{cohen-steinerEdelsbrunnerHarer}.  Using the PCD and the

147: metric, one can construct a filtered simplicial complex which

148: approximates the unknown space

149: $X$~\cite{deSilvaCarlsson,czcg:persistenceBarcodesForShapes}.

150: This leads naturally to a spectral sequence. What is unusual is that

151: the homology of the start of the spectral sequence is uninteresting,

152: and so is what it converges to. Nevertheless, the intermediate

153: homology, called \emph{persistent homology}, is of interest. It can be

154: described using \emph{barcodes}, which are analogues of the Betti

155: numbers.

156:

157: The aim of this paper is to take a statistical approach to these

158: ideas. We assume that the data is sampled from a manifold with respect

159: to a probability distribution. Given such a distribution, we construct

160: two filtered chain complexes: the \emph{Morse complex}, and the

161: \emph{\v{C}ech complex}. For most of the distributions we consider,

162: these complexes are related by Alexander duality. Using persistent

163: homology, one can calculate the corresponding Betti barcodes, which

164: provide a topological description of the distribution. In the case of

165: the \v{C}ech complex we define a Betti--$0$ function. We apply these

166: methods to several parametric families of distributions: the von

167: Mises, von Mises-Fisher, Watson and Bingham distributions on $S^{p-1}$

168: and the matrix von Mises distribution on $SO(3)$.

169:

170: Given a sample, it is assumed that the underlying distribution is

171: unknown, but that it is one of a parametrized family. We use

172: statistical techniques to estimate the parameter. These estimates are then

173: used to estimate the barcodes. As a result, we prove that we can

174: recover the persistent homology of the underlying distribution.

175:

176: \begin{thm}

177:   Let $x_1, \ldots, x_n$ be a sample from $S^{p-1}$ according to the

178:   von Mises--Fisher distribution with fixed concentration parameter

179:   $\kappa \geq 0$. Given the sample, let $\hat{\kappa}$ be the maximum

180:   likelihood estimator for $\kappa$ (which is given by formula

181:   \eqref{est-kappa}). Let $\beta_{\kappa}$ and $\beta_{\hat{\kappa}}$

182:   denote the Betti barcodes for the persistent homology of the

183:   densities associated with $\kappa$ and $\hat{\kappa}$ using either

184:   the Morse or the \v{C}ech filtration. Finally let $E(\cdot)$

185:   denote the expectation, and $\mathcal{D}$ denote the barcode

186:   metric (see Definition~\ref{def:barcodeMetric}). Then,

187:   \begin{equation*}

188:     E (\mathcal{D}(\beta_{\hat{\kappa}},\beta_{\kappa})) \leq C(\kappa) n^{-1/2},

189:   \end{equation*}

190:   as $n \to \infty$, for some constant $C(\kappa)$.

191: \end{thm}

192:

193:

194: We also show that the classical theory of spacings \cite{pyke:spacings}

195: can be used to calculate the exact expectations of the Betti barcodes

196: for samples from the uniform distribution on $S^1$ together with their

197: asymptotic behavior.

198:

199: As part of our results, we show that the Morse filtrations of our

200: distributions each correspond to a relative CW-structure for the

201: underlying spaces. The von Mises and von Mises-Fisher distributions

202: correspond to the decomposition $S^{p-1} \approx * \cup_* D^{p-1}$,

203: the Watson distribution corresponds to $S^{p-1} \approx S^{p-2}

204: \cup_{\Id \amalg -\Id} (D^{p-1} \amalg D^{p-1})$, and the Bingham

205: distribution corresponds to $S^{p-1} \approx * \cup_{\Id \amalg -\Id}

206: (D^1 \amalg D^1) \cup_{\Id \amalg -\Id} (D^2 \amalg D^2) \cup \ldots

207: \cup_{\Id \amalg -\Id} (D^{p-1} \amalg D^{p-1})$. Finally, the Morse

208: filtration on the matrix von Mises distribution on $SO(3)$ corresponds

209: to the decomposition $\RP^2 \cup_f D^3$ where $f:S^2 \to \RP^2$

210: identifies antipodal points. Interestingly, the last decomposition is

211: obtained by using the Hopf fibration $S^0 \to S^3 \to \RP^3$.

212:

213:

214: The paper is organized as follows.  In Section \ref{notation}, we

215: go over the background and notation used in this paper.  We review

216: both the statistical and the topological terminologies.  In

217: Section \ref{sectionPriorWork} we discuss filtrations and persistent

218: homology and

219: %in Section \ref{sectionPersistentHofD}

220: we develop two filtrations for densities.  In Section

221: \ref{sectionBettiBofS} we use the theory of spacings to give exact

222: estimates of the persistent homology of uniform samples on $S^1$.  In

223: Section \ref{sectionBarcodesOfDensities} we calculate the persistent

224: homology of some standard parametric families of densities on

225: $S^{p-1}$ and $SO(3)$.  In Section \ref{statestimation} we use maximum

226: likelihood estimators to recover the persistent homology of the underlying

227: density.

228:

229: \section{Background and notation}

230: \label{notation}

231:

232: In an attempt to make this article accessible to a broad audience, we

233: define some of the basic statistical and topological terms we will be

234: using.

235:

236:

237: \subsection{Statistics}

238:

239: Given a manifold $\M$ with Radon measure $\nu$, a \emph{density} is a

240: function $f: \M \to [0,\infty]$ such that $f d\nu$ is a

241: \emph{probability distribution} on $\M$ with $\int_\M f d\nu = 1$.

242: A common statistical example is to take $\M = \R^p$, and $d\nu$ to

243: be the $p$-dimensional Lebesgue measure.  A density in this case would

244: be a nonnegative function that integrates to unity.  We can also take

245: $\M = S^{p-1}$, the $(p-1)$-dimensional unit sphere, with $d\nu$ being

246: the $(p-1)$-dimensional spherical measure.  In this case a density is

247: referred to as a \emph{directional density}.  For $\M$ a compact

248: connected orientable Riemannian manifold, $d\nu$ would be the measure

249: induced by the Riemannian structure.

250:

251: In statistics, we think of a family of probability densities parametrized

252: accordingly

253: \begin{equation} \label{density_par}

254: \left\{ f_{\vartheta} : \vartheta \in \Theta\right\}   \ \ ,

255: \end{equation}

256: where $\vartheta$ is called a \emph{parameter} and $\Theta$ is called the

257: \emph{parameter space}.  The parameter space $\Theta$ can be quite general

258: and if it is some subset of a finite-dimensional vector space, then (\ref{density_par})

259: is referred to as a \emph{parametric} family of densities, otherwise it is

260: known as a \emph{nonparametric} family of densities.  Subsequent to this,

261: the corresponding statistical

262: problem will be referred to as either a parametric statistical procedure, or, a

263: nonparametric statistical procedure, depending on whether we are dealing with

264: a parametric, or nonparametric family of densities, respectively.

265:

266: Some parametric examples are in order.  Let $\M = \R^p$ and consider

267: the normal family of location scale probability densities,

268: \begin{equation} \label{normal} f_{\mu, \sigma}(x) = (2 \pi \sigma^2)^{-p/2} \exp

269: \left\{ -\tfrac {\|x-\mu\|^2}{2\sigma^2} \right\} \ \ , \end{equation} where

270: $\mu, x \in \R^p$ and $\sigma^2 \in (0,\infty)$.  Letting $\vartheta =

271: (\mu , \sigma^2)$, we note that this parametric problem has $\Theta =

272: \R^p \times (0,\infty )$ as its parameter space.

273:

274: If we take $\M=S^{p-1}$, a well known example of a directional

275: density, and one that will be used in this paper, is given by

276: \begin{equation} \label{vmf} f_{\mu,\kappa}(x) = c(\kappa) \exp\left\{\kappa x^t

277:   \mu\right\}, \end{equation} where $\mu , x \in S^{p-1}$, $\kappa \in

278: [0,\infty)$, $c(\kappa)$ is the normalizing constant and superscript

279: ``$t$'' denotes transpose.  The distribution arising from

280: $f_{\mu,\kappa}$ is called the \emph{von Mises-Fisher distribution}

281: where this parametric problem has $\Theta = S^{p-1} \times [0,\infty

282: )$ as its parameter space.
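
\begin{eg}
  As a concrete illustration of \eqref{vmf}, take $p=2$, so that
  $\M = S^1$ with $d\nu$ the arc-length measure.  Writing
  $x = (\cos\theta, \sin\theta)^t$ and
  $\mu = (\cos\theta_0, \sin\theta_0)^t$ (the angles $\theta$ and
  $\theta_0$ are notation introduced only for this example), we have
  $x^t\mu = \cos(\theta-\theta_0)$ and
  \begin{equation*}
    f_{\mu,\kappa}(x) = \frac{1}{2\pi I_0(\kappa)}
    \exp\left\{\kappa \cos(\theta-\theta_0)\right\},
  \end{equation*}
  where $I_0$ denotes the modified Bessel function of the first kind
  of order zero, so that $c(\kappa) = (2\pi I_0(\kappa))^{-1}$.  For
  $\kappa = 0$ this reduces to the uniform density $1/(2\pi)$, while
  for $\kappa > 0$ the density attains its maximum
  $c(\kappa)e^{\kappa}$ at $x=\mu$ and its minimum
  $c(\kappa)e^{-\kappa}$ at $x=-\mu$.
\end{eg}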

283:

284: Somewhat related to the above is the situation where $\M = SO(p)$, the

285: space of $p \times p$ rotation matrices.  Let \begin{equation} \label{mvmf}

286: f_{\mu,\kappa}(x) = c(\kappa) \exp\left\{\kappa {\rm tr}\, x^t

287:   \mu\right\}, \end{equation} where $\mu , x \in SO(p)$, $\kappa \in [0,\infty)$

288: and $c(\kappa)$ is the normalizing constant.  The distribution arising

289: from $f_{\mu,\kappa}$ is called the \emph{matrix von Mises-Fisher

290:   distribution} where this parametric problem has $\Theta = SO(p)

291: \times [0,\infty )$ as its parameter space.

292:

293: A \emph{sample} $X_1, X_2, \ldots, X_N$ is a sequence of independent

294: and identically distributed random quantities on $\M$

295: drawn according to the density $f_{\vartheta}$ for some fixed but unknown

296: $\vartheta \in \Theta$.  The parameter of interest would be the fixed but unknown

297: parameter $\vartheta$, or, more generally, some transformation $\tau(\vartheta)$

298: thereof.  Statistically, we want to find an estimator

299: ${\tilde \tau} = {\tilde \tau}(X_1, \ldots , X_N)$ of

300: $\tau(\vartheta)$. Given some metric $\gamma$ on $\tau(\Theta)$, the

301: performance of the estimator is evaluated relative to this metric in

302: expectation with respect to the joint probability density of the sample,

303: \begin{equation} \label{expectation}

304: E_{\vartheta}\gamma\left({\tilde \tau}, \tau \right)

305: = \int_{\M}\cdots \int_{\M}\gamma\left({\tilde \tau}, \tau\right)f_{\vartheta}

306: \cdots f_{\vartheta} d\nu \cdots d\nu \ \ ,

307: \end{equation}

308: where the above represents an $N$-fold integration and

309: $\vartheta \in \Theta$.

310: Thus the relative merit of one estimator over another estimator can be evaluated

311: using (\ref{expectation}) in a statistical decision theory context, see~\cite{berger:statisticalDecisionTheory}.
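
\begin{eg}
  To illustrate \eqref{expectation}, consider the normal family
  \eqref{normal} with $\tau(\vartheta) = \mu$.  Assume, for the
  purposes of this example only, that $\gamma$ is the Euclidean
  distance on $\R^p$ and that the estimator is the sample mean
  ${\tilde \tau} = \bar{X} = N^{-1}\sum_{i=1}^N X_i$.  The coordinates
  of $\bar{X}-\mu$ are independent with mean zero and variance
  $\sigma^2/N$, so Jensen's inequality gives
  \begin{equation*}
    E_{\vartheta}\gamma\left(\bar{X}, \mu\right)
    = E_{\vartheta}\|\bar{X}-\mu\|
    \leq \left( E_{\vartheta}\|\bar{X}-\mu\|^2 \right)^{1/2}
    = \sigma \sqrt{p/N} .
  \end{equation*}
  Thus the expected error of this estimator decays at the rate
  $N^{-1/2}$.
\end{eg}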

312:

313: There are a wide variety of different distributions for a given

314: manifold, as well as sample spaces that are different manifolds.

315: References that discuss these topics can be found in the books by

316: Mardia and Jupp~\cite{mardiaJupp:book} and

317: Chikuse~\cite{chikuse:book}.  Furthermore, although nonparametric

318: statistical procedures on compact Riemannian manifolds are available~\cite{hendriks, efromovich, angersKim, kimKoo},

319: %Hendriks (1990), Efromovich (2000) and Angers and Kim (2005),

320: in this paper we will deal with parametric statistical procedures.

321:

322: \subsection{Topology} \label{sectionBackgroundTopology}

323:

324:

325: Let $R$ be a commutative ring with identity. (In fact, we will only be

326: interested in cases where $R$ is a field, in which case $R$-modules are

327: vector spaces and $R$-module morphisms are linear maps of vector

328: spaces.)

329: \begin{defn}

330: A \emph{chain

331:   complex} over $R$ is a sequence of $R$-modules $\{C_i\}_{i \in \Z}$

332: together with $R$-module morphisms $d_i: C_i \to C_{i-1}$ called

333: \emph{differentials} such that $d_i \circ d_{i+1} = 0$. This condition

334: is often abbreviated to $d^2=0$.  The elements of $C_n$ are called

335: \emph{$n$-chains}.  This chain complex is denoted by $(C,d)$.

336: \end{defn}

337:

338: \begin{defn}

339:   An (abstract) \emph{simplicial complex} $K$ is a set of finite,

340:   ordered subsets of an ordered set $\bar{K}$, such that

341: \begin{itemize}

342: \item the ordering of the subsets is compatible with the ordering of

343:   $\bar{K}$, and

344: \item if $\alpha \in K$ then any nonempty subset of $\alpha$ is also

345:   an element of $K$.

346: \end{itemize}

347: The elements of $K$ with $n+1$ elements are called $n$-simplices and

348: denoted $K_n$.

349: \end{defn}

350:

351: \begin{defn} \label{defn:chainComplexOnK} Given a simplicial complex

352:   $K$, the \emph{chain complex} on $K$, denoted $(C_*(K),d)$ is

353:   defined as follows. Let $C_n(K)$ be the free $R$-module with basis $K_n$.

354:   We define the differential on $K_n$ and extend it to $C_n(K)$ by

355:   linearity.  For $[v_0, \ldots, v_n] \in K_n$ define

356: \[ d[v_0, \ldots, v_n] = \sum_i (-1)^i [v_0, \ldots, \hat{v}_i, \ldots, v_n],

357: \]

358: where $\hat{v}_i$ denotes that the element $v_i$ is omitted from the sequence.

359: \end{defn}
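
\begin{eg}
  As a quick check that the formula in
  Definition~\ref{defn:chainComplexOnK} satisfies $d^2=0$, consider a
  single $2$-simplex $[v_0,v_1,v_2]$.  Then
  \begin{align*}
    d[v_0,v_1,v_2] &= [v_1,v_2] - [v_0,v_2] + [v_0,v_1], \\
    d(d[v_0,v_1,v_2]) &= ([v_2]-[v_1]) - ([v_2]-[v_0]) + ([v_1]-[v_0]) = 0.
  \end{align*}
\end{eg}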

360:

361: For $n\geq 0$, the \emph{standard $n$-simplex} is the $n$-dimensional

362: polytope in $\R^{n+1}$, denoted $\Delta^n$, whose vertices are given

363: by the standard basis vectors $e_0,\ldots ,e_n$. It is just

364: the convex hull of the standard basis vectors; that is \begin{equation}

365: \label{simplex}

366: \Delta^n = \left\{x = \sum_{i=0}^n a_i e_i \ \left| \ \forall i \ a_i

367:     \geq 0 \text{ and } \sum_{i=0}^n a_i = 1 \right. \right\}.

368: \end{equation}

369: The inclusion maps

370: \begin{equation}

371: \label{inclusion}

372: \delta_i: \Delta^n \to \Delta^{n+1}

373: \end{equation}

374: (called the $i$-th face inclusions) are given

375: by $\delta_i(x_0,\ldots, x_n) = (x_0,\ldots, x_{i-1}, 0, x_{i}, \ldots,

376: x_n)$ for $0 \leq i \leq n+1$.

377:

378: \begin{defn} \label{defn:singularChainComplex}

379: Let $X$ be a topological space.

380: For $n\geq 0$, let $C_n(X)$ be the free $R$-module generated by the

381: set of continuous maps $\{\phi: \Delta^n \to X\}$.

382: For $n<0$, let $C_n(X) = 0$.

383: For $\phi: \Delta^n \to X$ let

384: \begin{equation} \label{boundarymaps}

385: d(\phi) = \sum_{i=0}^n (-1)^i \ \phi \circ \delta_i \ \in C_{n-1}(X).

386: \end{equation}

387: Extend this by linearity to an $R$-module morphism

388: $d: C_n(X) \to C_{n-1}(X)$.

389: One can check that $d^2=0$ so this defines a differential and $C_*(X) =

390: (\{C_n(X)\}_{n \in \Z}, d)$ is a chain complex,

391: called the \emph{singular chain complex}.

392: \end{defn}

393:

394: \begin{defn} \label{defn:homology} Given a chain complex $(C,d)$, let

395:   $Z_k$ be the submodule given by $\{x \in C_k \ | \ dx = 0\}$ called

396:   the \emph{$k$-cycles}, and let $B_k$ be the submodule given by $\{ x

397:   \in C_k \ | \ \exists y \in C_{k+1} \text{ such that } dy = x\}$,

398:   called the \emph{$k$-boundaries}.  Since $d^2=0$, $d(dy)=0$ and thus

399:   $B_k \subset Z_k$.  The \emph{$k$-th homology} of $(C,d)$, denoted

400:   $H_k(C,d)$ is given by the $R$-module $Z_k/ B_k$.  The homologies

401:   $\{H_k(C,d)\}_{k \in \Z}$ form a chain complex with differential $0$

402:   denoted $H_*(C,d)$ and called the homology of $(C,d)$.  If $R$ is a

403:   principal ideal domain (for example, if $R$ is a field) and

404:   $H_k(C,d)$ is finitely generated, then $H_k(C,d)$ is the direct sum

405:   of a free $R$-module and a finite number of cyclic torsion $R$-modules.  The

406:   \emph{$k$-th Betti number} $\beta_k(C,d)$ is the rank of the free

407:   module.  If $R$ is a field, then $\beta_k(C,d)$ equals the dimension

408:   of the vector space $H_k(C,d)$.  If $X$ is a topological space then

409:   $H_*(X)$ denotes the homology of the singular chain complex on $X$.

410: \end{defn}
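
\begin{eg}
  For illustration, let $K$ be the simplicial complex with vertices
  $v_0 < v_1 < v_2$ and edges $[v_0,v_1]$, $[v_0,v_2]$, $[v_1,v_2]$,
  but no $2$-simplex (a hollow triangle).  A $1$-chain
  $a[v_0,v_1] + b[v_1,v_2] + c[v_0,v_2]$ has boundary
  \begin{equation*}
    -(a+c)[v_0] + (a-b)[v_1] + (b+c)[v_2],
  \end{equation*}
  which vanishes exactly when $a = b = -c$.  Hence $Z_1$ is generated
  by $[v_0,v_1] + [v_1,v_2] - [v_0,v_2]$, and since $C_2(K) = 0$ we
  have $B_1 = 0$ and $H_1(C_*(K),d) \isom R$.  Every $0$-chain is a
  cycle, and $B_0$ is generated by $[v_1]-[v_0]$ and $[v_2]-[v_1]$, so
  $H_0(C_*(K),d) \isom R$, generated by the class of $[v_0]$.  In
  particular, when $R$ is a field, $\beta_0 = \beta_1 = 1$, in
  agreement with the Betti numbers of a circle.
\end{eg}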

411:

412: \begin{defn}

413:   Two spaces $X$ and $Y$ are said to be homotopy equivalent (written

414:   $X \homoteq Y$) if there are maps $f:X \to Y$ and $g:Y \to X$ such

415:   that $g \circ f$ is homotopic to the identity map on $X$ and $f

416:   \circ g$ is homotopic to the identity map on $Y$.

417: \end{defn}

418:

419: \begin{rem} \label{rem:contractible} If $X \homoteq Y$ then $H_*(X)

420:   \isom H_*(Y)$. So if $X$ is a \emph{contractible space} (that is, a

421:   space which is homotopy equivalent to a point), then $H_0(X) \isom

422:   R$ and $H_k(X) = 0$ for $k \geq 1$.

423: \end{rem}

424:

425: \section{Filtrations and persistent homology} \label{sectionPriorWork}

426:

427: From now on, we will assume that the ground ring is a field $\F$.

428:

429: \subsection{Persistent homology} \label{sectionPersistentHomology}

430:

431: In Definition~\ref{defn:homology} we showed how to calculate the

432: homology of a chain complex.  Given some additional information on the

433: chain complex, we will calculate homology in a more sophisticated way.

434: Namely, we will show how to calculate the \emph{persistent homology}

435: of a \emph{filtered chain complex}.  This will detect homology classes

436: which persist through a range of values in the filtration.

437:

438: Let $\eR$ denote the totally ordered set of extended real numbers $\eR = \R \cup \{-\infty, \infty\}$. Then an increasing

439: \emph{$\eR$-filtration} on a chain complex $(C,d)$ is a sequence of

440: chain complexes $\{\cF_r(C,d)\}_{r \in \eR}$ such that $\cF_r(C,d)$ is

441: a chain subcomplex of $(C,d)$ and $\cF_r(C,d) \subset \cF_{r'}(C,d)$

442: whenever $r \leq r' \in \eR$.  A chain complex together with an

443: $\eR$-filtration is called an \emph{$\eR$-filtered chain complex}.

444:

445: For a filtered chain complex, the inclusions $\cF_j(C,d) \to

446: \cF_{j+l}(C,d)$ induce maps

447: \[

448: H_k(\cF_j(C,d)) \to H_k(\cF_{j+l}(C,d)).

449: \]

450: The image of this map is called the $l$-persistent $k$-th homology of

451: $\cF_j(C,d)$.

452:

453: Let $Z^i_k = Z_k(\cF_i(C,d))$ and let $B_k^i = B_k(\cF_i(C,d))$.

454: Assume $\alpha \in Z^i_k$. Then $\alpha$ represents a homology class

455: $[\alpha]$ in $H_*(\cF_i(C,d))$.  Furthermore since $Z^i_k \subset

456: Z^{i'}_k$ for all $i'\geq i$, $\alpha$ also represents a homology

457: class in $H_*(\cF_{i'}(C,d))$, which we again denote $[\alpha]$.  One

458: possibility is that $[\alpha]\neq 0$ in $H_k(\cF_i(C,d))$ but

459: $[\alpha]= 0$ in $H_k(\cF_{i'}(C,d))$ for some $i'>i$.

460:

461: Assume $\Cd$ is a chain complex with an $\eR$-filtration

462: ${\cF}_r(\Cd)$ such that

463: \begin{equation} \label{eqnFiltrnCndn}

464:   \bigcup_{r \in \eR} {\cF}_r\Cd

465:   = \Cd \text{ and } \bigcap_{r \in \eR} {\cF}_r\Cd = 0.

466: \end{equation}

467: Equivalently, $\cF_{\infty}\Cd = \Cd$ and $\cF_{-\infty}\Cd = 0$.

468:

469:

470: \begin{lemma} \label{lemmar} Let $\Cd$ be a filtered chain complex

471:   satisfying \eqref{eqnFiltrnCndn}.  For any $n$-chain $\alpha \in

472:   \Cd$, there is some smallest $r \in \eR$ such that $\alpha \notin

473:   {\cF}_{r'}\Cd$ for all $r' < r$ and $\alpha \in {\cF}_{r''}\Cd$ for

474:   all $r'' > r$.

475: \end{lemma}

476:

477: \begin{proof}

478: This follows from the definition of an $\eR$-filtration, the

479: assumption \eqref{eqnFiltrnCndn}, and the linear ordering of $\eR$.

480: \end{proof}

481:

482: \begin{lemma} \label{lemmaHomologyInterval}

483:   For any $n$-cycle $\alpha \in Z_n$, the set of all $r\in \eR$ such

484:   that $0 \neq [\alpha] \in H_n({\cF}_r\Cd)$ is either empty or is

485:   an interval.

486: \end{lemma}

487:

488: \begin{proof}

489:   Let $\alpha \in Z_n$, and let $r_1$ be the corresponding value given by Lemma~\ref{lemmar}.

490:

491:   If there is some $\beta \in C_{n+1}$ such that $d\beta = \alpha$, then let $r_2$ be the value given by Lemma~\ref{lemmar} for $\beta$.  Since $\beta \in {\cF}_j\Cd$ implies that $d\beta \in {\cF}_j\Cd$, it follows that $r_2 \geq r_1$.  Thus $\alpha$ represents a nonzero homology class in ${\cF}_r\Cd$ exactly when $r$ is in the (possibly empty) interval beginning at $r_1$ and ending at $r_2$.  This interval contains $r_1$ if and only if $\alpha \in {\cF}_{r_1}\Cd$, and it does not contain $r_2$ if and only if $\beta \in {\cF}_{r_2}\Cd$.

492:

493:   If $\alpha$ is not an $n$-boundary, then $\alpha$ represents a nonzero homology class in ${\cF}_r\Cd$ exactly when $r$ is in the interval $\{x \ | \ x \geq r_1\}$ or $\{x \ | \ x > r_1\}$ beginning at $r_1$.  Again this interval contains $r_1$ if and only if $\alpha \in {\cF}_{r_1}\Cd$.

494: \end{proof}

495:

496: \begin{defn}

497:   For $\alpha \in Z_k$ define the \emph{persistence $k$-homology

498:     interval} represented by $\alpha$ to be the interval given by

499:   Lemma~\ref{lemmaHomologyInterval}.  Denote it by $I_{\alpha}$.

500: \end{defn}

501:

502: \begin{defn} \label{defn:barcode} Define a \emph{Betti--$k$ barcode}

503:   to be a set of intervals\footnote{In

504:     Section~\ref{sectionPersistentHofD} we will see that using the

505:     \v{C}ech filtration, the Betti--$0$ barcode of manifolds will have

506:     uncountably many intervals, so we will define a more appropriate

507:     descriptor, the Betti--$0$ function. In

508:     Section~\ref{sectionBettiBofS} it will also be useful to convert

509:     finite Betti barcodes to functions so that we can analyze limiting

510:     and asymptotic behavior.}  $\{J_{\alpha}\}_{\alpha \in S \subset

511:     Z_k}$ such that

512: \begin{itemize}

513: \item $J_{\alpha}$ is a subinterval of $I_{\alpha}$, and

514: \item for all $r \in \eR$, $\{[\alpha] \ | \ \alpha \in S, \ r \in

515:   J_{\alpha}\}$ is an $\F$-basis for $H_k({\cF}_r\Cd)$.

516: \end{itemize}

517: We will sometimes use $\beta_k$ to denote a Betti--$k$ barcode.

518: \end{defn}
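
\begin{eg}
  As a small illustration of Definition~\ref{defn:barcode}, consider
  the following hypothetical filtered complex.  Let $K$ consist of two
  vertices $v_0 < v_1$ and the edge $[v_0,v_1]$, and filter $C_*(K)$
  so that $[v_0]$ enters at $r=0$, $[v_1]$ enters at $r=1$, and
  $[v_0,v_1]$ enters at $r=2$, with $\cF_r(C_*(K)) = 0$ for $r<0$.
  Let $\alpha_0 = [v_0]$ and $\alpha_1 = [v_1]-[v_0]$.  The $0$-cycle
  $\alpha_0$ is never a boundary, so $I_{\alpha_0} = [0,\infty]$.  The
  $0$-cycle $\alpha_1$ first appears at $r=1$ and equals
  $d[v_0,v_1]$, which first appears at $r=2$, so
  $I_{\alpha_1} = [1,2)$.  For each $r$, the classes of those
  $\alpha_i$ with $r \in I_{\alpha_i}$ form a basis of the degree-zero
  homology at level $r$: there are none for $r<0$, one for
  $0 \leq r < 1$, two for $1 \leq r < 2$, and one for $r \geq 2$.
  Hence $\{[0,\infty], [1,2)\}$ is a Betti--$0$ barcode.  Since there
  are no nonzero $1$-cycles, the Betti--$1$ barcode is empty.
\end{eg}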

519:

520:

521: The set of barcodes has a

522: metric~\cite{czcg:persistenceBarcodesForShapes} defined as follows.

523:

524: \begin{defn} \label{def:barcodeMetric} Given an interval $J$, let

525:   $\ell(J)$ denote its length. Given two intervals $J$ and

526:   $J'$, the \emph{symmetric difference}, $\Delta(J,J')$, between them

527:   is the one-dimensional measure of $J \cup J' - J \cap J'$. Given two

528:   barcodes $\{J_{\alpha}\}_{\alpha \in S}$ and

529:   $\{J'_{\alpha'}\}_{\alpha' \in S'}$, a \emph{partial matching}, $M$,

530:   between the two sets is a subset of $S\times S'$ where each $\alpha$

531:   and $\alpha'$ appears at most once. Define

532:   \begin{equation*}

533:     \mathcal{D}(\{J_{\alpha}\}_{\alpha \in S},

534:     \{J'_{\alpha'}\}_{\alpha' \in S'}) = \min_M

535:     \left( \sum_{(\alpha,\alpha') \in M}

536:       \Delta(J_{\alpha},J'_{\alpha'}) + \sum_{\alpha \notin M_1}

537:       \ell(J_{\alpha}) + \sum_{\alpha' \notin M_2} \ell(J'_{\alpha'}) \right),

538:   \end{equation*}

539:   where the minimum is taken over all partial matchings, and $M_1$ and $M_2$ are

540:   the projections of $M$ to $S$ and $S'$, respectively.

541:   This defines a quasi-metric (since its value may be infinite). If

542:   desired, it can be converted into a metric.

543: \end{defn}
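
\begin{eg}
  To illustrate Definition~\ref{def:barcodeMetric}, consider the
  barcodes $\{[0,2)\}$ and $\{[0,3), [1,\tfrac{3}{2})\}$, chosen only
  for this example.  There are three partial matchings:
  \begin{itemize}
  \item matching $[0,2)$ with $[0,3)$ costs $\Delta = 1$ and leaves
    $[1,\tfrac{3}{2})$ unmatched at cost $\tfrac{1}{2}$, for a total
    of $\tfrac{3}{2}$;
  \item matching $[0,2)$ with $[1,\tfrac{3}{2})$ costs
    $\Delta = \tfrac{3}{2}$ and leaves $[0,3)$ unmatched at cost $3$,
    for a total of $\tfrac{9}{2}$;
  \item the empty matching costs $2 + 3 + \tfrac{1}{2} = \tfrac{11}{2}$.
  \end{itemize}
  Hence the distance between these two barcodes is $\tfrac{3}{2}$.
\end{eg}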

544:

545:

546:

547:

548: \subsection{Persistent homology from point cloud data}

549: \label{sectionPersistentHfPCD}

550: Let $(\M,\rho)$ be a manifold with a metric $\rho$.

551: Let $X = \{x_1, x_2, \ldots, x_n\} \subset \M$.

552: $X$ is called \emph{point cloud data}.

553: One would like to be able to obtain information on $\M$ from $X$.

554: If $X$ contains sufficiently many uniformly distributed points one may be

555: able to construct a complex from $X$ that in some sense reconstructs $\M$.

556:

557: One such construction is the following $\eR$-filtered simplicial

558: complex called the \v{C}ech complex.  Recall that we are working over

559: a ground field $\F$. Let $\cC_*(X)$ be the largest simplicial complex

560: on the ordered vertex set $X$.  That is $\cC_0(X) = X$ and for $k\geq

561: 1$, $\cC_k(X)$ consists of the ordered subsets of $X$ with $k+1$

562: elements.  Now filter this simplicial complex (along $\eR$) as

563: follows.  Given $r<0$, define $\cF^{\check{C}}_r(\cC_n(X))=0$ for all

564: $n$.  Let $B_r(x)$ denote the ball of radius $r$ centered at $x$.  For

565: $r \geq 0$ and $k\geq 1$, define $\cF^{\check{C}}_r(\cC_k(X))$ to be

566: the $\F$-vector space whose basis is the set of $k$-simplices $[x_{i_0},

567: \ldots, x_{i_k}]$ such that $\cap_{j=0}^k B_r(x_{i_j}) \neq \emptyset$. We

568: remark that there are fast algorithms for computing

569: $\cF^{\check{C}}_r(\cC_k(X))$.\footnote{The balls of radius $r$

570:   centered at the points $\{x_{i_j}\}$ have nonempty intersection if

571:   and only if there is a ball of radius $r$ containing the points

572:   $\{x_{i_j}\}$. There are fast algorithms for the smallest enclosing

573:   ball problem~\cite{fischerGaertnerKutz, gaertner:www}.}

574: $\cF^{\check{C}}_r(\cC_*(X))$ is called the $r$-\v{C}ech complex.  It

575: is the \emph{nerve} of the collection of balls $\{B_r(x_i)\}_{i=1}^n$,

576: and its geometric realization is homotopy equivalent to the union of

577: these balls.

578:

579: A related construction is the Rips complex. For each $r$, the $r$-Rips

580: complex, $\cF^R_r(\cC_*(X))$, is the largest simplicial complex

581: containing $\cF^{\check{C}}_r(\cC_1(X))$. That is, $\cF^R_r(\cC_*(X))$

582: is the $\F$-vector space whose basis is the set of $k$-simplices

583: $[x_{i_0}, \ldots, x_{i_k}]$ such that $\rho(x_{i_j}, x_{i_{\ell}})

584: \leq r$ for all pairs $0 \leq j, \ell \leq k$.

585:

586: Using either of these filtered simplicial complexes, one obtains a filtered

587: chain complex as follows.  Let $\Delta_*(\cC_*(X))$ be the chain

588: complex on $\cC_*(X)$.  Filter this over $\eR$ by letting

589: \[

590: \cF_r(\Delta_*(\cC_*(X))) = \Delta_*(\cF_r(\cC_*(X))) \text{, where }

591: \cF_r = \cF^{\check{C}}_r \text{ or } \cF^R_r.

592: \]

593: To simplify the notation, we write $\Delta_k(X) := \Delta_k(\cC_*(X))$.

594: We remark that these filtrations satisfy \eqref{eqnFiltrnCndn}:

595: \[

596: \bigcup_{r \in \eR} {\cF}_r(\Delta_*(X)) = \Delta_*(X) \text{ and }

597: \bigcap_{r \in \eR} {\cF}_r(\Delta_*(X)) = 0.

598: \]

599: Let $\alpha$ be an $n$-chain.

600: By Lemma~\ref{lemmar} we know that there is some $r \in \eR$ such that

601:  $\alpha \notin \cF_{r'}(\Delta_n(X))$ for all $r'<r$ and $\alpha \in

602: \cF_{r''}(\Delta_n(X))$ for all $r'' > r$.

603: In fact,

604:

605: \begin{lemma} \label{lemmaRipsr}

606: Consider an $n$-chain, $\alpha = \sum_{i=1}^m \alpha_i

607: [x_{i_0},\ldots, x_{i_n}]$. For the \v{C}ech filtration let

608: \[

609: r = \max_{i=1 \ldots m} \min \{ r_i \ | \ \exists x \text{ such that }

610: B_{r_i}(x) \ni x_{i_0}, \ldots x_{i_n} \},

611: \]

612: and for the Rips filtration let

613: \[

614: r = \max_{i=1\ldots m} \max_{j\neq k}

615: \rho(x_{i_j},x_{i_k}) \ \ .

616: \]

617: Then $\alpha \notin \cF_{r'}(\Delta_n(X))$ for all $r'<r$ and $\alpha \in

618: \cF_{r''}(\Delta_n(X))$ for all $r''\geq r$.

619: \end{lemma}

620:

621: If $\alpha$ is an $n$-cycle then by Lemma~\ref{lemmaHomologyInterval}

622: there is a (possibly empty) persistence $n$-homology interval

623: corresponding to $\alpha$.

Applying Lemma~\ref{lemmaRipsr} to $\alpha$ and, if there is some
$\beta \in \Delta_{n+1}(X)$ such that $d\beta = \alpha$, applying
Lemma~\ref{lemmaRipsr} to $\beta$ as well, we get the following.

627:

628: \begin{lemma} \label{lemmaRipsHomologyInterval}

629:   Given an $n$-cycle $\alpha$, the persistence $n$-homology interval

630:   associated to $\alpha$ is either empty or has the form $[r_1,r_2)$

631:   or $[r_1,\infty]$.

632: \end{lemma}
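
For illustration, consider the following small example, which is our own and rests on the assumption that $X = \{x_0, x_1, x_2\}$ is the set of vertices of an equilateral triangle with side length $1$ in the Euclidean plane.  Let $\alpha = (x_0,x_1) + (x_1,x_2) + (x_2,x_0)$ and $\beta = (x_0,x_1,x_2)$, so that $d\beta = \alpha$ with the orientation convention $(x_2,x_0) = -(x_0,x_2)$.  Writing $r^{\check{C}}$ and $r^{R}$ (ad hoc notation) for the two values of $r$ in Lemma~\ref{lemmaRipsr}, one finds
\[
r^{\check{C}}(\alpha) = \tfrac{1}{2}, \quad
r^{\check{C}}(\beta) = \tfrac{1}{\sqrt{3}} \approx 0.577, \qquad
r^{R}(\alpha) = r^{R}(\beta) = 1 .
\]
Indeed, the smallest ball containing an edge has radius $\frac{1}{2}$, the smallest ball containing all three vertices is bounded by the circumscribed circle of radius $\frac{1}{\sqrt{3}}$, and all pairwise distances equal $1$.  So in this sketch the interval associated to $\alpha$ should be $[\frac{1}{2}, \frac{1}{\sqrt{3}})$ for the \v{C}ech filtration and empty for the Rips filtration, which are two of the cases allowed by Lemma~\ref{lemmaRipsHomologyInterval}.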

633:

634: %[*** PB - mention Delaunay triangulations, Voronoi diagrams,

635: %  \v{C}ech complexes and $\alpha$-shape complexes.]

636:

637: %\section{Chain complexes filtered by densities and their

638: %corresponding persistent homologies}

639:

640:

641:

642: \subsection{Persistent homology of densities}

643: \label{sectionPersistentHofD}

644:

645: Let $f_{\vartheta}$ be a probability density on a manifold $\M$ for

646: some $\vartheta \in \Theta$.  We will use $f_{\vartheta}$ to define two

647: increasing $\eR$-filtrations on $C_*(\M)$, the singular chain complex on

648: $\M$ (see Definition~\ref{defn:singularChainComplex}).

649:

650: \subsubsection{The Morse filtration} \label{section:morse}

651:

652: For $r \in \eR$, the \emph{excursion sets}

653: \begin{equation} \label{eqnMr}

654: \M_{\leq r} = \{ x \in \M \ | \ f_{\vartheta}(x) \leq r\},

655: \end{equation}

656: (used in Morse theory~\cite{milnor:morseTheory}) filter

657: $\M$ over $\eR$.

658: Hence they also provide an $\eR$-filtration of the singular chain

659: complex $C_*(\M)$,

660: \[

661: \cF^M_r(C_*(\M)) = C_*(\M_{\leq r}),

662: \]

663: which we call the \emph{Morse filtration}.

664: We remark that for all $k$,

665: \[

666: H_k(\cF^M_r C_*(\M)) = H_k(\M_{\leq r}).

667: \]

668:

669: \subsubsection{The \v{C}ech filtration} \label{section:rips}

670:

There is an increasing filtration dual to the Morse filtration, which uses superlevel sets instead of sublevel sets. We modify this filtration slightly so that it mirrors the filtration on the \v{C}ech complex defined in Section~\ref{sectionPersistentHfPCD}, and we call it the \emph{\v{C}ech filtration}.  We do this because the filtrations on the \v{C}ech complex and the related Rips complex are the main filtrations used in computations of persistent homology.

672:

Notice that in the \v{C}ech complex filtration all of the points in $X$, even distant outliers, appear when $r=0$. So the \v{C}ech filtration starts with all of the points of $\M$ and the discrete topology, and then progressively connects the regions in order of decreasing density.

674:

675: For $r<0$ and all $k$, define $\cF^{\check{C}}_r(C_k(\M)) = 0$.

676: For $r\geq 0$, let $\cF^{\check{C}}_r(C_0(\M)) = C_0(\M)$.

677: Assume $k\geq 1$.

678: Let

679: \[

680: \Const_k = \{\phi:\Delta^k \to \M \ | \ \phi \text{ is constant} \}

681: \subset C_k(\M).

682: \]

683: For $0 \leq s \leq \infty$, let

684: \begin{equation} \label{eqnM1r}

685: \M_{\geq s} = \left\{m \in \M \ | \ f_{\vartheta}(m) \geq s \right\}.

686: \end{equation}

687: For $r \geq 0$, let

688: \begin{equation} \label{eqnFr}

689: \cF^{\check{C}}_r(C_k(\M)) = {\rm Const}_k + C_k(\M_{\geq \frac{1}{r}}).

690: \end{equation}

691: From this filtered chain complex we can calculate persistence $k$-homology intervals and Betti--$k$ barcodes just as in Section~\ref{sectionPersistentHfPCD}.

692:

693: \begin{lemma} \label{lemmaHkFr}

694: For $k\geq 1$, \[H_k(\cF^{\check{C}}_r(C_*(\M))) \isom H_k(\M_{\geq \frac{1}{r}}) \ \ .\]

695: \end{lemma}

696:

697: %\noindent{\bf Proof:}   Follows immediately from the definition of

698: %$\cF^{\check{C}}_r(C_*(M))$.  $\Box$

699:

700: \begin{proof}

701:   By definition, $Z_k({\cF^{\check{C}}}_r C_*(\M)) = {\rm Const}_k +

702:   Z_k C_*(\M_{\geq \frac{1}{r}})$, and $B_k({\cF^{\check{C}}}_r

703:   C_*(\M)) = {\rm Const}_k + B_k C_*(\M_{\geq \frac{1}{r}})$.  So

704: \[

H_k({\cF^{\check{C}}}_r C_*(\M)) \isom Z_k(C_*(\M_{\geq \frac{1}{r}})) /

706: B_k(C_*(\M_{\geq \frac{1}{r}})) = H_k(\M_{\geq \frac{1}{r}}).

707: \]

708: \end{proof}

709:

710: Let $r \geq 0$.  Recall the notation of

711: Section~\ref{sectionPersistentHomology}: $Z^r_k =

712: Z_k(\cF^{\check{C}}_r(C_*(\M)))$ and $B^r_k =

B_k(\cF^{\check{C}}_r(C_*(\M)))$.  To start, $Z^r_0 = \F[\M]$.  Then

714: $\cF^{\check{C}}_r(C_1(\M)) = \F[\{ \phi:\Delta^1 \to \M \ | \ \phi

715: \text{ is constant, or } \im{\phi} \subset \M_{\geq \frac{1}{r}}\}]$.

716:

For two points $x,y \in \M$, there is a map $\phi:\Delta^1 \to \M$
such that $\phi(0)=x$, $\phi(1)=y$ and $\im(\phi) \subset \M_{\geq
  \frac{1}{r}}$ (in which case $d\phi = y-x$) if and only if $x$ and
$y$ are in the same path component of $\M_{\geq \frac{1}{r}}$.  Thus

721: \[

722: H_0(\cF^{\check{C}}_r(C_*(\M))) \isom \F [ \M / \sim ],

723: \]

724: where $x \sim y$ if and only if $x$ and $y$ are in the same path

725: component of $\M_{\geq \frac{1}{r}}$.

726:

727:

728: In the case where $\M_{\geq \frac{1}{r}}$ is path-connected,

729: $H_0(\cF^{\check{C}}_r(C_*(\M))) \isom \F [ \M / \M_{\geq \frac{1}{r}}

730: ]$.  In particular $H_0(\cF^{\check{C}}_0(C_*(\M))) \isom

731: \F[\M/\M_{\geq \infty}]$.  Since $f_{\vartheta}$ is a probability

732: density, $\M_{\geq \infty}$ has measure $0$.  Therefore almost all $m

733: \in \M$ represent a distinct homology class in

734: $\cF^{\check{C}}_0(C_0(\M))$ and there are uncountably many

735: $0$-homology intervals.  As a result the Betti--$0$ barcode is not a

736: good descriptor.  In this section, we will describe how the

737: $0$-homology intervals can be used to describe a \emph{Betti--$0$

738:   function}, in the case where the density $f_{\vartheta}$ satisfies a

739: continuity condition.

740:

741: More generally, as long as $\M - \M_{\geq \frac{1}{r}}$ is uncountable

742: and $\M_{\geq \frac{1}{r}}$ has countably many path components, then

743: almost all homology classes in $H_0(\cF^{\check{C}}_r(C_*(\M)))$ have a unique

744: representative.  In this case we use this as justification to consider

745: only those homology classes with a unique representative.

746:

747:

748: Assume that for all $r$, $\M - \M_{\geq \frac{1}{r}}$ is uncountable

749: and $\M_{\geq \frac{1}{r}}$ has countably many path components, and

750: that the following continuity condition holds for all $m \in \M$:

\begin{equation} \label{eqnContinuityCndn} \forall \epsilon > 0, \
  \exists \text{ injective } \phi: [0,1] \to \M \text{ s.t. } \phi(0)
  = m \text{ and } f_{\vartheta}(\phi(t)) > f_{\vartheta}(m)-\epsilon \ \ \forall t.
\end{equation}

755: This condition holds if $f_{\vartheta}$ is continuous.

756:

757: \begin{lemma}

  Each $m \in \M$ is the unique representative of $[m]$ for exactly
  those values of $r$ in $\left[0, \tfrac{1}{f_{\vartheta}(m)}\right)$
  or in $\left[0, \tfrac{1}{f_{\vartheta}(m)}\right]$.

761: \end{lemma}

762:

763: \begin{proof}

764:   Let $m \in \M$.  Since $dm=0$, $m\in Z^r_0$ for $r\geq 0$.  Let $[m]

765:   \in H_*(\cF^{\check{C}}_r(C_*(\M)))$ denote the homology class

766:   represented by $m$.

767: % [*** PB - turn this into a lemma]

768:   By definition $m \in \M_{\geq \frac{1}{r}}$ if and only if $r \geq

769:   \frac{1}{f_{\vartheta}(m)}$.  Thus $m$ is the unique representative

770:   for $[m]$ for $r < \frac{1}{f_{\vartheta}(m)}$.  By assumption, for

  any $\epsilon > 0$ there is an injective map $\phi: [0,1] \to \M$

772:   such that $\phi(0) = m$ and $f_{\vartheta}(\phi(t)) >

773:   f_{\vartheta}(m)-\epsilon$.  Then $\phi \in

774:   \cF^{\check{C}}_r(C_1(\M))$ where $r =

775:   \frac{1}{f_{\vartheta}(m)-\epsilon}$.  This implies that for any

776:   $\epsilon > 0$ there is a non-constant continuous map $\phi:

777:   \Delta^1 \to \M$ with $\phi(0)=m$ such that $\phi \in

778:   \cF^{\check{C}}_{\frac{1}{f_{\vartheta}(m)} + \epsilon}(C_1(\M))$.

779:   Hence $m$ is not a unique representative for $[m]$ for $r >

780:   \frac{1}{f_{\vartheta}(m)}$.  Therefore $m$ is a unique

781:   representative for $[m]$ for either $r \in

782:   \left[0,\frac{1}{f_{\vartheta}(m)}\right)$ or $r \in

783:   \left[0,\frac{1}{f_{\vartheta}(m)}\right]$.

784: \end{proof}

785:

786: Before we formally define the Betti--$0$ function, we give the

787: following intuitive picture.  We draw each of our intervals

788: $\left[0,\frac{1}{f_{\vartheta}(m)}\right]$ or

789: $\left[0,\frac{1}{f_{\vartheta}(m)}\right)$ vertically starting at

$r=0$ and ending at $r=\frac{1}{f_{\vartheta}(m)}$.  Furthermore we order the

791: intervals from left to right according to their length.  In fact we

792: draw all of the intervals between $x=0$ and $x=1$, where the $x$-axis

793: is scaled according to the probability distribution

794: $f_{\vartheta}d\nu$.  The increasing curve traced by the tips of the

795: intervals will be called the Betti--$0$ function.

796:

797: \begin{defn} \label{defn:betti0function} Formally, define the

798:   \emph{Betti--$0$ function} $\beta_0:(0,1] \times \Theta \to

799:   [0,\infty]$ as follows.\footnote{While our definition of $\beta_0$

800:     below \eqref{bb-0} is valid for $x=0$, we get

801:     $\beta_0(0,\vartheta) \equiv 0$. This does not provide any

802:     information, and is furthermore inappropriate in cases such as the

803:     von Mises distribution with $\kappa=0$ (see

804:     Section~\ref{sectionVonMises} below) where $\beta_0(x,\vartheta)$

805:     is constant and nonzero for $x>0$.}

806: %Recall that $\M_{\geq \frac{1}{r}}$ is defined in~\eqref{eqnM1r}.

807:   For $r \in [0,\infty]$, let

808:   \begin{equation} \label{eqngtheta} g_{\vartheta}(r) = \int_{\M_{\geq

809:         \frac{1}{r}}} f_{\vartheta} d\nu.

810:   \end{equation}

811:   Since $f_{\vartheta}$ is a probability density, $g_{\vartheta}$ is

812:   an increasing function $g_{\vartheta}: [0,\infty] \to [0,1]$ for

813:   each fixed ${\vartheta} \in \Theta$.  Also recall that $\M_{\geq

814:     \infty}$ has measure $0$ and by definition $\M_{\geq 0} = \M$.  So

815:   $g_{\vartheta}(0)=0$ and $g_{\vartheta}(\infty)=1$.  For $0 < x \leq

816:   1$, let

817: \begin{equation} \label{bb-0}

818: \beta_0(x,{\vartheta}) = \inf_{g_{\vartheta}(r) \geq x} r \ \ .

819: \end{equation}

820: If $g_{\vartheta}$ is continuous and strictly increasing,\footnote{In this case we can define

821:   $\beta_0(x,\vartheta)$ for $x \in [0,1]$.} then

822: \begin{equation} \label{bb-0c}

823:   \beta_0(x,{\vartheta}) = g_{\vartheta}^{-1}(x) \ \ ,

824: \end{equation}

825: for $\vartheta \in \Theta$.  That is,

$\beta_0(x,\vartheta)$ is the unique value of $r$ such that
$\int_{\M_{\geq \frac{1}{r}}} f_{\vartheta} d\nu = x$.

828: \end{defn}
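
As a sketch of how Definition~\ref{defn:betti0function} can be evaluated in closed form, consider the following hypothetical example, which is ours and not one of the densities studied below: take $\M = [0,1]$ with Lebesgue measure $\nu$ and density $f(m) = 2m$ (an interval is a manifold with boundary, which we allow here only for the sake of illustration).  Then $\M_{\geq \frac{1}{r}} = [\frac{1}{2r}, 1]$ for $r \geq \frac{1}{2}$ and $\M_{\geq \frac{1}{r}} = \emptyset$ for $r < \frac{1}{2}$, so
\[
g(r) =
\begin{cases}
  0 & \text{if } r < \tfrac{1}{2},\\
  \int_{1/(2r)}^{1} 2m \, dm = 1 - \frac{1}{4r^2} & \text{if } r \geq \tfrac{1}{2}.
\end{cases}
\]
Solving $g(r) = x$ for $0 < x < 1$ gives
\[
\beta_0(x) = \frac{1}{2\sqrt{1-x}} .
\]
This function increases from $\frac{1}{2} = \frac{1}{\max f}$ as $x \to 0$ and is unbounded as $x \to 1$, reflecting the fact that $f$ vanishes at $m=0$.  As a consistency check, $\int_0^1 \beta_0(x) \, dx = \left[ -\sqrt{1-x} \right]_0^1 = 1 = \nu(\M)$.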

829:

830:

831:

832: \subsubsection{Alexander duality}

833:

834: The Morse and \v{C}ech filtration on $S^{p-1}$ are related by

835: Alexander duality.  Let $f$ be a density on $S^{p-1}$. Assume that $r

836: \in \im (f)$ and that $r < \sup (f)$.  Then $S^{p-1}_{f\leq r}$ is a

837: proper, nonempty subset of $S^{p-1}$.  Assume that $S^{p-1}_{f \leq

838:   r}$ is compact and a neighborhood retract.

839:

840: \begin{thm}[Alexander duality for the Morse and \v{C}ech filtrations on $S^{p-1}$]

841: Let $\tilde{H}$ denote reduced homology, let $\F$ be a field, and let $s=\frac{1}{r}$.

842: \[

843: \tilde{H}_i(S^{p-1}_{f > \frac{1}{s}}; \F) \isom

844: \tilde{H}^{p-2-i}(S^{p-1}_{f \leq r}; \F) \isom

\tilde{H}_{p-2-i}(S^{p-1}_{f\leq r}; \F).

846: \]

847: \end{thm}
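
To see the statement in the simplest case, take $p=2$ and assume, purely for illustration, that $f$ is a continuous density on $S^1$ with exactly two local maxima and that $r$ is chosen so that $S^1_{f > r}$ consists of two disjoint open arcs.  Then $S^1_{f \leq r}$ consists of two disjoint closed arcs, and for $i=0$ the theorem reads
\[
\tilde{H}_0(S^{1}_{f > r}; \F) \isom
\tilde{H}^{0}(S^{1}_{f \leq r}; \F) \isom
\tilde{H}_{0}(S^{1}_{f\leq r}; \F) \isom \F ,
\]
each group recording that the set in question has two path components.  For a unimodal density both sets are contractible and all three groups vanish.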

848:

849:

850:

851: \section{Expected barcodes of PCD} \label{sectionBettiBofS}

852:

853: \subsection{Betti barcodes of uniform samples on $S^1$}

854: \label{sectionBettiUniform}

855:

856: Let $f$ be the uniform density on $S^1$.  Let $X = \{X_1, \ldots X_n\}

857: \subset S^1$ be a sample drawn according to $f$.  $X$ is called the

858: point cloud data.  In this section we consider the Betti barcodes

859: obtained for the persistent homology of $\cF^R_*(\Delta_*(X))$ the

860: Rips complex on $X$ (see Section~\ref{sectionPersistentHfPCD}). The

861: metric we use on $S^1$ is $\frac{1}{2\pi}$ times the shortest arc length

862: between two points (we have normalized so that the total length of $S^1$ is one).

863:

864: Before we continue, we introduce some notation.

Choose $\alpha \in [0,1)$ such that $X_1 = e^{2\pi i \alpha}$.

866: For $k = 2, \ldots n$ choose $U_k \in [0,1]$ such that

867: \[

868: X_k = e^{2\pi i (\alpha + U_k)}.

869: \]

870: We remark that each $U_k$ is uniformly distributed on $[0,1]$.  Now

871: reorder the $\{U_k\}$ to obtain the order statistic\footnote{Equality

872:   among any of the terms occurs with probability zero.}:

873: \[

874: 0 < U_{n:1} < U_{n:2} < \ldots < U_{n:n-1} < 1.

875: \]

876: Let $U_{n:0} = 0$ and $U_{n:n} = 1$.

877: Reorder the $\{X_k\}$ as $\{X_{n:k}\}$ to correspond with the $\{U_{n:k}\}$.

878: Then for $1 \leq k \leq n$ define

879: \[

880: S_k = U_{n:k} - U_{n:k-1}.

881: \]

882: The set $S = \{S_1, \ldots S_n\}$ is called the set of

883: spacings~\cite{pyke:spacings}.

We remark that if $U_k = U_{n:j}$ with $1\leq j \leq n-1$ and we take the
usual orientation of $S^1$, then the
distances from $X_k$ to its nearest backward neighbor and nearest
forward neighbor are $S_j$ and $S_{j+1}$, respectively.
Also, the distances from $X_1$ to its two neighbors are $S_n$ and $S_1$.

889: It is well known (for example, \cite{devroye}) that

890: \begin{lemma} \label{lemma:spacingsDistribution}

891:   $(S_1,\ldots,S_n)$ is uniformly distributed on the standard

892:   $(n-1)$-simplex $\{(x_1,\ldots x_n) | x_i\geq 0, \sum_{i=1}^n x_i =

893:   1\}$.

894:   It follows that

895: \[

896: P[S_1>a_1; \cdots; S_n>a_n] =

897: \begin{cases}

898:   (1-\sum_{i=1}^n a_i)^{n-1}& \text{if } \sum_{i=1}^n a_i < 1,\\

899:   0& \text{otherwise.}

900: \end{cases}

901: \]

902: and

903: \begin{equation} \label{eqnProbSimplex}

904: \text{(Whitworth, 1897)} \quad P(S_{n:n} > x) = \sum_{\substack{k \geq 1 \\ kx < 1}} (-1)^{k+1} (1-kx)^{n-1} \binom{n}{k}, \quad \forall x > 0.

905: \end{equation}

906: \end{lemma}
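
As a quick sanity check of \eqref{eqnProbSimplex}, which we add only for illustration, take $n=3$.  For $x = \frac{1}{2}$ only $k=1$ satisfies $kx<1$, so
\[
P(S_{3:3} > \tfrac{1}{2}) = \binom{3}{1} \left(1-\tfrac{1}{2}\right)^{2} = \tfrac{3}{4}.
\]
For $x = \frac{1}{3}$ the terms $k=1$ and $k=2$ contribute, and
\[
P(S_{3:3} > \tfrac{1}{3}) = \binom{3}{1}\left(\tfrac{2}{3}\right)^{2} - \binom{3}{2}\left(\tfrac{1}{3}\right)^{2} = \tfrac{4}{3} - \tfrac{1}{3} = 1,
\]
as it must be, since three spacings that sum to $1$ can all be at most $\frac{1}{3}$ only on a set of probability zero.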

907:

908: Finally, order the spacings to obtain

909: \[

0 < S_{n:1} < S_{n:2} < \ldots < S_{n:n} < 1.

911: \]

912:

913: Now we are ready to calculate the homology in degree $0$.

914: Recall that $\beta_0(\cF^R_r(\Delta_*(X)))$ equals the dimension of

$H_0(\cF^R_r(\Delta_*(X)))$, which equals the number of path components of

916: $\cF^R_r(\Delta_*(X))$.

917: Recall that $\cF^R_r(\Delta_0(X))$ is the empty set for $r<0$ and is the set

918: $X$ for $r \geq 0$.

919: So at $r=0$, there are (almost surely) exactly $n$ distinct homology

920: classes in $H_0(\cF^R_r(\Delta_*(X)))$.

921: Each homology class $[X_k]$ will no longer have a distinct

922: representative when the distance from $X_k$ to one of its neighbors is

923: equal to $r$.

That is, each time $r$ passes one of the $n-1$ smallest spacings, the dimension of
$H_0(\cF^R_r(\Delta_*(X)))$ decreases by one.
Therefore, setting $S_{n:0} = 0$, for $k = 0, \ldots, n-2$,

927: \[

928: r \in \left[ S_{n:k}, S_{n:k+1} \right) \implies

929: \beta_0(\cF^R_r(\Delta_*(X))) = n-k.

930: \]

931: When $r \geq S_{n:n-1}$, $\cF^R_r(\Delta_*(X))$ is path connected so

932: $\beta_0(\cF^R_r(\Delta_*(X))) = 1$.

933: Translating this, we see that the Betti--$0$ barcode is the collection

934: of homology intervals

935: \[

936: [0, S_{n:k}) \text{ for $k = 1, \ldots {n-1}$ and $[0,\infty]$}.

937: \]
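
For example, suppose (hypothetically) that $n=4$ and that the points sit at normalized arc-length positions $0$, $0.1$, $0.3$ and $0.6$, so that the spacings are $0.1$, $0.2$, $0.3$ and $0.4$.  Then
\[
\beta_0(\cF^R_r(\Delta_*(X))) =
\begin{cases}
  4 & \text{if } 0 \leq r < 0.1,\\
  3 & \text{if } 0.1 \leq r < 0.2,\\
  2 & \text{if } 0.2 \leq r < 0.3,\\
  1 & \text{if } r \geq 0.3,
\end{cases}
\]
and the Betti--$0$ barcode consists of the intervals $[0,0.1)$, $[0,0.2)$, $[0,0.3)$ and $[0,\infty]$.  The largest spacing $0.4$ plays no role in degree $0$.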

938:

939: Finally, let us consider the homology in degree $1$.

940: Let

941: \[

942: \alpha =

943: (X_{n:1}, X_{n:2}) + \ldots + (X_{n:n-1}, X_{n:n}) + (X_{n:n},

944: X_{n:1}).

945: \]

946: This is a $1$-cycle in $\Delta_*(X)$.

947:

948: \begin{lemma}

949: If $S_{n:n} \leq \frac{1}{2}$ then the Betti--$1$ barcode is the single

950: (possibly empty) persistence homology interval

951: \[

952: I_{\alpha} = [S_{n:n}, R), \quad \text{where } R \in [\tfrac{1}{3}, \tfrac{1}{2}),

953: \]

954: otherwise it is empty.

955: \end{lemma}

956:

957: \begin{rem}

  If the largest spacing $S_{n:n}$ is greater than or equal to

959:   $\frac{1}{2}$ then all of the points $X_1, \ldots X_n$ are

960:   concentrated on a semicircle, and $\cF^R_r(\Delta_*(X))$ does

961:   not contain any non-trivial $1$-cycles.  By \eqref{eqnProbSimplex},

962:   $P[S_{n:n} > \frac{1}{2}] = \frac{n}{2^{n-1}}$.

963: \end{rem}

964:

965: \begin{proof}

966:   Assume that $S_{n:n} \leq \frac{1}{2}$.  If $r \geq S_{n:n}$,

967:   then $\alpha \in \cF^R_r(\Delta_1(X))$.  We claim that by

968:   using the definition of the Rips filtration and the geometry of

969:   $S^1$, $\alpha$ becomes a boundary at some $R \in [\frac{1}{3},

970:   \frac{1}{2}]$.  Since half the perimeter of $S^1$ is $\frac{1}{2}$, when $r\geq

971:   \frac{1}{2}$, $(X_i,X_j) \in \cF^R_r(\Delta_1(X))$ for all $X_i,X_j

972:   \in X$.  Thus when $r \geq \frac{1}{2}$ then $\cF^R_r(\Delta_*(X)) =

973:   \Delta_*(X)$ which is the full $(n-1)$-simplex on the

974:   vertices $X_1, \ldots X_n$.  In particular if $r \geq \frac{1}{2}$, then

975:   $\alpha$ is a boundary.

976:

977:   Since $S_{n:n} < \frac{1}{2}$, the geometric realization of $\alpha$

978:   is a $n$-gon containing the center of $S^1$.  Thus if there is some

979:   $\beta = \sum \beta_{ijk}(X_i,X_j,X_k) \in

980:   \cF^R_r(\Delta_2(X))$ such that $d\beta=\alpha$ then for some

981:   $(X_i,X_j,X_k) \in \cF^R_r(\Delta_2(X))$ the geometric realization of

982:   $(X_i,X_j,X_k)$ contains the center of $S^1$.  The smallest $r$ for

983:   which this can happen is $\frac{1}{3}$.  So if $r <

984:   \frac{1}{3}$ then $\alpha$ cannot be a boundary.

985:

  Thus $\alpha$ becomes a boundary when $r=R$ for some $R \in
  [\frac{1}{3},\frac{1}{2}]$.  If $S_{n:n} \geq \frac{1}{3}$ it is possible
  that $R = S_{n:n}$, in which case the interval $I_{\alpha}$ is empty
  and $\alpha$ does not represent a non-trivial homology class
  in any $\cF^R_r(\Delta_*(X))$.

990: \end{proof}
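
The last case of the proof can be seen in a small hypothetical example of our own.  Take $n=4$ points $Y_1, \ldots, Y_4$ at normalized arc-length positions $0$, $0.1$, $0.3$ and $0.6$ on $S^1$, so that the largest spacing is $0.4 \geq \frac{1}{3}$.  The pairwise distances are
\[
\rho(Y_1,Y_2) = 0.1, \quad \rho(Y_2,Y_3) = 0.2, \quad \rho(Y_1,Y_3) = \rho(Y_3,Y_4) = 0.3, \quad \rho(Y_1,Y_4) = 0.4, \quad \rho(Y_2,Y_4) = 0.5 .
\]
At $r = 0.4$ the cycle $\alpha$ first lies in $\cF^R_r(\Delta_1(X))$, but the $2$-simplices $(Y_1,Y_2,Y_3)$ and $(Y_1,Y_3,Y_4)$ are present as well, and
\[
d\big( (Y_1,Y_2,Y_3) + (Y_1,Y_3,Y_4) \big) = (Y_1,Y_2) + (Y_2,Y_3) + (Y_3,Y_4) - (Y_1,Y_4),
\]
which is $\alpha$ under the orientation convention $(Y_4,Y_1) = -(Y_1,Y_4)$.  So here $R$ equals the largest spacing and the interval $I_{\alpha}$ is empty.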

991:

992: \begin{rem}

993: If $S_{n:n} < \frac{1}{3}$ then the Betti--$1$ barcode is a single

994: non-empty persistence homology interval.

995: Using \eqref{eqnProbSimplex}, $P[S_{n:n} \geq \frac{1}{3}] <

996: n\left(\frac{2}{3}\right)^{n-1}$.

997: \end{rem}

998:

999:

1000: \subsection{Expected values of the Betti barcodes}

1001:

1002: Let $U_1, \ldots U_{n-1}$ be a sample from the uniform distribution on

1003: $[0,1]$.  Let $0 < U_{n:1} < U_{n:2} < \ldots < U_{n:n-1} < 1$ be the

1004: corresponding order statistic.\footnote{We use $n$ here to match the

1005:   notation of Section~\ref{sectionBettiUniform} where $\{U_1, \ldots,

  U_{n-1}\}$ is derived from $\{X_1,\ldots, X_n\} \subset S^1$.}  Define

1007: $U_{n:0} = 0$ and $U_{n:n} = 1$.  For $k = 1, \ldots n$, let $S_k =

1008: U_{n:k} - U_{n:k-1}$.  Recall (Lemma~\ref{lemma:spacingsDistribution})

1009: that the set of spacings $S = \{S_1, \ldots S_n\}$ is uniformly

1010: distributed on the standard $(n-1)$-simplex.

1011:

1012: Let $0 < S_{n:1} < \ldots < S_{n:n} < 1$ be the order statistic for

1013: the spacings.

1014: Then one can show~\cite[21.1.15]{shorackWellner} that

1015:

1016: \begin{prop}

For $1\leq i \leq n$ the expected value of the $i$-th smallest spacing is given by

1018: \[

1019: E S_{n:i} = \frac{1}{n} \sum_{j=1}^i \frac{1}{n+1-j} = \frac{1}{n} \sum_{j=n+1-i}^n \frac{1}{j}

1020: \]

1021: \end{prop}
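
As a small worked instance of this formula, added here only as an illustration, take $n=3$:
\[
E S_{3:1} = \tfrac{1}{3} \cdot \tfrac{1}{3} = \tfrac{1}{9}, \qquad
E S_{3:2} = \tfrac{1}{3} \left( \tfrac{1}{3} + \tfrac{1}{2} \right) = \tfrac{5}{18}, \qquad
E S_{3:3} = \tfrac{1}{3} \left( \tfrac{1}{3} + \tfrac{1}{2} + 1 \right) = \tfrac{11}{18}.
\]
These add up to $\frac{2}{18} + \frac{5}{18} + \frac{11}{18} = 1$, as they must, since the spacings sum to $1$.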

1022:

1023: So the expected Betti--$0$ barcode is the collection of intervals

1024: \[

1025: \left\{ \left[0, \frac{1}{n} \sum_{j=1}^i \frac{1}{n+1-j}\right) \right\}_{i

1026:       \in \{1, \ldots, n-1\}} \cup \{ [0, \infty] \},

1027: \]

1028: and the expected Betti--$1$ barcode is

1029: \[

1030: \left\{ \left[ \frac{1}{n} \sum_{j=1}^n \frac{1}{n+1-j}, \infty\right] \right\}.

1031: \]

1032:

1033: To obtain the Betti--$0$ function from the Betti--$0$ barcode let

1034: \[

1035: _n \tilde{\beta}_0(x,0) = E S_{n:\lceil (n-1)x \rceil}.

1036: \]

The Betti--$0$ function is a normalized version of this, $ \ _n \beta_0
(x,0) = c_n \ _n \tilde{\beta}_0 (x,0) $, where $c_n$ is chosen so that $\int_0^1 \ _n
\beta_0 (x,0) dx = 1$. (In fact, $c_n =
  \frac{n-1}{1-ES_{n:n}}$, which for large values of $n$ is
  approximately equal to $n$.)  Thus,

1042: \[

1043: \ _n \beta_0 (x,0) =

1044: \frac{c_n}{n} \sum_{j=1}^{\lceil (n-1)x \rceil} \frac{1}{n+1-j} = \frac{c_n}{n} \sum_{j=n+1-\lceil (n-1)x \rceil}^{n} \frac{1}{j}

1045: \]
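
To make the normalization concrete, here is the case $n=3$, worked out as an illustration.  The proposition above gives $E S_{3:1} = \frac{1}{9}$, $E S_{3:2} = \frac{5}{18}$ and $E S_{3:3} = \frac{11}{18}$, so
\[
c_3 = \frac{3-1}{1 - \frac{11}{18}} = \frac{36}{7}, \qquad
\ _3 \beta_0 (x,0) =
\begin{cases}
  \frac{36}{7} \cdot \frac{1}{9} = \frac{4}{7} & \text{if } 0 < x \leq \frac{1}{2},\\
  \frac{36}{7} \cdot \frac{5}{18} = \frac{10}{7} & \text{if } \frac{1}{2} < x \leq 1,
\end{cases}
\]
and indeed $\int_0^1 \ _3 \beta_0 (x,0) \, dx = \frac{1}{2} \cdot \frac{4}{7} + \frac{1}{2} \cdot \frac{10}{7} = 1$.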

1046:

1047:

1048: \begin{prop}

1049: For $0<x<1$, as $n \to \infty$,

1050: \[

1051: \ _n \beta_0 (x,0) \to - \ln (1-x).

1052: \]

1053: \end{prop}

1054:

1055: \begin{proof}

1056:   By the definition of $c_n$, $\lim_{n\to \infty}\frac{c_n}{n} = 1$.

1057:   The result then follows from the observation that

1058: \[

1059: \frac{1}{n} + \int_k^n \frac{1}{x} dx < \sum_{j=k}^n \frac{1}{j} < \frac{1}{k} + \int_k^n \frac{1}{x} dx

1060: \]

1061: and the fact that

1062: \[

1063: \lim_{n \to \infty} \ln \left( \frac{n}{n+1-\lceil (n-1)x \rceil} \right) = - \ln (1-x).

1064: \]

1065: \end{proof}
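
The two ingredients of the proof can be checked numerically; the following values are our own rounded computations and are meant only as an illustration.  For $k=2$ and $n=4$ the sandwich inequality reads
\[
\tfrac{1}{4} + \ln 2 \approx 0.943 \ < \ \tfrac{1}{2} + \tfrac{1}{3} + \tfrac{1}{4} = \tfrac{13}{12} \approx 1.083 \ < \ \tfrac{1}{2} + \ln 2 \approx 1.193 .
\]
For $n=100$ and $x = \frac{1}{2}$ we have $\lceil (n-1)x \rceil = 50$, so the sum runs over $j = 51, \ldots, 100$ and is approximately $0.688$, while $\frac{c_{100}}{100} \approx 1.044$.  Hence
\[
\ _{100} \beta_0 (\tfrac{1}{2},0) \approx 0.72, \qquad \text{compared with } -\ln(1-\tfrac{1}{2}) \approx 0.693 .
\]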

1066:

1067: \begin{figure}

1068: \begin{center}

1069: \includegraphics[width=7cm,keepaspectratio=true]{bettiGraph}

1070: \end{center}

1071: \caption{Graphs of the expected Betti $0$-function for $n=10,100$ and $f(x)=-\ln(1-x)$.}

1072: \label{figure:bettiGraph}

1073: \end{figure}

1074:

In Figure~\ref{figure:bettiGraph}, we graph the expected Betti--$0$ functions $y= \ _{10}\beta_0(x,0)$ and $y= \ _{100}\beta_0(x,0)$ and the limiting function $y=-\ln(1-x)$. For comparison, we also graph $y=1$, the limiting function one would obtain if the spacings became relatively equal in the limit.

1076:

1077:

1078:

1079: \section{Barcodes of certain parametric densities} \label{sectionBarcodesOfDensities}

1080:

1081: \subsection{The von Mises distribution} \label{sectionVonMises}

1082: Let $\M = S^1 = \{e^{i\theta} \ | \ \theta \in [-\pi,\pi)\} \subset \R^2$.

1083: We will use this parametrization to identify $\theta \in [-\pi,\pi)$

1084:   with an element of $S^1$.

1085: Consider the von Mises density on $S^1$ with respect to the uniform measure,

1086: \[ f_{\mu,\kappa}(\theta) = \tfrac{1}{I_0(\kappa)}e^{\kappa \cos(\theta -

1087:   \mu)},  \quad \theta \in [-\pi,\pi)

1088: \]

1089: where $\mu \in [-\pi,\pi)$, $\kappa \in [0,\infty)$ and

$I_0$ is the modified Bessel function of the first kind of
order $0$; in general, the modified Bessel function of
the first kind of order $\nu$ is

1093: \begin{equation}

1094: \label{bessel}

1095: I_{\nu}(\kappa)= \tfrac{(\kappa/2)^{\nu}}{\Gamma\left(\nu + \frac{1}{2}\right)\Gamma\left(\frac{1}{2}\right)}

1096: \int_{-1}^1e^{\kappa t}(1-t^2)^{\nu-\frac{1}{2}}dt \ \ ,

1097: \end{equation}

1098: and $\Gamma (\cdot)$ denotes the gamma function.

1099:

1100: Our homologies will be independent of $\mu$, so assume

1101: that $\mu=0$ and so in this case the parameter $\vartheta = \kappa$.

1102:

1103: We will filter the chain complex on $S^1$ using both the \v{C}ech and

1104: Morse filtrations.

1105: Recall that by \eqref{eqnM1r} and \eqref{eqnMr},

1106: $S^1_{\geq \frac{1}{r}} = \{\theta \in S^1 \ | \ f_{\kappa}(\theta)

1107: \geq \frac{1}{r} \}$ and

1108: $S^1_{\leq r} = \{\theta \in S^1 \ | \ f_{\kappa}(\theta)

1109: \leq r \}$.

1110: Choose $\alpha_{r,\kappa} \in [-\pi,\pi)$ such that

1111: \[

1112: f_{\kappa}(\alpha_{r,\kappa}) = r.

1113: \]

Specifically, for $\kappa > 0$ and $\min f_{\kappa} \leq r \leq \max f_{\kappa}$, let
$
\alpha_{r,\kappa} = \cos^{-1}\left(\frac{1}{\kappa} \ln
\left(r I_0(\kappa)\right)\right) \in [0,\pi].
$

1119: Our calculations of the persistent homology will follow from the

1120: following straightforward result.

1121:

1122: \begin{lemma} \label{lemmaS1}

For $0 \leq r < \frac{1}{\max f_{\kappa}}$, $S^1_{\geq \frac{1}{r}} = \emptyset$,
and for  $r < \min f_{\kappa}$, $S^1_{\leq r} = \emptyset$.

1125: For $\frac{1}{\max f_{\kappa}} \leq r < \frac{1}{\min f_{\kappa}}$,

1126: \[

1127: S^1_{\geq \frac{1}{r}} = \{ \theta \ | \ -\alpha_{\frac{1}{r},\kappa} \leq \theta \leq

1128: \alpha_{\frac{1}{r},\kappa} \}.

1129: \]

1130: For $\min f_{\kappa} \leq r < \max f_{\kappa}$,

1131: \[

1132: S^1_{\leq r} = \{ \theta \ | \ \alpha_{r,\kappa} \leq \theta \leq 2\pi

1133: - \alpha_{r,\kappa} \}.

1134: \]

1135: For $r \geq \frac{1}{\min f_{\kappa}}$, $S^1_{\geq \frac{1}{r}} = S^1$,

1136: and for  $r \geq \max f_{\kappa}$, $S^1_{\leq r} = S^1$.

1137: \end{lemma}

1138:

1139: Since its analysis is simpler, we start with the Morse filtration on

1140: $S^1$.  By Lemma~\ref{lemmaS1}, $S^1_{\leq r}$ is empty if $r < \min

1141: f_{\kappa}$, it is contractible (see Remark~\ref{rem:contractible}) if

1142: $\min f_{\kappa} \leq r < \max f_{\kappa}$ and it is equal to $S^1$ if

$r \geq \max f_{\kappa}$.  It follows that the Betti--$0$ barcode for

1144: the Morse filtration is the single interval

1145: \[

1146: \left[ \min f_{\kappa}, \infty \right] = \left[\tfrac{1}{ I_0(\kappa)

1147:       e^{\kappa}}, \infty \right],

1148: \]

1149: the Betti--$1$ barcode is the single interval

1150: \[

1151: \left[ \max f_{\kappa}, \infty \right] = \left[\tfrac{e^{\kappa}}{

1152:       I_0(\kappa)}, \infty \right],

1153: \]

1154: and all other Betti--$k$ barcodes are empty.
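
For a concrete sense of these endpoints, take $\kappa = 1$ (the numerical values below are our own, rounded to four decimal places).  Since $I_0(1) \approx 1.2661$,
\[
\min f_{1} = \frac{e^{-1}}{I_0(1)} \approx 0.2906, \qquad
\max f_{1} = \frac{e^{1}}{I_0(1)} \approx 2.1470 ,
\]
so for the Morse filtration the Betti--$0$ interval begins at approximately $0.2906$ and the Betti--$1$ interval begins at approximately $2.1470$.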

1155:

1156: Now consider the \v{C}ech filtration on $S^1$.

1157: We will derive a formula for the Betti--$0$ function, $\beta_0(x, \kappa)$, and

1158: calculate the Betti--$k$ barcodes for $k>0$.

1159:

1160: If $\kappa=0$ then $f_0 = 1$.

1161: So for $r<1$, $S^1_{\geq \frac{1}{r}} = \emptyset$, and for $r\geq 1$, $S^1_{\geq \frac{1}{r}} = S^1$.

1162: By definition~\eqref{eqngtheta},

1163: \[

1164: g_{\kappa}(r) = \begin{cases}

1165: 0 & \text{if $r < 1$},\\

1166: 1 & \text{if $r \geq 1$}.

1167: \end{cases}

1168: \]

1169: So by definition \eqref{bb-0}, $\beta_0(x,0) = 1$.

1170:

1171: For $\kappa>0$, let $\minf = \frac{1}{ I_0(\kappa)}e^{-\kappa}$ and

1172: $\maxf=\frac{1}{ I_0(\kappa)}e^{\kappa}$.

1173: For $r<\frac{1}{\maxf}$, $S^1_{\geq \frac{1}{r}} = \emptyset$, and for

1174: $r\geq\frac{1}{\minf}$, $S^1_{\geq \frac{1}{r}} = S^1$.

1175: For $\frac{1}{\maxf} \leq r < \frac{1}{\minf}$, since $f_{\kappa}$ is even and

1176: decreasing for $\theta>0$,

1177: \[

1178: S^1_{\geq \frac{1}{r}} = \{ \theta \ | \ -\alpha_{r,\kappa} \leq \theta \leq \alpha_{r,\kappa}\},

1179: \]

1180: where $\alpha_{r,\kappa} \in (0,\pi)$ and $f_{\kappa}(\alpha_{r,\kappa}) = \frac{1}{r}$.

1181:

1182: Let $x \in [0,1]$ and assume that $\beta_0(x,\kappa)=r$.

Since $\kappa > 0$, $g_{\kappa}(r) = \int_{S^1_{\geq \frac{1}{r}}}f_{\kappa}(\theta)d\theta$ is continuous and strictly increasing for $\frac{1}{\maxf} \leq r \leq \frac{1}{\minf}$. So,

1184: \[

1185: x = \int_{S^1_{\geq \frac{1}{r}}} f_{\kappa}(\theta)d\theta .

1186: \]

1187: Define $\alpha_{r,\kappa} \in [0,\pi]$ by the condition that $f_{\kappa}(\alpha_{r,\kappa}) =

1188: \frac{1}{r}$.

1189: So

1190: \begin{equation} \label{eqn:r}

1191: r = \frac{1}{f_{\kappa}(\alpha_{r,\kappa})}.

1192: \end{equation}

1193: For $\psi \in [0,\pi]$, let

1194: \[

1195: F_{\kappa}(\psi) = \int_0^{\psi}f_{\kappa}(\theta) d\theta.

1196: \]

1197: Then

1198: \begin{equation} \label{eqn:x}

1199: x = \int_{S^1_{\geq \frac{1}{r}}} f_{\kappa} d\nu = \int_{-\alpha_{r,\kappa}}^{\alpha_{r,\kappa}} f_{\kappa}(\theta)

1200: d\theta = 2 F_{\kappa}(\alpha_{r,\kappa}).

1201: \end{equation}

1202: Since $F_{\kappa}$ is strictly increasing, it is invertible.  So

1203: $\alpha_{r,\kappa} = F_{\kappa}^{-1}(\frac{x}{2})$.  Thus

\begin{equation} \label{betti-0}
\beta_0(x,\kappa) = r = \frac{1}{f_{\kappa}(F_{\kappa}^{-1}(\frac{x}{2}))}.
\end{equation}
Since $f_{\kappa}$ and $F_{\kappa}$ are smooth and $F_{\kappa}' = f_{\kappa} > 0$, by the inverse function
theorem so is $F_{\kappa}^{-1}$, with $(F_{\kappa}^{-1})'(y) = \frac{1}{f_{\kappa}(F_{\kappa}^{-1}(y))}$.  So

1208: \[

1209: \beta_0(x,\kappa) = (F_{\kappa}^{-1})'\left(\frac{x}{2}\right).

1210: \]

1211: We remark that as $\kappa \rightarrow 0$, $\beta_0(x,\kappa)

1212: \rightarrow 1 = \beta_0(x,0)$.

1213: We can also describe the graph of $r=\beta_0(x,\kappa)$ parametrically

1214: by combining \eqref{eqn:r} and \eqref{eqn:x} (see Figure~\ref{figure:vonMisesBetti0}):

1215: \begin{equation} \label{eqn:vMh}

1216: h_{\kappa}(t) = \left( 2 F_{\kappa}(t), \frac{1}{f_{\kappa}(t)} \right), t \in [0,\pi].

1217: \end{equation}

1218: \begin{figure}

1219: \begin{center}

1220: \includegraphics[width=7cm,keepaspectratio=true]{vonMisesBetti0_3d_hue}

1221: \end{center}

1222: \caption{Graph of the Betti $0$-function of the von Mises density for

1223:   a range of concentration parameters}

1224: \label{figure:vonMisesBetti0}

1225: \end{figure}
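
As a check on the parametrization \eqref{eqn:vMh}, one can evaluate it at the two endpoints.  Here we assume that $d\theta$ in the definition of $F_{\kappa}$ is read as the normalized uniform measure on $S^1$, so that $2F_{\kappa}(\pi) = \int_{S^1} f_{\kappa} \, d\nu = 1$; this reading is our assumption.  Then
\[
h_{\kappa}(0) = \left( 0, \ I_0(\kappa) e^{-\kappa} \right), \qquad
h_{\kappa}(\pi) = \left( 1, \ I_0(\kappa) e^{\kappa} \right),
\]
so the Betti--$0$ function increases from $\frac{1}{\max f_{\kappa}}$ to $\frac{1}{\min f_{\kappa}}$.  For $\kappa = 1$, with $I_0(1) \approx 1.2661$, these values are approximately $0.4658$ and $3.4415$.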

1226:

1227:

1228: For $k\geq 1$, recall that

1229: \[

1230: \cF^{\check{C}}_r(C_k(S^1)) = \Const_k + C_k(S^1_{\geq \frac{1}{r}}).

1231: \]

1232: Also recall that for $r<\frac{1}{\maxf}$, $S^1_{\geq \frac{1}{r}} = \emptyset$, for

1233: $\frac{1}{\maxf} \leq r < \frac{1}{\minf}$, $S^1_{\geq \frac{1}{r}}$ is the arc from

1234: $-\alpha_{r,\kappa}$ to $\alpha_{r,\kappa}$ where $f_{\kappa}(\alpha_{r,\kappa}) = \frac{1}{r}$ and for $r\geq

1235: \frac{1}{\minf}$, $S^1_{\geq \frac{1}{r}} = S^1$.

1236: It follows that for $k\geq 1$,

1237: \[

1238: H_k(\cF^{\check{C}}_r(C_*(S^1))) = \begin{cases}

1239: \F & \text{for $k=1$ and $r\geq \frac{1}{\minf}$},\\

1240: 0 & \text{otherwise.}

1241: \end{cases}

1242: \]

1243: Therefore the Betti--$1$ barcode has the single interval

1244: \begin{equation}

1245: \label{1-betti}

1246: \left[\tfrac{1}{\minf},\infty\right] = \left[

1247:   I_0(\kappa)e^{\kappa},\infty\right]

1248: \end{equation}

1249: and for $k>1$ the Betti--$k$ barcode is

1250: empty.

1251: %We remark that this generalizes the result for $\kappa=0$.

1252:

1253: \subsection {The von Mises-Fisher distribution}

1254: Now consider

1255: $\M=S^{p-1}$, $p \geq 3$ and the unimodal von Mises-Fisher density

1256: given by

1257: \[

1258: f_{\mu,\kappa}(x) = c(\kappa) \exp\left\{\kappa x^t \mu\right\}, \quad x \in S^{p-1}

1259: \]

1260: where $\kappa \in [0,\infty)$, $\mu \in S^{p-1}$, and

1261: \begin{equation} \label{normalizing}

1262:   c(\kappa)=\left(\frac{\kappa}{2}\right)^{\frac{p}{2}-1}

1263:   \frac{1}{\Gamma(\frac{p}{2}) I_{\frac{p}{2} -1}(\kappa )}

1264: \end{equation}

1265: is the normalizing constant with respect to the uniform measure. This

1266: is also known as the Langevin distribution. Note that the minimum and

1267: maximum of $f$ also do not depend on $\mu$: $\minf =

1268: c(\kappa)e^{-\kappa}$ and $\maxf = c(\kappa)e^{\kappa}$.  In fact, by

1269: symmetry the homologies will not depend on $\mu$.  Hence once again

1270: take $\vartheta = \kappa$.

1271:

1272: Consider the Morse filtration (defined in Section~\ref{section:morse})

on $S^{p-1}$.  If $r< \minf$ then $S^{p-1}_{\leq r} = \emptyset$ and if $r
\geq \maxf$ then $S^{p-1}_{\leq r} = S^{p-1}$.  For $\minf \leq r <
\maxf$,

1276: \[

1277: S^{p-1}_{\leq r} = \{ x\in S^{p-1} | x^t\mu \leq a_{r,\kappa}\},

1278: \]

1279: where $a_{r,\kappa} = \frac{1}{\kappa} \ln

1280: \left(\frac{r}{c(\kappa)}\right) \in [-1,1]$.  So $S^{p-1}_{\leq r}$

1281: is the closure of $S^{p-1}$ minus a right circular cone with vertex

1282: $0$ and centered at $\mu$.  In particular, $S^{p-1}_{\leq r}$ is

1283: contractible (see Remark~\ref{rem:contractible}) so

1284: $H_0(\cF^M_r(C_*(S^{p-1}))) = \F$ and for $k\geq 1$,

1285: $H_k(\cF^M_r(C_*(S^{p-1}))) = 0$.

1286:
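The threshold $a_{r,\kappa}$ can be read off directly from the density. As a short check, assuming $\kappa > 0$:
\[
f_{\mu,\kappa}(x) \leq r
\iff c(\kappa) e^{\kappa x^t\mu} \leq r
\iff x^t\mu \leq \frac{1}{\kappa}\ln\left(\frac{r}{c(\kappa)}\right) = a_{r,\kappa}.
\]
At the endpoints, $r = c(\kappa)e^{-\kappa}$ gives $a_{r,\kappa} = -1$ (the single point $-\mu$), while $r = c(\kappa)e^{\kappa}$ gives $a_{r,\kappa} = 1$ (all of $S^{p-1}$).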

1287: Thus the Betti--$0$ barcode is the single interval $[\minf, \infty)$,

1288: the Betti--$(p-1)$ barcode is the single interval $[\maxf, \infty)$ and all

1289: other barcodes are empty.

1290:

1291: Consider the \v{C}ech filtration (defined in

1292: Section~\ref{section:rips}) on $S^{p-1}$.

1293: %If $r < \frac{1}{\maxf}$ then $S^{p-1}_{\geq \frac{1}{r}} = \emptyset$ and if $r\geq \frac{1}{\minf}$ then $S^{p-1}_{\geq \frac{1}{r}} = S^{p-1}$.

1294: For $\frac{1}{\max(f_{\kappa})} \leq r < \frac{1}{\min(f_{\kappa})}$,

1295: \[

1296: S^{p-1}_{\geq \frac{1}{r}} = \{x \in S^{p-1} \ | \ x^t\mu \geq a_{\frac{1}{r},\kappa}\}.

1297: \]

1298: So $S^{p-1}_{\geq \frac{1}{r}}$ is the intersection of $S^{p-1}$ and a right circular cone with

1299: vertex $0$ and centered at $\mu$.

1300: In particular for

1301: $\frac{1}{\max(f_{\kappa})} \leq r < \frac{1}{\min(f_{\kappa})}$,

1302: $S^{p-1}_{\geq \frac{1}{r}}$ is contractible, so for $k\geq 1$, $H_k(S^{p-1}_{\geq \frac{1}{r}})=0$.

1303:

1304: Assume $\kappa = 0$. Then $f_0 = c(0)$, and

1305: \[

1306: S^{p-1}_{\geq \frac{1}{r}} = \begin{cases}

1307: \emptyset & \text{if $r < \frac{1}{c(0)}$},\\

1308: S^{p-1} & \text{if $r \geq \frac{1}{c(0)}$}.

1309: \end{cases}

1310: \]

1311: Thus

1312: \[

1313: g_{\kappa}(r) = \begin{cases}

1314: 0 & \text{if $r < \frac{1}{c(0)}$},\\

1315: 1 & \text{if $r \geq \frac{1}{c(0)}$}.

1316: \end{cases}

1317: \]

1318: Therefore $\beta_0(x,0) := \inf_{g_{\kappa}(r) \geq x} r = \frac{1}{c(0)}$.

1319:

1320: Assume $\kappa > 0$. Then for $k=0$,

1321:  \begin{equation} \label{eqn:g_kappa}

1322:   x = g_{\kappa}(r) =

1323:    \int_{S^{p-1}_{\geq \frac{1}{r}}} f_{\kappa} =

1324:      c(\kappa) \frac{s_{p-2}}{s_{p-1}} \int_0^{\arccos

1325:        \left(-\frac{\ln(rc(\kappa))}{\kappa}\right)} e^{\kappa \cos

1326:        \theta} \sin^{p-2}\theta \ d\theta \ \ ,

1327:  \end{equation}

1328: where $s_{p-1} = \frac{2\pi^{\frac{p}{2}}}{\Gamma\left(\frac{p}{2}\right)}$.

1329: When $\kappa > 0$, $g_{\kappa}(r)$ is continuous and strictly increasing. Hence

1330: \begin{equation}

1331: \label{0-betti-sphere}

1332: \beta_0(x , \kappa ) = g_{\kappa}^{-1}(x)

1333: \end{equation}

1334: for $x \in [0,1]$ and $\kappa > 0$.  As we did for the von Mises

1335: distribution~\eqref{eqn:vMh}, we can describe the graph of $r=\beta_0(x,\kappa)$ more

1336:   explicitly using a parametric equation:

1337:   \begin{equation} \label{eqn:vMFh}

1338:     h_{\kappa}(t) = \left( c(\kappa) \frac{s_{p-2}}{s_{p-1}} \int_0^t e^{\kappa \cos \theta} \sin^{p-2}\theta \ d \theta, \frac{e^{-\kappa \cos t}}{c(\kappa)} \right), \quad t \in [0,\pi].

1339:   \end{equation}

1340:
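To connect \eqref{eqn:g_kappa} with \eqref{eqn:vMFh}, one can substitute the upper limit of integration. A brief sketch, writing $t$ for that limit:
\[
t = \arccos\left(-\frac{\ln(rc(\kappa))}{\kappa}\right)
\iff \kappa \cos t = -\ln(rc(\kappa))
\iff r = \frac{e^{-\kappa\cos t}}{c(\kappa)}.
\]
As $t$ runs from $0$ to $\pi$, $r$ increases from $\frac{e^{-\kappa}}{c(\kappa)} = \frac{1}{\maxf}$ to $\frac{e^{\kappa}}{c(\kappa)} = \frac{1}{\minf}$, while the first coordinate of $h_{\kappa}(t)$ increases from $0$ to $1$.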

1341: For $k\geq 1$, by Lemma~\ref{lemmaHkFr},

1342: \[ H_k(\cF^{\check{C}}_r(C_*(S^{p-1}))) = H_k(S^{p-1}_{\geq \frac{1}{r}}) = \begin{cases}

1343: \F & \text{ if } k=p-1 \text{ and } r \geq \frac{1}{\minf},\\

1344: 0 & \text{otherwise.}

1345: \end{cases}

1346: \]

1347: Therefore for $k\geq 1$ the Betti--$k$ barcode has the single interval:

1348: \begin{equation} \label{betti-k}

1349: \left[\tfrac{1}{\minf},\infty\right] = \left[\tfrac{e^{\kappa}}{c(\kappa)},\infty\right]

1350: \end{equation}

1351: for $k=p-1$ and is empty otherwise.

1352:

1353:

1354: \subsection{The Watson distribution} \label{sectionWatson}

1355:

1356: Let $\M = S^{p-1}$ and consider the following bimodal distribution

1357: \begin{equation} \label{eqnWatson}

1358: f_{\mu,\kappa}(x) = d(\kappa) \exp \{ \kappa (x^{t}\mu)^2 \},

1359: \end{equation}

1360: where $\kappa\geq 0$ and $x,\mu \in S^{p-1}$,

1361: called the \emph{Watson distribution}.

1362: We remark that this density is rotationally symmetric, where $\mu$ is

1363: the axis of rotation.

1364: The minimum and maximum densities are given by

1365: \[

1366: \min f = d(\kappa), \quad \max f = d(\kappa) e^{\kappa}.

1367: \]

1368: The maximum is achieved at $x=\pm \mu$ and the minimum is achieved at

1369: all $x$ such that $x^t \mu = 0$.

1370:

1371: Using the Morse filtration we get the following Betti barcodes.

1372: For $p=2$, we remark that for $r < \min f$, $S^1_{\leq r} = \emptyset$.

1373: For $r = \min f$, $S^1_{\leq r}$ is two points.

1374: As $r$ increases, these points become two arcs of increasing size,

1375: which connect when $r = \max f$.

1376: So the Betti--$0$ barcode consists of the two homology intervals

1377: $[\min f, \infty] $ and $[\min f, \max f)$, and the Betti--$1$ barcode

1378:     has the single interval $[\max f, \infty]$.

1379: All other Betti barcodes are empty.

1380:

1381: For $p>2$, we observe similar behavior.

1382: When $r < \min f$, $S^{p-1}_{\leq r} = \emptyset$.

1383: For $r = \min f$, $S^{p-1}_{\leq r}$ is the equator, which is homeomorphic

1384: to $S^{p-2}$.

1385: As $r$ increases, the equator expands until it reaches the poles when

1386: $r = \max f$.

1387: So the Betti--$0$, Betti--$(p-2)$ and Betti--$(p-1)$ barcodes each

1388: consist of a single homology interval:

1389: $[\min f, \infty]$, $[\min f, \max f)$, and  $[\max f, \infty]$,

1390:       respectively.

1391: All other Betti barcodes are empty.

1392:

1393: Using the \v{C}ech filtration, $S^{p-1}_{\geq \frac{1}{r}}$ is either

1394: empty, or consists of two contractible components, or is all of $S^{p-1}$.

1395: So the Betti--$(p-1)$ barcode is the single

1396: homology interval $[\frac{1}{\min f}, \infty]$ and the Betti--$k$

1397:   barcodes for all other $k\geq 1$ are empty.

1398: The Betti--$0$ function is given by $\beta_0(x,\kappa) =

1399: g^{-1}_{\kappa}(x)$, where

1400: \[

1401: g_{\kappa}(r) = \int_{S^{p-1}_{\geq \frac{1}{r}}} f_{\kappa} = 2

1402:   \frac{s_{p-2}}{s_{p-1}} \int_0^{\alpha_{\kappa}(r)} d(\kappa)

1403:   e^{\kappa \cos^2(\theta)} \sin^{p-2}(\theta) d\theta,

1404: \]

1405: with $\alpha_{\kappa}(r) =

1406: \cos^{-1}\left(\sqrt{-\frac{1}{\kappa}\ln(d(\kappa)r)}\right)$ and

1407: $s_{p-1} = \frac{2\pi^{p/2}}{\Gamma(p/2)}$.

1408: As with the von Mises \eqref{eqn:vMh} and von Mises-Fisher distributions \eqref{eqn:vMFh}, the Betti--$0$ function can also be described parametrically.

1409:
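For concreteness, one way to write such a parametrization is sketched here for $\kappa > 0$. The name $h_{\kappa}$ and the parameter range are introduced only for illustration; they are obtained by setting $t = \alpha_{\kappa}(r)$ in the formula for $g_{\kappa}$:
\[
h_{\kappa}(t) = \left( 2\, d(\kappa) \frac{s_{p-2}}{s_{p-1}} \int_0^t e^{\kappa\cos^2\theta}\sin^{p-2}\theta \ d\theta, \ \frac{e^{-\kappa\cos^2 t}}{d(\kappa)} \right), \quad t \in \left[0,\tfrac{\pi}{2}\right].
\]
The second coordinate follows from $\cos^2 t = -\frac{1}{\kappa}\ln(d(\kappa)r)$, and it runs from $\frac{1}{\max f}$ at $t=0$ to $\frac{1}{\min f}$ at $t = \frac{\pi}{2}$.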

1410:

1411: \subsection{The Bingham distribution}

1412:

1413: Again let $\M = S^{p-1}$ with the probability density

1414: \[

1415: f_{K}(x) = d(K) \exp \{ x^t K x \}

1416: \]

1417: where $x \in S^{p-1} \subset \R^p$ and $K$ is a symmetric $p \times p$

1418: matrix.

1419: We remark that $f_{K}(x) = d(K) \exp \{ \tr K x x^t \}$.

1420: Also, by a change of coordinates we can write $K = \diag (k_1, \ldots,

1421: k_p)$, where $k_p \geq \ldots \geq k_1$ are the eigenvalues of $K$.

1422: Let $v_i$ be the eigenvector associated to $k_i$.

1423:

1424: Assume that $k_p > \ldots > k_1 > 0$.

1425: Then the minimum and maximum values of $f_K$ are given by

1426: \[

1427: \min f_K = d(K) e^{k_1}, \quad \max f_K = d(K) e^{k_p},

1428: \]

1429: and are attained at $\pm v_1$ and $\pm v_p$, respectively.

1430:

1431: The Betti--$k$ barcodes (for $k\geq 1$) when $p=2$ are the same as for

1432: the Watson distribution. When $p\geq 3$, the Bingham distribution

1433: differs significantly from the Watson distribution.  For example,

1434: the minimum of the function is attained at only $\pm v_1$ which is

1435: certainly not homeomorphic to $S^{p-2}$.

1436:

1437: Consider the Morse filtration.  We can calculate the Betti--$k$

1438: barcodes inductively.  If we consider $v_p$ to be the north pole, then

1439: there is a homotopy from $S^{p-1} - \{v_p, -v_p\}$ to $S^{p-2}$ which

1440: collapses the sphere with its poles removed onto the equator.  When $r <

1441: d(K)e^{k_p}$, by the symmetry of $f_K$ this homotopy also gives a homotopy

1442: from $S^{p-1}_{\leq r}$ to $S^{p-2}_{\leq r}$ where the filtration on

1443: $S^{p-2}$ is the Morse filtration associated to the Bingham

1444: distribution with $K = \diag (k_1, \ldots, k_{p-1})$.

1445:

1446: As a result, the Betti--$0$ barcode is given by the two homology

1447: intervals $[d(K)e^{k_1}, \infty]$ and $[d(K)e^{k_1},

1448:     d(K)e^{k_{2}})$.

1449: For $1 \leq i \leq p-2$, the Betti--$i$ barcode is given by the

1450: interval $[d(K)e^{k_{i+1}}, d(K)e^{k_{i+2}})$.

1451: Finally, the Betti--$(p-1)$ barcode is given by the interval

1452: $[d(K)e^{k_p}, \infty]$.

1453:

1454: We remark that this barcode corresponds to the cellular construction of

1455: $S^{p-1}$ that repeatedly attaches northern and southern hemispheres

1456: of increasing dimension.

1457:

1458: For the \v{C}ech filtration we can use the same argument starting with

1459: $v_1$. The Betti--$0$ barcode is given by the two homology intervals

1460: $\frac{1}{d(K)} \left[e^{-k_p}, \infty\right]$ and $\frac{1}{d(K)} \left[ e^{-k_p}, e^{-k_{p-1}} \right)$. For $1 \leq i \leq

1461:   p-2$, the Betti--$i$ barcode is given by the single interval $\frac{1}{d(K)}

1462:   \left[e^{-k_{p-i}}, e^{-k_{p-i-1}} \right)$.

1463: The Betti--$(p-1)$ barcode is given by the single interval $\frac{1}{d(K)} \left[ e^{-k_1}, \infty \right]$.

1464:
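As an illustration, consider $p=3$, so that $\M = S^2$ and $k_3 > k_2 > k_1 > 0$. Reading off the formulas above, the Morse filtration gives
\[
\text{Betti--}0: \ [d(K)e^{k_1},\infty], \ [d(K)e^{k_1}, d(K)e^{k_2}); \quad
\text{Betti--}1: \ [d(K)e^{k_2}, d(K)e^{k_3}); \quad
\text{Betti--}2: \ [d(K)e^{k_3},\infty],
\]
while the \v{C}ech filtration gives
\[
\text{Betti--}0: \ \tfrac{1}{d(K)}[e^{-k_3},\infty], \ \tfrac{1}{d(K)}[e^{-k_3}, e^{-k_2}); \quad
\text{Betti--}1: \ \tfrac{1}{d(K)}[e^{-k_2}, e^{-k_1}); \quad
\text{Betti--}2: \ \tfrac{1}{d(K)}[e^{-k_1},\infty].
\]
In both cases the finite Betti--$0$ interval ends where the Betti--$1$ interval begins, and the Betti--$1$ interval ends where the Betti--$2$ interval begins.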

1465: We remark that the correspondence between the two sets of barcodes is

1466: a manifestation of Alexander duality.

1467:

1468:

1469:

1470:

1471: \subsection{The matrix von Mises distribution and a Hopf fibration}

1472: The Lie group of rotations of $\R^3$, $SO(3)$, can be given the matrix

1473: von Mises density

1474: \begin{equation} \label{eqnMatrixVonMises}

1475: f_{A,\kappa}(X) = c(\kappa) \exp \left\{ \kappa \tr (X^t A) \right\},

1476: \end{equation}

1477: where $A \in SO(3)$ and $\kappa > 0$ is a concentration parameter.

1478: We determine the Morse and \v{C}ech filtrations of $SO(3)$ via the

1479: Hopf fibration $S^3 \to \RP^3$.

1480:

1481: The special orthogonal group $SO(3)$ is diffeomorphic to the real

1482: projective space $\RP^3$. The map $S^3 \to \RP^3$ which identifies

1483: each point on the sphere with the one-dimensional subspace on which

1484: it lies is a Hopf fibration whose fiber is $S^0 = \{-1,1\}$. Thus,

1485: $S^3$ is a double-cover of $SO(3)$ (and since $S^3$ is

1486: simply-connected, it is the universal cover).

1487:

1488: If we represent $S^3$ with the unit quaternions and $\RP^3$ with

1489: $SO(3)$, then the Hopf fibration above is represented by the

1490: Cayley-Klein map $\rho: S^3 \to SO(3)$:

1491: \[ \rho \left(

1492: \begin{array}{c}

1493: p_1 \\

1494: p_2 \\

1495: p_3 \\

1496: p_4

1497: \end{array} \right) = I + 2 p_1B + 2B^2 \text{, where } B = \left(

1498: \begin{array}{ccc}

1499: 0 & -p_4 & p_3 \\

1500: p_4 & 0 & -p_2 \\

1501: -p_3 & p_2 & 0

1502: \end{array} \right).

1503: \]

1504: We can use this map to relate the matrix von Mises

1505: density~\eqref{eqnMatrixVonMises} on $SO(3)$ to the Watson

1506: density~\eqref{eqnWatson} on $S^3$ by making the following

1507: observation.

1508: If $P = \rho(p)$ and $Q = \rho(q)$, then

1509: \[

1510: \tr(P^tQ) = 4 (p^t q)^2 - 1.

1511: \]

1512: Then if $\rho(a) = A$,

1513: \[

1514: \rho^{-1} \{ X \in SO(3) \ | \ f_{A,\kappa}(X) = r\} = \{ x \in S^3 \ | \

1515: f_{a,4\kappa}(x) = kr\} %\frac{d(4\kappa)e^{\kappa}r}{c(\kappa)} \}.

1516: \text{, where } k = \frac{d(4\kappa)e^{\kappa}}{c(\kappa)}.

1517: \]
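The constant $k$ can be checked with a short calculation. Writing $X = \rho(x)$ and using $\tr(X^tA) = 4(x^ta)^2 - 1$,
\[
f_{A,\kappa}(X) = c(\kappa)\exp\{\kappa(4(x^ta)^2-1)\}
= c(\kappa)e^{-\kappa}\exp\{4\kappa (x^ta)^2\}
= \frac{c(\kappa)e^{-\kappa}}{d(4\kappa)} f_{a,4\kappa}(x),
\]
so $f_{A,\kappa}(X) = r$ exactly when $f_{a,4\kappa}(x) = \frac{d(4\kappa)e^{\kappa}}{c(\kappa)}\, r = kr$.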

1518: It follows that

1519: \[

1520: \rho^{-1}(SO(3)_{\leq r}) = S^3_{\leq kr} \text{ and }

1521: \rho^{-1}(SO(3)_{\geq \frac{1}{r}}) = S^3_{\geq k\frac{1}{r}},

1522: % \text{, where } k = \frac{d(4\kappa)e^{\kappa}}{c(\kappa)}

1523: \]

1524: where the filtration on $S^3$ is with respect to the Watson density $f_{a,4\kappa}$.

1525:

1526: Recall (Section~\ref{sectionWatson}) that for $\frac{1}{\max f} \leq

1527: \frac{r}{k} < \frac{1}{\min f}$, $S^3_{\geq \frac{k}{r}}$ consists of two

1528: contractible components. The Hopf fibration $S^3 \to \RP^3$ and

1529: equivalently the map $\rho: S^3 \to SO(3)$ identify these two

1530: components. So $SO(3)_{\geq \frac{1}{r}}$ is contractible. Therefore,

1531: for the \v{C}ech filtration the Betti--$3$ barcode is the single homology

1532: interval $[\frac{1}{\min f}, \infty)$ and all other Betti--$k$ barcodes

1533: for $k\geq 1$ are empty. The Betti--$0$ function is identical to the

1534: one for the Watson density on $S^3$.

1535:

1536: For $\min f \leq kr < \max f$, $S^3_{\leq kr}$ is homotopy equivalent

1537: (via a projection onto its equator) to $S^2$. The Hopf fibration $S^3

1538: \to \RP^3$ restricted to the equator gives the Hopf fibration and

1539: double cover $S^2 \to \RP^2$. The homotopy equivalence $S^3_{\leq kr}

1540: \homoteq S^2$ induces a homotopy equivalence $SO(3)_{\leq r} \homoteq

1541: \RP^2$. Thus for the Morse filtration, the Betti--$0$ and Betti--$3$

1542: barcodes are the single homology intervals $[\min f, \infty)$ and

1543: $[\max f, \infty)$ and all Betti--$k$ barcodes for $k>3$ are empty.

1544: However, since the fundamental group and integral homology group of

1545: degree one of $\RP^2$ are the cyclic group of order two, the Betti--$1$

1546: and Betti--$2$ barcodes depend on the choice of the field of

1547: coefficients $\F$. If $\F$ is a field of characteristic $0$ (e.g. the

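1548: rationals) then both are empty. However if $\F$ is a field of

1549: characteristic two (e.g. $\Z/2\Z$), then both are the single homology

1550: interval $[\min f, \infty)$, since the inclusion $\RP^2 \to \RP^3$ induces isomorphisms on $H_1$ and $H_2$ with these coefficients.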

1551:

1552:

1553:

1554:

1555:

1556: \section{Statistical estimation of the Betti barcodes}

1557: \label{statestimation}

1558:

1559: In this section we will calculate the expected persistent homology

1560: using statistics sampled from various densities.

1561:

1562:

1563: \subsection{The von Mises and von Mises-Fisher distributions}

1564: For point cloud data $x_1, \ldots , x_n$ on $S^{p-1}$ sampled from the

1565: von Mises-Fisher distribution (\ref{vmf}): $f_{\mu,\kappa}(x) =

1566: c(\kappa)\exp\{\kappa x^t \mu \}$, we will give the statistical

1567: estimators for the (unknown) parameters. We will show that these can

1568: be used to obtain good estimates of the persistent homology of the

1569: underlying distribution.

1570:

1571: Letting $\bar x = \frac{1}{n} \sum_{i=1}^nx_i$ denote the sample mean,

1572: consider the decomposition

1573: \[ {\bar x} = \|{\bar x}\|\left(\tfrac{{\bar x}}{ \|{\bar x}\|}\right)

1574: \ \ .

1575: \]

1576: The statistical estimator for $\mu$ is ${\bar x}/\|{\bar x}\|$ while

1577: the statistical estimator for $\kappa$ is obtained~\cite[Section

1578: 10.3.1]{mardiaJupp:book} by solving $A_p(\hat \kappa) = \|{\bar

1579:   x}\|$, where $A_p(\lambda) =

1580: \tfrac{I_{p/2}(\lambda)}{I_{p/2-1}(\lambda)}$, and $I_\nu(\lambda)$

1581: is the modified Bessel function of the first kind and order $\nu$.

1582: Hence,

1583: \begin{equation}

1584: \label{est-kappa} {\hat \kappa} = A_p^{-1}(\|{\bar x}\|).

1585: \end{equation}

1586:
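The inverse $A_p^{-1}$ has no closed form in general, so \eqref{est-kappa} is usually solved numerically. One possible scheme, given here only as a sketch and not taken from the reference above, is Newton's method applied to $A_p(\lambda) - \|\bar x\| = 0$:
\[
\lambda_{m+1} = \lambda_m - \frac{A_p(\lambda_m) - \|\bar x\|}{A_p'(\lambda_m)},
\qquad
A_p'(\lambda) = 1 - A_p(\lambda)^2 - \frac{p-1}{\lambda}A_p(\lambda),
\]
started from some $\lambda_0 > 0$ and iterated until $|\lambda_{m+1}-\lambda_m|$ falls below a chosen tolerance. The starting value and the stopping rule are assumptions of this sketch. Since $A_p$ is strictly increasing on $(0,\infty)$ with values in $(0,1)$, a solution exists and is unique whenever $0 < \|\bar x\| < 1$.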

1587: A large sample asymptotic normality calculation for

1588: (\ref{est-kappa}) is~\cite[Section 10.3.1]{mardiaJupp:book}

1589: \begin{equation}

1590: \label{asymp-mse-kappa} \sqrt{n}\left( {\hat \kappa} - \kappa

1591: \right) \rightsquigarrow N\left(0, A_p'(\kappa)^{-1}\right),

1592: \end{equation}

1593: as $n \rightarrow \infty$, where $\rightsquigarrow$ means

1594: convergence in distribution and $N(0,\sigma^2)$ stands for a

1595: normally distributed random variable with mean 0 and variance

1596: $\sigma^2 > 0$. \iffalse A large sample calculation for

1597: (\ref{est-kappa}) is~\cite[Section 10.3.1]{mardiaJupp:book}

1598: \begin{equation}

1599: \label{asymp-mse-kappa} E\left( {\hat \kappa} - \kappa \right) =

1600: \frac{(p-1)A_p'(\kappa) - \kappa A_p''(\kappa)}{2\kappa

1601:   A_p'(\kappa)^2} \frac{1}{n} + O\left(\frac{1}{n^2}\right),

1602: \end{equation}

1603: as $n \rightarrow \infty$.\fi

1604: Using this estimate of $\kappa$ we

1605: obtain estimates for the $\beta_{\kappa}$ barcodes for the Morse and

1606: \v{C}ech filtrations. For the Morse filtration, we estimate the

1607: $\beta_0$ barcode and $\beta_{p-1}$ barcode to be

1608: $[c(\hat{\kappa})e^{-\hat{\kappa}},\infty]$ and

1609: $[c(\hat{\kappa})e^{\hat{\kappa}},\infty]$, respectively.  For the

1610: \v{C}ech filtration, we estimate the $\beta_{p-1}$ barcode to be

1611: $[\frac{e^{\hat{\kappa}}}{c(\hat{\kappa})},\infty]$.

1612:
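As a worked illustration, take $p=3$, where $c(\kappa) = \frac{\kappa}{\sinh(\kappa)}$. The three endpoints then simplify to
\[
c(\hat\kappa)e^{-\hat\kappa} = \frac{2\hat\kappa}{e^{2\hat\kappa}-1}, \qquad
c(\hat\kappa)e^{\hat\kappa} = \frac{2\hat\kappa}{1-e^{-2\hat\kappa}}, \qquad
\frac{e^{\hat\kappa}}{c(\hat\kappa)} = \frac{e^{2\hat\kappa}-1}{2\hat\kappa}.
\]
For a hypothetical estimate $\hat\kappa = 1$ (chosen only for illustration), these are approximately $0.313$, $2.313$ and $3.195$. The estimated Morse barcodes are then $[0.313,\infty]$ for $\beta_0$ and $[2.313,\infty]$ for $\beta_2$, and the estimated \v{C}ech $\beta_2$ barcode is $[3.195,\infty]$.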

1613: Recall that the space of barcodes has a metric $\mathcal{D}$ (see

1614: Definition~\ref{def:barcodeMetric}). Let $\beta_i^{M}(f)$ and

1615: $\beta_i^{\check{C}}(f)$ denote the Betti--$i$ barcode for the density

1616: $f$ using the Morse and \v{C}ech filtrations.  Then the expectations

1617: of the distance from the estimated persistent homology to the

1618: persistent homology of the underlying density can be bounded as follows.

1619:

1620: \begin{thm}

1621:   For the von Mises--Fisher distribution on $S^{p-1}$ and

1622:   $\kappa \in [\kappa_0, \kappa_1]$, where $0 < \kappa_0 \leq

1623:   \kappa_1 < \infty$,

1624:   \begin{equation*}

1625:     E (\mathcal{D}(\beta_i^M (f_{\hat{\kappa}}),\beta_i^M(f_{\kappa}))) \leq C(\kappa) n^{-1/2}

1626:   \end{equation*}

1627:   as $n \to \infty$ for all $i$, and

1628:   \begin{equation*}

1629:     E (\mathcal{D}(\beta_i^{\check{C}} (f_{\hat{\kappa}}), \beta_i^{\check{C}}(f_{\kappa}))) \leq C(\kappa) n^{-1/2}

1630:   \end{equation*}

1631:   as $n \to \infty$ for all $i \geq 1$, for some constant $C(\kappa)$.

1632: \end{thm}

1633:

1634: \begin{proof}

1635:   Since the barcodes have a particularly simple form, we only need to

1636:   know the barcode metric for the following case:

1637:   \begin{equation*}

1638:     \mathcal{D} ( \{ [a,\infty] \}, \{ [b,\infty] \} ) = |a-b|.

1639:   \end{equation*}

1640:   Using our previous calculations of the Betti barcodes, we have:

1641:   \begin{eqnarray*}

1642:     \mathcal{D} ( \beta_0^M (f_{\hat{\kappa}}), \beta_0^M(f_{\kappa}) )

1643:     & = & |

1644:     c(\hat{\kappa}) e^{-\hat{\kappa}} - c(\kappa) e^{-\kappa} | \\

1645:     \mathcal{D} ( \beta_{p-1}^M (f_{\hat{\kappa}}),

1646:     \beta_{p-1}^M(f_{\kappa}) ) & = & |

1647:     c(\hat{\kappa}) e^{\hat{\kappa}} - c(\kappa) e^{\kappa} | \\

1648:     \mathcal{D} ( \beta_{p-1}^{\check{C}} (f_{\hat{\kappa}}),

1649:     \beta_{p-1}^{\check{C}}(f_{\kappa}) ) & = & |

1650:     c(\hat{\kappa})^{-1} e^{\hat{\kappa}} - c(\kappa)^{-1} e^{\kappa} |.

1651:   \end{eqnarray*}

1652:

1653:   We note that the normalizing constant can be re-expressed as

1654:   \[

1655:   c(\kappa) = \frac{B\left(\frac{p-1}{2},\frac12\right)}{\int_{-1}^1

1656:   e^{\kappa t}(1-t^2)^{\frac{p-3}{2}}dt}  \ \ ,

1657:   \]

1658:   where $B(\cdot,\cdot)$ is the beta function.

1659:   Furthermore,

1660:   \[

1661:   c'(\kappa) = -B\left(\frac{p-1}{2},\frac12\right)\frac{\int_{-1}^1

1662:   e^{\kappa t}t(1-t^2)^{\frac{p-3}{2}}dt}{\left(\int_{-1}^1

1663:   e^{\kappa t}(1-t^2)^{\frac{p-3}{2}}dt\right)^2}  \ \

1664:   \]

1665:   and

1666:   \[

1667:   A_p'(\kappa) = 1 - A_p(\kappa)^2 - \frac{p-1}{\kappa}A_p(\kappa) \

1668:   \ .

1669:   \ \ \]

1670:

1671:   For $0 < \kappa_0 \leq \kappa_1 < \infty$ and $\kappa \in

1672:   \left[\kappa_0 , \kappa_1\right]$, we observe $0 < c(\kappa) ,

1673:   |c'(\kappa)|, A_p'(\kappa) < \infty$, and by the mean value theorem,

1674: \[E | c(\hat{\kappa}) e^{ \hat{\kappa}} - c(\kappa) e^{\kappa}|

1675:   = E | (c(\kappa^*)+c'(\kappa^*)) e^{{\kappa}^*} ({\hat

1676:   \kappa}-\kappa)| \ \ ,

1677: \]

1678: where $\kappa^*$ is a value between $\hat \kappa$ and $\kappa$.

1679: Consequently,

1680: \begin{eqnarray*} E | c(\hat{\kappa}) e^{

1681: \hat{\kappa}} - c(\kappa) e^{\kappa}| &\leq& \bar{C}(\kappa)

1682: \left\{E|{\hat \kappa}-\kappa|^2\right\}^{1/2} \\

1683: &\leq& C(\kappa)n^{-1/2}

1684: \end{eqnarray*}

1685: where the first inequality is by the H\"{o}lder inequality, and the

1686: second is by~\eqref{asymp-mse-kappa}.

1687:

1688: Similarly,

1689: \[E | c(\hat{\kappa}) e^{ -\hat{\kappa}} - c(\kappa) e^{-\kappa}|

1690:   = E | (c'(\kappa^*)-c(\kappa^*)) e^{-{\kappa}^*} ({\hat

1691:   \kappa}-\kappa)| \ \ ,

1692: \]

1693: and

1694: \[E \left| \frac{e^{ \hat{\kappa}}}{c(\hat{\kappa})} - \frac{e^{\kappa}}{c(\kappa)}\right|

1695:   = E \left| \left(\frac{c(\kappa^*)-c'(\kappa^*)}{c(\kappa^*)^2}\right) e^{{\kappa}^*} ({\hat

1696:   \kappa}-\kappa)\right| \ \ . \qedhere

1697: \]

1698: \end{proof}

1699:

1700: Expressing the estimated $\beta_0$-function is more challenging.  For

1701: the case of the sphere $S^2$, an exact expression can be obtained.

1702: One can calculate that $c(\kappa) = \frac{\kappa}{\sinh(\kappa)}$, and

1703: from \eqref{eqn:g_kappa},

1704: \[

1705: g_{\kappa}(r) = \frac{e^{\kappa}}{2\sinh(\kappa)} - \frac{1}{2\kappa

1706:   r},

1707: \]

1708: from which we use (\ref{0-betti-sphere}) to obtain,

1709: \begin{equation} \label{0-betti-2}

1710:   \beta_0(x , \kappa ) = \frac{e^{2\kappa}-1}{2\kappa[(1-x)e^{2\kappa}+x]}

1711: \end{equation}

1712: for $x \in (0,1]$ and $\kappa > 0$.  Notice that $\beta_0(x , \kappa )

1713: \rightarrow 1$ as $\kappa \rightarrow 0$ and $\beta_0(x , \kappa )

1714: \rightarrow 0$ as $\kappa \rightarrow \infty$, for all $x \in (0,1)$.

1715: Furthermore, for (\ref{est-kappa}), \cite[9.3.9]{mardiaJupp:book}

1716: \begin{equation}

1717: \label{A_3}

1718: A_3(\kappa) = \coth \kappa - \tfrac 1 {\kappa} \ \ .  \end{equation}

1719:
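The closed form \eqref{0-betti-2} can be checked by solving $x = g_{\kappa}(r)$ for $r$. In outline,
\[
\frac{1}{2\kappa r} = \frac{e^{\kappa}}{2\sinh(\kappa)} - x
\quad\Longrightarrow\quad
r = \frac{\sinh(\kappa)}{\kappa\left(e^{\kappa} - 2x\sinh(\kappa)\right)}
= \frac{e^{2\kappa}-1}{2\kappa\left[(1-x)e^{2\kappa}+x\right]},
\]
where the last step multiplies numerator and denominator by $2e^{\kappa}$. As a consistency check, $x=1$ gives $\frac{e^{2\kappa}-1}{2\kappa} = \frac{e^{\kappa}}{c(\kappa)}$, the left endpoint in \eqref{betti-k}, and letting $x \to 0$ gives $\frac{1-e^{-2\kappa}}{2\kappa} = \frac{e^{-\kappa}}{c(\kappa)}$, the reciprocal of the maximum of the density.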

1720: We have the following:

1721: \begin{thm} For the von Mises-Fisher distribution on $S^2$, and fixed

1722:   $\kappa > 0$,

1723:   \[

1724:   E \left|\left| \beta_0(x,{\hat \kappa}) -

1725:     \beta_0(x,\kappa) \right|\right|_{\infty} \leq C(\kappa) n^{-1/2} \ \ ,

1726:   \]

1727:   as $n \rightarrow \infty$.

1728: \end{thm}

1729:

1730: \begin{proof}

1731:   By the mean value theorem,

1732:   \begin{equation} \label{eqn:mvt} \beta_0(x,{\hat \kappa})-

1733:       \beta_0(x,\kappa) = \frac{\partial}{\partial

1734:         \kappa} \beta_0(x,\tilde{\kappa}) ({\hat \kappa} - \kappa)

1735:       \ ,

1736:   \end{equation}

1737:   where $\tilde{\kappa}$ is between $\hat{\kappa}$ and $\kappa$.

1738:   One can calculate that

1739:   \[

1740:   \frac {\partial}{\partial {\kappa}}\beta_0(x,\kappa) =

1741:   \frac{-(1-x)e^{4\kappa} + (1+2\kappa

1742:       -2x)e^{2\kappa} + x}

1743:   {2\kappa^2\left[(1-x)e^{2\kappa}+x\right]^2} \ \ .

1744:   \]

1745:   Recall that the domain of $\beta_0(x,\kappa)$ is $(0,1]$.  For $x

1746:   \in (0,1]$, $\left| \frac{\partial}{\partial \kappa}

1747:     \beta_0(x,\kappa) \right|$ is bounded: for instance,

1748:   \begin{equation} \label{eqn:ddkappaBound} \left| \frac

1749:       {\partial}{\partial {\kappa}}\beta_0(x,\kappa)\right| \leq

1750:     \frac{e^{4\kappa} + (1+2\kappa)e^{2\kappa}+1}{2\kappa^2} \ \ .

1751:   \end{equation}

1752:   Combining \eqref{eqn:mvt}, \eqref{eqn:ddkappaBound}, \eqref{asymp-mse-kappa} and

1753:   \eqref{A_3} produces the desired result.

1754: \end{proof}

1755:
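The derivative used in the proof follows from the quotient rule. As a sketch, write $u = e^{2\kappa}$, $N = u - 1$ and $D = 2\kappa[(1-x)u + x]$ (notation introduced only for this check), so that $\beta_0(x,\kappa) = N/D$. Then
\[
\frac{\partial N}{\partial\kappa} = 2u, \qquad
\frac{\partial D}{\partial\kappa} = 2[(1-x)u+x] + 4\kappa(1-x)u,
\]
and
\[
\frac{\partial N}{\partial\kappa}D - N\frac{\partial D}{\partial\kappa}
= 2\left[-(1-x)u^2 + (1+2\kappa-2x)u + x\right].
\]
Dividing by $D^2 = 4\kappa^2[(1-x)u+x]^2$ recovers the stated expression. The bound \eqref{eqn:ddkappaBound} then uses $(1-x)u + x \geq 1$ and $|1+2\kappa-2x| \leq 1+2\kappa$ for $x \in [0,1]$.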

1756:

1757: \subsection{The Watson distribution}

1758:

1759: Recall that the Watson distribution on $S^{p-1}$ is given by

1760: \begin{equation} \label{eqn:watson}

1761: f_{\mu,\kappa}(x) = d(\kappa) \exp \{ \kappa (x^t \mu)^2 \} \text{, where }

1762: \mu \in S^{p-1} \text{ and } \kappa > 0.

1763: \end{equation}

1764: Let us parametrize $\mu$ using the spherical angles: $\mu = \mu(\phi)$, where $\phi = (\phi_1, \ldots, \phi_{p-1})^t$.

1765: Let $X_1, \ldots, X_n$ be a random sample from the Watson distribution.

1766:

1767: If we take the sample to be fixed and the underlying parameters to be unknown, then the log-likelihood function of \eqref{eqn:watson} is given by:

1768: \begin{equation*}

1769:   \ell(\phi,\kappa) = n \log d(\kappa) + \kappa \sum_{j=1}^n (X_j^t \mu(\phi))^2.

1770: \end{equation*}

1771: The maximum likelihood estimation of $\mu$ and $\kappa$ comes from the estimating equation:

1772: \begin{equation} \label{eqn:gradient}

1773:   \nabla_{\phi,\kappa} \ell(\phi,\kappa) = 0,

1774: \end{equation}

1775: where $\nabla_{\phi,\kappa}$ denotes the gradient.

1776: Let $\hat{\phi}$ and $\hat{\kappa}$ be the solutions to \eqref{eqn:gradient}, which are the maximum likelihood estimators.

1777: Then the standard theory of maximum likelihood estimators~\cite[pp.294-296]{coxHinkley:book} shows that the large sample asymptotics satisfy:

1778: \begin{equation}

1779:   \label{eqn:asymptotics}

1780:   \sqrt{n}\left[ \left( \begin{array}{c} \hat{\phi} \\ \hat{\kappa} \end{array} \right) - \left( \begin{array}{c} {\phi} \\ {\kappa} \end{array} \right) \right] \to_d N_p(0,I(\phi,\kappa)^{-1})

1781: \end{equation}

1782: as $n \to \infty$, where ``$\to_d$'' means convergence in

1783: distribution, $I(\phi,\kappa)$ is the Fisher information

1784: matrix\footnote{The Fisher information matrix is defined to be

1785:   $I(\phi,\kappa) = -E\nabla^2_{\phi,\kappa} \ell(\phi,\kappa)$, where

1786:   $\nabla^2_{\phi,\kappa}$ is the $p \times p$ Hessian matrix.} and

1787: $N_p$ stands for the $p$-dimensional normal distribution with given

1788: mean and covariance.

1789: It turns out that in the case of the Watson distribution,

1790: \begin{equation*}

1791:   I(\phi,\kappa) = \left[ \begin{array}{ccc} * & \vline & 0 \\ \hline 0 & \vline & -\frac{\partial^2}{\partial \kappa^2} \log d(\kappa) \end{array} \right].

1792: \end{equation*}

1793: Consequently, from~\eqref{eqn:asymptotics}, we have that %marginally

1794: \begin{equation*}

1795:   \sqrt{n} (\hat{\kappa} -\kappa) \to_d N_1 \left( 0, - \left( \frac{\partial^2}{\partial \kappa^2} \log d(\kappa) \right)^{-1} \right),

1796: \end{equation*}

1797: as $n \to \infty$.

1798:
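The asymptotic variance can be written in a more familiar form. As a sketch based on the standard exponential family identity (not stated in the text above), for fixed $\mu$ the Watson density has the form $d(\kappa)\exp\{\kappa T(x)\}$ with $T(x) = (x^t\mu)^2$, and
\[
-\frac{\partial}{\partial\kappa}\log d(\kappa) = E_{\kappa}[T(X)], \qquad
-\frac{\partial^2}{\partial\kappa^2}\log d(\kappa) = \operatorname{Var}_{\kappa}(T(X)).
\]
Under this identity the limiting variance of $\sqrt{n}(\hat\kappa-\kappa)$ is $1/\operatorname{Var}_{\kappa}((X^t\mu)^2)$, which is finite whenever $(X^t\mu)^2$ is not almost surely constant.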

1799:

1800:

1801: \bibliographystyle{halpha}

1802: \bibliography{my}

1803:

1804:

1805: \end{document}

1806:

1807: