
\begin{abstract}

Persistence diagrams (PDs) are the most common descriptors used to encode the topology of structured data appearing in challenging learning tasks, e.g.,~graphs, time series, or point clouds sampled close to a manifold.

Given random objects and the corresponding distribution of PDs, one may want to build a statistical summary, such as a mean, of these random PDs. This is not a trivial task, however, because the natural geometry of the space of PDs is not linear.

In this article, we study two such summaries: the Expected Persistence Diagram (EPD) and its quantization. The EPD is a measure supported on $\R^2$, which may be approximated by its empirical counterpart. We prove that this estimator is optimal from a minimax standpoint on a large class of models, with a parametric rate of convergence. The empirical EPD is simple and efficient to compute, but its support may be very large, which hinders its use in practice. To overcome this issue, we propose an algorithm to compute a quantization of the empirical EPD, a measure with small support that is shown to approximate a quantization of the theoretical EPD at near-optimal rates.

\end{abstract}
