abstract:c64fff20905818c5.tex

1: \begin{abstract}

2: For a density $f$ on $\R^d$, a {\it high-density cluster} is any

3: connected component of $\{x: f(x) \geq \lambda\}$, for some $\lambda > 0$.

4: The set of all high-density clusters forms a hierarchy called the

5: {\it cluster tree} of $f$. We present two procedures for estimating

6: the cluster tree given samples from $f$. The first is a robust

7: variant of the single linkage algorithm for hierarchical clustering.

8: The second is based on the $k$-nearest neighbor graph of the samples.

9: We give finite-sample convergence rates for these algorithms which also

10: imply consistency, and we derive lower bounds on the sample complexity

11: of cluster tree estimation. Finally, we study a tree pruning procedure

12: that guarantees, under milder conditions than usual, to remove clusters

13: that are spurious while recovering those that are salient.

14: \end{abstract}

15: