\begin{abstract} %{{{

Hierarchical clustering is a popular method for analyzing data which associates
a tree to a dataset. Hartigan consistency has been used extensively as a
framework to analyze such clustering algorithms from a statistical point of
view. Still, as we show in this paper, a tree which is Hartigan consistent with a
given density can look very different from the correct limit tree. Specifically,
Hartigan consistency permits two types of undesirable configurations which we
term \emph{over-segmentation} and \emph{improper nesting}. Moreover, Hartigan
consistency is a limit property and does not directly quantify the difference
between trees.

In this paper we identify two limit properties, \emph{separation} and
\emph{minimality}, which address both over-segmentation and improper nesting and
together imply (but are not implied by) Hartigan consistency. We proceed to
introduce a \emph{merge distortion metric} between hierarchical clusterings and
show that convergence in this metric implies both separation and minimality. We
also prove that uniform separation and minimality imply convergence in the merge
distortion metric. Furthermore, we show that our merge distortion metric is
stable under perturbations of the density.

Finally, we demonstrate the applicability of these concepts by proving convergence
results for two clustering algorithms. First, we show convergence (and hence
separation and minimality) of the recent robust single linkage algorithm of
\cite{chaudhuri_2010}. Second, we provide convergence results on
manifolds for topological split tree clustering.
\end{abstract}