\begin{abstract}
Archetypal analysis is an unsupervised learning method that uses a convex polytope to summarize multivariate data.
For a fixed $k$, the method finds a convex polytope with $k$ vertices, called \emph{archetype points}, such that the polytope is contained in the convex hull of the data and the mean squared distance between the data and the polytope is minimal.
In this paper, we prove a consistency result showing that, if the data is independently sampled from a probability measure with bounded support, then the archetype points converge to a solution of the continuum version of the problem, for which we identify and establish several properties.
We also obtain the convergence rate of the optimal objective values under appropriate assumptions on the distribution.
For data independently sampled from a distribution with unbounded support, we prove a consistency result for a modified method that penalizes the dispersion of the archetype points.
Our analysis is supported by detailed computational experiments that compute the archetype points for data sampled from the uniform distribution on a disk, the normal distribution, an annular distribution, and a Gaussian mixture model.
\end{abstract}
