\begin{abstract}
The goal of data clustering is to partition data points into groups
so as to optimize a given objective function. While most existing clustering
algorithms treat each data point as a vector, in many applications each
datum is not a vector but a point pattern or a set of points. Moreover,
many existing clustering methods require the user to specify the number
of clusters, which is often not known in advance. This paper proposes
a new class of models for data clustering that addresses set-valued
data as well as an unknown number of clusters, using a Dirichlet process
mixture of Poisson random finite sets. We also develop an efficient
Markov chain Monte Carlo posterior inference technique that can learn
the number of clusters and mixture parameters automatically from the
data. Numerical studies are presented to demonstrate the salient features
of this new model, in particular its capacity to discover extremely
unbalanced clusters in data.

\input{macros.tex}
\end{abstract}