\begin{abstract}

The goal of data clustering is to partition data points into groups
so as to optimize a given objective function. While most existing
clustering algorithms treat each data point as a vector, in many
applications each datum is not a vector but a point pattern, or set
of points. Moreover, many existing clustering methods require the user
to specify the number of clusters, which is often not known in advance.
This paper proposes a new class of models for data clustering that
handles both set-valued data and an unknown number of clusters, using
a Dirichlet process mixture of Poisson random finite sets. We also
develop an efficient Markov chain Monte Carlo posterior inference
technique that learns the number of clusters and the mixture parameters
automatically from the data. Numerical studies demonstrate the salient
features of this new model, in particular its capacity to discover
extremely unbalanced clusters in data.

\input{macros.tex}
\end{abstract}