
\begin{abstract}

Finite mixture modelling provides a framework for cluster analysis based on parsimonious Gaussian mixture models.

Variable or feature selection is of particular importance in situations where only a subset of the available variables provides clustering information.

Selecting such a subset leads to a more parsimonious model, which yields more efficient estimates, a clearer interpretation and, often, improved clustering partitions. This paper describes the R package \pkg{clustvarsel}, which performs subset selection for model-based clustering.

An improved version of the methodology of Raftery and Dean (2006) is implemented in the new version 2 of the package to find the (locally) optimal subset of variables carrying group/cluster information in a dataset. The search over the solution space is performed using either a stepwise greedy search or a headlong algorithm. Adjustments for speeding up these algorithms are discussed, as well as a parallel implementation of the stepwise search.

Usage of the package is presented through the discussion of several data examples. \\


\noindent {\it Keywords:} model-based clustering, BIC, subset selection, R.

\end{abstract}
