\begin{abstract}

The problem of multimodal clustering arises whenever the data are
gathered with several physically different sensors. Observations
from different modalities are not necessarily aligned, in the sense
that there is no obvious way to associate or compare them in some
common space. One possible solution is to perform a separate
clustering task for each modality. The main difficulty with such an
approach is to guarantee that the unimodal clusterings are mutually
consistent. In this paper we show that multimodal clustering can be
addressed within a novel framework, namely \textit{conjugate mixture
models}. These models exploit the explicit transformations that are
often available between an unobserved parameter space (objects) and
each of the observation spaces (sensors). We formulate the problem
as a likelihood maximization task and derive the associated
\textit{conjugate expectation-maximization} algorithm. The
convergence properties of the proposed algorithm are thoroughly
investigated. Several local and global optimization techniques are
proposed in order to increase its convergence speed. Two
initialization strategies are proposed and compared, and a
consistent model-selection criterion is introduced. The algorithm
and its variants are tested and evaluated on the task of 3D
localization of several speakers using both auditory and visual
data.

\end{abstract}