
\section{Relationship with Cluster Analysis}

The idea of analyzing a large body of empirical data and of partitioning
it into sets of ``similar values'' has been well studied in the theory of
Cluster Analysis (\eg~see~\cite{KM}).  The overall aim of Cluster Analysis
is to separate the original data into clusters where the members of each
cluster are much more similar to each other than to members of other clusters.
In contrast, our methods are more concerned with thinning out groups of
very close values while ignoring more distant points.
Below we show how Ward's ``classical'' algorithm~\cite{RR}, an agglomerative
hierarchical method, and Li's more recent algorithm~\cite{Li}, a divisive
hierarchical method, partition the empirical points of Example~\ref{ex11}.

\begin{ex}
  Let $\mathbb X^\varepsilon$ be the set of empirical points whose set
  of specified values is given in Example~\ref{ex11};
  similarly, let $\varepsilon=(1.43,1.43)$ as given there.  We recall
  that in Examples~\ref{ex31} and~\ref{ex32} both our algorithms AA
  and DA obtained the minimal partition into collapsable sets, as
  illustrated in Figure~\ref{fig1}.

  Ward's and Li's algorithms do not obtain this minimal partition.
  In fact, after~$8$ steps, Ward's algorithm puts the points $(5,-2.9)$ and
  $(5,0)$ into the same cluster, while the first nine points of $\mathbb X$
  still belong to different clusters.  Since this is an agglomerative method,
  no set of points is split during the computation, so Ward's
  algorithm fails to recognise the collapsable set of nine points.  In a similar
  vein, Li's algorithm goes astray at the third step: it divides the first
  nine points of $\mathbb X$ into two subsets while the points $(5,-2.9)$ and
  $(5,0)$ still belong to the same cluster.  Since this is a divisive hierarchical
  method, a set that has been split can never be joined together again, so Li's algorithm
  needlessly splits the collapsable set of nine points.
\end{ex}

\smallskip
Now we consider another method of Cluster Analysis, QT~Clustering~\cite{HKY},
because it has a number of similarities to our methods, especially AA.  QT~Clustering
computes a partition of the input data using a given limit on the diameter of the
clusters.  It builds clusters so as to maximize their cardinality, whereas we are
primarily interested in the local geometrical separations of the input data.

\begin{ex}
  Let $\mathbb X^\varepsilon$ be a set of empirical points with tolerance
  $\varepsilon=(0.5)$ and with specified values
  $\mathbb X =\{ 0, \;0.05,\; 0.9,\;1,\;1.2 \} \subseteq \R$.
  Applying the QT~Clustering algorithm with maximum cluster
  diameter equal to~$2\varepsilon$, we obtain the partition
  $\bigl\{\{ 0, 0.05, 0.9, 1\}, \; \{1.2\}\bigr\}$, where
  $\{ 0, 0.05, 0.9, 1\}^\varepsilon$ is not a collapsable set.
  In contrast, if we apply AA or DA to $\mathbb X^\varepsilon$, we obtain the more
  balanced partition $\bigl\{ \{0, 0.05\},\; \{0.9,1,1.2\}\bigr\}$
  whose elements consist of specified values of collapsable sets.
  We maintain that our partition is more
  plausible as a grouping of noisy data.
\end{ex}
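
\smallskip
The QT~Clustering outcome in the preceding example can be checked by hand.
The following is only a sketch under two assumptions which are not stated in the example:
that the diameter of a cluster is its largest pairwise distance, and that
each candidate cluster is grown greedily by adding the point which increases
the diameter least, stopping when the limit would be exceeded.
With the limit $2\varepsilon = 1$, the candidate cluster grown from the point~$0$ passes through
\[
\begin{array}{rcl}
\mathop{\mathrm{diam}}\{0,\,0.05\} &=& 0.05,\\
\mathop{\mathrm{diam}}\{0,\,0.05,\,0.9\} &=& 0.9,\\
\mathop{\mathrm{diam}}\{0,\,0.05,\,0.9,\,1\} &=& 1 \;=\; 2\varepsilon,\\
\mathop{\mathrm{diam}}\{0,\,0.05,\,0.9,\,1,\,1.2\} &=& 1.2 \;>\; 2\varepsilon,
\end{array}
\]
so it stops with four elements.  The candidate grown from~$1.2$ passes through
\[
\begin{array}{rcl}
\mathop{\mathrm{diam}}\{1,\,1.2\} &=& 0.2,\\
\mathop{\mathrm{diam}}\{0.9,\,1,\,1.2\} &=& 0.3,\\
\mathop{\mathrm{diam}}\{0.05,\,0.9,\,1,\,1.2\} &=& 1.15 \;>\; 2\varepsilon,
\end{array}
\]
so it stops with three elements.  Under the same assumptions the candidates grown
from~$0.9$ and~$1$ also stop at $\{0.9,\,1,\,1.2\}$, while the candidate grown
from~$0.05$ reaches $\{0,\,0.05,\,0.9,\,1\}$.  The largest candidate therefore has
four elements and is selected first, which leaves $\{1.2\}$ on its own.
For comparison, the two sets in the partition produced by AA and DA have
diameters $0.05$ and $0.3$ respectively.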
