% math0702327/clu_analysis_new.tex
\section{Relationship with Cluster Analysis}

The idea of analyzing a large body of empirical data and partitioning
it into sets of ``similar values'' has been well studied in the theory of
Cluster Analysis (\eg~see~\cite{KM}).  The overall aim of Cluster Analysis
is to separate the original data into clusters where the members of each
cluster are much more similar to each other than to members of other clusters.
In contrast, our methods are more concerned with thinning out groups of
very close values while ignoring more distant points.
Below we show how Ward's ``classical'' algorithm~\cite{RR}, an agglomerative
hierarchical method, and Li's more recent algorithm~\cite{Li}, a divisive
hierarchical method, partition the empirical points of Example~\ref{ex11}.

\begin{ex}
  Let $\mathbb X^\varepsilon$ be the set of empirical points whose set
  of specified values is given in Example~\ref{ex11};
  similarly, let $\varepsilon=(1.43,1.43)$ as given there.  We recall
  that in Examples~\ref{ex31} and~\ref{ex32} both our algorithms AA
  and DA obtained the minimal partition into collapsable sets, as
  illustrated in Figure~\ref{fig1}.

Ward's and Li's algorithms do not obtain this minimal partition.
In fact, after~$8$ steps, Ward's algorithm puts the points $(5,-2.9)$ and
$(5,0)$ into the same cluster, while the first nine points of $\mathbb X$
still belong to different clusters.  Since this is an agglomerative method,
no set of points is split during the computation, so Ward's
algorithm fails to recognise the collapsable set of nine points.  In a similar
vein, Li's algorithm goes astray at the third step: it divides the first
nine points of $\mathbb X$ into two subsets while the points $(5,-2.9)$ and
$(5,0)$ still belong to the same cluster.  Since this is a hierarchical divisive
method, once a set is split it can never be joined together again, so Li's algorithm
needlessly splits the collapsable set of nine points.
\end{ex}
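
To indicate why an agglomerative method can behave in this way, we recall
the usual textbook form of Ward's criterion.  We assume this form here,
since the precise variant of~\cite{RR} is not restated in this section.
At each step the method merges the two clusters $A$ and $B$ which minimise
the increase in the total within-cluster sum of squares,
\[
  \Delta(A,B) \;=\; \frac{|A|\,|B|}{|A|+|B|}\;\|c_A - c_B\|_2^2 ,
\]
where $c_A$ and $c_B$ denote the centroids of $A$ and $B$; the symbols
$\Delta$, $c_A$ and $c_B$ are introduced only for this illustration.
For instance, if $(5,-2.9)$ and $(5,0)$ are both still singleton clusters
when they are merged, and if distances are unweighted Euclidean, then the
cost of that merge is $\frac{1}{2}\cdot 2.9^2 = 4.205$.  In this formulation
the criterion only compares candidate merges with one another and makes no
reference to the tolerance~$\varepsilon$.
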

\smallskip
Now we consider another method of Cluster Analysis, QT~Clustering~\cite{HKY},
because it has a number of similarities to our methods, especially AA.  QT~Clustering
computes a partition of the input data using a given limit on the diameter of the
clusters.  At each stage it selects the candidate cluster of largest cardinality,
whereas we are primarily interested in the local geometrical separations of the input data.

\begin{ex}
  Let $\mathbb X^\varepsilon$ be a set of empirical points with tolerance
  $\varepsilon=(0.5)$ and with specified values
  $\mathbb X =\{ 0, \;0.05,\; 0.9,\;1,\;1.2 \} \subseteq \R$.
  Applying the QT~Clustering algorithm with maximum cluster
  diameter equal to~$2\varepsilon$, we obtain the
  partition~$\bigl\{\{0, 0.05, 0.9, 1\}, \; \{1.2\}\bigr\}$,
  where $\{0, 0.05, 0.9, 1\}^\varepsilon$ is not a collapsable set.
  In contrast, if we apply AA or DA to $\mathbb X^\varepsilon$, we obtain the more
  balanced partition $\bigl\{ \{0, 0.05\},\; \{0.9,1,1.2\}\bigr\}$,
  each of whose elements is the set of specified values of a collapsable set.
  We maintain that our partition is more
  plausible as a grouping of noisy data.
\end{ex}

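
\smallskip
The two partitions in the preceding example can be checked by a short
computation.  The following is only a sketch, under two assumptions which
are not restated in this section: the diameter bound in QT~Clustering is
taken to be non-strict, and a set is regarded as collapsable when every
specified value lies within the tolerance of the centroid of the set.
The notation $\mathrm{diam}$ (diameter) and $c(\cdot)$ (centroid) is
introduced only for this check.  Since $2\varepsilon = 1$, we have
\[
  \mathrm{diam}\{0,\,0.05,\,0.9,\,1\} = 1 \le 1 ,
  \qquad
  \mathrm{diam}\{0,\,0.05,\,0.9,\,1,\,1.2\} = 1.2 > 1 .
\]
Moreover, any subset of cardinality~$4$ containing $1.2$ must also contain
$0$ or $0.05$, so its diameter is at least $1.2 - 0.05 = 1.15 > 1$.  Hence
$\{0, 0.05, 0.9, 1\}$ is the only admissible cluster of cardinality~$4$,
and it is the one selected first, leaving $\{1.2\}$.  Under the assumed
centroid criterion this cluster is not collapsable:
\[
  c\{0,\,0.05,\,0.9,\,1\} = \frac{1.95}{4} = 0.4875 ,
  \qquad
  |1 - 0.4875| = 0.5125 > 0.5 .
\]
For the partition produced by AA and DA the same check succeeds:
\[
  c\{0,\,0.05\} = 0.025 ,
  \qquad
  c\{0.9,\,1,\,1.2\} = \frac{3.1}{3} \approx 1.033 ,
\]
and the largest distances from these centroids are $0.025$ and
approximately $0.167$ respectively, both well below $0.5$.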