\section{Relationship with Cluster Analysis}

The idea of analyzing a large body of empirical data and of partitioning
it into sets of ``similar values'' has been well studied in the theory of
Cluster Analysis (\eg~see~\cite{KM}). The overall aim of Cluster Analysis
is to separate the original data into clusters where the members of each
cluster are much more similar to each other than to members of other clusters.
In contrast, our methods are more concerned with thinning out groups of
very close values while ignoring more distant points.
Below we show how Ward's ``classical'' algorithm~\cite{RR}, an agglomerative
hierarchical method, and Li's more recent algorithm~\cite{Li}, a divisive
hierarchical method, partition the empirical points of Example~\ref{ex11}.

\begin{ex}
Let $\mathbb X^\varepsilon$ be the set of empirical points whose set
of specified values is given in Example~\ref{ex11};
similarly, let $\varepsilon=(1.43,1.43)$ as given there. We recall
that in Examples~\ref{ex31} and~\ref{ex32} both our algorithms AA
and DA obtained the minimal partition into collapsable sets, as
illustrated in Figure~\ref{fig1}.

Ward's and Li's algorithms do not obtain this minimal partition.
In fact, after~$8$ steps, Ward's algorithm puts the points $(5,-2.9)$ and
$(5,0)$ into the same cluster, while the first nine points of $\mathbb X$
still belong to different clusters. Since this is an agglomerative method,
no cluster is ever split during the computation, so Ward's
algorithm fails to recognise the collapsable set of nine points. In a similar
vein, Li's algorithm goes astray at the third step: it divides the first
nine points of $\mathbb X$ into two subsets while the points $(5,-2.9)$ and
$(5,0)$ still belong to the same cluster. Since this is a divisive hierarchical
method, a set that has been split can never be joined together again, so Li's
algorithm needlessly splits the collapsable set of nine points.
\end{ex}
\smallskip
Now we consider another method of Cluster Analysis, QT~Clustering~\cite{HKY},
because it has a number of similarities to our methods, especially AA. QT~Clustering
computes a partition of the input data subject to a given limit on the diameter of the
clusters. It builds clusters with a view to maximizing their cardinality, whereas we are
primarily interested in the local geometrical separations of the input data.

\begin{ex}
Let $\mathbb X^\varepsilon$ be a set of empirical points with tolerance
$\varepsilon=(0.5)$ and with specified values
$\mathbb X =\{ 0, \;0.05,\; 0.9,\;1,\;1.2 \} \subseteq \R$.
Applying the QT~Clustering algorithm with maximum cluster
diameter equal to~$2\varepsilon$, we obtain the partition~$\bigl\{\{ 0,
0.05,0.9,1\}, \; \{1.2\}\bigr\}$, where $\{ 0,
0.05,0.9,1\}^\varepsilon$ is not a collapsable set.
In contrast, if we apply AA or DA to $\mathbb X^\varepsilon$, we obtain the more
balanced partition $\bigl\{ \{0, 0.05\},\; \{0.9,1,1.2\}\bigr\}$
whose elements consist of the specified values of collapsable sets.
We maintain that our partition is more
plausible as a grouping of noisy data.
\end{ex}
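
The QT~Clustering partition in the example above can be checked with a minimal Python sketch. It is an illustration written for this one-dimensional data set, not the implementation of~\cite{HKY}: the names \texttt{diameter}, \texttt{qt\_candidate} and \texttt{qt\_clustering} are hypothetical, the points are assumed to be distinct real numbers, and ties between equally large candidate clusters are broken in favour of the first one found. The sketch grows one candidate cluster from each point, keeps the candidate of largest cardinality, removes its points and repeats.

```python
def diameter(cluster):
    """Diameter of a finite set of reals: the largest pairwise distance."""
    return max(cluster) - min(cluster)


def qt_candidate(seed, points, max_diameter):
    """Grow a candidate cluster from `seed` (points assumed distinct).

    At each step add the point that keeps the diameter smallest, and stop
    as soon as any further addition would exceed `max_diameter`.
    """
    cluster = [seed]
    remaining = [p for p in points if p != seed]
    while remaining:
        best = min(remaining, key=lambda p: diameter(cluster + [p]))
        if diameter(cluster + [best]) > max_diameter:
            break
        cluster.append(best)
        remaining.remove(best)
    return cluster


def qt_clustering(points, max_diameter):
    """Repeatedly extract the largest candidate cluster until no points remain."""
    points = list(points)
    clusters = []
    while points:
        candidates = [qt_candidate(s, points, max_diameter) for s in points]
        largest = max(candidates, key=len)  # first candidate wins on ties
        clusters.append(sorted(largest))
        points = [p for p in points if p not in largest]
    return clusters


# Specified values and tolerance from the example; the diameter limit is 2 * 0.5.
X = [0, 0.05, 0.9, 1, 1.2]
print(qt_clustering(X, 2 * 0.5))  # [[0, 0.05, 0.9, 1], [1.2]]
```

The first cluster has diameter exactly $1$, so it is admissible for QT~Clustering even though it spans the gap between $0.05$ and $0.9$; this is the sense in which the method favours cardinality over local separation.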