cs0303025/music.tex
1: \documentclass{article}
2: 
3: \usepackage{fullpage,latexsym}
4: \usepackage{epsfig}
5: \usepackage{pslatex}
6: 
7: 
8: \bibliographystyle{plain}
9: 
10: \begin{document}
11: 
12: \title{Algorithmic Clustering of Music}
13: \author{Rudi Cilibrasi\thanks{Supported in part by NWO.
14: Address:
15: CWI, Kruislaan 413,
16: 1098 SJ Amsterdam, The Netherlands.
17: Email: {\tt Rudi.Cilibrasi@cwi.nl}.}
18: \\CWI
19: \and
20: Paul Vitanyi\thanks{Supported in part by the
21: EU project RESQ, IST--2001--37559, the NoE QUIPROCONE
IST--1999--29064,
the ESF QiT Programme, and the EU Fourth Framework BRA
24:  NeuroCOLT II Working Group
25: EP 27150. 
26: Address:
27: CWI, Kruislaan 413,
28: 1098 SJ Amsterdam, The Netherlands.
29: Email: {\tt Paul.Vitanyi@cwi.nl}.}\\CWI and University of Amsterdam
30: \and
31: Ronald de Wolf\thanks{Supported in part by EU project RESQ, IST-2001-37559.
32: Address:
33: CWI, Kruislaan 413,
34: 1098 SJ Amsterdam, The Netherlands.
35: Email: {\tt Ronald.de.Wolf@cwi.nl}.}\\
36: CWI}
37: \date{}
38: \maketitle
39: 
40: 
41: \begin{abstract}
42: We present a fully automatic method for music classification,
43: based only on compression of strings that represent the music
44: pieces. The method uses no background knowledge 
45: about music whatsoever: it is completely
46: general and can, without change, be used in different areas
47: like linguistic classification and genomics. It is based on an ideal
48: theory of the information content in individual objects
49: (Kolmogorov complexity), information distance, and a 
50: universal similarity metric. Experiments show that the method
51: distinguishes reasonably well between various musical genres
52: and can even cluster pieces by composer.
53: \end{abstract}
54: 
55: 
56: 
57: \section{Introduction}
58: 
59: All musical pieces are similar, but some are more similar than others.
60: Apart from being an infinite source of discussion (``Haydn is just 
61: like Mozart --- no, he's not!''), such similarities are also 
62: crucial for the design of efficient music information retrieval systems. 
63: The amount of digitized music available on the internet has grown 
64: dramatically in recent years, both in the public domain 
65: and on commercial sites. Napster and its clones are prime examples.
66: Websites offering musical content in some form or other
67: (MP3, MIDI, \ldots) need a way to organize their wealth of material;
68: they need to somehow classify their files according to 
69: musical genres and subgenres, putting similar pieces together.
70: The purpose of such organization is to enable users 
71: to navigate to pieces of music they already know and like, 
72: but also to give them advice and recommendations
73: (``If you like this, you might also like\ldots'').
74: Currently, such organization is mostly done manually by humans,
75: but some recent research has been looking into the possibilities 
76: of automating music classification.
77: 
A human expert, comparing different pieces of music with the aim of clustering
like pieces together, will generally look for certain specific similarities.
80: Previous attempts to automate this process do the same.
81: Generally speaking, they take a file containing a piece of music
82: and extract from it various specific numerical features,
83: related to pitch, rhythm, harmony etc.  
84: One can extract such features using for instance 
85: Fourier transforms~\cite{TC02} or wavelet transforms~\cite{GKCwavelet}.
86: The feature vectors corresponding to the various files are then 
87: classified or clustered using existing classification software, based on 
88: various standard statistical pattern recognition classifiers~\cite{TC02}, 
89: Bayesian classifiers~\cite{DTWml},
90: hidden Markov models~\cite{CVfolk},
91: ensembles of nearest-neighbor classifiers~\cite{GKCwavelet}
92: or neural networks~\cite{DTWml,Sneural}.
93: For example, one feature would be to look for rhythm in the sense 
94: of beats per minute. One can make a histogram where each histogram 
95: bin corresponds to a particular tempo in beats-per-minute and 
96: the associated peak shows how frequent and strong that
97: particular periodicity was over the entire piece. In \cite{TC02}
we see a gradual change from a few high peaks to many low and spread-out
ones going from hip-hop, rock, jazz, to classical. One can use this
100: similarity type to try to cluster pieces in these categories.
101: However, such a method requires specific and detailed knowledge of 
102: the problem area, since one needs to know what features to look for. 
103: 
104: Our aim is much more general.
105: We do not look for similarity in specific features known to 
106: be relevant for classifying music; 
107: instead we apply a general mathematical theory of similarity.
108: The aim is to capture, in a single similarity metric, 
109: {\em every effective metric\/}: 
110: effective versions of Hamming distance, Euclidean distance, 
111: edit distances, Lempel-Ziv distance, and so on.
112: Such a metric would be able to simultaneously detect {\em all\/}
113: similarities between pieces that other effective metrics can detect.
114: Rather surprisingly, such a ``universal'' metric indeed exists.
115: It was developed in \cite{LBCKKZ01,Li01,Li03}, based on the 
116: ``information distance'' of \cite{LiVi97,BGLVZ98}. 
117: Roughly speaking, two objects are deemed close if
118: we can significantly ``compress'' one given the information
119: in the other, the idea being that if two pieces are more similar,
120: then we can more succinctly describe one given the other.
121: Here compression is based on the ideal mathematical notion of Kolmogorov
122: complexity, which unfortunately is not effectively computable.
123: It is well known that when a pure mathematical theory 
124: is applied to the real world, for example in hydrodynamics
125: or in physics in general, we can in applications only approximate
126: the theoretical ideal. But still the theory gives a framework and foundation
127: for the applied science. Similarly here. We replace the ideal but 
128: noncomputable Kolmogorov-based version by standard compression techniques.
129: We lose theoretical optimality in some cases, but gain an efficiently 
130: computable similarity metric intended to
131:  approximate the theoretical ideal.
In contrast, a later and partially independent
compression-based approach of
\cite{BCL02a,BCL02b} for building language trees, while
citing \cite{LiVi97,BGLVZ98}, relies on {\em ad hoc\/} arguments
about empirical Shannon entropy and Kullback-Leibler distance,
and results in non-metric distances.
138: 
139: Earlier research has demonstrated that this new universal similarity 
140: metric works well on concrete examples in very different application
fields: the first completely automatic construction
of the phylogeny tree based on whole mitochondrial
genomes~\cite{LBCKKZ01,Li01,Li03}, and
a completely automatic construction of a language tree for over 50
Euro-Asian languages~\cite{Li03}.
146: Other applications, not reported in print, 
147: are detecting plagiarism in student programming assignments 
148: \cite{SID}, and phylogeny of chain letters.
149: 
150: In this paper we apply this compression-based method to the classification of 
151: pieces of music. We perform various experiments on sets of 
152: mostly classical pieces given as MIDI (Musical Instrument Digital 
153: Interface) files. This contrasts with most earlier research,
154: where the music was digitized in some wave format or other
155: (the only other research based on MIDI that we are aware 
156: of is~\cite{DTWml}).
157: We compute the distances between all pairs of pieces, 
158: and then build a tree containing those pieces in a way that 
159: is consistent with those distances. 
160: First, as proof of principle, we run the program on three
161: artificially generated data sets, where we know what 
162: the final answer should be.
163: The program indeed classifies these perfectly.
164: Secondly, we show that our program can distinguish between various
165: musical genres (classical, jazz, rock) quite well.
166: Thirdly, we experiment with various sets of classical pieces.
167: The results are quite good (in the sense of conforming 
168: to our expectations) for small sets of data,
169: but tend to get a bit worse for large sets.
170: Considering the fact that the method knows nothing
171: about music, or, indeed, about any of the other areas
172: we have applied it to elsewhere, one is reminded of Dr Johnson's
173: remark  
174: %Boswell: I told him I had been that morning at a meeting of the people 
175: %called Quakers, where I had heard a woman preach. 
176: %Johnson: "Sir, a woman's preaching is like 
177: about a dog's walking on his hind legs: 
178: ``It is not done well; but you are surprised to find it done at all.''
179: 
180: The paper is organized as follows.
181: We first give a  domain-independent overview of compression-based
182: clustering: the ideal distance metric based on Kolmogorov complexity,
183: and the quartet method that turns the matrix of distances into a tree.
184: In Section~\ref{secdetails} we give the details of the current application
185: to music, the specific file formats used etc.
186: In Section~\ref{secresults} we report the results of our experiments.
187: We end with some directions for future research.
188: 
189: 
190: 
191: \section{Algorithmic Clustering}
192: 
193: \subsection{Kolmogorov complexity}
194: Each object (in the application of this paper: each piece of music) is
195: coded as a string $x$ over a finite alphabet, say the binary
196: alphabet. 
197: The integer $K(x)$ gives
198: the length of the shortest compressed binary version from which
199: $x$ can be fully reproduced, 
200: also known as the {\em Kolmogorov complexity\/} of $x$. 
201: ``Shortest'' means the minimum taken over every
202: possible decompression program, the
203: ones that are currently known as well as the ones that are possible
204: but currently unknown. We explicitly write only ``decompression''
205: because we do not even require that there is also a program that
206: compresses the original file to this compressed version---if there
207: is such a program then so much the better.  
208: Technically, the definition of Kolmogorov complexity is as follows.
209: First, we fix a syntax for expressing all and only computations (computable
210: functions). This can be in the form of an enumeration of all 
211: Turing machines, but also an enumeration of all syntactically correct
212: programs in some universal programming language like Java, Lisp, or C.
213: We then define the Kolmogorov complexity of a finite binary string
214: as the length of the shortest Turing machine, Java program, etc.
215: in our chosen syntax. Which syntax we take is unimportant, but
216: we have to stick to our choice. This choice attaches a definite positive
217: integer as the Kolmogorov complexity to each finite string.
218: 
219: Though defined in terms of a
220: particular machine model, the Kolmogorov complexity
221: is machine-independent up to an additive
222: constant
223:  and acquires an asymptotically universal and absolute character
224: through Church's thesis, and from the ability of universal machines to
225: simulate one another and execute any effective process.
226:   The Kolmogorov complexity of an object can be viewed as an absolute
227: and objective quantification of the amount of information in it.
228:    This leads to a theory of {\em absolute} information {\em contents}
229: of {\em individual} objects in contrast to classic information theory
230: which deals with {\em average} information {\em to communicate}
231: objects produced by a {\em random source}.
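As a compact restatement of the definition, and only as a sketch in
common textbook notation (see \cite{LiVi97}; the symbols $U$, $V$, $p$ and
$c_{U,V}$ are introduced here for illustration and are not used elsewhere
in this paper), fix a universal machine $U$ and write $|p|$ for the length
in bits of a program $p$. Then
\[
K_U(x) = \min \{ |p| : U(p) = x \} ,
\]
and machine-independence up to an additive constant means that for every
two universal machines $U$ and $V$ there is a constant $c_{U,V}$, not
depending on $x$, such that
\[
| K_U(x) - K_V(x) | \leq c_{U,V} \quad \mbox{for all strings $x$.}
\]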
232: 
233: So $K(x)$ gives the length of the ultimate 
234: compressed version, say $x^*$, of $x$. 
235: This can be considered as the amount of information, number of bits,
236: contained in the string. Similarly, $K(x|y)$ is the minimal number of
237: bits (which we may think of as constituting a computer program) 
238: required to reconstruct $x$ from $y$.
239: In a way $K(x)$ expresses the individual ``entropy'' of $x$---the
240: minimal number of bits to communicate $x$ when sender and
241: receiver have no knowledge where $x$ comes from. For example,
242: to communicate Mozart's ``Zauberfl\"ote'' from a library of a 
million items requires at most 20 bits ($2^{20}\approx 1{,}000{,}000$), 
244: but to communicate it from scratch requires megabits.
245: For more details on this pristine notion of individual
246: information content we refer to the textbook
247: \cite{LiVi97}.
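To spell out the arithmetic in the library example (a worked check, assuming
that each of the $10^6$ library items is addressed by a fixed-length
binary index):
\[
2^{19} = 524{,}288 < 10^{6} \leq 2^{20} = 1{,}048{,}576 ,
\qquad \mbox{so} \qquad
\lceil \log_2 10^{6} \rceil = 20 .
\]
Hence an index of 20 bits suffices to single out one item among a million,
whereas 19 bits do not.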
248: 
249: 
250: \subsection{Distance-based classification}
251: 
252: As mentioned, our approach is based on a new 
253: very general similarity distance, classifying the objects in
254: clusters of objects that are close together according to this distance.
255: In mathematics, lots of different distances arise in all sorts of contexts,
256: and one usually requires these to be a `metric', since otherwise 
257: undesirable effects may occur. 
258: A metric is a distance function $D(\cdot,\cdot)$ that assigns
259: a non-negative distance $D(a,b)$ to any two objects $a$ and $b$, in such a way that 
260: \begin{enumerate}
\item $D(a,b)=0$ if and only if $a=b$ (identity)
262: \item $D(a,b)=D(b,a)$ (symmetry)
263: \item $D(a,b)\leq D(a,c)+D(c,b)$ (triangle inequality)
264: \end{enumerate}
265: A familiar example of a metric is the Euclidean metric, 
266: the everyday distance $e(a,b)$ between two objects $a,b$ 
267: expressed in, say, meters.  
268: Clearly, this distance satisfies the properties
$e(a,a)=0$, $e(a,b)=e(b,a)$, and $e(a,b) \leq e(a,c) + e(c,b)$.
270: (Substitute $a=$ Amsterdam, $b=$ Brussels, and $c=$ Chicago.) 
271: We are interested in ``similarity metrics''. 
272: For example, if the objects are classical music pieces
273: then the function $D(a,b)=0$ if $a$ and $b$ are by the same composer
274: and $D(a,b)=1$ otherwise, is a similarity metric, albeit a somewhat elusive one. 
275: This captures only one, but quite a significant, similarity aspect 
276: between music pieces. 
277: 
278: In \cite{Li03}, a new theoretical approach
279: to a wide class of similarity metrics was proposed:
280: the ``normalized information distance'' is a metric, and it is 
281: universal in the sense that this single metric uncovers all similarities 
282: simultaneously that the metrics in the class uncover separately.
283: This should be understood in the sense that if two pieces of music
284: are similar (that is, close) according to the particular feature described by 
285: a particular metric, then they are also similar (that is, close)
286: in the sense of the normalized information distance metric. This justifies
287: calling the latter {\em the\/} similarity metric.
288: Oblivious to the problem area concerned, simply using the distances
289: according to the similarity metric, our method fully automatically
290: classifies the objects concerned, be they music pieces, 
291: text corpora, or genomic data. 
292: %Here we apply this miracle method,
293: %the Kolmogorov Similarity Estimator (KSE), to the problem of clustering
294: %classic music pieces according to their composers. 
295: %By the example metric above, therefore,
296: %the universal metric should capture whether two
297: %pieces are by the same composer. 
298: 
299: More precisely, the approach is as follows.
300: Each pair of such strings $x$ and $y$ is assigned a distance
301: \begin{equation}\label{eq.distance}
302: d(x,y) = \frac{\max\{K(x|y),K(y|x)\}}{\max\{K(x),K(y) \}} .
303: \end{equation}
304: There is a natural interpretation to $d(x,y)$: If, say, $K(y) \geq K(x)$
305: then we can rewrite
306: \[d(x,y) = \frac{K(y)-I(x:y)}{K(y)} , \]
307: where $I(x:y)$ is the information in $y$ about $x$ satisfying
308: the symmetry property $I(x:y)=I(y:x)$ up to a logarithmic additive error
309: \cite{LiVi97}.
310: That is, the distance $d(x,y)$ between $x$ and $y$ is the
311: number of bits of information that is not shared between the two strings
312: per bit of information that could be maximally shared between the two strings.
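The rewriting can be checked as follows. This is a sketch that ignores the
logarithmic additive terms and assumes the convention
$I(x:y) = K(y) - K(y|x)$, which the text above does not spell out.
By the symmetry property we then also have $I(x:y) = K(x) - K(x|y)$, so that
\[
K(y|x) = K(y) - I(x:y), \qquad
K(x|y) = K(x) - I(x:y) \leq K(y) - I(x:y) ,
\]
where the inequality uses $K(y) \geq K(x)$. Therefore
\[
\max\{K(x|y),K(y|x)\} = K(y) - I(x:y), \qquad
\max\{K(x),K(y)\} = K(y) ,
\]
and dividing the first quantity by the second gives the rewritten
expression for $d(x,y)$.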
313: 
314: It is clear that $d(x,y)$ is symmetric, and in \cite{Li03} it
315: is shown that it is indeed a metric. Moreover, it is universal
316: in the sense that every metric expressing some similarity
317: that can be computed from the objects concerned is comprised
318: (in the sense of minorized) by $d(x,y)$. It is these distances that we
319: will use, albeit in the form of a rough approximation: for
320: $K(x)$ we simply use standard compression software like `gzip', `bzip2', or
`compress'. To compute the conditional version, $K(x|y)$, we use
322: a sophisticated theorem, known as ``symmetry of algorithmic information''
323: in \cite{LiVi97}. This says 
324: \begin{equation}\label{eq.condition}
325: K(y|x) \approx K(xy)-K(x), 
326: \end{equation}
327: so to compute the conditional complexity $K(x|y)$ we can just take
328: the difference of the unconditional complexities $K(xy)$ and $K(y)$.
329: This allows us to approximate $d(x,y)$ for every pair $x,y$.
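In terms of a real compressor, the approximation can be summarized as
follows. This is a sketch: we write $C(z)$ for the length in bytes of the
compressed version of a file $z$ (a notation introduced here), and the
numbers below are hypothetical and serve only to illustrate the arithmetic.
Substituting $C$ for $K$ in (\ref{eq.condition}) gives
$K(y|x) \approx C(xy) - C(x)$ and $K(x|y) \approx C(xy) - C(y)$,
so that (\ref{eq.distance}) becomes
\[
d(x,y) \approx \frac{C(xy) - \min\{C(x),C(y)\}}{\max\{C(x),C(y)\}} .
\]
For instance, if $C(x) = 1000$, $C(y) = 1200$ and $C(xy) = 1500$, then
\[
d(x,y) \approx \frac{1500 - 1000}{1200} \approx 0.417 .
\]
Note that with a real compressor $C(xy)$ and $C(yx)$ may differ slightly;
the sketch uses $C(xy)$ throughout, as in the text.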
330: 
331: 
332: Our actual practice falls short of the ideal theory in at least 
333: three respects:
334: 
335: (i) The claimed universality of the similarity distance $d(x,y)$
336: holds only for indefinitely long sequences $x,y$. Once we consider
337: strings $x,y$ of definite length $n$, the similarity distance
338: is only universal with respect to ``simple'' computable normalized information
339: distances, where ``simple'' means that they are computable by programs
340: of length, say, logarithmic or polylogarithmic in $n$.
341: This reflects the fact that, technically speaking, the universality 
342: is achieved by summing the weighted contribution of all
343: similarity distances in the class considered with respect
344: to the objects considered. Only similarity distances of which
345: the complexity is small (which means that the weight is large)
346: with respect to the size of the data concerned kick in. 
347: 
348: (ii) The Kolmogorov complexity is not computable, and it is 
349: in principle impossible to compute how far off our approximation
350: is from the target value in any useful sense.
351: 
352: (iii) To approximate the information distance in a practical sense
353: we use the standard compression program bzip2. While better compression
354: of a string will always  approximate the Kolmogorov complexity better,
355: this is, regrettably, not true for the (normalized) information distance.
356: Namely, using (\ref{eq.condition}) we consider the difference of
357: two compressed quantities. Different compressors may compress
358: the two quantities differently, causing an increase in the
359: difference even when both quantities are compressed better (but
360: not both as well). In the normalized information distance we
361: also have to deal with a ratio that causes the same problem.
362: Thus, a better compression program may not necessarily mean
363: that we also approximate the (normalized) information distance
364: better. This was borne out by the results of our experiments using
365: different compressors.
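A small numerical illustration of this effect, with hypothetical compressed
sizes chosen only for the sake of the example: suppose compressor $A$ gives
$C_A(x) = 1000$ and $C_A(xy) = 1500$, while compressor $B$ gives
$C_B(x) = 900$ and $C_B(xy) = 1450$. Then $B$ compresses both files better
than $A$, yet
\[
C_A(xy) - C_A(x) = 500 < 550 = C_B(xy) - C_B(x) ,
\]
so the estimate of $K(y|x)$ obtained via (\ref{eq.condition}) has moved
upward, and whether it has moved toward or away from the true value cannot
be told from the compressed sizes alone.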
366: 
367: Despite these caveats it turns out that the practice inspired by
368: the rigorous ideal theory performs quite well. 
We feel this illustrates that an {\em ad hoc\/}
approximation guided by a good theory is preferable to
{\em ad hoc\/} approaches without underlying theoretical foundation.
372: 
373: 
374: \subsection{The quartet method}
375: 
376: The above approach allows us to compute the distance between 
377: any pair of objects (any two pieces of music).
378: We now need to cluster the objects, so that objects that are similar
379: according to our metric are placed close together. 
380: We do this by computing a phylogeny tree
381: based on these distances. Such a phylogeny tree can represent
382: evolution of species but more widely simply accounts for
383: closeness of objects from a set with a distance
384: metric. Such a tree will group objects in subtrees:
385: the clusters. To find the phylogeny tree there are many methods.
386: One of the most popular is the quartet method. The idea is as 
387: follows: we consider every group of four elements from our set
388: of $n$ elements (in this case, musical pieces); 
389: there are ${n \choose 4}$ such groups.
390: From each group $u,v,w,x$ we construct a tree of arity 3,
391: which implies that the tree consists of two subtrees of two
392: leaves each. Let us call such a tree a {\em quartet}.  There are
393: three possibilities denoted (i) $uv | wx$, (ii) $uw | vx$,
394: and (iii)  $ux | vw$, where a vertical bar divides the two pairs of leaf nodes
395: into two disjoint subtrees (Figure~\ref{figquart}).
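To give a sense of scale (a worked count; the two values of $n$ are those
of the 18-leaf artificial tree and the 60-piece experiment mentioned
in this paper):
\[
{18 \choose 4} = \frac{18\cdot 17\cdot 16\cdot 15}{24} = 3060,
\qquad
{60 \choose 4} = \frac{60\cdot 59\cdot 58\cdot 57}{24} = 487{,}635 .
\]
Each group admits three quartets, so there are $3{n \choose 4}$ candidate
quartets in total, that is, $9180$ for $n=18$ and $1{,}462{,}905$ for $n=60$.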
396: 
397: \begin{figure}[htb]
398: \begin{center}
399: \epsfig{file=quartet.eps,width=8cm}
400: \end{center}
\caption{The three possible quartets for the set of leaf labels $u,v,w,x$}\label{figquart}
402: \end{figure}
403: 
404: The cost of a quartet is defined as the sum 
405: of the distances between each pair of neighbors; that
406: is, $C_{uv|wx} = d(u,v) + d(w,x)$.  For any given tree $T$ and any group
of four leaf labels $u,v,w,x$, we say $T$ is {\em consistent\/} with $uv | wx$
408: if and only if the path from $u$ to $v$ does not cross
409: the path from $w$ to $x$.  Note that exactly one of the three possible
410: quartets for any set of 4 labels must be consistent for any given tree.
411: \begin{figure}[htb]
412: \begin{center}
413: \epsfig{file=quartex.eps,width=5cm}
414: \end{center}
415: \caption{An example tree consistent with quartet $uv | wx$ }\label{figquartex}
416: \end{figure}
417: We may think of a large tree having many smaller quartet trees embedded
418: within its structure  (Figure~\ref{figquartex}).  The total cost of a large tree is defined to be the
419: sum of the costs of all consistent quartets. 
420: First, generate a list of all possible quartets for all groups of labels
421: under consideration.  For each group of three possible quartets for a given
422: set of four labels, calculate a best (minimal) cost, and a worst (maximal)
423: cost.  Summing all best quartets yields the best (minimal) cost.
424: Conversely, summing all worst quartets yields the worst (maximal) cost.
The minimal and maximal values need not be attained by actual trees;
however, the score of any tree will lie between these two values.
427: In order to be able to compare tree scores in a more uniform way,
428: we now rescale the score linearly such that the worst score maps to 0,
429: and the best score maps to 1, and term this the 
430: {\em normalized tree benefit score} $S(T)$.
431: The goal of the quartet method is to find a full tree with a maximum value
432: of $S(T)$, which is to say, the lowest total cost.
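The rescaling can be written out explicitly. As a sketch (the symbols
$C_T$, $m$ and $M$ are introduced here for illustration), let $C_T$ be the
total cost of $T$, let $m$ be the sum of the minimal quartet costs and $M$
the sum of the maximal quartet costs over all groups of four labels, and
assume $M > m$. Then
\[
S(T) = \frac{M - C_T}{M - m} ,
\]
which equals $0$ when $C_T = M$ and $1$ when $C_T = m$.
For a single group $u,v,w,x$ with the hypothetical distances
$d(u,v)=0.2$, $d(w,x)=0.3$, $d(u,w)=0.6$, $d(v,x)=0.6$, $d(u,x)=0.7$ and
$d(v,w)=0.7$, the three quartet costs are
\[
C_{uv|wx} = 0.5, \qquad C_{uw|vx} = 1.2, \qquad C_{ux|vw} = 1.4 ,
\]
so $m = 0.5$ and $M = 1.4$, and the three quartets score
$1$, $(1.4-1.2)/0.9 \approx 0.22$, and $0$, respectively.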
This optimization problem is known to be NP-hard \cite{Ji01} (which means that
no efficient exact algorithm is known for it), but we can sometimes solve it, and
always approximate it. The current
436: methods in \cite{Br00} are far too computationally intensive;
437: they run many months or years on moderate-sized problems
438: of 30 objects. We have designed a simple method based
439: on randomization and hill-climbing.  First, a random tree with $2n-2$ nodes
440: is created, consisting of $n$ leaf nodes (with 1 connecting edge) labeled 
441: with the names of musical pieces, and $n-2$ non-leaf or {\em internal} nodes
442: labeled with the lowercase letter ``n'' followed by a unique integer identifier.  Each internal node has exactly three connecting edges.  For this 
443: tree $T$, we calculate the total cost of all consistent quartets, 
444: and invert and scale this value to 
445: find $S(T)$.  Typically, a random tree will be consistent with around
446: $\frac{1}{3}$ of all quartets.
447: Now, this tree is denoted the currently best known tree, and is used as
448: the basis for further searching.  We define a simple mutation on a tree
449: as one of the three possible transformations:
450: \begin{enumerate}
451: \item A {\em leaf swap}, which consists of randomly choosing two leaf nodes
452: and swapping them.
453: \item A {\em subtree swap}, which consists of randomly choosing two internal 
454: nodes and swapping the subtrees rooted at those nodes.
455: \item A {\em subtree transfer}, whereby a randomly chosen subtree (possibly a leaf) is detached and reattached in another place, maintaining arity invariants.
456: \end{enumerate}
457: Each of these simple mutations keeps invariant the
458: number of leaf and internal nodes in the tree; only the structure and placements
459: change.  Define a full mutation as a sequence of at least one but potentially
460: many simple mutations, picked according to the following distribution. 
461: First we pick the number $k$ of simple mutations that we will perform with
462: probability $2^{-k}$.  For each such simple mutation, we choose one of
463: the three types listed above with equal probability.  Finally, for each of 
464: these simple mutations, we pick leaves or internal nodes, as necessary.  Notice
465: that trees which are close to the original tree (in terms of number of 
466: simple mutation steps in between) are examined often, while trees that are 
467: far away from the original tree will eventually be examined, but not very 
468: frequently.
469: So in order to search for a better tree,
470: we simply apply a full mutation on $T$ to arrive at $T'$, and then
471: calculate $S(T')$.  If $S(T') > S(T)$, then keep $T'$ as the new best tree.
Otherwise, try a different mutated tree and repeat.  If $S(T')$ ever reaches
473: $1$, then halt, outputting the best tree.  Otherwise, run until it seems
474: no better trees are being found in a reasonable amount of time, in which
475: case the approximation is complete.
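The distribution of the number $k$ of simple mutations in a full mutation
can be checked by a short calculation (a worked check of the rule stated
above, with $k$ ranging over the positive integers):
\[
\sum_{k=1}^{\infty} 2^{-k} = 1, \qquad
\sum_{k=1}^{\infty} k\,2^{-k} = 2, \qquad
\Pr[k \geq 5] = \sum_{k=5}^{\infty} 2^{-k} = \frac{1}{16} .
\]
So the probabilities form a proper distribution, a full mutation consists
of two simple mutations on average, and about 6\% of the full mutations
take five or more steps.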
476: 
477: \begin{figure}[htb]
478: \begin{center}
479: \epsfig{file=large-graph.eps,width=8cm,angle=270}
480: \end{center}
481: \caption{Progress of the 60-piece experiment over time}\label{figprogress}
482: \end{figure}
483: 
484: Note that if a tree is ever found such that $S(T) = 1$, then we can stop
485: because we can be certain that this tree is optimal, as no tree could 
486: have a lower cost.  In fact, this perfect tree result is achieved in our 
487: artificial tree reconstruction experiment (Section~\ref{sect.artificial}) 
488: reliably in less than ten minutes.  For real-world data, $S(T)$ reaches 
489: a maximum somewhat 
490: less than $1$, presumably reflecting inconsistency in the distance matrix 
491: data fed as input to the algorithm, or indicating a search space too large
492: to solve exactly.
493: On many typical problems of up to 40 objects this tree-search gives a tree 
494: with $S(T) \geq 0.9$ within half an hour.  For large numbers of objects,
495: tree scoring itself can be slow (as this takes order $n^4$ computation steps), 
496: and the space of
497: trees is also large, so the algorithm may slow down substantially.
498: For larger experiments, we use a C++/Ruby implementation with MPI (Message
499: Passing Interface, a common standard used on massively parallel computers) on a
500: cluster of workstations in parallel to find trees more rapidly. We can
501: consider the graph of Figure~\ref{figprogress},
502: mapping the achieved $S(T)$ score as a function
503: of the number of trees examined.  Progress
504: occurs typically in a sigmoidal fashion towards a maximal value $\leq 1$.  
505: 
506: A problem with the outcomes is as follows: For natural
507: data sets we often see 
508: some leaf nodes (data items) placed near the center of the tree as singleton
509: leaves attached to internal nodes, without sibling leaf
510: nodes.  This results in a more linear, stretched out, and less
511: balanced, tree. Such trees, even if they represent the underlying
512: distance matrix faithfully, are hard to fully understand
513: and may cause misunderstanding of represented relations and clusters.
514: To counteract this effect, and to bring out the clusters of
515: related items more visibly, we have added a penalty term of
516: the following form: For each internal node with exactly one leaf
517: node attached, the tree's score is reduced by 0.005.  This induces a
518: tendency in the
519: algorithm to avoid producing degenerate mostly-linear trees in the
520: face of data that is somewhat inconsistent, and creates balanced and
521: more illuminating clusters. It should be noted that the penalty term
522: causes the algorithm in some cases to settle for a slightly lower
523: $S(T)$ score than it would have without penalty term. Also the
524: value of the penalty term is heuristically chosen. The largest 
525: experiment used 60 items, and we typically had only
526: a couple of orphans causing a penalty of only a few percent.
527: This should be set off against the final $S(T)$ score of above 0.85.
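In formula form, and only as a sketch of the adjustment just described
(the symbols $S'(T)$ and $s(T)$ are introduced here for illustration):
if $s(T)$ denotes the number of internal nodes of $T$ with exactly one
leaf node attached, the penalized score is
\[
S'(T) = S(T) - 0.005\, s(T) .
\]
For example, a hypothetical tree with $S(T) = 0.87$ and $s(T) = 4$ would
receive $S'(T) = 0.87 - 0.02 = 0.85$.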
528: 
Another practical matter is the stopping criterion, that is, the $S(T)$
value at which we stop. Essentially, we stopped when the $S(T)$
value did not change after examining a large number of mutated trees.
An example is the progress shown in Figure~\ref{figprogress}.
533: 
534: 
535: 
536: \section{Details of Our Implementation}\label{secdetails}
537: 
538: Initially, we downloaded 118 separate MIDI (Musical Instrument Digital
539: Interface, a versatile digital music format
540: available on the world-wide-web) 
541: files selected from a range of classical composers, as well as some
542: popular music.   Each of these files was run through a preprocessor 
543: to extract just MIDI Note-On
544: and Note-Off events.  These events were then converted to a player-piano
545: style representation, with time quantized in $0.05$ second intervals.
546: All instrument indicators, MIDI Control signals, and tempo variations were 
547: ignored.  For each track in the MIDI file, we calculate two quantities:
548: An {\em average volume} and a {\em modal note}.
549: The average volume is calculated by averaging the volume (MIDI Note velocity)
550: of all notes in the track.  The modal note is defined to be the note 
551: pitch that sounds most often in that track.  If this is not unique, 
552: then the lowest such note is chosen.  The modal note is used as a 
553: key-invariant reference point from which to represent all notes.  
554: It is denoted by $0$, higher notes are denoted by positive numbers, and 
555: lower notes are denoted by negative numbers.  A value of $1$ indicates 
556: a half-step above the modal note, and a value of $-2$ indicates
557: a whole-step below the modal note.  The tracks are sorted according to
558: decreasing average volume, and then output in succession.  For each track,
559: we iterate through each time sample in order, outputting a single signed
560: 8-bit value for each currently sounding note.  Two special values are
561: reserved to represent the end of a time step and the end of a track.  This
562: file is then used as input to the compression stage for distance
563: matrix calculation and subsequent tree search.
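As a small illustration of this encoding, with hypothetical values:
quantizing time in $0.05$ second intervals gives $1/0.05 = 20$ time steps
per second, so a note that sounds for half a second is present in $10$
consecutive time steps. If a track consists of the MIDI pitches
$60, 64, 67, 64$ (the notes C, E, G, E), each held for the same duration,
then the modal note is $64$ and the pitches are represented relative to it as
\[
60-64 = -4, \qquad 64-64 = 0, \qquad 67-64 = 3, \qquad 64-64 = 0 .
\]
Transposing the whole track by any number of half-steps leaves these
relative values unchanged, which is the sense in which the representation
is key-invariant.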
564: 
565: \section{Results}\label{secresults}
566: 
567: \subsection{Three controlled experiments}\label{sect.artificial}
568: 
569: With the natural data sets of music pieces that we use, one may have the preconception 
570: (or prejudice) that music by Bach should be clustered together, 
571: music by Chopin should be clustered together, and so should music by
572: rock stars. However, the preprocessed music files of a piece by Bach and
573: a piece by Chopin, or the Beatles, may resemble one another 
574: more than two different
575: pieces by Bach---by accident or indeed by design and copying. Thus, natural
576: data sets may have ambiguous, conflicting, or counterintuitive 
577: outcomes. In other words, the experiments on actual pieces have
578: the drawback of not having one clear ``correct'' answer that can 
579: function as a benchmark for assessing our experimental outcomes.
580: Before describing the experiments we did with MIDI files of actual 
581: music, we discuss three experiments that show that our
582: program indeed does what it is supposed to do---at least in 
583: artificial situations where we know in advance what the correct answer is.
584: The similarity machine consists of two parts: (i) extracting a distance matrix
585: from the data, and (ii) constructing a tree 
586: from the distance matrix using our novel quartet-based heuristic.
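
Part (i) can be sketched in a few lines of Python. The sketch assumes the normalized compression distance in its usual form, $(C(xy)-\min\{C(x),C(y)\})/\max\{C(x),C(y)\}$, where $C(\cdot)$ denotes compressed length, and it uses Python's {\tt bz2} module merely as a stand-in compressor; the function names are hypothetical.

```python
import bz2


def compressed_size(data: bytes) -> int:
    """Length in bytes of the bzip2-compressed data (stand-in for C(.))."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)


def distance_matrix(files):
    """Pairwise distances; the diagonal is set to 0 by convention."""
    n = len(files)
    return [[0.0 if i == j else ncd(files[i], files[j]) for j in range(n)]
            for i in range(n)]
```

With a real compressor the values are only approximately symmetric and can slightly exceed 1, so the matrix is best read as an estimate of the ideal distances.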
587: 
588: \begin{figure}[htb]
589: \begin{center}
590: \epsfig{file=arttreereal.eps,width=13cm,height=10cm}
591: \end{center}
592: \caption{The tree that our algorithm reconstructed}\label{figarttreereal}
593: \end{figure}
594: 
595: {\bf Testing the quartet-based tree construction:}
596: We first test whether the quartet-based tree construction
597: heuristic is trustworthy: 
598: We generated a random ternary tree $T$ with 18 leaves, and derived
599: a distance metric from it by defining the distance between 
600: two nodes as follows:
601: Writing $L(a,b)$ for the length of the path from $a$ to $b$, 
602: measured in edges, let 
603: \[d(a,b) = { {L(a,b)+1} \over 18},
604: \]
605:   except when
606: $a = b$, in which case $d(a,b) = 0$.  It is easy to verify that this
607: simple formula always gives a number between 0 and 1, and is monotonic
608: with path length.
609: Given only the $18\times 18$ matrix of these normalized distances, 
610: our quartet method exactly reconstructed $T$ represented in
611: Figure~\ref{figarttreereal}, with $S(T)=1$.
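
The construction of such a test matrix can be sketched as follows. This Python sketch assumes an unrooted tree in which every internal node has exactly three neighbors (so that 18 leaves come with 16 internal nodes and no path has more than 17 edges), grows the tree by repeatedly subdividing a random edge, and restricts $d$ to pairs of leaves. The function names and the growth procedure are illustrative choices, not necessarily those used for Figure~\ref{figarttreereal}.

```python
import random
from collections import deque


def random_ternary_tree(n_leaves, rng):
    """Adjacency dict; leaves are 0..n_leaves-1, internal nodes have degree 3."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    next_internal = n_leaves
    for leaf in range(3):                # start from a star with three leaves
        link(next_internal, leaf)
    next_internal += 1
    for leaf in range(3, n_leaves):
        edges = sorted((u, v) for u in adj for v in adj[u] if u < v)
        u, v = rng.choice(edges)         # subdivide a random edge ...
        adj[u].discard(v)
        adj[v].discard(u)
        w = next_internal
        next_internal += 1
        link(u, w)
        link(w, v)
        link(w, leaf)                    # ... and attach the new leaf to it
    return adj


def path_lengths(adj, source):
    """Breadth-first search: number of edges from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def tree_distance_matrix(n_leaves, adj):
    """d(a,b) = (L(a,b) + 1) / n_leaves for a != b, and 0 on the diagonal."""
    rows = []
    for a in range(n_leaves):
        L = path_lengths(adj, a)
        rows.append([0.0 if a == b else (L[b] + 1) / n_leaves
                     for b in range(n_leaves)])
    return rows
```

For example, {\tt tree\_distance\_matrix(18, random\_ternary\_tree(18, random.Random(1)))} yields an $18\times 18$ matrix of the kind given to the quartet method in this experiment.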
612: %TODO: Rudi, Paul wants the distance matrix included here as well
613: 
614: \begin{figure}[htb]
615: \begin{center}
616: \epsfig{file=taggedfiles.eps,width=15cm}
617: \end{center}
618: \caption{Classification of artificial files with repeated 1-kilobyte tags }\label{figtaggedfiles}
619: \end{figure}
620: 
621: {\bf Testing the similarity machine on artificial data:}
622: Given that the tree reconstruction method is accurate
623: on clean, consistent data, we tested whether the full procedure
624: works acceptably when we know what the outcome should
625: be:
626: \begin{figure}[htb]
627: \begin{center}
628: \epsfig{file=filetypes.eps,width=15cm}
629: \end{center}
630: \caption{Classification of different file types}\label{figfiletypes}
631: \end{figure}
632: We randomly generated 22 separate 1-kilobyte blocks of data where
633: each byte was equally probable and called these {\em tags}.  Each tag
634: was associated with a different lowercase letter of the alphabet.  Next,
635: we generated 80-kilobyte files by starting with a block of purely random
636: bytes and applying one, two, three, or four different tags on it.
637: Applying a tag consists of ten repetitions of picking a random location
638: in the 80-kilobyte file, and overwriting that location with the universally
639: consistent tag that is indicated.  So, for instance, to create the file
640: referred to in the diagram by ``a'', we start with 80 kilobytes of random data,
641: then pick ten places to copy over this random data with the arbitrary 
642: 1-kilobyte sequence identified as tag {\em a}.  Similarly, to create file ``ab'',
643: we start with 80 kilobytes of random data, then pick ten places to put
644: copies of tag {\em a}, then pick ten more places to put copies of tag {\em b} (perhaps
645: overwriting some of the {\em a} tags).  Because we never use more than four
646: different tags, and therefore never place more than 40 copies of tags, we
647: can expect that at least half of the data in each file is random and
648: uncorrelated with the rest of the files.  The rest of the file is 
649: correlated with other files that also contain tags in common; the more 
650: tags in common, the more related the files are.
651: The resulting tree is given in Figure~\ref{figtaggedfiles}; it can be
652: seen that clustering occurs exactly as we would expect.
653: The $S(T)$ score is 0.905.
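
The file generation just described can be sketched in Python as follows. This is a sketch under assumptions, not the script used for Figure~\ref{figtaggedfiles}: a kilobyte is taken to be 1024 bytes, each insertion position is drawn uniformly so that the tag fits entirely inside the file, and the function names are hypothetical.

```python
import random
import string

KB = 1024  # assumed size of one kilobyte


def random_block(size, rng):
    """A block of bytes in which every byte value is equally probable."""
    return bytes(rng.getrandbits(8) for _ in range(size))


def make_tags(n_tags, rng):
    """One random 1-kilobyte tag per lowercase letter a, b, c, ..."""
    return {letter: random_block(KB, rng)
            for letter in string.ascii_lowercase[:n_tags]}


def make_tagged_file(tag_names, tags, rng, size=80 * KB, copies=10):
    """Start from random data and overwrite it with copies of each tag."""
    data = bytearray(random_block(size, rng))
    for name in tag_names:
        tag = tags[name]
        for _ in range(copies):
            pos = rng.randrange(size - len(tag) + 1)
            data[pos:pos + len(tag)] = tag   # may overwrite earlier tags
    return bytes(data)
```

With {\tt tags = make\_tags(22, rng)}, the call {\tt make\_tagged\_file("ab", tags, rng)} would produce a file of the kind labeled ``ab'' in the diagram.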
654: 
655: {\bf Testing the similarity machine on natural data:}
656: We test gross classification of files
657: based on markedly different file types.  Here, we chose several files:
658: \begin{enumerate}
659: \item Four mitochondrial gene sequences, from a black bear, polar bear, 
660: fox, and rat.
661: \item Four excerpts from the novel {\em The Zeppelin's Passenger} by 
662: E.~Phillips Oppenheim.
663: \item Four MIDI files without further processing; two from Jimi Hendrix and 
664: two movements from Debussy's Suite bergamasque.
665: \item Two Linux x86 ELF executables (the {\em cp} and {\em rm} commands).
666: \item Two compiled Java class files.
667: \end{enumerate}
668: As expected, the program correctly classifies each of the different types
669: of files together with like near like. The result is reported
670: in Figure~\ref{figfiletypes} with $S(T)$ equal to 0.984.
671: 
672: 
673: \subsection{Genres: rock vs.~jazz vs.~classical}
674: 
675: Before testing whether our program can see the distinctions
676: between various classical composers, we first 
677: show that it can distinguish between three broader musical genres:
678: classical music, rock, and jazz. This should be easier than
679: making distinctions ``within'' classical music. 
680: All musical pieces we used are listed in the tables in the appendix.
681: For the genre-experiment we used 12 classical pieces (the small set 
682: from Table~\ref{tableclassicalpieces}, consisting of Bach, Chopin, and Debussy),
683: 12 jazz pieces (Table~\ref{tablejazzpieces}), and
684: 12 rock pieces (Table~\ref{tablerockpieces}).
685: The tree that our program came up with is given in Figure~\ref{figgenres}.
686: The $S(T)$ score is 0.858.
687: 
688: \begin{figure}[htb]
689: \begin{center}
690: \epsfig{file=genres.eps,width=15cm,height=12cm}
691: \end{center}
692: \caption{Output for the 36 pieces from 3 genres}\label{figgenres}
693: \end{figure}
694: 
695: The discrimination between the 3 genres is good but not perfect.
696: The upper branch of the tree contains 10 of the 12 jazz pieces,
697: but also Chopin's Pr\'elude no.~15 and a Bach Prelude.
698: The two other jazz pieces, Miles Davis' ``So what'' and John 
699: Coltrane's ``Giant steps'' are placed elsewhere in the tree,
700: perhaps according to some kinship that now escapes us but could be
701: identified by closer study of the objects concerned.
702: Of the rock pieces, 9 are placed close together in the rightmost branch, 
703: while Hendrix's ``Voodoo chile'', Rush's ``Yyz'', 
704: and Dire Straits' ``Money for nothing'' are further away.
705: In the case of the Hendrix piece this may be explained by the fact
706: that it does not fit well in a specific genre.
707: Most of the classical pieces are in the lower left part of the tree.
708: Surprisingly, 2 of the 4 Bach pieces are placed elsewhere.
709: It is not clear why this happens, and it may be considered an error
710: of our program, since we perceive the 4 Bach pieces to be very close,
711: both structurally and melodically (as they all come from the mono-thematic
712: ``Wohltemperierte Klavier'').
713: However, Bach's music is seminal and has been copied and cannibalized
714: in all kinds of recognizable or hidden manners; closer scrutiny could
715: reveal likenesses in its present company that are not now apparent to us.
716: In effect our similarity engine aims at the ideal of a perfect
717: data mining process, discovering unknown features in which the
718: data can be similar. 
719: 
720: \subsection{Classical piano music (small set)}
721: 
722: \begin{figure}
723: \begin{center}
724: \epsfig{file=small.eps,width=15cm,height=10cm}
725: \end{center}
726: \caption{Output for the 12-piece set}\label{figsmallset}
727: \end{figure}
728: 
729: In Table~\ref{tableclassicalpieces} we list all 60 classical piano pieces used, 
730: together with their abbreviations. Some of these are complete
731: compositions, others are individual movements from larger compositions.
732: They all are piano pieces, but experiments on 34 movements of symphonies
733: gave very similar results (Section~\ref{secsymphonies}).
734: Apart from running our program on the whole set of 60 piano 
735: pieces, we also tried it on two smaller sets: a small 12-piece set, 
736: indicated by `(s)' in the table, and a medium-size 32-piece set, 
737: indicated by `(s)' or `(m)'.
738: 
739: The small set encompasses the 4 movements from Debussy's Suite bergamasque,
740: 4 movements of book 2 of Bach's Wohltemperierte Klavier, and 4 preludes from 
741: Chopin's opus~28. As one can see in Figure~\ref{figsmallset}, 
742: our program does a pretty good job at clustering these pieces.
743: The $S(T)$ score is also high: 0.958.
744: The 4 Debussy movements form one cluster, as do the 4 Bach pieces.  
745: The only imperfection in the tree, judged by what one would 
746: intuitively expect, is that Chopin's Pr\'elude no.~15 lies a bit closer 
747: to Bach than to the other 3 Chopin pieces.
748: This Pr\'elude no.~15, in fact, consistently forms an odd-one-out 
749: in our other experiments as well. This is an example of pure data mining,
750: since there is some musical truth 
751: to this, as no.~15 is perceived as by far the most eccentric 
752: among the 24 Pr\'eludes of Chopin's opus~28.
753: 
754: \subsection{Classical piano music (medium set)}
755: \begin{figure}[hbt]
756: \begin{center}
757: \epsfig{file=medium.eps,width=15cm,height=10cm}
758: \end{center}
759: \caption{Output for the 32-piece set}\label{figmediumset}   
760: \end{figure}
761: 
762: The medium set adds 20 pieces to the small set:
763: 6 additional Bach pieces, 6 additional Chopins, 1 more Debussy piece,
764: and 7 pieces by Haydn. The experimental results are given in
765: Figure~\ref{figmediumset}. The $S(T)$ score is slightly lower than
766: in the small set experiment: 0.895.
767: Again, there is a lot of structure
768: and expected clustering. Most of the Bach pieces are together,
769: as are the four Debussy pieces from the Suite bergamasque.
770: These four should be together because they are movements from the same piece;
771: the fifth Debussy item is somewhat apart since it comes from another piece.
772: Both the Haydn and the Chopin
773: pieces are clustered in little sub-clusters of two or three pieces, 
774: but those sub-clusters are scattered throughout the tree instead
775: of being close together in a larger cluster.
776: These small clusters may be an imperfection of the method,
777: or, alternatively, point to musical similarities between the
778: clustered pieces that transcend the similarities induced by
779: the same composer. Indeed, this may point the way for further
780: musicological investigation.
781: 
782: 
783: \subsection{Classical piano music (large set)}
784: 
785: 
786: \begin{figure}
787: \begin{center}
788: \epsfig{file=large.eps,width=17cm,height=10cm}
789: \end{center}
790: \caption{Output for the 60-piece set}\label{figlargeset}   
791: \end{figure}
792: 
793: Figure~\ref{figlargeset} gives the output of a run of our program 
794: on the full set of 60 pieces. This adds 10 pieces by Beethoven, 
795: 8 by Buxtehude, and 10 by Mozart to the medium set.
797: The results are still far from random, but leave more to 
798: be desired than the smaller-scale experiments.
799: Indeed, the $S(T)$ score has dropped further from that of the
800: medium-sized set to 0.844.
801: This may be an artifact of the interplay between the relatively small
802: size, and large number, of the files compared: (i) the distances
803: estimated are less accurate; (ii) the number of quartets
804: with conflicting requirements increases; and (iii) the computation
805: time rises to such an extent that the correctness score of the
806: displayed cluster graph within the set time limit
807: is lower than in the smaller samples. 
808: Nonetheless, Bach and Debussy are still reasonably well clustered,
809: but other pieces (notably the Beethoven and Chopin ones)
810: are scattered throughout the tree. Maybe this means
811: that individual music pieces by these composers are more similar
812: to pieces of other composers than they are to each other?
813: The placement of the pieces is closer to intuition on a small level
814: (for example, most pairings of siblings correspond to musical similarity in
815: the sense of the same composer) than on the larger level.
816: This is similar to the phenomenon of little sub-clusters 
817: of Haydn or Chopin pieces that we saw in the medium-size experiment.
818: 
819: \subsection{Clustering symphonies}\label{secsymphonies}
820: 
821: Finally, we tested whether the method worked for more complicated
822: music, namely 34 symphonic pieces.  We took
823: two Haydn symphonies (no.~95 in one file, and the four movements of~104), 
824: three Mozart symphonies (39, 40, 41), 
825: three Beethoven symphonies (3, 4, 5),
826: Schubert's Unfinished symphony, and Saint-Sa\"ens's Symphony no.~3.
827: The results are reported in Figure~\ref{figsymphonies},
828: with a quite reasonable $S(T)$ score of 0.860.
829: 
830: \begin{figure}
831: \begin{center}
832: \epsfig{file=symphonies.eps,width=14cm,height=10cm}
833: \end{center}
834: \caption{Output for the set of 34 movements of symphonies}\label{figsymphonies}
835: \end{figure}
836: 
837: 
838: 
839: \section{Future Work and Conclusion}
840: 
841: Our research raises many questions worth looking into further:
842: \begin{itemize}
843: \item The program can be used as a data mining machine to discover
844: hitherto unknown similarities between music pieces of different
845: composers or indeed different genres. In this manner we can discover
846: plagiarism or indeed honest influences between music pieces and
847: composers. Indeed, it is thinkable that we can use the method
848: to discover seminality of composers, or separate music eras
849: and fads.
850: \item A very interesting application of our program
851: would be to select a plausible composer for a newly 
852: discovered piece of music of which the composer is not known.
853: In addition to such a piece, this experiment would 
854: require a number of pieces from known composers that 
855: are plausible candidates.  We would just run our program 
856: on the set of all those pieces, and see where the new 
857: piece is placed.  If it lies squarely within a cluster 
858: of pieces by composer such-and-such, then that would be 
859: a plausible candidate composer for the new piece.
860: \item Each run of our program is different---even on 
861: the same set of data---because of our use of randomness for 
862: choosing mutations in the quartet method. 
863: It would be interesting to investigate more precisely 
864: how stable the outcomes are over different such runs.
865: \item At various points in our program, somewhat 
866: arbitrary choices were made.
867: Examples are the compression algorithms we use
868: (all practical compression algorithms will fall short
869: of Kolmogorov complexity, but some less so than others);
870: the way we transform the MIDI files (choice of length 
871: of time interval, choice of note-representation);
872: the cost function in the quartet method.
873: Other choices are possible and may or may not lead 
874: to better clustering.\footnote{We compared the quartet-based
875: approach to the tree reconstruction 
876: with alternatives.  One such alternative that we tried is to
877: compute the Minimum Spanning Tree (MST) from the matrix
878: of distances. MST has the advantage of being very efficiently 
879: computable, but resulted in trees that were much worse than
880: the quartet method. It appears that the quartet method is extremely sensitive
881: in extracting information even from small differences in the entries
882: of the distance matrix, where other methods would be led to error.}
883: Ideally, one would like to have well-founded theoretical 
884: reasons to decide such choices in an optimal way. 
885: Lacking those, trial-and-error seems the only way 
886: to deal with them.
887: \item The experimental results got decidedly worse when
888: the number of pieces grew.
889: Better compression methods may improve this situation, but the effect
890: is probably due to unknown scaling problems with the quartet
891: method or nonlinear scaling of possible similarities in a larger
892: group of objects (akin to the phenomenon described in the so-called
893: ``birthday paradox'': in a group of about two dozen people there
894: is a better than even chance that at least two of them share a
895: birthday). Inspection of the underlying distance matrices
896: makes us suspect the latter.
897: \item Our program is not very good at dealing with very 
898: small data files (100 bytes or so), because significant
899: compression only kicks in for larger files.
900: We might deal with this by comparing various sets of such 
901: pieces against each other, instead of individual ones.
902: \end{itemize}
903: 
904: \subsection*{Acknowledgments}
905: We thank John Tromp for useful discussions.
906: 
907: \begin{thebibliography}{99}
908: 
909: 
910: 
911: \bibitem{BCL02a}
912: D.~Benedetto, E.~Caglioti, and V.~Loreto.
913: Language trees and zipping, {\em Physical Review Letters},
914: 88:4(2002) 048702.
915: 
916: \bibitem{BCL02b}
917: Ph.~Ball. 
918: Algorithm makes tongue tree, {\em Nature}, 22 January,
919: 2002.
920: 
921: \bibitem{BGLVZ98}
922: C.H.~Bennett, P.~G\'acs, M. Li, P.M.B.~Vit\'anyi, and W.~Zurek.
923: Information Distance, {\em IEEE Transactions on Information Theory},
924: 44:4(1998), 1407--1423.
925: 
926: \bibitem{Br00}
927: D.~Bryant, V.~Berry, P.~Kearney, M.~Li, T.~Jiang,
928: T.~Wareham and H.~Zhang. A practical algorithm for
929: recovering the best supported edges of an evolutionary tree.
930: {\em Proc. 11th  ACM-SIAM Symposium on Discrete Algorithms}, 
931: 287--296, 2000.
932: 
933: %\bibitem{CF02}
934: %M.~Cooper and J.~Foote.
935: %Automatic music summarization via similarity analysis,
936: %{\em Proc.~IRCAM}, 2002.
937: 
938: \bibitem{CVfolk}
939: W.~Chai and B.~Vercoe.
940: Folk music classification using hidden Markov models.
941: {\em Proc.~of International Conference on Artificial Intelligence}, 2001.
942: 
943: \bibitem{DTWml}
944: R.~Dannenberg, B.~Thom, and D.~Watson. 
945: A machine learning approach to musical style recognition,
946: {\em Proc.~International Computer Music Conference}, pp.~344--347, 1997.
947: 
948: \bibitem{GKCwavelet}
949: M.~Grimaldi, A.~Kokaram, and P.~Cunningham.
950: Classifying music by genre using the wavelet packet transform
951: and a round-robin ensemble.
952: Technical report TCD-CS-2002-64, Trinity College Dublin, 2002.
953: http://www.cs.tcd.ie/publications/tech-reports/reports.02/TCD-CS-2002-64.pdf
954: 
955: \bibitem{Ji01}
956: T.~Jiang, P.~Kearney, and M.~Li.
957: A Polynomial Time Approximation Scheme for Inferring Evolutionary Trees from
958: Quartet Topologies and its Application.
959: {\em SIAM J. Computing}, 30:6(2001), 1942--1961.
960: 
961: \bibitem{LBCKKZ01}
962: M.~Li, J.H.~Badger, X.~Chen, S.~Kwong, P.~Kearney, and H.~Zhang.
963: An information-based sequence distance and its application
964: to whole mitochondrial genome phylogeny,
965: {\em Bioinformatics}, 17:2(2001), 149--154.
966: 
967: \bibitem{Li01}
968: M.~Li and P.M.B.~Vit\'anyi.
969: Algorithmic Complexity,
970: pp.~376--382 in: {\em International Encyclopedia
971: of the Social \& Behavioral Sciences},
972: N.J.~Smelser and P.B.~Baltes, Eds., Pergamon, Oxford, 2001/2002.
973: 
974: \bibitem{Li03}
975: M.~Li, X.~Chen, X.~Li, B.~Ma, P.~Vit\'anyi.
976: The similarity metric,
977: {\em Proc. 14th ACM-SIAM Symposium on Discrete Algorithms}, 2003.
978: 
979: \bibitem{LiVi97}
980: M.~Li and P.M.B.~Vit\'anyi.
981: {\em An Introduction to Kolmogorov Complexity
982: and its Applications}, Springer-Verlag, New York, 2nd Edition, 1997.
983: 
984: \bibitem{Sneural}
985: P.~Scott.
986: Music classification using neural networks, 2001.\\
987: http://www.stanford.edu/class/ee373a/musicclassification.pdf
988: 
989: \bibitem{SID}
990: Shared Information Distance or Software Integrity
991: Detection, Computer Science, University of California, Santa Barbara,
992:  http://dna.cs.ucsb.edu/SID/
993: 
994: \bibitem{TC02}
995: G.~Tzanetakis and P.~Cook, Music genre classification of audio signals,
996: {\em IEEE Transactions on Speech and Audio Processing},
997: 10(5):293--302, 2002.
998: 
999: \end{thebibliography}
1000: 
1001: 
1002: \appendix
1003: 
1004: \section{Appendix: The Music Pieces Used}
1005: \begin{table}[htb]
1006: \begin{center}
1007: \begin{tabular}{|l|l|l|} \hline
1008: Composer & Piece & Acronym\\ \hline\hline
1009: J.S.~Bach (10) & Wohltemperierte Klavier II: Preludes and fugues 1,2 & BachWTK2\{F,P\}\{1,2\} (s)\\
1010:           & Goldberg Variations: Aria, Variations 1,2  & BachGold\{Aria,V1,V2\} (m) \\
1011:           & Kunst der Fuge: Variations 1,2             & BachKdF\{1,2\} (m) \\
1012:           & Invention 1                                & BachInven1 (m) \\
1013: Beethoven (10) & Sonata no.~8 (Pathetique), 1st movement    & BeetSon8m1 \\ 
1014:           & Sonata no.~14 (Mondschein), 3 movements    & BeetSon14m\{1,2,3\}\\
1015:           & Sonata no.~21 (Waldstein), 2nd movement    & BeetSon21m2\\
1016:           & Sonata no.~23 (Appassionata)               & BeetSon23\\
1017:           & Sonata no.~26 (Les Adieux)                 & BeetSon26\\
1018:           & Sonata no.~29 (Hammerklavier)              & BeetSon29\\
1019:           & Romance no.~1                              & BeetRomance1\\
1020:           & F\"ur Elise                                & BeetFurElise\\
1021: Buxtehude (8) & Prelude and fugues, BuxWV 139,143,144,163  & BuxtPF\{139,143,144,163\} \\
1022:           & Toccata and fugue, BuxWV 165               & BuxtTF165 \\
1023:           & Fugue, BuxWV 174                           & BuxtFug174\\
1024:           & Passacaglia, BuxWV 161                     & BuxtPassa161\\
1025:           & Canzonetta, BuxWV 168                      & BuxtCanz168\\
1026: Chopin (10) & Pr\'eludes op.~28: 1, 15, 22, 24 & ChopPrel\{1,15,22,24\} (s)\\
1027:           & Etudes op.~10, nos.~1, 2, and 3            & ChopEtu\{1,2,3\} (m)\\
1028:           & Nocturnes nos.~1 and 2                     & ChopNoct\{1,2\} (m)\\
1029:           & Sonata no.~2, 3rd movement                 & ChopSon2m3 (m)\\
1030: Debussy (5) & Suite bergamasque, 4 movements             & DebusBerg\{1,2,3,4\} (s)\\
1031:           & Children's corner suite (Gradus ad Parnassum) & DebusChCorm1 (m)\\
1032: Haydn (7) & Sonatas nos.~27, 28, 37, and 38            & HaydnSon\{27,28,37,38\} (m)\\
1033:           & Sonata no.~40, movements 1,2               & HaydnSon40m\{1,2\} (m)\\
1034:           & Andante and variations                     & HaydnAndaVari (m)\\
1035: Mozart (10) & Sonatas nos.~1,2,3,4,6,19                & MozSon\{1,2,3,4,6,19\} \\
1036:           & Rondo from Sonata no.~16                   & MozSon16Rondo \\
1037:           & Fantasias K397, 475                        & MozFantK\{397,475\} \\
1038:           & Variations ``Ah, vous dirai-je, Maman''   & MozVarsDirais\\ \hline
1039: \end{tabular}
1040: \end{center}
1041: \caption{The 60 classical pieces used 
1042: (`m' indicates presence in the medium set, `s' in the small and medium sets)}\label{tableclassicalpieces}
1043: \end{table}
1044: 
1045: \begin{table}[htb]
1046: %http://www.fortunecity.com/tinpan/mingus/51/eng.html
1047: \begin{center}
1048: \begin{tabular}{|l|l|} \hline
1049: John Coltrane& Blue trane\\
1050:              & Giant steps\\
1051:              & Lazy bird\\
1052:              & Impressions\\
1053: Miles Davis  & Milestones\\
1054:              & Seven steps to heaven\\ 
1055:              & Solar\\
1056:              & So what\\
1057: George Gershwin & Summertime\\
1058: Dizzy Gillespie & Night in Tunisia\\
1059: Thelonious Monk & Round midnight\\
1060: Charlie Parker  & Yardbird suite\\ \hline
1061: \end{tabular}
1062: \end{center}
1063: \caption{The 12 jazz pieces used}\label{tablejazzpieces}
1064: \end{table}
1065: 
1066: \begin{table}[htb]
1067: %http://www.fortunecity.com/tinpan/mingus/51/eng.html
1068: \begin{center}
1069: \begin{tabular}{|l|l|} \hline
1070: The Beatles  & Eleanor Rigby\\
1071:              & Michelle\\
1072: Eric Clapton & Cocaine\\
1073:              & Layla\\
1074: Dire Straits & Money for nothing\\
1075: Led Zeppelin & Stairway to heaven\\
1076: Metallica    & One\\
1077: Jimi Hendrix & Hey Joe\\
1078:              & Voodoo chile\\
1079: The Police   & Every breath you take\\
1080:              & Message in a bottle\\
1081: Rush         & Yyz\\ \hline
1082: \end{tabular}
1083: \end{center}
1084: \caption{The 12 rock pieces used}\label{tablerockpieces}
1085: \end{table}
1086: 
1087: 
1088: \end{document}
1089: 
1090: 
1091: