1: \documentclass{article}
2: \pagestyle{plain}
3: \usepackage{epsfig}
4: %\setlength{\textwidth}{22pc}
5: \begin{document}
6: 
7: \title{{\bf HLA and HIV Infection Progression: Application of the Minimum Description Length Principle to Statistical Genetics}}
8: 
9: \author{Peter T. Hraber$^{\ast,\dag}$, 
10: Bette T. Korber$^{\ast,\dag}$, 
11: Steven Wolinsky$^\ddag$,\\
12: Henry Erlich$^\S$, 
13: Elizabeth Trachtenberg$^\P$, and 
14: Thomas B. Kepler$^{\ast,||}$\\
15: \\
16: \normalsize $^\ast$Santa Fe Institute, 1399 Hyde Park Road, Santa Fe NM 87501\\
17: \normalsize $^\dag$Los Alamos National Laboratory, Los Alamos NM 87545\\
18: \normalsize $^\ddag$Feinberg School of Medicine, Northwestern University,\\
19: \normalsize 676 North St. Clair, Suite 200, Chicago IL 60611\\
20: \normalsize $^\S$Roche Molecular Systems, 1145 Atlantic Avenue, Alameda CA 94501\\
21: \normalsize $^\P$Children's Hospital Oakland Research Institute,\\
22: \normalsize 5700 Martin Luther King Jr. Way, Oakland CA 94609\\
23: \normalsize $^{||}$Department of Biostatistics and Bioinformatics, and\\
24: \normalsize Center for Bioinformatics \& Computational Biology,\\
25: \normalsize Box 90090, Duke University, Durham NC 27708\\
26: }
27: \date{}
28: \maketitle
29: 
30: \thispagestyle{empty}
31: \normalsize
32: 
33: \begin{center}
34: \begin{description}
35: 
36: \item {\bf Classification}\\
37: Biological Science/Immunology \& Physical Science/Applied Mathematics\\
38: 
39: \item {\bf Corresponding author}\\
40: Peter T. Hraber\\
41: address: Santa Fe Institute, 1399 Hyde Park Road, Santa Fe NM 87501\\
42: phone: +1 (505) 984-8800\\
43: fax: +1 (505) 982-0565\\
44: email: pth@santafe.edu\\
45: 
46: \item {\bf Manuscript information}\\
47: Text pages: 14\\
48: Figures: 1\\
49: Tables: 2\\
50: Words in abstract: 245 ($<250$)\\
51: Character count: $45536$ ($<47000$)\\
52: 
53: \item {\bf Nonstandard abbreviations}\\
54: MACS: multicenter AIDS cohort study\\
55: MDL: minimum description length\\
56: 
57: \end{description}
58: \end{center}
59: \newpage
60: 
61: \vspace{3 in}
62: \begin{abstract}
63: 
64: The minimum description length (MDL) principle was developed in the
65: context of computational complexity and coding theory. It states that
66: the best model to account for some data minimizes the sum of the
67: lengths, in bits, of the descriptions of the model and the data as
68: encoded via the model. The MDL principle gives a criterion for
69: parameter selection, by using the description length as a test
70: statistic. Class I HLA genes play a major role in the immune response
71: to HIV, and are known to be associated with rates of progression to
72: AIDS. However, these genes are highly polymorphic, making it difficult
73: to associate alleles with disease outcome, given statistical issues of
74: multiple testing. Application of the MDL principle to immunogenetic
75: data from a longitudinal cohort study (Chicago MACS) enables
76: classification of alleles associated with plasma HIV RNA abundance, an
77: indicator of infection progression. We recently reported that MDL
78: analysis of the relationship of HLA supertypes (a classification of
79: alleles by epitope-binding anchor motifs) with HIV RNA levels
80: identifies associations between human genotype and viral RNA. Details
81: of the MDL approach and more extended analyses of HLA and viral RNA
82: are described here. Variation in progression is strongly associated
83: with HLA-B. Allele associations with viral levels support and extend
84: previous studies. In particular, individuals without {\em B58s}
85: supertype alleles average viral RNA levels 3.6-fold greater than
86: individuals with them. Mechanisms for these associations include
87: variation in epitope specificity and selection that favors rare
88: alleles.
89: \end{abstract}
90: \newpage
91: 
92: Progression of HIV infection is characterized by three phases: acute
93: (or early), chronic, and AIDS, the final phase of infection preceding
94: death \cite{McMichael01}. The chronic phase is variable in duration,
95: lasting ten years on average, but varying from two to twenty years.  A
96: good predictor of the duration of the chronic phase is the viral RNA
97: level during chronic infection, with higher levels consistently
98: associated with more rapid progression than lower levels
99: \cite{Mellors96}. A major challenge for treating HIV and developing
100: effective vaccination strategies is to understand what contributes to
101: variation in plasma viral RNA levels, and hence to infection
102: progression.
103: 
104: The cell-mediated immune response identifies and eliminates infected
105: cells from an individual. A central role in this response is played by
106: the major histocompatibility complex (MHC), known in humans as the
107: human leukocyte antigens (HLA). Two classes of HLA genes code for
108: co-dominantly expressed cell-surface glycoproteins that present
109: processed peptides to circulating T-cells, which discriminate between
110: self and non-self \cite{Germain,WilliamsReview}.
111: 
112: Class I HLA molecules are expressed on all nucleated cells except germ
113: cells. In infected cells, they bind and present antigenic peptide
114: fragments to T-cell receptors on CD8$^+$ T-lymphocytes, which are
115: usually cytotoxic and cause lysis of the infected cell. Class II HLA
116: molecules are expressed on immunogenetically reactive cells, such as
117: dendritic cells, B-cells, macrophages, and activated T-cells. They
118: present antigen peptide fragments to T-cell receptors on CD4$^+$
119: T-lymphocytes and the interaction results in release of cytokines that
120: stimulate the immune response.
121: 
122: Human HLA loci are among the most diverse known \cite{Bodmer,Little}.
123: This diversity provides a repertoire to recognize evolving antigens
124: \cite{Little,Hill}. Previous studies of associations between
125: HLA alleles and variation in progression of HIV-1 infection have
126: established that within-host HLA diversity helps to inhibit viral
127: infection, by associating degrees of heterozygosity with rates of HIV
128: disease progression \cite{Roger}. Thus, homozygous individuals,
129: particularly at the HLA-B locus, suffer a greater rate of progression
130: than do heterozygotes \cite{Roger,Carrington99}.  Identifying which
131: alleles are associated with variation in rates of infection
132: progression has been difficult, due in part to the compounding of
133: error rates incurred when testing many alternative hypotheses, and
134: published results do not always agree \cite{otherMS,Trachtenberg01}.
135: 
136: This study demonstrates the use of an information-based criterion for
137: statistical inference. Its approach to multiple testing differs from
138: that of standard analytic techniques, and provides the ability to
139: resolve associations between variation in HIV RNA abundance and
140: variation in HLA alleles.
141: 
142: As an application of computational complexity and optimal coding
143: theory to statistical inference, the minimum description length (MDL)
144: principle states that the best statistical model, or hypothesis, to
145: account for some observed data is the model that minimizes the sum of
146: the number of bits required to describe both the model and the data
147: encoded via the model \cite{Rissanen,Li93,HansenYu}. It is a
148: model-selection criterion that balances the need for parsimony and
149: fidelity, by penalizing equally for the information required to
150: specify the model and the information required to encode the residual
151: error.
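
In symbols, the criterion can be summarized as a two-part code. The
notation below ($\mathcal{M}$ for the set of candidate models, $L(M)$
and $L(D \mid M)$ for the two code lengths) is introduced here for
illustration only:
\[ M^{*} = \arg\min_{M \in \mathcal{M}} \left[ L(M) + L(D \mid M) \right], \]
where $L(M)$ is the number of bits needed to describe model $M$ and
$L(D \mid M)$ is the number of bits needed to describe the data $D$
once $M$ is known. A richer model typically shortens $L(D \mid M)$ but
lengthens $L(M)$; the minimum balances the two.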
152: 
153: The analyses detailed below apply the MDL principle to the problem of
154: partitioning individuals into groups having similar HIV RNA levels,
155: based on HLA alleles present in each case.
156: 
157: \subsection*{Chicago MACS HLA \& HIV Data}
158: 
159: The Chicago Multicenter AIDS Cohort Study (MACS) provided an
160: opportunity to analyze a detailed, long-term, longitudinal set of
161: clinical HIV/HLA data \cite{otherMS}. Each participant provided
162: informed consent in writing. Of 564 HIV-positive cases sampled in the
163: Chicago MACS, 479 provided information about both the rate of disease
164: progression and HLA genetic background. Progression was indicated by
165: the quasi-stationary ``set-point'' viral RNA level during chronic
166: infection. Immunogenetic background was obtained by determining which
167: HLA alleles from class I (HLA-A, -B, and -C) and class II (HLA-DRB1,
168: -DQB1, and -DPB1) loci were present in each individual.
169: 
170: Viral RNA set-point levels were determined after acute infection and
171: prior to any therapeutic intervention or the onset of AIDS, as defined
172: by the presence of an opportunistic infection or a CD4$^+$ T-cell count
173: below 200 cells per $\mu$l of blood. Because the assay has a detection
174: threshold of 300 copies of virus per ml \cite{otherMS},
175: maximum-likelihood estimators were adjusted to avoid biased estimates
176: of population parameters from a truncated, or censored, sample
177: distribution \cite{normal}. Viral RNA levels were log-transformed so
178: as better to approximate a normal distribution.
179: 
180: High-resolution class I and II HLA genotyping \cite{otherMS} provided
181: four-digit allele designations, though analyses were generally
182: performed using two-digit allele designations because of the resulting
183: reduction of allelic diversity and increased number of samples per
184: allele. Because of the potential for results to be confounded by an
185: effect associated with an individual's ethnicity or revised sampling
186: protocol, two separate analyses were performed, one using data from
187: the entire cohort, and another using only data from Caucasian
188: individuals. Sample numbers were too small to study other subgroups
189: independently.
190: 
191: HLA supertypes group class I alleles by their peptide-binding anchor
192: motifs \cite{supertypes}. Assignment of four-digit allele designations
193: to functionally related groups of supertypes at HLA-A and -B loci
194: facilitated further analysis. Where they could be determined, HLA-A
195: and HLA-B supertypes were assigned from four-digit allele designations
196: \cite{otherMS}. As with two-digit allele designations for each locus,
197: HLA-A and -B supertypes were assessed for association with viral RNA
198: levels. Cases having other alleles were withheld from classification
199: and subsequent analysis of supertypes.
200: 
201: A description length analysis determined whether HIV RNA levels were
202: non-trivially associated with alleles at any HLA locus.
203: 
204: \subsection*{Description Lengths}
205: 
206: The challenge of data classification is to find the best partition,
207: such that observations within a group are well-described as
208: independent draws from a single population, but differences in
209: population distributions exist between groups. Whether the data are
210: better represented as two groups, or more, than as one depends on the
211: description lengths that result.
212: 
213: We use the family of Gaussian distributions to model log-transformed
214: viral RNA levels. While the MDL strategy can be applied using any
215: probabilistic model, a log-normal distribution is a good choice for the
216: observed plasma viral RNA values. First, the description length of the
217: model and of the data given the model is calculated as described below,
218: grouping all of the observations into one normal distribution; this
219: gives $L_1$. Next, the data are split into two groups: the log-RNA
220: values are partitioned by HLA allele so as to minimize the description
221: length, under the constraint that two Gaussian distributions, each
222: having its own mean and variance, are used to model the data; this
223: gives $L_2$.
224: 
225: For fixed $n \times n$ covariance matrix $\Sigma$, the description
226: length is $L_\Sigma = \frac{1}{2}\log |\Sigma| + \frac{1}{2}
227: Y'\Sigma^{-1}Y + C$, where $Y$ is the $n$-component vector of
228: observations and $C$ is the quantity of information required to
229: specify the partition. Logarithms are computed in base two, with
230: fractional values rounded upwards, so that the resulting units are
231: bits. The description length of interest results from integrating $L$
232: over all covariance matrices with the appropriate structure. In
233: practice, we use Laplace's approximation for the integral
234: \cite{Rissanen,Lindley} which gives, asymptotically, $L =
235: \frac{1}{2}\log |\hat{\Sigma}| + \frac{1}{2} Y'\hat{\Sigma}^{-1}Y +
236: \frac{k}{2} \log n + C$, where $k$ is the number of free parameters in
237: the covariance model, and $\hat{\Sigma}$ is the specific covariance
238: matrix of the appropriate structure that minimizes $L_\Sigma$.
239: A more detailed account appears in the Appendix.
240: 
241: The analog of a null hypothesis is the assumption that one group of
242: alleles is sufficient to account for the variation in viral RNA. The
243: description length for one group is: $L_1=\frac{1}{2}\left(n+(n-1)\log
244: s^2 +\log n\overline{x}^2+2\log n\right)$, where $n$ is the total
245: number of observations, $s^2$ is the maximum-likelihood estimate of
246: the population variance and $\overline{x}$ is the sample mean,
247: computed as the Winsorized mean \cite{normal} because of truncation
248: below the sensitivity limit of the RNA assay.
249: 
265: 
266: It follows that the description length for two groups can be computed
267: as:
268: 
269: \[ L_{2}=\frac{1}{2} \sum_{i=1}^2 \left( n_i + (n_i-1)\log s_i^2 + \log n_i\overline{x}_i^2 + 2 \log n_i \right) + C,\]
270: 
271: where $C$ is an adjustment for performing multiple comparisons.
272: Because additional information is required to specify the optimum
273: partition, the description length is increased by a quantity related
274: to the number of partitions evaluated, such that $C = N \log k$ bits,
275: where $N$ is the number of alleles observed at the partitioned
276: locus. For $k=2$, $C=N$.
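
One way to read this penalty, offered here as an interpretation rather
than as part of the original derivation: assigning each of $N$ alleles
to one of $k$ ordered groups gives $k^N$ possible mappings, and
indexing one of them with a uniform code costs
\[ C = \log_2 k^{N} = N \log_2 k \mbox{ bits}. \]
For the 30 two-digit HLA-B alleles, for example, $C = 30$ bits when
$k=2$ and $C = 30 \log_2 3 \approx 47.5$ bits (48 after rounding up)
when $k=3$; for a single group, $C = 0$. A two-way partition at that
locus is therefore preferred over a single group only if it shortens
the remaining terms of the description length by more than 30 bits.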
277: 
278: Further partitions of alleles into more than two groups might yield a
279: shorter description length, computed as a summation over terms in the
280: equation for $L_{2}$ for each of the $k$ distinct groups.
281: 
282: The shortest description length for any value of $k$ indicates the
283: best choice of model parameters, including the number of parameters,
284: and hence, the optimum partition of $N$ alleles into $k$ groups. We
285: denote this as $L^*$.
286: 
287: \subsection*{Algorithm}
288: 
289: The minimum description length is found by iteratively computing the
290: description length for each possible partition of alleles into groups
291: and taking the minimum as optimal. Iteration consists first of
292: determining the number of alleles, $N$, at a particular locus, and
293: then incrementing through each of the $k^N$ possible partitions
294: of alleles into $k$ groups, computing the associated description
295: length, and reporting the best results. Each iteration evaluates one
296: possible mapping of alleles to groups. Searching through all possible
297: partitions using the description length as an optimality criterion
298: ensures selection of the best partition as a result of the search.
299: 
300: In this mapping, the ordering of groups is informative, because the
301: ordering gives the relative dominance of alleles for diploid loci. An
302: individual having an allele assigned to the first-order group is
303: assigned to that group. Otherwise, the individual is assigned to the
304: next appropriate group. Two individuals sharing one allele might be
305: placed in either the same group or different groups, depending on the
306: mapping of alleles to groups in a particular iterate. For example,
307: consider how one might group two individuals, one with alleles {\em A1}
308: and {\em A2} at some locus, and another with alleles {\em A2} and {\em
309: A3}. Whether or not they are grouped together depends on the assignment
310: of alleles to groups, and can be done several different ways. The
311: algorithm enumerates each possible assignment of alleles to groups.
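
The assignment rule described above can be sketched in a few lines of
Python. This is a minimal illustration and not the authors' C
implementation; the function name \texttt{assign\_individual}, the
dictionary layout, and the convention that group 0 is the first-order
(dominant) group are assumptions made for the example.

\begin{verbatim}
```python
def assign_individual(genotype, allele_to_group):
    """Return the group of an individual under the ordered-group rule.

    genotype: tuple of the individual's alleles at one locus.
    allele_to_group: dict mapping allele name -> group index, where a
        lower index means a higher-order (more dominant) group.
    """
    # An individual joins the highest-order group to which any of its
    # alleles is mapped, i.e. the smallest group index present.
    return min(allele_to_group[a] for a in genotype)


# Two individuals sharing allele A2, as in the example in the text.
person_1 = ("A1", "A2")
person_2 = ("A2", "A3")

# Mapping 1: only A1 is in the first-order group, so the two are split.
mapping_1 = {"A1": 0, "A2": 1, "A3": 1}
# Mapping 2: the shared allele A2 is in the first-order group, so both
# individuals land in group 0.
mapping_2 = {"A1": 1, "A2": 0, "A3": 1}

groups_1 = [assign_individual(p, mapping_1) for p in (person_1, person_2)]
groups_2 = [assign_individual(p, mapping_2) for p in (person_1, person_2)]
print(groups_1)  # [0, 1]
print(groups_2)  # [0, 0]
```
\end{verbatim}

The same two genotypes are thus separated under one mapping and
grouped together under another, which is why every mapping must be
scored.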
312: 
313: The extent of the search scales as $k^N$. In practice, the most
314: diverse locus was HLA-B, with 30 alleles when analyzed using two-digit
315: allele designations. For two groups, this gives $2^{30} \approx
316: 10^{9}$ possible partitions. Serial iteration on an UltraSPARC-IIi
317: 440MHz CPU (Sun Microsystems) requires roughly 36 hours for
318: completion. A parallel implementation requires no message passing, so
319: computing time scales inversely with the number of CPUs: doubling the
320: available processors halves the time for iteration. With
321: many CPUs, the search space of $2^{30}$ partitions can be exhaustively
322: evaluated in an hour or less. Unfortunately, exhaustively evaluating
323: all three-way partitions is prohibitive, as $3^{30} \approx 2 \times
324: 10^{14}$, roughly a 200,000-fold increase in computational effort.
325: Supertype classification reduced the diversity of possible partitions
326: and enabled partitioning of the data into more than two groups.
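
The exhaustive search can be sketched as follows. This is a minimal
Python illustration under stated assumptions, not the C implementation
mentioned below: it applies the two-group formula above group by group
with $C = N \log_2 k$, treats $\log n\overline{x}^2$ as
$\log_2(n\overline{x}^2)$, uses the plain sample mean rather than the
Winsorized mean, ignores censoring at the assay threshold and the
rounding of fractional bits, and rejects mappings that leave a group
with fewer than two observations. The function names and the toy data
are hypothetical.

\begin{verbatim}
```python
import itertools
import math


def group_length(values):
    """Bits for one Gaussian group, following the summand of L_2.

    Assumes at least two observations, non-zero variance and a
    non-zero mean; censoring at the assay threshold is ignored.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # ML variance
    return 0.5 * (n + (n - 1) * math.log2(var)
                  + math.log2(n * mean ** 2) + 2 * math.log2(n))


def description_length(genotypes, log_rna, mapping, k):
    """Total description length of one allele-to-group mapping (bits)."""
    groups = [[] for _ in range(k)]
    for alleles, y in zip(genotypes, log_rna):
        # Ordered-group rule: the lowest group index among an
        # individual's alleles decides its group.
        groups[min(mapping[a] for a in alleles)].append(y)
    total = len(mapping) * math.log2(k)  # penalty C = N log2 k
    for g in groups:
        if len(g) < 2 or max(g) == min(g) or sum(g) == 0:
            return math.inf  # degenerate group: formula undefined
        total += group_length(g)
    return total


def best_partition(genotypes, log_rna, k):
    """Exhaustively score all k**N mappings of N alleles to k groups."""
    alleles = sorted({a for g in genotypes for a in g})
    best = (math.inf, None)
    for labels in itertools.product(range(k), repeat=len(alleles)):
        mapping = dict(zip(alleles, labels))
        length = description_length(genotypes, log_rna, mapping, k)
        if length < best[0]:
            best = (length, mapping)
    return best


# Hypothetical toy data: carriers of A1 have lower log-RNA values.
genotypes = [("A1", "A2"), ("A1", "A3"), ("A1", "A1"), ("A1", "A2"),
             ("A2", "A3"), ("A2", "A2"), ("A3", "A3"), ("A2", "A3")]
log_rna = [2.9, 3.3, 2.7, 3.1, 4.9, 5.3, 4.7, 5.1]

length_1, _ = best_partition(genotypes, log_rna, 1)
length_2, mapping_2 = best_partition(genotypes, log_rna, 2)
print(length_1, length_2, mapping_2)
```
\end{verbatim}

On this toy data the search recovers the mapping that places {\em A1}
alone in the first-order group, and the two-group description is
shorter than the one-group description. With $N$ alleles the loop
visits $k^N$ mappings, so this direct form is practical only for small
$N$.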
327: 
328: The algorithm was implemented in C and will be distributed on request.
329: 
330: \subsection*{Class I \& II HLA Results}
331: 
332: The description length for the entire cohort as one group is $L_1=934$
333: bits; for the Caucasian subsample, it is $L_1=721$ bits. In general,
334: $L_1 < L_2$ at most loci (Table~1), so the MDL criterion does not
335: support partitioning alleles into groups that are predictive of high
336: or low RNA levels, except at HLA-B, where $L_2 < L_1$. In the
337: subsample, partitioning HLA-C or HLA-DQB1 alleles can also provide
338: preferred two-way splits, though not as well as HLA-B.  Further
339: partitioning was intractable because of great allelic diversity, as
340: previously mentioned. Partitions of HLA-B alleles provide the best
341: groupings among all loci. Because $L_2^* < L_1$, two groups,
342: partitioned by HLA-B alleles, provide a better description than one
343: (Fig.~1a and 1b).
344: 
345: What is the composition of the optimum groupings?  For the entire
346: cohort, the following alleles were associated with low viral RNA
347: levels: {\em B*13}, {\em B*27}, {\em B*38}, {\em B*45}, {\em B*49},
348: {\em B*57}, {\em B*58}, and {\em B*81}. The remaining alleles,
349: associated with greater viral RNA than the first group, are: {\em
350: B*07}, {\em B*08}, {\em B*14}, {\em B*15}, {\em B*18}, {\em B*35},
351: {\em B*37}, {\em B*39}, {\em B*40}, {\em B*41}, {\em B*42}, {\em
352: B*44}, {\em B*47}, {\em B*48}, {\em B*50}, {\em B*51}, {\em B*52},
353: {\em B*53}, {\em B*55}, {\em B*56}, {\em B*67}, and {\em B*82}. As
354: described earlier, having any alleles associated with the first group
355: is sufficient for an individual to be assigned to the group having
356: lower viral RNA.
357: 
358: How robust are these assignments of alleles to groups?  Four
359: alternative groupings provide description lengths within one bit of
360: the optimum. They do not dramatically rearrange the assignment of
361: individuals to groups, but do provide insight as to which alleles are
362: assigned to either group with less confidence. Among near-optimal
363: partitions, alleles {\em B*82} and {\em B*67} were assigned to groups
364: other than in the optimum partition.
365: 
366: In the Caucasian subsample, alleles {\em B*13}, {\em B*27}, {\em
367: B*40}, {\em B*45}, {\em B*48}, {\em B*49}, {\em B*57}, and {\em B*58}
368: are associated with lower viral RNA, and the remaining alleles, {\em
369: B*07}, {\em B*08}, {\em B*14}, {\em B*15}, {\em B*18}, {\em B*35},
370: {\em B*37}, {\em B*38}, {\em B*39}, {\em B*41}, {\em B*44}, {\em
371: B*47}, {\em B*50}, {\em B*51}, {\em B*52}, {\em B*53}, {\em B*55}, and
372: {\em B*56}, or lack of any alleles from the first group, are
373: associated with greater viral RNA levels. Two nearly optimal
374: partitions assigned alleles {\em B*47} and {\em B*48} to the second
375: group.  Fig.~1 illustrates the distributions of viral RNA levels from
376: this subsample, as one group (Fig.~1c) and as the best partition at
377: HLA-B (Fig.~1d).
378: 
379: To summarize the most robust inferences from the analyses of two-digit
380: allele designations, individuals having HLA-B alleles {\em B*13}, {\em
381: B*27}, {\em B*45}, {\em B*49}, {\em B*57}, or {\em B*58} were
382: associated with lower viral RNA levels than their counterparts lacking
383: these alleles.
384: 
385: The groupings obtained via the MDL approach compared very favorably
386: with more traditional means of statistical inference: a two-tailed,
387: two-sample Welch modified t-test, which does not assume equal
388: variances, and its non-parametric counterpart, the Wilcoxon rank-sum
389: test \cite{Venables}. In each case, the null hypothesis was that of no
390: difference between the group mean log-transformed viral RNA levels,
391: and the alternative hypothesis was that the means differ. Both tests
392: agreed in rejecting the null hypothesis in favor of the alternative
393: ($P<10^{-10}$).
394: 
395: \subsection*{HLA Supertype Results}
396: 
397: Assigning the diploid, co-dominantly expressed HLA-A alleles to four
398: HLA-A supertypes \cite{supertypes}, {\em A1s}, {\em A2s}, {\em A3s},
399: and {\em A24s}, was possible for 399 individuals. The mapping of HLA-B
400: alleles to five supertypes, {\em B7s}, {\em B27s}, {\em B44s}, {\em
401: B58s}, and {\em B62s}, was made for 352 individuals. The resulting
402: decrease in allelic diversity enabled analysis for $k>2$.
403: 
404: Description lengths of the best $k$-way partitions of supertype
405: alleles for HLA-A supertypes are: $L_1=793$, $L_2=782$, $L_3=789$, and
406: $L_4=794$ bits. The best description length results from a two-way
407: split, though a three-way split also yields a shorter description
408: length than that obtained from one group. The best partition of HLA-A
409: supertypes assigned individuals having {\em A1s} alleles to the low
410: RNA group.
411: 
412: For HLA-B supertypes, $L_1=704$, $L_2=691$, $L_3=693$, and $L_4=697$
413: bits (Fig.~1e). The best model results when $k=2$. Overall,
414: individuals lacking {\em B58s} alleles averaged viral RNA levels
415: $3.6$-fold greater than individuals having {\em B58s} supertype
416: alleles (Fig.~1f). Thus, individuals with {\em B58s} alleles have
417: significantly lower viral RNA levels than individuals without them.
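
Read as differences, the description lengths reported above give the
saving of the best two-way partition over a single group:
\[ L_1 - L_2 = 793 - 782 = 11 \mbox{ bits (HLA-A)}, \qquad
   L_1 - L_2 = 704 - 691 = 13 \mbox{ bits (HLA-B)}. \]
These savings are net of the partition penalty $C = N \log_2 k$, which,
assuming $N$ counts the supertypes at each locus, is 4 bits for the
four HLA-A supertypes and 5 bits for the five HLA-B supertypes when
$k=2$. Adding a third group costs more than it saves: $L_3 - L_2 = 7$
bits at HLA-A and $2$ bits at HLA-B.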
418: 
419: Table~2 summarizes results of assigning HLA-B associations to high or
420: low viral-RNA categories as two-digit allele designations from both
421: the entire cohort and the Caucasian subsample, and as supertypes for
422: those individuals having two alleles that could be assigned to a
423: supertype. Alleles not found in a sample are indicated by a dash. The
424: {\em B*15} alleles are not shown because their high-resolution
425: genotype designations correspond to four different supertypes.
426: 
427: Overall, the most consistent associations with low viral RNA are among
428: the {\em B58s}, and with high viral RNA, the {\em
429: B7s}. Inconsistencies in assignment to a category occur for the {\em
430: B*13}, {\em B*27}, {\em B*45}, and {\em B*49} alleles, which are in
431: the low viral-RNA group when analyzed as such, but the high viral-RNA
432: group when assigned to supertypes.
433: 
434: When assessed with alternative inferential techniques, the difference
435: between group viral RNA levels was highly significant. This and
436: agreement with alleles reported to be associated with variation in
437: viral RNA levels in previously published studies indicate that using
438: the description length as a test statistic can provide reliable
439: inferences.
440: 
441: \subsection*{MDL \& Statistical Inference}
442: 
443: The traditional statistical approach to model selection is to pose a question as follows:
444: suppose that the simpler model (e.g., one homogeneous population) were
445: actually true; call this the null hypothesis. How often would one, in
446: similar experiments, get data that look as different from that
447: expected under the null hypothesis as in the actual experiment?
448: 
449: This technique has limitations when the partition that represents the
450: alternative hypothesis is not given in advance. There are then many
451: potential alternative partitions and the appropriate distribution
452: under the null hypothesis for this ensemble of tests is very difficult
453: to estimate. Furthermore, for proper interpretation, the outcome
454: relies upon the truth of the initial assumption: that the data are
455: distributed as dictated by the null hypothesis.
456: 
457: An alternative is to choose that model that represents the data most
458: efficiently. Here, efficiency is the amount of information, quantified
459: as bits, required to transmit electronically both the model and the
460: data as encoded by the model. This criterion may not seem intuitively
461: clear on first exposure. However, it follows naturally from a profound
462: relationship between probability and coding theory that was
463: discovered, explored, and elaborated by Solomonoff, Kolmogorov,
464: Chaitin, and Rissanen
465: \cite{Kolmogorov65,Chaitin66,Chaitin87,Rissanen86,Rissanen99}.
466: 
467: The idea is quite simple and elegant. It can be illustrated by analogy
468: to the problem of designing an optimal code for the efficient
469: transmission of natural-language messages. Consider the international
470: Morse code. Recall that Morse code assigns letters of the Roman
471: alphabet to codewords composed of dots (``$\cdot$'') and dashes
472: (``$-$''). The codewords do not all have the same number of dots
473: and/or dashes; it is a variable-length code.
474: 
475: Efficient, compact encodings result from the design of a codebook such
476: that the shortest codewords are assigned to the most frequently
477: encoded letters and long codewords are assigned to rare letters. Thus,
478: {\em e} and {\em t} are encoded as ``$\cdot$'' and ``$-$'',
479: respectively, while {\em q} and {\em j} are encoded as ``$-$ $-$
480: $\cdot$ $-$'' and ``$\cdot$ $-$ $-$ $-$''. The theory of optimal
481: coding provides an exact relationship between frequency and code
482: length and thus, probability and description length.
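
Stated as a formula (a standard result of coding theory, given here in
notation introduced for illustration): a near-optimal prefix code, the
Shannon code, assigns to a symbol of probability $p$ a codeword of
length
\[ \ell(p) = \lceil -\log_2 p \rceil \mbox{ bits}, \]
so that a symbol with $p = 1/8$ receives a 3-bit codeword while one
with $p = 1/1024$ receives a 10-bit codeword. Conversely, any set of
codeword lengths $\ell_i$ satisfying the Kraft inequality $\sum_i
2^{-\ell_i} \le 1$ defines implicit probabilities $2^{-\ell_i}$, which
is what allows a description length to be read as a negative
log-likelihood.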
483: 
484: The key departure of MDL from Morse-code-like schemes is that, while
485: Morse code would generally be good for sending messages over an
486: average of many texts, specific texts might be encoded even more
487: efficiently, by encoding not only letters, but letter combinations,
488: common words, or even phrases, perhaps as abbreviations or
489: acronyms. However, if one is to recode for particular texts, one must
490: first transmit the coding scheme. So perhaps one might use Morse code
491: to transmit the details of the new coding scheme and then transmit the
492: text itself with the new scheme. Whether this might yield greater
493: efficiency depends not only on how much compression is achieved in the
494: new encoding, but also on how much overhead is incurred in having to
495: transmit the coding scheme.
496: 
497: The analogy to scientific data analysis is clear. A statistical model
498: is an encoding scheme that encapsulates the regularities in the data
499: to yield a concise representation thereof. The best model effectively
500: compresses regularities in the data, but is not so elaborate that its
501: own description demands a great deal of information to be encoded.
502: The MDL principle provides a model-selection criterion that balances
503: the need for a model that is both appropriate and parsimonious, by
504: penalizing with equal weights the information required to specify the
505: model and the unexplained, or residual, error.
506: 
507: %Another advantage of using the minimum description length principle as
508: %a test statistic is that it provides an objective criterion for
509: %selecting model parameters, including the optimum value for $k$, the
510: %number of groups. Thus, an algorithm iteratively evaluated the
511: %description length associated with partitioning $N$ alleles into $k$
512: %groups, by computing the description length $L$ associated with each
513: %possible partition, and taking as optimum the model with minimum
514: %description length $L^*$.
515: 
516: %Further, model parameters, such as the appropriate number of groups,
517: %are readily optimized using the description length as a test
518: %statistic. We considered here several possibilities for the proper
519: %number of groups, $k$. Computing the true minimum description length
520: %was possible here because the combinatorial diversity of the search
521: %space could be exhaustively evaluated, with sufficiently low allelic
522: %diversity.
523: 
524: Yet another contribution the MDL principle brings to statistical
525: modelling is that the penalty for multiple comparisons is less
526: restrictive than the penalty of compounded error rates incurred with
527: canonical inferential approaches. In order to maintain a desired
528: experiment-wide error rate, the standard adjustment is to make the
529: per-comparison error rate considerably more stringent. With current
530: technology, realistic sample sizes for such studies will generally be
531: less than a thousand and stringent significance levels will be
532: difficult to surpass. Unfortunately, fixing the false-positive error
533: rate does not address the false-negative probability, which may leave
534: researchers powerless to detect effects among many competing
535: hypotheses with limited samples.
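
For concreteness, consider the Bonferroni adjustment, one common
correction of this kind (named here as an illustration; the text does
not specify which adjustment is meant). To hold the experiment-wide
error rate at $\alpha$ across $m$ comparisons, each comparison is
tested at
\[ \alpha' = \frac{\alpha}{m}, \]
so that, with the illustrative values $\alpha = 0.05$ and $m = 200$,
each test must reach $\alpha' = 2.5 \times 10^{-4}$.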
536: 
537: %Testing 200 alternative hypotheses in this scenario, an approximation
538: %of the number of tests that might be performed on an HLA-disease
539: %outcome study given the alleleic diversity of these highly polymorphic
540: %loci, would yield $\alpha' = 0.00005$. Given the experimental
541: %complexity of defining these alleles and disease correlates, with
542: %current technology, realistic sample sizes for such studies will
543: %generally be less than a thousand individuals, and such stringent
544: %P-values will be difficult to obtain. Unfortunately, fixing the
545: %false-positive error rate does not address the false-negative
546: %probability, often leaving researchers with little statistical power
547: %to detect significant effects among many competing hypotheses with
548: %limited sample observations.
549: 
550: %Model selection under MDL does not depend on the assumption of the
551: %truth of the simpler model, given the data. No such idea is required
552: %or even relevant. On the other hand, one can take the results of an
553: %MDL model selection procedure and connect back to traditional
554: %statistical approaches by taking the simpler model as null hypothesis
555: %and then asking, What are the type I and II error rates, $\alpha$ and
556: %$\beta$, under the MDL procedure?  One finds that both error rates
557: %decline as the sample size $n$ increases. This is a contrast to the
558: %usual procedure of fixing $\alpha$ and letting the power to detect
559: %differences ($1 - \beta$) alone decrease with $n$. Indeed, the power
%does increase more slowly under MDL than it would under fixed $\alpha$-level
561: %testing; the reduction of $\alpha$ is accomplished at that expense.
562: 
563: %Potential drawbacks of using the MDL principle for inference in
564: %statistical genetics include the lack of an explicit $P$-value, which
565: %makes the approach difficult to explain to audiences accustomed to
566: %more common inferential techniques, and a requirement for data
567: %consisting of large sample observations.
568: 
569: \subsection*{Mechanisms}
570: 
Among HLA supertypes, individuals carrying {\em B58s} alleles have
lower viral RNA levels than those who lack them, even among homozygous
individuals. Naturally, this leads one to consider mechanisms that
574: underlie patterns found in the data. Elsewhere, we consider two
575: hypotheses to explain the observed associations between HLA alleles
576: and variation in viral RNA \cite{otherMS}.
577: 
578: There may be allele-specific variation in antigen-binding
579: specificity. Some alleles may have greater affinity than others for
580: HIV-specific peptide fragments due to the peptide-binding anchor
581: motifs they present. We were not able to identify any clear
582: association between the frequency of anchor motifs among HIV-1
583: proteins and viral RNA levels in the Chicago MACS \cite{otherMS},
584: though others have suggested that such a relationship might exist
585: \cite{Nelson}.
586: 
It may also be the case that frequency-dependent selection has favored
588: rare alleles. Frequent alleles provide the evolving pathogen greater
589: opportunity to explore mutant phenotypes that may escape detection by
590: the host's immune response. By encountering rare alleles less
591: frequently, the virus has not had the same opportunity to explore
592: mutations that evade the host's defense response. This hypothesis is
593: corroborated by a significant association between viral RNA and HLA
594: allele frequency in the Chicago MACS sample \cite{otherMS}.
595: 
596: Because their predictions differ, these hypotheses could be tested
597: with data from another cohort, where a different viral subtype
predominates. That is, if alleles other than those identified in this
study were associated with low viral RNA, and an association between
600: rare alleles and low viral RNA levels were observed there, then the
601: second hypothesis would be more viable than the first. Alternatively,
602: if a clear association between antigen peptide-binding anchor motifs
603: and variation in viral RNA levels were found, the first hypothesis
604: would be more viable. Other mechanisms are also possible, and
605: hypotheses by which to evaluate them merit consideration.
606: 
607: \subsection*{Acknowledgments}
608: %\vspace{0.125 in}
609: 
610: We thank Bob Funkhouser, Cristina Sollars, and Elizabeth Hayes for
611: sharing their expertise, and researchers of the Santa Fe Institute for
612: insight and inspiration. This research was financed by funds from the
Elizabeth Glaser Pediatric AIDS Foundation, the National Cancer
614: Institute, the National Institute of Allergy and Infectious Diseases,
615: National Institutes of Health, National Science Foundation award
616: \#0077503, and the U.S. Department of Energy.  We have no conflicting
617: interests.
618: 
619: 
620: \subsection*{Appendix}
621: 
622: In Gaussian Process modeling \cite{Williams97}, the population means
623: are treated as random variables and integrated out of the
624: likelihood. The model is then specified entirely by the structure of
625: the covariance matrix $\Sigma$, which specifies how each pair of
626: observations is correlated. The covariance is greater for two
627: observations from the same partition than for two observations from
628: different partitions. Any given partition is specified entirely by a
629: corresponding covariance structure.
630: 
631: {\bf Partitioning with Gaussian Models.}  Denote the $n$ observations
632: as the vector $Y$ and the covariance matrix with parameter vector
633: $\theta$ by $\Sigma(\theta)$. Let the number of components of $\theta$
634: (the number of free parameters in the covariance matrix) be $k$.  Then
635: the MDL for the given covariance structure is: 
636: $L = \frac{1}{2}\log |\Sigma(\hat{\theta})| + \frac{1}{2}
637: Y'\Sigma(\hat{\theta})^{-1}Y + \frac{k}{2}\log n + C$, where $C$ is
638: the information required to specify the partition or, equivalently,
639: the covariance structure, and $\hat{\theta}$ is the vector of
640: covariance parameters evaluated at maximum likelihood.
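
As a reading aid, the first two terms of $L$ can be compared with the
negative log-likelihood of a zero-mean Gaussian with covariance
$\Sigma(\theta)$ (the zero mean is an assumption made here, in keeping
with the means having been integrated out):

\[ -\log p(Y \mid \theta) = \frac{n}{2}\log 2\pi + \frac{1}{2}\log |\Sigma(\theta)| + \frac{1}{2} Y'\Sigma(\theta)^{-1}Y. \]

On this reading, $L$ is the negative log-likelihood at $\hat{\theta}$
with the constant $\frac{n}{2}\log 2\pi$ dropped, since it is common
to every covariance structure, plus $\frac{k}{2}\log n$ for the $k$
fitted parameters and $C$ for the partition itself.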
641: 
642: {\bf One Gaussian Population.}  The covariance matrix has a component
643: $\sigma ^2 _m$ for the covariance among observations, induced by their
644: sharing an unspecified mean, and an error component $\sigma ^2
645: _\varepsilon$: $\Sigma = \sigma ^2 _\varepsilon I + \sigma ^2 _m
646: \mathbf{11}'$, with $\mathbf{1}$ the column vector of all ones,
647: $\mathbf{11}'$ the matrix of all ones, and $I$ the identity
648: matrix. The inverse is:
649: 
650: %Note that $k=2$ since there are two free parameters in the covariance matrix.
651: 
652: \[\Sigma^{-1}= \frac{1}{\sigma ^2 _\varepsilon} \left( I - \frac{\sigma ^2 _m}{\sigma ^2 _\varepsilon + n \sigma ^2 _m} \mathbf{11}' \right), \]
653: 
654: and the log-determinant: $\log |\Sigma | = (n-1)\log \sigma ^2 _\varepsilon + \log (\sigma ^2 _\varepsilon + n \sigma ^2 _m)$.
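
Both expressions can be checked from the eigenstructure of $\Sigma$.
Since $\mathbf{11}'\mathbf{1} = n\mathbf{1}$,

\[ \Sigma \mathbf{1} = (\sigma ^2 _\varepsilon + n \sigma ^2 _m)\mathbf{1}, \qquad \Sigma v = \sigma ^2 _\varepsilon v \quad \mbox{for any $v$ with } \mathbf{1}'v = 0, \]

so $\Sigma$ has one eigenvalue $\sigma ^2 _\varepsilon + n \sigma ^2 _m$
and $n-1$ eigenvalues equal to $\sigma ^2 _\varepsilon$. The
determinant is their product,

\[ |\Sigma| = (\sigma ^2 _\varepsilon)^{n-1} (\sigma ^2 _\varepsilon + n \sigma ^2 _m), \]

and applying the stated inverse to $\mathbf{1}$ returns
$(\sigma ^2 _\varepsilon + n \sigma ^2 _m)^{-1}\mathbf{1}$, as it should.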
655: 
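Evaluated at the maximum-likelihood estimates given below, with $k=2$ free parameters and the partition cost $C$ left out, the description length is $L = \frac{1}{2} \left(n + (n-1)\log \sigma ^2 _\varepsilon + \log(\sigma ^2 _\varepsilon + n\sigma^2 _m ) + 2 \log n \right)$.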
657: 
We find the maximum likelihood values of the parameters by minimizing
the description length over them. There are two cases.\\
660: 
661: Case 1: $n^2 \overline{Y}^2 - Y'Y \ge 0$. Here we have 
662: $\hat{\sigma}^2 _\varepsilon =(n-1)^{-1}(Y'Y-n\overline{Y}^2)$ and
663: $\hat{\sigma}^2 _m =(n-1)^{-1}(n\overline{Y}^2 -\frac{1}{n}Y'Y)$, so 
$L=\frac{1}{2}(n +(n-1)\log\hat{\sigma}^2 _\varepsilon+ \log (n \overline{Y}^2) + 2\log n)$.\\
665: 
666: Case 2: $n^2 \overline{Y}^2 - Y'Y < 0$. Here the common mean vanishes, giving
667: $\hat{\sigma} ^2 _\varepsilon = \frac{1}{n}Y'Y$, $\hat{\sigma}^2 _m = 0$, so 
668: $L = \frac{n}{2}(1+\log \hat{\sigma}^2 _\varepsilon +\frac{2}{n}\log n)$.\\
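
The constant $n$ that appears in $L$ in both cases comes from the
quadratic form evaluated at the estimates. A short check, using only
the inverse given above, separates the component of $Y$ along
$\mathbf{1}$ from the remainder:

\[ Y'\Sigma^{-1}Y = \frac{Y'Y - n\overline{Y}^2}{\sigma ^2 _\varepsilon} + \frac{n\overline{Y}^2}{\sigma ^2 _\varepsilon + n \sigma ^2 _m}. \]

In Case 1, $\hat{\sigma}^2 _\varepsilon + n\hat{\sigma}^2 _m = n\overline{Y}^2$,
so the two terms equal $n-1$ and $1$. In Case 2, $\hat{\sigma}^2 _m = 0$
and the two terms combine to $Y'Y / \hat{\sigma}^2 _\varepsilon = n$.
Either way,

\[ Y'\Sigma(\hat{\theta})^{-1}Y = n. \]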
669: 	 
670: {\bf Many Gaussian Populations.}
671: Two partitions give two populations. To analyze the HLA/HIV data,
672: we treated these populations as independent.  That is, we take the
673: covariance between observations in separate partitions to be zero, and
674: apply the fitting procedure outlined above separately to the two
675: populations. An alternative is to take non-zero covariance between the
676: two populations. This results in a more elaborate estimation
677: procedure, unlikely to yield large efficiency gains because the
678: two degrees of freedom (population means) are essentially mixed into
679: one, with residual error.
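
Under this independence assumption the joint covariance is block
diagonal once the observations are ordered by population. Writing
$Y_1, Y_2$ and $\Sigma_1, \Sigma_2$ for the observations and
covariance matrices of the two populations (notation introduced here
only for illustration),

\[ \Sigma = \left( \begin{array}{cc} \Sigma_1 & 0 \\ 0 & \Sigma_2 \end{array} \right), \]

\[ \log |\Sigma| = \log |\Sigma_1| + \log |\Sigma_2|, \qquad Y'\Sigma^{-1}Y = Y_1'\Sigma_1^{-1}Y_1 + Y_2'\Sigma_2^{-1}Y_2, \]

so the data-dependent terms of the description length separate into
one contribution per population, which is what permits fitting each
population on its own.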
680: 
681: The procedure examines each admissible partition and computes the MDL
682: for that partition as the sum of individual description lengths over
683: the two independent populations. The best partition yields the lowest
684: description length over all partitions. This, plus the cost of
685: specifying the partition, is compared with the MDL from the
686: unpartitioned data. If the best partition provides a better
687: representation of the data than the unpartitioned set ($L _{k} < L
688: _{k-1}$), then the process is repeated in a recursive manner,
689: independently within each of the partitioned populations.
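
As a worked illustration of this comparison, take the values reported
for the entire cohort in Table~\ref{tab:all-loci}, where the
unpartitioned description length is $L_1 = 934$:

\[ \mbox{HLA-B:} \quad L_2 = 887 + 30 = 917 < 934 = L_1, \qquad \mbox{HLA-A:} \quad L_2 = 916 + 19 = 935 > 934 = L_1. \]

At HLA-B the best split is retained. At HLA-A the fit also improves
($L_2 - C = 916$), but the cost $C = 19$ of specifying the partition
outweighs the gain, so the data remain unpartitioned. In the table,
the difference between the $L_2$ and $L_2 - C$ columns coincides
numerically with the allelic diversity $N$ at every locus.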
690: 
691: \newpage
692: \begin{thebibliography}{99}
693: 
694: \bibitem{McMichael01}
695: McMichael,~A.~J. \& Rowland-Jones,~S.~L.
696: \newblock (2001)
697: %\newblock Cellular immune responses to {HIV}.
698: \newblock {\em Nature} {\bf 410}, 980-987.
699: 
700: \bibitem{Mellors96}
Mellors,~J.~W., Rinaldo,~C.~R.,~Jr., Gupta,~P., White,~R.~M., Todd,~J.~A. \& Kingsley,~L.~A.
702: \newblock (1996)
703: \newblock {\em Science} {\bf 272}, 1167-1170.
704: 
705: \bibitem{Germain}
706: Germain,~R.~N.
707: \newblock (1999)
708: \newblock Chapter 9 in {\em Fundamental Immunology}, fourth edition, ed.
709: \newblock Paul,~W.~E.
710: \newblock (Lippincott-Raven, Philadelphia PA), pp. 287-340.
711: 
712: \bibitem{WilliamsReview}
713: Williams,~A., Au~Peh,~C. \& Elliott,~T.
714: \newblock (2002)
715: \newblock {\em Tissue Antigens} {\bf 59}, 3-17.
716: 
717: \bibitem{Bodmer}
718: Bodmer,~W.~F.
719: \newblock (1972)
720: \newblock {\em Nature} {\bf 237}, 139-145.
721: 
722: \bibitem{Little}
723: Little,~A.~M. \& Parham,~P.
724: \newblock (1999)
725: \newblock {\em Rev. Immunogenet.} {\bf 1}, 105-123.
726: 
727: \bibitem{Hill}
728: Hill,~A.~V.~S.
729: \newblock (1998)
730: \newblock {\em Ann. Rev. Immunol.} {\bf 16}, 593-617.
731: 
732: \bibitem{Roger}
733: Roger,~M.
734: \newblock (1998)
735: \newblock {\em FASEB J.} {\bf 12}, 625-632.
736: 
737: \bibitem{Carrington99}
738: Carrington,~M., Nelson,~G.~W., Martin,~M.~P., Kissner,~T., Vlahov,~D., Goedert,~J.~J., Kaslow,~R., Buchbinder,~S., Hoots,~K. \& O'Brien,~S.~J.
739: \newblock (1999)
740: %\newblock {{\em HLA}} and {HIV}-1: {H}eterozygote advantage and {{\em {B}*35-{C}w*04}} disadvantage.
741: \newblock {\em Science} {\bf 283}, 1748-1752.
742: 
743: \bibitem{otherMS}
744: Trachtenberg,~E.~A., Korber,~B.~T., Sollars,~C., Kepler,~T.~B., Hraber,~P.~T., Hayes,~E., Funkhouser,~R., Fugate,~M., Theiler,~J., Hsu,~M., Kunstman,~K., Wu,~S., Phair,~J., Erlich,~H.~A. \& Wolinsky,~S.
745: \newblock (2003)
\newblock {\em Nat. Med.} {\bf 9}, 928-935.
747: 
748: \bibitem{Trachtenberg01}
749: Trachtenberg,~E.~A. \& Erlich,~H.~A.
750: \newblock (2001)
751: %A Review of the Role of the Human Leukocyte Antigen (HLA) System as a
752: %Host Immunogenetic Factor Influencing HIV Transmission and Progression
753: %to AIDS
754: \newblock in {\em HIV Molecular Immunology 2001}, eds.
755: \newblock Korber,~B.~T., Brander,~C., Haynes,~B.~F., Koup,~R., Kuiken,~C., Moore,~J.~P., Walker,~B.~D. \& Watkins,~D.
756: \newblock (Theoretical Biology and Biophysics Group, Los Alamos National Laboratory, Los Alamos NM), pp. I-43-60.
757: 
758: \bibitem{Rissanen}
Rissanen,~J.
760: \newblock (1989)
761: \newblock {\em Stochastic Complexity in Statistical Inquiry}
762: \newblock (World Scientific, Singapore).
763: 
764: \bibitem{Li93}
Li,~M. \& Vit\'{a}nyi,~P.
766: \newblock (1993)
767: \newblock {\em An Introduction to Kolmogorov Complexity and its Applications}
768: \newblock (Springer-Verlag, New York NY).
769: 
770: \bibitem{HansenYu}
771: Hansen,~M.~H. \& Yu,~B.
772: \newblock (2001)
773: \newblock {\em J. Am. Stat. Assoc.} {\bf 96}, 746-774.
774: 
775: \bibitem{normal}
776: Johnson,~N.~L., Kotz,~S. \& Balakrishnan,~N. 
777: \newblock (1994)
778: \newblock {\em Continuous Univariate Distributions}, volume~1, second edition
779: \newblock (Wiley Interscience, New York NY).
780: 
781: \bibitem{supertypes}
782: Sette,~A. \& Sidney,~J.
783: \newblock (1999)
784: %\newblock Nine major {HLA} class {I} supertypes account for the vast preponderance of {HLA-A} and -{B} polymorphism.
785: \newblock {\em Immunogenetics} {\bf 50}, 201-212.
786: 
787: \bibitem{Lindley}
788: Lindley,~D.~V.
789: \newblock (1980)
790: \newblock in {\em Bayesian Statistics}, eds.
\newblock Bernardo,~J.~M., DeGroot,~M.~H., Lindley,~D.~V. \& Smith,~A.~F.~M.
792: \newblock (Valencia University Press, Valencia), pp. 223-237.
793: 
794: \bibitem{Venables}
795: Venables,~W.~N. \& Ripley,~B.~D. 
796: \newblock (1999)
797: \newblock {\em Modern Applied Statistics with {S-PLUS}}, third edition
798: \newblock (Springer, New York NY).
799: 
800: \bibitem{Kolmogorov65}
Kolmogorov,~A.~N.
802: \newblock (1965)
803: % Three approaches to the quantitative definition of information.
804: \newblock {\em Prob. Inform. Transmission} {\bf 1}, 4-7.
805: 
806: \bibitem{Chaitin66}
Chaitin,~G.~J.
808: \newblock (1966)
809: % On the lengths of programs for computing binary sequences.
810: \newblock {\em J. Assoc. Comput. Mach.} {\bf 13}, 547-569.
811: 
812: \bibitem{Chaitin87}
Chaitin,~G.~J.
814: \newblock (1987)
815: \newblock {\em Algorithmic Information Theory}
816: \newblock (Cambridge University Press, Cambridge UK).
817: 
818: \bibitem{Rissanen86}
Rissanen,~J.
820: \newblock (1986)
821: % Stochastic complexity and modeling.
822: \newblock {\em Ann. Statist.} {\bf 14}, 1080-1100.
823: 
824: \bibitem{Rissanen99}
Rissanen,~J.
826: \newblock (1999)
827: %\newblock Hypothesis selection and testing by the {MDL} principle.
828: \newblock {\em Comput. J.} {\bf 42}, 260-269.
829: 
830: \bibitem{Nelson}
831: Nelson,~G.~W., Kaslow,~R. \& Mann,~D.~L.
832: \newblock (1997)
833: \newblock {\em Proc. Natl. Acad. Sci. USA} {\bf 94}, 9802-9807.
834: 
835: \bibitem{Williams97}
836: Williams,~C.~K.~I.
837: \newblock (1997)
838: % Regression with Gaussian Processes.
839: \newblock in {\em Mathematics of Neural Networks: Models, Algorithms and Applications}, eds.
840: \newblock Ellacott,~S.~W., Mason,~J.~C. \& Anderson,~I.~J.
841: \newblock (Kluwer, Boston MA), pp. 378-382.
842: 
843: \end{thebibliography}
844: 
845: \newpage
846: \section*{Figure Legends}
847: 
848: Fig.~1. Description-length comparisons of viral RNA distributions as
849: one ($L_1$) or two ($L_2$) groups. Ordinate units are the expected
850: number of observations between two tick marks over the abscissa, or
851: one doubling of viral RNA. Impulses along the abscissa show individual
852: observations, with jitter added to enhance rendering of identical
853: values. (a) Observations ($n$) from the Chicago MACS cohort lumped
854: into one group, and (b) split into the best partition as two groups,
855: with individuals having alleles {\em B*13}, {\em B*27}, {\em B*38},
856: {\em B*45}, {\em B*49}, {\em B*57}, {\em B*58}, or {\em B*81} assigned
857: to the lower group ($n_1$), and remaining individuals assigned to the
858: group with greater viral RNA ($n_2$). (c) Observations from the
859: Caucasian subsample as one group, and (d) as the best split into two
860: groups, where having alleles {\em B*13}, {\em B*27}, {\em B*40}, {\em
861: B*45}, {\em B*48}, {\em B*49}, {\em B*57}, or {\em B*58} was the
862: criterion for being assigned to the low viral-RNA group. Observations
863: from individuals having two HLA-B supertype alleles, (e) in one group,
864: and (f) partitioned into two groups, contingent on the presence of
865: {\em B58s}.
866: 
867: \clearpage
868: \renewcommand{\baselinestretch}{2}
869: 
870: \begin{table}[tb]
871: \begin{center}
872: \caption{Optimum two-way partitions at each locus, with per-locus 
873: allelic diversity ($N$), description lengths without the information
874: cost to specify model parameters ($L_2 - C$), and minimum description 
875: lengths ($L_2$).
876: \label{tab:all-loci}}
877: \begin{tabular}{ccrcclcrccl}
878: \\
879: \hline
880: &\multicolumn{5}{c}{\sc{Entire Cohort}} & \multicolumn{5}{c}{\sc{Caucasian Subsample}}\\
881: & \multicolumn{5}{c}{$n=479$, $L_1=934$} & \multicolumn{5}{c}{$n=379$, $L_1=721$}\\
882: {\sc Locus}& & $N$ & $L_2-C$ && \multicolumn{1}{c}{$L_2$} & & $N$ & $L_2-C$ && \multicolumn{1}{c}{$L_2$}\\
883: \hline
884: \sc{Class I}\\
885: HLA-A && 19 & 916 && 935 && 18 & 703 && 721\\
886: HLA-B && 30 & 887 && 917* && 26 & 681 && 707*\\
887: HLA-C && 14 & 921 && 935 && 13 & 706 && 719\\
888: \sc{Class II}\\
889: DRB1  && 13 & 927 && 940 && 13 & 711 && 724\\
890: DQB1  &&  5 & 936 && 941 &&  5 & 715 && 720\\
891: DPB1  && 24 & 927 && 951 && 21 & 710 && 731\\
892: \hline
893: \end{tabular}
894: \end{center}
895: \end{table}
896: \newpage
897: \clearpage
898: 
899: \renewcommand{\baselinestretch}{1}
900: 
901: \begin{table}[t]
902: \begin{center}
903: \caption{HLA-B alleles associated with low ($\circ$) or high ($\bullet$) viral RNA levels.
904: \label{tab:hlab-summary}}
905: \begin{tabular}{cccc}
906: \\
907: \hline
908:              & {\sc Entire} & {\sc Caucasian} & {\sc Supertypes}\\
909: {\sc Allele} & {\sc Cohort} & {\sc Subsample} &    {\sc Only}\\
910:              &    $n=479$   &     $n=379$     &     $n=352$\\
911: \hline
912: \multicolumn{4}{l}{{\em B7s}}\\
913: {\em B*07}   & $\bullet$  & $\bullet$  &  $\bullet$\\
914: {\em B*35}   & $\bullet$  & $\bullet$  &  $\bullet$\\
915: {\em B*51}   & $\bullet$  & $\bullet$  &  $\bullet$\\
916: {\em B*53}   & $\bullet$  & $\bullet$  &  $\bullet$\\
917: {\em B*55}   & $\bullet$  & $\bullet$  &  $\bullet$\\
918: {\em B*56}   & $\bullet$  & $\bullet$  &  $\bullet$\\
919: {\em B*67}   & $\circ /\bullet$ & --   &  $\bullet$\\
920: \multicolumn{4}{l}{{\em B27s}}\\
921: {\em B*14}   & $\bullet$  & $\bullet$  & $\bullet$\\
922: {\em B*27}   & $\circ$    & $\circ$    & $\bullet$\\
923: {\em B*38}   & $\circ$    & $\bullet$  & $\bullet$\\
924: {\em B*39}   & $\bullet$  & $\bullet$  & $\bullet$\\
925: {\em B*48}   & $\circ /\bullet$  & $\circ /\bullet$ & $\bullet$\\
926: \multicolumn{4}{l}{{\em B44s}}\\
927: {\em B*18}   & $\bullet$  & $\bullet$  & $\bullet$\\
928: {\em B*37}   & $\bullet$  & $\bullet$  & $\bullet$\\
929: {\em B*40}   & $\bullet$  & $\circ$    & $\bullet$\\
930: {\em B*41}   & $\bullet$  & $\bullet$  & $\bullet$\\
931: {\em B*44}   & $\bullet$  & $\bullet$  & $\bullet$\\
932: {\em B*45}   & $\circ$    & $\circ$    & $\bullet$\\
933: {\em B*49}   & $\circ$    & $\circ$    & $\bullet$\\
934: {\em B*50}   & $\bullet$  & $\bullet$  & $\bullet$\\
935: \multicolumn{4}{l}{{\em B58s}}\\
936: {\em B*57}   & $\circ$    & $\circ$    & $\circ$\\
937: {\em B*58}   & $\circ$    & $\circ$    & $\circ$\\
938: \multicolumn{4}{l}{{\em B62s}}\\
939: {\em B*13}   & $\circ$    & $\circ$    & $\bullet$\\
940: {\em B*52}   & $\bullet$  & $\bullet$  & $\bullet$\\
941: \multicolumn{4}{l}{\sc Other}\\
942: {\em B*08}   & $\bullet$  & $\bullet$  & --\\
943: {\em B*15}   & $\bullet$  & $\bullet$  & --\\
944: {\em B*42}   & $\bullet$  & --         & --\\
945: {\em B*47}   & $\bullet$  & $\circ /\bullet$ & --\\
946: {\em B*81}   & $\circ$    & --         & --\\
947: {\em B*82}   & $\circ /\bullet$ & --   & --\\
948: \hline
949: \end{tabular}
950: \end{center}
951: \end{table}
952: 
953: \renewcommand{\baselinestretch}{1}
954: 
955: \clearpage
956: 
957: \thispagestyle{empty}
958: \setlength{\textwidth}{20cm}
959: \setlength{\headheight}{0cm}
960: \setlength{\headsep}{0cm}
961: \setlength{\topmargin}{0cm}
962: \setlength{\oddsidemargin}{0cm}
963: \setlength{\evensidemargin}{0cm}
964: \begin{figure}[p!]
965: \begin{center}
966: %\leavevmode
967: \epsfig{file=figure1.eps,width=18cm}
968: %\caption{}
969: \end{center}
970: \end{figure}
971: \clearpage
972: 
973: \end{document}
974: