1: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
2: %2345678901234567890123456789012345678901234567890123456789012345678901234567890
3: % 1 2 3 4 5 6 7 8
4:
5: %\documentclass[letterpaper, 10 pt, conference]{ieeeconf} % Comment this line out
6: % if you need a4paper
7: \documentclass[a4paper, 10pt, conference]{ieeeconf} % Use this line for a4
8: % paper
9:
10: \IEEEoverridecommandlockouts % This command is only
11: % needed if you want to
12: % use the \thanks command
13: \overrideIEEEmargins
14: % See the \addtolength command later in the file to balance the column lengths
15: % on the last page of the document
16:
17:
18:
19: % The following packages can be found on http:\\www.ctan.org
20: \usepackage{graphics} % for pdf, bitmapped graphics files
21: \usepackage{epsfig} % for postscript graphics files
22: \usepackage{rotating}
23: %\usepackage{mathptmx} % assumes new font selection scheme installed
24: %\usepackage{times} % assumes new font selection scheme installed
25: %\usepackage{amsmath} % assumes amsmath package installed
26: %\usepackage{amssymb} % assumes amsmath package installed
27:
28: \title{\LARGE \bf
29: Overlapping Probabilities of Top Ranking Gene Lists,
30: Hypergeometric Distribution, and Stringency of Gene Selection Criterion
31: }
32:
33:
34: \author{Wen Fury, Franak Batliwalla, Peter K. Gregersen, and Wentian Li% <-this % stops a space
35: \thanks{W. Fury is a Senior Bioinformatics Scientist at Regeneron Pharmaceutical, Inc.
36: Tarrytown, NY 10591, USA.
37: {\tt\small wen.fury@regeneron.com}}%
38: \thanks{F. Batliwalla, P.K. Gregersen, and W. Li are Research Scientists
39: with the Robert S Boas Center for Genomics and Human Genetics,
40: Feinstein Institute for Medical Research, North Shore LIJ Health System,
41: Manhasset, NY 11030, USA
42: {\tt\small fb@nshs.edu},
43: {\tt\small peterg@nshs.edu},
44: {\tt\small wli@nslij-genetics.org}}%
45: }
46:
47:
48: \begin{document}
49:
50:
51:
52: \maketitle
53: \thispagestyle{empty}
54: \pagestyle{empty}
55:
56:
57: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
58: \begin{abstract}
59:
When the same set of genes appears in two top-ranking gene lists from
two different studies, it is often of interest to estimate
the probability that this is a chance event. This overlapping
63: probability is well known to follow the hypergeometric
64: distribution. Usually, the lengths of top-ranking gene lists
65: are assumed to be fixed, by using a pre-set criterion on, e.g.,
66: $p$-value for the $t$-test. We investigate how overlapping probability
67: changes with the gene selection criterion, or simply, with the
68: length of the top-ranking gene lists. It is concluded that
69: overlapping probability is indeed a function of the gene list
70: length, and its statistical significance should be quoted in
71: the context of gene selection criterion.
72:
73:
74: \end{abstract}
75:
76:
77: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
78: \section{INTRODUCTION}
79:
80: One of the most common tasks in microarray analysis
81: is to identify a list of genes that are differentially
82: expressed under two conditions, such as being affected by
83: a disease vs. normal, before vs. after a medical
84: treatment, and one vs. another disease subtype. The
85: number of genes on the top-ranking list is
86: usually much smaller than the total number of genes
87: on the chip, $n$. If the same type of microarray chip is used for
88: two different studies (e.g. disease-A vs. control,
89: and disease-B vs. control), two differentially
90: expressed gene lists can be obtained, with $n_1$ and
$n_2$ genes. Researchers often find that the same genes
appear in both lists and hypothesize that these common
genes are involved in the etiology of both diseases.
94:
95: However, for such a hypothesis to be convincing,
96: one has to first estimate the probability for
97: overlapping genes by chance alone. In other words,
98: if two lists of genes are selected out of $n$ genes
99: randomly, we would like to calculate the probability
100: for $m$ genes in common in the two lists,
101: with the lengths of the two lists being $n_1$ and $n_2$.
102: This overlapping probability is known to follow the
103: hypergeometric distribution \footnote{Despite certain
104: similarity, this problem is not the birthday problem
105: -- the probability for two people in a room to
106: have the same birthday.}. The name hypergeometric
107: distribution was first used in \cite{hyper}, and
108: was popularized by its role in Fisher's exact
109: test \cite{fisher}.
110:
111: In microarray analysis, overlapping probability and
112: hypergeometric distribution mainly appear in testing
113: the enrichment of genes in certain functional
114: category \cite{tavazoie, draghici, fino, hosack,
115: boorsma, curtis, mao, tian}. In this application,
116: the first list is the top-ranking differentially
117: expressed genes, and a gene selection process is
involved. The second list, in contrast, is given:
$n_2$ genes are known to be in a pathway, to be
members of a protein family, to be described by a gene ontology term,
etc. One asks for the chance probability
that $m$ out of $n_1$ selected genes are in
a given pathway, belong to a protein family, or are described
by a gene ontology term. Whether $n_2$ is fixed is the
main difference between that application and ours.
126:
127:
128: When a different gene selection criterion is used,
129: the number of genes in the two top-ranking lists
130: of two studies ($n_1$ and $n_2$) will also change.
131: Because the stringency of a gene selection criterion
132: is always adjustable and to some extent arbitrary,
133: we would like to examine whether these changes will
affect the overlapping probability. At the two
extremes, very small $n_1 = n_2 \approx 1 $
and very large $n_1=n_2 =n$, it is clear that
the number of overlapping genes is almost always $m=0$ in the
first case and always $m=n$ in the second.
Because these $m$ values occur essentially 100\% of the time,
the corresponding $p$-value is equal to 1, i.e.,
not significant. For intermediate $n_1 \approx n_2$
values, it is not clear what the overlapping
probability and significance will be; this is
the topic of this abstract.
144:
145:
146:
147:
148:
149:
150:
151:
152: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
153: \section{HYPERGEOMETRIC DISTRIBUTION AND OVERLAPPING P-VALUES}
154:
155:
156: Given integers $n$, $n_1$, $n_2$, $m$
($\max(n_1, n_2) \le n$ and $m \le \min(n_1, n_2)$), the hypergeometric
158: distribution is defined as
159: $$
160: P(m) =\frac{ C(n_1, m ) C(n-n_1, n_2-m )}{ C(n, n_2) }
161: = \frac{ \left( \begin{array}{c} n_1 \\ m \end{array} \right)
162: \left( \begin{array}{c} n-n_1 \\ n_2-m \end{array} \right)}
163: { \left( \begin{array}{c} n \\ n_2 \end{array} \right) }
164: $$
165: where $C(n, m)$ is the number of possibilities of choosing
166: $m$ objects out of $n$ objects: $C(n, m)= n!/[m! (n-m) !] $.
167:
168: When $n_1$ genes are randomly chosen from the total of
169: $n$ genes, and another random sampling leads to $n_2$
170: genes, the probability that the two lists of genes have
171: $m$ in common is exactly the hypergeometric probability
172: $P(m)$. This can be proven by the following steps:
173: 1) The total number of possible choices for the two
174: lists of genes is $C(n, n_1) \cdot C(n, n_2)$.
175: 2) There are $C(n, n_1)$ possibilities for choosing the first
176: list.
177: 3) Among the $n_1$ genes in the first list, there are
178: $C(n_1, m)$ possibilities for choosing $m$ genes to
179: be in common with the second list.
180: 4) In the second list, besides the $m$ genes that are in
181: common with the first list, the remaining $n_2-m$ genes
are chosen among the $n-n_1$ ``leftover'' genes not
in the first list, giving $C(n-n_1, n_2-m)$ possibilities.
$P(m)$ is then simply (\#2 $\times$ \#3 $\times$ \#4) / \#1,
in which the factor $C(n, n_1)$ cancels.
Note that $n_1$ and $n_2$ can be switched without
changing the value of $P(m)$.
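
The counting argument can be checked numerically on a small gene pool.
The following Python sketch is an illustration only and is not part of the
original analysis; the function names {\tt hypergeom\_pmf} and
{\tt brute\_force\_pmf} and the toy values $n=7$, $n_1=3$, $n_2=4$ are
assumptions made for this example. It evaluates $P(m)$ from the formula
and compares it with an explicit enumeration of every pair of lists.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def hypergeom_pmf(m, n, n1, n2):
    """P(m): probability that two random lists of lengths n1 and n2,
    drawn from the same pool of n genes, share exactly m genes."""
    if m < 0 or m > min(n1, n2):
        return Fraction(0)
    # math.comb returns 0 when more items are requested than are available
    return Fraction(comb(n1, m) * comb(n - n1, n2 - m), comb(n, n2))


def brute_force_pmf(m, n, n1, n2):
    """Same probability by enumerating all pairs of lists (small n only)."""
    genes = range(n)
    hits = total = 0
    for first in combinations(genes, n1):
        first_set = set(first)
        for second in combinations(genes, n2):
            total += 1
            if len(first_set.intersection(second)) == m:
                hits += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    n, n1, n2 = 7, 3, 4  # toy values, chosen only to keep enumeration fast
    for m in range(min(n1, n2) + 1):
        print(m, hypergeom_pmf(m, n, n1, n2), brute_force_pmf(m, n, n1, n2))
```

Exact fractions are used so that the two calculations can be compared
without rounding error; the enumeration is feasible only for very small $n$.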
187:
It is usually more interesting to calculate the sum of
$P(k)$ over all $k$ equal to or larger than the observed value $m$
(i.e., the $p$-value):
$$
p\mbox{-value} = \sum_{k = m}^{\min(n_1, n_2)} P(k)
= \sum_{k=0}^{\min(n_1, n_2)} P(k)
-\sum_{k=0}^{m-1} P(k)
$$
In the statistical package $R$ ({\sl http://www.r-project.org/}),
there are at least two ways to calculate the overlapping $p$-value.
The first is to use the cumulative distribution function of the
hypergeometric distribution, {\sl phyper(m, $n_1$, $n-n_1$, $n_2$)}:
$p$-value $= phyper(\min(n_1, n_2), n_1, n-n_1, n_2)
- phyper(m-1, n_1, n-n_1, n_2)$ if $m >0$, and
$p$-value=1 if $m=0$. The second method is to use
the $p$-value from Fisher's exact test on
the following 2-by-2 table:
205: $$
206: \begin{array}{c|cc|c}
207: & col_1 & col_2 & total \\
208: \hline
209: row_1& m & n_1 -m & n_1 \\
210: row_2& n_2-m & n -n_1-n_2+m & n-n_1 \\
211: \hline
212: total & n_2 & n-n_2 & n
213: \end{array}
214: $$
The two approaches lead to identical results, provided that the one-sided
version of Fisher's exact test (alternative ``greater'') is used.
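
The agreement between the two routes can be illustrated outside of $R$.
The Python sketch below is a minimal illustration under stated assumptions,
not the code used for this paper: the function names and the example values
($n=100$, $n_1=20$, $n_2=30$, $m=10$) are hypothetical, and the Fisher
$p$-value is taken to be one-sided (tables with a top-left cell of at
least $m$). The first function sums the hypergeometric upper tail; the
second sums the probabilities of individual 2-by-2 tables with fixed margins.

```python
from fractions import Fraction
from math import comb, factorial


def overlap_pvalue(m, n, n1, n2):
    """Upper tail sum_{k=m}^{min(n1,n2)} P(k) of the hypergeometric distribution."""
    tail = sum(comb(n1, k) * comb(n - n1, n2 - k)
               for k in range(max(m, 0), min(n1, n2) + 1))
    return Fraction(tail, comb(n, n2))


def table_probability(a, b, c, d):
    """Probability of one 2-by-2 table [[a, b], [c, d]] with fixed margins."""
    n = a + b + c + d
    num = (factorial(a + b) * factorial(c + d)
           * factorial(a + c) * factorial(b + d))
    den = (factorial(n) * factorial(a) * factorial(b)
           * factorial(c) * factorial(d))
    return Fraction(num, den)


def fisher_one_sided(m, n, n1, n2):
    """One-sided p-value: all tables whose top-left cell is at least m."""
    p = Fraction(0)
    for a in range(max(m, 0), min(n1, n2) + 1):
        b, c, d = n1 - a, n2 - a, n - n1 - n2 + a
        if d >= 0:  # skip tables that the margins do not allow
            p += table_probability(a, b, c, d)
    return p


if __name__ == "__main__":
    n, n1, n2, m = 100, 20, 30, 10  # illustrative values only
    print(float(overlap_pvalue(m, n, n1, n2)))
    print(float(fisher_one_sided(m, n, n1, n2)))
```

Both functions return exact fractions, so equality can be tested directly
rather than up to a numerical tolerance.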
216:
217: % \begin{figure}[thpb]
218: \begin{figure}[t]
219: \centering
220: \begin{turn}{-90}
221: % \includegraphics[scale=1.0]{wen-fig1.eps}
222: \resizebox{8.0cm}{8.0cm}{ \includegraphics{wen-fig1.eps} }
223: \end{turn}
224: \caption{First column: proportion of overlapping genes between
225: two top ranking gene lists for a pair of studies ($m/n_1$)
226: as a function of the gene list length ($n_1(=n_2)$). Top is
227: for gene ranking by $t$-test and bottom is for gene ranking
228: by logistic regression. The overlapping proportion for
229: two randomly shuffled lists is shown in crosses, and the line
$m/n_1 = n_1/n$ is marked. Second column: observed number
of overlapping genes ($m$) minus the expected number
of overlapping genes ($n_1^2/n$).
233: }
234: \label{fig1}
235: \end{figure}
236:
\section{PROPORTION OF OVERLAPPING GENES IN A COLLECTION
OF MICROARRAY DATASETS}
239:
240:
In the hypergeometric distribution, the number of overlapping
elements $m$ is a variable independent of the
list lengths $n_1, n_2$. In order to get a rough idea of
how $m$ changes with the list lengths, we use three real
microarray datasets. These studies concern three
autoimmune diseases: rheumatoid
arthritis (RA), systemic lupus erythematosus (SLE), and
psoriatic arthritis (PsA), described in detail in
\cite{ra, sle, psa}. The numbers of controls (C) and patients (P)
in these three datasets are (C=39, P=46), (C=41, P=81), and
251: (C=19, P=19), respectively. The total number of genes/probe-sets
252: is $n=$22283, and the expression levels are log transformed.
253: Genes are ranked for their degree of differential expression
254: which can be measured by various tests or models, such
255: as $t$-test and logistic regression.
256:
For any pair of studies, with a fixed length of the top-ranking
gene lists $n_1(=n_2)$, one can count the number of overlapping genes
$m$ and the proportion $m/n_1(=m/n_2)$. Fig.\ref{fig1} (left
column) shows this proportion as a function of $n_1(=n_2)$
for three study-pairs (RA-SLE, SLE-PsA, RA-PsA) as well as for two ranking methods
($t$-test and logistic regression). The corresponding overlapping
proportion of two randomly shuffled lists is also
indicated in Fig.\ref{fig1} by crosses.
265:
266: When $n_1(=n_2)$ is small, $m$ is more likely to be zero, so
267: the proportion is also zero. When $n_1(=n_2)$ approaches the
268: total number of genes, $n$, all genes are overlapping genes,
269: and the proportion is 1. Fig. \ref{fig1} indeed shows these
trends at the two extreme points. In order to check the
behavior in between, we draw a reference line in Fig.\ref{fig1}
(left column) that assumes a linear relationship between
$m/n_1$ and $n_1/n$. Most of the points in Fig.\ref{fig1}
are above this line, whereas the overlapping proportion of two
random lists falls on this line.
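
The reference line corresponds to the mean of the hypergeometric
distribution. As a short derivation added here for illustration: each of
the $n_1$ genes in the first list has probability $n_2/n$ of being
included in a randomly chosen second list, so by linearity of expectation
$$
E(m) = n_1 \cdot \frac{n_2}{n} = \frac{n_1 n_2}{n},
\qquad
\frac{E(m)}{n_1} = \frac{n_2}{n} = \frac{n_1}{n}
\quad \mbox{when } n_1 = n_2 .
$$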
276:
To get an idea of how many more common genes there are
than expected by chance, Fig.\ref{fig1} (right
column) plots the observed $m$ minus the expected $m_{exp}= n_1^2/n(=n_2^2/n)$
as a function of $n_1(=n_2)$. The maximum difference between
the observed and expected values is reached between $n_1=5000$ and
$n_1=10000$. The difference between the observed and expected $m$
can be as large as 600--800.
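
The chance-level baseline $n_1^2/n$ can also be checked by simulation, in
the spirit of the randomly shuffled lists of Fig.\ref{fig1}. The Python
sketch below is only an illustration: the function name
{\tt mean\_random\_overlap}, the random seed, the number of trials, and the
reduced pool size are assumptions of this example, not the settings used
for the figures.

```python
import random


def mean_random_overlap(n, n1, n2, trials, seed=0):
    """Average number of shared genes between two lists of lengths n1 and n2
    drawn independently and at random from a pool of n genes."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    genes = range(n)
    total = 0
    for _ in range(trials):
        first = set(rng.sample(genes, n1))
        second = rng.sample(genes, n2)
        total += sum(1 for g in second if g in first)
    return total / trials


if __name__ == "__main__":
    n, n1 = 2000, 500  # illustrative sizes, smaller than the real chip
    print("simulated:", mean_random_overlap(n, n1, n1, trials=200))
    print("expected :", n1 * n1 / n)
```

With more trials the simulated mean settles closer to $n_1 n_2/n$; a small
pool is used here only to keep the run short.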
284:
285: % \begin{figure}[thpb]
286: \begin{figure}[t]
287: \centering
288: \begin{turn}{-90}
289: \resizebox{4.0cm}{7.50cm}{ \includegraphics{wen-fig2.eps} }
290: \end{turn}
291: \caption{
292: Overlapping significance as measured by $-\log_{10}(p$-value)
293: where $p$-value is obtained by the hypergeometric distribution,
294: as a function of $n_1(=n_2)$, the number of genes in the
295: top-ranking gene lists. The $R$ program reports $p$-value to
296: be zero whenever it is lower than 2.2$\times 10^{-16}$, and
297: we use a ceiling of 15.65758 $=-\log_{10}(2.2 \times 10^{-16})$
298: in the plot. Six lines are shown for three
299: study pairs (RA-SLE, SLE-PsA, RA-PsA) and two tests/models
300: ($t$-test and logistic regression). Similar overlapping significance
301: for two randomly shuffled lists is also shown (indicated by crosses).
302: }
303: \label{fig2}
304: \end{figure}
305:
306: \section{OVERLAPPING SIGNIFICANCE}
307:
308: The overlapping $p$-value corresponding to the $m$ counts
309: plotted in Fig.\ref{fig1} was calculated by the hypergeometric
310: distribution, and is shown in Fig.\ref{fig2}:
311: $y$-axis is $-\log_{10}(p$-value), and $x$-axis is
312: $n_1(=n_2)$. Six lines are shown for
313: three comparisons (RA-SLE, SLE-PsA, RA-PsA) and two
314: measurements of the differential expression ($t$-test and
logistic regression). Zero $p$-values are converted to
2.2 $\times 10^{-16}$, which is the minimum value
reported by the $R$ program. Fig.\ref{fig2} shows that,
except at the two ends ($m=n_1=n_2=0$ and $m=n_1=n_2=n$) where
the $p$-value is 1, the overlapping significance
quickly increases with the length of the top-ranking gene list
$n_1(=n_2$), and can be extremely significant when a
large number of genes are kept in the two lists
for comparison.
324:
This result confirms our earlier suspicion that overlapping
significance is a function of the gene list lengths.
If the selection of $n_1, n_2$ is arbitrary, the
overlapping significance thus calculated is also
arbitrary. It is not surprising that
overlapping significance may keep increasing
(or, the $p$-value decreasing) with the increase of $n_1(=n_2)$,
because a $p$-value in general depends on the sample
size. When a signal is real (true positive), the $p$-value
generally decreases as the sample size grows.
In contrast, if a true signal is absent, the
sample size does not affect the conclusion. As
can be seen in Fig.\ref{fig2}, the overlapping significance
for two random lists does not really change with $n_1(=n_2)$.
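
The sample-size effect can be reproduced with a toy calculation. In the
Python sketch below (an illustration under stated assumptions, not the
analysis behind Fig.\ref{fig2}), the pool size $n=1000$, the list lengths,
and the choice of an observed overlap equal to twice the chance expectation
are all hypothetical. The ``enriched'' case keeps the same relative excess
while the lists grow; the ``null'' case sets $m$ to the chance expectation.

```python
from math import comb, log10


def overlap_pvalue(m, n, n1, n2):
    """Hypergeometric upper-tail p-value for an overlap of at least m genes."""
    tail = sum(comb(n1, k) * comb(n - n1, n2 - k)
               for k in range(max(m, 0), min(n1, n2) + 1))
    return tail / comb(n, n2)  # exact integers, converted to float at the end


if __name__ == "__main__":
    n = 1000  # hypothetical pool size
    for n1 in (100, 200, 300):
        expected = n1 * n1 // n  # chance expectation (an integer for these n1)
        p_null = overlap_pvalue(expected, n, n1, n1)
        p_enriched = overlap_pvalue(2 * expected, n, n1, n1)
        print(n1, round(-log10(p_null), 2), round(-log10(p_enriched), 2))
```

In this toy setting the significance of the enriched case grows with the
list length although the relative excess is constant, while the null case
stays near a $p$-value of one half.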
339:
One may argue that it is unreasonable to consider the
top 5000 genes as being differentially expressed,
because by a typical selection criterion (e.g. $p$-value of
$t$-test smaller than 0.01, with or without multiple
testing correction), the number of genes selected
is fewer than a few hundred. However, as can be
seen in Fig.\ref{fig2}, even in the range
of 10--500, the overlapping $p$-value changes dramatically.
348:
This pitfall of gene-list-length dependence of overlapping
$p$-values has not been noticed before,
perhaps because in other applications of the hypergeometric
distribution for calculating overlapping probability,
the length of the second list $n_2$ is fixed, for example,
in the study of overrepresentation of genes in
a certain pathway. The number of overlapping genes $m$
356: is then constrained from above by $\min(n_1, n_2)$ even though
357: the length of the first list, $n_1$, might increase
358: by relaxing the gene selection criterion.
359:
360: % \begin{figure}[thpb]
361: \begin{figure}[t]
362: \centering
363: \begin{turn}{-90}
364: \resizebox{4.0cm}{7.0cm}{ \includegraphics{wen-fig3.eps} }
365: \end{turn}
366: \caption{The test significance ($-\log_{10}(p$-value))
367: from $t$-test of $n=$22283 genes sorted by the averaged expression
368: level (log-transformed) across all 245 samples in 3 studies
369: (RA, SLE, PsA). The three $t$-tests are for RA vs. control, SLE vs. control,
370: and PsA vs. control.
371: }
372: \label{fig3}
373: \end{figure}
374:
375: % \begin{figure}[thpb]
376: \begin{figure}[t]
377: \centering
378: \begin{turn}{-90}
379: \resizebox{8.0cm}{8.0cm}{ \includegraphics{wen-fig4.eps} }
380: \end{turn}
381: \caption{Several measures of overlapping genes between
382: a pair of studies as a function of the number of genes included
383: in the top-ranking list, for the reduced dataset with 15283 genes.
First column: proportion of overlapping genes ($m/n_1$);
second column: number of observed overlapping genes minus the
number expected ($m- n_1^2/15283$); third column: $-\log_{10}(p$-value)
387: by the hypergeometric distribution. First row is for lists ranked
388: by $t$-test result, and second row is for lists ranked by
389: logistic regression.
390: }
391: \label{fig4}
392: \end{figure}
393: \section{THE EFFECTS OF UNEXPRESSED GENES}
394:
There are many genes/probe-sets on the microarray chip
that do not register much signal. Since these
genes are expressed at low levels in both control and patient
samples, they usually do not appear in the top-ranking
differentially expressed gene list. Fig.\ref{fig3}
400: shows $-\log_{10}(p$-value) of each gene of 3 $t$-tests
401: sorted by average expression (log-transformed)
402: across all 245 samples in 3 datasets (for both cases and controls). Although
403: we cannot use the average expression level to predict
404: the degree of differential expression, there is
405: a general trend for low-expressed genes to rank lower in the
406: differentially expressed list as seen from Fig.\ref{fig3}.
407:
408: We removed 7000 genes with lower overall expression across
409: all samples, leaving $n=15283$ genes. Figs.\ref{fig1} and \ref{fig2}
410: are reproduced in Fig.\ref{fig4} for the dataset with a reduced gene pool.
As in Figs.\ref{fig1} and \ref{fig2}, the observed number
of overlapping genes $m$ is much larger than the expected number,
though the difference peaks at 400--600, versus 600--800
in Fig.\ref{fig1}. The overlapping significance as measured
by $-\log_{10}(p$-value) again quickly moves up with $n_1(=n_2)$
as shown in the last column of Fig.\ref{fig4}.
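
As an arithmetic illustration of how the chance-level baseline shifts with
the size of the gene pool (the list length $n_1=n_2=5000$ is chosen here
only as an example), the expected number of overlapping genes $n_1^2/n$ is
$$
\frac{5000^2}{22283} \approx 1122
\qquad \mbox{versus} \qquad
\frac{5000^2}{15283} \approx 1636 ,
$$
so a smaller gene pool raises the number of overlaps expected by chance
at a given list length.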
417:
418: The qualitative similarity between Figs.\ref{fig1}, \ref{fig2}
419: and Fig.\ref{fig4} indicates that the presence of
420: low-expressed genes does not affect our conclusion.
421:
422: \addtolength{\textheight}{-12cm} % This command serves to balance the column lengths
423: % on the last page of the document manually. It shortens
424: % the textheight of the last page by a suitable amount.
425: % This command does not take effect until the next page
426: % so it should come on the page before the last. Make
427: % sure that you do not shorten the textheight too much.
428:
429:
430: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{CONCLUSIONS AND FUTURE WORK}
432:
433: \subsection{Conclusions}
434:
Using the hypergeometric distribution to calculate the
overlapping probability between two top-ranking differentially
expressed gene lists in two studies, we have shown that the
438: overlapping significance depends on the stringency of
439: gene selection criterion, or equivalently, the length
440: of the gene lists. This observation presents a problem
441: when an overlapping $p$-value is reported but the
442: gene selection criterion is not specified. On the other
443: hand, the increase of the overlapping significance
444: with the gene list length can be an indication that
445: the significant overlapping of genes is a true signal.
446:
447:
\subsection{Future Work}
449:
450: The overlapping probability calculated here assumes the two
451: top-ranking gene lists are selected from the same pool of $n$
452: genes. If the two studies are based on different chip
453: platforms, the two initial gene pools are not identical,
454: though there are perhaps certain common genes. We plan to
455: derive the overlapping distribution for this situation.
456:
457: We also plan to study the probability for genes appearing
458: in three top-ranking gene lists. Although a permutation based
459: approach comparing multiple studies was proposed in \cite{rhode},
460: there is no analytic formula available.
461:
462:
463: \section{ACKNOWLEDGMENTS}
464:
465: We would like to thank Prof. Richard Friedberg for suggestions.
466:
467:
468: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
469:
470:
471: \begin{thebibliography}{99}
472:
473: \bibitem{hyper}
474: H.T. Gonin,
475: ``The use of factorial moments in the treatment of the hypergeometric
476: distribution and in tests for regression",
477: {\it Philosophical Mag.}, vol 7, 1936, pp 215-226.
478:
479: \bibitem{fisher}
480: R.A. Fisher,
{\sl Statistical Methods for Research Workers},
Oliver and Boyd, Edinburgh; 1934.
483:
484: \bibitem{tavazoie}
485: S. Tavazoie, J.D. Hughes, M.J. Campbell, R.J. Cho, G.M. Church,
486: ``Systematic determination of genetic network architecture",
487: {\it Nature Genet.}, vol 22, 1999, pp 281-285.
488:
489: \bibitem{draghici}
490: S. Dr\v{a}ghici, P. Khatri, R.P. Martins, G.C. Ostermeier,
491: S.A. Krawetz,
492: ``Global functional profiling of gene expression",
{\it Genomics}, vol 81, 2003, pp 98-104.
494:
495: \bibitem{fino}
496: G. Finocchiaro, F. Mancuso, H. Muller,
497: ``Mining published lists of cancer related microarray experiments:
498: identification of a gene expression signature having a
499: critical role in cell-cycle control",
{\it BMC Bioinf.}, vol 6(suppl 4), 2005, S14.
501:
502: \bibitem{hosack}
503: D.A. Hosack, G. Dennis Jr., B.T. Sherman, H.C. Lane,
R.A. Lempicki,
506: ``Identifying biological themes within lists of genes with EASE",
507: {\it Genome Biol.}, vol 4, 2003, R70.
508:
509:
510: \bibitem{boorsma}
511: A. Boorsma, B.C. Foat, D. Vis, F. Klis, H.J. Bussemaker,
512: ``T-profiler: scoring the activity of predefined groups
513: of genes using gene expression data",
514: {\it Nucleic Acids Res.}, vol 33, 2005, pp W592-W595.
515:
516: \bibitem{curtis}
517: R.K. Curtis, M. Ore\v{s}i\v{c}, A. Vidal-Puig,
518: ``Pathways to the analysis of microarray data",
519: {\it Trends Biotech.}, vol 23, 2005, pp 429-435.
520:
521: \bibitem{mao}
522: X. Mao, T. Cai, J.G. Olyarchuk, L. Wei,
523: ``Automated genome annotation and pathway identification using
524: the KEGG Orthology (KO) as a controlled vocabulary",
525: {\it Bioinfo.}, vol 21, 2005, pp 3787-3793.
526:
527: \bibitem{tian}
528: L. Tian, S.A. Greenberg, S.W. Kong, J. Altschuler,
529: I.S. Kohane, P.J. Park,
530: ``Discovering statistically significant pathways in expression profiling studies",
531: {\it Proc. Natl. Acad. Sci.}, vol 102, 2005, pp 13544-13549.
532:
533: \bibitem{ra}
534: F.M. Batliwalla, E.C. Baechler, X. Xiao, W. Li,
535: S. Balasubramaniuan, H. Khalili, A. Damle, W.A. Ortmann, A. Perrone,
536: A.B. Kantor, M. Kern, P.S. Gulko, M. Kern, R. Furie, T.W. Behrens, P.K. Gregersen,
537: ``Peripheral blood gene expression profiling in rheumatoid arthritis",
{\it Genes and Immunity}, vol 6, 2005, pp 388-397.
539:
540: \bibitem{sle}
541: E.C. Baechler, F.M. Batliwalla, G. Karypis, P.M. Gaffney, W.A. Ortmann,
542: K.J. Espe, K.B. Shark, W.J. Grande, K.M. Hughes, V. Kapur, P.K. Gregersen,
543: T.W. Behrens,
544: ``Interferon-inducible gene expression signature in peripheral
545: blood cells of patients with severe lupus",
546: {\it Proc. Natl. Acad. Sci. }, vol 100, 2003, pp 2610-2615.
547:
548: \bibitem{psa}
549: F.M. Batliwalla, W. Li, C.T. Ritchlin, X. Xiao, M. Brenner,
550: T. Laragione, T. Shao, R. Durham, S. Kemshetti, E. Schwarz,
551: R. Coe, M. Kern, E.C. Baechler, T.W. Behrens, P.K. Gregersen, P.K. Gulko,
552: ``Microarray analyses of peripheral blood cells identifies
553: unique expression signature in psoriatic arthritis",
554: {\it Mol. Med.}, 2006, to appear.
555:
556: \bibitem{rhode}
557: D.R. Rhodes, J. Yu, K. Shanker, N. Deshpande, R. Varambally, D. Ghosh,
558: T. Barrette, A. Pandey, A.M. Chinnaiyan,
559: ``Large-scale meta-analysis of cancer microarray data identifies
560: common transcriptional profiles of neoplastic transformation and progression",
561: {\it Proc. Natl. Acad. Sci. }, vol 101, 2004, pp 9309-9314.
562:
563:
564:
565:
566:
567: \end{thebibliography}
568:
569: \end{document}
570:
571: