1: \documentclass{elsart}
2: \usepackage{graphicx}
3: \usepackage{epsfig}
4:
5: \begin{document}
6: \begin{frontmatter}
7: \title{Resultants in Genetic Linkage Analysis}
8: \author{Ingileif B. Hallgr\'{\i}msd\'ottir}
9: \address{Department of Statistics, University of California, Berkeley}
10: \author{Bernd Sturmfels}
11: \address{Department of Mathematics, University of California, Berkeley}
12:
13: \begin{abstract}
14: Statistical models for genetic linkage analysis of $k$ locus diseases
15: are $k$-dimensional subvarieties of a $(3^k-1)$-dimensional
16: probability simplex. We determine the algebraic invariants of
these models with general characteristics for $k=1$;
in particular, we recover and generalize the Hardy-Weinberg curve.
19: For $k = 2$, the algebraic invariants are presented as determinants of
20: $32 \times 32$-matrices of linear forms in $9$ unknowns, a suitable
21: format for computations with numerical data.
22: \end{abstract}
23: \end{frontmatter}
24:
25: \section{Introduction}
26:
27: Most common diseases have a genetic component. The first step
28: towards understanding a genetic disease is to identify the genes
29: that play a role in the disease etiology. Genes are identified by their
30: location within the genome. \emph{Genetic linkage analysis}, or gene
31: mapping \cite{ds,holmans,lander,ott},
32: is concerned with this problem of finding the chromosomal
location of disease genes. Over 1,200 disease genes
have been successfully mapped~\cite{botrisch}, and this
35: has led to a much better understanding of
36: Mendelian (one gene) disorders. Most common diseases are,
37: however, not caused by one gene but by $k \geq 2$ genes.
38: The challenge today is to understand complex diseases
39: (such as cancer, heart disease and diabetes) which are caused
40: by many interacting genes and environmental factors.
41:
The human genome has approximately 25,000 genes. Genes code
for proteins, and proteins perform all the cellular functions
44: vital to life. We all have the same set of genes, but there are
45: many variants of each gene, called \emph{alleles}. Usually these
46: variants all produce a functional protein, but a mutation in a
47: gene can change the protein product of the gene, and this may
48: result in disease. Since mutations are rare, two affected
49: siblings who have the same genetic disease probably inherited
50: the same mutation from a parent. Genetic linkage analysis makes
51: use of this fact: one tries to locate disease genes by identifying
52: regions in the genome that display statistically significant
53: increased sharing across a sample of affected relatives, such as
54: sibling pairs~\cite{elston}.
55:
56: The statistical models used in genetic linkage analysis are
57: algebraic varieties. The given data are $k$-dimensional tables
58: of format $3 \times 3 \times \cdots \times 3$. As usual in
59: algebraic statistics (\cite{gss}, \cite{prw}, \cite[\S 7]{stbook}),
60: there is one \emph{model coordinate} $z_{i_1 i_2 \cdots i_k}$ for each cell
61: entry, where $i_1 ,i_2 ,\ldots,i_k \in \{0,1,2\}$. This coordinate
62: represents the probability that for an affected sibling pair
the IBD sharing (see Section~\ref{oneloc}) at the first locus is $i_1$, the IBD
64: sharing at the second locus is $i_2$, etc.
65: The model is a subvariety of the probability simplex with
66: these coordinates. It is $k$-dimensional, because the
67: $z_{i_1 i_2 \cdots i_k}$ are given as polynomials
68: in $k$ \emph{model parameters} $\,p_1,p_2,\ldots,p_k$.
69: Here $p_j$ represents the frequency of the disease allele
70: at the $j$-th locus. We consider an infinite family of
71: models which depends polynomially on $3^k$
72: \emph{model characteristics} $f_{i_1 i_2 \cdots i_k}$.
73: The characteristic $\,f_{i_1 i_2 \cdots i_k}\,$ represents
74: the probability that an individual who has $i_j$ copies
75: of the disease gene at the $j$-th locus will get affected.
76: Note that the parameters $p_i$ and the characteristics
77: $f_{i_1 i_2 \cdots i_k} $
78: are unknown, but we might be interested in estimating
79: them from the given data~$z$.
80:
81: This paper is organized as follows. Section~\ref{oneloc} contains
82: a self-contained derivation of the models in the one-locus
83: case $(k=1)$. Here the models are curves in a triangle with
84: coordinates $(z_0,z_1,z_2)$. For general characteristics, $(f_0,f_1,f_2)$,
85: the curve has degree four. In Section~3 we compute
86: its defining polynomial, a big expression in $z_0,z_1,z_2,f_0,f_1,
87: f_2$. This is done by elimination using the univariate
88: B\'ezout resultant. We discuss what happens for special
89: choices of characteristics
90: which have been studied in the genetics literature.
91:
92: In Section~4 we derive the parametrization of
93: the linkage models for $k \geq 2$. In the two-locus
94: case $(k=2)$, the models are surfaces in the space of
95: nonnegative $3 \times 3$-tables $(z_{ij})$ whose entries
96: sum to one. For general characteristics $(f_{ij})$,
97: the surface has degree $32$. In Section~\ref{surf}
98: we apply Chow forms to derive a system
99: of \emph{algebraic invariants}. These are
100: the polynomials which cut out the surface. Each
101: invariant is presented as the
102: determinant of a $32 \times 32$-matrix whose
103: entries are linear forms in the $z_{ij}$ whose
104: coefficients depend on the $f_{ij}$. We argue that
105: this format is suitable for statistical analysis
106: with numerical data. Computational issues and further
107: directions are discussed in Section~6.
108:
109: \section{Derivation of the One-Locus Model}
110: \label{oneloc}
111:
112: The genetic code, the blueprint of life, is stored in our genome.
113: The genome is arranged into chromosomes which can be thought of as
114: linear arrays of genes. The human genome has two copies of
115: each chromosome, with 23 pairs of chromosomes, 22 autosomes
116: and the sex chromosomes X and Y (women have XX and men XY).
117: Each parent passes one copy of each chromosome to a child.
118: A chromosome passed from parent to child is a mosaic
119: of the two copies of the parent, and a point at which the origin of a
120: chromosome changes is called a \emph{recombination}. This is illustrated in
121: Figure~\ref{fig:sibs}.
122:
123: Between any two recombination sites, the inheritance pattern
124: of the two siblings is constant and is encoded by
125: the {\em inheritance vector} $\,x=(x_{11}, x_{12},
126: x_{21}, x_{22})$. The entry $x_{kj}$ is the label of
127: the chromosome segment that sibling $k$ got from parent $j$.
128: If we label the paternal chromosomes with $1$ and $2$ and the
129: maternal chromosomes with $3$ and $4$, then
130: $x_{11}, x_{21} \in \{1,2\}$ and $x_{12}, x_{22} \in \{3,4\}$, so
131: there are 16 possible inheritance vectors $x$.
132: They come in three classes:
133: \begin{eqnarray*}
134: C_0 \quad & = \quad &
135: \bigl\{ \,
136: (1,3,2,4),\,
137: (1,4,2,3),\,
138: (2,3,1,4),\,(2,4,1,3) \, \bigr\}, \\
139: C_1 \quad & = \quad &
140: \bigl\{ \,
141: (1,3,1,4),\,
142: (1,4,1,3),\,
143: (2,3,2,4),\,
144: (2,4,2,3),\, \\ & & \,\,\,\,
145: (1,3,2,3),\,
146: (2,3,1,3),\,
147: (1,4,2,4),\,
148: (2,4,1,4)\, \bigr\} , \\
149: C_2 \quad & = \quad &
150: \bigl\{ \,
151: (1,3,1,3), \,
152: (1,4,1,4), \,
153: (2,3,2,3), \,
154: (2,4,2,4)\, \bigr\} .
155: \end{eqnarray*}
156:
We say that two siblings share genetic material, at a locus,
identical by descent (IBD) if it originated from the same parental chromosome.
159: The IBD sharing at a locus can be 0, 1 or 2, where the inheritance
vectors in $C_i$ correspond to IBD sharing of $i$. Since at a
random locus in the genome each inheritance vector is equally likely,
the IBD sharing is 0, 1 or 2 with probabilities $1/4$, $1/2$ and $1/4$.
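
As a quick check of these probabilities, note that the three classes
have sizes $|C_0| = 4$, $|C_1| = 8$ and $|C_2| = 4$, so that
$$ \frac{|C_0|}{16} \,=\, \frac{1}{4}, \qquad
\frac{|C_1|}{16} \,=\, \frac{1}{2}, \qquad
\frac{|C_2|}{16} \,=\, \frac{1}{4}. $$
For example, the vector $x = (1,3,1,4)$ says that both siblings received
paternal chromosome $1$, while sibling $1$ received maternal chromosome $3$
and sibling $2$ received maternal chromosome $4$. The siblings share exactly
one parental chromosome, so $x \in C_1$.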
163:
164: \begin{figure}[h]
165: \begin{center}
166: \leavevmode
167: \epsfig{file=sibs_sharing.eps, height=8cm}
168: \caption{An example of the inheritance of one chromosome pair in parents and a sibling pair. Squares represent males and circles females.}
169: \label{fig:sibs}
170: \end{center}
171: \end{figure}
172:
173: Each individual has two alleles, i.e. two copies of every gene, one on
174: each chromosome. A \emph{genotype} at a locus is the unordered pair
175: of alleles. We are
176: only concerned with whether one carries an allele that predisposes
177: to disease, which we call $d$, or a normal allele, called $n$.
178: The set of possible genotypes at a disease locus is
179: $\,G=\{nn, nd, dn, dd\}$.
180:
181: Let $p$ denote the frequency of the disease allele
182: $d$ in the population. This quantity is
183: our {\em model parameter}. We assume Hardy-Weinberg equilibrium:
184: $$ \hbox{$Pr(nn)=(1-p)^2, \, Pr(nd)=p(1-p), \, Pr(dn)=p(1-p)$ and $Pr(dd)=p^2$.}$$
185: A disease model is specified by
186: $f = (f_0,f_1,f_2)$, where $f_i$ is the probability
187: that an individual is affected with the disease, given
188: $i$ copies of the disease allele,
189: \vskip -0.3cm
190: \begin{eqnarray*}
191: f_0 &\,=\,& Pr(\mbox{affected} \,|\, nn), \quad f_2 \,=\, Pr(\mbox{affected} \,|\, dd), \\
192: f_1 &\,=\,& Pr(\mbox{affected} \,|\, nd ) \,= \, Pr(\mbox{affected} \,|\, dn).
193: \end{eqnarray*}
194: The quantities $f_i$ are known as {\em penetrances} in the
195: genetics literature. In this paper, we call them
196: {\em model characteristics} to emphasize their algebraic role.
197:
198: The {\em coordinates} of a disease model are $z = (z_0,z_1,z_2)$, where
199: $z_i$ is the probability that the IBD sharing for an affected sibling pair
200: is $i$ at a given locus,
201: $$ z_i \,\, = \,\,
202: Pr(\mbox{IBD sharing}=i \,|\, \mbox{both sibs affected}), \quad i=0,1,2. $$
203: Then, as was stated above, at a random locus not linked to
204: the disease gene the distribution is $z_{null}=(1/4,1/2,1/4)$.
Data for linkage analysis are collected from a sample of $n$
sibling pairs (and their parents) as follows.
The marker information is used to infer the IBD sharing at
each marker locus for each sibling pair. At any particular locus,
one then records the
vector $(n_0,n_1,n_2)$, where $n_i$ is the number of
sibling pairs whose inferred IBD sharing is $i$ at the locus.
212: Each such data point determines an empirical distribution
213: $$ \hat{z} \,\, = \,\, (\hat{z}_0,\hat{z}_1,\hat{z}_2) \,\, = \,\, (n_0/n,n_1/n,n_2/n) \, , \qquad \hbox{where} \,\,\,\,
214: n_0+n_1+n_2 = n. $$
215: The objective is to look for regions in the genome where $\,\hat{z}\,$
216: deviates significantly from $\,z_{null} = (1/4, 1/2, 1/4)$.
217: Such regions may be linked to the disease.
218:
219: The one-locus model is given by expressing the coordinates
220: $(z_0,z_1,z_2)$ as polynomial functions of
221: the parameter $p$ and the characteristics $f_0,f_1,f_2$.
222: These polynomials are derived as follows. Consider
223: the set of events $\,\mathcal{E}_i \,=\, C_i \times G \times G\,$ for
224: $i=0,1,2$.
225: Each event in $\mathcal{E}_i$ consists of
226: an inheritance vector, a genotype for
227: the mother and a genotype for the father.
228: This triple determines the total number $m$
229: of disease alleles carried by the parents
230: and the numbers $k_1$ and $k_2$ of disease alleles
231: carried by the two siblings.
232: The probability of the event is
233: $$
234: f_{k_1} f_{k_2} p^m q^{4-m} \,, \quad \quad
235: \hbox{where $q = 1-p$.}
236: $$
237: Then, up to a global normalizing constant,
238: the IBD sharing probability $z_i$ is the sum over
239: all events in $\mathcal{E}_i$ of the monomials
240: $\,f_{k_1} f_{k_2} p^m q^{4-m}$.
241: Hence $z_0$ is a sum of $|\mathcal{E}_0| = 64$ monomials,
242: $z_1$ is a sum of $ 128$ monomials,
243: and $z_2$ is a sum of $ 64$ monomials.
244: But these monomials are not all distinct.
245: For instance, all four elements of
246: $\, C_0 \times \{nn\} \times \{nn\}\,\subset \,\mathcal{E}_0\,$
247: contribute the same monomial $\, f_0^2 q^4\,$ to $z_0$.
248: By explicitly listing all events in
249: $\mathcal{E}_0, \mathcal{E}_1$ and $ \mathcal{E}_2$,
250: we get the following result.
251:
252: \begin{prop} \label{matrixform}
253: The coordinates $z_i$ of the one-locus model
254: are homogeneous polynomials of bidegree
255: $(2,4)$ in the characteristics
256: $(f_0,f_1,f_2)$ and the parameters $(p,q)$.
257: The column vector $(z_0,z_1,z_2)^T$ equals
258: the matrix-vector product
259: \vskip -.4cm
260: \begin{eqnarray*}
261: \!\!\!\!\!\!
262: \left( \begin{array}{ccccc}
263: 4f_0^2 & 16f_0f_1 & 8f_0f_2+16f_1^2 & 16f_1f_2 & 4f_2^2 \\
264: 8f_0^2 & 8(f_0^2 \!+\! 2f_0f_1 \!+\! f_1^2) &
265: 16 (f_0f_1\!+ \!f_1^2 \! + \! f_1f_2) &
266: 8(f_1^2\!+\!2f_1f_2\!+ \!f_2^2) & 8f_2^2 \\
267: 4f_0^2 & 8f_0^2+8f_1^2 & 4f_0^2+16f_1^2+4f_2^2 & 8f_1^2+8f_2^2 & 4f_2^2
268: \end{array} \right)
269: \!\!
270: \left( \begin{array}{l}
271: q^4\\
272: pq^3\\
273: p^2q^2 \! \\
274: p^3q\\
275: p^4
276: \end{array} \right)
277: \end{eqnarray*}
278: \end{prop}
279:
280: Proposition \ref{matrixform} says that
281: the one-locus model has the form
282: \begin{equation}
283: \label{zFq} (z_0,z_1,z_2)^T \,\, = \,\, F \cdot
284: (q^4, pq^3, p^2q^2, p^3q, p^4)^T ,
285: \end{equation}
286: where $F$ is a $3 \times 5$-matrix
287: whose entries are quadratic polynomials
288: in the penetrances $f_i$. The resultant
289: computation to be described in the
290: next section works for any model of this form,
291: even if the matrix $F$ were more complicated.
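
Two simple consistency checks on the matrix $F$ may be helpful here; they are
our own arithmetic on the entries displayed in Proposition \ref{matrixform}.
First, every event contributes one monomial with coefficient one, so
setting $f_0=f_1=f_2=1$ must recover the event counts
$|C_i| \cdot {4 \choose m}$:
$$ F \big|_{f_0=f_1=f_2=1} \,\,=\,\,
\left( \begin{array}{ccccc}
4 & 16 & 24 & 16 & 4 \\
8 & 32 & 48 & 32 & 8 \\
4 & 16 & 24 & 16 & 4
\end{array} \right) . $$
The entries in the three rows add up to $64$, $128$ and $64$, which are the
cardinalities of $\mathcal{E}_0$, $\mathcal{E}_1$ and $\mathcal{E}_2$.
Second, for equal characteristics $f_0=f_1=f_2=f$ the same matrix gives
$$ (z_0,z_1,z_2) \,\,=\,\, f^2 \, (p+q)^4 \cdot (4,8,4), $$
which normalizes to $(1/4,1/2,1/4)$ for every value of $p$. This is the
distribution at a locus unlinked to the disease, as one would expect when the
genotype has no influence on the disease status.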
292:
293: \section{Curves in a Triangle}
294:
295: Suppose that we fix the model characteristics
296: $f_0,f_1,f_2$ and hence the matrix $F$.
297: Then (\ref{zFq}) defines a curve in the projective
298: plane with coordinates $(z_0:z_1:z_2)$. The positive
299: part of the projective plane is identified with the
300: triangle
301: \begin{equation}
302: \label{bigtriangle}
303: \bigl\{\,
304: (z_0,z_1,z_2) \,:\,
305: z_0,z_1,z_2 \geq 0 \,\,\, \hbox{and} \,\,\,
306: z_0+z_1+z_2 = 1 \,\,\bigr\}.
307: \end{equation}
308: The one-locus model with characteristics
309: $f_0,f_1,f_2$ is the intersection of the curve
310: with the triangle. We are interested in its
311: defining polynomial.
312:
313: \begin{prop} \label{twolocusprop}
314: For general characteristics $f_0,f_1,f_2$,
315: the one-locus model is a plane curve of degree four.
316: The defining polynomial of this
317: curve equals
318: \vskip -.3cm
319: \begin{eqnarray*}
320: I(z_0,z_1,z_2) &\,\,=\,\,&
321: a_{1} z_0^3 z_2
322: + a_{2} z_0^2 z_1^2
323: + a_{3} z_0^2 z_1 z_2
324: + a_{4} z_0^2 z_2^2
325: + a_{5} z_0 z_1^3\\
326: & &
327: + \, a_{6} z_0 z_1^2 z_2
328: + a_{7} z_0 z_1 z_2^2
329: + a_{8} z_0 z_2^3
330: + a_{9} z_1^4\\
331: & &
332: + \, a_{10} z_1^3 z_2
333: + a_{11} z_1^2 z_2^2
334: + a_{12} z_1 z_2^3
335: + a_{13} z_2^4,
336: \end{eqnarray*}
337: where each $a_i$ is a polynomial homogeneous
338: of degree eight in $(f_0,f_1,f_2)$.
339: \end{prop}
340:
This proposition is proved by
an explicit calculation. Namely,
the invariant $I(z_0,z_1,z_2)$ is obtained by
eliminating $p$ and $q$
from the three equations in (\ref{zFq}).
346: This is done using the \emph{B\'ezout resultant}
347: (\cite[Theorem 2.2]{StuSanDiego},
348: \cite[Theorem 4.3]{stbook}).
349: Specifically, we are using the
350: following $4 \times 4$-matrix from
351: \cite[Equation (1.5)]{StuSanDiego}:
352: \begin{equation}
353: \label{bezout}
354: \qquad \qquad B \,\,\, = \,\,\,
355: \left( \begin{array}{cccccccc}
356: & [12] & & [13] & [14] & & [15] & \\
357: & [13] & & [14] \! + \! [23] & [15] \! + \! [24] & & [25] & \\
358: & [14] & & [15] \! + \! [24] & [25] \! + \! [34] & & [35] & \\
359: & [15] & & [25] & [35] & & [45] &
360: \end{array} \right).
361: \end{equation}
362:
363: The determinant of this matrix is the {\em Chow form} \cite{DalStu}
364: of the curve in projective $4$-space $P^4$ which is parameterized
365: by the vector of monomials $(q^4,pq^3,p^2 q^2, p^3 q, p^4)$.
366: We are interested in the curve in the projective plane $P^2$
367: which is the image of that monomial curve under the linear map from
368: $P^4$ to $P^2$ given by the matrix $F$. Section 2.2 in \cite{DalStu}
369: explains how to compute the image under a linear map of a variety
370: that is presented by its Chow form. Applying the method described
371: there means replacing the bracket $\, [i \, j]\,$ by
372: the $3 \times 3$-subdeterminant with column indices
373: $i$, $j$ and $6$ in the matrix
374: from Proposition \ref{matrixform} augmented by $z$:
375: $$\! (F,z) =
376: \left( \begin{array}{ccccccc}
377: \! 4f_0^2 & 16f_0f_1 & 8f_0f_2+16f_1^2 & 16f_1f_2 & 4f_2^2 && z_0 \\
378: \! 8f_0^2 & 8(f_0^2 \!+\! 2f_0f_1 \!+\! f_1^2) &
379: 16 (f_0f_1\!+ \!f_1^2 \! + \! f_1f_2) &
380: 8(f_1^2\!+\!2f_1f_2\!+ \!f_2^2) & 8f_2^2 & & z_1 \\
381: \! 4f_0^2 & 8f_0^2+8f_1^2 & 4f_0^2+16f_1^2+4f_2^2 & 8f_1^2+8f_2^2 & 4f_2^2
382: & & z_2
383: \end{array} \right)
384: $$
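For instance, the bracket $[12]$ is the determinant of the
$3 \times 3$-matrix formed by columns $1$, $2$ and $6$ of $(F,z)$.
Expanding along the last column, a direct computation
(ours, offered only as an illustration) gives
$$ [12] \,\, = \,\, 32 \, f_0^2 \, (f_0-f_1)^2 \, (z_0 - z_1 + z_2). $$
In the same way, each bracket is a linear form in $z_0,z_1,z_2$ whose
coefficients are quartic polynomials in the characteristics.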
385: The desired algebraic invariant equals
386: (up to a factor) the determinant of $\,B$:
387: \begin{equation}
388: \label{formulaforcurve}
389: I(z_0,z_1,z_2)
390: \,\, = \,\,
391: 2^{-16} f_0^{-2} f_2^{-2} (f_0 - 2 f_1 + f_2)^{-4} \cdot {\rm det}(B).
392: \end{equation}
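
The degrees in (\ref{formulaforcurve}) can be checked as follows.
Each bracket $[i\,j]$ is linear in $z$ and of degree four in
$(f_0,f_1,f_2)$, being a $3 \times 3$-determinant with two columns of
quadrics and one column of coordinates. Hence ${\rm det}(B)$ has degree
four in $z$ and degree $4 \cdot 4 = 16$ in the characteristics.
The factor $f_0^{2} f_2^{2} (f_0-2f_1+f_2)^{4}$ that is divided out
has degree $2+2+4 = 8$, which leaves
$$ 16 - 8 \,\,=\,\, 8 $$
for the degree of each coefficient $a_i$, in agreement with
Proposition~\ref{twolocusprop}.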
393:
394:
395: If the characteristics $f_0,f_1,f_2$
396: are arbitrary real numbers between
397: $0$ and $1$ then the polynomial $\,I(z_0,z_1,z_2) \,$
398: is irreducible of degree four and its
399: zero set is precisely the model.
400: For some special choices of characteristics $f_i$, however,
401: the polynomial $I(z_0,z_1,z_2)$ may become reducible
402: or it may vanish identically.
403: In the reducible case,
404: the defining polynomial is one of the factors.
405: Consider the following special
406: models which are commonly used in genetics:
407: \begin{center}
408: \begin{tabular}{rcccc}
409: & & $f_0$ & $f_1$ & $f_2$ \\
410: {\it dominant} &:& 0 & $f$ & $f$ \\[-2mm]
411: {\it additive} &:& $0$ & $f/2$ & $f$ \\[-2mm]
412: {\it recessive} &:& 0 & 0 & $f$ \\[-2mm]
413: \end{tabular}
414: \end{center}
415: Here $0 < f < 1$. For the {\em dominant model}
416: our invariant specializes to
417: $$ I(z_0,z_1,z_2) \,\, = \,\,
418: 4 f^8 (z_1-z_0-z_2)
419: (\underline{z_1^2 z_0-8 z_1 z_0 z_2
420: +4 z_1 z_2^2+4 z_0^2 z_2+4 z_0 z_2^2-4 z_2^3}),
421: $$
422: and the defining polynomial of the model is the underlined cubic factor.
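
As a numerical check (our own, under the dominant characteristics above),
substituting $f_0=0$ and $f_1=f_2=f$ into Proposition~\ref{matrixform}
and setting $p=q$ gives
$$ (z_0 : z_1 : z_2) \,\,=\,\, (36 : 80 : 48) \,\,=\,\, (9 : 20 : 12). $$
Evaluating the six terms of the underlined cubic at $(9,20,12)$, in the
order in which they are written, yields
$$ 3600 - 17280 + 11520 + 3888 + 5184 - 6912 \,\,=\,\, 0 , $$
so this point does lie on the cubic. It also satisfies
$\,2 z_0 \leq z_1 \leq z_0 + z_2$, since $18 \leq 20 \leq 21$.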
423:
424: For the {\em additive model}
425: our invariant specializes to
426: $$
427: I(z_0,z_1,z_2) \,\, = \,\,
428: \frac{f^8}{2^4}(z_1^2+2 z_1 z_2-8 z_0 z_2+z_2^2)
429: (\underline{z_1-z_0-z_2})^2 ,
430: $$
431: and the defining polynomial of the model is the underlined linear factor.
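
This linear relation can also be read off directly from
Proposition~\ref{matrixform}. Substituting $f_0=0$, $f_1=f/2$ and $f_2=f$,
the three rows of $F$ become (our own computation)
$$
f^2 \cdot (0,\,0,\,4,\,8,\,4), \qquad
f^2 \cdot (0,\,2,\,12,\,18,\,8), \qquad
f^2 \cdot (0,\,2,\,8,\,10,\,4).
$$
The middle row is the sum of the other two, so
$z_1 = z_0 + z_2$ holds for all $(p,q)$. After normalizing to
$z_0+z_1+z_2 = 1$ this is the line $z_1 = 1/2$.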
432:
433: It can be shown that $\,I(z_0,z_1,z_2)\,$ vanishes identically if and only if
434: $$ f_0=f_1=0 \quad \hbox{or} \quad
435: f_1 = f_2 = 0 \quad \hbox{or} \quad
436: f_0=f_1 = f_2 . $$
437: This includes the {\em recessive model}, which is the familiar
438: Hardy-Weinberg curve:
439: $$z_1^2-4z_0z_2 \,\, = \,\, 0 .$$
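Indeed, for $f_0=f_1=0$ and $f_2=f$ the parametrization of
Proposition~\ref{matrixform} reduces to (our own computation)
$$ z_0 \,=\, 4f^2 p^4, \qquad z_1 \,=\, 8f^2 p^3 (p+q), \qquad
z_2 \,=\, 4f^2 p^2 (p+q)^2 , $$
so that $\, z_1^2 \,=\, 64 f^4 p^6 (p+q)^2 \,=\, 4 z_0 z_2 $.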
440: \vspace{-0.5cm}
441: \begin{figure}[h]
442: \begin{center}
443: \leavevmode
444: \epsfig{file=holmans.ps, height=9cm, angle=270}
\caption{Holmans' triangle. The larger triangle is the probability simplex, $z_0+z_1+z_2=1$, and the smaller triangle is the possible triangle for sibling pair IBD sharing probabilities. The curve from $(1/4,1/2,1/4)$ to $(0,0,1)$ is the Hardy-Weinberg (recessive) curve. The curve from $(1/4,1/2,1/4)$ to $(0,1/2,1/2)$ is the dominant curve and the line between the same points is the additive curve.}
446: \label{fig:holmans}
447: \end{center}
448: \end{figure}
449: %\vspace{-0.5cm}
450: \newpage
451: Holmans \cite{holmans} showed that the IBD sharing probabilities
452: for affected sibling pairs must satisfy
453: $\,2 z_0 \leq z_1 \leq z_0+z_2 $. This means we can restrict our
454: attention to the smaller triangle (Holmans' triangle)
455: in Figure~\ref{fig:holmans}.
456: We can graph the curve in the triangle for any choice
457: of model characteristics.
458: The part of the curve corresponding to values
459: of $p \in [0,1]$ is within the smaller triangle.
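
The vertices of Holmans' triangle can be found by intersecting pairs of
the boundary lines $2z_0 = z_1$, $z_1 = z_0+z_2$ and $z_0 = 0$
within the plane $z_0+z_1+z_2=1$; here the inequality $z_0 \geq 0$ is
the only nonnegativity constraint not already implied by the other two.
A short computation (ours) gives the three points
$$ (1/4,\,1/2,\,1/4), \qquad (0,\,1/2,\,1/2), \qquad (0,\,0,\,1), $$
where the first point is $z_{null}$.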
460:
461: It is worth noting that not all points $(z_0,z_1,z_2)$ in Holmans'
462: triangle which satisfy the algebraic invariant are in the image of
463: a point $(p,q)$ with real coordinates.
464: Consider e.g. the model with characteristics $f_0=1, f_1=0$ and $f_2=1$
465: and complex parameters $(p,q)$.
466: The real part of the curve corresponding to this model is shown in
467: Figure~\ref{fig:complex}. Two segments of the curve are within
468: Holmans' triangle, one of which (dotted) corresponds to values
469: $p \in [0,1]$. The other segment has a complex pre-image.
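
For this model, Proposition~\ref{matrixform} gives (by our own evaluation)
the rows $(4,0,8,0,4)$, $(8,8,0,8,8)$ and $(4,8,8,8,4)$ of $F$.
At $p=0$ the image is $(4:8:4)$, which is $z_{null}$, and at
$p = q = 1/2$ it is
$$ (z_0 : z_1 : z_2) \,\,=\,\, (1 : 2 : 2),
\qquad \hbox{that is,} \quad z \,=\, (1/5,\, 2/5,\, 2/5), $$
a point on the edge $2 z_0 = z_1$ of Holmans' triangle.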
470:
471: %\vspace{-0.5cm}
472: \vspace{0.3cm}
473: \begin{figure}[h]
474: \begin{center}
475: \leavevmode
476: \epsfig{file=complex.ps, height=8cm, angle=270}
477: \caption{Holmans' triangle. The larger triangle is the probability simplex, $z_0+z_1+z_2=1$ and the smaller triangle is the possible triangle for sibling pair IBD sharing probabilities. The curve corresponds to a model with characteristics $f_0=1, f_1=0$ and $f_2=1$. The dotted part of the curve is the image of real valued $p$, and the solid part is the image of $\,p=1/2+y\sqrt{-1}$, for a real number $y$.}
478:
479: \label{fig:complex}
480: \end{center}
481: \end{figure}
482: %\vspace{-0.5cm}
483:
484: We expressed the IBD sharing of the sibling pair at a gene locus
485: (the model coordinate $z$) as a function of $f_0,f_1,f_2$ and $p$.
486: In practice, however, we get data at \emph{marker loci},
487: regularly spaced across the chromosomes, not at the gene locus.
If there has been no recombination between the gene locus
and a marker locus in either sibling, then the IBD sharing at the two loci
is the same; otherwise it may differ.
491: Let $\theta$ be the \emph{recombination fraction}
492: between the gene locus and the marker locus. The new parameter
$\theta$ depends on the distance between the two loci. Following~\cite{ds},
we can express the IBD sharing probabilities at a marker locus
at recombination fraction $\theta$ from the gene by the formula
496: \begin{equation}
497: \label{zFthetaq} (z_0,z_1,z_2)^T \,\, = \,\, F_{\theta} \cdot
498: (q^4, pq^3, p^2q^2, p^3q, p^4)^T ,
499: \end{equation}
500: where $\,F_{\theta} = \Psi F \,$ and
501: \begin{eqnarray*}
502: \Psi \,\,\, = \,\,\,
503: \left( \begin{array}{ccc}
504: \psi^2 & \bar{\psi} \psi & \bar{\psi}^2 \\
505: 2 \bar{\psi} \psi & \psi^2 + \bar{\psi}^2 & 2 \psi \bar{\psi} \\
506: \bar{\psi}^2 & \bar{\psi} \psi & \psi^2
507: \end{array} \right), \quad
508: \hbox{with $\psi = \theta^2 + (1-\theta)^2$
509: and $\bar{\psi} = 1-\psi$.}
510: \end{eqnarray*}
511: One can easily repeat the resultant calculation in
512: Proposition~\ref{twolocusprop} to obtain the equation of the larger family
513: of curves defined by (\ref{zFthetaq}). Note that $\theta = 0$ corresponds
514: to the earlier case, and increasing $\theta$ shifts the curve
515: towards $z_{null}$.
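
Two special values illustrate the role of $\Psi$ (a short check of our own).
For $\theta = 0$ we have $\psi = 1$ and $\bar{\psi} = 0$, so $\Psi$ is the
identity matrix and $F_\theta = F$. For $\theta = 1/2$ we have
$\psi = \bar{\psi} = 1/2$, and
$$ \Psi \,\,=\,\,
\left( \begin{array}{ccc}
1/4 & 1/4 & 1/4 \\
1/2 & 1/2 & 1/2 \\
1/4 & 1/4 & 1/4
\end{array} \right), $$
so every vector $z$ is mapped to
$\,(z_0+z_1+z_2) \cdot (1/4,1/2,1/4)^T$, which is proportional to $z_{null}$.
In general, each column of $\Psi$ sums to $(\psi + \bar{\psi})^2 = 1$,
so multiplication by $\Psi$ preserves the total $z_0+z_1+z_2$.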
516:
517: We close this section with a statistical discussion.
518: We wish to find the gene locus using the inferred IBD
519: sharing at the marker loci. Since $\theta$ can be thought of
520: as a measure of the distance between the marker locus and the
521: gene locus we wish to estimate $\theta$ at each marker locus.
522: The inferred IBD sharing can be used to obtain an estimate of the
523: model coordinates $z$. If $p, f_0, f_1$ and $f_2$ are known it is
then easy to estimate $\theta$. However, that is rarely the case,
525: and it is impossible to identify all of the unknown quantities
526: $p, f_0, f_1, f_2$ and $\theta$ from the coordinates $z$.
527: Instead the model (\ref{zFq})
528: is applied to biological data as follows.
529: The IBD sharing at the gene locus (and at nearby marker loci)
530: is largest when the disease allele
531: has a strong effect and/or the disease allele is rare, i.e. when
532: $f_0 \leq f_1 \leq f_2$ (and preferably $f_0 \ll f_2$),
and $p$ is small. In these biologically interesting
situations, the data point $\hat{z}$ is clearly different from $z_{null}$.
535: So in practice a test for genetic linkage tests whether $\hat{z}$ is
536: significantly different from $z_{null}$. A widely used test statistic for
537: linkage is $S_{pairs} = \hat{z}_2+\hat{z}_1/2$ which measures deviations
538: from $z_{null}$ along the line corresponding to the additive model.
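
For illustration, consider hypothetical counts
$(n_0,n_1,n_2) = (10,50,40)$ from $n=100$ sibling pairs; these numbers
are invented for this example and are not data from any study.
Then $\hat{z} = (0.1,\, 0.5,\, 0.4)$ and
$$ S_{pairs} \,\,=\,\, 0.4 + 0.5/2 \,\,=\,\, 0.65, $$
compared with the value $1/4 + (1/2)/2 = 1/2$ at $z_{null}$.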
539:
540: \section{Derivation of the Two-Locus Model}
541:
542: Many common genetic disorders are caused by not one but many
543: interacting genes. We now consider the two-locus model, $k=2$,
544: where we assume that two genes cause the disease,
545: independently or together. We shall assume that the genes are
546: unlinked, i.e., they are either on different chromosomes or
far apart on the same chromosome. The derivation
is much like that in Section~\ref{oneloc}.
549:
550: The {\em model parameters} are $p_1$ and $p_2$, where
551: $p_i$ is the frequency of the disease allele at the $i$th locus.
552: A two-locus genotype is an
553: element in $G \times G = \{nn, nd, dn, dd\}^2$.
554: The {\em model characteristics} are
$\,f=(f_{00}, f_{01},
\ldots, f_{22})$, where $f_{ij}$
is the probability that an individual is affected with the
558: disease, given $i$ copies of the first disease allele and
559: $j$ copies of the second disease allele:
560: \begin{eqnarray*}
561: f_{00} \,\, &= & \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (nn, nn)\,), \\
562: f_{01} \,\, &= & \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (nn, nd)\,) \,\, = \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\,(nn, dn)\,), \\
563: f_{02} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (nn, dd)\,), \\
564: f_{10} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (nd, nn)\,) \,\, = \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dn, nn)\,), \\
565: f_{11} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (nd, nd)\,) \,\,= \dots = \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dn, dn)\,), \\
566: f_{12} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (nd, dd)\,) \,\, = \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dn, dd)\,), \\
567: f_{20} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dd, nn)\,), \\
568: f_{21} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dd, nd)\,) \,\,=\,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dd, dn)\,), \\
569: f_{22} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dd, dd)\,).
570: \end{eqnarray*}
571: The {\em model coordinates} are
572: $\,z=(z_{00}, z_{01}, z_{02}, z_{10}, z_{11}, z_{12}, z_{20}, z_{21}, z_{22})$,
573: where $z_{ij}$ represents the probability for an affected sibling pair
574: that the IBD sharing at the first gene locus is $i$,
575: and $j$ at the second gene locus:
576: \begin{displaymath}
577: z_{ij} \,\, = \,\, Pr(\,\mbox{IBD sharing}
578: \,\, = \,\, (i, j) \,|\, \mbox{both sibs affected}
579: \,), \qquad i,j = 0,1,2.
580: \end{displaymath}
The IBD sharing at two random loci, neither of which
is linked to the disease genes, follows the null distribution
$\,z_{null}~=~(1/16, 1/8, 1/16, 1/8, 1/4, 1/8, 1/16, 1/8, 1/16)$.
584:
585: The polynomial functions which express the
586: coordinates $z_{ij}$ in terms of $p_1,p_2$ and the
587: $f_{ij}$ are derived as follows.
588: We consider the set of events
589: $$
590: \mathcal{E}_i \times \mathcal{E}_j
591: \,\, = \,\,
592: C_i \times G \times G \times C_j \times G \times G
593: \quad \qquad \hbox{for $i,j=0,1,2$}. $$
594:
595: Each event in $\,\mathcal{E}_i \times \mathcal{E}_j \,$ consists of an
596: inheritance vector, the genotype of the father and the genotype
597: of the mother, at each locus. For a given event we know
598: the total number $m_1$ and $m_2$ of disease alleles
599: carried by the parents at the first and second locus
600: and $k_{11}, k_{12}, k_{21}, k_{22}$, where
601: $k_{ij}$ is the number of disease
602: alleles carried by sibling $i$ at locus $j$. The probability of the event is
603: \begin{displaymath}
604: f_{k_{11} k_{12}} f_{k_{21} k_{22}} p_1^{m_1}q_1^{4-m_1} p_2^{m_2} q_2^{4-m_2}, \quad \mbox{where} \quad q_1 = 1-p_1 \quad \mbox{and} \quad q_2 = 1-p_2.
605: \end{displaymath}
606: Up to a normalizing constant,
607: each IBD sharing probability $z_{ij}$ is the
608: sum of the monomials $\,f_{k_{11} k_{12}}
609: f_{k_{21} k_{22}} p_1^{m_1}q_1^{4-m_1} p_2^{m_2} q_2^{4-m_2}\,$
610: over all events in $\,\mathcal{E}_i \times \mathcal{E}_j $.
611:
612: \begin{prop} \label{matrixform2}
613: The coordinates $z_{ij}$ of the two-locus model
are homogeneous polynomials of tridegree
$(2,4,4)$ in the characteristics $(f_{00},f_{01},\ldots,f_{22})$,
616: the parameters $(p_1,q_1)$ at the first locus, and
617: the parameters $(p_2,q_2)$ at the second locus.
618: \end{prop}
619:
620: The matrix form of the one-locus model given in
621: Proposition \ref{matrixform} immediately generalizes
622: to the two-locus model. Let $\pi$ denote the
623: column vector whose entries are
624: the $25$ monomials of bidegree $(4,4)$ listed
625: in lexicographic order:
626: $$ \pi \,\, := \,\, \bigl(\,
627: q_1^4 q_2^4,\,
628: q_1^4 p_2 q_2^3,\,
629: q_1^4 p_2^2 q_2^2,\,
630: \ldots\,,\,
631: p_1 q_1^3 q_2^4,\,
632: p_1 q_1^3 p_2 q_2^3,\,
633: \ldots, \,
634: p_1^4 p_2^4 \, \bigr).
635: $$
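With this ordering (and counting from one), the monomial
$\,p_1^{a} q_1^{4-a} p_2^{b} q_2^{4-b}\,$ sits in position $5a+b+1$
of $\pi$, for $a,b \in \{0,1,\ldots,4\}$; this indexing formula is our
own bookkeeping device. For example, $\,p_1 q_1^3 p_2^2 q_2^2\,$ has
$(a,b) = (1,2)$ and is therefore entry number $5+2+1 = 8$, while
$\,p_1^4 p_2^4\,$ is entry number $25$.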
636:
637: \begin{cor} \label{ninetwentyfive}
638: The two-locus model has the form
639: $\,z^T = F \cdot \pi \,$ where
640: $F$ is a $9\times 25$-matrix
641: whose entries are quadratic forms
642: in the characteristics $f_{ij}$.
643: \end{cor}
644:
645: A typical entry in our $9 \times 25$ matrix $F$ looks like
646: $$
647: 32 \cdot ( f_{00}^2 + 2 f_{00} f_{10} + 4 f_{01}^2
648: + 8 f_{01} f_{11} + f_{02}^2 + 2 f_{02} f_{12}
649: + f_{10}^2 + 4 f_{11}^2 + f_{12}^2). \eqno (*)
650: $$
651: This quadratic form appears in $F$ in row $6$ and column $8$.
652: It is the coefficient of the
653: $8^{th}$ biquartic monomial $\,p_1 q_1^3 p_2^2 q_2^2 \,$ in
654: the expression for the $6^{th}$ coordinate:
655: \vskip -0.3cm
656: \begin{eqnarray*}
657: z_{12} &\quad=\,\,& \,\,\,\,(32 f_{00}^2) \cdot q_1^4 q_2^4 \,\,+\,\,
658: (64 f_{00}^2+64 f_{01}^2) \cdot q_1^4 p_2 q_2^3\\
659: & & + \, (32 f_{00}^2+128 f_{01}^2+32 f_{02}^2) \cdot q_1^4 p_2^2 q_2^2
660: \,+\, \cdots \cdots \, + \\
661: & & +\, (*) \cdot p_1 q_1^3 p_2^2 q_2^2 \,+\, \cdots\,
+ (64 f_{21}^2+64 f_{22}^2) \cdot p_1^4 p_2^3 q_2
663: \, +\, (32 f_{22}^2) \cdot p_1^4 p_2^4.
664: \end{eqnarray*}
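
The coefficients above can be cross-checked by counting events
(our own check). Every event in $\mathcal{E}_1 \times \mathcal{E}_2$
contributes a monomial with coefficient one, and the number of events
for which the parents carry $m_1$ and $m_2$ disease alleles at the two
loci is
$\,|C_1| \cdot {4 \choose m_1} \cdot |C_2| \cdot {4 \choose m_2}
\,=\, 32 \cdot {4 \choose m_1} {4 \choose m_2}$.
For $(m_1,m_2) = (1,2)$ this equals $32 \cdot 4 \cdot 6 = 768$,
which agrees with the sum of the coefficients in $(*)$:
$$ 32 \cdot (1+2+4+8+1+2+1+4+1) \,\,=\,\, 32 \cdot 24 \,\,=\,\, 768 . $$
Similarly, $64+64 = 128 = 32 \cdot 1 \cdot 4$ for $(m_1,m_2) = (0,1)$.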
665:
666: \section{Surfaces of degree 32 in the 8-dimensional simplex}
667: \label{surf}
668:
669: Let $\Delta_8$ denote the eight-dimensional probability simplex
670: $$
671: \{\,(z_{00},z_{01}, \ldots , z_{22}) \,\,
672: : \,\,
673: z_{ij} \geq 0 \,\, \mbox{for} \,\, i,j \in \{0,1,2\}
674: \quad \mbox{and} \quad \sum_{i=0}^2 \sum_{j=0}^2 z_{ij} = 1\}.
675: $$
676: Likewise, we consider the
677: product of two $1$-simplices,
678: which is the square
679: $$ \Delta_1 \times \Delta_1 \,\,\, = \,\,\,
680: \bigl\{\, (p_1,q_1,p_2,q_2) \,\,:\,\,
p_1,q_1,p_2,q_2 \geq 0 \quad \mbox{and} \quad
682: p_1+q_1 = p_2 + q_2 = 1 \,\bigr\}. $$
683: For fixed $F$,
684: the formula $\,z^T = F \cdot \pi \,$ in
685: Corollary \ref{ninetwentyfive}
686: specifies a polynomial map
687: $$ \tilde F \quad : \quad
688: \Delta_1 \times \Delta_1 \,\,\longrightarrow \,\,
689: \Delta_8 \qquad \qquad
690: \mbox{of bidegree $(4,4)$}. $$
691: The image of the map $\tilde F$
692: is the two-locus model
693: for fixed characteristics $f_{ij}$.
694: The model is a surface in the simplex $\Delta_8$.
695: Our goal in this section is
696: to express this surface as the common zero set
697: of a system of polynomials in the $z_{ij}$.
698:
699: \begin{thm} \label{thirtytwo}
700: For almost all characteristics $f_{ij}$,
701: the two-locus model is a surface of degree
702: $32$ in the simplex $\Delta_8$. This surface is the
common zero set of the degree $32$ polynomials
obtained by projection into three-dimensional subspaces.
705: \end{thm}
706:
707: \noindent {\sl Proof. }
708: We work in the setting of complex projective
709: algebraic geometry. Consider the embedding
710: of the product of projective lines $P^1 \times P^1$
711: by the ample line bundle $\mathcal{O}(4,4)$. This
712: is a toric surface $X$ of degree $32$ in $P^{24}$.
713: The $9 \times 25$-matrix $F$ defines a rational
714: map from $P^{24}$ to $P^8$, and it can be checked
715: computationally that this map has no base points on
716: $X$ for general $f_{ij}$. Hence the image $F(X)$ of $X$ in
717: $P^8$ is a rational surface of degree $32$. The two-locus model
718: is the intersection of $F(X)$ with $\Delta_8$, which is
719: the positive orthant in $P^8$.
720:
721: Let $A$ denote a generic $4 \times 9$-matrix,
722: defining a rational map $P^8 \rightarrow P^3$.
723: It has no base points on $F(X)$, hence the image
724: $AF(X)$ of $F(X)$ under $A$ is a surface
725: of degree $32$ in projective $3$-space $P^3$.
726: The inverse image of $AF(X)$ in $P^8$
727: is an irreducible hypersurface of degree
728: $32$ in $P^8$. It is defined
729: by an irreducible homogeneous polynomial
730: of degree $32$ in $\, z = (z_{00}, z_{01}, \ldots,z_{22})$.
731: These polynomials for various $4 \times 9$-matrices $A$
732: are known as the \emph{Chow equations} of the surface $F(X)$.
733: Computing them is equivalent to computing the
734: \emph{Chow form} of $F(X)$. A well-known
735: construction in algebraic geometry (see e.g.~\cite[\S 3.3]{DalStu})
736: shows that any irreducible projective variety
737: is set-theoretically defined by its
738: Chow equations. Applying this result
739: to $F(X)$ completes the proof. \qed
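
The degree and the ambient dimension in the first paragraph of the proof can be checked by a standard intersection-theoretic computation, which we sketch here. Let $H_1$ and $H_2$ denote the classes of the two rulings on $P^1 \times P^1$, so that $H_1^2 = H_2^2 = 0$ and $H_1 \cdot H_2 = 1$ (the symbols $H_1, H_2$ are introduced only for this check). Then
$$ {\rm deg}(X) \,\,=\,\, (4 H_1 + 4 H_2)^2 \,\,=\,\, 2 \cdot 4 \cdot 4 \cdot (H_1 \cdot H_2) \,\,=\,\, 32, $$
and the number of monomials of bidegree $(4,4)$ is $(4+1)(4+1) = 25$, which places $X$ in $P^{24}$.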
740:
741: We now explain how Theorem \ref{thirtytwo}
742: translates into an explicit algorithm for
743: computing the algebraic invariants of the
744: two-locus model. Let $\,\mathcal{R}_X\,$ be
745: the Chow form of the toric surface
746: $\, X \simeq P^1 \times P^1 \,$ in $\,P^{24}$.
747: The Chow form $\,\mathcal{R}_X\,$
748: is the multigraded resultant of three polynomial equations
749: of bidegree $(4,4)$:
750: $$
751: \sum_{i=0}^4 \sum_{j=0}^4 \alpha_{ij} x^i y^j
752: \,=\,
753: \sum_{i=0}^4 \sum_{j=0}^4 \beta_{ij} x^i y^j
754: \,=\,
755: \sum_{i=0}^4 \sum_{j=0}^4 \gamma_{ij} x^i y^j
756: \,=\, 0 . $$
757: In concrete terms, $\,\mathcal{R}_X\,$ is
758: the unique (up to sign) irreducible polynomial
759: of tridegree $(32,32,32)$ in the
760: $75$ unknowns $\alpha, \beta,\gamma$ which vanishes
761: if and only if the three equations have a common
762: solution in $\,P^1 \times P^1$.
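
Here $x$ and $y$ are affine coordinates on the two factors. Assuming the chart $x = p_1/q_1$, $y = p_2/q_2$ in terms of the parameters $(p_1,q_1,p_2,q_2)$ used above, the first equation is the dehomogenization of the bihomogeneous form
$$ \sum_{i=0}^4 \sum_{j=0}^4 \alpha_{ij} \, p_1^i q_1^{4-i} \, p_2^j q_2^{4-j} \,\, = \,\, 0 , $$
and similarly for $\beta$ and $\gamma$. In this form, a common solution in $P^1 \times P^1$ is allowed to lie on the lines $q_1 = 0$ or $q_2 = 0$. The count of unknowns is $3 \cdot 25 = 75$.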
763:
764: We use the B\'ezout matrix representation
765: of the resultant $\mathcal{R}_X$
766: given in \cite[Theorem 6.2]{DicEmi}.
767: This is a $32 \times 32$-matrix ${\bf B}$ which is
768: a direct generalization of the $4 \times 4$-matrix
769: in (\ref{bezout}). Consider the
770: $3 \times 25$-coefficient matrix
771: $$
772: \left( \begin{array}{cccccccccc}
773: \alpha_{00} & \alpha_{01} & \alpha_{02} & \alpha_{03} & \alpha_{04} &
774: \alpha_{10} & \alpha_{11} & \cdots \cdots & \alpha_{43} & \alpha_{44} \\
775: \beta_{00} & \beta_{01} & \beta_{02} & \beta_{03} & \beta_{04} &
776: \beta_{10} & \beta_{11} & \cdots \cdots & \beta_{43} & \beta_{44} \\
777: \gamma_{00} & \gamma_{01} & \gamma_{02} & \gamma_{03} & \gamma_{04} &
778: \gamma_{10} & \gamma_{11} & \cdots \cdots & \gamma_{43} & \gamma_{44} \\
779: \end{array} \right)
780: $$
781: For $1 \leq i < j < k \leq 25$, let $\,[\, i \,j \, k \,]\,$ denote the
782: determinant of the $3 \times 3$-submatrix with column indices $i,j,k$.
The entries in the B\'ezout matrix ${\bf B}$
784: are the linear forms in the brackets
785: $\,[\, i \,j \, k \,]$, and we have
786: $\,\mathcal{R}_X = {\rm det}({\bf B})$.
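
A quick count, included as a consistency check: the number of brackets is
$$ {25 \choose 3} \,\, = \,\, \frac{25 \cdot 24 \cdot 23}{6} \,\, = \,\, 2300 , $$
and each bracket is multilinear of degree $(1,1,1)$ in $(\alpha,\beta,\gamma)$. Since each entry of ${\bf B}$ is linear in the brackets, the determinant of the $32 \times 32$-matrix has tridegree $\,32 \cdot (1,1,1) = (32,32,32)$, in agreement with the description of $\,\mathcal{R}_X\,$ as a polynomial in the $75$ unknowns.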
787:
788: Let $F$ be the $9 \times 25$-matrix
789: in Corollary \ref{ninetwentyfive}.
790: We add the column vector $z$ to get the
791: $ 9 \times 26$-matrix $\,( F \, z )$.
792: Next we pick any $4 \times 9$-matrix $A$
793: and we consider
794: $$ A \cdot (F \,\, z) \,\, = \,\, (A \cdot F \, \,\,\, A \cdot z). $$
795: This is a $4 \times 26$-matrix whose last column consists of
796: linear forms in the $z_{ij}$.
797:
798: In the B\'ezout matrix ${\bf B}$, we now replace
799: each bracket $\,[\, i \,j \, k \,]\,$ by the $4 \times 4$-subdeterminant
800: of $\, A \cdot (F \, \, z)\,$ with column indices
801: $i,j,k$ and $26$. Thus $\,[\, i \,j \, k \,]\,$ is a linear
802: form in the $z_{ij}$ whose coefficients are homogeneous
803: polynomials of degree six in the $f_{ij}$.
804: The matrix gotten by this substitution is
805: denoted $\,{\bf B}\bigl(A \cdot (F \,\, z) \bigr)$.
806: Its determinant is the specialized resultant
807: $\,\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)$.
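
To see where these degrees come from, one can expand the $4 \times 4$-determinant along its last column. Writing $M_r(i,j,k)$ for the $3 \times 3$-minor of $A \cdot F$ with column indices $i,j,k$ and with row $r$ deleted (a notation used only in this remark), the substitution reads
$$ [\, i \, j \, k \,] \,\,\, \mapsto \,\,\, \sum_{r=1}^4 \, (-1)^{r+4} \cdot (A \cdot z)_r \cdot M_r(i,j,k) . $$
Each $(A \cdot z)_r$ is a linear form in the $z_{ij}$. Assuming, as the displayed expansion of $z_{12}$ suggests, that the entries of $F$ are quadratic in the $f_{ij}$, each minor $M_r(i,j,k)$ has degree $3 \cdot 2 = 6$ in the $f_{ij}$.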
808:
809: \begin{cor}
810: The resultant $\,\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)\,$
811: is a homogeneous polynomial of degree $32$ in the entries $a_{ij}$ of $A$.
812: Its coefficients are polynomials which are bihomogeneous of degree $32$
813: in the $z_{ij}$ and degree $192$ in the $f_{ij}$.
814: The two-locus model is cut out by this finite list of coefficient polynomials
815: in the $z_{ij}$ and $f_{ij}$. \end{cor}
816:
817: \noindent {\sl Proof. }
818: Each entry of the $32 \times 32$-matrix
819: $\,{\bf B}\bigl(A \cdot (F \,\, z) \bigr)\,$ is
820: a polynomial which is trihomogeneous of degree
821: $(1,6,1)$ in $(a_{ij},f_{ij},z_{ij})$. Hence its determinant
822: is trihomogeneous of degree $(32,192,32)$.
823: For fixed $A$ and fixed $F$, the resulting polynomial
defines a hypersurface of degree $32$ in $P^8$.
825: This hypersurface is the inverse image of the
826: surface $AF(X)$ in $P^3$. As discussed in the
827: proof of Theorem \ref{thirtytwo}, our model is
828: the intersection of these hypersurfaces
829: for all possible choices of $A$. A finite basis for
830: the linear system of these hypersurfaces
831: is given by the coefficient polynomials
832: of $\,\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)\,$
833: with respect to $A$. \qed
834:
835: The finite list of algebraic invariants described in the
836: previous corollary is the two-locus generalization
837: of the one-locus invariant in Proposition
838: \ref{twolocusprop}.
Note that the bidegree in $(z,F)$ has now
increased from $(4,8)$ to $(32,192)$.
841: Our derivation of these invariants
842: from the Chow form of a Segre-Veronese variety
843: generalizes to the $k$-locus case,
844: where $F$ and $z$ are $k$-dimensional tables of format $3 \times 3 \times \cdots \times 3$.
845: The analogous invariants have bidegree
846: $\,\bigl( \,k ! \, 4^k,\, 2 (k+1)! \, 4^k \,\bigr) \,$ in $(z,F)$.
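
As a check of this formula, for $k = 2$ it gives
$$ \bigl( \, 2! \cdot 4^2 , \,\, 2 \cdot 3! \cdot 4^2 \, \bigr) \,\, = \,\, (32, 192) , $$
which agrees with the corollary, and for $k = 3$ it evaluates to
$$ \bigl( \, 3! \cdot 4^3 , \,\, 2 \cdot 4! \cdot 4^3 \, \bigr) \,\, = \,\, (384, 3072) . $$
The first entry is the degree $\,k! \, 4^k\,$ of the embedding of $(P^1)^k$ by $\mathcal{O}(4,\ldots,4)$.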
847:
848: \section{Computational experiments and statistical perspectives}
849: We prepared a test implementation in {\tt maple} of the elimination
850: technique described in the previous section. That code is available
851: at the first author's website {\tt www.stat.berkeley.edu/$\sim$ingileif/}.
852: The input is a triple
853: $\bigl((f_{ij}), (z_{ij}),A\bigr) $ consisting of
854: a $3 \times 3$-matrix of model characteristics,
a $3 \times 3$-matrix of model coordinates,
856: and a projection matrix of size $4 \times 9$.
857: Each entry in these input matrices can be either
858: left symbolic or it can be specialized to a number.
859: Our program builds the specialized B\'ezout matrix
860: $\,{\bf B}\bigl(A \cdot (F \,\, z) \bigr)$, and, if the
861: matrix entries are purely numeric, then
862: it evaluates the determinant $\,\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)$.
863:
864: Here are some examples of typical
865: computations with our {\tt maple} program.
866: Set \vskip -0.4cm
867: \begin{tabbing}
868: $\quad$ \= $z_{00} = 3 \quad$ \= $z_{01} = 3 \quad$ \= $z_{02} = 5 \quad$ \= $\quad$ \= $f_{00} = 32 \quad$ \= $f_{01} = 21 \quad$ \= $f_{02} = 48 \quad$ \\
869: \> $z_{10} = 29$ \> $z_{11} = 11$ \> $z_{12} = 13$ \> $\quad$ \> $f_{10} = 14$ \> $f_{11} = 27$ \> $f_{12} = 39$ \\
870: \> $z_{20} = 17$ \> $z_{21} = 19$ \> $z_{22} = 23$ \> $\quad$ \> $f_{20} = 36$ \> $f_{21} = 19$ \> $f_{22} = 22$ \\
871: \end{tabbing}
872: \vskip -0.4cm
873: $$ \hbox{and}
874: \qquad \qquad A \,\,\, = \,\,\,
\left( \begin{array}{ccccccccc}
876: 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
877: 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
878: 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
879: 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
880: \end{array} \right). \qquad \qquad \qquad \qquad $$
881: Then $\,{\bf B}\bigl(A \cdot (F \,\, z) \bigr)$ is a $32 \times 32$-matrix whose
882: entries $b_{i,j}$ are integers, e.g.,
883: $$ b_{1,1} = 26967093018624, \,\,b_{1,2} = -114552012275712, \ldots, \,\,b_{32,32} = 845647773696. $$
884: The determinant of this $32 \times 32$-matrix is a non-zero integer with $469$ digits:
885: $$ \mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr) \,\, = \,\,
886: 0.2704985126... \cdot 10^{469}. $$
887: We now retain the numerical values for the model characteristics $f_{ij}$
888: and the matrix $A$ from before but we make the model
889: coordinates $z_{ij}$ indeterminates. Then
890: $\,{\bf B}\bigl(A \cdot (F \,\, z) \bigr)$ is a $32 \times 32$-matrix whose
891: entries $b_{i,j}$ are linear forms
892: \vskip -0.4cm
893: \begin{eqnarray*}
894: b_{1,1} &\,\, = \,\, & -2630935904256 \, z_{00}+1315467952128 \, z_{01} \\
895: & & +1315467952128 \, z_{10}-657733976064 \, z_{11} \\
896: b_{1,2} &\,\, = \,\,& 11746198683648 \, z_{00}-8211709034496 \, z_{01} \\
897: & & -5873099341824 \, z_{10}+4105854517248 \, z_{11}\\
898: & & \qquad \dots \quad \dots \quad \dots \quad \dots \quad \dots
899: \end{eqnarray*} \vskip -0.3cm
900: Its determinant $\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)$ is an irreducible
901: polynomial of degree $32$
902: which vanishes on the model with the given characteristics $f_{ij}$.
903: In fact, up to scaling, it is the unique such polynomial
904: which depends only on $\,z_{00},z_{01},z_{10}$ and $z_{11}$.
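
These two computations can be cross-checked by hand. Assuming the coordinate order $z = (z_{00},z_{01},z_{02},z_{10},\ldots,z_{22})$, the matrix $A$ above gives $\,A \cdot z^T = (z_{00}, z_{01}, z_{10}, z_{11})^T$, which is why no other coordinates occur. The displayed entry factors as
$$ b_{1,1} \,\, = \,\, - 657733976064 \cdot ( 4 z_{00} - 2 z_{01} - 2 z_{10} + z_{11}) , $$
and substituting $\,z_{00} = 3, \, z_{01} = 3,\, z_{10} = 29, \, z_{11} = 11\,$ gives
$$ - 657733976064 \cdot (12 - 6 - 58 + 11) \,\, = \,\, 41 \cdot 657733976064 \,\, = \,\, 26967093018624 , $$
the integer entry $b_{1,1}$ from the first computation. The same substitution in $b_{1,2}$ returns $-114552012275712$.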
905:
Finally, we reverse the roles of the coordinates $z_{ij}$
and the characteristics $f_{ij}$, namely, we fix the former
at their previous numerical values $(z_{00} =3,\ldots,z_{22} = 23)$
909: but we regard the $f_{ij}$ as indeterminates. Then $\,{\bf B}\bigl(A \cdot (F \,\, z) \bigr)$
910: is a $32 \times 32$-matrix whose entries $b_{i,j}$ are
911: homogeneous polynomials of degree six, e.g.,
912: \vskip -0.3cm
913: \begin{eqnarray*}
914: b_{1,1} &\,\, =\,\, & \quad 671744 \, f_{00}^6-1343488 \, f_{00}^5 f_{01}-1343488 \, f_{00}^5 f_{10}\\
915: & & + \, 671744 \, f_{00}^4 f_{01}^2 + 2686976 \, f_{00}^4 f_{01} f_{10}+671744 \, f_{00}^4 f_{10}^2 \\
916: & & - \, 1343488 \, f_{00}^3 f_{01}^2 f_{10}-1343488 \, f_{00}^3 f_{01} f_{10}^2 + 671744 \,f_{00}^2 f_{01}^2 f_{10}^2.
917: \end{eqnarray*}
918: Now $\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)$ is an irreducible homogeneous
919: polynomial of degree $192$ in the nine characteristics $f_{ij}$.
920: The vanishing of this polynomial provides an algebraic constraint on the
921: set of all models $(f_{ij})$ which fit the given data $(z_{ij})$.
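
The displayed entry $b_{1,1}$ has a simple structure, which can be verified by expanding the right-hand side below:
$$ b_{1,1} \,\,=\,\, 671744 \cdot f_{00}^2 \, (f_{00}-f_{01})^2 \, (f_{00}-f_{10})^2 . $$
As a consistency check with the first computation, note that $\,671744 = 41 \cdot 2^{14}$, and that at $\,f_{00} = 32, \, f_{01} = 21, \, f_{10} = 14\,$ we get
$$ 41 \cdot 2^{14} \cdot 32^2 \cdot 11^2 \cdot 18^2 \,\,=\,\, 26967093018624 , $$
which is the integer $b_{1,1}$ found in the purely numerical example.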
922:
923:
924: In linkage analysis, the characteristics $f_{ij}$ can take on any
925: real value between $0$ and $1$.
926: %(here between $0$ and $100$ for numerical reasons).
927: Two-locus models are often constructed by
928: first picking two one-locus characteristics, $g=(g_0, g_1, g_2)$ and $h=(h_0, h_1, h_2)$, from a class of special models such as recessive or dominant.
929: Then the two-locus model is defined by combining the one-locus characteristics in one of the following ways:
930: \begin{center}
931: \begin{tabular}{rcl}
932: {\it multiplicative} &:& $f_{ij} \,=\, g_i \cdot h_j$ \\
933: {\it heterogeneous} &:& $f_{ij} \,=\, g_i + h_j -g_i\cdot h_j$ \\
934: {\it additive} &:& $f_{ij} \,=\, g_i + h_j$ \\
935: \end{tabular}
936: \end{center}
937: The $9 \times 25$-matrix $F$ of the multiplicative model
938: is the tensor product of the two $3 \times 5$-matrices gotten
939: from $g$ and $h$ as in Proposition \ref{matrixform}.
940: Hence the surface of the multiplicative model is the
941: \emph{Segre product} of two one-locus curves.
942: The heterogeneous model and the additive model are too special,
943: in the sense that the corresponding surfaces in $P^8$ have degree
944: less than $32$. In these two cases, the resultant
945: $\,\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)\,$ vanishes
946: identically, and our {\tt maple} code always outputs zero.
947: The surfaces arising from these two models require
948: a separate algebraic study. Conducting this study could be
949: a worthwhile next step.
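
Two elementary identities may help to orient such a study. The matrices $F_g$, $F_h$ and the vectors $\pi_1$, $\pi_2$ below are notation assumed here for the two $3 \times 5$-matrices and for the two vectors of quartic monomials in $(p_1,q_1)$ and $(p_2,q_2)$, with $\pi = \pi_1 \otimes \pi_2$ up to the ordering of the monomials. For the multiplicative model, the mixed-product property of the tensor product gives
$$ (F_g \otimes F_h) \cdot (\pi_1 \otimes \pi_2) \,\, = \,\, (F_g \cdot \pi_1) \otimes (F_h \cdot \pi_2) , $$
so, in homogeneous coordinates, each point on the surface is the tensor product of a point on each of the two one-locus curves. For the heterogeneous model, the defining formula can be rewritten as
$$ 1 - f_{ij} \,\, = \,\, (1 - g_i) \cdot (1 - h_j) , $$
so this model is multiplicative in the complementary quantities $1 - f_{ij}$.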
950:
The following two-locus analogue of Holmans' triangle (the smaller triangle
952: in Figure \ref{fig:holmans}) was derived in~\cite{olof}. For affected sibling pairs the IBD sharing probabilities $ \,z = (z_{00}, z_{01}, \ldots, z_{22})\,$
953: satisfy
954: $\, H \cdot z^T \geq 0 \,$ where $H$ is the inverse of $K^{\otimes 2}$ and
955: \vskip -0.5 cm
956: \begin{eqnarray*}
957: K &\,\,\, = \,\,\, & \frac{1}{4}
958: \left( \begin{array}{rrr}
959: 1 & 0 & 0 \\
960: 2 & 2 & 0 \\
961: 1 & 2 & 4 \\
\end{array} \right).
963: \end{eqnarray*}
964: \vskip -0.5 cm
965: So, in practical applications we are only interested in the
966: intersection of our degree $32$ surface with the $8$-simplex defined by
967: these linear inequalities.
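
For concreteness, here is a sketch of these inequalities in matrix form. Since $(K^{\otimes 2})^{-1} = (K^{-1})^{\otimes 2}$, it suffices to invert $K$:
$$ K^{-1} \,\,\, = \,\,\,
\left( \begin{array}{rrr}
4 & 0 & 0 \\
-4 & 2 & 0 \\
1 & -1 & 1 \\
\end{array} \right) . $$
As a consistency check in the one-locus case, and assuming the coordinate order $(z_0,z_1,z_2)$ with $z_0+z_1+z_2 = 1$, the inequalities $K^{-1} \cdot z^T \geq 0$ read $\, z_0 \geq 0$, $\, z_1 \geq 2 z_0\,$ and $\, z_0 - z_1 + z_2 = 1 - 2 z_1 \geq 0$, that is, $\,2 z_0 \leq z_1 \leq 1/2$. This matches the usual description of Holmans' triangle.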
968:
969: In summary, in this paper we have presented a model for the sharing
970: of genetic material of two affected siblings, used in genetic linkage
971: analysis, in the framework of algebraic geometry.
972: The model is rich in structure, but this
973: structure is not yet fully exploited in statistical tests for genetic linkage.
974: For plausible biological models we expect to see increased sharing between
975: affected sibling pairs at gene loci linked to the disease.
976: The null hypothesis for linkage is rejected only if the estimate
977: of the model coordinates, $z$, differs significantly from $z_{null}$.
978: This is a geometric statement about the
979: distance between two points in a triangle (for $k=1$) or
980: in an $8$-simplex (for $k=2$). We believe that the algebraic
981: representation of the model derived here will be useful for
982: deriving new test statistics for linkage in the case when $k \geq 2$.
983:
984: \section{Acknowledgements}
985: We thank Lior Pachter and Terry Speed for
986: reading the manuscript and providing useful
987: comments. We are grateful to Amit Khetan
988: for helping us with the {\tt maple} implementation
989: of the B\'ezout resultant. Bernd Sturmfels was supported
990: by the Hewlett Packard Visiting Research Professorship 2003-04
991: at MSRI~Berkeley and
992: the National Science Foundation (DMS-0200729).
993:
994:
\begin{thebibliography}{99}
996: %\bibitem{allman} Elizabeth S. Allman
997: %and John A. Rhodes: Phylogenetic invariants for the general Markov model
998: %of sequence mutation, {\em Math. Biosci.} 186(2) pp.113-144.
999: \bibitem{olof} Olof Bengtsson: {\em Two-Locus Affected Sib-Pair
1000: Identity By Descent Probabilities} (Licentiate Thesis,
1001: Dept. of Mathematical Statistics, G\"{o}teborg Univ., 2001).
\bibitem{botrisch} David Botstein and Neil Risch: Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease, {\em Nature Genetics supplement} {\bf 33} (2003) 228--237.
1003: \bibitem{DalStu} John Dalbec and Bernd Sturmfels:
1004: Introduction to Chow forms,
1005: in {\sl ``Invariant Methods in Discrete and Computational Geometry''}
1006: [N.~White, ed.], Proceedings Curacao (June 1994), Kluwer
1007: Academic Publishers, 1995, pp.~37--58.
1008: \bibitem{DicEmi} Alicia Dickenstein and Ioannis Emiris:
1009: Multihomogeneous resultant formulae by means of complexes,
1010: {\em J.~Symbolic Computation} {\bf 36} (2003) 317--342.
1011: \bibitem{ds} Sandrine Dudoit and Terence P. Speed: A score test
1012: for the linkage analysis of qualitative and quantitative
1013: traits based on identity by descent data from
sib-pairs, {\em Biostatistics} {\bf 1} (2000) 1--26.
\bibitem{elston} Robert C. Elston: Statistical Genetics '98,
Methods of Linkage Analysis-and the Assumptions Underlying Them,
{\em Am.~J.~Hum.~Genet.} {\bf 63} (1998) 931--934.
1018: \bibitem{gss} Luis Garcia, Michael Stillman and Bernd Sturmfels:
1019: Algebraic geometry of Bayesian networks, {\em J.~Symbolic Computation},
1020: to appear.
\bibitem{holmans} Peter Holmans: Asymptotic properties of affected
sib-pair linkage analysis, {\em Am.~J.~Hum.~Genet.} {\bf 52} (1993) 362--374.
\bibitem{lander} Eric Lander and Nicholas Schork: Genetic dissection of
complex traits, {\em Science} {\bf 265} (1994) 2037--2048.
\bibitem{ott} Jurg Ott: {\em Analysis of Human Genetic Linkage},
Johns Hopkins Univ.~Press, 1991.
\bibitem{prw} Giovanni Pistone, Eva Riccomagno and Henry Wynn:
{\em Algebraic Statistics}, Chapman \& Hall, New York, 2001.
1029: %\bibitem{sham} Pak Sham: {\em Statistics in Human Genetics},
1030: %Arnold Appl. of Statistics, 1998.
1031: \bibitem{StuSanDiego} Bernd Sturmfels: Introduction to
1032: resultants, in: D.~Cox, B.~Sturmfels (eds.),
1033: {\sl Applications of Computational Algebraic Geometry},
1034: Proceedings of Symp.~in Applied Math., {\bf 53},
1035: American Mathematical Society, 1997, pp.~25--39.
1036: \bibitem{stbook} Bernd Sturmfels: {\em Solving Systems of
1037: Polynomial Equations}, American Mathematical Society,
1038: CBMS Regional Conferences Series, No.~97, Providence, Rhode Island, 2002.
1039: \end{thebibliography}
1040:
1041: \end{document}
1042:
1043: