
1: \documentclass{elsart}

2: \usepackage{graphicx}

3: \usepackage{epsfig}

4:

5: \begin{document}

6: \begin{frontmatter}

7: \title{Resultants in Genetic Linkage Analysis}

8: \author{Ingileif B. Hallgr\'{\i}msd\'ottir}

9: \address{Department of Statistics, University of California, Berkeley}

10: \author{Bernd Sturmfels}

11: \address{Department of Mathematics, University of California, Berkeley}

12:

13: \begin{abstract}

14: Statistical models for genetic linkage analysis of $k$ locus diseases

15: are $k$-dimensional  subvarieties of a $(3^k-1)$-dimensional

16: probability simplex.  We determine the algebraic invariants of

17: these models with general characteristics for $k=1$;

18: in particular, we recover and generalize the Hardy-Weinberg curve.

19: For $k = 2$, the algebraic invariants are presented as determinants of

20: $32 \times 32$-matrices of linear forms in $9$ unknowns, a suitable

21: format for computations with numerical data.

22: \end{abstract}

23: \end{frontmatter}

24:

25: \section{Introduction}

26:

27: Most common diseases have a genetic component.  The first step

28: towards understanding a genetic disease is to identify the genes

29: that play a role in the disease etiology. Genes are identified by their

30: location  within the genome.  \emph{Genetic linkage analysis}, or gene

31: mapping \cite{ds,holmans,lander,ott},

32: is concerned with this problem of finding the chromosomal

33: location of  disease genes.  Over 1,200 disease genes

34: have been successfully mapped~\cite{botrisch}, and this

35: has led to a much better understanding of

36: Mendelian (one gene) disorders. Most common diseases are,

37: however, not caused by one gene but by  $k \geq 2$ genes.

38: The challenge today is to understand complex diseases

39: (such as cancer, heart disease and diabetes) which are caused

40: by many interacting genes and environmental factors.

41:

42: The human genome has approximately 25,000 genes.  Genes code

43: for proteins, and proteins perform all the cellular functions

44: vital to life.  We all have the same set of genes, but there are

45: many variants of each gene, called \emph{alleles}.  Usually these

46: variants all produce a functional protein, but a mutation in a

47: gene can change the protein product of the gene, and this may

48: result in disease. Since mutations are rare, two affected

49: siblings who have the same genetic disease probably inherited

50: the same mutation from a parent.  Genetic linkage analysis makes

51: use of this fact: one tries to locate disease genes by identifying

52: regions in the genome that display statistically significant

53: increased sharing across a sample of affected relatives, such as

54: sibling pairs~\cite{elston}.

55:

56: The statistical models used in genetic linkage analysis are

57: algebraic varieties. The given data are $k$-dimensional tables

58: of format $3 \times 3 \times \cdots \times 3$. As usual in

59: algebraic statistics (\cite{gss}, \cite{prw}, \cite[\S 7]{stbook}),

60: there is one \emph{model coordinate} $z_{i_1 i_2 \cdots i_k}$ for each cell

61: entry, where $i_1 ,i_2 ,\ldots,i_k \in \{0,1,2\}$.  This coordinate

62: represents the probability that for an affected sibling pair

63: the IBD sharing (see Section~\ref{oneloc}) at the first locus is $i_1$, the IBD

64: sharing at the second locus is $i_2$, etc.

65: The model is a subvariety of the probability simplex with

66: these coordinates. It is $k$-dimensional, because the

67: $z_{i_1 i_2 \cdots i_k}$ are given as  polynomials

68: in $k$ \emph{model parameters} $\,p_1,p_2,\ldots,p_k$.

69: Here $p_j$ represents the frequency  of the disease allele

70: at the $j$-th locus. We consider an infinite family of

71: models which depends polynomially on $3^k$

72: \emph{model characteristics} $f_{i_1 i_2 \cdots i_k}$.

73: The characteristic $\,f_{i_1 i_2 \cdots i_k}\,$ represents

74: the probability that an individual who has $i_j$ copies

75: of the disease gene at the $j$-th locus will get affected.

76: Note  that the parameters $p_j$ and the characteristics

77: $f_{i_1 i_2 \cdots i_k} $

78: are unknown, but we might be interested in estimating

79: them from the given data~$z$.

80:

81: This paper is organized as follows.  Section~\ref{oneloc} contains

82: a self-contained derivation of the models in the one-locus

83: case $(k=1)$. Here the models are curves in a triangle with

84: coordinates $(z_0,z_1,z_2)$. For general characteristics $(f_0,f_1,f_2)$,

85: the curve has degree four. In Section~3 we compute

86: its defining polynomial, a big expression in $z_0,z_1,z_2,f_0,f_1,

87: f_2$. This is done by elimination using the univariate

88: B\'ezout resultant.  We discuss what happens for special

89: choices of characteristics

90: which have been studied in the genetics literature.

91:

92: In Section~4 we derive the parametrization of

93: the linkage models for $k \geq 2$.  In the two-locus

94: case $(k=2)$, the models are surfaces in the space of

95: nonnegative $3 \times 3$-tables $(z_{ij})$ whose entries

96: sum to one. For general characteristics $(f_{ij})$,

97: the surface has degree $32$.  In Section~\ref{surf}

98: we apply Chow forms to derive a system

99: of \emph{algebraic invariants}.  These are

100: the polynomials which cut out the surface.  Each

101: invariant is presented  as the

102: determinant of a $32 \times 32$-matrix whose

103: entries are linear forms in the $z_{ij}$ whose

104: coefficients depend on the $f_{ij}$.  We argue that

105: this format is suitable  for statistical analysis

106: with numerical data. Computational issues and further

107: directions   are discussed in Section~6.

108:

109: \section{Derivation of the One-Locus Model}

110: \label{oneloc}

111:

112: The genetic code, the blueprint of life, is stored in our genome.

113: The genome is arranged into chromosomes which can be thought of as

114: linear arrays of genes.  The human genome has two copies of

115: each chromosome, with 23 pairs of chromosomes,  22 autosomes

116: and the sex chromosomes X and Y (women have XX and men XY).

117: Each parent passes one copy of each chromosome to a child.

118: A chromosome passed from parent to child is a mosaic

119: of the two copies of the parent, and a point at which the origin of a

120: chromosome changes is called a \emph{recombination}.  This is illustrated in

121: Figure~\ref{fig:sibs}.

122:

123: Between any two recombination sites, the inheritance pattern

124: of the two siblings is constant and is encoded by

125: the {\em inheritance vector} $\,x=(x_{11}, x_{12},

126: x_{21}, x_{22})$. The entry $x_{kj}$ is the label of

127: the chromosome segment that sibling $k$ got from parent $j$.

128: If we label the paternal chromosomes with $1$ and $2$ and the

129: maternal chromosomes with $3$ and $4$, then

130: $x_{11}, x_{21} \in \{1,2\}$ and $x_{12}, x_{22} \in \{3,4\}$, so

131: there are 16 possible inheritance vectors $x$.

132: They come in three classes:

133: \begin{eqnarray*}

134: C_0 \quad & = \quad &

135:  \bigl\{ \,

136: (1,3,2,4),\,

137: (1,4,2,3),\,

138: (2,3,1,4),\,(2,4,1,3)  \, \bigr\}, \\

139: C_1 \quad & = \quad &

140:  \bigl\{ \,

141: (1,3,1,4),\,

142: (1,4,1,3),\,

143: (2,3,2,4),\,

144: (2,4,2,3),\, \\ & & \,\,\,\,

145: (1,3,2,3),\,

146: (2,3,1,3),\,

147: (1,4,2,4),\,

148: (2,4,1,4)\, \bigr\} , \\

149: C_2 \quad &  = \quad &

150:  \bigl\{ \,

151: (1,3,1,3), \,

152: (1,4,1,4), \,

153: (2,3,2,3), \,

154: (2,4,2,4)\, \bigr\} .

155: \end{eqnarray*}

156:

157: We say that two siblings share genetic material, at a locus,

158: identical by descent (IBD) if it originated from the same parental chromosome.

159: The IBD sharing at a locus can be 0, 1 or 2, where the inheritance

160: vectors in $C_i$ correspond to IBD sharing of $i$.  Since at a

161: random locus in the genome each inheritance vector is equally likely

162: the IBD sharing is 0, 1 or 2 with probabilities $1/4$, $1/2$ and $1/4$.
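
As a quick check of these probabilities, the class sizes can be read off
the list above: $|C_0| = 4$, $|C_1| = 8$ and $|C_2| = 4$, so that
$$ Pr(\mbox{IBD sharing} = i) \,\, = \,\, \frac{|C_i|}{16} ,
\qquad \hbox{that is,} \qquad
\frac{4}{16} = \frac{1}{4}, \quad \frac{8}{16} = \frac{1}{2}, \quad \frac{4}{16} = \frac{1}{4} . $$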

163:

164: \begin{figure}[h]

165:   \begin{center}

166:     \leavevmode

167:    \epsfig{file=sibs_sharing.eps, height=8cm}

168:     \caption{An example of the inheritance of one chromosome pair in parents and a sibling pair.   Squares represent males and circles females.}

169:     \label{fig:sibs}

170:   \end{center}

171: \end{figure}

172:

173: Each individual has two alleles, i.e. two copies of every gene, one on

174: each chromosome.  A \emph{genotype} at a locus is the ordered pair

175: of alleles.  We are

176: only concerned with whether one carries an allele that predisposes

177: to disease, which we call $d$, or a normal allele, called $n$.

178: The set of possible genotypes at a disease locus is

179:  $\,G=\{nn, nd, dn, dd\}$.

180:

181: Let $p$ denote the frequency of the disease allele

182: $d$ in the population. This quantity is

183: our {\em model parameter}. We assume Hardy-Weinberg equilibrium:

184: $$ \hbox{$Pr(nn)=(1-p)^2, \, Pr(nd)=p(1-p), \, Pr(dn)=p(1-p)$ and $Pr(dd)=p^2$.}$$

185: A disease model is specified by

186: $f = (f_0,f_1,f_2)$, where $f_i$ is the probability

187: that an individual is affected with the disease, given

188: $i$ copies of the disease allele,

189: \vskip -0.3cm

190: \begin{eqnarray*}

191: f_0 &\,=\,& Pr(\mbox{affected} \,|\, nn), \quad f_2 \,=\, Pr(\mbox{affected} \,|\, dd), \\

192: f_1 &\,=\,& Pr(\mbox{affected} \,|\, nd ) \,= \, Pr(\mbox{affected} \,|\, dn).

193: \end{eqnarray*}

194: The quantities $f_i$ are known as {\em penetrances} in the

195: genetics literature. In this paper, we call them

196: {\em  model characteristics} to emphasize their algebraic role.

197:

198: The {\em coordinates} of a disease model are $z = (z_0,z_1,z_2)$, where

199:  $z_i$ is the probability that the IBD sharing for an affected sibling pair

200: is $i$ at a given locus,

201: $$ z_i \,\, = \,\,

202: Pr(\mbox{IBD sharing}=i \,|\, \mbox{both sibs affected}), \quad i=0,1,2. $$

203: Then, as was stated above, at a random locus not linked to

204: the disease gene the distribution is $z_{null}=(1/4,1/2,1/4)$.

205: Data for linkage analysis are collected from a sample of $n$

206: sibling pairs (and their parents) as follows.

207: The marker information is used to infer the IBD sharing at

208: each marker locus for each sibling pair. At any particular

209: locus, one then records the

210: vector $(n_0,n_1,n_2)$, where $n_i$ is the number of

211: sibling pairs whose inferred IBD sharing is $i$ at the locus.

212: Each such data point determines an empirical distribution

213: $$ \hat{z} \,\, = \,\, (\hat{z}_0,\hat{z}_1,\hat{z}_2)  \,\, = \,\, (n_0/n,n_1/n,n_2/n) \, , \qquad \hbox{where}  \,\,\,\,

214: n_0+n_1+n_2 = n.  $$

215: The objective is to look for regions in the genome where $\,\hat{z}\,$

216: deviates significantly from  $\,z_{null} = (1/4, 1/2, 1/4)$.

217: Such regions may be linked to the disease.

218:

219: The one-locus model is given by expressing the coordinates

220:  $(z_0,z_1,z_2)$ as polynomial functions of

221: the parameter $p$ and the characteristics $f_0,f_1,f_2$.

222: These polynomials are derived as follows. Consider

223: the set of events $\,\mathcal{E}_i \,=\, C_i \times G \times G\,$ for

224: $i=0,1,2$.

225: Each event in $\mathcal{E}_i$ consists of

226: an inheritance vector, a genotype for

227: the mother and a genotype for the father.

228: This triple determines the total number $m$

229: of disease alleles carried by the parents

230: and the numbers $k_1$ and $k_2$ of disease alleles

231: carried by the two siblings.

232: The probability of the event is

233: $$

234: f_{k_1} f_{k_2} p^m q^{4-m} \,, \quad \quad

235: \hbox{where $q = 1-p$.}

236: $$

237: Then, up to a global normalizing constant,

238: the IBD sharing probability $z_i$ is the sum over

239: all events in $\mathcal{E}_i$ of the monomials

240: $\,f_{k_1} f_{k_2} p^m q^{4-m}$.

241: Hence $z_0$ is a sum of $|\mathcal{E}_0| = 64$ monomials,

242: $z_1$ is a sum of $ 128$ monomials,

243: and $z_2$ is a sum of $ 64$ monomials.

244: But these monomials are not all distinct.

245: For instance, all four elements of

246: $\, C_0 \times \{nn\} \times \{nn\}\,\subset \,\mathcal{E}_0\,$

247: contribute the same monomial $\, f_0^2 q^4\,$ to $z_0$.

248: By explicitly listing all events in

249: $\mathcal{E}_0, \mathcal{E}_1$ and $ \mathcal{E}_2$,

250: we get the following result.

251:

252: \begin{prop} \label{matrixform}

253: The coordinates $z_i$ of the one-locus model

254: are homogeneous polynomials of bidegree

255: $(2,4)$ in the characteristics

256: $(f_0,f_1,f_2)$ and the parameters $(p,q)$.

257: The column vector $(z_0,z_1,z_2)^T$ equals

258: the matrix-vector product

259: \vskip -.4cm

260:  \begin{eqnarray*}

261: \!\!\!\!\!\!

262:    \left( \begin{array}{ccccc}

263: 4f_0^2 & 16f_0f_1 & 8f_0f_2+16f_1^2 & 16f_1f_2 & 4f_2^2 \\

264: 8f_0^2 & 8(f_0^2 \!+\! 2f_0f_1 \!+\! f_1^2) &

265: 16 (f_0f_1\!+ \!f_1^2 \! + \! f_1f_2) &

266: 8(f_1^2\!+\!2f_1f_2\!+ \!f_2^2) & 8f_2^2 \\

267: 4f_0^2 & 8f_0^2+8f_1^2 & 4f_0^2+16f_1^2+4f_2^2 & 8f_1^2+8f_2^2 & 4f_2^2

268:  \end{array} \right)

269: \!\!

270: \left( \begin{array}{l}

271: q^4\\

272: pq^3\\

273: p^2q^2 \! \\

274: p^3q\\

275: p^4

276: \end{array} \right)

277:  \end{eqnarray*}

278: \end{prop}
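
To illustrate how the entries of this matrix arise, here is a sketch of the
count behind its second column, the coefficients of $pq^3$. This is our own
bookkeeping and may be checked against the list of events.
In an event with $m=1$ exactly one of the four parental alleles is $d$,
and there are four such choices of parental genotypes.
For an inheritance vector in $C_0$ the siblings receive different alleles
from each parent, so exactly one sibling carries the $d$ allele.
For a vector in $C_2$ the siblings receive the same two alleles, so the
$d$ allele is carried by both siblings (two of the four choices) or by
neither (the other two). For a vector in $C_1$ the $d$ allele is carried
by both siblings, by neither, by the first only, or by the second only,
once each. Hence the coefficients of $pq^3$ are
\begin{eqnarray*}
z_0 &:& 4 \cdot 4 \cdot f_0 f_1 \,\, = \,\, 16 f_0 f_1 , \\
z_1 &:& 8 \cdot (f_1^2 + f_0^2 + 2 f_0 f_1) \,\, = \,\, 8(f_0^2 + 2 f_0 f_1 + f_1^2) , \\
z_2 &:& 4 \cdot (2 f_1^2 + 2 f_0^2) \,\, = \,\, 8 f_0^2 + 8 f_1^2 ,
\end{eqnarray*}
in agreement with Proposition \ref{matrixform}.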

279:

280: Proposition \ref{matrixform} says that

281: the  one-locus model has the form

282: \begin{equation}

283: \label{zFq} (z_0,z_1,z_2)^T \,\, = \,\, F \cdot

284: (q^4, pq^3, p^2q^2, p^3q, p^4)^T ,

285: \end{equation}

286: where $F$ is a $3 \times 5$-matrix

287: whose entries are quadratic polynomials

288: in the penetrances $f_i$. The resultant

289: computation to be described in the

290: next section works for any model of this form,

291: even if the matrix $F$ were more complicated.
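
As a sanity check on (\ref{zFq}), consider the degenerate case
$f_0=f_1=f_2=f$, in which being affected carries no information about the
genotype at the locus. Every row of $F$ is then proportional to the vector
of binomial coefficients:
$$ F \,\, = \,\, f^2 \cdot (4,8,4)^T \cdot (1,4,6,4,1) , \qquad \hbox{so} \qquad
(z_0,z_1,z_2)^T \,\, = \,\, f^2 (p+q)^4 \cdot (4,8,4)^T . $$
After normalization this is the single point $(1/4,1/2,1/4)$ for every
value of $p$.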

292:

293: \section{Curves in a Triangle}

294:

295: Suppose that we fix the model characteristics

296: $f_0,f_1,f_2$ and hence the matrix $F$.

297: Then (\ref{zFq}) defines a curve in the projective

298: plane with coordinates $(z_0:z_1:z_2)$. The positive

299: part of the projective plane is identified with the

300: triangle

301: \begin{equation}

302: \label{bigtriangle}

303:  \bigl\{\,

304: (z_0,z_1,z_2) \,:\,

305: z_0,z_1,z_2 \geq 0 \,\,\, \hbox{and} \,\,\,

306:  z_0+z_1+z_2 = 1 \,\,\bigr\}.

307: \end{equation}

308: The one-locus model with characteristics

309: $f_0,f_1,f_2$ is the intersection of the curve

310: with the triangle. We are interested in its

311: defining polynomial.

312:

313: \begin{prop} \label{twolocusprop}

314: For general characteristics $f_0,f_1,f_2$,

315: the one-locus model is a plane curve of degree four.

316: The defining polynomial of this

317: curve equals

318: \vskip -.3cm

319: \begin{eqnarray*}

320: I(z_0,z_1,z_2) &\,\,=\,\,&

321:  a_{1} z_0^3 z_2

322: + a_{2} z_0^2 z_1^2

323: + a_{3} z_0^2 z_1 z_2

324: + a_{4} z_0^2 z_2^2

325: + a_{5} z_0 z_1^3\\

326: & &

327: + \, a_{6} z_0 z_1^2 z_2

328: + a_{7} z_0 z_1 z_2^2

329: + a_{8} z_0 z_2^3

330: + a_{9} z_1^4\\

331: & &

332: + \, a_{10} z_1^3 z_2

333: + a_{11} z_1^2 z_2^2

334: + a_{12} z_1 z_2^3

335: + a_{13} z_2^4,

336: \end{eqnarray*}

337: where each $a_i$ is a polynomial  homogeneous

338: of degree eight in $(f_0,f_1,f_2)$.

339: \end{prop}

340:

341: This proposition is proved by

342: an explicit calculation. Namely,

343: the invariant $I(z_0,z_1,z_2)$ is gotten by

344: eliminating $p$ and $q$

345: from the three equations in (\ref{zFq}).

346: This is done using the \emph{B\'ezout resultant}

347: (\cite[Theorem 2.2]{StuSanDiego},

348: \cite[Theorem 4.3]{stbook}).

349: Specifically, we are using the

350: following $4 \times 4$-matrix from

351: \cite[Equation (1.5)]{StuSanDiego}:

352: \begin{equation}

353: \label{bezout}

354: \qquad \qquad B \,\,\, = \,\,\,

355:  \left( \begin{array}{cccccccc}

356: & [12] & & [13]      & [14]      & & [15] & \\

357: & [13] & & [14] \! + \! [23] & [15] \! + \! [24] &  & [25] & \\

358: & [14] & & [15] \! + \! [24] & [25] \! + \! [34] & & [35] & \\

359: & [15] & & [25]      &  [35] & & [45] &

360: \end{array} \right).

361: \end{equation}

362:

363: The determinant of this matrix is the {\em Chow form} \cite{DalStu}

364: of the curve in projective $4$-space $P^4$ which is parameterized

365: by the vector of monomials $(q^4,pq^3,p^2 q^2, p^3 q, p^4)$.

366: We are interested in the curve in the projective plane $P^2$

367: which is the image of that monomial curve under the linear map from

368: $P^4$ to $P^2$ given by the matrix $F$. Section 2.2 in  \cite{DalStu}

369: explains how to compute the image under a linear map of a variety

370: that is presented by its Chow form. Applying the method described

371: there means replacing the bracket $\, [i \, j]\,$ by

372: the $3 \times 3$-subdeterminant with column indices

373: $i$, $j$ and $6$ in the  matrix

374: from Proposition \ref{matrixform} augmented by $z$:

375: $$\! (F,z) =

376:    \left( \begin{array}{ccccccc}

377: \! 4f_0^2 & 16f_0f_1 & 8f_0f_2+16f_1^2 & 16f_1f_2 & 4f_2^2 && z_0 \\

378: \! 8f_0^2 & 8(f_0^2 \!+\! 2f_0f_1 \!+\! f_1^2) &

379: 16 (f_0f_1\!+ \!f_1^2 \! + \! f_1f_2) &

380: 8(f_1^2\!+\!2f_1f_2\!+ \!f_2^2) & 8f_2^2 & & z_1 \\

381: \! 4f_0^2 & 8f_0^2+8f_1^2 & 4f_0^2+16f_1^2+4f_2^2 & 8f_1^2+8f_2^2 & 4f_2^2

382: & & z_2

383:  \end{array} \right)

384: $$

385: The desired algebraic invariant equals

386: (up to a factor) the determinant of $\,B$:

387: \begin{equation}

388: \label{formulaforcurve}

389:   I(z_0,z_1,z_2)

390:  \,\, = \,\,

391: 2^{-16} f_0^{-2} f_2^{-2} (f_0 - 2 f_1 + f_2)^{-4} \cdot {\rm det}(B).

392: \end{equation}
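
For instance, the bracket $[12]$ becomes the determinant of
columns $1$, $2$ and $6$ of $(F,z)$. A hand computation, which the reader
may wish to verify, gives
$$ [12] \,\, = \,\,
{\rm det} \left( \begin{array}{ccc}
4f_0^2 & 16 f_0 f_1 & z_0 \\
8f_0^2 & 8(f_0+f_1)^2 & z_1 \\
4f_0^2 & 8f_0^2 + 8 f_1^2 & z_2
\end{array} \right)
\,\, = \,\, 32 f_0^2 (f_0 - f_1)^2 (z_0 - z_1 + z_2) . $$
More generally, columns $1$ and $5$ of $F$ are $f_0^2 \cdot (4,8,4)^T$
and $f_2^2 \cdot (4,8,4)^T$. Hence $[15] = 0$, every entry in the
first column of $B$ is divisible by $f_0^2$, and every entry in the last
column of $B$ is divisible by $f_2^2$. This is consistent with the factor
$f_0^{-2} f_2^{-2}$ in (\ref{formulaforcurve}).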

393:

394:

395: If the  characteristics $f_0,f_1,f_2$

396: are arbitrary real numbers between

397: $0$ and $1$ then the polynomial $\,I(z_0,z_1,z_2) \,$

398: is irreducible of degree four and its

399: zero set is precisely the model.

400: For some special choices of characteristics $f_i$, however,

401: the polynomial $I(z_0,z_1,z_2)$ may become reducible

402: or it may vanish identically.

403:  In the reducible case,

404: the defining polynomial is one of the factors.

405:   Consider the following special

406: models which are commonly used in genetics:

407: \begin{center}

408: \begin{tabular}{rcccc}

409:  & & $f_0$ &   $f_1$ &   $f_2$ \\

410: {\it dominant} &:&  0  & $f$ & $f$ \\[-2mm]

411: {\it additive} &:&  $0$ &  $f/2$ & $f$ \\[-2mm]

412: {\it recessive} &:& 0 & 0 & $f$ \\[-2mm]

413: \end{tabular}

414: \end{center}

415: Here $0 < f < 1$.  For the {\em dominant model}

416: our invariant specializes to

417: $$ I(z_0,z_1,z_2) \,\, = \,\,

418: 4 f^8 (z_1-z_0-z_2)

419: (\underline{z_1^2 z_0-8 z_1 z_0 z_2

420: +4 z_1 z_2^2+4 z_0^2 z_2+4 z_0 z_2^2-4 z_2^3}),

421: $$

422: and the defining polynomial of the model is the underlined cubic factor.
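
As a numerical spot check of the cubic factor (our own, with the value
$p=q=1/2$ chosen only for illustration), set $f_0=0$ and $f_1=f_2=f$ in
Proposition \ref{matrixform}. The rows of $F$ become
$f^2 (0,0,16,16,4)$, $f^2 (0,8,32,32,8)$ and $f^2(0,8,20,16,4)$,
so at $p=q=1/2$ we get $(z_0:z_1:z_2) = (36:80:48) = (9:20:12)$.
Substituting $(9,20,12)$ into the underlined cubic, term by term in the
order displayed, gives
$$ 3600 \, - \, 17280 \, + \, 11520 \, + \, 3888 \, + \, 5184 \, - \, 6912 \,\, = \,\, 0 . $$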

423:

424: For the {\em additive model}

425: our invariant specializes to

426: $$

427: I(z_0,z_1,z_2) \,\, = \,\,

428: \frac{f^8}{2^4}(z_1^2+2 z_1 z_2-8 z_0 z_2+z_2^2)

429: (\underline{z_1-z_0-z_2})^2 ,

430: $$

431: and the defining polynomial of the model is the underlined linear factor.

432:

433: It can be shown that $\,I(z_0,z_1,z_2)\,$ vanishes identically if and only if

434: $$ f_0=f_1=0 \quad \hbox{or} \quad

435: f_1 = f_2 = 0 \quad \hbox{or} \quad

436: f_0=f_1 = f_2 . $$

437: This includes the {\em recessive model}, which is the familiar

438: Hardy-Weinberg curve:

439: $$z_1^2-4z_0z_2 \,\, = \,\, 0 .$$
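This can be seen directly from Proposition \ref{matrixform}.
For $f_0=f_1=0$ and $f_2=f$ the only nonzero entries of $F$ lie in its last
three columns, and using $p+q=1$ we get
$$ z_0 = 4f^2 p^4, \qquad
z_1 = 8f^2 (p^3 q + p^4) = 8 f^2 p^3, \qquad
z_2 = 4f^2 (p^2 q^2 + 2 p^3 q + p^4) = 4 f^2 p^2 , $$
so that $\, z_1^2 \, = \, 64 f^4 p^6 \, = \, 4 z_0 z_2 $.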

440: \vspace{-0.5cm}

441: \begin{figure}[h]

442:   \begin{center}

443:     \leavevmode

444:     \epsfig{file=holmans.ps, height=9cm, angle=270}

445:     \caption{Holmans' triangle.  The larger triangle is the probability simplex $z_0+z_1+z_2=1$, and the smaller triangle is the region of possible sibling pair IBD sharing probabilities.  The curve from $(1/4,1/2,1/4)$ to $(0,0,1)$ is the Hardy-Weinberg (recessive) curve.  The curve from $(1/4,1/2,1/4)$ to $(0,1/2,1/2)$ is the dominant curve, and the line between the same points is the additive curve.}

446:     \label{fig:holmans}

447:   \end{center}

448: \end{figure}

449: %\vspace{-0.5cm}

450: \newpage

451: Holmans \cite{holmans} showed that the IBD sharing probabilities

452: for affected sibling pairs must satisfy

453: $\,2 z_0 \leq  z_1 \leq z_0+z_2 $. This means we can restrict our

454: attention to the  smaller triangle (Holmans' triangle)

455: in Figure~\ref{fig:holmans}.

456:  We can graph the curve in the triangle for any choice

457: of model characteristics.

458:  The part of the curve corresponding to values

459: of $p \in [0,1]$ is within the smaller triangle.
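
Both inequalities can also be checked directly from
Proposition \ref{matrixform}. The following identities are our own
verification and are not taken from \cite{holmans}.
In the unnormalized coordinates of (\ref{zFq}),
\begin{eqnarray*}
z_1 - 2 z_0 &\,\,=\,\,& 8 \, p q \, \bigl( (f_1 - f_0)\, q + (f_2 - f_1)\, p \bigr)^2 , \\
z_0 + z_2 - z_1 &\,\,=\,\,& 4 \, p^2 q^2 \, (f_0 - 2 f_1 + f_2)^2 .
\end{eqnarray*}
Both right-hand sides are nonnegative for $p,q \geq 0$, and the
normalizing constant is positive, so the signs are preserved after
normalization. The second identity also shows that
$z_1 = z_0 + z_2$ holds along the whole curve when $f_0 - 2f_1 + f_2 = 0$,
as in the additive model.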

460:

461: It is worth noting that not all points $(z_0,z_1,z_2)$ in Holmans'

462: triangle which satisfy the algebraic invariant are in the image of

463: a point $(p,q)$ with real coordinates.

464: Consider e.g. the model with characteristics $f_0=1, f_1=0$ and $f_2=1$

465: and complex parameters $(p,q)$.

466: The real part of the curve corresponding to this model is shown in

467: Figure~\ref{fig:complex}.  Two segments of the curve are within

468: Holmans' triangle, one of which (dotted) corresponds to values

469: $p \in [0,1]$.  The other segment has a complex pre-image.

470:

471: %\vspace{-0.5cm}

472: \vspace{0.3cm}

473: \begin{figure}[h]

474:   \begin{center}

475:     \leavevmode

476:     \epsfig{file=complex.ps, height=8cm, angle=270}

477:     \caption{Holmans' triangle.  The larger triangle is the probability simplex $z_0+z_1+z_2=1$, and the smaller triangle is the region of possible sibling pair IBD sharing probabilities.  The curve corresponds to a model with characteristics $f_0=1, f_1=0$ and $f_2=1$.  The dotted part of the curve is the image of real-valued $p$, and the solid part is the image of $\,p=1/2+y\sqrt{-1}$, for a real number $y$.}

478:

479:     \label{fig:complex}

480:   \end{center}

481: \end{figure}

482: %\vspace{-0.5cm}

483:

484: We expressed the IBD sharing of the sibling pair at a gene locus

485: (the model coordinate $z$) as a function of $f_0,f_1,f_2$ and $p$.

486: In practice, however, we get data at \emph{marker loci},

487: regularly spaced across the chromosomes, not at the gene locus.

488: If there has been no recombination between the gene locus

489: and a marker locus then the IBD sharing at the two loci is the same,

490: but it may differ if there has been a recombination in either sibling.

491: Let $\theta$ be the \emph{recombination fraction}

492: between the gene locus and the marker locus. The new parameter

493: $\theta$ depends on the distance between the two loci.  Following~\cite{ds},

494: we can express the IBD sharing probabilities at a marker locus

495: at recombination fraction $\theta$ from the gene by the formula

496: \begin{equation}

497: \label{zFthetaq} (z_0,z_1,z_2)^T \,\, = \,\,  F_{\theta} \cdot

498: (q^4, pq^3, p^2q^2, p^3q, p^4)^T ,

499: \end{equation}

500: where $\,F_{\theta} = \Psi F \,$ and

501: \begin{eqnarray*}

502: \Psi \,\,\, = \,\,\,

503: \left( \begin{array}{ccc}

504: \psi^2 & \bar{\psi} \psi & \bar{\psi}^2 \\

505: 2 \bar{\psi} \psi & \psi^2 + \bar{\psi}^2 & 2 \psi \bar{\psi} \\

506: \bar{\psi}^2 & \bar{\psi} \psi & \psi^2

507: \end{array} \right), \quad

508: \hbox{with $\psi = \theta^2 + (1-\theta)^2$

509: and $\bar{\psi} = 1-\psi$.}

510: \end{eqnarray*}

511: One can easily repeat the resultant calculation in

512: Proposition~\ref{twolocusprop} to obtain the equation of the larger family

513: of curves defined by  (\ref{zFthetaq}).  Note that $\theta = 0$ corresponds

514: to the earlier case, and increasing $\theta$  shifts the curve

515: towards $z_{null}$.
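
The two extreme cases illustrate this. For $\theta = 0$ we have
$\psi = 1$ and $\bar{\psi} = 0$, so $\Psi$ is the identity matrix and
$F_\theta = F$. For $\theta = 1/2$ we have $\psi = \bar{\psi} = 1/2$ and
$$ \Psi \,\, = \,\,
\left( \begin{array}{ccc}
1/4 & 1/4 & 1/4 \\
1/2 & 1/2 & 1/2 \\
1/4 & 1/4 & 1/4
\end{array} \right), \qquad \hbox{so} \qquad
\Psi \cdot (z_0,z_1,z_2)^T \,\, = \,\, (z_0+z_1+z_2) \cdot (1/4,1/2,1/4)^T , $$
which is $z_{null}$ after normalization.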

516:

517: We close this section with a statistical discussion.

518: We wish to find the gene locus using the inferred IBD

519: sharing at the marker loci.  Since $\theta$ can be thought of

520: as a measure of the distance between the marker locus and the

521: gene locus we wish to estimate $\theta$ at each marker locus.

522: The inferred IBD sharing can be used to obtain an estimate of the

523: model coordinates $z$.  If $p, f_0, f_1$ and $f_2$ are known it is

524: then easy to estimate $\theta$.  However, that is rarely the case,

525: and it is impossible to identify all of the unknown quantities

526: $p, f_0, f_1, f_2$ and $\theta$ from the coordinates $z$.

527: Instead the model (\ref{zFq})

528: is applied to biological data as follows.

529: The IBD sharing at the gene locus (and at nearby marker loci)

530: is largest when the disease allele

531: has a strong effect and/or the disease allele is rare, i.e. when

532: $f_0 \leq f_1 \leq f_2$ (and preferably $f_0 \ll f_2$),

533: and $p$ is small.  In these biologically interesting

534: situations, the data point $\hat{z}$ is clearly different from $z_{null}$.

535: So in practice a test for genetic linkage tests whether $\hat{z}$ is

536: significantly different from $z_{null}$.  A widely used test statistic for

537: linkage is $S_{pairs} = \hat{z}_2+\hat{z}_1/2$ which measures deviations

538: from $z_{null}$ along the line corresponding to the additive model.
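
For example, $S_{pairs} = 1/4 + 1/4 = 1/2$ at $z_{null}$.
With hypothetical counts $(n_0,n_1,n_2) = (16,48,36)$ from $n=100$
sibling pairs (made-up numbers, used only for illustration) we get
$$ \hat{z} \,\, = \,\, (0.16, \, 0.48, \, 0.36) \qquad \hbox{and} \qquad
S_{pairs} \,\, = \,\, 0.36 + 0.48/2 \,\, = \,\, 0.60 . $$
This $\hat{z}$ satisfies Holmans' constraints, since
$\, 2 \hat{z}_0 = 0.32 \leq 0.48 \leq 0.52 = \hat{z}_0 + \hat{z}_2 $.
Whether an excess of $0.10$ over $1/2$ is statistically significant depends
on the sample size and on the test used, which is not addressed by this
example.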

539:

540: \section{Derivation of the Two-Locus Model}

541:

542: Many common genetic disorders are caused by not one but many

543: interacting genes.  We now consider the two-locus model, $k=2$,

544: where we assume that two genes cause the disease,

545: independently or together.  We shall assume that the genes are

546: unlinked, i.e., they are either on different chromosomes or

547: far apart on the same chromosome.  The derivation

548: is much like that in Section~\ref{oneloc}.

549:

550: The {\em model parameters} are $p_1$ and $p_2$, where

551: $p_i$ is the  frequency of the disease allele at the $i$th locus.

552:  A two-locus genotype is an

553: element in $G \times G = \{nn, nd, dn, dd\}^2$.

554:  The {\em model  characteristics} are

555: $\,f=(f_{00}, f_{01},

556: \ldots, f_{22})$, where $f_{ij}$

557: is the probability that an individual is affected with the

558: disease, given $i$ copies of the first disease allele and

559: $j$ copies of the second disease allele:

560: \begin{eqnarray*}

561: f_{00} \,\, &= & \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (nn, nn)\,), \\

562: f_{01} \,\, &= & \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (nn, nd)\,)  \,\, = \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\,(nn, dn)\,), \\

563: f_{02} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (nn, dd)\,), \\

564: f_{10} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (nd, nn)\,) \,\, = \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dn, nn)\,), \\

565: f_{11} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (nd, nd)\,) \,\,= \dots = \,\,  Pr(\,\mbox{affected} \,\,\,|\,\,\, (dn, dn)\,), \\

566: f_{12} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (nd, dd)\,) \,\, = \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dn, dd)\,), \\

567: f_{20} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dd, nn)\,), \\

568: f_{21} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dd, nd)\,) \,\,=\,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dd, dn)\,), \\

569: f_{22} \,\, &=& \,\, Pr(\,\mbox{affected} \,\,\,|\,\,\, (dd, dd)\,).

570: \end{eqnarray*}

571: The {\em model coordinates} are

572: $\,z=(z_{00}, z_{01}, z_{02}, z_{10}, z_{11}, z_{12}, z_{20}, z_{21}, z_{22})$,

573: where $z_{ij}$ represents the probability for an affected sibling pair

574:  that the IBD sharing at the first gene locus is $i$,

575:  and $j$ at the second gene locus:

576: \begin{displaymath}

577: z_{ij} \,\, =  \,\, Pr(\,\mbox{IBD sharing}

578: \,\, = \,\, (i, j) \,|\, \mbox{both sibs affected}

579: \,), \qquad i,j = 0,1,2.

580: \end{displaymath}

581: The IBD sharing at two random loci, neither of which

582: is linked to the disease genes, is the null hypothesis

583: $\,z_{null}~=~(1/16, 1/8, 1/16, 1/8, 1/4, 1/8, 1/16, 1/8, 1/16)$.
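
Each entry of this vector is the product of two one-locus null
probabilities from $(1/4,1/2,1/4)$, as one expects for two loci that are
inherited independently. For instance,
$$ (z_{null})_{01} \,\, = \,\, \frac{1}{4} \cdot \frac{1}{2} \,\, = \,\, \frac{1}{8}
\qquad \hbox{and} \qquad
(z_{null})_{11} \,\, = \,\, \frac{1}{2} \cdot \frac{1}{2} \,\, = \,\, \frac{1}{4} . $$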

584:

585: The polynomial functions which express the

586: coordinates $z_{ij}$ in terms of $p_1,p_2$ and the

587: $f_{ij}$ are  derived as follows.

588:  We consider the set of events

589: $$

590: \mathcal{E}_i \times \mathcal{E}_j

591: \,\, = \,\,

592: C_i \times G \times G \times C_j \times G \times G

593: \quad \qquad \hbox{for $i,j=0,1,2$}. $$

594:

595: Each event in $\,\mathcal{E}_i \times \mathcal{E}_j \,$ consists of an

596: inheritance vector, the genotype of the father and the genotype

597: of the mother, at each locus.  For a given event we know

598: the total number $m_1$ and $m_2$ of disease alleles

599: carried by the parents at the first and second locus

600: and $k_{11}, k_{12}, k_{21}, k_{22}$, where

601: $k_{ij}$ is the number of disease

602: alleles carried by sibling $i$ at locus $j$.  The probability of the event is

603: \begin{displaymath}

604: f_{k_{11} k_{12}} f_{k_{21} k_{22}} p_1^{m_1}q_1^{4-m_1} p_2^{m_2} q_2^{4-m_2}, \quad \mbox{where} \quad q_1 = 1-p_1 \quad \mbox{and} \quad q_2 = 1-p_2.

605: \end{displaymath}

606: Up to a normalizing constant,

607:  each IBD sharing probability $z_{ij}$ is the

608: sum of the monomials $\,f_{k_{11} k_{12}}

609: f_{k_{21} k_{22}} p_1^{m_1}q_1^{4-m_1} p_2^{m_2} q_2^{4-m_2}\,$

610: over all events in  $\,\mathcal{E}_i \times \mathcal{E}_j $.

611:

612: \begin{prop} \label{matrixform2}

613: The coordinates $z_{ij}$ of the two-locus model

614: are homogeneous polynomials of tridegree

615: $(2,4,4)$ in the characteristics $(f_{00},f_{01},\ldots,f_{22})$,

616: the  parameters $(p_1,q_1)$ at the first locus, and

617: the  parameters $(p_2,q_2)$ at the second locus.

618: \end{prop}

619:

620: The matrix form of the one-locus model given in

621: Proposition \ref{matrixform} immediately generalizes

622: to the two-locus model. Let $\pi$ denote the

623: column vector whose entries are

624: the $25$ monomials of bidegree $(4,4)$ listed

625: in lexicographic order:

626: $$ \pi \,\, := \,\, \bigl(\,

627:  q_1^4 q_2^4,\,

628:  q_1^4 p_2 q_2^3,\,

629:  q_1^4 p_2^2 q_2^2,\,

630: \ldots\,,\,

631: p_1 q_1^3 q_2^4,\,

632: p_1 q_1^3 p_2 q_2^3,\,

633: \ldots, \,

634: p_1^4 p_2^4 \, \bigr).

635: $$

636:

637: \begin{cor}  \label{ninetwentyfive}

638: The two-locus model has the form

639: $\,z^T = F \cdot \pi  \,$ where

640: $F$ is a $9\times 25$-matrix

641: whose entries are quadratic forms

642: in the characteristics $f_{ij}$.

643: \end{cor}

644:

645: A typical entry in our $9 \times 25$ matrix $F$ looks like

646: $$

647: 32 \cdot ( f_{00}^2 + 2 f_{00} f_{10} + 4 f_{01}^2

648: + 8 f_{01} f_{11} +  f_{02}^2 + 2 f_{02} f_{12}

649: +  f_{10}^2 + 4 f_{11}^2 +  f_{12}^2).  \eqno (*)

650: $$

651: This quadratic form appears in $F$ in row $6$ and column $8$.

652: It is the coefficient of the

653: $8^{th}$ biquartic monomial $\,p_1 q_1^3 p_2^2 q_2^2 \,$  in

654: the expression for the $6^{th}$ coordinate:

655: \vskip -0.3cm

656: \begin{eqnarray*}

657: z_{12} &\quad=\,\,& \,\,\,\,(32 f_{00}^2) \cdot q_1^4 q_2^4  \,\,+\,\,

658: (64 f_{00}^2+64 f_{01}^2) \cdot q_1^4 p_2 q_2^3\\

659: & & + \, (32 f_{00}^2+128 f_{01}^2+32 f_{02}^2) \cdot q_1^4 p_2^2 q_2^2

660: \,+\, \cdots \cdots \, + \\

661: & & +\, (*) \cdot p_1 q_1^3 p_2^2 q_2^2 \,+\, \cdots\,

662: + (64 f_{21}^2+64 f_{22}^2) \cdot p_1^4 p_2^3 q_2

663: \, +\, (32 f_{22}^2) \cdot p_1^4 p_2^4.

664: \end{eqnarray*}
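
The entry $(*)$ can be understood in terms of the one-locus matrix of
Proposition \ref{matrixform}. The following bookkeeping is our own sketch.
IBD sharing $2$ at the second locus forces both siblings to carry the same
number $b$ of disease alleles there, and for $m_2 = 2$ the cases
$b = 0,1,2$ carry the weights $4, 16, 4$ seen in the one-locus entry
$4f_0^2 + 16 f_1^2 + 4 f_2^2$. IBD sharing $1$ at the first locus with
$m_1 = 1$ contributes the one-locus pattern $8(f_0+f_1)^2$.
Combining the two gives
$$ (*) \,\, = \,\, 8 \cdot \bigl( \,
4 (f_{00} + f_{10})^2 \, + \, 16 (f_{01} + f_{11})^2 \, + \, 4 (f_{02} + f_{12})^2 \, \bigr) , $$
which expands to the quadratic form displayed above.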

665:

666: \section{Surfaces of degree 32 in the 8-dimensional simplex}

667: \label{surf}

668:

669: Let $\Delta_8$ denote the eight-dimensional probability simplex

670: $$

671: \{\,(z_{00},z_{01}, \ldots , z_{22}) \,\,

672: : \,\,

673: z_{ij} \geq 0 \,\, \mbox{for} \,\, i,j \in \{0,1,2\}

674: \quad \mbox{and} \quad \sum_{i=0}^2 \sum_{j=0}^2 z_{ij} = 1\}.

675: $$

676: Likewise, we consider the

677: product of two $1$-simplices,

678: which is the square

679: $$ \Delta_1 \times \Delta_1 \,\,\, = \,\,\,

680: \bigl\{\, (p_1,q_1,p_2,q_2) \,\,:\,\,

681: p_1,q_1,p_2,q_2 \geq 0 \quad \mbox{and} \quad

682: p_1+q_1 = p_2 + q_2 = 1 \,\bigr\}. $$

683: For fixed $F$,

684: the formula $\,z^T = F \cdot \pi  \,$  in

685: Corollary  \ref{ninetwentyfive}

686: specifies a polynomial map

687: $$ \tilde F \quad : \quad

688: \Delta_1 \times \Delta_1 \,\,\longrightarrow \,\,

689: \Delta_8 \qquad \qquad

690: \mbox{of bidegree $(4,4)$}. $$

691: The image of the map $\tilde F$

692: is the two-locus model

693: for fixed characteristics $f_{ij}$.

694: The model is a surface in the simplex $\Delta_8$.

695: Our goal in this section is

696: to express this surface  as the common zero set

697: of a system of polynomials in the $z_{ij}$.

698:

699: \begin{thm} \label{thirtytwo}

700: For almost all characteristics $f_{ij}$,

701: the two-locus model is a surface of degree

702: $32$ in the simplex $\Delta_8$. This surface is the

703: common zero set of the degree $32$ polynomials

704: gotten by projection into three-dimensional subspaces.

705: \end{thm}

706:

707: \noindent {\sl Proof. }

708: We work in the setting of complex projective

709: algebraic geometry. Consider the embedding

710: of the product of projective lines $P^1 \times P^1$

711: by the ample line bundle $\mathcal{O}(4,4)$. This

712: is a toric surface $X$ of degree $32$ in $P^{24}$.

713: The $9 \times 25$-matrix $F$ defines a rational

714: map from $P^{24}$ to $P^8$, and it can be checked

715: computationally that this map has no base points on

716: $X$ for general $f_{ij}$. Hence the image $F(X)$ of $X$ in

717: $P^8$ is a rational surface of degree $32$. The two-locus model

718: is the intersection of $F(X)$ with $\Delta_8$, which is

719: the positive orthant in $P^8$.

720:

721: Let $A$ denote a generic  $4 \times 9$-matrix,

722: defining a rational map $P^8 \rightarrow P^3$.

723: It has no base points on $F(X)$, hence the image

724: $AF(X)$ of $F(X)$ under $A$ is a surface

725: of degree $32$ in projective $3$-space $P^3$.

726: The inverse image of $AF(X)$ in $P^8$

727: is an irreducible hypersurface of degree

728: $32$ in $P^8$. It is defined

729: by an irreducible homogeneous polynomial

730: of degree $32$ in $\, z = (z_{00}, z_{01}, \ldots,z_{22})$.

731: These polynomials for various $4 \times 9$-matrices $A$

732: are known as the \emph{Chow equations} of the surface $F(X)$.

733: Computing them is equivalent to computing the

734: \emph{Chow form} of $F(X)$. A well-known

735: construction in algebraic geometry (see e.g.~\cite[\S 3.3]{DalStu})

736: shows that any irreducible projective variety

737: is set-theoretically defined by its

738: Chow equations. Applying this result

739: to $F(X)$ completes the proof. \qed
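
The numbers $32$ and $24$ used in the proof can be checked by the standard
intersection computation on $P^1 \times P^1$. Here $H_1$ and $H_2$ denote the
classes of the two rulings (this notation is ours and is not used elsewhere in
the paper), so that $H_1^2 = H_2^2 = 0$ and $H_1 \cdot H_2 = 1$:
$$
\deg X \,\,=\,\, (4 H_1 + 4 H_2)^2 \,\,=\,\, 2 \cdot 4 \cdot 4 \cdot (H_1 \cdot H_2) \,\,=\,\, 32,
\qquad
\#\{\hbox{monomials of bidegree $(4,4)$}\} \,\,=\,\, 5 \cdot 5 \,\,=\,\, 25 .
$$
The second count is the number of columns of $F$ and explains why $X$ lives in $P^{24}$.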

740:

741: We now explain how Theorem \ref{thirtytwo}

742: translates into an explicit algorithm for

743: computing the algebraic invariants of the

744: two-locus model. Let $\,\mathcal{R}_X\,$ be

745: the Chow form of the toric surface

746: $\, X \simeq P^1 \times P^1 \,$ in $\,P^{24}$.

747: The Chow form $\,\mathcal{R}_X\,$

748: is the multigraded resultant of three polynomial equations

749: of bidegree $(4,4)$:

750: $$

751: \sum_{i=0}^4 \sum_{j=0}^4 \alpha_{ij} x^i y^j

752: \,=\,

753: \sum_{i=0}^4 \sum_{j=0}^4 \beta_{ij}  x^i y^j

754: \,=\,

755: \sum_{i=0}^4 \sum_{j=0}^4 \gamma_{ij}  x^i y^j

756: \,=\, 0 . $$

757: In concrete terms,  $\,\mathcal{R}_X\,$ is

758: the unique (up to sign) irreducible polynomial

759: of tridegree $(32,32,32)$ in the

760: $75$ unknowns $\alpha, \beta,\gamma$ which vanishes

761: if and only if the three equations have a common

762: solution in $\,P^1 \times P^1$.

763:

764: We use the B\'ezout matrix representation

765: of the resultant $\mathcal{R}_X$

766: given in \cite[Theorem 6.2]{DicEmi}.

767: This is a $32 \times 32$-matrix  ${\bf B}$ which is

768: a direct generalization of the $4 \times 4$-matrix

769: in (\ref{bezout}). Consider the

770: $3 \times 25$-coefficient matrix

771: $$

772:    \left( \begin{array}{cccccccccc}

773: \alpha_{00} & \alpha_{01} & \alpha_{02} & \alpha_{03} & \alpha_{04} &

774: \alpha_{10} & \alpha_{11} & \cdots \cdots & \alpha_{43} & \alpha_{44} \\

775: \beta_{00} & \beta_{01} & \beta_{02} & \beta_{03} & \beta_{04} &

776: \beta_{10} & \beta_{11} & \cdots \cdots & \beta_{43} & \beta_{44} \\

777: \gamma_{00} & \gamma_{01} & \gamma_{02} & \gamma_{03} & \gamma_{04} &

778: \gamma_{10} & \gamma_{11} & \cdots \cdots & \gamma_{43} & \gamma_{44} \\

779: \end{array} \right).

780: $$

781: For $1 \leq i < j < k \leq 25$, let $\,[\, i \,j \, k \,]\,$ denote the

782: determinant of the $3 \times 3$-submatrix with column indices $i,j,k$.

783: The entries in the B\'ezout matrix ${\bf B}$

784: are the linear forms in the brackets

785: $\,[\, i \,j \, k \,]$, and we have

786: $\,\mathcal{R}_X = {\rm det}({\bf B})$.
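
Two quick consistency checks on this description (plain counting, not taken from
\cite{DicEmi}): the number of brackets is
$$
{25 \choose 3} \,\,=\,\, \frac{25 \cdot 24 \cdot 23}{6} \,\,=\,\, 2300,
$$
and each bracket is linear in each of the three rows $\alpha$, $\beta$ and $\gamma$.
Since every entry of ${\bf B}$ is a linear form in the brackets, the determinant of
the $32 \times 32$-matrix ${\bf B}$ has degree $32 \cdot 1 = 32$ in each row,
in agreement with the tridegree $(32,32,32)$ of $\,\mathcal{R}_X$.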

787:

788: Let $F$ be the $9  \times 25$-matrix

789: in Corollary \ref{ninetwentyfive}.

790: We add the column vector $z$ to get the

791: $ 9 \times 26$-matrix $\,( F \, z )$.

792: Next we pick any $4 \times 9$-matrix $A$

793: and we consider

794: $$ A \cdot (F \,\, z) \,\, = \,\, (A \cdot F \, \,\,\, A \cdot z). $$

795: This is a $4 \times 26$-matrix whose last column consists of

796: linear forms in the $z_{ij}$.

797:

798: In the B\'ezout matrix ${\bf B}$, we now replace

799: each bracket $\,[\, i \,j \, k \,]\,$ by the $4 \times 4$-subdeterminant

800: of $\, A \cdot (F \, \, z)\,$ with column indices

801: $i,j,k$ and $26$. Thus  $\,[\, i \,j \, k \,]\,$ is a linear

802: form in the $z_{ij}$ whose coefficients are homogeneous

803: polynomials of degree six in the $f_{ij}$.

804: The matrix gotten by this substitution is

805: denoted $\,{\bf B}\bigl(A \cdot (F \,\, z) \bigr)$.

806: Its determinant is the specialized resultant

807: $\,\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)$.
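
The degree count for a single bracket can be made explicit as follows. Write
$M_{ijk}$ for the $9 \times 4$-matrix formed by the columns $i,j,k$ of $F$ and the
column $z$ (the symbol $M_{ijk}$ is our shorthand). By the Cauchy-Binet formula,
$$
[\, i \, j \, k \,] \,\,=\,\, \det \bigl( A \cdot M_{ijk} \bigr)
\,\,=\,\, \sum_{S} \det (A_S) \cdot \det \bigl( (M_{ijk})_S \bigr),
$$
where $S$ runs over the ${9 \choose 4} = 126$ four-element subsets of $\{1,\ldots,9\}$,
$A_S$ is the $4 \times 4$-submatrix of $A$ with columns $S$, and $(M_{ijk})_S$ is the
$4 \times 4$-submatrix of $M_{ijk}$ with rows $S$. Each $\det \bigl( (M_{ijk})_S \bigr)$
has three columns that are quadratic in the $f_{ij}$ and one column that is linear in
the $z_{ij}$, so it has degree $3 \cdot 2 = 6$ in the $f_{ij}$ and degree $1$ in the $z_{ij}$.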

808:

809: \begin{cor}

810: The resultant $\,\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)\,$

811: is a homogeneous polynomial of degree $32$ in the $4 \times 4$-minors of $A$, hence of degree $128$ in the entries $a_{ij}$ of $A$.

812: Its coefficients are polynomials which are bihomogeneous of degree $32$

813: in the $z_{ij}$ and degree $192$ in the $f_{ij}$.

814: The two-locus model is cut out by this finite list of coefficient polynomials

815: in the $z_{ij}$ and $f_{ij}$. \end{cor}

816:

817: \noindent {\sl Proof. }

818: Each entry of the $32 \times 32$-matrix

819: $\,{\bf B}\bigl(A \cdot (F \,\, z) \bigr)\,$ is

820: linear in the $4 \times 4$-minors of $A$, and it is trihomogeneous of degree

821: $(4,6,1)$ in $(a_{ij},f_{ij},z_{ij})$. Hence its determinant has degree $32$ in the minors of $A$

822: and is trihomogeneous of degree $(128,192,32)$.

823: For fixed $A$ and fixed $F$, the resulting polynomial

824: defines a hypersurface of degree $32$ in $P^{8}$.

825: This hypersurface is the inverse image of the

826: surface $AF(X)$ in $P^3$. As discussed in the

827: proof of Theorem \ref{thirtytwo}, our model is

828: the intersection of these hypersurfaces

829: for all possible choices of $A$. A finite basis for

830: the linear system of these hypersurfaces

831: is given by the coefficient polynomials

832: of $\,\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)\,$

833: with respect to $A$. \qed

834:

835: The finite list of algebraic invariants described in the

836: previous corollary is the two-locus generalization

837: of the one-locus invariant in Proposition

838: \ref{twolocusprop}.

839: Note that the bidegree in $(z,F)$ has now

840: increased from $(4,8)$ to $(32,192)$.

841: Our derivation of these invariants

842: from the Chow form of a Segre-Veronese variety

843: generalizes to the $k$-locus case,

844: where $F$ and $z$ are $k$-dimensional tables of format $3 \times 3 \times \cdots \times 3$.

845: The analogous invariants have bidegree

846: $\,\bigl( \,k ! \, 4^k,\, 2 (k+1)! \, 4^k \,\bigr) \,$ in $(z,F)$.
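
For orientation, direct substitution into this formula gives
$$
k=2: \,\, \bigl( 2! \cdot 4^2, \, 2 \cdot 3! \cdot 4^2 \bigr) \, = \, (32,192),
\qquad
k=3: \,\, \bigl( 3! \cdot 4^3, \, 2 \cdot 4! \cdot 4^3 \bigr) \, = \, (384,3072).
$$
The case $k=2$ recovers the degrees in the previous corollary; the case $k=3$ is
shown only to indicate how quickly the degrees grow.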

847:

848: \section{Computational experiments and statistical perspectives}

849: We prepared a test implementation in {\tt maple} of the elimination

850: technique described in the previous section. That code is available

851: at the first author's website {\tt www.stat.berkeley.edu/$\sim$ingileif/}.

852: The input is a triple

853: $\bigl((f_{ij}), (z_{ij}),A\bigr) $ consisting of

854: a $3 \times 3$-matrix of model characteristics,

855: a $3 \times 3$-matrix of model coordinates,

856: and a projection matrix of size $4 \times 9$.

857: Each entry in these input matrices can be either

858: left symbolic or it can be specialized to a number.

859: Our program builds the specialized B\'ezout matrix

860: $\,{\bf B}\bigl(A \cdot (F \,\, z) \bigr)$, and, if the

861: matrix entries are purely numeric, then

862: it evaluates  the determinant $\,\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)$.

863:

864: Here are some examples of typical

865: computations with our {\tt maple} program.

866: Set  \vskip -0.4cm

867: \begin{tabbing}

868: $\quad$ \= $z_{00} = 3 \quad$ \= $z_{01} = 3 \quad$ \= $z_{02} = 5 \quad$  \= $\quad$ \= $f_{00} = 32 \quad$ \= $f_{01} = 21 \quad$ \= $f_{02} = 48 \quad$ \\

869: \> $z_{10} = 29$ \> $z_{11} = 11$ \> $z_{12} = 13$ \> $\quad$ \> $f_{10} = 14$ \> $f_{11} = 27$ \> $f_{12} = 39$ \\

870: \> $z_{20} = 17$ \> $z_{21} = 19$ \> $z_{22} = 23$ \> $\quad$ \> $f_{20} = 36$ \> $f_{21} = 19$ \> $f_{22} = 22$ \\

871: \end{tabbing}

872: \vskip -0.4cm

873: $$ \hbox{and}

874: \qquad \qquad A \,\,\, = \,\,\,

875:  \left( \begin{array}{ccccccccc}

876:  1 & 0 & 0 &  0 & 0 & 0 &  0 & 0 & 0 \\

877:   0 & 1 & 0 &  0 & 0 & 0 &  0 & 0 & 0 \\

878:   0 & 0 & 0 &  1 & 0 & 0 &  0 & 0 & 0 \\

879:   0 & 0 & 0 &  0 & 1 & 0 &  0 & 0 & 0 \\

880:     \end{array} \right). \qquad \qquad \qquad \qquad $$

881: Then $\,{\bf B}\bigl(A \cdot (F \,\, z) \bigr)$ is a $32 \times 32$-matrix whose

882: entries $b_{i,j}$ are integers, e.g.,

883: $$ b_{1,1} =  26967093018624, \,\,b_{1,2} =  -114552012275712, \ldots, \,\,b_{32,32} =  845647773696. $$

884: The determinant of this $32 \times 32$-matrix is a non-zero integer with $469$ digits:

885: $$ \mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr) \,\, = \,\,

886:                                                     0.2704985126... \cdot 10^{469}. $$

887: We now retain the numerical values for the model characteristics $f_{ij}$

888: and the  matrix $A$ from before but we make the model

889: coordinates $z_{ij}$ indeterminates. Then

890: $\,{\bf B}\bigl(A \cdot (F \,\, z) \bigr)$ is a $32 \times 32$-matrix whose

891: entries $b_{i,j}$ are linear forms

892: \vskip -0.4cm

893: \begin{eqnarray*}

894: b_{1,1} &\,\,  = \,\, &  -2630935904256 \, z_{00}+1315467952128 \, z_{01} \\

895:  & &  +1315467952128 \, z_{10}-657733976064 \, z_{11} \\

896: b_{1,2} &\,\, = \,\,&  11746198683648 \, z_{00}-8211709034496 \, z_{01} \\

897: &  &  -5873099341824 \, z_{10}+4105854517248 \, z_{11}\\

898: &  & \qquad \dots  \quad \dots \quad \dots  \quad \dots \quad \dots

899: \end{eqnarray*} \vskip -0.3cm

900: Its determinant  $\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)$ is an irreducible

901: polynomial of degree $32$

902: which vanishes on the model with the given characteristics $f_{ij}$.

903: In fact, up to scaling, it is the unique such polynomial

904: which depends only on $\,z_{00},z_{01},z_{10}$ and $z_{11}$.

905:

906: Finally, we reverse the role of the coordinates $z_{ij}$

907: and the characteristics $f_{ij}$, namely, we fix the former

908: at their previous numerical values $(z_{00} =3,\ldots,z_{22} = 23)$

909: but we regard the $f_{ij}$ as indeterminates. Then $\,{\bf B}\bigl(A \cdot (F \,\, z) \bigr)$

910:  is a $32 \times 32$-matrix whose entries $b_{i,j}$ are

911:  homogeneous polynomials of degree six, e.g.,

912:  \vskip -0.3cm

913: \begin{eqnarray*}

914:  b_{1,1} &\,\, =\,\, & \quad 671744 \, f_{00}^6-1343488 \, f_{00}^5 f_{01}-1343488 \, f_{00}^5 f_{10}\\

915:         &       & + \, 671744 \, f_{00}^4 f_{01}^2  + 2686976 \, f_{00}^4 f_{01} f_{10}+671744 \, f_{00}^4 f_{10}^2 \\

916:         &       & - \, 1343488 \, f_{00}^3 f_{01}^2 f_{10}-1343488 \, f_{00}^3 f_{01} f_{10}^2 + 671744 \,f_{00}^2 f_{01}^2 f_{10}^2.

917: \end{eqnarray*}

918: Now  $\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)$ is an irreducible homogeneous

919: polynomial of degree $192$ in the nine characteristics $f_{ij}$.

920: The vanishing of this polynomial provides an algebraic constraint on the

921: set of all models $(f_{ij})$ which fit the given data $(z_{ij})$.

922:

923:

924: In linkage analysis, the characteristics $f_{ij}$ can take on any

925: real value between $0$ and $1$.

926: %(here between $0$ and $100$ for numerical reasons).

927: Two-locus models are often constructed by

928: first picking two one-locus  characteristics, $g=(g_0, g_1, g_2)$ and $h=(h_0, h_1, h_2)$, from a class of special models such as recessive or dominant.

929: Then the two-locus model is defined by combining the one-locus characteristics in one of the following ways:

930: \begin{center}

931: \begin{tabular}{rcl}

932: {\it multiplicative} &:& $f_{ij} \,=\, g_i \cdot h_j$ \\

933: {\it heterogeneous} &:& $f_{ij} \,=\, g_i + h_j -g_i\cdot h_j$ \\

934: {\it additive} &:& $f_{ij} \,=\, g_i + h_j$ \\

935: \end{tabular}

936: \end{center}

937: The $9 \times 25$-matrix $F$ of the multiplicative model

938: is the tensor product of the two $3 \times 5$-matrices gotten

939:  from $g$ and $h$ as in Proposition \ref{matrixform}.

940: Hence the surface of the multiplicative  model is the

941: \emph{Segre product} of two one-locus curves.

942: The heterogeneous model and the additive model are too special,

943: in the sense that the corresponding surfaces in $P^8$ have degree

944: less than $32$. In these two cases, the resultant

945: $\,\mathcal{R}_X \bigl( A \cdot (F \,\, z) \bigr)\,$ vanishes

946: identically, and our {\tt maple} code always outputs zero.

947: The surfaces arising from these two models require

948: a separate algebraic study. Conducting this study could be

949: a worthwhile next step.
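
Two elementary identities may serve as a starting point for such a study. We
state them only as observations about the table above, and we do not claim that
they account for the drop in degree. The heterogeneous rule is multiplicative in
the complementary characteristics, and the additive rule gives a matrix
$(f_{ij})$ of rank at most two:
$$
1 - f_{ij} \,\,=\,\, 1 - g_i - h_j + g_i h_j \,\,=\,\, (1-g_i)(1-h_j),
\qquad \quad
(f_{ij}) \,\,=\,\, g^T \cdot (1,1,1) \, + \, (1,1,1)^T \cdot h .
$$
By comparison, the multiplicative rule gives the rank one matrix $(f_{ij}) = g^T \cdot h$.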

950:

951: The following two-locus analogue to Holmans' triangle (the smaller triangle

952: in Figure \ref{fig:holmans}) was derived in~\cite{olof}.  For affected sibling pairs the IBD sharing probabilities $ \,z = (z_{00}, z_{01}, \ldots, z_{22})\,$

953:  satisfy

954: $\, H \cdot z^T \geq 0 \,$ where $H$ is the inverse of $K^{\otimes 2}$ and

955: \vskip -0.5 cm

956: \begin{eqnarray*}

957: K &\,\,\, = \,\,\, & \frac{1}{4}

958: \left( \begin{array}{rrr}

959:  1 & 0 & 0 \\

960:  2 & 2 & 0 \\

961:  1 & 2 & 4 \\

962: \end{array} \right)

963: \end{eqnarray*}

964: \vskip -0.5 cm

965: So, in practical applications we are only interested in the

966: intersection of our degree $32$ surface with the $8$-simplex defined by

967: these linear inequalities.
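
The matrix $H$ can be written down explicitly. The following is a direct
computation from the matrix $K$ above, using the identity
$\,(K \otimes K)^{-1} = K^{-1} \otimes K^{-1}$ and assuming that the coordinates
of $z$ are ordered lexicographically:
$$
K^{-1} \,\,\, = \,\,\,
\left( \begin{array}{rrr}
 4 & 0 & 0 \\
 -4 & 2 & 0 \\
 1 & -1 & 1 \\
\end{array} \right),
\qquad \quad
H \,\,\, = \,\,\, K^{-1} \otimes K^{-1} .
$$
For instance, the first row of the $9 \times 9$-matrix $H$ gives $\,16 \, z_{00} \geq 0$,
and the last row gives
$\,z_{00} - z_{01} + z_{02} - z_{10} + z_{11} - z_{12} + z_{20} - z_{21} + z_{22} \geq 0$.
In the one-locus case the corresponding inequalities $K^{-1} \cdot z^T \geq 0$ read
$\,z_0 \geq 0$, $\,z_1 \geq 2 z_0\,$ and $\,z_0 - z_1 + z_2 \geq 0$, and on the
triangle $z_0 + z_1 + z_2 = 1$ the last one is equivalent to $\,z_1 \leq 1/2$.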

968:

969: In summary, in this paper we have presented a model for the sharing

970: of genetic material of two affected siblings, used in genetic linkage

971: analysis, in the framework of algebraic geometry.

972: The model is rich in structure, but this

973: structure is not yet fully exploited in statistical tests for genetic linkage.

974: For plausible biological models we expect to see increased sharing between

975: affected sibling pairs at gene loci linked to the disease.

976: The null hypothesis for linkage is rejected only if the estimate

977: of the model coordinates, $z$, differs significantly from $z_{null}$.

978: This is a geometric statement about the

979: distance between two points in a triangle (for $k=1$) or

980: in an $8$-simplex (for $k=2$). We believe that the algebraic

981: representation of the model derived here will be useful for

982: deriving new test statistics for linkage in the case when $k \geq 2$.

983:

984: \section{Acknowledgements}

985: We thank Lior Pachter and Terry Speed for

986: reading the manuscript and providing useful

987: comments. We are grateful to Amit Khetan

988: for helping us with the {\tt maple} implementation

989: of the B\'ezout resultant.  Bernd Sturmfels was supported

990: by the Hewlett Packard Visiting Research Professorship 2003-04

991: at MSRI~Berkeley and

992: the National Science Foundation (DMS-0200729).

993:

994:

995: \begin{thebibliography}{9}

996: %\bibitem{allman} Elizabeth S. Allman

997: %and John A. Rhodes: Phylogenetic invariants for the general Markov model

998: %of sequence mutation, {\em Math. Biosci.} 186(2) pp.113-144.

999: \bibitem{olof} Olof Bengtsson: {\em Two-Locus Affected Sib-Pair

1000: Identity By Descent Probabilities} (Licentiate Thesis,

1001: Dept. of Mathematical Statistics, G\"{o}teborg Univ., 2001).

1002: \bibitem{botrisch} David Botstein and Neil Risch: Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease, {\em Nature Genetics supplement} {\bf 33} (2003) 228--237.

1003: \bibitem{DalStu} John Dalbec and  Bernd Sturmfels:

1004: Introduction to Chow forms,

1005: in {\sl ``Invariant Methods in Discrete and Computational Geometry''}

1006: [N.~White, ed.], Proceedings Curacao (June 1994), Kluwer

1007: Academic Publishers, 1995, pp.~37--58.

1008: \bibitem{DicEmi} Alicia Dickenstein and Ioannis Emiris:

1009: Multihomogeneous resultant formulae by means of complexes,

1010: {\em J.~Symbolic Computation} {\bf 36} (2003) 317--342.

1011: \bibitem{ds} Sandrine Dudoit and Terence P. Speed: A score test

1012: for the linkage analysis of qualitative and quantitative

1013: traits based on identity by descent data from

1014: sib-pairs, {\em Biostatistics} {\bf 1} (2000) 1--26.

1015: \bibitem{elston} Robert C. Elston: Statistical Genetics '98,

1016:  Methods of Linkage Analysis-and the Assumptions Underlying Them,

1017:  {\em Am.~J.~Hum.~Genet.} {\bf 63} (1998) 931--934.

1018: \bibitem{gss} Luis Garcia, Michael Stillman and  Bernd Sturmfels:

1019: Algebraic geometry of Bayesian networks, {\em J.~Symbolic Computation},

1020: to appear.

1021: \bibitem{holmans} Peter Holmans: Asymptotic properties of affected

1022: sib-pair linkage analysis, {\em Am.~J.~Hum.~Genet.} {\bf 52} (1993) 362--374.

1023: \bibitem{lander} Eric Lander and Nicholas Schork: Genetic dissection of

1024: complex traits, {\em Science} {\bf 265} (1994) 2037--2048.

1025: \bibitem{ott} Jurg Ott: {\em Analysis of Human Genetic Linkage},

1026: Johns Hopkins Univ.Press, 1991.

1027: \bibitem{prw} Giovanni Pistone, Eva Riccomagno and Henry Wynn:

1028: {\em Algebraic Statistics}, Chapman \& Hall, New York, 2001.

1029: %\bibitem{sham} Pak Sham: {\em Statistics in Human Genetics},

1030: %Arnold Appl. of Statistics, 1998.

1031: \bibitem{StuSanDiego} Bernd Sturmfels: Introduction to

1032: resultants, in: D.~Cox, B.~Sturmfels (eds.),

1033: {\sl Applications of Computational Algebraic Geometry},

1034: Proceedings of Symp.~in Applied Math., {\bf 53},

1035: American Mathematical Society, 1997, pp.~25--39.

1036: \bibitem{stbook} Bernd Sturmfels: {\em Solving Systems of

1037: Polynomial Equations}, American Mathematical Society,

1038: CBMS Regional Conferences Series, No.~97, Providence, Rhode Island, 2002.

1039: \end{thebibliography}

1040:

1041: \end{document}

1042:

1043: