0401:q-bio0401033/parametric_inference_biological_sequence_analysis

1: \documentclass[11pt]{article}

2: \usepackage{epsfig}

3: \usepackage{amssymb}

4: \usepackage{amsthm}

5: \usepackage{amscd}

6: \usepackage{amsfonts}

7: \usepackage{amsmath}

8: \usepackage[T1]{fontenc}

9: \usepackage{ae,aecompl}

10: \usepackage{pslatex}

11: \usepackage{graphicx}

12: \usepackage{color}

13: \usepackage{latexsym}  % for \Box.

14: %\usepackage{fullpage}

15: \usepackage{url}

16: \usepackage{setspace}

17:

18: \setlength{\textwidth}{6.5in}

19: \setlength{\textheight}{8.5in}

20: \setlength{\oddsidemargin}{0.0in}

21: \setlength{\evensidemargin}{0.0in}

22: \setlength{\topmargin}{0.0in}

23:

24: \newcommand{\mysec}[1]{Section~\ref{sec:#1}}

25: \newcommand{\fig}[1]{Figure~\ref{fig:#1}}

26: \newcommand{\tbl}[1]{Table~\ref{tbl:#1}}

27: \newtheorem{thm}{Theorem}

28: \newtheorem{claim}[thm]{Claim}

29: \newtheorem{conj}[thm]{Conjecture}

30: \newtheorem{defn}[thm]{Definition}

31: \newtheorem{ex}[thm]{Example}

32: \newtheorem{obs}[thm]{Observation}

33: \newtheorem{lem}[thm]{Lemma}

34: \newtheorem{cor}[thm]{Corollary}

35: \newtheorem{prop}[thm]{Proposition}

36: \newtheorem{fact}[thm]{Fact}

37: \newcommand{\ZZ}{\mathbb{Z}}

38: \newcommand{\RR}{\mathbb{R}}

39: \newcommand{\NN}{\mathbb{N}}

40: \newcommand{\TP}{\mathbb{TP}}

41: \newcommand{\bin}[2]{{#1\choose #2}}

42:

43: \pagestyle{myheadings}

44: %\markright{Draft version October 24th, 2003}

45: \begin{document}

46: \title{Parametric Inference for Biological Sequence Analysis}

47:

48: \author{Lior Pachter and Bernd Sturmfels \\

49: Department of Mathematics, University of California, Berkeley, CA

50: 94720}

51:

52: \maketitle

53:

54: \begin{abstract}

55: One of the major successes in computational biology has been the

56: unification, using the graphical model formalism, of a multitude of

57: algorithms for annotating and comparing biological sequences.

58: Graphical models that have been applied towards these problems include

59: hidden Markov models for annotation, tree

60: models for phylogenetics, and pair hidden Markov models for

61: alignment. A single algorithm, the sum-product algorithm, solves

62: many of the inference problems associated with different statistical models.

63: This paper introduces the \emph{polytope propagation algorithm}

64: for computing the Newton polytope of an observation from a graphical model.

65: This algorithm is a geometric version of the sum-product algorithm and

66: is used to analyze the parametric behavior of maximum a posteriori

67: inference calculations for graphical models.

68: \end{abstract}

69:

70: %\doublespacing

71:

72: \section{Inference with Graphical Models for Biological Sequence Analysis}

73:

74: This paper develops a new

75:  algorithm for graphical models based on the mathematical foundation for statistical models proposed in \cite{Pachter:04}. Its relevance

76: for  computational biology can be summarized as follows:

77:

78: \textbf {(a) Graphical models are a unifying statistical framework for biological sequence analysis.}

79:

80: \textbf {(b) Parametric inference is important for obtaining biologically meaningful results.}

81:

82: \textbf {(c) The polytope propagation algorithm solves the parametric inference problem.}

83:

84: \vskip .1cm

85:

86: Thesis (a) states that graphical models are good models for biological sequences. This emerging understanding is the result of practical success with

87: probabilistic algorithms, and also the observation that inference algorithms for graphical models subsume many apparently non-statistical methods.

88:  A noteworthy example of the latter is the explanation of classic alignment

89: algorithms such as Needleman-Wunsch and Smith-Waterman in terms of the Viterbi algorithm for pair hidden Markov models \cite{Bucher:96}.

90: Graphical models are now used for many problems including motif detection, gene finding, alignment, phylogeny reconstruction and protein structure prediction. For example, most gene prediction methods are now hidden Markov model (HMM) based, and previously non-probabilistic methods

91: now have HMM based re-implementations.

92:

93: In typical applications, biological sequences are modeled as {\em observed random variables} $Y_1,\ldots,Y_n$ in a graphical model. The observed random variables may correspond to sequence elements such as nucleotides or amino acids. {\em Hidden random variables} $X_1,\ldots,X_m$ encode information of interest that is unknown, but which one would like to infer. For example, the information could be an annotation, alignment or ancestral sequence in a phylogenetic tree. One of the strengths of graphical models is that by virtue of being probabilistic, they can be combined into powerful models where the hidden variables are more complex. For example, hidden Markov models can be combined with pair hidden Markov models to simultaneously align and annotate sequences \cite{Alexandersson:03}. One of the drawbacks of such approaches is that the models have more parameters and as a result inferences could be less robust.

94:

95: For a fixed observed sequence $\sigma_1 \sigma_2 \ldots \sigma_n$ and {\em fixed parameters},

96: the standard inference problems are:

97: \begin{enumerate}

98: \item[1.] the calculation of {\em marginal probabilities}:

99: \[ p_{\sigma_1 \cdots \sigma_n}

100: \quad = \quad

101: \sum_{h_1,\ldots,h_m} {\rm Prob} (X_1=h_1,\ldots,X_m=h_m,Y_1=\sigma_1,\ldots,Y_n=\sigma_n) \]

102: \item[2.] the calculation of {\em maximum a posteriori log probabilities}:

103: \[ \delta_{\sigma_1 \cdots \sigma_n}

104: \quad = \quad

105:  \min_{h_1,\ldots,h_m} - {\rm log} \left( {\rm Prob} (X_1=h_1,\ldots,X_m=h_m,Y_1=\sigma_1,\ldots,Y_n=\sigma_n) \right), \]

106: \end{enumerate}

107: where the $h_i$ range over all the possible assignments for the hidden random variables $X_i$.

108: In practice, it is the solution to Problem 2 that is of interest, since it is the one that solves the problem of finding the genes in a genome or the ``best'' alignment for a pair of sequences.

109: A shortcoming of this approach is that the solution $\widehat {\bf h} = (\hat h_1, \ldots, \hat h_m)$ may vary considerably with a change in parameters.

110:

111: Thesis (b) suggests that a {\em parametric} solution to the inference problem can help in ascertaining the reliability, robustness and biological meaning of an inference result. By {\em parametric inference} we mean the solution of

112: Problem 2 for all model parameters simultaneously. In this way we can decide if a solution

113: obtained for particular parameters is an artifact or is largely independent of the chosen

114: parameters. This approach has already been applied successfully to the problem of pairwise sequence alignment in which parameter choices are known to be crucial in obtaining good alignments \cite{Fernandez-Baca:00, Gusfield:96, Waterman:92}.

115: Our aim is to develop this approach for arbitrary graphical models.

116: In thesis (c) we claim that the polytope propagation algorithm is efficient for solving the parametric inference problem and, in certain cases, is not much slower than solving Problem 2 for fixed parameters.

117: The algorithm is a geometric

118: version of the sum-product algorithm, which is the standard tool for

119: inference with graphical models.

120:

121: The mathematical setting for understanding

122: the polytope propagation algorithm is {\em tropical geometry}.

123: The connection between tropical geometry and parametric inference  in statistical models

124: is developed in the companion paper \cite{Pachter:04}. Here we describe the details of the polytope propagation algorithm (Section 3) in two familiar settings: the hidden Markov model for annotation (Section 2) and the pair hidden Markov model for alignment (Section 4). Finally, in Section 5, we discuss some practical aspects of parametric inference, such as specializing parameters, the construction of single cones which eliminates the need for identifying all possible maximum a posteriori explanations, and the relevance of our findings to Bayesian computations.

125:

126: \section{Parametric Inference with Hidden Markov Models}

127: Hidden Markov models play a central role in sequence analysis,

128: where they are widely used to annotate DNA sequences \cite{Baldi:98}.

129: A simple example  is the CpG island annotation problem \cite[\S 3]{Durbin:98}.

130: CpG sites are locations in DNA sequences where

131: the nucleotide cytosine (C) is situated next to a guanine (G) nucleotide (the ``p'' comes from the fact that a phosphate links them together). There are regions with many CpG sites in eukaryotic

132: genomes, and these are of interest because of the action of DNA methyltransferase, which

133: recognizes CpG sites and converts the cytosine  into 5-methylcytosine. Spontaneous deamination

134: causes the 5-methylcytosine to be converted into thymine (T), and the mutation is not fixed

135: by DNA repair mechanisms. This results in a gradual erosion

136: of CpG sites in the genome. {\em CpG islands} are regions of DNA with many unmethylated CpG sites. Spontaneous deamination of cytosine to thymine in these sites is repaired, resulting

137: in a restored CpG site. The computational identification of CpG islands is important, because they are associated with promoter regions of genes, and are known to be involved

138: in gene silencing.

139:

140: Unfortunately, there is no sequence characterization of CpG islands. A generally accepted definition due to Gardiner-Garden and Frommer \cite{Gardiner-Garden:87}

141: is that a CpG island is a region of DNA at least 200bp long with a G+C content of at least 50\%, and with a ratio of observed to expected CpG sites of at least 0.6. This arbitrary

142: definition has since been refined (e.g. \cite{Takai:02}); however, even analysis of the complete sequence of the human genome \cite{Lander:01} has failed to

143: reveal precise criteria for what constitutes a CpG island. Hidden Markov models can be used to predict CpG islands \cite[\S 3]{Durbin:98}. We have selected this application of HMMs

144: in order to illustrate our approach to parametric inference in a mathematically simple setting.

145:

146: The CpG island HMM we consider has $n$ hidden binary random variables $X_i$, and $n$ observed random variables $Y_i$ that take

147: on the values $\{A,C,G,T\}$ (see Figure 1 in \cite{Pachter:04}). In general, an

148: HMM can be characterized by the following conditional

149: independence statements for  $i = 1 , \ldots,n$:

150: \begin{eqnarray*} & p(X_i \, | \,X_1,X_2,\ldots,X_{i-1}) \quad

151: = \quad  p(X_i \,| \, X_{i-1}),

152: \\& p(Y_i \, |\, X_1,\ldots,X_i,Y_1,\ldots,Y_{i-1})\quad =

153: \quad p(Y_i \,|\, X_i). \end{eqnarray*}

154: The CpG island HMM has twelve model parameters, namely, the

155: entries of the transition matrices

156: $$ S \, = \, \begin{pmatrix}

157: s_{00} & s_{01} \\

158: s_{10} & s_{11} \\

159: \end{pmatrix}

160: \qquad \hbox{and} \qquad

161: T \, = \, \begin{pmatrix}

162: t_{0A} & t_{0C} & t_{0G} & t_{0T} \\

163: t_{1A} & t_{1C} & t_{1G} & t_{1T}

164: \end{pmatrix}.

165: $$

166: Here the hidden state space has just two states non-CpG $=0$ and CpG $=1$

167: with transitions allowed between them, but in more complicated applications, such as gene finding,

168: the state space is used to model numerous gene components (such as introns and exons) and

169: the sparsity pattern of the matrix $S$ is crucial. In its algebraic representation

170: \cite[\S 2]{Pachter:04}, the HMM is given as the image

171: of the polynomial map

172: \begin{equation}

173: \label{polymap}

174: f \, : \, {\bf R}^{12} \rightarrow {\bf R}^{4^n}, \,\,\,

175: (S,T) \ \mapsto \ \sum_{h \in \{0,1\}^n}  \ \  t_{h_1 \sigma_1}

176: s_{h_1 h_2} t_{h_2 \sigma_2} s_{h_2 h_3} \cdots

177: s_{h_{n-1} h_n} t_{h_n \sigma_n}.

178: \end{equation}

179: Problem 1 asks for the evaluation of one coordinate polynomial $f_\sigma$ of the map $f$. This can be done in time linear in $n$ using the

180: \emph{forward algorithm} \cite{Jordan:02},

181: which  recursively evaluates the formula

182: \begin{equation}

183: \label{sum-product}

184:  f_{\sigma} \quad = \quad

185: \sum_{h_n=0}^1 t_{h_n \sigma_n} \biggl(

186: \sum_{h_{n-1}=0}^1 s_{h_{n-1} h_n} t_{h_{n-1} \sigma_{n-1}}

187: \cdots

188: \bigl(

189: \sum_{h_2=0}^1 s_{h_2 h_3} t_{h_2 \sigma_2}

190: (\sum_{h_1=0}^1 s_{h_1 h_2} t_{h_1 \sigma_1} )\bigr) \cdots \biggr)

191: \end{equation}

192: Problem 2 is to identify the largest term in the expansion of $f_\sigma$.

193: Equivalently, if we write $u_{ij} = - {\rm log}(s_{ij})$ and

194: $v_{ij} = - {\rm log}(t_{ij})$ then Problem 2 is to evaluate the piecewise-linear function

195: \begin{equation}

196: \label{ref:Viterbi}

197:  g_{\sigma} \,\, = \,\,

198: {\rm min}_{h_n} v_{h_n \sigma_n} + \bigl(

199: {\rm min}_{h_{n-1}} u_{h_{n-1} h_n} + v_{h_{n-1} \sigma_{n-1}} +

200: \cdots +

201: \bigl(

202: {\rm min}_{h_2} u_{h_2 h_3} + v_{h_2 \sigma_2} +

203: ( {\rm min}_{h_1} u_{h_1 h_2} + v_{h_1 \sigma_1} )\bigr) \cdots \ \bigr).

204: \end{equation}

205: This formula can  be efficiently evaluated by recursively computing the

206: parenthesized expressions. This is known as the

207: \emph{Viterbi algorithm} in the HMM literature.

208: The Viterbi and forward algorithms are instances of

209: the more general {\em sum-product algorithm} \cite{Kschischang:01}.
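
The common structure of the two recursions can be made concrete with a short sketch. The Python code below is illustrative only and is not the implementation used for the computations in this paper. The function names \verb|sum_product| and \verb|brute_force| and the numerical parameter values are assumptions introduced here. The same routine is run once with ordinary $(+,\times)$ on probabilities (the forward algorithm) and once with $(\min,+)$ on the negative logarithms $u_{ij}, v_{ij}$ (the Viterbi algorithm); as in (\ref{polymap}), no initial state distribution is used.

```python
import itertools
import math


def sum_product(obs, S, T, add, mul):
    """Evaluate the nested HMM recursion in the semiring given by (add, mul).

    S[h][k] is the weight of the transition h -> k and T[h][o] the weight of
    emitting symbol o in state h.
    """
    states = range(len(S))
    # Innermost parenthesis: the first hidden state emits the first symbol.
    msg = [T[h][obs[0]] for h in states]
    for o in obs[1:]:
        new_msg = []
        for k in states:
            # "Sum" over the previous state h of msg[h] "times" S[h][k].
            acc = mul(msg[0], S[0][k])
            for h in states[1:]:
                acc = add(acc, mul(msg[h], S[h][k]))
            new_msg.append(mul(acc, T[k][o]))
        msg = new_msg
    total = msg[0]
    for value in msg[1:]:
        total = add(total, value)
    return total


def brute_force(obs, S, T):
    """Enumerate all hidden paths; return (sum of terms, -log of largest term)."""
    terms = []
    for h in itertools.product(range(len(S)), repeat=len(obs)):
        p = T[h[0]][obs[0]]
        for i in range(1, len(obs)):
            p *= S[h[i - 1]][h[i]] * T[h[i]][obs[i]]
        terms.append(p)
    return sum(terms), -math.log(max(terms))


# Hypothetical parameter values, chosen only for illustration.
S = [[0.9, 0.1], [0.2, 0.8]]
T = [[0.25, 0.25, 0.25, 0.25], [0.1, 0.4, 0.4, 0.1]]
obs = ["ACGT".index(c) for c in "AATAGCGG"]

# Forward algorithm: ordinary (+, *) on probabilities.
forward = sum_product(obs, S, T, add=lambda a, b: a + b, mul=lambda a, b: a * b)

# Viterbi algorithm: (min, +) on the negative log parameters u and v.
U = [[-math.log(x) for x in row] for row in S]
V = [[-math.log(x) for x in row] for row in T]
viterbi = sum_product(obs, U, V, add=min, mul=lambda a, b: a + b)
```

For a sequence this short, both values can be checked against the exhaustive enumeration in \verb|brute_force|, which visits all $2^8$ hidden paths.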

210:

211: What we are proposing in this paper is to compute

212: the collection of cones in ${\bf R}^{12}$

213: on which the piecewise-linear function $g_\sigma$ is linear.

214: This may be feasible because the number of cones grows polynomially in $n$.

215: Each cone is indexed by

216: a  binary sequence ${\bf h} \in \{0,1\}^n$ which represents the CpG islands found

217: for any system of parameters $(u_{ij}, v_{ij})$ in that cone. A binary sequence which

218: arises in this manner is an \emph{explanation for $\sigma$} in the sense of

219: \cite[\S 4]{Pachter:04}.

220: Our results in \cite{Pachter:04} imply that the number of explanations

221: scales polynomially with $n$.

222:

223: \begin{thm}

224: For any given DNA sequence $\sigma$ of length $n$, the

225: number of bit strings $\widehat {\bf h} \in \{0,1\}^n$ which are

226: explanations for the sequence $\sigma$ in the CpG island HMM

227: is bounded above by a constant times $n^{5.25}$.

228: \end{thm}

229:

230: \begin{proof}

231: There are a total of $2 \cdot 4 + 4 = 12$ parameters which is the dimension of the

232: ambient space. Note, however, that for a fixed observed sequence the number of times

233: the observation $A$ is made is fixed, and similarly for $C,G,T$. Furthermore, the total

234: number of transitions in the hidden states must equal $n-1$. Together, these constraints remove

235: five degrees of freedom. We can thus apply \cite[Theorem 7]{Pachter:04}

236: with $d=12-5 = 7$. This shows that

237: the total number of vertices of the Newton polytope of $\,f_{\bf \sigma}\,$ is

238:  $\,O(n^{\frac{7 \cdot 6}{8}}) = O(n^{5.25})$.

239: \end{proof}

240:

241: \begin{figure}[ht]

242: \begin{center}

243:  \includegraphics[scale=0.35]{CpGSchlegel_mod.ps}

244:  \end{center}

245: \caption{The Schlegel diagram of the Newton polytope of

246:  an observation in the CpG island HMM.}

247: \label{fig:Newton_polytope}

248: \end{figure}

249:

250:

251:  We explain the biological meaning of our parametric analysis

252:  with a very small example.

253:  Let us consider the

254:  following special case of the CpG island HMM.

255: First, assume that $t_{iA}=t_{iT}$ and that $t_{iC}=t_{iG}$, i.e.,

256: the output probability depends only on whether the nucleotide

257: is weak (A or T) or strong (C or G). Furthermore, assume that

258: $t_{0A}=t_{0G}$, which means that the probability of emitting

259: a weak or a strong nucleotide in the non-CpG island state is equal

260: (i.e. base composition is uniform in non-CpG islands).

261:

262:  Suppose that the observed sequence is

263: ${\bf \sigma}=AATAGCGG$. We ask for  {\em all}

264: the possible explanations for ${\bf \sigma}$,

265: that is, for all possible maximum a posteriori

266: CpG island annotations for all parameters.

267: A priori, the number of explanations is bounded by $2^8 = 256$, the total

268: number of binary strings of length eight. However, of the

269: $256$ binary strings, only $25$ are explanations.

270: Figure 1 is a geometric representation of the

271: solution to this problem: the Newton polytope of $f_\sigma$ is

272:  a $4$-dimensional polytope with $25$ vertices.

273: The figure is a \emph{Schlegel diagram} of this polytope.

274: It was drawn with the software POLYMAKE

275:  \cite{Gawrilow:00,Gawrilow:01}.

276: The $25$ vertices in Figure 1 correspond to the

277: $25$ annotations, which are the explanations for $\sigma$

278: as the parameters vary. Two annotations are connected by

279: an edge if and only if their parameter cones share a wall.

280: From this geometric representation, we can determine all

281: parameters which result in the

282: same maximum a posteriori prediction.

283:

284: \section{Polytope Propagation}

285:

286: The evaluation of $g_{\sigma}$ for fixed parameters using the formulation in (\ref{ref:Viterbi}) is known as the Viterbi algorithm in the HMM literature. We begin by re-interpreting this algorithm as a convex optimization problem.

287:

288: \begin{defn}

289: The Newton polytope of a polynomial

290: \[ f(x_1,\ldots,x_d) \quad = \quad \sum_{i=1}^{n} c_i \cdot x_1^{a_{1,i}} x_2^{a_{2,i}} \cdots x_d^{a_{d,i}} \]

291: is defined to be the convex hull of the lattice points in ${\bf R}^d$ corresponding to

292: the monomials in $f$:

293: \[ NP(f) \quad  = \quad

294:  conv\{(a_{1,1},a_{2,1},\ldots,a_{d,1}), \cdots, (a_{1,n},a_{2,n},\ldots,a_{d,n})\}. \]

295: \end{defn}
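
As a small illustration of the definition, the following Python sketch computes the Newton polytope of a polynomial in two variables. The representation of a polynomial as a dictionary from exponent vectors to coefficients, the function names, and the example polynomial $3 + 2x + 5y + xy + 4x^2y^2$ are all assumptions made here for illustration. The planar convex hull is computed with the standard monotone chain method.

```python
def convex_hull(points):
    """Vertices of the convex hull of planar points (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            # Drop points that do not make a strict left turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def newton_polytope(poly):
    """Vertices of NP(f) for f given as {exponent vector: coefficient}."""
    support = [a for a, c in poly.items() if c != 0]
    return convex_hull(support)


# f(x, y) = 3 + 2x + 5y + xy + 4x^2y^2  (hypothetical example)
f = {(0, 0): 3, (1, 0): 2, (0, 1): 5, (1, 1): 1, (2, 2): 4}
vertices = newton_polytope(f)
```

In this example the polytope is a quadrilateral with vertices $(0,0)$, $(1,0)$, $(2,2)$, $(0,1)$. The exponent vector $(1,1)$ of the monomial $xy$ lies on a diagonal of the quadrilateral and is therefore not a vertex, which shows on a small scale how the number of vertices can be smaller than the number of monomials.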

296: Recall that for a fixed observation there are natural polynomials associated with a graphical model, which we have been denoting by $f_{\sigma}$.

297: In the CpG island example from Section 2, these polynomials are the coordinates

298:  $f_\sigma$ of the polynomial map $f$ in (\ref{polymap}).

299:  Each coordinate polynomial $f_\sigma$ is the sum of $2^n$ monomials,

300:  where $n = |\sigma|$. The crucial observation is that even though the number of monomials grows exponentially with $n$, the number of vertices of the

301:  Newton polytope $NP(f_\sigma)$ is much smaller. The Newton polytope

302:  is important for us because its vertices represent the solutions to the

303:  inference problem 2.

304:

305:  \begin{prop}

306: \label{polytopepropagation}

307: The maximum a posteriori log probabilities $\,\delta_{\sigma}\,$

308:  in Problem 2 can be determined by

309: minimizing a linear functional over the Newton polytope of $\,f_\sigma$.

310: \end{prop}

311:

312:  \begin{proof}

313: This is nothing but a restatement of the fact that when passing to logarithms, monomials in the parameters become linear functions in the logarithms of the parameters. \end{proof}
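
For the HMM of Section 2 this step can be written out explicitly. The notation $a(h)$ and $w$ below is introduced here only for illustration. Each hidden sequence $h \in \{0,1\}^n$ contributes one monomial to $f_\sigma$; let $a(h) \in \NN^{12}$ be its exponent vector and let $w = (u_{ij}, v_{ij})$ be the vector of negative logarithms of the parameters. Then
\[ -\log \bigl( t_{h_1 \sigma_1} s_{h_1 h_2} t_{h_2 \sigma_2} \cdots s_{h_{n-1} h_n} t_{h_n \sigma_n} \bigr)
\quad = \quad v_{h_1 \sigma_1} + u_{h_1 h_2} + v_{h_2 \sigma_2} + \cdots + u_{h_{n-1} h_n} + v_{h_n \sigma_n}
\quad = \quad \langle w, a(h) \rangle , \]
and therefore
\[ \delta_\sigma \quad = \quad \min_{h} \, \langle w, a(h) \rangle \quad = \quad \min_{a \in NP(f_\sigma)} \langle w, a \rangle . \]
The last equality holds because a linear functional on a polytope attains its minimum at a vertex, and every vertex of $NP(f_\sigma)$ is of the form $a(h)$ for some $h$.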

314:

315: Our main result in this section is an algorithm which we state

316: in the form of a theorem.

317:

318: \begin{thm}[Polytope propagation]

319: Let $f_{\sigma}$ be the polynomial associated to a fixed observation $\sigma$ from a graphical model. The list of all vertices of the Newton polytope of $f_{\sigma}$ can be

320: computed efficiently by recursive convex hull and Minkowski sum computations on unions of polytopes.

321: \end{thm}

322:

323: \begin{proof}

324: Observe that if $f_1,f_2$ are polynomials then $NP(f_1 \cdot f_2) = NP(f_1) + NP(f_2)$

325: where the $+$ on the right hand side denotes the Minkowski sum of the two

326: polytopes. Similarly, $\,NP(f_1+f_2) = {\rm conv} \bigl( NP(f_1) \cup NP(f_2) \bigr)\,$

327: if $f_1$ and $f_2$ are polynomials with positive coefficients.

328: The recursive description of $f_{\bf \sigma}$ given in (\ref{sum-product}) can be used

329: to evaluate the Newton polytope efficiently. The necessary geometric

330: primitives are precisely Minkowski sum and convex hull of unions of convex polytopes.

331: These primitives run in polynomial

332: time since the dimension of the polytopes is fixed. This is the

333: case in our situation since we consider graphical models

334: with a fixed number of parameters. We can hence

335: run the sum-product algorithm efficiently in the

336: semiring known as the \emph{polytope algebra}.

337: The size of the output scales polynomially by \cite[Thm.~7]{Pachter:04}.

338: \end{proof}

339:

340:

341: \begin{figure}[ht]

342: \begin{center}

343:  \includegraphics[scale=0.75]{polytope_propagation.ps}

344:  \end{center}

345: \caption{Graphical representation of the polytope propagation algorithm for a hidden Markov model.

346: For a particular pair of parameters, there is one

347: optimal Viterbi path (shown as large vertices on the polytopes).}

348:     \label{fig:HMMpoly}

349: \end{figure}

350:

351: Figure 2 shows an example of the polytope propagation algorithm for a hidden Markov model

352: with all random variables binary and with the following transition and output

353: matrices:

354: $$ S \, = \, \begin{pmatrix}

355: s_{00} & 1 \\

356: 1 & s_{11} \\

357: \end{pmatrix}

358: \qquad \hbox{and} \qquad

359: T \, = \, \begin{pmatrix}

360: s_{00} & 1  \\

361:  1 & s_{11}

362: \end{pmatrix}.

363: $$

364: Here we specialized to only two parameters in order to simplify the diagram.

365: When we run polytope propagation for long enough DNA sequences

366: $\sigma$ in the

367: CpG island HMM of Section 2 with all $12$ free parameters, we get a diagram just like Figure 2,

368: but with each polygon replaced by a seven-dimensional polytope.

369:

370: It is useful to note that for HMMs, the Minkowski sum operations are simply shifts of the polytopes, and therefore the only non-trivial geometric operations required are the convex hulls of unions of polytopes.

371: The polytope in Figure 1 was computed using polytope propagation. This polytope

372: has dimension $4$ (rather than $7$) because the sequence ${\bf \sigma}=AATAGCGG$ is so short.

373: We wish to emphasize that the small size of our examples is only for clarity; there is no practical

374: or theoretical barrier to computing much larger instances.
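
The recursion can be sketched in a few lines for the two-parameter model above, where all polytopes are planar. The Python code below is a minimal illustration under that assumption and is not the implementation mentioned elsewhere in this paper; all function names are hypothetical. Polygons are stored as lists of exponent vectors of $(s_{00}, s_{11})$, multiplication by a monomial is a shift, and addition is the convex hull of a union.

```python
import itertools


def convex_hull(points):
    """Vertices of the convex hull of planar points (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))


def exponent(row, col):
    """Exponent vector (of s00, s11) of the matrix entry in position (row, col).

    In the two-parameter model S and T have the same form, so one function
    serves for transitions (h -> k) and for emissions (state h, symbol o).
    """
    if row == col:
        return (1, 0) if row == 0 else (0, 1)
    return (0, 0)


def shift(polygon, v):
    """Minkowski sum of a polygon with a single lattice point v."""
    return [(x + v[0], y + v[1]) for (x, y) in polygon]


def polytope_propagation(obs):
    """Newton polygon of f_sigma for a binary observation sequence obs."""
    # P[k] is the Newton polygon of the partial sum over paths ending in state k.
    P = [[exponent(k, obs[0])] for k in (0, 1)]
    for o in obs[1:]:
        new_P = []
        for k in (0, 1):
            union = [p for h in (0, 1) for p in shift(P[h], exponent(h, k))]
            new_P.append(shift(convex_hull(union), exponent(k, o)))
        P = new_P
    return convex_hull(P[0] + P[1])


def brute_force(obs):
    """The same polygon from all 2^n monomials, for comparison on small inputs."""
    points = []
    for h in itertools.product((0, 1), repeat=len(obs)):
        vecs = [exponent(h[i], obs[i]) for i in range(len(obs))]
        vecs += [exponent(h[i - 1], h[i]) for i in range(1, len(obs))]
        points.append((sum(v[0] for v in vecs), sum(v[1] for v in vecs)))
    return convex_hull(points)


obs = [0, 1, 1, 0, 1, 0, 0, 1]  # hypothetical binary observation
polygon = polytope_propagation(obs)
```

The propagation only ever stores hull vertices, whereas \verb|brute_force| expands all $2^n$ monomials; the two agree on the final vertex set. In higher dimensions the planar hull routine would have to be replaced by a general convex hull code.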

375:

376:

377: For general graphical models, the running time of the Minkowski sum and convex hull computations depends on the number of parameters, and the number of vertices in each computation. These are

378: clearly bounded by the total number of vertices of $NP(f_{\sigma})$, which is bounded above by \cite[Theorem 7]{Pachter:04}:

379: $$ \# \,{\rm vertices} (NP(f_\sigma)) \,\,\, \leq \,\,\,

380:   {\rm constant} \cdot E^{d(d-1)/(d+1)} \,\,\, \leq \,\,\, {\rm

381: constant} \cdot E^{d-1} . $$

382: Here $E$ is the number of edges in the graphical model (often linear in the number of vertices of the model). The dimension $d$ of the Newton polytope $NP(f_\sigma)$

383: is fixed because it is bounded above by the number of model parameters.

384: The total running time

385: of the polytope propagation algorithm can then be estimated by multiplying the running time for the geometric operations of Minkowski sum and convex hull

386: with the running time of the sum-product algorithm. In any case,

387:  the running time scales polynomially in $E$.

388:

389: We have shown in \cite[\S 4]{Pachter:04}

390: that the vertices of $NP(f_{\sigma})$ correspond to explanations

391: for the observation $\sigma$. In parametric inference we are interested

392: in identifying the parameter regions that lead to the same explanations.

393: Since parameters can be identified

394: with linear functionals, the parameters that lead to the same explanation (i.e. a vertex $v$) are exactly those linear functionals that attain their minimum over $NP(f_\sigma)$ at $v$. The

395: set of these linear functionals is the {\em normal cone of

396: $NP(f_\sigma)$ at $v$}. The collection of all normal cones

397: at the various vertices $v$ forms the {\em normal fan} of the polytope. Putting this together with Proposition \ref{polytopepropagation} we obtain:

398:

399: \begin{prop}

400: The normal fan of the Newton polytope of $f_{\sigma}$ solves the parametric

401: inference problem for an observation $\sigma$ in a graphical model.

402: It is computed using the polytope propagation algorithm.

403: \end{prop}
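
Once the vertices are known, locating a given parameter vector in the normal fan is a matter of evaluating inner products. The following Python sketch is an illustration under stated assumptions: the vertex list is a hypothetical triangle rather than a polytope from this paper, the weight vector $w$ stands for the negative logarithms of the parameters, and the function names are introduced here.

```python
def dot(w, v):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(w, v))


def explanation_vertex(vertices, w):
    """Vertex of the polytope at which the linear functional w is minimized."""
    return min(vertices, key=lambda v: dot(w, v))


def in_normal_cone(vertices, v, w, tol=1e-12):
    """Check whether w lies in the normal cone at the vertex v.

    This holds if and only if <w, v' - v> >= 0 for every vertex v'.
    """
    return all(
        dot(w, [a - b for a, b in zip(vp, v)]) >= -tol for vp in vertices
    )


# Hypothetical vertex set of a small Newton polygon.
vertices = [(4, 0), (0, 4), (1, 1)]

balanced = explanation_vertex(vertices, (1.0, 1.0))
skewed = explanation_vertex(vertices, (0.1, 1.0))
```

Here the weights $(1.0, 1.0)$ select the vertex $(1,1)$, while $(0.1, 1.0)$ select $(4,0)$. Two parameter vectors yield the same explanation exactly when they lie in the same normal cone, and the code works unchanged for vertices in any dimension.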

404:

405: An implementation of polytope propagation for arbitrary graphical models

406: is currently being developed within the

407: geometry software package POLYMAKE \cite{Gawrilow:00,Gawrilow:01} by Michael Joswig.

408:

409: \section{Parametric Sequence Alignment}

410:

411: The \emph{sequence alignment} problem asks for the best alignment between two sequences which have evolved from a common ancestor via a series of mutations, insertions and deletions. Formally,

412:  given two sequences $\,\sigma^1 =

413:  \sigma^1_1 \sigma^1_2 \cdots \sigma^1_n \,$ and

414: $\,\sigma^2 =   \sigma^2_1 \sigma^2_2 \cdots \sigma^2_m \,$

415: over the alphabet $ \{0,1,\ldots,l-1\}$,

416:  an \emph{alignment} is a string over the alphabet $\{M,I,D\}$ such that

417: $\#M+\#D= n$ and $\#M+\#I=m $.

418: Here $\#M, \#I, \#D$ denote the number of characters $M,I,D$

419: in the word respectively.  An alignment records the ``edit steps'' from the sequence

420: $\sigma^1$ to the sequence $\sigma^2$, where edit operations consist of changing characters,

421: preserving them, or inserting/deleting them. An $I$ in the alignment string

422: corresponds to an insertion in the first sequence, a $D$ is a deletion in the first

423: sequence, and an $M$ is either a character change, or lack thereof.

424: We write ${\cal A}_{n,m}$ for the set of all alignments.

425: For a given $h \in {\cal A}_{n,m}$, we denote the $j$th character in $h$ by $h_j$, we write $\,h[i] \,$ for $\,\#M+\#D \,$ in the prefix

426: $\,h_1 h_2 \ldots h_i$, and we write

427: $\,h \langle j \rangle\,$ for $\,\#M+\#I\,$ in

428: the prefix $\,h_1 h_2 \ldots h_j$.

429: The cardinality of the set ${\cal A}_{n,m}$ of all alignments can be computed

430: as the coefficient of $x^m y^n$ in the generating function

431: $1/(1-x-y-xy)$. These coefficients are known as

432:  \emph{Delannoy numbers} in combinatorics

433: \cite[\S 6.3]{Stanley:99}.
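
The count can be checked with a short computation. The Python sketch below is illustrative and its function names are assumptions. It uses the recurrence obtained by removing the first character ($D$, $I$ or $M$) of an alignment, which is the recurrence encoded by the generating function above, and it also counts alignments by their number of characters $M$: an alignment with $k$ matches has $n-k$ deletions and $m-k$ insertions, arranged in any order.

```python
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def delannoy(n, m):
    """Number of alignments of two sequences of lengths n and m."""
    if n == 0 or m == 0:
        return 1  # only insertions, or only deletions, remain
    # The first character of the alignment is D, I or M.
    return delannoy(n - 1, m) + delannoy(n, m - 1) + delannoy(n - 1, m - 1)


def count_by_matches(n, m):
    """Number of alignments with exactly k characters M, for each feasible k."""
    return {
        k: factorial(n + m - k)
        // (factorial(k) * factorial(n - k) * factorial(m - k))
        for k in range(min(n, m) + 1)
    }


total = delannoy(2, 3)
by_matches = count_by_matches(2, 3)
```

For $n=2$ and $m=3$ both computations give $25$ alignments, of which $10$ have no match, $12$ have one match and $3$ have two matches.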

434:

435: {\em Bayesian multi-nets} were introduced in \cite{Friedman:97} and are

436: extensions of graphical models via the introduction of class nodes, and a

437: set of local networks corresponding to values of the class nodes.

438: In other words, the value of a random variable can change the structure

439: of the graph underlying the graphical model. The

440: {\em pair hidden Markov model} (see Figure \ref{fig:pairHMM}) is

441:  an instance of a Bayesian multi-net. In this model,

442: the hidden states (unshaded nodes forming the chain) take on

443: one of the values $M,I,D$. Depending on the value at a hidden node,

444: either one or two characters are generated; this is encoded by plates (squares around the observed states) and class nodes (unshaded nodes in the plates).

445: The class nodes take on the values $0$ or $1$ corresponding to whether

446: or not a character is generated.

447: Pair hidden Markov models are

448: therefore probabilistic models of alignments, in which the structure of

449: the model depends on the assignments to the hidden states.

450: \begin{figure}

451:   \begin{center}

452:    \includegraphics[scale=0.7]{pairhmm.ps}

453:   \end{center}

454:   \caption{A pair hidden Markov model for sequence alignment.}

455:   \label{fig:pairHMM}

456: \end{figure}

457:

458: Our next result gives the precise description of the pair HMM for sequence alignment in

459: the language of algebraic statistics, namely, we represent this model

460: by means of a polynomial map $f$.

461: Let $\sigma^1$, $\sigma^2$ be the output strings from a pair hidden Markov model (of lengths $n,m$ respectively). Then:

462: \begin{equation}

463: \label{pairhmm}

464: f_{\sigma^1,\sigma^2}

465:  \quad = \quad \sum_{h \in {\cal A}_{n,m}}

466:  t_{h_1}(\sigma^1_{h[1]},\sigma^2_{h \langle 1 \rangle}) \cdot

467: \prod_{i = 2}^{|h|}

468: s_{h_{i-1}h_i} \cdot t_{h_i}(\sigma^1_{h[i]},\sigma^2_{h \langle i \rangle}) ,

469: \end{equation}

470: where $s_{h_{i-1}h_i}$ is the transition probability from state $h_{i-1}$ to $h_i$ and $t_{h_i}(\sigma^1_{h[i]},\sigma^2_{h \langle i \rangle})$ are the output probabilities

471: for a given state $h_i$ and the corresponding output characters on the strings $\sigma^1,\sigma^2$.

472:

473: \begin{prop} \label{pairHMMmap}

474: The pair hidden Markov model for sequence alignment is the

475: image of a polynomial map $f : {\bf R}^{9 + 2l+ l^2 }

476: \rightarrow {\bf R}^{l^{n+m}}$.

477: The coordinates of $f$ are the

478: polynomials $f_{\sigma^1,\sigma^2}$

479: of degree $2(n+m)-1$ given in (\ref{pairhmm}).

480: \end{prop}

481:

482: We need to explain why the number of parameters is $9 + 2l+ l^2 $.

483: First, there are nine parameters

484: $$ S \quad = \quad

485: \begin{pmatrix}

486: s_{MM} &  s_{MI} &  s_{MD} \\

487: s_{IM} &  s_{II} &  s_{ID} \\

488: s_{DM} &  s_{DI} &  s_{DD}

489: \end{pmatrix} , $$

490: which play the same role as in Section 2,

491: namely, they represent transition probabilities

492: in the Markov chain. There are

493: $l^2$ parameters $\,t_M(a,b) =: t_{Mab}\,$

494: for the probability that letter $a$ in

495: $\sigma^1$ is matched with letter $b$ in $\sigma^2$.

496: The insertion parameters $\,t_I(a,b) \,$

497: depend only on the letter $b$, and the

498: deletion parameters  $\,t_D(a,b) \,$

499: depend only on the letter $a$, so there

500: are only  $2l $ of these parameters. In the upcoming example,

501: which explains the algebraic representation of

502: Proposition \ref{pairHMMmap},

503: we use the abbreviations $\,t_{Ib}\,$ and $\,t_{Da}\,$

504: for these parameters.

505:

506: Consider two sequences $\, \sigma^1 = ij \,$ and $\sigma^2 = klm \,$

507: of length $n = 2$ and $m = 3$ over any alphabet.

508: The number of alignments is $\,\#( {\cal A}_{n,m} ) = 25$, and they are listed in Table 1.
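
The alignments and their monomials can be generated mechanically. The Python sketch below is illustrative only: the function names and the plain-text spelling of the parameters (for instance \verb|t_Mik| for $t_{Mik}$ and \verb|s_MI| for $s_{MI}$) are conventions assumed here. Each alignment is read from left to right while keeping track of how many characters of $\sigma^1$ and $\sigma^2$ have been consumed.

```python
def alignments(n, m):
    """Yield all strings over {M, I, D} with #M + #D = n and #M + #I = m."""
    if n == 0 and m == 0:
        yield ""
        return
    if n > 0 and m > 0:
        for rest in alignments(n - 1, m - 1):
            yield "M" + rest
    if m > 0:
        for rest in alignments(n, m - 1):
            yield "I" + rest
    if n > 0:
        for rest in alignments(n - 1, m):
            yield "D" + rest


def monomial(h, seq1, seq2):
    """Monomial of the alignment h as a space-separated string of parameters."""
    i = j = 0  # characters of seq1 and seq2 consumed so far
    factors = []
    prev = None
    for state in h:
        if prev is not None:
            factors.append(f"s_{prev}{state}")  # transition prev -> state
        if state == "M":
            factors.append(f"t_M{seq1[i]}{seq2[j]}")
            i += 1
            j += 1
        elif state == "I":
            factors.append(f"t_I{seq2[j]}")  # I emits a character of seq2
            j += 1
        else:
            factors.append(f"t_D{seq1[i]}")  # D emits a character of seq1
            i += 1
        prev = state
    return " ".join(factors)


table = {h: monomial(h, "ij", "klm") for h in alignments(2, 3)}
longest = max(len(term.split()) for term in table.values())
```

The dictionary \verb|table| has $25$ entries, and the longest monomials, which come from the alignments without any $M$, have $9 = 2(n+m)-1$ factors.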

509: \begin{table}

510: \begin{center}

511: \begin{tabular} {|l|l|l|}  \hline

512: %$$

513: %\begin{matrix}

514: IIIDD & \,\, $( \,\cdot \cdot \cdot ij \,,\, klm\cdot \cdot  \, )$ & $

515:  t_{Ik} s_{II} t_{Il} s_{II} t_{Im} s_{ID} t_{Di} s_{DD} t_{Dj} $\\

516: IIDID & \,\, $( \,\cdot \cdot i\cdot j \, ,\, kl\cdot m\cdot  \, )$ & $

517:  t_{Ik} s_{II} t_{Il} s_{ID} t_{Di} s_{DI} t_{Im} s_{ID} t_{Dj} $\\

518: IIDDI & \,\, $( \,\cdot \cdot ij \,\cdot \,,\, kl\cdot \cdot m \, )$ & $

519:  t_{Ik} s_{II} t_{Il} s_{ID} t_{Di} s_{DD} t_{Dj} s_{DI} t_{Im} $\\

520: IDIID & \,\, $( \,\cdot \, i\cdot \cdot j\,,\, k\cdot lm\cdot  \, )$ & $

521:  t_{Ik} s_{ID} t_{Di} s_{DI} t_{Il} s_{II} t_{Im} s_{ID} t_{Dj} $\\

522: IDIDI & \,\, $( \,\cdot \, i\cdot j\cdot \,,\, k\cdot l\cdot m \, )$ & $

523:  t_{Ik} s_{ID} t_{Di} s_{DI} t_{Il} s_{ID} t_{Dj} s_{DI} t_{Im} $\\

524: IDDII & \,\, $( \,\cdot \,ij \cdot \cdot \,,\, k\cdot \cdot lm \, )$ & $

525:  t_{Ik} s_{ID} t_{Di} s_{DD} t_{Dj} s_{DI} t_{Il} s_{II} t_{Im} $\\

526: DIIID & \,\, $( \,i\cdot \cdot \cdot j \,,\, \cdot \, klm\cdot  \, )$ & $

527:  t_{Di} s_{DI} t_{Ik} s_{II}

528: t_{Il} s_{II} t_{Im} s_{ID} t_{Dj} $\\

529: DIIDI & \,\, $( \,i\cdot \cdot j\cdot \,,\, \cdot \,kl\cdot m \, )$ & $

530:  t_{Di} s_{DI} t_{Ik} s_{II} t_{Il} s_{ID} t_{Dj} s_{DI} t_{Im} $\\

531: DIDII & \,\, $( \,i\cdot j\cdot \cdot \,,\, \cdot \,k\cdot lm \, )$ & $

532:  t_{Di} s_{DI} t_{Ik} s_{ID} t_{Dj} s_{DI} t_{Il} s_{II} t_{Im} $\\

533: DDIII & \,\, $( \,ij\cdot \cdot \,\cdot\, ,\, \cdot \cdot klm \, )$ & $

534:  t_{Di} s_{DD} t_{Dj} s_{DI} t_{Ik} s_{II} t_{Il} s_{II} t_{Im} $\\

535: MIID & \,\, $( \,i\cdot \cdot j \,,\, klm \,\cdot  \, )$ & $     t_{Mik} s_{MI} t_{Il} s_{II} t_{Im} s_{ID} t_{Dj} $\\

536: MIDI & \,\, $( \,i\cdot j\cdot \,,\, kl\cdot m \, )$ & $     t_{Mik} s_{MI} t_{Il} s_{ID} t_{Dj} s_{DI} t_{Im} $\\

537: MDII & \,\, $( \,ij\cdot \cdot \,,\, k\cdot lm \, )$ & $     t_{Mik} s_{MD} t_{Dj} s_{DI} t_{Il} s_{II} t_{Im} $\\

538: IMID & \,\, $( \,\cdot \,i\cdot j \,,\, klm\cdot  \, )$ & $     t_{Ik} s_{IM} t_{Mil} s_{MI} t_{Im} s_{ID} t_{Dj} $\\

539: IMDI & \,\, $( \,\cdot \,ij\,\cdot \,,\, kl\cdot m \, )$ & $     t_{Ik} s_{IM} t_{Mil} s_{MD} t_{Dj} s_{DI} t_{Im} $\\

540: IIMD & \,\, $( \,\cdot \cdot ij\,,\, klm \,\cdot  \, )$ & $     t_{Ik} s_{II} t_{Il} s_{IM} t_{Mim} s_{MD} t_{Dj} $\\

541: IIDM & \,\, $( \,\cdot \cdot ij\,,\, kl\cdot m \, )$ & $     t_{Ik} s_{II} t_{Il} s_{ID} t_{Di} s_{DM} t_{Mjm} $\\

542: IDMI & \,\, $( \,\cdot ij\cdot \,,\, k\cdot lm \, )$ & $     t_{Ik} s_{ID} t_{Di} s_{DM} t_{Mjl} s_{MI} t_{Im} $\\

543: IDIM & \,\, $( \,\cdot i\cdot j\,,\, k\cdot lm \, )$ & $     t_{Ik} s_{ID} t_{Di} s_{DI} t_{Il} s_{IM} t_{Mjm} $\\

544: DMII & \,\, $( \,ij\cdot \cdot \,,\, \cdot \,klm \, )$ & $     t_{Di} s_{DM} t_{Mjk} s_{MI} t_{Il} s_{II} t_{Im} $\\

545: DIMI & \,\, $( \,i\cdot j\cdot \,,\, \cdot \,klm \, )$ & $     t_{Di} s_{DI} t_{Ik} s_{IM} t_{Mjl} s_{MI} t_{Im} $\\

546: DIIM & \,\, $( \,i\cdot \cdot j\,,\, \cdot \,klm \, )$ & $     t_{Di} s_{DI} t_{Ik} s_{II} t_{Il} s_{IM} t_{Mjm} $\\

547: MMI & \,\, $( \,ij \,\cdot\,\, , \,\,klm \, )$ & $     t_{Mik} s_{MM} t_{Mjl} s_{MI} t_{Im} $\\

548: MIM & \,\, $( \,i \cdot j \,\,,\,\, klm \, )$ & $     t_{Mik} s_{MI} t_{Il} s_{IM} t_{Mjm} $\\

549: IMM & \,\, $( \,\cdot \,ij \,\,,\,\, klm \, )$ & $     t_{Ik} s_{IM} t_{Mil} s_{MM} t_{Mjm} $\\ \hline

550: %\end{matrix}

551: %$$

552: \end{tabular}

553: \end{center}

554: \caption{Alignments for a pair of sequences of length $2$ and $3$.}

555: \end{table}

556: The polynomial $f_{\sigma^1,\sigma^2}$ is the sum of the

557: $25$ monomials (of degree $9,7,5$) in the rightmost column.

558: For instance, if we consider strings over the binary

559: alphabet $\{0,1\}$, then there are $17$ parameters

560: (nine $s$-parameters and eight $t$-parameters), and

561: the pair HMM for alignment is the image of a map

562: $ \, f : {\bf R}^{17} \rightarrow {\bf R}^{32}$.

563: The coordinate of $f$ which is indexed by

564: $(i,j,k,l,m)  \in \{0,1\}^5$ equals the

565: $25$-term polynomial gotten by summing the

566: rightmost column in Table 1.

567:
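As a consistency check on Table 1 (a sketch using standard counting, with the symbol $k$ introduced only for this illustration), the alignments can be grouped by their number $k$ of matches M. An alignment with $k$ matches has $n-k$ deletions and $m-k$ insertions, hence $n+m-k$ columns, and it contributes a monomial of degree $2(n+m-k)-1$. Counting the arrangements of the three kinds of columns gives
$$ \#( {\cal A}_{n,m} ) \quad = \quad \sum_{k=0}^{{\rm min}(n,m)} \frac{(n+m-k)!}{k! \, (n-k)! \, (m-k)!} . $$
For $n = 2$ and $m = 3$ the three summands are $\frac{5!}{0!\,2!\,3!} = 10$, $\frac{4!}{1!\,1!\,2!} = 12$ and $\frac{3!}{2!\,0!\,1!} = 3$. This agrees with the $10$, $12$ and $3$ rows of Table 1 whose monomials have degree $9$, $7$ and $5$ respectively, and $10+12+3 = 25$.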

568: The parametric inference problem for sequence alignment is solved

569: by computing the Newton polytopes $NP(f_{\sigma^1,\sigma^2})$ with the

570: polytope propagation algorithm.

571: In the terminology introduced in \cite[\S 4]{Pachter:04},

572: an observation $\sigma$ in the pair HMM is the pair of sequences

573: $(\sigma^1,\sigma^2)$, and the possible explanations

574: are the optimal alignments of these sequences with

575: respect to the various choices of parameters.

576: In summary, the vertices of the Newton polytope

577: $NP(f_{\sigma^1,\sigma^2})$ correspond to the optimal alignments.

578: If the observed sequences $\sigma^1,\sigma^2$ are not fixed then we are in the situation of

579: \cite[Proposition 6]{Pachter:04}.

580:  Each parameter choice

581: defines a function from pairs of sequences to alignments:

582: $$\, \{0,\ldots,l-1\}^n \times \{0,\ldots,l-1\}^m

583: \rightarrow {\cal A}_{n,m} \,,\quad ( \sigma^1,\sigma^2) \mapsto \hat {\bf h}  .$$

584: The number of such functions

585: grows doubly-exponentially in $n$ and $m$, but only

586: a tiny fraction of them are \emph{inference functions},

587: which means they correspond to the vertices of the Newton polytope

588: of the map $f$.

589: It is an interesting combinatorial problem to characterize

590: the inference functions for sequence alignment.

591:

592: An important observation is that our formulation in Problem 2 is equivalent to

593: combinatorial ``scoring schemes'' or ``generalized edit distances'' which

594: can be used to assign weights to alignments \cite{Bucher:96}.

595: For example, the simplest scoring scheme consists of two parameters:

596:  a mismatch score $mis$, and an indel score $gap$ \cite{Fernandez-Baca:00, Gusfield:94, Waterman:92}.

597: The weight of an alignment is the sum of the scores for all positions in the alignment, where a match is assigned a score of $1$.

598: This is equivalent to specializing the logarithmic parameters

599: $U = - {\rm log} (S)$ and $V = - {\rm log} (T)$ of the pair hidden Markov model as follows:

600: \begin{equation}

601: \label{specialize}

602: u_{ij} = 0, \quad

603: v_{Mij}=1 \,\hbox{ if $i=j$}, \,\,\,\,

604: v_{Mij}=mis\, \hbox{ if $i \neq j$, and }\,\,\,\,

605: v_{Ij} =  v_{Di} = gap

606: \qquad \hbox{for all $i,j$}.

607: \end{equation}

608: This specialization of the parameters

609: corresponds to intersecting the normal fan of

610: the Newton polytope with a two-dimensional affine subspace

611: (whose coordinates are called $mis$ and $gap$).

612:
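To make the two-dimensional picture concrete, here is a short worked computation under the specialization (\ref{specialize}); the symbols $x$, $y$ and $z$ are introduced only for this illustration. Suppose an alignment of sequences of lengths $n$ and $m$ has $z$ matches, $x$ mismatches and $y$ indels. Every match or mismatch column uses one character from each sequence and every indel column uses one character, so $\, 2(z+x)+y = n+m$. The weight of the alignment is therefore
$$ z \,+\, mis \cdot x \,+\, gap \cdot y \quad = \quad \frac{n+m}{2} \,+\, (mis - 1) \cdot x \,+\, \bigl( gap - \frac{1}{2} \bigr) \cdot y . $$
Thus, for fixed $n$ and $m$, the weight depends on the alignment only through the point $(x,y)$, and it is an affine-linear function of that point. For instance, the alignment MMI in Table 1 has $y = 1$ and $z + x = 2$, so its weight is $\, \frac{5}{2} + (mis-1) \cdot x + ( gap - \frac{1}{2} ) \,$ where $x \in \{0,1,2\}$ counts the mismatches among the pairs $(i,k)$ and $(j,l)$.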

613: Efficient software for parametrically aligning the sequences with two free parameters

614: already exists (XPARAL \cite{Gusfield:96}).

615: Consider the example of the following two sequences:

616: $\sigma^1=AGGACCGATTACAGTTCAA$ and $\sigma^2=TTCCTAGGTTAAACCTCATGCA$. XPARAL returns four cones; however, a computation of the Newton polytope reveals seven vertices (three of which correspond to positive $mis$ or $gap$ values). The polytope propagation algorithm has

617: the same running time as XPARAL: for two sequences of

618: length $n,m$, the method requires $O(nm)$ two-dimensional convex hull computations. The number of points in each computation is bounded by the total

619: number of points in the final convex hull (or equivalently the number, $K$, of explanations). Each convex hull computation therefore

620: requires at most $O(K {\rm log}(K))$ operations, thus giving an $O(nmK {\rm log}(K))$ algorithm for solving the parametric alignment problem. However, this

621: running time can be improved by observing that the convex hull computations that need to be carried out have a very special form, namely in each

622: step of the algorithm we need to compute the convex hull of two superimposed convex polygons. This procedure is in fact a primitive of the divide

623: and conquer approach to convex hull computation, and there is a well known $O(K)$ algorithm for solving it  \cite[\S 3.3.5]{Preparata:85}. Therefore, for two parameters, our recursive approach

624: to solving the parametric problem yields an $O(Kmn)$ algorithm, matching the running time of XPARAL and the conjecture of Waterman, Eggert and Lander \cite{Waterman:92}.

625:
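The recursion behind these operation counts can be written out explicitly in the two-parameter case. The following is a sketch in notation introduced here for illustration ($P_{a,b}$ and $\delta_{a,b}$ do not appear elsewhere in the paper). We record for each alignment the point $(x,y)$ given by its numbers of mismatches and indels. Let $P_{a,b}$ denote the convex hull of these points over all alignments of the first $a$ characters of $\sigma^1$ with the first $b$ characters of $\sigma^2$, and set $\delta_{a,b} = 0$ if the $a$th character of $\sigma^1$ equals the $b$th character of $\sigma^2$ and $\delta_{a,b} = 1$ otherwise. Then $P_{0,0} = \{(0,0)\}$, $P_{a,0} = \{(0,a)\}$, $P_{0,b} = \{(0,b)\}$, and for $1 \leq a \leq n$ and $1 \leq b \leq m$ we have
$$ P_{a,b} \quad = \quad {\rm conv} \Bigl( \, \bigl( P_{a-1,b-1} + (\delta_{a,b},0) \bigr) \,\cup\, \bigl( P_{a-1,b} + (0,1) \bigr) \,\cup\, \bigl( P_{a,b-1} + (0,1) \bigr) \, \Bigr) . $$
There are $nm$ such steps, and each one merges translates of three convex polygons that have already been computed, which accounts for the factor $nm$ in the running time. Since the number of matches of an alignment is determined by $(x,y)$ and the sequence lengths, the vertices of the final polygon $P_{n,m}$ correspond to the alignments that are optimal for some choice of $mis$ and $gap$.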

626: \begin{figure}[ht]

627:   \begin{center}

628:    \includegraphics[scale=1.2]{alignment_pic2.ps}

629:   \end{center}

630:   \caption{Edge graph of the Newton polytope for a four parameter alignment problem.}

631:   \label{fig:parametric}

632: \end{figure}

633:

634:

635: In order to demonstrate the practicality of our approach for higher-dimensional problems, we implemented a four parameter recursive parametric alignment solver. The more

636: general alignment model includes different transition/transversion parameters (instead of just one mismatch parameter), and separate parameters for

637: opening gaps and extending gaps. A transition is a mutation from one purine ($A$ or $G$) to another, or from one pyrimidine ($C$ or $T$) to another, and a transversion is a mutation

638: from a purine to a pyrimidine or vice versa. More precisely, if we let $P_u=\{A,G\}$ and $P_y=\{C,T\}$ the model is:

639: \begin{eqnarray*}

640: \label{specialize2}

641: u_{MM} = u_{IM} = u_{DM} & = &  0\\

642: u_{MI} = u_{MD} & = & gapopen\\

643: u_{II} = u_{DD} & = & gapextend\\

644: v_{Mij} & = & \hbox {$1$ if $i=j$}\\

645: v_{Mij} & = & transt\, \  \hbox {if $i \neq j$, and $i,j \in P_u$ or $i,j \in P_y$}\\

646: v_{Mij} & = & transv\, \ \hbox {if $i \neq j$, and $i \in P_u, j \in P_y$ or vice versa}\\

647: v_{Ij} =  v_{Di} & = & \hbox{$0$ for all $i,j$}.

648: \end{eqnarray*}

649:

650: For the two sequences $\sigma^1$ and $\sigma^2$ in the example above, the number of vertices of the four dimensional

651: Newton polytope (shown in \fig{parametric}) is $224$ (to be compared to $7$ for the two parameter case).

652:

653:

654: \section{Practical Aspects of Parametric Inference}

655:

656: We begin by pointing out that parametric inference is useful for Bayesian computations. Consider the problem where we have a prior distribution $\pi(s)$ on our parameters

657: $s = (s_1,\ldots,s_d)$, and we would like to compute the posterior probability of a maximum a posteriori explanation $\widehat {\bf h}$:

658: \begin{equation}

659: \label{Bayesian}

660: {\rm Prob}({\bf X} = \widehat {\bf h} \,|\, {\bf Y} = {\bf \sigma}) \quad

661: = \quad \int_{s} {\rm Prob}({\bf X}= \widehat {\bf h} \,|\, {\bf Y}

662:  = {\bf \sigma},\,s_1,\ldots,s_d \,)\pi(s)  ds.

663: \end{equation}

664: This is an important problem, since it can give a quantitative assessment of the validity of $\widehat {\bf h}$ in a setting where we have prior, but not certain, information about the parameters, and also because we may want to sample $\widehat {\bf h}$ according to its posterior distribution (for an example of how this can be applied in computational biology see \cite{Liu:94}). Unfortunately, these integrals may be difficult to compute. We propose the following simple

665:  Monte Carlo algorithm for computing a numerical approximation

666:  to the integral (\ref{Bayesian}):

667:

668: \begin{prop}

669: Select $N$ parameter vectors $s^{(1)},\ldots,s^{(N)}$

670:  according to the distribution $\pi(s)$, where

671:  $N$ is much larger than the

672:  number of vertices of the Newton polytope $\,NP(f_\sigma)$.

673:  Let $K$ be the number of $s^{(i)}$

674:  such that $-{\rm log}(s^{(i)})$ lies in the normal cone of

675: $NP(f_\sigma)$ indexed by the explanation $\widehat {\bf h}$.

676:  Then $K/N$ approximates (\ref{Bayesian}).

677: \end{prop}

678:

679:  \begin{proof}

680:  The expression $\,

681:   {\rm Prob}({\bf X}= \widehat {\bf h} \,|\, {\bf Y}

682:  = {\bf \sigma},\,s_1,\ldots,s_d \,) \,$ is

683:  zero or one depending on whether the vector

684:  $-{\rm log}(s) = (-{\rm log}(s_1),\ldots,-{\rm log}(s_d))

685:  $ lies in the normal cone of

686: $NP(f_\sigma)$ indexed by $\widehat {\bf h}$. This membership test can be done without ever running the sum-product algorithm if we precompute an inequality representation of the normal cones.

687: \end{proof}

688:
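As a rough guide to the choice of $N$, we add a standard binomial estimate, stated under the assumption that the samples $s^{(i)}$ are drawn independently. Write $p$ for the value of the integral (\ref{Bayesian}). Each sample lands in the normal cone indexed by $\widehat {\bf h}$ with probability $p$, so $K$ is a binomial random variable with parameters $N$ and $p$, and
$$ {\rm E} \bigl[ K/N \bigr] \, = \, p \qquad \hbox{and} \qquad {\rm Var} \bigl[ K/N \bigr] \, = \, \frac{p(1-p)}{N} \, \leq \, \frac{1}{4N} . $$
For example, $N = 10^4$ samples give a standard deviation of at most $0.005$ for the estimate $K/N$.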

689: The bound on the number of vertices of the Newton polytope

690: in \cite[\S 4]{Pachter:04} provides a valuable tool for

691: estimating the quality of this Monte Carlo approximation.

692: We believe that the tropical geometry developed in \cite{Pachter:04}

693: will also be useful for more refined analytical approaches to

694: Bayesian integrals.  The study of Newton polytopes

695: can also complement the algebraic geometry

696: approach to model selection proposed in \cite{Rusakov:02}.

697:

698:

699: Another application of parametric inference is to problems where the number of parameters may be very large, but where we want to fix a large subset of them, thereby reducing the dimensions of the polytopes. Gene finding models, for example, may have up to thousands of parameters and input sequences can be millions of base pairs long; however, we are usually only interested in studying the dependence of inference on a select few. Although specializing parameters reduces the dimension of the parameter space, the explanations correspond to vertices of a

700: \emph{regular subdivision of the Newton polytope}, rather than just to the vertices of the polytope itself. This is explained below (readers may also

701: want to refer to \cite{Pachter:04} for more background).

702:

703: Consider a graphical model with parameters $s_1,\ldots, s_{d}$

704:  of which the parameters $s_1,\ldots , s_{r}$ are

705: free but $\, s_{r+1} = S_{r+1}, \ldots, s_d = S_d \,$

706: where the $S_i$ are fixed non-negative numbers.

707: Then the coordinate polynomials $f_\sigma$ of our model

708: specialize to polynomials in $r$ unknowns

709: whose coefficients $c_a$ are non-negative numbers:

710: $$\, \tilde f_\sigma(s_1,\ldots,s_{r}) \quad = \quad

711:  f_\sigma(s_1,\ldots,s_{r}, S_{r+1}, \ldots, S_{d})

712: \quad = \quad \sum_{a \in {\bf N}^r} c_a \cdot s_1^{a_1} \cdots s_{r}^{a_r}. $$

713: The \emph{support} of this polynomial is the finite set

714: $\, {\cal A}_\sigma \, = \, \{\, a \in {\bf N}^r \, : \,c_a > 0 \,\}$.

715: The convex hull of   $\, {\cal A}_\sigma\,$ in ${\bf R}^r$

716: is the  Newton polytope of the polynomial $\tilde f_\sigma  = \tilde f_\sigma(s_1,\ldots,s_r)$. For example, in the case of the hidden Markov model with output parameters specialized,

717: the Newton polytope of

718:  $\tilde f_{\sigma}$ is the polytope associated with a Markov chain.

719:   Kuo \cite{Kuo:04} shows that the size of these

720:   polytopes does not depend on the length of the chain.

721:

722: Let ${\bf h}$ be any explanation for $\sigma$ in the original model

723: and let $(u_1,\ldots,u_r,u_{r+1}, \ldots,u_d)$ be the vertex

724: of the Newton polytope of $f_\sigma$ corresponding

725: to that explanation. We abbreviate $\,a_{\bf h} = (u_1,\ldots,u_r)\,$

726: and $\, S_{\bf h} \, = \,S_{r+1}^{u_{r+1}} \cdots S_d^{u_d}$.

727: The assignment

728: $\, {\bf  h} \mapsto a_{\bf h}\,$ defines  a map

729: from the set of explanations of $\sigma$ to the support

730: $\, {\cal A}_\sigma$. The convex hull of

731: the image coincides with the Newton polytope of $\,\tilde f_\sigma$.

732: We define

733: \begin{equation}

734: \label{fromHtoA}

735:  w_a \, = \, {\rm min} \bigl\{ \, - {\rm log}(S_{\bf h}) \, \, :\,\,

736: {\bf h} \, \, \hbox{is an explanation for } \, \sigma \,\,\, \hbox{with}\,\,\,

737: a_{\bf h} = a \, \bigr\}.

738: \end{equation}

739: If the specialization is sufficiently generic

740: then this minimum is attained uniquely,

741: and, for simplicity, we will assume that this is the case.

742: If a point $a \in {\cal A}_\sigma$ is not the image of any explanation ${\bf h}$ then

743: we set $w_a = \infty$.

744: The assignment $a \mapsto w_a$ is a real valued function

745: on the support of our polynomial $\tilde f_\sigma$,

746: and it defines a \emph{regular polyhedral subdivision} $\, \Delta_w \,$

747: of the Newton polytope $NP(\tilde f_\sigma)$. Namely, $\Delta_w$ is the polyhedral

748: complex  consisting of all lower faces of the polytope gotten by taking the

749: convex hull of the points $(a,w_a)$ in ${\bf R}^{r+1}$.

750:   See \cite{Sturmfels:96} for details on regular triangulations

751:   and regular polyhedral subdivisions.

752:

753: \begin{thm}

754: The explanations for the observation $\sigma$ in the specialized model are

755: in bijection with the vertices of the regular polyhedral subdivision $\, \Delta_w \,$

756: of the Newton polytope of the specialized polynomial $\, \tilde f_\sigma$.

757: \end{thm}

758:

759: \begin{proof}

760: The point $(a,w_a)$ is a vertex of $\Delta_w$ if and only if

761: the following open polyhedron is non-empty:

762: $$  P_a \quad = \quad \bigl\{ \, v \in {\bf R}^r \,\, : \,\,

763:  a \cdot v + w_a \, < \, a' \cdot v + w_{a'}\,\, \hbox{for all}\,\,

764: a' \in {\cal A}_\sigma \backslash \{a\} \, \bigr\}. $$

765: If $v$ is a point in  $P_a$ then we set

766: $\,s_i = {\rm exp}(-v_i)\,$ for $i=1,\ldots,r$,

767:   and we consider the explanation ${\bf h}$

768: which attains the minimum in  (\ref{fromHtoA}).

769: Now all parameters have been specialized

770: and ${\bf h}$ is the solution to Problem 2.

771: This argument is reversible: any explanation for

772: $\sigma$ in the specialized model arises from

773: one of the non-empty polyhedra $P_a$.

774:   We note that the collection of polyhedra $P_a$ defines a polyhedral

775: subdivision of ${\bf R}^r$ which is geometrically dual

776: to the subdivision $\Delta_w$ of the Newton polytope

777: of $\tilde f_\sigma$.

778: \end{proof}

779:
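A small example, constructed here for illustration and not drawn from a particular model, shows how $\Delta_w$ can have vertices that are not vertices of the Newton polytope. Take $r = 1$ and suppose that $\, {\cal A}_\sigma = \{0,1,2\}$, so that $NP(\tilde f_\sigma)$ is the segment $[0,2]$. For $a = 1$ the polyhedron $P_a$ consists of all $v \in {\bf R}$ with $\, v + w_1 < w_0 \,$ and $\, v + w_1 < 2v + w_2$, that is,
$$ w_1 - w_2 \,\, < \,\, v \,\, < \,\, w_0 - w_1 . $$
This interval is non-empty if and only if $\, 2 w_1 < w_0 + w_2$, which says precisely that the point $(1,w_1)$ lies below the segment joining $(0,w_0)$ and $(2,w_2)$. In that case $\Delta_w$ subdivides $[0,2]$ into $[0,1]$ and $[1,2]$, and the interior point $a = 1$ carries an explanation of its own. Otherwise $\Delta_w$ is the single segment $[0,2]$, and only the explanations at $a = 0$ and $a = 2$ occur.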

780:  \vskip .1cm

781:

782:  In practical applications of parametric inference, it

783:  may be of interest to compute only one normal cone of the Newton polytope (for example the cone containing some fixed parameters). We conclude this section by observing that the polytope propagation algorithm is suitable for this computation as well:

784:

785: \begin{prop}

786: Let $v$ be a vertex of a $d$-dimensional Newton polytope of a hidden Markov model. Then the normal cone at $v$ can be computed using a polytope propagation algorithm

787: in dimension $d-1$.

788: \end{prop}

789:

790: \begin{proof} We run the standard polytope propagation algorithm

791: described in Section 4,

792: but at each step we record only the minimizing vertex in the direction of the log parameters, together with its neighboring vertices in the edge graph of the Newton polytope. It follows, by induction, that given this information at the $n$th step, we can use it to find the minimizing vertices and related neighbors in the $(n+1)$st step.

793: \end{proof}

794:
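The neighbors recorded in this computation already determine the cone. Writing $v^{(1)},\ldots,v^{(k)}$ for the neighboring vertices of $v$ in the edge graph (a labeling introduced here for illustration), the normal cone at $v$ with respect to minimization is
$$ \bigl\{ \, c \in {\bf R}^d \,\, : \,\, c \cdot v \, \leq \, c \cdot v^{(j)} \,\,\, \hbox{for} \,\,\, j = 1,\ldots,k \, \bigr\} . $$
This rests on the standard fact that a vertex of a polytope minimizes a linear functional over the whole polytope as soon as it does so among its neighbors. It yields an inequality representation of the cone directly from the recorded data.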

795: \section{Summary}

796:

797: We envision a number of biological applications for the polytope propagation algorithm, including:

798:

799: \begin{itemize}

800: \item Full parametric inference using the normal fan of the Newton polytope of an observation when the graphical model under

801: consideration has only few model parameters.

802: \item Utilization of the edge graph of the polytope  to identify stable parts of

803: alignments and annotations.

804: \item Construction

805: of the normal cone containing a specific parameter vector

806: when computation of the full Newton polytope is infeasible.

807: \item Computation of the posterior probability

808:  (in the sense of Bayesian statistics) of an alignment

809: or annotation. The regions for the relevant integrations

810: are the normal cones of the Newton polytope.

811: \end{itemize}

812:

813:

814: As we have seen, the computation of Newton polytopes for (interesting) graphical models is certainly feasible for a few free parameters, and we expect that further analysis of the computational geometry should yield efficient algorithms in higher dimensions. For example, the key operation, computation of convex hulls of unions of convex polytopes, is likely to be considerably easier than general convex hull computations even in high

815: dimensions. Fukuda, Liebling and L\"{u}tolf \cite{Fukuda:01} give a polynomial time algorithm for computing extended convex hulls (convex hulls of unions of convex polytopes) under

816: the assumption that the polytopes are in general position. Furthermore, it should be possible to optimize the geometric algorithms for specific models of interest, and combinatorial analysis of the Newton polytopes arising in graphical models should yield better complexity estimates (see, e.g., \cite{ Fernandez-Baca:00, Gusfield:94}).

817:  Michael Joswig is currently working on a general polytope propagation implementation in POLYMAKE \cite{Gawrilow:00,Gawrilow:01}.

818:

819: In the case where computation of the Newton polytope is impractical, it is still possible to identify the cone containing a specific parameter, and this can be used to quantitatively measure the robustness of the inference. Parameters near a boundary are unlikely to lead to biologically meaningful results. Furthermore, the edge graph can be used to identify common regions in the explanations corresponding to adjacent vertices. In the case of alignment, biologists might see a collection of alignments rather than just one optimal one, with common sub-alignments highlighted. This is quite different from returning the $k$ best alignments, since suboptimal alignments may not be vertices of the Newton polytope. The solution we propose explicitly identifies all suboptimal alignments that can result from similar parameter choices.

820:

821: \section{Acknowledgments}

822: Lior Pachter was supported in part by a grant from the NIH (R01-HG02362-02).

823: Bernd Sturmfels was supported by

824: a Hewlett Packard Visiting Research Professorship 2003/2004

825: at MSRI Berkeley and in part  by the NSF (DMS-0200729).

826:

827: \nocite{*}

828: \begin{thebibliography}{26}

829:

830: \bibitem{Alexandersson:03} M. Alexandersson, S. Cawley and L. Pachter: SLAM - Cross-species Gene Finding and Alignment with a Generalized Pair Hidden Markov Model, Genome Research 13 (2003) 496--502.

831: \bibitem{Baldi:98} P. Baldi and S. Brunak: Bioinformatics: The Machine Learning Approach, A Bradford Book, The MIT Press, Cambridge, Massachusetts, 1998.

832: \bibitem{Bucher:96} P. Bucher and K. Hofmann: A sequence similarity

833: search algorithm based on a probabilistic interpretation of an alignment

834: scoring system, Proceedings of the Conference on Intelligent Systems for Molecular Biology, 1996,  44--51.

835: \bibitem{Durbin:98} R. Durbin, S. Eddy, A. Krogh and G. Mitchison: Biological Sequence Analysis (Probabilistic Models of Proteins and Nucleic Acids),

836: Cambridge University Press, 1998.

837: \bibitem{Fernandez-Baca:00} D. Fern\'andez-Baca, T. Sepp\"al\"ainen and G. Slutzki:

838: Parametric multiple sequence alignment and phylogeny construction,

839: in Combinatorial Pattern Matching, Lecture

840: Notes in Computer Science (R. Giancarlo and D. Sankoff eds.), Vol. 1848, 2000, 68--82.

841: \bibitem{Friedman:97} N. Friedman, D. Geiger and M. Goldszmidt: Bayesian network classifiers, Machine Learning 29 (1997) 131--161.

842: \bibitem{Fukuda:01} K. Fukuda, T.H. Liebling and C. L\"{u}tolf: Extended convex hull, Computational Geometry 20 (2001) 13--23.

843: \bibitem{Gardiner-Garden:87} M. Gardiner-Garden and M. Frommer: CpG islands in vertebrate genomes, Journal of Molecular Biology 196 (1987) 261--282.

844: \bibitem{Gawrilow:00} E. Gawrilow and M. Joswig: polymake: a Framework for Analyzing Convex Polytopes, Polytopes -- Combinatorics and Computation (G. Kalai and G.M. Ziegler eds.), Birkh\"{a}user (2000).

845: \bibitem{Gawrilow:01} E. Gawrilow and M. Joswig: polymake: an Approach to Modular Software Design in Computational Geometry, Proceedings of the 17th Annual Symposium on Computational Geometry, ACM, 2001, 222--231.

846: \bibitem{Gusfield:94} D. Gusfield, K. Balasubramanian, and D. Naor: Parametric optimization of sequence alignment, Algorithmica 12 (1994) 312--326.

847: \bibitem{Gusfield:96} D. Gusfield and P. Stelling: Parametric and inverse-parametric sequence alignment with XPARAL, Methods in Enzymology 266 (1996) 481--494.

848: \bibitem{Jordan:02} M.I. Jordan and Y. Weiss: Graphical Models:

849: Probabilistic Inference, in {\it Handbook of Brain Theory and

850: Neural Networks, 2nd edition}, M. Arbib (Ed.), Cambridge, MA, MIT

851: Press, 2002.

852: \bibitem{Kschischang:01} F. Kschischang, B. Frey, and H. A. Loeliger: Factor graphs and the sum-product algorithm, IEEE Trans. Inform. Theory 47 (2001) 498--519.

853: \bibitem{Kuo:04} E. Kuo: Viterbi sequences and polytopes, {\tt http://front.math.ucdavis.edu/math.CO/0401342}.

854: \bibitem{Lander:01} E.S. Lander et al.: Initial sequencing and analysis of the human genome, Nature 409 (2001) 860--921.

855: \bibitem{Liu:94} J. Liu: The collapsed Gibbs sampler with applications to a gene regulation problem, J. Amer. Statist. Assoc.~89 (1994) 958--966.

856: \bibitem{Pachter:04} L. Pachter and B. Sturmfels: Tropical geometry of statistical models,  companion paper, submitted.

857: \bibitem{Preparata:85} F. P. Preparata and M. I. Shamos: Computational Geometry: An Introduction, Springer Verlag, 1985.

858: \bibitem{Rusakov:02}  D.~Rusakov and D.~Geiger: Asymptotic model

859: selection for naive Bayesian networks, Uncertainty in

860: Artificial Intelligence, 2002, 438--445.

861: \bibitem{Stanley:99} R. Stanley: Enumerative Combinatorics, Volume 2,

862: Cambridge University Press, 1999.

863: \bibitem{Sturmfels:96} B. Sturmfels: Gr\"obner Bases and Convex Polytopes, University Lecture Series, Vol. 8, American Mathematical Society, 1996.

864: \bibitem{Takai:02} D. Takai and P. A. Jones: Comprehensive analysis of CpG islands in human chromosomes 21 and 22, Proc. Natl. Acad. Sci. USA 99 (2002) 3740--3745.

865: \bibitem{Waterman:92} M. Waterman, M. Eggert and E. Lander: Parametric

866: sequence   comparisons, Proc. Natl. Acad. Sci. USA 89 (1992) 6090--6093.

867: \end{thebibliography}

868: \end{document}

869:

870: