\documentclass{article}
%\documentclass{amsart}

\usepackage{epsfig}
\usepackage{latexsym}
\usepackage{amsmath}
\usepackage{amssymb}

\setlength{\unitlength}{1mm}
\parskip=1em

%\usepackage{showkeys}
%\usepackage[notref,notcite]{showkeys}
\newtheorem{theorem}{Theorem}[section]
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{example}[theorem]{Example}
\newtheorem{problem}[theorem]{Problem}

\newcommand{\F}{\mathcal F}
\newcommand{\T}{\mathcal T}
\newcommand{\RR}{{\mathbb R}}
\newcommand{\PP}{\mathbb P}
\newcommand{\EE}{{\mathbb E}}
\newcommand{\II}{{\mathbb I}}
\newcommand{\ba}{{\backslash}}

\newcommand\ring[1]{\mathaccent23{#1}}
\def\Er{\ring{E}} % {\mbox{\r{\itshape E}}}
\def\Vr{\ring{V}} % {\mbox{\r{\itshape V}}
\newcommand{\old}[1]{{}}
\newcommand{\mnote}[1]{\marginpar{\raggedright\footnotesize\em#1}}

\renewcommand{\baselinestretch}{1.6}

\title{The Bayesian `star paradox' persists for long finite sequences}
\author{Mike Steel and Frederick A. Matsen\\
Allan Wilson Centre for Molecular Ecology and Evolution \\
\\ \\
Corresponding author:\\
Mike Steel \\
Biomathematics Research Centre\\
Department of Mathematics and Statistics\\
University of Canterbury\\
Private Bag 4800\\
Christchurch, New Zealand\\
Phone: +64-3-364-2987 ext. 7688\\
Fax: +64-3-364-2587\\
Email: M.Steel@math.canterbury.ac.nz
}

\begin{document}

\maketitle

{\noindent Keywords: phylogenetic trees, Bayesian statistics, star trees}

\vspace{-4pt}
{\noindent Running head: The star paradox persists}

\newpage

\begin{abstract}
The `star paradox' in phylogenetics is the tendency for a particular resolved tree to be sometimes strongly supported even when the data is generated by an unresolved (`star') tree. There have been contrary claims as to whether this phenomenon persists when very long sequences are considered. This note settles one aspect of this debate by proving mathematically that there is always a chance that a resolved tree could be strongly supported, even as the length of the sequences becomes very large.
\end{abstract}

\section{Introduction}

Two recent papers (Yang and Rannala 2005; Lewis, Holder and
Holsinger 2005) highlighted a phenomenon that occurs when sequences
evolve on a tree that contains a polytomy, in particular a
three-taxon unresolved rooted tree. As longer sequences are analysed
using a Bayesian approach, the posterior probabilities of the trees that
give the different resolutions of the polytomy do not converge to
roughly equal values; rather, a given
resolution can sometimes have a posterior probability close to one. In
response, Kolaczkowski and Thornton (2006) investigated this phenomenon
further, providing some interesting simulation results, and offering
an argument that seems to suggest that the tendency, reported in the
earlier papers, to sometimes infer strongly supported resolutions
would disappear with sufficiently long sequences.
As part of their case the authors use the expected site-pattern
frequencies to simulate infinitely long sequences, concluding that
``with infinite length data, posterior probabilities give equal
support for all resolved trees, and the rate of false inferences falls
to zero.''  Of course, these findings concern sequences that are
effectively infinite and, as is well known in statistics, the limit
of a function of random variables (in this case the site-pattern
frequencies for the first $n$ sites) need not equal
the function of the limit of the random variables. Accordingly,
Kolaczkowski and Thornton
offer the following appropriate cautionary qualification of their findings:

``Analysis of ideal data sets does not indicate what will happen when
very large data sets with some stochastic error are analyzed, but it does
show that when infinite data are generated on a star tree, posterior
probabilities are predictable, equally supporting each possible resolved
tree.''

Yang and Rannala (2005) attempted to simulate the large-sample
posterior distribution, but ran into numerical problems and commented
that it was ``unclear'' what the limiting distribution of
posterior probabilities was as $n$ became large.

In particular, the aforementioned papers left open an
interesting statistical question, which this short note formally answers:
does the Bayesian posterior probability of each of the three
resolutions of a star tree on three taxa converge to 1/3 as the sequence
length tends to infinity? That is, does the distribution of posterior
probabilities for `very long sequences' converge to the distribution for
infinitely long sequences? We show that, for most reasonable priors, it
does not. Thus the `star paradox' does not disappear as the
sequences get longer.

As noted by Yang and Rannala (2005) and Lewis, Holder and Holsinger (2005), one can demonstrate such phenomena more
easily for simpler related processes such as coin tossing (particularly if one imposes a particular prior). Here we
do not make this simplification, so as to avoid the criticism that such results do
not rigorously establish the corresponding phenomenon in the phylogenetic setting,
which, in contrast to coin tossing, involves a parameter
space of dimension greater than 1. We also frame our main result so that it applies to a fairly general class of priors.
Note also that it is not the purpose of this short note to add to
the ongoing debate concerning the implications of this `paradox' for Bayesian phylogenetic
analysis; we merely demonstrate its existence. Further comments on the phenomenon, and
earlier references, can be found in the recent review
by Alfaro and Holder (2006, pp.~35--36).
\begin{figure}[h]
\begin{center}
\resizebox{12cm}{!}{
\input{star.pstex_t}
}
\caption{The three resolved rooted phylogenetic trees $T_1, T_2, T_3$ on three taxa, and the unresolved `star' tree $T_0$ on which the sequences are generated.}
\label{starfig}
\end{center}
\label{overview}
\end{figure}

\section{Analysis of the star tree paradox for three taxa}
On tree $T_1$ (in Fig.~1) let $p_i = p_i(t_0, t_1)$, $i = 0,1,2,3$, denote the probabilities of the four
site patterns ($xxx, xxy, yxx, xyx$, respectively) under the simple two-state
symmetric Markov process (the argument extends to more general models,
but it suffices to demonstrate the phenomenon for this simple model).
From Eqn.~(2) of Yang and Rannala (2005) we have
$$p_0(t_0, t_1) = \frac{1}{4}(1+e^{-4t_1} + 2e^{-4(t_0+t_1)}),$$
$$p_1(t_0, t_1) = \frac{1}{4}(1+e^{-4t_1} -2e^{-4(t_0+t_1)}),$$
and
$$p_2(t_0, t_1) = p_3(t_0, t_1) = \frac{1}{4}(1-e^{-4t_1}).$$
It follows by elementary algebra that for $i=2,3$,
\begin{equation}
\label{ineq1}
\frac{p_1(t_0, t_1)}{p_i(t_0, t_1)} \geq 1+ 2e^{-4t_1}(1-e^{-4t_0}),
\end{equation}
and thus $p_1(t_0, t_1) \geq p_i(t_0, t_1)$, with strict inequality unless
$t_0=0$ (or in the limit as $t_1$ tends to infinity).
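For the reader's convenience, the elementary algebra behind (\ref{ineq1}) can be sketched as follows. Write $a = e^{-4t_1}$ and $b = e^{-4t_0}$ (shorthand used only in this calculation), so that $0<a<1$ when $t_1>0$, and $0<b\leq 1$. Then, for $i=2,3$,
\begin{align*}
\frac{p_1(t_0,t_1)}{p_i(t_0,t_1)} = \frac{1+a-2ab}{1-a} = 1 + \frac{2a(1-b)}{1-a} \geq 1 + 2a(1-b),
\end{align*}
where the last step uses $0<1-a\leq 1$ and $a(1-b)\geq 0$. Since $a(1-b)>0$ whenever $t_0>0$, the inequality $p_1 > p_i$ is then strict.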

To allow maximal generality we make only minimal, neutral assumptions about the
prior distribution on trees and branch lengths. Namely, we assume that the three resolved trees on three leaves (trees $T_1, T_2, T_3$ in Fig.~1) have equal prior probability $\frac{1}{3}$, and that the prior distribution on the branch lengths
$t_0, t_1$ is the same for each tree and has a continuous joint
probability density function that is everywhere non-zero. This condition
holds, for example, for the exponential and gamma priors discussed by Yang and Rannala
(2005). We call any prior that satisfies these conditions {\em reasonable}. Note
that we do not require $t_0$ and $t_1$ to be independent.

Let ${\bf n} = (n_0, n_1, n_2, n_3)$ be the counts of the different types of
site patterns (corresponding to the same patterns as for the $p_i$'s). Thus $n= \sum_{i=0}^3 n_i$ is the total number of sites (i.e.\ the length of the sequences).
Given a prior distribution on $(t_0, t_1)$ for the branch lengths of $T_i$ (for $i=1,2,3$),
let $\PP(T_i|{\bf n})$ be the posterior probability of tree $T_i$ given the site pattern counts ${\bf n}$.
Now suppose the $n$ sites are generated on a star tree $T_0$ with positive branch lengths. We are interested in whether the posterior probability $\PP(T_i|{\bf n})$ could be close to 1, or whether the chance of generating data with this property goes to zero as the sequence length gets very large. Our main result rules out the latter possibility:

\begin{theorem}
\label{main}
Consider sequences of length $n$ generated by a star tree $T_0$ on three
taxa with strictly positive edge length $t_1^0$, and let ${\bf n}$ be
the resulting data (in terms of site pattern counts). Consider any prior
on the three resolved trees $(T_1, T_2, T_3)$ and their branch lengths that is reasonable (as defined above). For any $\epsilon >0$, and each resolved tree $T_i$
($i=1,2,3$),
the probability that ${\bf n}$ has the property that $$\PP(T_i|{\bf n}) >
1-\epsilon$$ does not converge to $0$ as $n$ tends to infinity.
\end{theorem}
{\em Proof of Theorem~\ref{main}.}
Consider the star tree $T_0$ with given branch length
$t_1^0$ (as in Fig.~1). Let $(q_0, q_1, q_2, q_3)$ denote the probabilities of the four types of site patterns generated by $T_0$ with this branch length. Note that $q_1=q_2=q_3$, and so $q_0 = 1-3q_1$.
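Explicitly, assuming (as Fig.~1 suggests) that all three pendant edges of $T_0$ have length $t_1^0$, the tree $T_0$ is $T_1$ with its interior edge contracted, so the $q_i$ can be obtained by setting $t_0 = 0$ and $t_1 = t_1^0$ in the expressions for the $p_i$ above:
\begin{align*}
q_0 &= p_0(0, t_1^0) = \tfrac{1}{4}\left(1+3e^{-4t_1^0}\right),\\
q_1 = q_2 = q_3 &= p_1(0, t_1^0) = \tfrac{1}{4}\left(1-e^{-4t_1^0}\right).
\end{align*}
In particular, $q_0 \in (\frac{1}{4}, 1)$ because $0 < t_1^0 < \infty$.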

Suppose we generate $n$ sites on this tree, and let $n_0, n_1, n_2, n_3$ be the counts of the different types of
site patterns (corresponding to the $p_i$'s). Let $\Delta_0 := \frac{n_0-q_0n}{\sqrt{n}}$ and
for $i=1,2,3$ let $$\Delta_i:= \frac{n_i - \frac{1}{3}(n-n_0)}{\sqrt{n}}.$$
For a constant $c>1$, let $F_c$ denote the event $$F_c: \ \Delta_2,
\Delta_3 \in [-2c, -c] \mbox{ and } \Delta_0 \in [-c,c].$$ Notice that
$F_c$ implies $\Delta_1 \in [2c, 4c]$, since
$\Delta_1+\Delta_2+\Delta_3=0$. By standard stochastic arguments (based
on the asymptotic approximation of the multinomial distribution by the
multivariate normal distribution), the event $F_c$ has probability at least some value
$\delta'=\delta'(c)>0$ for all $n$ sufficiently large (relative to
$c$).
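The two facts used here can be checked directly. First, since $n_1+n_2+n_3 = n-n_0$,
\begin{align*}
\Delta_1+\Delta_2+\Delta_3 = \frac{(n_1+n_2+n_3) - (n-n_0)}{\sqrt{n}} = 0,
\end{align*}
so $\Delta_2, \Delta_3 \in [-2c,-c]$ forces $\Delta_1 = -(\Delta_2+\Delta_3) \in [2c,4c]$. Second, since $q_1 = \frac{1}{3}(1-q_0)$, for $i=1,2,3$,
\begin{align*}
\Delta_i = \frac{n_i - q_1 n}{\sqrt{n}} + \frac{1}{3}\cdot\frac{n_0 - q_0 n}{\sqrt{n}},
\end{align*}
so $(\Delta_0, \Delta_2, \Delta_3)$ is an invertible linear function of the centred and scaled counts $n^{-1/2}(n_0 - q_0 n,\, n_2 - q_1 n,\, n_3 - q_1 n)$. One way to obtain $\delta'(c)$ is then via the multivariate central limit theorem: these scaled counts converge in distribution to a multivariate normal vector with non-singular covariance matrix (since every $q_i>0$), so the probability of $F_c$, a box with non-empty interior whose boundary has probability zero under the limit, converges to a strictly positive value.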

Given the data ${\bf n}=(n_0, n_1, n_2, n_3)$, the symmetry of the model under relabelling of the taxa (together with the assumption that the priors on $T_1, T_2$ and $T_3$ and their branch lengths are identical) implies that
\begin{equation}
\label{identities1}
\PP(n_0, n_1, n_2, n_3|T_2, t_0, t_1) = \PP(n_0, n_2, n_3, n_1|T_1,
t_0, t_1),
\end{equation}
and
\begin{equation}
\label{identities2}
\PP(n_0, n_1, n_2, n_3|T_3, t_0, t_1) = \PP(n_0, n_3, n_1, n_2|T_1,
t_0, t_1).
\end{equation}
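To see this concretely: up to the multinomial coefficient $n!/(n_0!\,n_1!\,n_2!\,n_3!)$, which is unchanged by permuting the counts, the likelihood under $T_1$ is $p_0^{n_0}p_1^{n_1}p_2^{n_2}p_3^{n_3}$. On $T_2$ the pattern $yxx$ takes over the role played by $xxy$ on $T_1$ (this is the relabelling encoded in (\ref{identities1})), so, using $p_2 = p_3$,
\begin{align*}
\PP(n_0, n_1, n_2, n_3|T_2, t_0, t_1) \propto p_0^{n_0}\,p_1^{n_2}\,p_2^{n_1+n_3} = p_0^{n_0}\,p_1^{n_2}\,p_2^{n_3}\,p_3^{n_1},
\end{align*}
which is the $T_1$ likelihood evaluated at the permuted counts $(n_0, n_2, n_3, n_1)$; (\ref{identities2}) can be checked in the same way.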

Now, as $(t_0, t_1)$ are random variables with some prior density, we can
view $p_0, p_1, p_2, p_3$ as random variables by virtue of their
dependence on $(t_0, t_1)$; we then write them as $P_0, P_1, P_2,
P_3$ (note that Yang and Rannala (2005) use $P_i$ differently). With this notation, the posterior probability of $T_1$ conditional on ${\bf n}$ can be
written as
$$\PP(T_1|{\bf n})= p({\bf n})^{-1}\times\EE_1[P_0^{n_0}P_1^{n_1}P_2^{n_2}P_3^{n_3}],$$
where $p({\bf n})$ is a normalising constant that does not depend on the tree (namely, the marginal probability of ${\bf n}$ divided by the multinomial coefficient and by the prior probability $\frac{1}{3}$ of each tree),
and $\EE_1$ denotes expectation with respect to the prior for
$t_0, t_1$ on $T_1$.
Moreover, since $P_2=P_3$, we can write this as
$\PP(T_1|{\bf n})= p({\bf n})^{-1}
\times\EE_1[P_0^{n_0}P_1^{n_1}P_2^{n_2+n_3}].$ By (\ref{identities1}) and (\ref{identities2}) we have
$$\PP(T_2|{\bf n}) = p({\bf n})^{-1} \times \EE_1[P_0^{n_0}P_1^{n_2}P_2^{n_1+n_3}] \mbox{ and } \PP(T_3|{\bf n}) = p({\bf n})^{-1} \times \EE_1[P_0^{n_0}P_1^{n_3}P_2^{n_1+n_2}],$$
where again the expectation is taken with respect to the prior for $t_0,
t_1$ on $T_1$. Consequently,
\begin{equation}
\label{eqfrac}
\frac{\PP(T_1|{\bf n})}{\PP(T_2|{\bf n})} = \frac{\EE_1[X]}{\EE_1[Y]},
\end{equation}
where $$X= P_0^{n_0}P_1^{n_1}P_2^{n_2+n_3} \mbox{ and }
Y=P_0^{n_0}P_1^{n_2}P_2^{n_1+n_3}.$$ As will be shown later, to obtain the conclusion of the theorem it
suffices to show that, with non-vanishing probability, the ratio in (\ref{eqfrac})
is large. To
do so we use the following lemma, whose proof is given in the Appendix.

\begin{lemma}
\label{lem3}
Let $X,Y$ be non-negative continuous random variables, dependent on a third random variable $\Lambda$ that takes values in an interval $I=[a,b]$. Suppose that, for some interval $I_0$ strictly inside $I$ and $I_1=I \setminus I_0$, the following inequality holds:
\begin{equation}
\label{eqin}
\EE[Y|\Lambda \in I_0] \geq \EE[Y|\Lambda \in I_1],
\end{equation}
and that for some constant $B>0$ and all $\lambda \in I_0$,
\begin{equation}
\label{eqin2}
\frac{\EE[X|\Lambda=\lambda]}{\EE[Y|\Lambda=\lambda]} \geq B.
\end{equation} Then
$\frac{\EE[X]}{\EE[Y]} \geq B\cdot \PP(\Lambda \in I_0)$.
\end{lemma}
\noindent
To apply this lemma,
select a value $s>0$ so that $\frac{1}{4} + s < q_0 < 1-s$, and let
$I_0 = [q_0-s, q_0+s]$. Then let
$I = [\frac{1}{4}, 1]$ and $I_1 = I \setminus I_0$.

{\em Claim:} For $n$ sufficiently large, and conditional on the data
${\bf n} = (n_0, n_1, n_2, n_3)$ satisfying $F_c$:
\begin{itemize}
\item[(i)] $\EE_1[Y|P_0 \in I_0] \geq \EE_1[Y|P_0 \in I_1]$;
\item[(ii)] for all $p_0 \in I_0$, $\frac{\EE_1[X|P_0=p_0]}{\EE_1[Y|P_0=p_0]}
  \geq 6c^2$.
\end{itemize}
The proofs of these two claims are given in the Appendix.

Applying Lemma~\ref{lem3} (with $\Lambda = P_0$ and $B = 6c^2$) to Claims (i) and (ii), we deduce that,
conditional on ${\bf n}$ satisfying $F_c$ and
$n$ being sufficiently large,
\begin{equation}
\label{cbound}
\frac{\EE_1[X]}{\EE_1[Y]} \geq 6c^2 \cdot \PP(P_0 \in I_0).
\end{equation}

\noindent
Select $c > 1$ with $c > \frac{1}{\sqrt{3\epsilon \PP(P_0 \in I_0)}}$ (this is
possible because $\PP(P_0 \in I_0) > 0$, by the assumption that the prior density on $(t_0, t_1)$ is everywhere
non-zero). As stated before, the probability that ${\bf n}$ satisfies $F_c$ is at least $\delta'=\delta'(c)>0$ for $n$ sufficiently large.
Squaring the condition on $c$ gives $6c^2 \cdot \PP(P_0 \in I_0) > \frac{6}{3\epsilon} = \frac{2}{\epsilon}$,
and so by (\ref{cbound}), whenever ${\bf n}$ satisfies $F_c$, $\frac{\PP(T_1|{\bf n})}{\PP(T_2|{\bf n})} = \frac{\EE_1[X]}{\EE_1[Y]} > \frac{2}{\epsilon}.$
Similarly, $\frac{\PP(T_1|{\bf n})}{\PP(T_3|{\bf n})} > \frac{2}{\epsilon}.$
Since $\PP(T_1|{\bf n})+\PP(T_2|{\bf n})+\PP(T_3|{\bf n})=1$, it
follows that, for $n$ sufficiently large, with probability at least $\delta'>0$ we have $\PP(T_1|{\bf n})> 1-\epsilon$, as claimed. By the symmetry of the model and of the prior under relabelling of the taxa, the same argument applies to $T_2$ and $T_3$. This completes the proof.
\hfill$\Box$
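The final step of the proof can be spelled out as follows. On the event $F_c$ (and for $n$ large), the two ratio bounds give $\PP(T_j|{\bf n}) < \frac{\epsilon}{2}\,\PP(T_1|{\bf n}) \leq \frac{\epsilon}{2}$ for $j=2,3$, and hence
\begin{align*}
\PP(T_1|{\bf n}) = 1 - \PP(T_2|{\bf n}) - \PP(T_3|{\bf n}) > 1 - \frac{\epsilon}{2} - \frac{\epsilon}{2} = 1-\epsilon.
\end{align*}
Since this holds on an event of probability at least $\delta'(c) > 0$ for all sufficiently large $n$, the probability that $\PP(T_1|{\bf n}) > 1-\epsilon$ is bounded below by $\delta'(c)$ for all such $n$, and so cannot converge to $0$.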

\newpage
\subsection{Concluding remarks}

One feature of the argument we have provided is that it does not require
stipulating in advance a particular prior on the branch lengths; that
is, the result is fairly generic, as it imposes relatively few
conditions. Moreover,
the requirement that the prior density on $(t_0, t_1)$ be everywhere
non-zero could be weakened to simply being non-zero in a neighborhood of $(0,
t_1^0)$ (thereby allowing, for example, a uniform distribution on a
bounded range).

An interesting open question in the spirit of this paper is to
calculate explicitly the limit of the posterior density $f(P_1, P_2,
P_3)$ described by Yang and Rannala (2005).


\subsection{Acknowledgments} MS thanks Ziheng Yang for suggesting the
problem of computing the limiting distribution of posterior
probabilities for 3-taxon trees. This work is funded by the
\emph{Allan Wilson Centre for Molecular Ecology and Evolution}.

\section*{References}
\noindent
Alfaro ME, Holder MT. 2006. The posterior and prior in Bayesian
phylogenetics. Annu. Rev. Ecol. Evol. Syst. 37: 19--42.

\noindent
Kolaczkowski B, Thornton JW. 2006.
Is there a star tree paradox? Mol. Biol. Evol. 23: 1819--1823.

\noindent
Lewis PO, Holder MT, Holsinger KE. 2005. Polytomies and Bayesian
phylogenetic inference. Syst. Biol. 54 (2): 241--253.

\noindent
Yang Z, Rannala B. 2005. Branch-length prior influences Bayesian
posterior probability of phylogeny.
Syst. Biol. 54 (3): 455--470.

\newpage

\section{Appendix: Proof of Lemma~\ref{lem3} and Claims (i), (ii)}

{\em Proof of Lemma~\ref{lem3}}:
\noindent For $W =X,Y$ we have
\begin{equation}
\label{equatione1}
\EE[W] = \EE[W|\Lambda \in I_0]\PP(\Lambda \in I_0) + \EE[W|\Lambda \in I_1]\PP(\Lambda \in I_1).
\end{equation}
In particular, for $W = X$ we have $\EE[X] \geq \EE[X|\Lambda \in
I_0]\PP(\Lambda \in I_0)$, since $X \geq 0$. Note that
(\ref{eqin2}) implies that $\EE[X|\Lambda \in I_0] \geq B \cdot
\EE[Y|\Lambda \in I_0]$, so
\begin{equation}
\label{equatione2}
\EE[X] \geq B \cdot \EE[Y|\Lambda \in I_0]\PP(\Lambda \in I_0).
\end{equation}
Taking $W=Y$ in (\ref{equatione1}) and applying (\ref{eqin}) gives $$\EE[Y] \leq \EE[Y|\Lambda \in I_0](\PP(\Lambda \in I_0)+\PP(\Lambda \in I_1))=\EE[Y|\Lambda \in I_0],$$
which, combined with (\ref{equatione2}), gives the result.
\hfill$\Box$

\noindent{\em Proof of Claim (i)}, $\EE_1[Y|P_0 \in I_0] \geq
\EE_1[Y|P_0 \in I_1]$:

\noindent
We first bound $\EE_1[Y|P_0 \in I_1]$ from above.
Let $\mu(n) = (q_0^{q_0}q_1^{q_1}q_2^{q_2}q_3^{q_3})^n$.
Now, conditional on ${\bf n}$ satisfying $F_c$, we have
$$n^{-1} \log\left(\mu(n) /Y(t_0, t_1)\right) = d_{KL}(q,p) + o(1),$$
where $p = (p_0, p_1, p_2, p_3)$ and $q = (q_0, q_1, q_2, q_3)$, and
$d_{KL}$ denotes the Kullback--Leibler divergence.
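To indicate where this comes from (a heuristic sketch for fixed $(t_0,t_1)$), write $Y = P_0^{n_0}P_1^{n_2}P_2^{n_1+n_3}$ and recall that $p_2 = p_3$ and $q_1 = q_2 = q_3$. Under $F_c$ each count satisfies $n_i = q_i n + O(\sqrt{n})$, so that
\begin{align*}
\log Y &= n_0\log p_0 + n_2\log p_1 + (n_1+n_3)\log p_2\\
&= n\left(q_0\log p_0 + q_1\log p_1 + q_2\log p_2 + q_3\log p_3\right) + O(\sqrt{n}),\\
\log \mu(n) &= n\left(q_0\log q_0 + q_1\log q_1 + q_2\log q_2 + q_3\log q_3\right),
\end{align*}
and subtracting gives $n^{-1}\log(\mu(n)/Y) = \sum_{i=0}^3 q_i\log(q_i/p_i) + O(n^{-1/2})$, where the sum is exactly $d_{KL}(q,p)$ and the implied constant depends on $(t_0,t_1)$ through the $\log p_i$.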

Now, $d_{KL}(q,p) \geq \frac{1}{2} \|q-p\|_1^2 \geq \frac{1}{2} |q_0 -
p_0|^2$ (the first inequality is Pinsker's inequality).
In particular, if $p_0 \in I_1$, then $|q_0 - p_0| > s>0$.
Moreover, $$\EE_1[Y|P_0 \in I_1] \leq \sup\{Y(t_0, t_1): p_0(t_0, t_1) \in I_1\}.$$
Summarizing,
\begin{equation}
\label{eqo1}
\EE_1[Y|P_0 \in I_1] \leq \sup\{Y(t_0, t_1): p_0(t_0, t_1) \in I_1\} < \mu(n)e^{-\frac{1}{2}s^2n + o(n)}.
\end{equation}
In the reverse direction, for $n$ large enough that the box $[0, n^{-1}] \times [t_1^0, t_1^0+n^{-1}]$ lies within $\{(t_0,t_1): p_0(t_0,t_1) \in I_0\}$, we have
$$\EE_1[Y|P_0 \in I_0] \geq A(n)B(n),$$
where
$$A(n) = \min \left\{Y(t_0, t_1): (t_0, t_1) \in [0, n^{-1}] \times
  [t_1^0, t_1^0+n^{-1}] \right\}$$
and
$$B(n)= \PP\left((t_0, t_1) \in [0, n^{-1}] \times [t_1^0, t_1^0 + n^{-1}]\right).$$
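It may help to record how the exponents in $Y = P_0^{n_0}P_1^{n_2}P_2^{n_1+n_3}$ split into a term of order $n$ and a fluctuation term of order $\sqrt{n}$. From the definitions of the $\Delta_i$ and $q_1 = \frac{1}{3}(1-q_0)$,
\begin{align*}
n_0 &= q_0 n + \Delta_0\sqrt{n},\\
n_2 &= \tfrac{1}{3}(n-n_0) + \Delta_2\sqrt{n} = q_1 n + \left(\Delta_2 - \tfrac{1}{3}\Delta_0\right)\sqrt{n},\\
n_1 + n_3 &= \tfrac{2}{3}(n-n_0) + (\Delta_1+\Delta_3)\sqrt{n} = 2q_1 n + \left(\Delta_1+\Delta_3 - \tfrac{2}{3}\Delta_0\right)\sqrt{n}.
\end{align*}
Dividing $Y$ by $\mu(n) = (q_0^{q_0}q_1^{3q_1})^n$ and collecting the terms of order $n$ and of order $\sqrt{n}$ separately yields the two factors in the expression for $A(n)/\mu(n)$.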

Now,
$$A(n)/\mu(n) = \min\left\{
\left(\frac{p_0^{q_0}p_1^{q_1}p_2^{2q_1}}{q_0^{q_0}q_1^{3q_1}}\right)^n \cdot
\left(p_0^{\Delta_0}p_1^{\Delta_2-\frac{1}{3}\Delta_0}p_2^{\Delta_1+\Delta_3-\frac{2}{3}\Delta_0}\right)^{\sqrt{n}}
: (t_0, t_1) \in [0, n^{-1}] \times [t_1^0, t_1^0+n^{-1}]\right\},$$
where $p_i = p_i(t_0, t_1)$.
Over this box, the first factor converges to a constant as $n$ grows
(because $p_0 -q_0$, $p_1-q_1$ and $p_2-q_1$
are each of order $n^{-1}$), while the condition $F_c$ ensures that the second
factor decays no faster than $e^{-C_1 \sqrt{n}}$ for a constant $C_1$. Thus
$A(n) \geq C_2\mu(n)e^{-C_1\sqrt{n}}$ for a positive constant $C_2$.
The term $B(n)$ is asymptotically proportional to $n^{-2}$. Summarizing, for a constant $C_3>0$ (not depending on $n$),
$$\EE_1[Y|P_0 \in I_0] \geq C_3 \mu(n)n^{-2}e^{-C_1\sqrt{n}},$$ which,
combined with (\ref{eqo1}), establishes Claim (i) for $n$ sufficiently
large.
\hfill$\Box$

In order to prove Claim (ii) we need some preliminary results.
\begin{lemma}
Let $\eta<1$. Then for each $x >0$ there exists a value $K = K(x) < \infty$ that depends continuously on $x$ so that the following holds. For any continuous random variable $Z$ on $[0,1]$ with a smooth density function $f$ that satisfies $f(1) \neq 0$ and
$|f'(z)|<B$ for all $z \in (\eta, 1]$, we have
$$ k \cdot \frac{\EE[Z^k] -
  \EE[Z^{k+1}]}{\EE[Z^k]} \geq \frac{1}{2}$$
  for all $k \geq K(\frac{B}{f(1)})$.
\label{lem1}
\end{lemma}
{\em Proof.}
Let $t_k = 1- \frac{1}{\sqrt{k}}$. Then $$\EE[Z^k] = \int_{0}^{t_k} t^k f(t)\,dt + \int_{t_k}^1 t^k f(t)\,dt.$$
Now, $$0 \leq \int_{0}^{t_k} t^k f(t)\,dt \leq t_k^k \sim
e^{-\sqrt{k}-1/2},$$
where $\sim$ denotes asymptotic equivalence (i.e.\ $g(k) \sim h(k)$ if and only if
$\lim_{k \rightarrow \infty} g(k)/h(k) =1$).
Using integration by parts,
$$\int_{t_k}^1 t^k f(t)\,dt = \frac{1}{k+1}\left.t^{k+1}f(t)\right|_{t_k}^1 -
\frac{1}{k+1}\int_{t_k}^1t^{k+1}f'(t)\,dt.$$
Now, provided $k>(1-\eta)^{-2}$, we have $t_k>\eta$, and so the absolute value of the second term on the right is at most
$\frac{B}{k+1}\int_{t_k}^1t^{k+1}\,dt = \frac{B}{(k+1)(k+2)}(1-t_{k}^{k+2}).$
Consequently, $\left|\EE[Z^k] - \frac{f(1)}{k+1}\right|$ is bounded above by $B$
times a term of order $k^{-2}$.
A similar argument, again using integration by parts, shows that
$\left|k(\EE[Z^k]- \EE[Z^{k+1}]) - \frac{f(1)}{k+1}\right|$ is bounded above by $B$
times a term of order $k^{-2}$, and the lemma now follows by some routine analysis.
\hfill$\Box$
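As an illustrative special case (not needed for the argument), suppose $Z$ is uniform on $[0,1]$, so that $f \equiv 1$ and $f' \equiv 0$. Then $\EE[Z^k] = \frac{1}{k+1}$ exactly, and
\begin{align*}
k\cdot\frac{\EE[Z^k]-\EE[Z^{k+1}]}{\EE[Z^k]}
= k(k+1)\left(\frac{1}{k+1}-\frac{1}{k+2}\right)
= \frac{k}{k+2},
\end{align*}
which is at least $\frac{1}{2}$ precisely when $k \geq 2$, and tends to $1$ as $k \to \infty$, in line with the leading-order approximation $\EE[Z^k] \approx \frac{f(1)}{k+1}$ used in the proof.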

\begin{lemma}
\label{lem2}
Let $y = (1+2x)(1-x)^2$. Then for $x \in [0,1)$ and $m \geq 3$ we have
$$\left(\frac{1+2x}{1-x}\right)^m \geq m^2(1-y).$$
\end{lemma}
{\em Proof.}
$$\left(\frac{1+2x}{1-x}\right)^m = \left(1+ \frac{3x}{1-x}\right)^m
\geq \frac{m(m-1)}{2}\left(\frac{3x}{1-x}\right)^2 \geq \frac{9m(m-1)x^2}{2},$$
where the first inequality holds for all real $m \geq 2$ (by Taylor's theorem, $(1+u)^m \geq 1 + mu + \frac{m(m-1)}{2}u^2$ for $u \geq 0$). On the other hand,
$m^2(1-y) = m^2(3x^2-2x^3) \leq 3m^2x^2$, and for $m\geq 3$ this upper bound does not exceed the lower bound in the previous
expression.
\hfill$\Box$
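For completeness, the expansion of $1-y$ used above is
\begin{align*}
1 - y = 1 - (1+2x)(1-2x+x^2) = 1 - (1 - 3x^2 + 2x^3) = 3x^2 - 2x^3,
\end{align*}
and, for $m>0$ and $x \neq 0$, the comparison of the two bounds reduces to
\begin{align*}
\frac{9m(m-1)}{2}x^2 \geq 3m^2x^2 \iff 9(m-1) \geq 6m \iff m \geq 3,
\end{align*}
while for $x=0$ both sides equal $0$.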

\noindent {\em Proof of Claim (ii)}, for all $p_0 \in I_0$, $\frac{\EE_1[X|P_0=p_0]}{\EE_1[Y|P_0=p_0]} \geq 6c^2$:

Write $\EE_1[W|p_0]$ as shorthand for $\EE_1[W|P_0=p_0]$.
Note that, for any $r, r' \geq 0$,
$\EE_1[P_0^{n_0}P_1^{r}P_2^{r'}|p_0] =
p_0^{n_0}\EE_1[P_1^{r}P_2^{r'}|p_0]$. Consequently, if we let
$k = k(n) = \frac{1}{3}(n-n_0)$ then, by the definition of the $\Delta_i$'s,
\begin{equation}
\label{efrac}
\frac{\EE_1[X|p_0]}{\EE_1[Y|p_0]} = \frac{\EE_1[(P_1P_2^2)^{k}\cdot (P_1^{\Delta_1}P_2^{\Delta_2+\Delta_3})^{\sqrt{n}}|p_0]}{\EE_1[(P_1P_2^2)^{k} \cdot (P_1^{\Delta_2}P_2^{\Delta_1+\Delta_3})^{\sqrt{n}}|p_0]}.
\end{equation}
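Here the factor $p_0^{n_0}$ cancels, and the remaining exponents follow from writing $n_i = k + \Delta_i\sqrt{n}$ for $i=1,2,3$, which is just the definition of the $\Delta_i$:
\begin{align*}
P_1^{n_1}P_2^{n_2+n_3} &= (P_1P_2^2)^{k}\left(P_1^{\Delta_1}P_2^{\Delta_2+\Delta_3}\right)^{\sqrt{n}},\\
P_1^{n_2}P_2^{n_1+n_3} &= (P_1P_2^2)^{k}\left(P_1^{\Delta_2}P_2^{\Delta_1+\Delta_3}\right)^{\sqrt{n}}.
\end{align*}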

Now, conditional on ${\bf n}$ satisfying $F_c$ (so that $\Delta_1 = -(\Delta_2+\Delta_3) \geq 2c$ and $\Delta_2 = -(\Delta_1+\Delta_3) \leq -c < 0$), and since $P_1 \geq P_2$, the following two inequalities hold:
$$P_1^{\Delta_1}P_2^{\Delta_2+\Delta_3}
= \left(\frac{P_1}{P_2}\right)^{\Delta_1}
\geq \left(\frac{P_1}{P_2}\right)^{2c}
\mbox{ and }
P_1^{\Delta_2}P_2^{\Delta_1+\Delta_3}
= \left(\frac{P_1}{P_2}\right)^{\Delta_2}
\leq 1.$$
Applying these to
(\ref{efrac}) gives
\begin{equation}
\label{efrac2}
\frac{\EE_1[X|p_0]}{\EE_1[Y|p_0]} \geq
\frac{\EE_1\left[\left.(P_1P_2^2)^{k}\cdot
\left(\frac{P_1}{P_2}\right)^{2c\sqrt{n}}\,\right|\,p_0\right]}{\EE_1[(P_1P_2^2)^{k}|p_0]}.
\end{equation}

Let $U = \frac{P_1-P_2}{1-P_0}$,
which takes values in $[0,1]$ because $0 \leq P_1 - P_2 \leq P_1 + 2P_2 = 1-P_0$.
Since $P_1 + 2P_2 = 1-P_0$, we can write
$P_1 = \frac{1}{3}(1+2U)(1-P_0)$ and $P_2 = \frac{1}{3}(1-U)(1-P_0)$.
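These expressions can be checked directly against the definition of $U$:
\begin{align*}
\tfrac{1}{3}(1+2U)(1-P_0) - \tfrac{1}{3}(1-U)(1-P_0) &= U(1-P_0) = P_1 - P_2,\\
\tfrac{1}{3}(1+2U)(1-P_0) + \tfrac{2}{3}(1-U)(1-P_0) &= 1-P_0 = P_1 + 2P_2.
\end{align*}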

Thus
$P_1P_2^2= \frac{1}{27} (1+2U)(1-U)^2(1-P_0)^3$ and
$\frac{P_1}{P_2} = \frac{1+2U}{1-U}$. Substituting these into (\ref{efrac2}) (the factor $(1-p_0)^{3k}/27^k$ cancels), letting $Z = (1+2U)(1-U)^2$, and noting that $\sqrt{n} \geq \sqrt{3k}$ and $\frac{1+2U}{1-U} \geq 1$, gives
$$\frac{\EE_1[X|p_0]}{\EE_1[Y|p_0]} \geq
\frac{\EE_1\left[\left.Z^{k}\cdot
\left(\frac{1+2U}{1-U}\right)^{2c\sqrt{3k}}\,\right|\,p_0\right]}{\EE_1[Z^{k}|p_0]}.$$
Hence, by Lemma~\ref{lem2} (taking $x=U$, $y=Z$ and $m= 2c\sqrt{3k}$, so that $m^2 = 12c^2k$, and using $Z^k\left(\frac{1+2U}{1-U}\right)^m \geq m^2 Z^k(1-Z)$), we
obtain, for $m \geq 3$,
\begin{equation}
\label{boundx}
\frac{\EE_1[X|p_0]}{\EE_1[Y|p_0]} \geq 12c^2k\cdot
\frac{\EE_1[Z^{k}|p_0]-\EE_1[Z^{k+1}|p_0]}{\EE_1[Z^{k}|p_0]}.
\end{equation}
Now the mapping $(t_0, t_1) \mapsto (P_0, Z)$ is a smooth invertible
mapping between $(0, \infty)^2$ and its image within $(\frac{1}{4}, 1) \times (0,1)$.
Notice that $Z$ approaches $1$ whenever $P_0$ approaches $\frac{1}{4}$ or
$1$ (in particular, even if $t_0, t_1$ are independent, $P_0$ and
$Z$ generally will not be). However, over the interval $I_0$
the conditional density
$f(Z|P_0=p_0)$ of $Z$ given a value $p_0$ for $P_0$ is smooth and bounded
away from $0$, and its first derivative is also bounded over
this interval. Consequently, we may apply Lemma~\ref{lem1} (noting that
the condition that ${\bf n}$ satisfies $F_c$ ensures
that $k(n) = q_1 n - \frac{1}{3}\Delta_0\sqrt{n} \geq q_1 n - \frac{c}{3}\sqrt{n}$, which tends to infinity with $n$; in particular, $m \geq 3$ for $n$ large) to show that for $n$ sufficiently
large the following inequality holds for all $p_0 \in I_0$:
$$k\cdot \frac{\EE_1[Z^{k}|p_0]-\EE_1[Z^{k+1}|p_0]}{\EE_1[Z^{k}|p_0]} \geq
\frac{1}{2}.$$
Applying this to (\ref{boundx}) gives
$\frac{\EE_1[X|p_0]}{\EE_1[Y|p_0]} \geq 6c^2$, as claimed.
This completes the proof of Claim (ii).

\end{document}
