1:
2: \documentclass{article}
3: %\documentclass{amsart}
4:
5: \usepackage{epsfig}
6: \usepackage{latexsym}
7: \usepackage{amsmath}
8: \usepackage{amssymb}
9:
10: \setlength{\unitlength}{1mm}
11: \parskip=1em
12:
13: %\usepackage{showkeys}
14: %\usepackage[notref,notcite]{showkeys}
15: \newtheorem{theorem}{Theorem}[section]
16: \newtheorem{proposition}[theorem]{Proposition}
17: \newtheorem{conjecture}[theorem]{Conjecture}
18: \newtheorem{lemma}[theorem]{Lemma}
19: \newtheorem{corollary}[theorem]{Corollary}
20: \newtheorem{example}[theorem]{Example}
21: \newtheorem{problem}[theorem]{Problem}
22:
23: \newcommand{\F}{\mathcal F}
24: \newcommand{\T}{\mathcal T}
25: \newcommand{\RR}{{\mathbb R}}
26: \newcommand{\PP}{\mathbb P}
27: \newcommand{\EE}{{\mathbb E}}
28: \newcommand{\II}{{\mathbb I}}
29: \newcommand{\ba}{{\backslash}}
30:
31: \newcommand\ring[1]{\mathaccent23{#1}}
32: \def\Er{\ring{E}} % {\mbox{\r{\itshape E}}}
33: \def\Vr{\ring{V}} % {\mbox{\r{\itshape V}}
34: \newcommand{\old}[1]{{}}
35: \newcommand{\mnote}[1]{\marginpar{\raggedright\footnotesize\em#1}}
36:
37: \renewcommand{\baselinestretch}{1.6}
38:
39: \title{The Bayesian `star paradox' persists for long finite sequences}
40: \author{Mike Steel and Frederick A. Matsen\\
41: Allan Wilson Centre for Molecular Ecology and Evolution \\
42: \\ \\
43: Corresponding author:\\
44: Mike Steel \\
45: Biomathematics Research Centre\\
46: Department of Mathematics and Statistics\\
47: University of Canterbury\\
48: Private Bag 4800\\
49: Christchurch, New Zealand\\
50: Phone: +64-3-364-2987 ext. 7688\\
51: Fax: +64-3-364-2587\\
52: Email: M.Steel@math.canterbury.ac.nz
53: }
54:
55:
56:
57:
58:
59: \begin{document}
60:
61: \maketitle
62:
63: {\noindent Keywords: phylogenetic trees, Bayesian statistics, star trees}
64:
65: \vspace{-4pt}
66: {\noindent Running head: The star paradox persists}
67:
68: \newpage
69:
70: \begin{abstract}
71: The `star paradox' in phylogenetics is the tendency for a particular resolved tree to be sometimes strongly supported even when the data is generated by an unresolved (`star') tree. There have been contrary claims as to whether this phenomenon persists when very long sequences are considered. This note settles one aspect of this debate by proving mathematically that there is always a chance that a resolved tree could be strongly supported, even as the length of the sequences becomes very large.
72: \end{abstract}
73:
74:
75:
76: \section{Introduction}
77:
Two recent papers (Yang and Rannala 2005; Lewis, Holder and
Holsinger 2005) highlighted a phenomenon that occurs when sequences
evolve on a tree that contains a polytomy, in particular a
three-taxon unresolved rooted tree. As longer sequences are analysed
using a Bayesian approach, the posterior probabilities of the trees that
give the different resolutions of the polytomy do not converge on
roughly equal values; rather, a given
resolution can sometimes have a posterior probability close to one. In
response, Kolaczkowski and Thornton (2006) investigated this phenomenon
further, providing some interesting simulation results, and offering
an argument that seems to suggest that the
tendency to sometimes infer strongly supported resolutions, reported in
the earlier papers, would disappear with sufficiently long sequences.
As part of their case the authors use the expected site pattern
frequencies to simulate the case of infinite-length sequences, concluding that
``with infinite length data, posterior probabilities give equal
support for all resolved trees, and the rate of false inferences falls
to zero.'' Of course these findings concern sequences that are
effectively infinite and, as is well known in statistics, the limit
of a function of random variables (in this case the site pattern
frequencies for the first $n$ sites) need not equal
the function of the limit of the random variables. Accordingly,
Kolaczkowski and Thornton
offer this appropriate cautionary qualification of their findings:
102:
103: ``Analysis of ideal data sets does not indicate what will happen when
104: very large data sets with some stochastic error are analyzed, but it does
105: show that when infinite data are generated on a star tree, posterior
106: probabilities are predictable, equally supporting each possible resolved
tree.''
108:
109: Yang and Rannala (2005) had attempted to simulate the large sample
110: posterior distribution, but ran into numerical problems and commented
111: that it was ``unclear'' what the limiting distribution on
112: posterior probabilities was as $n$ became large.
113:
In particular, all of the aforementioned papers have left open an
interesting statistical question, which this short note formally answers:
does the Bayesian posterior probability of each of the three
resolutions of a star tree on three taxa converge to $1/3$ as the sequence
length tends to infinity? That is, does the distribution of posterior
probabilities for `very long sequences' converge to the distribution for
infinite-length sequences? We show that, for a broad class of reasonable priors, it
does not. Thus the `star paradox' does not disappear as the
sequences get longer.
123:
As noted by Yang and Rannala (2005) and Lewis, Holder and Holsinger (2005), one can demonstrate such phenomena more
easily for related simpler processes such as coin tossing (particularly if one imposes a particular prior). Here we
avoid this simplification, so as to pre-empt the criticism that such results do
not rigorously establish the corresponding phenomena in the phylogenetic setting,
which, in contrast to coin tossing, involves a parameter
space of dimension greater than 1. We also frame our main result so that it applies to a fairly general class of priors.
Note also that it is not the purpose of this short note to add to
the ongoing debate concerning the implications of this `paradox' for Bayesian phylogenetic
analysis; we merely demonstrate its existence. Further comments and
earlier references on the phenomenon can be found in the recent review
paper by Alfaro and Holder (2006, pp.~35--36).
\begin{figure}[h]
\begin{center}
\resizebox{12cm}{!}{
\input{star.pstex_t}
}
\caption{The three resolved rooted phylogenetic trees $T_1, T_2, T_3$ on three taxa, and the unresolved `star' tree $T_0$ on which the sequences are generated.}
\label{starfig}\label{overview}
\end{center}
\end{figure}
144:
145: \section{Analysis of the star tree paradox for three taxa}
On tree $T_1$ (in Fig. 1) let $p_i = p_i(t_0, t_1)$, $i = 0,1,2,3$, denote the probabilities of the four
site patterns ($xxx, xxy, yxx, xyx$, respectively) under the simple $2$-state
symmetric Markov process (the argument extends to more general models,
but it suffices to demonstrate the phenomenon for this simple model).
From Eqn.~(2) of Yang and Rannala (2005) we have
151: $$p_0(t_0, t_1) = \frac{1}{4}(1+e^{-4t_1} + 2e^{-4(t_0+t_1)}),$$
152: $$p_1(t_0, t_1) = \frac{1}{4}(1+e^{-4t_1} -2e^{-4(t_0+t_1)}),$$
153: and
154: $$p_2(t_0, t_1) = p_3(t_0, t_1) = \frac{1}{4}(1-e^{-4t_1}).$$
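
As a quick consistency check (our own, not taken from Yang and Rannala 2005), these probabilities sum to one, and setting $t_0=0$ collapses $T_1$ to a star tree, on which the three patterns $xxy$, $yxx$, $xyx$ become equiprobable:
\begin{align*}
p_0+p_1+p_2+p_3 &= \tfrac{1}{4}\left[\left(1+e^{-4t_1}+2e^{-4(t_0+t_1)}\right)+\left(1+e^{-4t_1}-2e^{-4(t_0+t_1)}\right)+2\left(1-e^{-4t_1}\right)\right] = 1,\\
p_0(0,t_1) &= \tfrac{1}{4}\left(1+3e^{-4t_1}\right), \qquad p_1(0,t_1) = p_2(0,t_1) = p_3(0,t_1) = \tfrac{1}{4}\left(1-e^{-4t_1}\right).
\end{align*}
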
155: It follows by elementary algebra that for $i=2,3$,
156: \begin{equation}
157: \label{ineq1}
158: \frac{p_1(t_0, t_1)}{p_i(t_0, t_1)} \geq 1+ 2e^{-4t_1}(1-e^{-4t_0}),
159: \end{equation}
160: and thus $p_1(t_0, t_1) \geq p_i(t_0, t_1)$ with strict inequality unless
161: $t_0=0$ (or in the limit as $t_1$ tends to infinity).
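
The `elementary algebra' behind (\ref{ineq1}) can be spelled out as follows. Write $a = e^{-4t_1}$ and $b = e^{-4t_0}$, so that $0<a<1$ for $t_1>0$ and $0<b\leq 1$ for $t_0 \geq 0$. Then, for $i=2,3$,
\begin{align*}
\frac{p_1(t_0,t_1)}{p_i(t_0,t_1)} = \frac{1+a-2ab}{1-a} = 1 + \frac{2a(1-b)}{1-a} \geq 1 + 2a(1-b),
\end{align*}
where the last step uses $0<1-a<1$ and $a(1-b)\geq 0$. The middle expression equals $1$ exactly when $b=1$, that is, when $t_0=0$.
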
162:
To keep the result as general as possible, we make only minimal, neutral assumptions about the
prior distribution on trees and branch lengths. Namely, we assume that the three resolved trees on three leaves (trees $T_1, T_2, T_3$ in Fig. 1) have equal prior probability $\frac{1}{3}$, and that the prior distribution on the branch lengths
$t_0, t_1$ is the same for each tree and has a continuous joint
probability density function that is non-zero everywhere on $(0,\infty)^2$. This condition
holds, for example, for the exponential and gamma priors discussed by Yang and Rannala
(2005). We call any prior that satisfies these conditions {\em reasonable}. Note
that we do not require $t_0$ and $t_1$ to be independent.
170:
Let ${\bf n} = (n_0, n_1, n_2, n_3)$ be the counts of the four types of
site pattern (in the same order as for the $p_i$'s). Thus $n= \sum_{i=0}^3 n_i$ is the total number of sites (i.e.\ the length of the sequences).
Given a prior distribution on the branch lengths $(t_0, t_1)$ of $T_i$ (for $i=1,2,3$),
let $\PP(T_i|{\bf n})$ be the posterior probability of tree $T_i$ given the site pattern counts ${\bf n}$.
Now suppose the $n$ sites are generated on a star tree $T_0$ with positive branch lengths. We are interested in whether the posterior probability $\PP(T_i|{\bf n})$ can be close to 1 with non-negligible probability, or whether the chance of generating data with this property goes to zero as the sequence length grows. Our main result rules out the latter possibility:
176:
177: \begin{theorem}
178: \label{main}
179: Consider sequences of length $n$ generated by a star tree $T_0$ on three
180: taxa with strictly positive edge length $t_1^0$ and let ${\bf n}$ be
181: the resulting data (in terms of site pattern counts). Consider any prior
182: on the three resolved trees $(T_1, T_2, T_3)$ and their branch lengths that is reasonable (as defined above). For any $\epsilon >0$, and each resolved tree $T_i$
183: ($i=1,2,3$),
184: the probability that ${\bf n}$ has the property that $$\PP(T_i|{\bf n}) >
185: 1-\epsilon$$ does not converge to $0$ as $n$ tends to infinity.
186: \end{theorem}
187: {\em Proof of Theorem~\ref{main}}
Consider the star tree $T_0$ with given branch length
$t_1^0$ (as in Fig. 1). By symmetry it suffices to prove the theorem for $T_1$. Let $(q_0, q_1, q_2, q_3)$ denote the probabilities of the four types of site pattern generated by $T_0$ with this branch length. Note that $q_1=q_2=q_3$, and so $q_0 = 1-3q_1$.
190: Suppose we generate $n$ sites on this tree, and let $n_0, n_1, n_2, n_3$ be the counts of the different types of
191: site patterns (corresponding to the $p_i$'s). Let $\Delta_0: = \frac{n_0-q_0n}{\sqrt{n}}$ and
192: for $i=1,2,3$ let $$\Delta_i:= \frac{n_i - \frac{1}{3}(n-n_0)}{\sqrt{n}}.$$
193: For a constant $c>1$, let $F_c$ denote the event: $$F_c: \Delta_2,
194: \Delta_3 \in [-2c, -c] \mbox { and } \Delta_0 \in [-c,c].$$ Notice that
195: $F_c$ implies $\Delta_1 \in [2c, 4c]$ since
$\Delta_1+\Delta_2+\Delta_3=0$. By standard stochastic arguments (based
on the asymptotic approximation of the multinomial distribution by the
multivariate normal distribution), event $F_c$ has probability at least some value
$\delta'=\delta'(c)>0$ for all $n$ sufficiently large (relative to
$c$).
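
One way to make these `standard stochastic arguments' precise is via the multivariate central limit theorem. Let $W_j = (n_j - q_jn)/\sqrt{n}$ for $j=0,1,2,3$. Since $1-q_0 = 3q_1$, we have $\frac{1}{3}(n-n_0) = q_1n - \frac{1}{3}W_0\sqrt{n}$, and hence
\begin{align*}
\Delta_0 = W_0, \qquad \Delta_i = W_i + \tfrac{1}{3}W_0 \quad (i=1,2,3).
\end{align*}
The vector $(W_0,W_2,W_3)$ converges in distribution to a normal distribution with covariance matrix $\Sigma = \mathrm{diag}(q_0,q_2,q_3) - vv^{\top}$, where $v=(q_0,q_2,q_3)^{\top}$; by the matrix determinant lemma, $\det\Sigma = q_0q_2q_3(1-q_0-q_2-q_3) = q_0q_1q_2q_3>0$. Thus $(\Delta_0,\Delta_2,\Delta_3)$, an invertible linear image of $(W_0,W_2,W_3)$, has a non-degenerate normal limit $G$. The box defining $F_c$ has non-empty interior and a boundary of $G$-measure zero, so $\PP(F_c) \rightarrow \PP\left(G \in [-c,c]\times[-2c,-c]^2\right)>0$, and one may take, for example, $\delta'(c)$ to be half of this limit.
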
201:
For any counts $(n_0, n_1, n_2, n_3)$ and branch lengths $(t_0, t_1)$, the symmetry of the model under relabelling of the taxa implies that
204: \begin{equation}
205: \label{identities1}
206: \PP(n_0, n_1, n_2, n_3|T_2, t_0, t_1) = \PP(n_0, n_2, n_3, n_1|T_1,
207: t_0, t_1),
208: \end{equation}
209: and
210: \begin{equation}
211: \label{identities2}
212: \PP(n_0, n_1, n_2, n_3|T_3, t_0, t_1) = \PP(n_0, n_3, n_1, n_2|T_1,
213: t_0, t_1).
214: \end{equation}
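
To see (\ref{identities1}) concretely: on $T_1$ the likelihood of aligned sequences with pattern counts ${\bf n}$ is $p_0^{n_0}p_1^{n_1}p_2^{n_2}p_3^{n_3} = p_0^{n_0}p_1^{n_1}p_2^{n_2+n_3}$, since $p_2=p_3$. Reading off (\ref{identities1}), on $T_2$ the pattern counted by $n_2$ plays the role that $xxy$ plays on $T_1$, so
\begin{align*}
\PP(n_0,n_1,n_2,n_3 \mid T_2,t_0,t_1) = p_0^{n_0}p_1^{n_2}p_2^{n_1+n_3} = \PP(n_0,n_2,n_3,n_1 \mid T_1,t_0,t_1),
\end{align*}
and (\ref{identities2}) is checked in the same way (any multinomial coefficient is the same on both sides).
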
215:
216: Now, as $(t_0, t_1)$ are random variables with some prior density, when
217: we view $p_0, p_1, p_2, p_3$ as random variables by virtue of their
218: dependence on $(t_0, t_1)$, we will write them as $P_0, P_1, P_2,
219: P_3$ (note that Yang and Rannala (2005) use $P_i$ differently). With this notation, the posterior probability of $T_1$ conditional on ${\bf n}$ can be
220: written as
221: $$\PP(T_1|{\bf n})= p({\bf n})^{-1}\times\EE_1[P_0^{n_0}P_1^{n_1}P_2^{n_2}P_3^{n_3}]$$
where $p({\bf n})$ is a normalising constant that does not depend on the tree (it is proportional to the marginal probability of the data),
and $\EE_1$ denotes expectation with respect to the prior for
$t_0, t_1$ on $T_1$.
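
To make the normalising constant explicit (a routine application of Bayes' theorem, spelled out here for convenience), let $L_j = L_j(t_0,t_1)$ denote the likelihood of the aligned sequences under $T_j$ with branch lengths $(t_0,t_1)$, so that $L_1 = P_0^{n_0}P_1^{n_1}P_2^{n_2}P_3^{n_3}$. Since each resolved tree has prior probability $\frac{1}{3}$ and all three trees share the same prior on $(t_0,t_1)$, every expectation can be taken with respect to the prior on $T_1$:
\begin{align*}
\PP(T_1|{\bf n}) = \frac{\frac{1}{3}\EE_1[L_1]}{\frac{1}{3}\sum_{j=1}^{3}\EE_1[L_j]} = \frac{\EE_1[L_1]}{\sum_{j=1}^{3}\EE_1[L_j]},
\end{align*}
so one may take $p({\bf n}) = \sum_{j=1}^{3}\EE_1[L_j]$, which is the same for all three trees.
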
Moreover, since $P_2=P_3$, we can write this as
$\PP(T_1|{\bf n})= p({\bf n})^{-1}
\times\EE_1[P_0^{n_0}P_1^{n_1}P_2^{n_2+n_3}].$ By (\ref{identities1}) and (\ref{identities2}), and since the prior on $(t_0, t_1)$ is the same for each tree, we have
228: $$\PP(T_2|{\bf n}) = p({\bf n})^{-1} \times \EE_1[P_0^{n_0}P_1^{n_2}P_2^{n_1+n_3}]; \mbox{ and } \PP(T_3|{\bf n}) = p({\bf n})^{-1} \times \EE_1[P_0^{n_0}P_1^{n_3}P_2^{n_1+n_2}]$$
229: where again expectation is taken with respect to the prior for $t_0,
230: t_1$ on $T_1$. Consequently,
231: \begin{equation}
232: \label{eqfrac}
233: \frac{\PP(T_1|{\bf n})}{\PP(T_2|{\bf n})} = \frac{\EE_1[X]}{\EE_1[Y]},
234: \end{equation}
235: where $$X= P_0^{n_0}P_1^{n_1}P_2^{n_2+n_3} \mbox{ and }
Y=P_0^{n_0}P_1^{n_2}P_2^{n_1+n_3}.$$ As shown at the end of this proof, to obtain the conclusion of the theorem it
suffices to show that the ratio in (\ref{eqfrac})
exceeds any given bound with probability that stays bounded away from $0$ as $n$ grows. To do so, we use the following lemma, whose proof is given in the Appendix.
241:
242: \begin{lemma}
243: \label{lem3}
Let $X,Y$ be non-negative continuous random variables, dependent on a third random variable $\Lambda$ that takes values in an interval $I=[a,b]$. Suppose that, for some interval $I_0$ strictly inside $I$ and $I_1=I\setminus I_0$, the following inequality holds:
245: \begin{equation}
246: \label{eqin}
247: \EE[Y|\Lambda \in I_0] \geq \EE[Y|\Lambda \in I_1],
248: \end{equation}
249: and that for some constant $B>0$, and all $\lambda \in I_0$,
250: \begin{equation}
251: \label{eqin2}
252: \frac{\EE[X|\Lambda=\lambda]}{\EE[Y|\Lambda=\lambda]} \geq B.
253: \end{equation} Then,
254: $\frac{\EE[X]}{\EE[Y]} \geq B\cdot \PP(\Lambda \in I_0)$.
255: \end{lemma}
256: \noindent
To apply this lemma with $\Lambda = P_0$,
select a value $s>0$ so that $\frac{1}{4} + s < q_0 < 1-s$, and let
$I_0 = [q_0-s, q_0+s]$. Then let
$I = [\frac{1}{4}, 1]$ and $I_1 = I \setminus I_0$.
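
For a concrete, purely illustrative instance, take $t_1^0 = 0.1$ (a hypothetical value not used elsewhere in this note). Evaluating the formulas for $p_0$ and $p_1$ at $t_0=0$ gives
\begin{align*}
q_0 = \tfrac{1}{4}\left(1+3e^{-0.4}\right) \approx 0.7527, \qquad q_1 = \tfrac{1}{4}\left(1-e^{-0.4}\right) \approx 0.0824,
\end{align*}
so the requirement $\frac{1}{4}+s<q_0<1-s$ amounts to $s<\min\{q_0-\frac{1}{4},\,1-q_0\} \approx 0.2473$. For example, $s=0.2$ gives $I_0 \approx [0.5527, 0.9527]$.
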
261:
262: {\em Claim:} For $n$ sufficiently large, and conditional on the data
263: ${\bf n} = (n_0, n_1, n_2, n_3)$ satisfying $F_c$:
264: \begin{itemize}
265: \item[(i)] $\EE_1[Y|P_0 \in I_0] \geq \EE_1[Y|P_0 \in I_1]$
266: \item[(ii)] For all $p_0 \in I_0$, $\frac{\EE_1[X|P_0=p_0]}{\EE_1[Y|P_0=p_0]}
267: \geq 6c^2$.
268: \end{itemize}
The proofs of these two claims are given in the Appendix.
270:
271:
Applying Lemma~\ref{lem3} with $\Lambda = P_0$ and $B = 6c^2$ to Claims (i) and (ii), we deduce that,
conditional on ${\bf n}$ satisfying $F_c$ and
$n$ being sufficiently large,
275: \begin{equation}
276: \label{cbound}
277: \frac{\EE_1[X]}{\EE_1[Y]} \geq 6c^2 \cdot \PP(P_0 \in I_0).
278: \end{equation}
279:
280: \noindent
Select $c > \max\left\{1, \frac{1}{\sqrt{3\epsilon \PP(P_0 \in I_0)}}\right\}$ (this is
possible because $\PP(P_0 \in I_0)>0$, by the assumption that the prior on $(t_0, t_1)$ is everywhere
non-zero). As stated before, the probability that ${\bf n}$ satisfies $F_c$ is at least $\delta'=\delta'(c)>0$ for $n$ sufficiently large.
Then $6c^2 \cdot \PP(P_0 \in I_0) > \frac{2}{\epsilon}$,
and so, by (\ref{cbound}), $\frac{\PP(T_1|{\bf n})}{\PP(T_2|{\bf n})} = \frac{\EE_1[X]}{\EE_1[Y]} > \frac{2}{\epsilon}.$
Similarly (the event $F_c$ is symmetric in $\Delta_2$ and $\Delta_3$), $\frac{\PP(T_1|{\bf n})}{\PP(T_3|{\bf n})} > \frac{2}{\epsilon}.$
Hence $\PP(T_2|{\bf n})+\PP(T_3|{\bf n}) < \epsilon\,\PP(T_1|{\bf n}) \leq \epsilon$, and since $\PP(T_1|{\bf n})+\PP(T_2|{\bf n})+\PP(T_3|{\bf n})=1$ it follows that, for $n$ sufficiently large and on an event of
probability at least $\delta'>0$, $\PP(T_1|{\bf n})> 1-\epsilon$, as claimed. This completes the proof.
290: \hfill$\Box$
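
For instance, with the illustrative values $\epsilon = 0.05$ and $\PP(P_0 \in I_0) = 0.1$ (hypothetical numbers chosen only to show the arithmetic, not derived from any particular prior), the condition on $c$ reads $c > 1/\sqrt{0.015} \approx 8.165$. Any such $c$ gives $6c^2\cdot\PP(P_0\in I_0) > 40 = \frac{2}{\epsilon}$, and hence, on $F_c$ and for $n$ large,
\begin{align*}
\PP(T_1|{\bf n}) = \left(1+\frac{\PP(T_2|{\bf n})}{\PP(T_1|{\bf n})}+\frac{\PP(T_3|{\bf n})}{\PP(T_1|{\bf n})}\right)^{-1} > \left(1+\tfrac{1}{40}+\tfrac{1}{40}\right)^{-1} = \tfrac{1}{1.05} \approx 0.952 > 0.95 = 1-\epsilon.
\end{align*}
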
291:
292:
293:
294:
295:
296:
297: \newpage
298: \subsection{Concluding remarks}
299:
300: One feature of the argument we have provided is that it does not require
301: stipulating in advance a particular prior on the branch lengths -- that
302: is, the result is somewhat generic as it imposes relatively few
303: conditions. Moreover,
the requirement that the prior on $(t_0, t_1)$ be everywhere
non-zero could be weakened to simply being non-zero in a neighborhood of $(0,
t_1^0)$ (thereby allowing, for example, a uniform distribution on a
bounded range).

An interesting open question in the spirit of this paper is to
calculate explicitly the limit, as $n$ tends to infinity, of the density $f(P_1, P_2,
P_3)$ of posterior probabilities described by Yang and Rannala (2005).
312:
313:
314: \subsection{Acknowledgments} MS thanks Ziheng Yang for suggesting the
315: problem of computing the limiting distribution of posterior
316: probabilities for 3-taxon trees. This work is funded by the
317: \emph{Allan Wilson Centre for Molecular Ecology and Evolution}.
318:
319:
320: \section*{References}
321: \noindent
Alfaro ME, Holder MT. 2006. The posterior and prior in Bayesian
phylogenetics. Annu. Rev. Ecol. Evol. Syst. 37: 19--42.

\noindent
Kolaczkowski B, Thornton JW. 2006.
Is there a star tree paradox? Mol. Biol. Evol. 23: 1819--1823.

\noindent
Lewis PO, Holder MT, Holsinger KE. 2005. Polytomies and Bayesian
phylogenetic inference. Syst. Biol. 54 (2): 241--253.

\noindent
Yang Z, Rannala B. 2005. Branch-length prior influences Bayesian
posterior probability of phylogeny.
Syst. Biol. 54 (3): 455--470.
338:
339:
340: \newpage
341:
\section{Appendix: Proof of Lemma~\ref{lem3} and Claims (i), (ii)}
343:
344: {\em Proof of Lemma~\ref{lem3}}:
345: \noindent For $W =X,Y$ we have
346: \begin{equation}
347: \label{equatione1}
348: \EE[W] = \EE[W|\Lambda \in I_0]\PP(\Lambda \in I_0) + \EE[W|\Lambda \in I_1]\PP(\Lambda \in I_1).
349: \end{equation}
350: In particular, for $W = X$ we have: $\EE[X] \geq \EE[X|\Lambda \in
351: I_0]\PP(\Lambda \in I_0)$. Note that
352: (\ref{eqin2}) implies that $\EE[X|\Lambda \in I_0] \geq B \cdot
353: \EE[Y|\Lambda \in I_0]$, so
354: \begin{equation}
355: \label{equatione2}
356: \EE[X] \geq B \cdot \EE[Y|\Lambda \in I_0]\PP(\Lambda \in I_0).
357: \end{equation}
358: Taking $W=Y$ in (\ref{equatione1}) and applying (\ref{eqin}) gives us $$\EE[Y] \leq \EE[Y|\Lambda \in I_0](\PP(\Lambda \in I_0)+\PP(\Lambda \in I_1))=\EE[Y|\Lambda \in I_0]$$
359: which combined with (\ref{equatione2}) gives the result.
360: \hfill$\Box$
361:
362: \noindent{\em Proof of Claim (i)}, $\EE_1[Y|P_0 \in I_0] \geq
363: \EE_1[Y|P_0 \in I_1]$:
364:
365: \noindent
366: We will first bound $\EE_1[Y|P_0 \in I_1]$ above.
367: Let $\mu(n) = (q_0^{q_0}q_1^{q_1}q_2^{q_2}q_3^{q_3})^n$.
368: Now, conditional on ${\bf n}$ satisfying $F_c$ we have
369: $$n^{-1} \log\left(\mu(n) /Y(t_0, t_1)\right) = d_{KL}(q,p) + o(1),$$
370: where $p = (p_0, p_1, p_2, p_3)$ and $q = (q_0, q_1, q_2, q_3)$, and
$d_{KL}$ denotes the Kullback--Leibler divergence.
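
To see where the Kullback--Leibler divergence comes from, fix $(t_0,t_1)$ and recall that $p_3=p_2$. On $F_c$ each $n_i/n$ differs from $q_i$ by $O(n^{-1/2})$, and $q_1=q_2=q_3$, so
\begin{align*}
\frac{1}{n}\log Y &= \frac{n_0}{n}\log p_0 + \frac{n_2}{n}\log p_1 + \frac{n_1+n_3}{n}\log p_2\\
&= q_0\log p_0 + q_1\log p_1 + 2q_1\log p_2 + O(n^{-1/2}) = \sum_{i=0}^{3} q_i\log p_i + O(n^{-1/2}),
\end{align*}
while $\frac{1}{n}\log\mu(n) = \sum_{i=0}^{3}q_i\log q_i$. Subtracting gives $\frac{1}{n}\log(\mu(n)/Y) = \sum_{i=0}^{3} q_i\log(q_i/p_i) + O(n^{-1/2}) = d_{KL}(q,p)+o(1)$.
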
Now, $d_{KL}(q,p) \geq \frac{1}{2} \|q-p\|_1^2 \geq \frac{1}{2} |q_0 -
p_0|^2$ (the first inequality is Pinsker's inequality, with $d_{KL}$ measured in nats).
374: In particular, if $p_0 \in I_1$, then $|q_0 - p_0| > s>0$.
375: Moreover, $$\EE_1[Y|P_0 \in I_1] \leq \max\{Y(t_0, t_1): p_0(t_0, t_1) \in I_1\}.$$
376: Summarizing,
377: \begin{equation}
378: \label{eqo1}
379: \EE_1[Y|P_0 \in I_1] \leq \max\{Y(t_0, t_1): p_0(t_0, t_1) \in I_1\} < \mu(n)e^{-\frac{1}{2}s^2n + o(n)}.
380: \end{equation}
381: In the reverse direction, we have:
382: $$\EE_1[Y|P_0 \in I_0] \geq A(n)B(n)$$
383: where
384: $$A(n) = \min \left\{Y(t_0, t_1): (t_0, t_1) \in [0, n^{-1}] \times
385: [t_1^0, t_1^0+n^{-1}] \right\}$$
386: and
$$B(n)= \PP\left((t_0, t_1)
\in [0, n^{-1}] \times [t_1^0, t_1^0 + n^{-1}]\right).$$
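Now, for $(t_0, t_1)$ in this box, writing $n_0 = q_0n + \Delta_0\sqrt{n}$, $n_2 = q_1n + (\Delta_2 - \frac{1}{3}\Delta_0)\sqrt{n}$ and $n_1+n_3 = 2q_1n + (\Delta_1+\Delta_3-\frac{2}{3}\Delta_0)\sqrt{n}$ gives
$$\frac{Y(t_0, t_1)}{\mu(n)} =
\left(\frac{p_0^{q_0}p_1^{q_1}p_2^{2q_1}}{q_0^{q_0}q_1^{3q_1}}\right)^n \cdot
\left(p_0^{\Delta_0}p_1^{\Delta_2-\frac{1}{3}\Delta_0}p_2^{\Delta_1+\Delta_3-\frac{2}{3}\Delta_0}\right)
^{\sqrt{n}}.$$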
Now, the first factor of this product converges to $1$ uniformly over the box
(because $p_0 -q_0$, $p_1-q_1$ and $p_2-q_1$
are each of order $n^{-1}$, and the first-order terms of its logarithm cancel since $p_0+p_1+2p_2 = q_0+3q_1 = 1$), while the condition $F_c$ ensures that the second
factor decays no faster than $e^{-C_1 \sqrt{n}}$ for a constant $C_1$. Thus,
$A(n) \geq C_2\mu(n)e^{-C_1\sqrt{n}}$ for a positive constant $C_2$.
The term $B(n)$ is asymptotically proportional to $n^{-2}$. Summarizing, for a constant $C_3>0$ (depending on $t_1^0$, $c$ and the prior, but not on $n$),
400: $$\EE_1[Y|P_0 \in I_0] \geq C_3 \mu(n)n^{-2}e^{-C_1\sqrt{n}},$$ which
401: combined with (\ref{eqo1}) establishes claim (i) for $n$ sufficiently
402: large.
403: \hfill$\Box$
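
Combining the two bounds shows explicitly why Claim (i) holds for large $n$:
\begin{align*}
\log\frac{\EE_1[Y|P_0 \in I_0]}{\EE_1[Y|P_0 \in I_1]} > \tfrac{1}{2}s^2n - o(n) - C_1\sqrt{n} - 2\log n + \log C_3,
\end{align*}
and the right-hand side tends to $+\infty$, since the linear term $\frac{1}{2}s^2n$ dominates all the others. The rate $n^{-2}$ for $B(n)$ reflects the fact that the box has area $n^{-2}$ and the prior density is continuous and non-zero near $(0,t_1^0)$.
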
404:
405:
406:
407: In order to prove claim (ii) we need some preliminary results.
408: \begin{lemma}
409: Let $\eta<1$. Then for each $x >0$ there exists a value $K = K(x) < \infty$ that depends continuously on $x$ so that the following holds. For any continuous random variable $Z$ on $[0,1]$ with a smooth density function $f$ that satisfies $f(1) \neq 0$ and
410: $|f'(z)|<B$ for all $z \in (\eta, 1]$, we have
411: $$ k \cdot \frac{(\EE[Z^k] -
412: \EE[Z^{k+1}])}{\EE[Z^k]} \geq \frac{1}{2}$$
413: for all $k \geq K(\frac{B}{f(1)})$.
414: \label{lem1}
415: \end{lemma}
416: {\em Proof.}
417: Let $t_k = 1- \frac{1}{\sqrt{k}}$. Then $$\EE[Z^k] = \int_{0}^{t_k} t^k f(t)dt + \int_{t_k}^1 t^k f(t)dt.$$
418: Now, $$0 \leq \int_{0}^{t_k} t^k f(t)dt \leq t_k^k \sim
419: e^{-\sqrt{k}-1/2},$$
where $\sim$ denotes asymptotic equivalence (i.e.\ $a(k) \sim b(k)$ iff
$\lim_{k \rightarrow \infty} a(k)/b(k) =1$).
422: Using integration by parts,
423: $$\int_{t_k}^1 t^k f(t)dt = \frac{1}{k+1}\left.t^{k+1}f(t)\right|_{t_k}^1 -
424: \frac{1}{k+1}\int_{t_k}^1t^{k+1}f'(t)dt.$$
425: Now, provided $k>(1-\eta)^{-2}$ we have $t_k>\eta$ and so the absolute value of the second term on the right is at most
426: $\frac{B}{k+1}\int_{t_k}^1t^{k+1}dt = \frac{B}{(k+1)(k+2)}(1-t_{k}^{k+2}).$
427: Consequently, $\left|\EE[Z^k] - \frac{f(1)}{k+1}\right|$ is bounded above by $B$
428: times a term of order $k^{-2}$.
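A similar argument, again using integration by parts, shows that
$\left|k(\EE[Z^k]- \EE[Z^{k+1}]) - \frac{f(1)}{k+1}\right|$ is bounded above by a term of order $k^{-2}$ (with a constant depending on $B$ and $f(1)$). Since both $k(\EE[Z^k]-\EE[Z^{k+1}])$ and $\EE[Z^k]$ are therefore $\frac{f(1)}{k+1}+O(k^{-2})$, their ratio tends to $1$ as $k \rightarrow \infty$, and the lemma follows by routine analysis.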
432: \hfill$\Box$
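
As an illustration of Lemma~\ref{lem1} (our own example, not needed for the proof), let $Z$ have density $f(z) = az^{a-1}$ on $[0,1]$ for some $a \geq 1$, so that $f(1)=a>0$ and $f'$ is bounded on $(\eta,1]$ for any $\eta\in(0,1)$. Then
\begin{align*}
\EE[Z^k] = \frac{a}{k+a}, \qquad \EE[Z^k]-\EE[Z^{k+1}] = \frac{a}{(k+a)(k+a+1)}, \qquad k\cdot\frac{\EE[Z^k]-\EE[Z^{k+1}]}{\EE[Z^k]} = \frac{k}{k+a+1},
\end{align*}
which is at least $\frac{1}{2}$ exactly when $k \geq a+1$, and tends to $1$ as $k\rightarrow\infty$, in line with the approximation $\EE[Z^k]\approx f(1)/(k+1)$ used in the proof.
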
433:
434:
435: \begin{lemma}
436: \label{lem2}
437: Let $y = (1+2x)(1-x)^2$. Then for $x \in [0,1)$ and $m \geq 3$ we have
438: $$\left(\frac{1+2x}{1-x}\right)^m \geq m^2(1-y).$$
439: \end{lemma}
440: {\em Proof.}
441: $$\left(\frac{1+2x}{1-x}\right)^m = \left(1+ \frac{3x}{1-x}\right)^m
442: \geq \frac{m(m-1)}{2}\left(\frac{3x}{1-x}\right)^2 \geq \frac{9m(m-1)x^2}{2},$$ and
$m^2(1-y) = m^2(3x^2-2x^3) \leq 3m^2x^2$, and for $m\geq 3$ this upper bound is at most the lower bound in the previous
expression.
445: \hfill$\Box$
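
Two details of this proof can be made explicit. First, $m$ need not be an integer (when the lemma is applied below, $m = 2c\sqrt{3k}$), and the bound $(1+u)^m \geq \frac{m(m-1)}{2}u^2$ for $u \geq 0$ holds for every real $m\geq 2$, by Taylor's theorem with the non-negative remainder $\frac{m(m-1)(m-2)}{6}(1+\xi)^{m-3}u^3$. Second,
\begin{align*}
1-y = 1-(1+2x)(1-2x+x^2) = 3x^2-2x^3, \qquad \frac{9m(m-1)}{2} - 3m^2 = \frac{3m(m-3)}{2} \geq 0 \quad (m \geq 3),
\end{align*}
with equality of the two constants when $m=3$.
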
446:
447:
448: \noindent {\em Proof of Claim (ii)}, for all $p_0 \in I_0$, $\frac{\EE_1[X|P_0=p_0]}{\EE_1[Y|P_0=p_0]} \geq 6c^2$:
449:
Write $\EE_1[W|p_0]$ as shorthand for $\EE_1[W|P_0=p_0]$.
Note that, for any $r,r' \geq 0$,
$\EE_1[P_0^{n_0}P_1^{r}P_2^{r'}|p_0] =
p_0^{n_0}\EE_1[P_1^{r}P_2^{r'}|p_0]$. Consequently, if we let
454: $k = k(n) = \frac{1}{3}(n-n_0)$ then, by definition of the $\Delta_i$'s,
455: \begin{equation}
456: \label{efrac}
457: \frac{\EE_1[X|p_0]}{\EE_1[Y|p_0]} = \frac{\EE_1[(P_1P_2^2)^{k}\cdot (P_1^{\Delta_1}P_2^{\Delta_2+\Delta_3})^{\sqrt{n}}|p_0]}{\EE_1[(P_1P_2^2)^{k} \cdot (P_1^{\Delta_2}P_2^{\Delta_1+\Delta_3})^{\sqrt{n}}|p_0]}.
458: \end{equation}
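
The exponents in (\ref{efrac}) come from rewriting the counts in terms of $k$ and the $\Delta_i$'s: by definition, $n_i = k + \Delta_i\sqrt{n}$ for $i=1,2,3$, so
\begin{align*}
X &= P_0^{n_0}P_1^{k+\Delta_1\sqrt{n}}P_2^{2k+(\Delta_2+\Delta_3)\sqrt{n}} = P_0^{n_0}(P_1P_2^2)^k\left(P_1^{\Delta_1}P_2^{\Delta_2+\Delta_3}\right)^{\sqrt{n}},\\
Y &= P_0^{n_0}P_1^{k+\Delta_2\sqrt{n}}P_2^{2k+(\Delta_1+\Delta_3)\sqrt{n}} = P_0^{n_0}(P_1P_2^2)^k\left(P_1^{\Delta_2}P_2^{\Delta_1+\Delta_3}\right)^{\sqrt{n}},
\end{align*}
and the common factor $p_0^{n_0}$ cancels once we condition on $P_0=p_0$.
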
459: Now, conditional on ${\bf n}$ satisfying $F_c$ (and since $P_1 \geq P_2$) the following two inequalities hold
460: $$P_1^{\Delta_1}P_2^{\Delta_2+\Delta_3}
461: = \left(\frac{P_1}{P_2}\right)^{\Delta_1}
462: \geq \left(\frac{P_1}{P_2}\right)^{2c}
463: \mbox{ and }
464: P_1^{\Delta_2}P_2^{\Delta_1+\Delta_3}
465: = \left(\frac{P_1}{P_2}\right)^{\Delta_2}
466: \leq 1.$$
467: Applying this to
468: (\ref{efrac}) gives:
469: \begin{equation}
470: \label{efrac2}
471: \frac{\EE_1[X|p_0]}{\EE_1[Y|p_0]} \geq
\frac{\EE_1\left[\left.(P_1P_2^2)^{k}\cdot
\left(\frac{P_1}{P_2}\right)^{2c\sqrt{n}}\,\right|\,p_0\right]}{\EE_1[(P_1P_2^2)^{k}|p_0]}.
474: \end{equation}
475:
476: Let $U = \frac{P_1-P_2}{1-P_0}$,
477: which takes values between $0$ and $1$ because $P_1 \geq P_2$.
478: Since $P_1 + 2P_2 = 1-P_0$, we can write
479: $P_1 = \frac{1}{3}(1+2U)(1-P_0)$ and $P_2 = \frac{1}{3}(1-U)(1-P_0)$.
480: Thus,
481: $P_1P_2^2= \frac{1}{27} (1+2U)(1-U)^2(1-P_0)^3$ and
482: $\frac{P_1}{P_2} = \frac{(1+2U)}{(1-U)}$. Substituting these into (\ref{efrac2}), letting $Z = (1+2U)(1-U)^2$ and noting that $\sqrt{n} \geq \sqrt{3k}$ gives
483: $$\frac{\EE_1[X|p_0]}{\EE_1[Y|p_0]} \geq
484: \frac{\EE_1\left[\left.Z^{k}\cdot
485: (\frac{1+2U}{1-U})^{2c\sqrt{3k}}\right|p_0\right]}{\EE_1[Z^{k}|p_0]}.$$
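
The expressions for $P_1$ and $P_2$ in terms of $U$ can be checked directly:
\begin{align*}
P_1+2P_2 = \tfrac{1}{3}(1-P_0)\left[(1+2U)+2(1-U)\right] = 1-P_0, \qquad P_1-P_2 = \tfrac{1}{3}(1-P_0)\cdot 3U = U(1-P_0),
\end{align*}
in agreement with the definition of $U$. Moreover $\frac{1+2U}{1-U} \geq 1$, so raising it to the power $2c\sqrt{3k}$ instead of the larger power $2c\sqrt{n}$ can only decrease it; this is what justifies replacing $\sqrt{n}$ by $\sqrt{3k}$ in the lower bound.
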
Thus, by Lemma~\ref{lem2} (taking $x=U$, $y=Z$ and $m= 2c\sqrt{3k}$), we
obtain, for $m \geq 3$,
488: \begin{equation}
489: \label{boundx}
490: \frac{\EE_1[X|p_0]}{\EE_1[Y|p_0]} \geq 12c^2k\cdot
491: \frac{\left(\EE_1[Z^{k}|p_0]-\EE_1[Z^{k+1}|p_0]\right)}{\EE_1[Z^{k}|p_0]}.
492: \end{equation}
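
In more detail: with $m = 2c\sqrt{3k}$ we have $m^2 = 12c^2k$, and $m\geq 3$ holds as soon as $k\geq 1$, since $c>1$ gives $m > 2\sqrt{3} > 3$. Lemma~\ref{lem2} then gives $\left(\frac{1+2U}{1-U}\right)^{m} \geq 12c^2k(1-Z)$ pointwise; multiplying by $Z^k\geq 0$ and taking conditional expectations yields
\begin{align*}
\EE_1\left[\left.Z^k\left(\frac{1+2U}{1-U}\right)^{m}\right|p_0\right] \geq 12c^2k\,\EE_1[Z^k(1-Z)|p_0] = 12c^2k\left(\EE_1[Z^k|p_0]-\EE_1[Z^{k+1}|p_0]\right),
\end{align*}
which gives (\ref{boundx}) after division by $\EE_1[Z^k|p_0]$.
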
493: Now the mapping $(t_0, t_1) \mapsto (P_0, Z)$ is a smooth invertible
494: mapping between $(0, \infty)^2$ and its image within $(\frac{1}{4}, 1) \times (0,1)$.
Notice that $Z$ approaches $1$ whenever $P_0$ approaches $\frac{1}{4}$ (in particular, even if $t_0, t_1$ are independent, $P_0$ and
$Z$ generally will not be). However, over the interval $I_0$
498: the conditional density
499: $f(Z|P_0=p_0)$ of $Z$ given a value $p_0$ for $P_0$ is smooth and bounded
500: away from $0$, and its first derivative is also bounded above over
this interval. Consequently, we may apply Lemma~\ref{lem1} (noting that
the condition that ${\bf n}$ satisfies $F_c$ ensures
that $k(n) = q_1n - \frac{1}{3}\Delta_0\sqrt{n} \geq q_1n - \frac{c}{3}\sqrt{n}$, so that $k(n) \rightarrow \infty$) to show that, for $n$ sufficiently
large, the following inequality holds for all $p_0 \in I_0$:
505: $$k\cdot \frac{\left(\EE_1[Z^{k}|p_0]-\EE_1[Z^{k+1}|p_0]\right)}{\EE_1[Z^{k}|p_0]} \geq
506: \frac{1}{2}.$$
507: Applying this to (\ref{boundx}) gives
508: $\frac{\EE_1[X|p_0]}{\EE_1[Y|p_0]} \geq 6c^2$ as claimed.
509: This completes the proof of Claim (ii).
510:
511:
512:
513:
514:
515: \end{document}
516:
517:
541:
542:
543: