1: %Posted to archive: 08--6--22
2:
3: \documentclass[11pt]{article}
4:
5: \pdfoutput=1
6:
7: \usepackage{graphicx}
8: \usepackage[small]{caption}
9:
10: \newcommand\tf{t_{\!f}}
11: \newcommand\tp{t_p}
12:
13: \def\cmc#1{{\tt #1}}
14: \def\urldot{.\discretionary{}{}{}}
15: \def\urlslash{/\discretionary{}{}{}}
16:
17: \setlength{\textwidth}{6.5in}
18: \setlength{\oddsidemargin}{0.0in}
19: \setlength{\evensidemargin}{0.0in}
20: \setlength{\textheight}{9.0in}
21: \setlength{\topmargin}{0.0in}
22: \setlength{\headheight}{0.0in}
23: \setlength{\headsep}{0.0in}
24:
25: \begin{document}
26:
27: \title{{\bf Predicting future duration from present age:\\
28: Revisiting a critical assessment of Gott's rule}}
29:
30: \author{{\Large Carlton M.~Caves}
31: \\
32: \\
33: Department of Physics and Astronomy, MSC07--4220, University of New Mexico,\\
34: Albuquerque, New Mexico 87131-0001, USA\\
35: \\
36: and\\
37: \\
38: Department of Physics, University of Queensland,\\
39: Brisbane, Queensland 4072, Australia
40: \\
41: \\
42: E-mail: caves@info.phys.unm.edu
43: }
44:
45: \date{2008 June~21}
46:
47: \maketitle
48:
49: \begin{abstract}
50: Gott has promulgated a rule for making probabilistic predictions of
51: the future duration of a phenomenon based on the phenomenon's present
52: age [{\sl Nature\/} {\bf 363}, 315 (1993)]. I show that the two
53: usual methods for deriving Gott's rule are flawed. Nothing licenses
54: indiscriminate use of Gott's rule as a predictor of future duration.
55: It should only be used when the phenomenon in question has no
56: identifiable time scales.
57: \end{abstract}
58:
59: \baselineskip=14.2pt
60:
61: \section{Introduction}
62:
63: In an article$^1$ published in {\sl Nature\/} in 1993 and in
64: subsequent publications$^{\hbox{\scriptsize2--5}}$ and the concluding
65: chapter of a book,$^6$ J.~Richard Gott~III has promulgated a formula
66: for making probabilistic predictions of the future duration of a
67: phenomenon based on the phenomenon's present age. When you observe
68: that a phenomenon has lasted a time $\tp$, Gott instructs you to
69: predict that the phenomenon will last an additional time $\tf\ge
70: Y\tp$ with probability
71: \begin{equation}
72: G(\tf\ge Y\tp)={1\over1+Y}\;.
73: \label{eq:Grule}
74: \end{equation}
75: For example, Gott's rule predicts that a phenomenon has a probability
76: of $1/2$ to survive an additional time at least as long as its
77: past ($Y=1$) and, by the same token, a probability of $1/2$ to end
78: before reaching twice its present age.
79:
80: In applying his rule to a host of phenomena, Gott usually couches his
81: predictions in terms of a particular 95\% confidence interval, 95\%
82: confidence being his standard for a scientific prediction. According
83: to his rule, the probability that a phenomenon's future duration,
84: $\tf$, will be between $1/39$ and $39$ times its present age, $\tp$,
85: is $G(\tf\ge\tp/39)-G(\tf\ge39\tp)=39/40-1/40=0.95$. The flip side of
86: this prediction is that the phenomenon has a 2.5\% chance to end within
87: a further $1/39$ of its present age and a 2.5\% chance of lasting
88: longer than 39 times its present age.
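
As a sketch of where the factor of 39 comes from, let $c$ denote the
desired confidence level; the symbols $c$, $Y_-$, and $Y_+$ are notation
introduced here for illustration only. Leaving probability $(1-c)/2$ in
each tail of Eq.~(\ref{eq:Grule}) requires
\[
{1\over1+Y_+}={1-c\over2}\;,\qquad{1\over1+Y_-}={1+c\over2}\;,
\]
so that
\[
Y_+={1+c\over1-c}\;,\qquad Y_-={1-c\over1+c}={1\over Y_+}\;.
\]
For $c=0.95$ this gives $Y_+=1.95/0.05=39$ and $Y_-=1/39$; a 50\%
interval, $c=0.5$, would instead run from $Y_-=1/3$ to $Y_+=3$.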
89:
90: Gott bases his formula on a temporal version of the Copernican
91: principle: when you observe the phenomenon's present age, your
92: observation does not occur at a special time. Here I show,
93: distilling the essence of a previous critical analysis$^7$ of Gott's
94: work, that although the Copernican principle does lead directly to a
95: version of Gott's rule, this version is essentially meaningless and,
96: in particular, does not authorize his predictions for future
97: longevity based on present age.
98:
99: In published papers and his book, Gott is on record as applying his
100: rule to
101: %
102: himself,$^{2,6}$
103: Christ\-ianity,$^6$
104: the former Soviet Union,$^{1,6}$
105: the Third Reich,$^6$
106: the United States,$^6$
107: Canada,$^4$
108: world leaders,$^{2,4,6}$
109: Stonehenge,$^4$
110: the Seven Wonders of the Ancient World,$^6$
111: the Pantheon,$^6$
112: the Great Wall of China,$^6$
113: {\sl Nature},$^1$
114: the {\sl Wall Street Journal},$^6$
115: {\sl The New York Times},$^6$
116: the Berlin Wall,$^{\hbox{\scriptsize1--4,6}}$
117: the Astronomical Society of the Pacific,$^2$
118: the 44 Broadway and off-Broadway plays open
119: and running on 27~May 1993,$^{\hbox{\scriptsize2--4,6}}$
120: the Thatcher-Major Conservative government in the UK,$^{\hbox{\scriptsize2--4,6}}$
121: Manhattan (New York City),$^6$
122: the New York Stock Exchange,$^6$
123: Oxford University,$^6$
124: the internet,$^6$
125: Microsoft,$^6$
126: General Motors,$^6$
127: the human spaceflight program,$^{\hbox{\scriptsize1--6}}$
128: and
129: {\it homo sapiens}.$^{\hbox{\scriptsize1--6}}$
130: %
131: In all these cases---even the New York plays---Gott uses his rule to
132: make probabilistic predictions for the survival of individual
133: phenomena whose present age is known. For example, given {\sl
134: Nature\/}'s 123 years of publishing in 1993, Gott predicted that {\sl
135: Nature\/} had a 95\% chance to continue publishing for a period
136: between 3.15 years (already exceeded) and 4,800 years.$^1$ Most
137: notably, Gott has used the 200,000-year present age of {\it homo
138: sapiens} to predict that we have a 95\% chance to go extinct sometime
139: between 5,100 years and 7.8 million years from
140: now.$^{\hbox{\scriptsize1--6}}$ Although Gott issues occasional
141: cautionary statements about the applicability of his rule,$^{2,6}$
142: the list of phenomena to which he has applied the rule indicates that
143: these cautions don't cramp his style much.
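
The two intervals just quoted can be checked against the factor of 39,
taking the present ages stated above as given:
\begin{eqnarray*}
\mbox{{\sl Nature\/}:}&\quad&
123/39\simeq3.15\ \mbox{yr}\;,\qquad
39\times123=4797\simeq4{,}800\ \mbox{yr}\;,\\
\mbox{{\it homo sapiens}:}&\quad&
200{,}000/39\simeq5{,}100\ \mbox{yr}\;,\qquad
39\times200{,}000=7.8\times10^6\ \mbox{yr}\;.
\end{eqnarray*}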
144:
145: Gott's predictions have received attention in the popular media,
146: including a favorable piece by Timothy Ferris in {\sl The New
147: Yorker},$^8$ which highlighted Gott's predictions for human survival
148: (and which motivated me to write my original paper$^7$), and a recent
149: article by John Tierney in {\sl The New York Times},$^9$ which
150: focused on the implications for space colonization. My late 1999
151: posting of the paper$^7$ that eventually appeared in {\sl
152: Contemporary Physics\/} prompted a sympathetic article in {\sl The
153: New York Times\/} by James Glanz,$^{10}$ then a science writer at
154: {\sl The Times}.
155:
156: Glanz, as a long-suffering fan of the Chicago White Sox, was
157: particularly interested in Gott's prediction, issued in 1996, for the
158: Sox's World Series prospects. Gott opined that the Sox, having not
159: won a World Series title since 1917, would, with 95\% confidence, win
160: a Series sometime between 1999 and 5077. In his 2000 article, Glanz
161: noted that the Sox hadn't yet succeeded, but he was clearly dismayed
162: by the long wait evoked by the mere mention of 5077. Happily for him
163: and other Sox fans, they did win the Series in 2005. In 1996 Gott
164: would have predicted a World Series title in 2005 or before with
165: probability~0.10, considerably less than the probability,
166: $1-(29/30)^9=0.26$, that comes from assuming that the Sox had the
167: same chance each year as the other 29 major-league ball clubs.
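
Both probabilities can be reproduced in a few lines. The inputs are a
reading of the dates given above rather than figures quoted from Gott: a
present age of $\tp=1996-1917=79$ years and a window of $2005-1996=9$
years, so that $Y=9/79$. Gott's rule~(\ref{eq:Grule}) then gives
\[
1-G(\tf\ge Y\tp)=1-{1\over1+9/79}={9\over88}\simeq0.10\;,
\]
whereas nine independent seasons, each with a $1/30$ chance of a title,
give
\[
1-\left({29\over30}\right)^{\!9}\simeq0.26\;.
\]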
168:
169: Gott has given two main derivations of his rule: the argument from
170: the Copernican principle, which he calls the delta-$t$
171: argument,$^{\hbox{\scriptsize1--6}}$ and a Bayesian
172: analysis,$^{2,6,11}$ which he adopted from criticism due to
173: Buch.$^{12}$ Both of these derivations are flawed.$^7$ Here I begin
174: in Sec.~\ref{sec:deltat} with an analysis of the delta-$t$ argument,
175: because it led Gott to his rule and because he consistently portrays
176: it as the chief justification for his predictions. I show that the
177: delta-$t$ argument does not lead to any prediction of future duration
178: based on present age. I then turn in Sec.~\ref{sec:Bayesian} to the
179: usual Bayesian derivation of Gott's rule, which has greater appeal to
180: most other contributors to the literature on the subject. I
181: demonstrate that this derivation is simply wrong and sketch the
182: correct Bayesian analysis. A concluding Sec.~\ref{sec:conclusion}
183: considers the assumptions that are required to get Gott's
184: predictions.
185:
186: It should be emphasized at the outset that we will not conclude that
187: Gott's rule is ``wrong,'' but rather that its two primary derivations
188: are wrong. In science flawed justifications are as bad as---perhaps
189: worse than---being obviously wrong, because they are more pernicious.
190: They can mislead you into using methods that don't apply in your
191: situation and can get you into trouble when you export those methods
192: to other contexts. Determining the assumptions that underlie
193: whatever you are doing in science is essential, so that you know when
194: to abandon what you are doing in favor of something else. The
195: purpose of this article is thus to debunk the two primary derivations
196: of Gott's rule and to identify the assumptions that underlie Gott's
197: rule in its predictive form, so that you will know what you are doing
198: should you choose to use it.
199:
200: The discussion in this paper is couched mainly in terms of a simple
201: graphical representation, which is equivalent to the more formal,
202: Bayesian analysis given in Ref.~7. Section~\ref{sec:deltat} on the
203: delta-$t$ argument is phrased almost entirely in terms of the
204: graphical representation. The results of the Bayesian analysis in
205: Sec.~\ref{sec:Bayesian} can be understood by referring to the
206: graphical representation, but the Bayesian equations are included for
207: those who prefer to see the details.
208:
209: The evidence from papers$^{\hbox{\scriptsize13--17}}$ that cite
210: Ref.~7 is that its argument and conclusions have not been appreciated
211: and understood. The goal of the present paper, with its graphical
212: mode of presentation, is to rectify that situation.
213:
214: \section{Copernican ensembles and the delta-$t$ argument}
215: \label{sec:deltat}
216:
217: The delta-$t$ argument is short and sweet. It starts from the
218: premise that if your observation does not occur at a special
219: time---that is the temporal Copernican principle---then it is equally
220: likely to occur at any time within the total duration $T=\tp+\tf$.
221: This means that the probability that the present age $\tp$ is less
222: than or equal to $XT$, where $X$ is between 0 and 1 inclusive, is
223: $G(\tp\le XT)=X$. This being the same as the probability that the
224: future duration $\tf$ is not smaller than $(1-X)T$, i.e., not smaller
225: than $(X^{-1}-1)\tp$, one obtains Gott's rule~(\ref{eq:Grule}) by
226: letting $Y=X^{-1}-1$.
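
Spelled out step by step, with $T=\tp+\tf$ and $0<X\le1$, the chain of
equivalent statements is
\begin{eqnarray*}
\tp\le XT
&\Longleftrightarrow&\tf=T-\tp\ge(1-X)T\\
&\Longleftrightarrow&X\tf\ge(1-X)\tp\\
&\Longleftrightarrow&\tf\ge(X^{-1}-1)\tp\;,
\end{eqnarray*}
where the second line follows from substituting $T=\tp+\tf$ on the
right-hand side and collecting the terms in $\tf$. With $Y=X^{-1}-1$,
i.e., $X=1/(1+Y)$, the statement $G(\tp\le XT)=X$ becomes
$G(\tf\ge Y\tp)=1/(1+Y)$.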
227:
228: The alluring simplicity of the delta-$t$ argument means that we need
229: an equally simple way of investigating its validity and
230: interpretation. In any probabilistic analysis, you start with a
231: prior probability density, in this case a distribution $w(T)$, which
232: gives the probability $w(T)\,dT$ that the phenomenon's total duration
233: lies in the interval between $T$ and $T+dT$. This prior probability
234: density is based on whatever information or data you have about the
235: phenomenon before observing it. To formulate the problem in terms of
236: the temporal Copernican principle, the description in terms of
237: duration $T$ must be supplemented by introducing an additional
238: temporal variable. In doing so, it is convenient to set the
239: arbitrary zero of time at the present, i.e., at the time you observe
240: the phenomenon, and to let $t_0$ denote the time when the phenomenon
241: starts. With these choices, the present age is $\tp=-t_0$, and the
242: future duration is $\tf=T+t_0$. The phenomenon is now characterized
243: by two variables, either $t_0$ and $T$ or $\tp$ and $\tf$.
244:
245: The temporal Copernican principle---that you are not at a special
246: time relative to the phe\-no\-menon---is implemented by saying that
247: all starting times are equally likely, independent of duration~$T$.
248: More precisely, one requires that the joint probability density be
249: invariant under time translations,$^{18}$ which yields the unique
250: probability density
251: \begin{equation}
252: p(\tp,\tf)=p(t_0,T)=\gamma w(T)\;,
253: \end{equation}
254: where $\gamma$ is a constant, describing a uniform distribution for
255: the starting time. That this distribution cannot be normalized turns
256: out not to be a problem, but it can be dealt with at this stage, if
257: desired, by cutting off the distribution at very large negative and
258: positive values of~$t_0$.
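
That the same function serves as the density in both pairs of variables
can be checked from the change of variables $\tp=-t_0$, $\tf=T+t_0$,
whose Jacobian determinant has unit magnitude:
\[
\left|{\partial(\tp,\tf)\over\partial(t_0,T)}\right|
=\left|\det\pmatrix{-1&0\cr1&1\cr}\right|=1\;.
\]
No extra factor therefore appears in passing from $(t_0,T)$ to
$(\tp,\tf)$, and the density depends on the new variables only through
the sum $T=\tp+\tf$.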
259:
260: It is instructive to think about the joint probability density in
261: terms of an ensemble made up of many instances of the same
262: phenomenon. We can picture this ensemble, which I call an {\it
263: unrestricted Copernican ensemble}, as a population distributed in a
264: plane whose horizontal axis is labeled by $\tp$ and whose vertical
265: axis is labeled by $\tf$. The population density is proportional to
266: the probability density $p(\tp,\tf)=\gamma w(\tp+\tf)$. The
267: Copernican plane is depicted in Fig.~\ref{fig1}.
268:
269: \begin{figure}[t]
270: \begin{center}
271: \includegraphics[height=12cm]{fig1}
272: \end{center}
273: \vspace{-24pt}
274: \caption{The $\tp$-$\tf$ plane on which the {\it unrestricted
275: Copernican ensemble\/} resides.
276: \label{fig1}}
277: \end{figure}
278:
279: The duration $T=\tp+\tf$ labels an axis that points symmetrically
280: into the first quadrant. The constraint of nonnegative durations
281: means that the ensemble occupies the upper right half-plane, which
282: splits naturally into three regions:
283: \begin{enumerate}
284: \item{The upper left wedge ($\tp=-t_0<0$), in which the phenomenon
285: has not yet begun.}
286: \item{The lower right wedge ($\tf<0$), in which the phenomenon is over.}
287: \item{The first quadrant ($\tp\ge0$, $\tf\ge0$), in which the phenomenon is
288: in progress.}
289: \end{enumerate}
290: There are two instructive ways of dividing the unrestricted
291: Copernican ensemble into subensembles. First, for each starting time
292: $t_0=-\tp$, the population along the associated vertical line is the
293: subensemble of durations for a phenomenon that starts at $t_0$. The
294: translational symmetry of the Copernican ensemble means that all
295: these {\it unrestricted vertical subensembles\/} describe the same
296: distribution of durations, given by the prior density $w(T)$. Second,
297: population is distributed uniformly along the diagonal lines of
298: constant duration $T$, each of which can be called an {\it
299: unrestricted diagonal subensemble}.
300:
301: \begin{figure}[t]
302: \begin{center}
303: \includegraphics[height=10cm]{fig2}
304: \end{center}
305: \vspace{-24pt} \caption{First quadrant, on which resides the {\it
306: (truncated) Copernican ensemble}, which applies to a phenomenon in
307: progress. The truncated Copernican ensemble is obtained by lopping
308: off Regions~1 and~2 of the unrestricted Copernican ensemble. The
309: Copernican ensemble is an idealized sample of phenomena with
310: uniformly random starting times $t_0$ (or present ages $\tp=-t_0$)
311: and with duration distributed along each vertical subensemble
312: according to the prior density~$w(T)$. Since the starting time is
313: uniformly random, population is distributed uniformly along each
314: diagonal subensemble. Gott's delta-$t$ argument is that the fraction
315: of population with $\tf\ge Y\tp$ within each diagonal subensemble is,
316: by the elementary geometry illustrated in the figure, $1/(1+Y)$.
317: This fraction being the same for each diagonal subensemble, it also
318: applies to the entire Copernican ensemble, giving Gott's
319: rule~(\protect\ref{eq:Grule}). The content of Gott's rule is the
320: trivial statement that a fraction $X$ of the members in the
321: Copernican ensemble have an age less than a fraction $X$ of their
322: eventual duration. This trivial statement does not authorize any
323: prediction of future duration based on present age because the
324: present age is unknown. Once the present age is known, predictions
325: of future duration are made within the vertical subensemble
326: corresponding to the observed age and thus are governed by the prior
327: density $w(T)$, but with those durations ruled out by the observed
328: age discarded.\label{fig2}}
329: \end{figure}
330:
331: To discuss your observation requires taking into account that you are
332: only interested in the situation, denoted by $I$, where you find the
333: phenomenon to be in progress. Imposing this condition requires you
334: to lop off the regions of the unrestricted Copernican ensemble that
335: correspond to the phenomenon not having begun or having finished
336: [Regions~1 and~2 of Fig.~\ref{fig1}]. This leaves the {\it
337: (truncated) Copernican ensemble\/} depicted in Fig.~\ref{fig2}, which
338: occupies the first quadrant of the $\tp$-$\tf$ plane. The
339: probability density for the truncated Copernican ensemble is given by
340: \begin{equation}
341: p(\tp,\tf|I)={w(\tp+\tf)\over\overline T}\;,\quad\mbox{$\tp\ge0$, $\tf\ge0$,}
342: \label{eq:joint}
343: \end{equation}
344: where $\overline T$ is a normalization constant equal to the mean
345: value of the total duration with respect to~$w(T)$. Truncating the
346: unrestricted Copernican ensemble also truncates the unrestricted
347: diagonal and vertical subensembles. In the following, the
348: designation ``truncated'' is often omitted; an undesignated ensemble
349: is always the truncated one.
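
The identification of $\overline T$ with the mean duration follows from
switching integration variables from $(\tp,\tf)$ to $(\tp,T)$. This
short check assumes only that $w(T)$ is normalized and has a finite
mean:
\[
\int_0^\infty d\tp\int_0^\infty d\tf\,w(\tp+\tf)
=\int_0^\infty dT\,w(T)\int_0^T d\tp
=\int_0^\infty dT\,T\,w(T)=\overline T\;.
\]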
350:
351: A {\it diagonal subensemble\/} lives on a diagonal line of constant
352: $T$. Along the diagonal line, population is distributed uniformly,
353: and the total population is weighted by $Tw(T)$. The Copernican
354: principle is the statement that the population within each diagonal
355: subensemble is distributed uniformly, with no bias toward the past or
356: the future, as is expressed by the fact that the joint
357: density~(\ref{eq:joint}) depends only on $T$. A {\it vertical
358: subensemble\/} lives on a line of constant $t_p$; it has the same
359: population density as the corresponding unrestricted vertical
360: ensemble, except that durations that correspond to the phenomenon's
361: having already finished, $T<\tp$ ($\tf<0$), are not part of the
362: ensemble and have no population. The Copernican ensemble is an
363: idealization of a sample of phenomena in progress, with random
364: starting times and durations distributed according to the prior
365: density $w(T)$.
366:
367: Now suppose you ask for the probability that $\tf\ge Y\tp$ for a
368: phenomenon selected from the truncated Copernican ensemble. Within
369: each diagonal subensemble this probability is given by the fraction
370: of the length of the diagonal line that lies above the line
371: $\tf=Y\tp$ shown in Fig.~\ref{fig2}. This fraction, from elementary
372: geometry, is $1/(1+Y)$; this is the delta-$t$ argument. Since
373: this fraction is the same for all the diagonal subensembles, it gives
374: the probability that $\tf\ge Y\tp$ within the entire truncated
375: Copernican ensemble. The result is Gott's rule~(\ref{eq:Grule}),
376: written here as
377: \begin{equation}
378: P(\tf\ge Y\tp|I)={1\over1+Y}\;.
379: \end{equation}
380: Notice that this probability is independent of the prior density
381: $w(T)$; it is wholly determined by the time-translation symmetry of
382: the Copernican ensemble. The rule is particularly easy to understand
383: for the case $Y=1$: half the members of the Copernican ensemble lie
384: above (below) the line $\tf=\tp$ and thus have a future duration that
385: is greater than (less than) their present age. We conclude that
386: Gott's rule {\it is\/} a universal expression of the Copernican
387: principle for a phenomenon drawn from the entire Copernican ensemble,
388: i.e., for a phenomenon known to be in progress, but whose present age
389: is unknown.
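
The geometric statement can also be written as an integral. On the
diagonal of duration $T$, the line $\tf=Y\tp$ is crossed at
$\tp=T/(1+Y)$, so the condition $\tf\ge Y\tp$ selects
$0\le\tp\le T/(1+Y)$. Using the joint density~(\ref{eq:joint}), one
finds
\[
P(\tf\ge Y\tp|I)
=\int_0^\infty dT\,{w(T)\over\overline T}\int_0^{T/(1+Y)}d\tp
={1\over1+Y}\int_0^\infty dT\,{T\,w(T)\over\overline T}
={1\over1+Y}\;,
\]
whatever the prior density $w(T)$, provided its mean $\overline T$ is
finite.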
390:
391: Gott's rule as a universal expression of the Copernican principle has
392: precisely the content that a fraction $X$ of the members in the
393: Copernican ensemble have an age less than a fraction $X$ of their
394: eventual duration. This trivial conclusion is what the Copernican
395: principle tells you: you know a phenomenon is in progress, but you
396: know neither when it started nor when it will end, so you judge
397: yourself equally likely to be at any point in the phenomenon's life.
398: This trivial conclusion is of very little interest because, with the
399: present age unknown, the rule has no predictive power. What
400: attracts attention to Gott's work is that he repeatedly uses his rule
401: in a different way, to make probabilistic predictions of the future
402: longevity of particular phenomena whose present age is known.
403:
404: \begin{figure}
405: \begin{center}
406: \includegraphics[height=14cm]{fig3}
407: \end{center}
408: \vspace{-18pt}
409: %
410: \caption{(a)~Unrestricted vertical subensemble, in which population
411: is distributed according to the prior density~$w(T)$.
412: (b)~Unrestricted Copernican ensemble, created from many copies of the
413: unrestricted vertical ensemble (fifteen copies, including the
414: vertical axis, are shown), each corresponding to a different starting
415: time $t_0=-\tp$. Gott's Copernican principle is the statement that
416: all starting times are equally likely. (c)~(Truncated) Copernican
417: ensemble, which describes phenomena in progress. It is created by
418: removing from the unrestricted Copernican ensemble the regions that
419: correspond to phenomena not yet begun and already completed
420: (Regions~1 and 2 of Fig.~\ref{fig1}). In particular, each vertical
421: subensemble is truncated by removing the part with $T<\tp$ ($\tf<0$).
422: Population is distributed uniformly along the diagonal subensembles
423: of constant total duration~$T$, one of which is shown. (d)~Vertical
424: subensemble chosen by an observation of present age $\tp$.
425: Predictions within this vertical subensemble are governed by a
426: renormalized prior density, $w(T)/\Pi(\tp)$, with durations ruled out
427: by the observed age omitted. Steps~(b) and~(c) of this process can
428: be short-circuited by going directly from~(a) to~(d). Imagining many
429: copies of the unrestricted vertical ensemble, as is done in
430: implementing the temporal Copernican principle and thus constructing
431: the Copernican ensemble, or even having an approximation to the
432: Copernican ensemble available cannot increase your power to predict
433: the future duration of a phenomenon with a particular present age.
434: This is particularly clear in the special case of a phenomenon whose
435: total duration $T$ is known in advance, so that only one diagonal
436: subensemble is populated, say, the one shown in~(c). At the stage of
437: the truncated Copernican ensemble in~(c), the present age and future
438: duration are strictly correlated, but randomly distributed within the
439: interval $[0,T]$, thus giving Gott's delta-$t$ argument. Once you
440: observe the present age, however, the future duration is known and is
441: certainly not governed by Gott's rule.
442: %
443: \label{fig3}}
444: \end{figure}
445:
446: We thus need to determine what you can say when you discover the
447: present age. Your probabilistic predictions are then determined by
448: the distribution of population within the vertical subensemble whose
449: members have the observed present age. It is clear from
450: Fig.~\ref{fig2} that the probability density for future duration
451: within this subensemble---this is the conditional probability density
452: for $\tf$ given $\tp$---is proportional to $w(\tp+\tf)$. Properly
453: normalized, this conditional density becomes
454: \begin{equation}
455: p(\tf|\tp,I)=w(\tp+\tf)/\Pi(\tp)\;,\quad\mbox{$\tf\ge0$,}
456: \label{eq:cond}
457: \end{equation}
458: where the normalization constant,
459: \begin{equation}
460: \Pi(\tp)=\int_0^\infty d\tf\,w(\tp+\tf)=\int_{\tp}^\infty dT\,w(T)\;,
461: \end{equation}
462: is the survival probability, i.e., the probability for the phenomenon
463: to survive at least a time $\tp$. The conditional probability
464: density~(\ref{eq:cond}) gives the probabilities you should use for
465: making predictions of future duration based on present age. It has a
466: very simple interpretation: once you determine the present age, you
467: rule out total durations shorter than the observed age, and you use
468: the prior density, suitably renormalized, for total durations longer
469: than the observed age. This is what you would have done had you not
470: bothered to introduce the Copernican ensemble, but rather worked
471: directly within an unrestricted vertical ensemble.$^7$
472:
473: The process of constructing an unrestricted Copernican ensemble,
474: truncating to account for the phenomenon's being in progress, and
475: observing the present age is depicted in Fig.~\ref{fig3}.
476:
477: One way to construct the vertical subensemble for present age $\tp$
478: is to select, from each diagonal subensemble with $T\ge\tp$, the
479: subpopulation that has age $\tp$. That population is distributed
480: uniformly within the rest of each diagonal subensemble is irrelevant
481: to the statistics of a phenomenon drawn from a vertical subensemble.
482: This is why the Copernican principle has no bearing on predictions of
483: future duration based on present age. Indeed, once you discover the
484: present age, the probability that $\tf\ge Y\tp$ is
485: \begin{equation}
486: P(\tf\ge Y\tp|\tp,I)
487: =\int_{Y\tp}^\infty d\tf\,p(\tf|\tp,I)
488: ={\Pi\Bigl((1+Y)\tp\Bigr)\over \Pi(\tp)}\;.
489: \label{eq:rightrule}
490: \end{equation}
491: This is the predictive form of the desired probability, predictive
492: because it is conditioned on the present age. It is determined
493: completely by the prior density and coincides with Gott's
494: rule~(\ref{eq:Grule}) only for a special choice of prior density,
495: which is identified in Sec.~\ref{sec:Bayesian} and discussed further
496: in Sec.~\ref{sec:conclusion}. We conclude that Gott's rule should
497: not be used indiscriminately to make probabilistic predictions of
498: future duration based on present age.
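
As an illustration of how Eq.~(\ref{eq:rightrule}) depends on the prior,
consider an exponential prior with mean lifetime $\tau$. This choice,
and the symbol $\tau$, are assumptions made here purely for the sake of
example. Then
\[
w(T)={e^{-T/\tau}\over\tau}\;,\qquad
\Pi(\tp)=e^{-\tp/\tau}\;,\qquad
P(\tf\ge Y\tp|\tp,I)=e^{-Y\tp/\tau}\;.
\]
The probability of surviving a further time equal to the present age
($Y=1$) is $e^{-\tp/\tau}$, which is close to 1 for $\tp\ll\tau$ and
close to 0 for $\tp\gg\tau$. It equals the value $1/2$ given by Gott's
rule only at the single age $\tp=\tau\ln2$.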
499:
500: All your prior information about a phenomenon's total duration is
501: incorporated in the prior density $w(T)$. Often you can improve your
502: predictions of future longevity by studying a phenomenon as it
503: progresses, gathering information about its particular history. In
504: the absence of gathering additional information, however, all
505: predictions about future longevity must arise from the prior density.
506: That Gott's rule, as it comes from the delta-$t$ argument, is
507: independent of the prior density is a dead give-away that it has no
508: predictive power. Since any prior density can be embedded in a
509: Copernican ensemble, it is clear that the Copernican principle does
510: not restrict the prior density in any way and thus is irrelevant to
511: predicting future longevity.
512:
513: \section{Bayesian analysis of Gott's rule}
514: \label{sec:Bayesian}
515:
516: Gott has endorsed$^{11}$ a Bayesian derivation of his rule, which was
517: introduced by Buch$^{12}$ in the only technical comment {\it
518: Nature\/} has published on Gott's original article. The input to
519: Buch's analysis is the prior density $w(T)$ and the assertion that
520: given the duration~$T$, present age $\tp$ is uniformly distributed
521: within the interval $[0,T]$:
522: \begin{equation}
523: q(\tp|T)=\cases{
524: 1/T\;,&$0\le\tp\le T$,\cr
525: 0\;,&$\tp>T$.}
526: \label{eq:qtpT}
527: \end{equation}
528: A simple application of Bayes's rule gives
529: \begin{equation}
530: q(T|\tp)={q(\tp|T)w(T)\over q(\tp)}=
531: \cases{
532: w(T)/[Tq(\tp)]\;,&$T\ge\tp$,\cr
533: 0\;,&$T<\tp$,
534: }
535: \label{eq:qTtp}
536: \end{equation}
537: where
538: \begin{equation}
539: q(\tp)=\int_{\tp}^\infty dT\,{w(T)\over T}
540: \end{equation}
541: is the unconditioned probability density for present age $\tp$. The
542: conditional probability that $\tf\ge Y\tp$, given $\tp$, takes the
543: form
544: \begin{equation}
545: Q(\tf\ge Y\tp|\tp)=
546: \int_{(1+Y)\tp}^\infty dT\,q(T|\tp)=
547: {q\Bigl((1+Y)\tp\Bigr)\over q(\tp)}\;.
548: \end{equation}
549: If you use the (unnormalizable) prior density $w(T)=1/T$, this result
550: reduces to Gott's rule, in a predictive form:
551: \begin{equation}
552: Q(\tf\ge Y\tp|\tp)={1\over1+Y}\;.
553: \end{equation}
554: The prior $w(T)=1/T$, called the {\it Jeffreys prior},$^{18}$ has the
555: unique status of being the only distribution on the interval
556: $[0,\infty)$ that is invariant under scale changes. Thus this
557: Bayesian derivation concludes with the appealing result that Gott's
558: rule, as a genuinely predictive rule for future duration given
559: present age, follows from assuming a prior that has no built-in time
560: scales.
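
Within this derivation, the reduction to Gott's rule can be verified in
one line. With $w(T)=1/T$,
\[
q(\tp)=\int_{\tp}^\infty{dT\over T^2}={1\over\tp}\;,\qquad
{q\bigl((1+Y)\tp\bigr)\over q(\tp)}={\tp\over(1+Y)\tp}={1\over1+Y}\;.
\]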
561:
562: The only problem with this neat conclusion is that this Bayesian
563: derivation is dead wrong. This is evident from the
564: posterior~(\ref{eq:qTtp}), which is not just the original prior with
565: excluded durations given zero probability, as in the process of
566: lopping off the already completed phenomena from the unrestricted
567: ensembles to get the truncated ensembles. The analysis gets right
568: that the posterior probability is zero for durations $T<\tp$ that are
569: ruled out by the observation of present age $\tp$, but it doesn't use
570: a renormalized version of the prior density for the durations that
571: are still allowed, i.e., for $T\ge\tp$. This must be wrong because
572: your prior density $w(T)$ already contains your entire judgment about
573: the future duration of the phenomenon should it survive to age $\tp$.
574: In the absence of getting additional information, there is nothing to
575: justify changing your judgment about future duration when you learn
576: that the phenomenon has indeed survived to age~$\tp$.
577:
578: The question then is where this apparently innocuous Bayesian
579: analysis goes wrong. It is not hard to determine that. The error
580: lies in using the uniform conditional probability density $q(\tp|T)$
581: of Eq.~(\ref{eq:qtpT}) in conjunction with the prior density $w(T)$.
582: Within the unrestricted Copernican ensemble, where it is correct to
583: use $w(T)$, learning the duration $T$ tells you nothing about the
584: present age, as is evident from considering the unrestricted diagonal
585: subensemble in Fig.~\ref{fig1}. This is confirmed by a trivial
586: application of Bayes's rule to the uncorrelated variables $\tp$
587: and~$T$: $p(\tp|T)=p(t_0,T)/w(T)=\gamma$. It is simply not
588: consistent with the unrestricted Copernican ensemble to use the
589: uniform conditional probability density~(\ref{eq:qtpT}).
590:
591: The natural thing then is to try the truncated Copernican ensemble of
592: Fig.~\ref{fig2}, which applies once you know the phenomenon is in
593: progress. Then it is correct to use a uniform conditional density
594: for $\tp$, i.e.,
595: \begin{equation}
596: p(\tp|T,I)=\cases{
597: 1/T\;,&$0\le\tp\le T$,\cr
598: 0\;,&$\tp>T$,}
599: \end{equation}
600: as is evident from considering the truncated diagonal subensemble in
601: Fig.~\ref{fig2}, but it is not correct to use the prior
602: density~$w(T)$. Once you know the phenomenon is in progress, you must
603: weight $w(T)$ by a factor of $T$, which comes from the ``lengths'' of
604: the truncated diagonal subensembles being proportional to $T$.
605: Formally, one has
606: \begin{equation}
607: p(T|I)=\int d\tp\,d\tf\,p(\tp,\tf|I)\delta(T-\tp-\tf)={Tw(T)\over\overline T}\;.
608: \label{eq:pTI}
609: \end{equation}
610: The factor of $T$ here is not optional. It is {\it required\/} once
611: you have decided to describe the phenomenon in terms of two temporal
612: variables and to impose the time-translation symmetry of the
613: Copernican principle on the joint probability density. To put it
614: more succinctly, it is required once you decide to use an ensemble of
615: phenomena with random starting times.
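
As a minimal sketch of where the factor of $T$ comes from, assume
that the truncated Copernican density has the form
$p(\tp,\tf|I)=w(\tp+\tf)/\overline T$, with
$\overline T=\int_0^\infty dT\,T\,w(T)$ the mean duration under
$w(T)$; this explicit form is an assumption of the sketch, chosen to
be consistent with Eq.~(\ref{eq:pTI}). The delta function removes
the integral over $\tf$ and confines $\tp$ to the interval $[0,T]$,
leaving
\begin{equation}
p(T|I)=\int_0^T d\tp\,{w(T)\over\overline T}={Tw(T)\over\overline T}\;,
\end{equation}
which is normalized because $\int_0^\infty dT\,T\,w(T)=\overline T$.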
616:
617: Once one realizes that the factor of $T$ is present in $p(T|I)$,
618: the Bayesian inference of Eq.~(\ref{eq:qTtp}) is replaced by
619: \begin{equation}
620: p(T|\tp,I)={p(\tp|T,I)p(T|I)\over p(\tp|I)}=
621: \cases{
622: w(T)/\Pi(\tp)\;,&$T\ge\tp$,\cr
623: 0\;,&$T<\tp$,
624: }
625: \end{equation}
626: since the probability density of $\tp$ is given by
627: \begin{equation}
628: p(\tp|I)=\int_0^\infty d\tf\,p(\tp,\tf|I)={\Pi(\tp)\over\overline T}\;.
629: \end{equation}
630: This correct Bayesian analysis is thus in accord with the obvious
631: inference of truncating the unrestricted vertical ensemble to get the
632: conditional probability density for $T$, given $\tp$.
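
To spell out the intermediate step, take $\Pi(\tp)$ to be the
survival probability $\int_{\tp}^\infty dT\,w(T)$ (assumed here to
match its earlier definition). For $T\ge\tp$ the factors combine as
\begin{equation}
{p(\tp|T,I)p(T|I)\over p(\tp|I)}=
{1\over T}\,{Tw(T)\over\overline T}\,{\overline T\over\Pi(\tp)}=
{w(T)\over\Pi(\tp)}\;,
\end{equation}
so the factor of $T$ in $p(T|I)$ exactly cancels the $1/T$ of the
uniform conditional density. As a purely illustrative case, not one
treated above, an exponential prior $w(T)=e^{-T/\tau}/\tau$ with an
assumed time scale~$\tau$ has $\Pi(\tp)=e^{-\tp/\tau}$ and gives
\begin{equation}
p(T|\tp,I)={e^{-(T-\tp)/\tau}\over\tau}\;,\qquad T\ge\tp\;,
\end{equation}
so that the future duration $\tf=T-\tp$ has median $\tau\ln2$
whatever the present age.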
633:
634: Because of the additional factor of $T$ in this correct analysis, the
635: (unnormalizable) prior density that gives a predictive version of
Gott's rule turns out to be $w(T)=1/T^2$. This prior density plays a
special role in this problem because it yields the unique distribution
on the first quadrant of the $\tp$-$\tf$ plane that is (i)~constant on
lines of constant $T$ and (ii)~invariant under simultaneous scale
changes of $\tp$ and $\tf$. Formally, with this prior, we can write
641: [see Eq.~(\ref{eq:rightrule})]
642: \begin{equation}
643: P(\tf\ge Y\tp|\tp,I)={1\over1+Y}\;,
644: \end{equation}
645: since $\Pi(\tp)=1/\tp$. Thus Gott's rule, in a predictive form,
646: emerges from a prior $w(T)=1/T^2$ that has no time scales into the
647: past or future; alternatively, one can say that this predictive form
648: of Gott's rule arises when the probability density for $T$ within the
649: truncated Copernican ensemble, i.e., $p(T|I)$ of Eq.~(\ref{eq:pTI}),
650: is the Jeffreys prior.
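
The step from $\Pi(\tp)=1/\tp$ to the predictive rule can be sketched
as follows, assuming that $\Pi(t)=\int_t^\infty dT\,w(T)$ is the
survival probability. Since $\tf\ge Y\tp$ is the same condition as
$T\ge(1+Y)\tp$,
\begin{equation}
P(\tf\ge Y\tp|\tp,I)=\int_{(1+Y)\tp}^\infty dT\,{w(T)\over\Pi(\tp)}
={\Pi\bigl((1+Y)\tp\bigr)\over\Pi(\tp)}={\tp\over(1+Y)\tp}={1\over1+Y}\;.
\end{equation}
Setting $Y=1$ gives probability $1/2$ of surviving beyond twice the
present age, while $Y=39$ and $Y=1/39$ give $1/40$ and $39/40$, so
that $\tp/39\le\tf\le39\tp$ holds with probability $0.95$.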
651:
652: \section{Conclusion}
653: \label{sec:conclusion}
654:
655: The best way to test belief in probabilistic predictions is to offer
656: a bet based on those predictions. For that purpose, I sent an e-mail
657: on 1999~October~21 and again on 1999~December~2 to my department's
658: most comprehensive e-mail alias, which included faculty, staff, and
659: graduate students, requesting information on pet dogs. The responses
660: were compiled and checked for accuracy on 1999~December~6; a
661: notarized list of the 24~dogs, including each dog's name, date of
662: birth, breed, and caretaker, was deposited in my departmental
663: personnel file on 1999~December~21. In accordance with his practice
664: for other phenomena, Gott would have made a prediction for each dog's
665: future prospects based on its age. In particular, he would have
666: predicted that each dog would survive beyond twice its age with
667: probability $1/2$.
668:
669: For the youngest and oldest dogs on the list, Gott's predictions
670: offered favorable opportunities for betting. I chose to focus on the
671: oldest dogs, and for each of the six dogs above ten years old on the
672: list, I offered$^7$ to bet Gott \$1,000\,US that the dog would not
673: survive to twice its age on 1999~December~3. To sweeten the pot, I
674: offered Gott 2:1 odds in his favor. Gott refused the bets on the
675: grounds that ``I don't do bets.''$^{19}$ If he had believed his own
676: predictions, his expected gain would have been \$3,000\,US, and the
677: probability that he would have been a net loser on the six bets was
678: 7/64=0.11. I contacted the caretakers during May and June of 2008
679: and verified that all six dogs have died. Thus, as I fully expected,
680: I would have won all the bets and been \$6,000\,US richer. Even with
681: the current reduced state of the US dollar, that would have been
682: enough to buy a very nice piece of Australian aboriginal art.
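
The arithmetic behind these figures runs as follows, taking Gott's
probability of $1/2$ for each dog and treating the six outcomes as
independent (an assumption of this sketch). At 2:1 odds, Gott stands
to win \$2,000 or lose \$1,000 on each bet, so his expected gain, and
his net gain if $k$ of the six dogs survive, are (in dollars)
\begin{equation}
6\left({1\over2}\cdot2000-{1\over2}\cdot1000\right)=3000
\qquad\mbox{and}\qquad
2000k-1000(6-k)=3000k-6000\;.
\end{equation}
The net gain is negative only for $k=0$ or $k=1$, which has
probability
\begin{equation}
{1\over2^6}\left[{6\choose0}+{6\choose1}\right]={7\over64}\simeq0.11\;.
\end{equation}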
683:
684: More revealing than Gott's blanket refusal to bet was his excuse that
685: his rule only applies to a random dog chosen from my sample,$^{19}$
686: which is another way of saying that his rule applies to a sample of
687: dogs drawn from the truncated Copernican ensemble, i.e., a sample
688: selected without regard to present age. In discussions of the 44 New
689: York plays$^{\hbox{\scriptsize2--4,6}}$ and of his own
690: longevity,$^{2,6}$ Gott has also suggested that a fair test of the
691: Copernican hypothesis should involve a large sample selected without
692: regard to present age. As we have seen, Gott is quite right on this
693: score: his rule {\it does\/} apply to a phenomenon whose present age
694: is unknown. If this were all Gott claimed, however, no one would pay
695: attention, because the universal form of his rule, applicable when
696: the present age is unknown, has no predictive power. What grabs
697: attention is that in case after case, Gott uses his rule to make
698: predictions of the future longevity of individual phenomena whose
699: present age is known. In the language of this paper, Gott makes
700: predictions for the vertical subensembles, but only wants to bet on
701: the entire Copernican ensemble.
702:
703: It is obvious that in a large sample of dogs selected without regard
704: to age, roughly half the dogs, within the inevitable statistical
705: fluctuations, will be in the first half of their lives, with the rest
706: in the second half. This is the trivial content of Gott's Copernican
707: principle. It is equally obvious that having a sample in which half
708: the dogs are in the first half of their lives does not imply that any
709: particular dog in the sample has a probability of $1/2$ to survive
710: beyond twice its present age. Yet the elementary error of making
711: this implication underlies all of Gott's predictions.
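
A hypothetical extreme case makes the point concrete. Suppose every
dog lived exactly $L$~years (an assumption made purely for
illustration), with birth dates spread uniformly in time. A sample
selected without regard to age then has ages uniform on $[0,L]$, so
half the dogs are in the first half of their lives. Yet a dog of
age~$\tp$ has $\tf=L-\tp$ with certainty, so
\begin{equation}
P(\tf\ge\tp|\tp)=\cases{
1\;,&$\tp\le L/2$,\cr
0\;,&$\tp>L/2$,
}
\end{equation}
which is never $1/2$ for any individual dog.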
712:
713: We have seen, at the end of Sec.~\ref{sec:Bayesian}, that there is a
714: particular (unnormalizable) prior density, $w(T)=1/T^2$, which does
715: give Gott's rule in a predictive form.$^7$ Although the prior
716: density~$1/T^2$ does not appear in any of Gott's publications, it has
717: a special status in that it is the unique prior density that makes
718: the Copernican probability $p(\tp,\tf|I)$ invariant under
719: simultaneous rescaling of the past and the future. Use of this prior
720: is the only license for Gott's predictions. When you can't identify
721: any time scales, Gott's rule is your best bet for making predictions
722: of future duration based on present age.
723:
724: For most phenomena, including many that Gott discusses, especially
725: those involving human institutions and creations, it is easy to
726: identify important time scales.$^7$ Although it is often difficult
727: to incorporate these time scales into a prior probability, it is
always a good idea to try. That said, formulating prior information
precisely is usually of less value than observing a phenomenon as it
progresses, since readily available current information is more
cogent than prior information for predicting the future.
733:
734: Although there is little love lost between White Sox fans and fans of
735: the Chicago Cubs, I like to think that {\sl New York Times\/} writer
736: Jim Glanz, having experienced a Sox World Series win in his lifetime,
737: sympathizes with the plight of Cubs fans, who haven't seen a World
738: Series title since 1908. Gott would predict, with 95\% confidence,
739: that they won't win a Series in the next three years, but will win
740: one before 5868. Perhaps more to the point, he would predict with
741: probability $1/2$ that they won't bring home a title in the next
742: 99~years. We are immediately skeptical of Gott's prediction. For
743: example, giving each of the 30 clubs an equal chance each year sets
744: the probability of a 99-year drought at $(29/30)^{99}=0.035$. It's
745: not that this is the ``right'' way to calculate the probability, but
746: it does show that a reasonable assumption gives quite a different
747: answer from Gott's rule.
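
These figures can be reconstructed under the assumption, suggested by
the numbers but not stated explicitly, that the drought's present age
is $\tp=99$~years, counted through the 2007 season. Gott's 95\%
limits are then
\begin{equation}
{\tp\over39}={99\over39}\simeq2.5\;\mbox{years}
\qquad\mbox{and}\qquad
39\tp=3861\;\mbox{years}\;,
\end{equation}
and $2007+3861=5868$. The equal-chance estimate is
\begin{equation}
\left({29\over30}\right)^{99}
=\exp\!\left[99\ln\!\left({29\over30}\right)\right]
\simeq e^{-3.36}\simeq0.035\;.
\end{equation}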
748:
749: The reason Gott's prediction for the Cubs is so unreasonable is that
750: there are readily identifiable time scales---the length of a typical
751: player's career, the turnover in owners and management, etc.---that
752: are well short of 99~years and suggest that the Cubs might get their
753: act together much sooner. Indeed, as of June~21, they have the best
754: record in North American baseball and are leading the National League
755: Central division. Still, Cubs fans know to keep some pessimism in
756: reserve.
757:
758: Suppose a fan at a Cubs game at Wrigley Field in Chicago got up and
759: announced to great fanfare that half the people at the game were in
the first half of their life. Everyone would yawn (except perhaps
the technically sophisticated, who might wonder whether the
attendees are a representative sample of all ages, although a ball
game is probably not a bad sample in this regard).
764:
765: Suppose, however, that the fan marched up to parents holding a
766: one-month-old infant and proclaimed, ``Gott says your baby has a
767: 2.5\% chance of dying before tomorrow's game,'' or informed the
768: 60-year-old next to him, ``Gott says you have a 50\% chance of living
769: to 120.'' Both these predictions would garner attention, as
770: applications of Gott's rule often do. The parents would probably
771: call security and ask that the fan be removed. The 60-year-old might
772: reply, with the ingrained pessimism of Cubs fans, ``God only knows,
773: but maybe if I lived to 120, I could see the Cubs win a Series.'' His
774: seatmate would pour cold water on that: ``Don't get your hopes up.
775: Gott gives, and Gott takes away. You might live to 120, but Gott
776: says there's only a 38\% chance the Cubs will win the Series by
777: then. There's only a 50\% chance they'll win before you're 160.''
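
The seatmate's numbers follow from the predictive form
$P(\tf\ge Y\tp|\tp,I)=1/(1+Y)$ of Gott's rule, taking the drought's
present age to be $\tp=99$~years (an assumption, matching the
99-year figure quoted above). Living to 120 gives the 60-year-old
another 60~years, so
\begin{equation}
P(\tf\le60|\tp=99)=1-{99\over99+60}={60\over159}\simeq0.38\;,
\end{equation}
and the even-odds point $\tf=\tp=99$~years arrives when he is
$60+99=159$, or roughly 160.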
778:
779: Gott's rule makes absurd predictions for human longevity and other
780: human activities because there are readily identifiable time scales,
781: the most obvious of which is the average human life span, that render
782: application of his rule entirely inappropriate. If he continues to
783: believe his rule makes nontrivial, universal predictions for the
784: future duration of individual phenomena, it's time he took some bets.
785:
786: \vspace{9pt}
787: \hspace{1truein}\hrulefill\hspace{1truein}
788: \vspace{9pt}
789:
790: $^1$\,J.~R. Gott~III, ``Implications of the Copernican principle for
791: our future prospects,'' {\sl Nature\/} {\bf 363}, 315 (1993).
792:
793: \vspace{4pt}
794: $^2$\,J.~R. Gott~III, ``Our future in the Universe,'' in
795: {\sl Clusters, Lensing, and the Future of the Universe}, Astronomical
796: Society of the Pacific Conference Series, Vol.~88, edited by
797: V.~Trimble and A.~Reisenegger (Astronomical Society of the Pacific,
798: San Francisco, 1996), p.~140.
799:
800: \vspace{4pt}
801: $^3$\,J.~R. Gott~III, ``A grim reckoning,'' {\sl New Scientist\/}
802: {\bf 156}\,(No.~2108), 36 (1997 November~15).
803:
804: \vspace{4pt}
805: $^4$\,J.~R. Gott~III, ``The Copernican principle and human survivability,''
806: in {\sl Human Survivability in the 21st Century}, Transactions of the
Royal Society of Canada, Series~VI, Vol.~IX, edited by D.~M.~Hayne
808: (University of Toronto Press, Toronto, 1999), p.~131.
809:
810: \vspace{4pt}
811: $^5$\,J.~R. Gott~III, ``Colonies in space; Will we plant colonies
beyond the Earth before it is too late?'' {\sl New Scientist\/}
813: {\bf 195}\,(No.~2620), 51 (2007 September~8).
814:
815: \vspace{4pt}
816: $^6$\,J.~R. Gott~III, {\sl Time Travel in Einstein's Universe\/}
817: (Houghton Mifflin, Boston, 2001), Chap.~5.
818:
819: \vspace{4pt}
820: $^7$\,C.~M. Caves, ``Predicting future duration from present age:
821: A critical assessment,'' {\sl Contemporary Physics\/} {\bf 41}, 143
822: (2000).
823:
824: \vspace{4pt}
825: $^8$\,T.~Ferris, ``How to predict everything: Has the physicist
826: J.~Richard Gott really found a way?'' {\sl The New Yorker\/}
827: {\bf 75}\,(18), 35 (1999 July~12).
828:
829: \vspace{4pt}
$^9$\,J.~Tierney, ``A survival imperative for space colonization,''
831: {\sl The New York Times\/} (2007 July~17).
832:
833: \vspace{4pt}
$^{10}$\,J.~Glanz, ``Point, counterpoint and the duration of everything,''
835: {\sl The New York Times\/} (2000 February~8).
836:
837: \vspace{4pt}
838: $^{11}$\,J.~R. Gott~III, ``Future prospects discussed---Gott replies,''
839: {\sl Nature\/} {\bf 368}, 108 (1994).
840:
841: \vspace{4pt}
842: $^{12}$\,P.~Buch, ``Future prospects discussed,'' {\sl Nature\/}
843: {\bf 368}, 107 (1994).
844:
845: \vspace{4pt}
$^{13}$\,A.~Ledford, P.~Marriott, and M.~Crowder, ``Lifetime prediction from
847: only present age: Fact or fiction?'' {\sl Physics Letters~A\/} {\bf 280},
848: 309 (2001).
849:
850: \vspace{4pt}
851: $^{14}$\,K.~D. Olum, ``The doomsday argument and the number of possible
852: observers,'' {\sl Philosophical Quarterly\/} {\bf 52}, 164 (2002).
853:
854: \vspace{4pt}
855: $^{15}$\,E.~Sober, ``An empirical critique of two versions of the doomsday
856: argument---Gott's line and Leslie's wedge,'' {\sl Synthese\/} {\bf 135},
857: 415 (2003).
858:
859: \vspace{4pt}
860: $^{16}$\,L.~Bass, ``How to predict everything: Nostradamus in the role of
861: Copernicus,'' {\sl Reports of Mathematical Physics\/} {\bf 57}, 13 (2006).
862:
863: \vspace{4pt}
864: $^{17}$\,B.~Monton and B.~Kierland, ``How to predict future duration from
865: present age,'' {\sl Philosophical Quarterly\/} {\bf 56}, 16 (2006).
866:
867: \vspace{4pt}
868: $^{18}$\,E.~T.~Jaynes, {\sl Probability Theory: The Logic of
869: Science}, edited by G.~L. Bretthorst (Cambridge University Press,
870: Cambridge, England, 2003), Chap.~12.
871:
872: \vspace{4pt}
873: $^{19}$\,``Life, longevity, and a \$6\,000 bet,'' {\sl
874: Physics World} (2000 February~11), {\tt http://physicsworld\urldot
875: com\urlslash cws\urlslash article\urlslash news\urlslash 2890}.
876: Quotes from Gott in this short article are based on a long and
877: informative document entitled ``Random observations and the
878: Copernican Principle,'' which Gott posted to PhysicsWeb early in
879: 2000~February. As of this writing, I have been unable to find this
880: long document anywhere on the web, but I have a copy that can be made
881: available on request.
882:
883: \vspace{9pt}
884: \hspace{1truein}\hrulefill\hspace{1truein}
885: \vspace{9pt}
886:
887: {\it Author's note:} This paper was originally submitted to {\sl
888: Nature\/} on 2000 April~3 and was summarily rejected on the grounds
889: that {\sl Nature\/} had already published sufficient technical
890: comment$^{12}$ on Gott's original paper. Then I forgot about it,
891: though it's not clear to me now why I didn't post it to the preprint
892: archive. That was probably just pique, to which I was more subject
893: then than now. It's just as well, because the current version is, I
894: think, considerably improved. Three circumstances prompted me to
895: revive the paper: (i)~my 2007--08 sabbatical at the University of
896: Queensland, which has given me the gift of time; (ii)~John Tierney's
897: recent {\sl New York Times\/} article,$^9$ which showed that Gott's
898: predictions still have the power to fascinate; and (iii)~a
899: conversation with B.~J. Brewer of the University of Sydney, which
900: indicated that there would be interest in a simpler explanation of my
901: {\sl Contemporary Physics\/} article.$^7$ In preparing the current
902: version, I expanded the discussion of the delta-$t$ argument with the
903: aim of making it as simple and airtight as possible, incorporated a
904: discussion of Bayesian derivations of Gott's rule, updated the
905: references, and gathered information about the six dogs' ultimate
906: fates.
907:
908: \end{document}
909: