math0409165/typ.tex
1: \documentclass[12pt,notitlepage]{article}[1995/06/26]
2: %\usepackage{a4wide,epsfig,latexsym,amsfonts,amssymb,enumerate,amsmath}
3: \usepackage{a4wide,latexsym,amsfonts,amssymb,enumerate,amsmath}
4: \usepackage[dvips]{graphics}
5: %\usepackage[pdftex]{graphicx}
6: \usepackage{natbib}
7: \usepackage[english]{babel}
8: 
9: 
10: \newtheorem{defn}{Definition}[section]
11: \newtheorem{lem}[defn]{Lemma}
12: \newtheorem{thm}[defn]{Theorem}
13: \newtheorem{cor}[defn]{Corollary}
14: \newtheorem{assu}[defn]{Assumption}
15: %Dawid's conditional independence symbol.
16: \newcommand{\cip}{\mbox{$\perp\!\!\!\perp$}}
17: \newcommand{\proof}{\bigskip\noindent{\bf Proof.\enskip}}
18: \newcommand{\proofof}[1]{\bigskip\noindent{\bf Proof of #1.\enskip}}
19: \newcommand{\Endproof}{\ \hfill$\Box$\bigskip}
20: \newcommand{\GG}{g}
21: 
22: \begin{document}
23: 
24: \title{Estimating the causal effect of a time-varying treatment on
25: time-to-event using structural nested failure time models}
26: 
27: \author{Judith Lok, Richard Gill, Aad van der Vaart and James Robins\\
28: University of Leiden, Utrecht University, Vrije Universiteit Amsterdam\\
29: and Harvard University}
30: 
31: \date{July 2003}
32: 
33: \maketitle
34: 
35: \begin{abstract}
36: In this paper we review an approach to estimating the causal
37: effect of a time-varying treatment on time
38: to some event of interest. This approach is designed for
39: the situation where the treatment may have been repeatedly adapted to
40: patient characteristics, which themselves may
41: also be time-dependent. In this situation the effect of
42: the treatment cannot simply be estimated by conditioning on the patient
43: characteristics, as these may themselves be indicators of the
44: treatment effect. This so-called time-dependent confounding is typical
45: in observational studies. We discuss a new class of failure
46: time models, structural nested failure time models, which can be used
47: to estimate the causal effect of a time-varying treatment,
48: and present methods for estimating and testing the parameters of these models. 
49: \end{abstract}
50: 
51: \section{Introduction}
52: This paper offers a new approach to estimating, from observational
53: data, the causal effect of a time-dependent treatment on time to an
54: event of interest in the presence of time-dependent confounding
55: variables. This approach is based on a new class of failure time
56: models, the \emph{structural nested failure time models} (SNFTM). The
57: primary goal of this paper is to motivate the need for structural
58: nested failure time models. To achieve this goal in the most
59: straightforward manner, we shall assume that the event times are
60: observed without censoring, and that there is no missing or
61: misclassified data. Additional complications that arise when these
62: assumptions are not satisfied are discussed in Robins et al.~(1992)
63: and Robins~(1993).
64: 
65: The approach using SNTFMs will be useful in any observational study in
66: which there exist time-dependent risk factors that are also predictive
67: for subsequent exposure to the treatment under study, i.e.\ in any
68: study where there are time-dependent covariates that correlate with
69: the final outcome of the treatment, but also with the amount or type
70: of treatment over time.  This situation arises in any observational
71: study in which there is ``treatment by indication'', i.e.\ the
72: treatment is not predetermined by the investigator, but adapted to the
73: current condition of the patient.  The problem then is to distinguish
74: between treatment effect and selection bias (i.e.\ confounding). For
75: example, in an observational study for the effect of AZT treatment on
76: HIV-infected subjects, subjects with low CD4 lymphocyte counts at a
77: given time are subsequently at increased risk of developing AIDS and
78: are for that reason more likely to be treated with AZT. Thus the
79: covariate variables ``low CD4-count'' is a risk factor for AIDS, but
80: is also a predictor of subsequent treatment with AZT. The problem is
81: then to isolate the effect of AZT treatment as given according to a
82: predetermined plan (which may take into account covariates) from the
83: confounding effect of CD4-count.  As a second example, many physicians
84: withdraw women from exogenous estrogens at the time they develop an
85: elevated blood cholesterol, since both exogenous estrogens and
86: elevated blood cholesterol are considered possible cardiac risk
87: factors. Therefore, in a study of the effect of postmenopausal
88: estrogen on cardiac mortality, the covariate variables ``cholesterol
89: level'' is a predictor of subsequent exposure to estrogens, but also
90: correlates with the outcome ``cardiac mortality''. As a third example,
91: in observational studies of the efficacy of cervical cancer screening
92: on mortality, women who have had operative removal of their cervix due
93: to invasive disease are no longer at risk for further screening (i.e.\
94: exposure), but are at increased risk for death. Therefore, the
95: covariate, ``operative removal of the cervix'', is an independent risk
96: factor for death, but also a predictor of subsequent exposure. As a
97: final epidemiologic example, in occupational mortality studies,
98: unhealthy workers who terminate employment early are at increased risk
99: of death compared to other workers and receive no further exposure to
100: the chemical agent under study. Therefore, the time-dependent
101: covariate ``employment status'' is an independent risk factor for
102: death, and a predictor of exposure to the study agent.
103: 
104: Epidemiologists refer to the covariates in the preceding
105: examples as ``time-dependent confounders''. It may be
106: important to analyze the data from any of the above studies using the
107: approach presented in this paper.
108: 
109: For pedagogic purposes, we shall illustrate our models and assumptions
110: throughout the paper by the problem of estimating, from data obtained
111: in an observational study, the effect of treatment with the drug AZT
112: on time to clinical AIDS in asymptomatic subjects with newly diagnosed
113: human immunodeficiency virus (HIV) infection. We shall suppose that
114: measurements on current AZT dosage as well as on various
115: time-dependent covariates, such as weight, temperature, hematocrit,
116: and CD4-lymphocyte count, are recorded at regularly spaced time points,
117: until the development of clinical AIDS. These time points, which we
118: denote by $0=\tau_0<\tau_1<\tau_2<\cdots<\cdots<\tau_K$, may for
119: instance correspond to clinic visits at which the measurements are
120: obtained, with time defined as time since the diagnosis of HIV
121: infection.
122: 
123: Our goal will be to identify and estimate, for each \emph{treatment
124: regime}, the time-to-AIDS distribution that would have been observed
125: if (typically \emph{counter to fact}) each study subject had followed
126: the AZT treatment history prescribed by the regime. We shall call each
127: such distribution an AZT treatment regime-specific, counterfactual,
128: time-to-AIDS distribution. The treatment regimes we study need not be
129: static. A \emph{treatment regime} is a rule that assigns to each
130: possible covariate history through time $\tau_k$, an AZT dosage rate
131: $a_k$ to be taken in the interval $\left(\tau_k,\tau_{k+1}\right]$. A
132: simple example of a treatment regime is ``take an AZT dosage $a_k$ of
133: $1,000$ milligrams of AZT daily in the interval
134: $\left(\tau_k,\tau_{k+1}\right]$ if the hematocrit measured at
135: $\tau_k$ exceeds $30$; otherwise take no AZT in the interval''.
136: 
137: Our interest in AZT treatment regime-specific, counterfactual
138: time-to-AIDS distributions is based on the following
139: considerations. Suppose, after the completion of the study, a further
140: individual with newly diagnosed HIV infection, whom we shall call
141: ``the infected subject'', wishes to use the data from the completed
142: study to select the AZT dosage schedule that will maximize his
143: expected or median number of years of AIDS-free survival. If the
144: ``infected subject'' is considered exchangeable with the subjects in
145: the trial, then he would wish to follow the AZT treatment regime whose
146: regime-specific, counterfactual time-to-AIDS distribution has the
147: largest expected or median value.
148: 
149: In Section~\ref{Gcomps} we show that the AZT treatment
150: regime-specific, counterfactual time-to-AIDS distributions are
151: identified from the observed data under the assumption that the
152: investigator has succeeded in recording sufficient data on the history
153: of all covariates to ensure that, at each time $\tau_k$, given the
154: covariate history and the AZT treatment history up till $\tau_k$, the
155: AZT dosage rate in $\left(\tau_k,\tau_{k+1}\right]$ is independent of
156: the regime-specific, counterfactual time-to-AIDS. Robins~(1992)
157: refers to this assumption as the assumption of \emph{no unmeasured
158: confounding factors}. In other words, under this assumption at each
159: time point the treatment can be viewed as depending only on recorded
160: information up till that point and external factors that are not
161: predictive of (counterfactual) survival.
162: 
163: In Section~\ref{repars} we introduce \emph{structural nested failure
164: time models (SNFTM)}. An SNFTM models the magnitude of the causal
165: effect of a (final) blip of AZT treatment in the interval
166: $\left(\tau_k,\tau_{k+1}\right]$ on time-to-AIDS, as a function of
167: past AZT and covariate history.  We show that, under the assumption of
168: no unmeasured confounding, the null hypothesis of no causal effect of
169: AZT on time-to-AIDS is equivalent to the null hypothesis that the
170: parameter vector of any SNFTM is $0$. 
171: 
172: The term ``structural'' in SNFTM derives terminology used
173: in the social science and econometric literature (e.g.\ Rubin~(1978)).
174: Our models are ``structural'', because they
175: directly model regime-specific, counterfactual time-to-AIDS
176: distributions. In Sections~\ref{mlesec} and~\ref{Gest} we discuss
177: two different methods to fit SNFTMs and to use them for inference.
178: 
179: In Section~\ref{mlesec} we show that, under the assumption of no
180: unmeasured confounding, SNFTMs can be understood as a component of a
181: particular reparameterization of the joint distribution of the
182: observables. We use this reparameterization to develop
183: likelihood-based tests of the causal null hypothesis of no effect of
184: AZT-exposure on time-to-AIDS. We also show how to estimate the
185: AZT-treatment regime-specific, counterfactual time-to-AIDS
186: distributions, in the case that the null hypothesis of no causal
187: effect of AZT on time-to-AIDS is rejected.
188: 
189: In Section~\ref{Gest} we present an alternative, semiparametric
190: approach to test the null hypothesis of no treatment effect and to
191: estimate the parameters in an SNFTM. This approach,
192: \emph{G--estimation}, has the advantage of avoiding for
193: parameterization of the distributions appearing in the
194: likelihood-based approach of Section~\ref{mlesec} (e.g. the
195: conditional distributions of covariates given past treatment- and
196: covariate history). Instead G--estimation uses a model
197: for the SNFTM and for the conditional distribution of treatment given
198: past treatment- and covariate history. Tests and estimators based on
199: G-estimation have the additional advantage that they can often be
200: calculated with standard software.
201: 
202: \section{Formalization of the problem}
203: We fix a discrete time frame $\tau_0=0<\tau_1<\tau_2<\ldots<\tau_K$
204: throughout the paper, where $\tau_0$ is the time of enrollment in the
205: study (and possibly also initiation of treatment), $\tau_1,\tau_2,\ldots$ are
206: the times of the clinic visits, and $\tau_K$ can be the time of the
207: last clinic visit, or can be chosen past the upper support point of
208: the time-to-AIDS distribution. For simplicity the times of the clinic
209: visits are assumed to be the same for all patients (as long as they
210: are alive).
211: 
212: At each time point $\tau_k$ we measure a covariate vector $L_k$ for
213: each patient, where $L_0$ may also contain time-independent covariates
214: and information collected before time $\tau_0$, and we register the
215: treatment given in the interval $(\tau_k, \tau_{k+1}]$ in a variable
216: $A_k$, for instance the AZT dosage, assumed constant during the
217: interval. Besides covariates $L_k$ and treatments $A_k$, we observe
218: for each person a positive time $T$, for instance the time from
219: enrollment to the development of clinical AIDS. Thus the data observed
220: on one person is a vector $(\overline L_K,\overline A_K,T)$, where,
221: for each $k=0,1,\ldots,K$,
222: \begin{eqnarray*}
223: \overline L_k&=&(L_0,L_1,\ldots, L_k),\\
224: \overline A_k&=&(A_0,A_1,\ldots, A_k).
225: \end{eqnarray*}
226: For time instances $\tau_k>T$ the values $L_k$ and $A_k$ may be
227: interpreted to be empty. For simplicity we assume that the variables
228: $L_k$ and $A_k$ take their values in countable sets, denoted by
229: ${\mathcal L}_k$ and ${\mathcal A}_k$. The total set of observations
230: are a sample of $n$ independent and identically distributed (i.i.d.)
231: observations from the distribution of the random vector $(\overline
232: L_K,\overline A_K,T)$.
233: 
234: As is clear from the preceding display we use the overline notation
235: $\overline{}$ to denote a ``cumulative vector''. For simplicity of
236: notation, it will be understood that whenever two expressions such as
237: $\overline l_k$ and $\overline l_{k-1}$ occur together, then
238: $\overline l_{k-1}$ is the initial part of $\overline l_k$.
239: 
240: A ``treatment regime'' is a prescription for the treatment dosages
241: fixed at the times $\tau_k$, where at each time instant the prescribed
242: treatment may depend on the observed covariate history until this
243: time. We make this precise in the following definition.
244: 
245: \begin{defn} \emph{(treatment regimes).}
246: A treatment regime $\GG$ is a vector  $\GG=(\GG_0,\ldots, \GG_K)$
247: of functions 
248: $\GG_k: {\mathcal L}_0\times\cdots\times{\mathcal L}_k\to {\mathcal A}_k$.
249: \end{defn}
250: 
251: The value $a_k=\GG_k(\overline l_k)$ of the $k$th coordinate 
252: of the treatment regime $\GG$ at covariate $\overline l_k$
253: is interpreted as the dosage
254: prescribed by treatment regime $\GG$ in the interval $(\tau_k,\tau_{k+1}]$
255: to a patient with covariate history $\overline l_k$ following 
256: this regime (up to time $\tau_k$). The treatment at time $\tau_k$
257: may depend on the full covariate history $\overline l_k=(l_0,\ldots,l_k)$
258: until time $\tau_k$, not just on $l_k$.
259: We define maps 
260: $\overline g_k: {\mathcal L}_0\times\cdots\times{\mathcal L}_k\to 
261: {\mathcal A}_0\times\cdots\times{\mathcal A}_k$ by
262: $$\overline g_k(\overline l_k) =\bigl(g_0(l_0),g_1(\overline
263: l_1),\ldots, g_k(\overline l_k)\bigr).$$ 
264: To alleviate notation we may drop the subscripts $k$ or the overline
265: in $g_k$ or $\overline g_k$ if the value of $k$ is clear from the
266: context.  In particular $g(\overline l_K)=\overline g(l_K)= \overline
267: g_K(\overline l_K)$ are equivalent notations for the complete
268: treatment history.
269: 
270: We wish to study the effect of treatment using the observed data.
271: Depending on this data not all treatment regimes may be accessible to
272: analysis. We call a treatment regime ``evaluable'' (relative to the
273: distribution of the data vector $(\overline L_K,\overline A_K,T)$) if
274: whenever the regime was followed until some time $\tau_k$ by some
275: positive fraction of the population, then it is also followed in the
276: interval $(\tau_k, \tau_{k+1}]$.
277: 
278: \begin{defn} \emph{(evaluable treatment regimes).}
279: A treatment regime $\GG$ is called evaluable if for each $k$ and
280: each $\overline{l}_k\in\overline{{\mathcal L}}_k$,
281: \begin{equation*}
282: P\left(\overline{L}_k=\overline{l}_k,\overline{A}_{k-1}=
283: \overline{\GG}\left(\overline{l}_{k-1}\right),T>\tau_k\right)>0 \Rightarrow
284: P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=
285: \overline{\GG}\left(\overline{l}_k\right),T>\tau_k\right)>0.
286: \end{equation*}
287: \end{defn}
288: 
289: Next we introduce \emph{counterfactual variables}. These will be
290: instrumental both to express the aims of the statistical analysis, and
291: to formulate our assumptions. In our mathematical model the
292: counterfactual variables are ordinary random variables $T^\GG$,
293: one for each treatment regime $\GG$, that are assumed to be defined on the
294: same probability space as the data vector $(\overline L_K,\overline
295: A_K,T)$.  The variable $T^{\GG}$ should be thought of as a patient's
296: time to clinical AIDS had she been treated according to treatment
297: regime $\GG$. Because in actual fact the patient receives treatment
298: $\bar A_K$ (resulting in time to aids $T$), the variable $T^\GG$ is ``counter
299: to fact''. However, it gives a useful notation to express the
300: distribution of interest, and will be related to the observable
301: variables by two assumptions.
302: 
303: Counterfactual variables referring to different subjects are
304: assumed independent (cf.\ Rubin~(1978)), and hence we can formulate
305: our set-up in terms of the set of random variables $(T^\GG, T, \overline
306: L_K,\overline A_K)$ referring to one person. We shall not be
307: interested in the joint distribution of counterfactual variables
308: corresponding to different treatment regimes.  We also do not need
309: counterfactual versions of the covariates or treatments.
310: 
311: We describe the aims of the statistical analysis in terms of the
312: counterfactual variables.  The \emph{G--null hypothesis} of no effect
313: of AZT on time-to-AIDS is the hypothesis that
314: \begin{equation*}
315: P\left(T^{g_1}>t\right)=P\left(T^{g_2}>t\right)\hspace{0.7cm}
316: {\rm for}\;{\rm all}\; {\rm treatment}\;{\rm regimes}\;g_1\;{\rm and}\;g_2.
317: \label{nte}
318: \end{equation*}
319: In Section~\ref{mlesec} we derive fully parametric
320: likelihood-based tests of this G--null hypothesis based on  a random
321: sample from the distribution of the
322: observables $\left(\overline{L}_K,\overline{A}_K,T\right)$,
323: and a parametric model for their joint
324: distribution. In Section~\ref{Gest} we develop an alternative,
325: semi-parametric procedure with the same aim.
326: 
327: If the G--null hypothesis is rejected, then the next goal is to
328: identify and estimate, for each treatment regime $g$, the survival
329: curve $t\mapsto P\left(T^\GG>t\right)$, i.e.\ the survival curve that
330: would have been observed had a subject followed regime
331: $g$. Specifically, if our infected subject outside of the study
332: mentioned in the introduction wishes to maximize his expected years of
333: AIDS-free survival, he would follow the regime $g$ that maximized $E
334: T^\GG=\int_0^\infty P\left(T^\GG>t\right)\, dt$. Inference regarding
335: the distribution of counterfactual variables is referred to as
336: \emph{causal inference}, as the outcomes $T^\GG$ are interpreted
337: as being the effect of the treatment regime $\GG$.
338: 
339: Clearly it is impossible to make inference about the
340: counterfactual survival distributions $P(T^\GG>t)$ based on the
341: observed data unless the variables $T^\GG$ and 
342: $(\overline L_K,\overline A_K,T)$
343: are related.  The assumed coupling of these variables
344: on a given underlying probability space allows to make
345: the following assumptions relating counterfactual and factual
346: variables.
347: 
348: \begin{assu}\emph{(consistency).} \label{cons}
349: For any treatment
350: regime $\GG$, $\overline{l}_k\in\overline{{\mathcal L}}_k$ and
351: $t\in\left(\tau_k,\tau_{k+1}\right]$,
352: \begin{equation*}
353: \left\{T^{\GG}>t,\overline{L}_k=\overline{l}_k,\overline{A}_k=
354: \overline{\GG}\left(\overline{l}_k\right),T> \tau_k\right\} =
355: \left\{T>t,\overline{L}_k=\overline{l}_k,\overline{A}_k=
356: \overline{\GG}\left(\overline{l}_k\right),T> \tau_k\right\}.
357: \end{equation*}
358: \end{assu}
359: \begin{assu}\emph{(no unmeasured confounding).} \label{ass}
360: For any treatment regime $\GG$, for any time $\tau_k$ and for
361: any $\overline{l}_k\in \overline{{\mathcal L}}_k$,
362: \begin{equation*} A_k \cip T^{\GG} | \overline{L}_k=\overline{l}_k,
363: \overline{A}_{k-1} = \overline{\GG}\left(\overline{l}_{k-1}\right).
364: \end{equation*}
365: \end{assu}
366: 
367: Here the notation $X\cip Y|Z=z$, borrowed from Dawid~(1979),
368: means that the random variabless $X$ and $Y$ are conditionally
369: independent given the event $Z=z$.
370: 
371: The consistency assumption, Assumption~\ref{cons},  couples the true and
372: counterfactual survival times $T$ and $T^{\GG}$ by merely stating that
373: if until some time $\tau_k$ a patient is treated exactly as prescribed
374: by regime $g$, then she would die at some time in the interval
375: $(\tau_k,\tau_{k+1}]$ under regime $g$ if and only if she actually
376: died at the same time. This implies in particular that if all patients
377: were treated according to a predetermined treatment regime, then
378: counterfactual and actual survival times coincide. This is the customary
379: situation in clinical trials, but may fail to be the case
380: in an observational study.
381: 
382: The assumption of no unmeasured confounding, Assumption~\ref{ass}, can
383: be expected to hold if the observed covariate history $\overline L_K$
384: contains sufficient information, so that at each time $\tau_k$ the
385: treatment $A_k$ can be assumed to depend on the covariate history
386: $\overline L_k$ of a patient up till that time and no other relevant
387: information. The assumption would for instance hold if at each time
388: $\tau_k$ the treatment in the interval $(\tau_k,\tau_{k+1}]$ is
389: assigned through randomization within fixed levels of equal covariates
390: $\overline L_k$ and earlier treatments.
391: 
392: More specifically, in our AIDS example Assumption~\ref{ass} may be
393: expected to hold if the following information is recorded in
394: $\overline{L}_k$: all risk factors (i.e.\ predictors) of
395: regime-specific, counterfactual time-to-AIDS, other than prior
396: AZT-history $\overline{A}_{k-1}$, that are used by physicians and
397: patients to determine the dose $A_k$ of AZT in
398: $\left(\tau_k,\tau_{k+1}\right]$. Then, given $\overline{L}_k$ and
399: $\overline{A}_{k-1}=\GG\bigl(\overline{L}_{k-1}\bigr)$, the treatment
400: $A_k$ in the interval $(\tau_k,\tau_{k+1}]$ may
401: be thought of as depending only on external factors unrelated to
402: the patient's prognosis regarding time-to-AIDS, and hence as being
403: independent of $T^\GG$. For example, since it is known that physicians
404: tend to prescribe AZT to subjects with low CD4-counts and a low
405: CD4-count is an independent predictor of time-to-AIDS, the assumption
406: of no unmeasured confounding would be false if $\overline{L}_k$ does
407: not contain CD4-count history. 
408: 
409: It is a basic objective of epidemiologists conducting an observational
410: study to collect data on a sufficient number of covariates to ensure
411: that Assumption~\ref{ass} will be true. In this paper, we assume this
412: objective has been realized, while recognizing that, in practice, this
413: may only approximately be the case.
414: 
415: \section{G--computation}\label{Gcomps}
416: We are interested in the distribution of the counterfactual, and hence
417: unobservable, variables $T^{\GG}$, as they indicate the success or
418: failure from applying the treatment regime $\GG$.  In this section we
419: show that, under Assumptions~\ref{cons} and~\ref{ass}, the
420: distribution of $T^\GG$ is identifiable from the distribution of the
421: observed data $\bigl(\overline{L}_K,\overline{A}_K,T\bigr)$ for each
422: evaluable treatment regime $\GG$.  As a consequence, given a random
423: sample from the latter distribution, the distribution of $T^\GG$ is
424: estimable, in principle.
425: 
426: In fact, the following \emph{G--computation formula} gives an explicit
427: expression for $P\left(T^{\GG}>t\right)$, as well as several
428: conditional survival functions, in terms of the distribution of the
429: data $\left(\overline{L}_K,\overline{A}_K,T\right)$.
430: 
431: \begin{thm}\label{Gcomp}
432: \emph{(G--computation-formula).}
433: Suppose that Assumptions~\ref{ass} (no
434: unmeasured confounding) and~\ref{cons} (consistency) hold, and that
435: $\GG$ is an evaluable treatment regime. Then for any $t>0$,
436: with $p$ defined by $\tau_p<t\le \tau_{p+1}$,
437: \begin{eqnarray}\label{Gf}
438: P\left(T^{\GG}>t\right) &=& \sum_{l_{0}} \cdots\sum_{l_{p-1}}\sum_{l_p}
439: \Bigg[P\Bigl(T>t| \overline{L}_{p}=\overline{l}_{p},
440: \overline{A}_{p}=\overline{\GG}\left(\overline{l}_{p}\right),
441: T>\tau_{p}\Bigr)
442: \nonumber\\
443: &&\hspace{2.0cm}\times\prod_{m=0}^{p}\Big\{
444: P\Bigl(T>\tau_m|\overline{L}_{m-1}=\overline{l}_{m-1},
445: \overline{A}_{m-1}=\overline{\GG}\left(\overline{l}_{m-1}\right),
446: T>\tau_{m-1}\Bigr)\nonumber\\
447: &&\hspace{2.2cm}
448: \times P\Bigl(L_m=l_m|
449: \overline{L}_{m-1}=\overline{l}_{m-1},
450: \overline{A}_{m-1}=\overline{\GG}\left(\overline{l}_{m-1}\right),
451: T>\tau_{m}\Bigr) \Big\}\Bigg].\nonumber
452: \end{eqnarray}
453: \end{thm}
454: 
455: In the preceding theorem we interpret variables indexed by $-1$ as not
456: present, and events concerning only such variables as being empty. For
457: instance, the conditional probability 
458: $P\bigl(L_m=l_m| \overline{L}_{m-1}=\overline{l}_{m-1},
459: \overline{A}_{m-1}=\overline{\GG}\left(\overline{l}_{m-1}\right),
460: T>\tau_{m}\bigr)$ is to be read as the probability $P(L_0=l_0)$ when
461: $m=0$.
462: 
463: All conditional probabilities on the right side concern
464: observable variables. Hence the theorem gives an explicit
465: description of the survival function of the counterfactual
466: variable $T^\GG$ in terms of the distribution of
467: the data $(\overline L_K, \overline A_K, T)$. 
468: 
469: It is instructive to evaluate the formula in the simple case that
470: $K=1$, when there exists only one treatment $A_0$ applied in the
471: single interval $(0,\tau_1]$. Then the G--computation formula yields,
472: for $t>0$,
473: $$P(T^\GG>t)=\sum_{l_0} P\bigl(T>t|L_0=l_0,A_0=g(l_0)\bigr)
474: \,P(L_0=l_0).$$
475: This shows that in general the distribution of the counterfactual
476: variable $T^\GG$ differs from the distribution of $T$, which can be
477: written in the form
478: $$P(T>t)=\sum_{l_0} P\bigl(T>t|L_0=l_0\bigr)\,P(L_0=l_0).$$ 
479: This difference is not too surprising, because the variable $T^\GG$
480: refers to the treatment regime $\GG$, whereas $T$ relates to the
481: observed outcomes under the actual treatments. Had all patients
482: received treatment $g$, then the two distributions would coincide.
483: More notable is the difference between the conditional distribution of
484: $T$ given $A_0=a_0$ and the distribution of $T^\GG$ for the fixed
485: treatment regime $\GG$ that assigns all patients to treatment $a_0$,
486: i.e.\ $g(l_0)=a_0$.  These two survival distributions can be written
487: \begin{eqnarray*}
488: P(T^{a_0}>t)&=&\sum_{l_0} P\bigl(T>t|L_0=l_0,A_0=a_0\bigr)\,P(L_0=l_0),\\
489: P\bigl(T>t| A_0=a_0\bigr)&=&\sum_{l_0} P\bigl(T>t|L_0=l_0,A_0=a_0\bigr)
490: \,P\bigl(L_0=l_0| A_0=a_0\bigr).
491: \end{eqnarray*}
492: The conditional distribution of $T$ given $A_0=a_0$ is estimable, in
493: principle, by taking only those patients into account who happened to
494: receive treatment $a_0$. The outcome distribution of this subset of
495: patients may however be different from the distribution of the
496: counterfactual variable $T^{a_0}$, as a result of ``selection bias''.
497: In the actual world some patients may be assigned other treatments
498: than $a_0$, where the assignment $A_0$ may correlate with the
499: covariate variable $L_0$. Therefore, the conditional and unconditional
500: distributions of $L_0$ given $A_0$ may differ, and consequently so may
501: the right hand sides of the display. It is the counterfactual survival
502: function $t\mapsto P(T^{a_0}>t)$ that is the relevant one to judge the
503: causal effect of treatment $a_0$. Randomization of treatment over
504: patients within fixed levels of the covariate would have made $L_0$
505: and $A_0$ independent, and the difference would disappear.  The
506: protocol of a controlled experiment may include such randomization,
507: but in a observational study it cannot be taken for granted. The
508: G--computation formula then shows, under some assumptions, how we can
509: still compute the relevant outcome distributions from the observed
510: data distribution.
511: 
512: We can make further comparisons after deriving
513: a similar representation for conditional probabilities involving
514: the counterfactual variables.
515: 
516: \begin{thm}\label{Gcompcond}
517: \emph{(G--computation-formula).}
518: Under the assumptions of Theorem~\ref{ass},
519: for any $k\in \{0,1,2,\ldots,K\}$ and any $\overline l_k$ such that
520: $P\left(\overline{L}_k=\overline{l}_k,
521: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),
522: T>\tau_k\right)>0$,
523: for any $t>\tau_k$, and with $p\ge k$ defined by $\tau_p<t\le \tau_{p+1}$,
524: \begin{eqnarray}\label{Gfc}
525: \lefteqn{P\Bigl(T^{\GG}>t|\overline{L}_k=\overline{l}_k,
526: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),
527: T>\tau_k\Bigr)}\nonumber\\ &=&\sum_{l_{k+1}} \cdots\sum_{l_{p-1}}\sum_{l_p}
528: \Bigg[P\Bigl(T>t|\overline{L}_{p}=\overline{l}_{p},
529: \overline{A}_{p}=\overline{\GG}\left(\overline{l}_{p}\right),
530: T>\tau_{p}\Bigr)\nonumber\\
531: &&\hspace{1.9cm}\times\prod_{m=k+1}^{p}\Big\{
532: P\Bigl(T>\tau_m|\overline{L}_{m-1}=\overline{l}_{m-1},
533: \overline{A}_{m-1}=\overline{\GG}\left(\overline{l}_{m-1}\right),
534: T>\tau_{m-1}\Bigr)\nonumber\\ &&\hspace{1.9cm}
535: \times P\Bigl(L_m=l_m|
536: \overline{L}_{m-1}=\overline{l}_{m-1},
537: \overline{A}_{m-1}=\overline{\GG}\left(\overline{l}_{m-1}\right),
538: T>\tau_{m}\Bigr) \Big\}\Bigg].
539: \end{eqnarray}
540: \end{thm}
541: 
542: Again variables indexed by $-1$ should be read as not being present.
543: Furthermore, a repeated summation of the form $\sum_{l_{k+1}}\cdots
544: \sum_{l_p}a_{k,p}(\overline l_{k},l_{k+1},\ldots,l_p)$ is considered
545: to be the single term $a_{k,k}(\overline l_k)$ if $k=p$, whereas the
546: product $\prod_{k+1}^p$ is to be read as 1 in this case. The summation
547: may be restricted to terms whose conditioning events have positive
548: probability.
549: 
550: Again we may evaluate this formula in the simple case 
551: of a single treatment interval. Then the formula in the
552: preceding theorem (with $k=0=p, K=1$) reduces to 
553: $$P\bigl(T^\GG>t| L_0=l_0\bigr) =
554: P\bigl(T>t|L_0=l_0,A_0=g(l_0)\bigr).$$ 
555: The right side is precisely the conditional distribution of the actual
556: survival time for a subject with covariate $l_0$ following the
557: treatment regime $g$. Intuitively, the conditional probabilities
558: $P\bigl(T>t|L_0=l_0,A_0=g(l_0)\bigr)$ are the correct ones for
559: evaluating the quality of treatment $g$ for a subject with covariate
560: value $l_0$, and the equality in the preceding display is actually a
561: direct consequence of the Assumptions~\ref{cons} and~\ref{ass}
562: relating the counterfactual and factual survival times. (We may add
563: $A_0=g(l_0)$ in the conditioning event on the left by
564: Assumption~\ref{ass}, and next use Assumption~\ref{cons} to see that
565: $T^\GG$ may be replaced by $T$.)
566: 
567: Henceforth, we shall denote the right side of (\ref{Gfc}) by
568: $s_{\overline{l}_k,\GG}\left(t\right)$. For $k=-1$ this reduces to the
569: right side in Theorem~\ref{Gcomp}, and we write it as
570: $s_{\GG}(t)$, interpreting $\overline{l}_{-1}$ as empty.  
571: Then Theorems~\ref{Gcomp}-\ref{Gcompcond}
572: can be reformulated as saying that under
573: Assumptions~\ref{cons} (consistency) and~\ref{ass} (no unmeasured
574: confounding), for every evaluable treatment regime $\GG$,
575: \begin{equation*}P\left(T^{\GG}>t\right)=s_{\GG}(t)
576: \end{equation*}
577: and, for every $k=0,1,\ldots, K$,
578: \begin{equation*}
579: P\Bigl(T^{\GG}>t|\overline{L}_k=\overline{l}_k,
580: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),
581: T>\tau_k\Bigr)=s_{\overline{l}_k,\GG}(t).
582: \end{equation*}
583: These functions are survival functions of distributions
584: that concentrate on $(\tau_k,\infty)$.
585: 
586: Inspection of the G--computation formula shows that
587: $s_{\overline{l}_k,\GG}$ is a (complicated) function of the
588: distribution of the data vector
589: $\left(\overline{L}_K,\overline{A}_K,T\right)$ and depends on this
590: distribution only through the conditional distributions of the
591: covariates and the survival time given the past, given by
592: \begin{equation}\label{plm}
593: P\left(L_m=l_m|\overline{L}_{m-1}=\overline{l}_{m-1},
594: \overline{A}_{m-1}=\overline{a}_{m-1},T>\tau_m\right),
595: \end{equation}
596: and
597: \begin{equation}\label{ptm}
598: P\left(T>t|\overline{L}_{m-1}=\overline{l}_{m-1},
599: \overline{A}_{m-1}=\overline{a}_{m-1},
600: T>\tau_{m-1}\right).
601: \end{equation}
602: In particular, the functions $s_{g,\overline{l}_k}$ do not depend on
603: conditional laws of the treatment variables $A_m$ given the past.
604: 
605: \proofof{Theorems~\ref{Gcomp} and~\ref{Gcompcond}} 
606: We prove Theorems~\ref{Gcomp} and~\ref{Gcompcond} by backward
607: induction on $k$, for fixed $t$ (and hence also fixed $p$).  Formula
608: (\ref{Gfc}) with $k=-1$ can be read as the formula given by
609: Theorem~\ref{Gcomp}, so we restrict to proving (\ref{Gfc}).
610: 
611: For $k=p$ the left side of (\ref{Gfc}) is equal to
612: \begin{eqnarray*}
613: \lefteqn{P\left(T^{\GG}>t|\overline{L}_p=\overline{l}_p, \overline{A}_{p-1}
614: =\overline{\GG}\left(\overline{l}_{p-1}\right),T>\tau_p\right)}\\
615: &&\hspace{2.5cm}=
616: P\left(T^{\GG}>t|\overline{L}_p=\overline{l}_p, \overline{A}_{p}
617: =\overline{\GG}\left(\overline{l}_{p}\right),T>\tau_p\right)\\
618: &&\hspace{2.5cm}=
619: P\left(T>t|\overline{L}_p=\overline{l}_p, \overline{A}_{p}
620: =\overline{\GG}\left(\overline{l}_{p}\right),T>\tau_p\right),
621: \end{eqnarray*}
622: where in the first equality we can add
623: $A_p=\GG_p\left(\overline{l}_p\right)$ in
624: the conditioning event by Assumption~\ref{ass} of no unmeasured confounding,
625: and in the second equality we can replace the event $T^{\GG}>t$
626: by the event $T>t$, because of the Assumption~\ref{cons}
627: of consistency.
628: 
629: The induction step is proved by similar arguments. Supposing
630: that (\ref{Gfc}) holds for $k\le p$,
631: we shall deduce that it also holds for $k-1$. We have
632: \begin{eqnarray*}
633: \lefteqn{P\left(T^{\GG}>t|\overline{L}_{k-1}=\overline{l}_{k-1},
634: \overline{A}_{k-2}=\overline{\GG}\left(\overline{l}_{k-2}\right),
635: T>\tau_{k-1}\right)}\\
636: &=&
637: P\left(T^{\GG}>t|\overline{L}_{k-1}=\overline{l}_{k-1},\overline{A}_{k-1}=
638: \overline{\GG}\left(\overline{l}_{k-1}\right),T>\tau_{k-1}\right)\\
639: &=&
640: P\left(T^{\GG}>\tau_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},
641: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),
642: T>\tau_{k-1}\right)\\
643: &&\hspace{1.5cm} \times P\left(T^{\GG}>t|\overline{L}_{k-1}=\overline{l}_{k-1},
644: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),
645: T>\tau_{k-1},T^{\GG}>\tau_{k}\right).
646: \end{eqnarray*}
647: The first equality follows by the assumption of no unmeasured
648: confounding, while the second follows by conditioning on the
649: event $T^{\GG}>\tau_{k}$, where we note that $t>\tau_{k}$, because
650: $t>\tau_p\ge\tau_{k}$.  By the consistency assumption we can replace
651: the event $T^{\GG}>\tau_{k}$ by the event $T>\tau_{k}$ without
652: changing the events or probabilities.  Next we can rewrite the second
653: probability as a sum by conditioning on the variable $L_{k}$, to
654: obtain that the preceding display is equal to
655: \begin{eqnarray*}
656: \lefteqn{\sum_{l_{k}}\Big[
657: P\left(T>\tau_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},\overline{A}_{k-1}=
658: \overline{\GG}\left(\overline{l}_{k-1}\right),T>\tau_{k-1}\right)} \\
659: &&\hspace{1.5cm}
660: \times P\left(T^{\GG}>t|\overline{L}_{k}=\overline{l}_{k},
661: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),
662: T>\tau_{k}\right)\Big]\\
663: &&\hspace{1.5cm}
664: \times P\left(L_{k}=l_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},
665: \overline{A}_{k-1}=
666: \overline{\GG}\left(\overline{l}_{k-1}\right),T>\tau_{k}\right).
667: \end{eqnarray*}
668: Finally we replace the probability involving the counterfactual
669: variable $T^\GG$ by the right side of (\ref{Gfc}), which is 
670: permitted in view of the induction hypothesis.
671: This yields the right side of (\ref{Gfc}) for $k-1$, 
672: and concludes the induction step.
673: \Endproof
674: 
675: %\footnote{Proposal: we could put some interpretation and something
676: %like Example~2.6.1 from my thesis here, e.g.\ just copy Section~2.6 from
677: %thesis, or rather with a more realistic example. Try to find one in
678: %\citet{Rlat} or ask Jamie. Decided: no, may become too long; if they want
679: %it let them ask; otherwise risk it needs to get out again and waste of
680: %time}
681: 
682: \section{Reparameterization}
683: \label{repars}
684: To investigate the effect of a given treatment regime $\GG$ on
685: survival, it suffices to know the conditional distributions given in
686: (\ref{plm}) and (\ref{ptm}). Given these distributions we can compute
687: the counterfactual survival functions by using the G--computation
688: formula, given by Theorem~\ref{Gcomp}.
689: 
690: Because carrying out this computation may be a formidable task, we may
691: perform the calculation by simulation methods, rather than by
692: analytical calculation.  Robins~(1986, 1987, 1988) provides a Monte
693: Carlo algorithm, called the ``Monte Carlo G--computation algorithm'',
694: for evaluating the functions $s_{\GG}$ that satisfactorily resolves
695: potential difficulties with the analytical computation. We refer the
696: reader to these papers for further discussion.
697: 
698: A difficulty is that the distributions in (\ref{plm}) and
699: (\ref{ptm}) will typically be unknown and must be estimated from the
700: data. One possibility is to specify models for (\ref{plm}) and
701: (\ref{ptm}), for instance logistic or Cox models, and next estimate
702: the unknown parameters from the data. The function $s_\GG$ can then be
703: estimated using the Monte Carlo G--computation algorithm with model
704: derived estimates. Robins~(1986, 1987) provides several worked
705: examples of this approach.
706: 
707: This approach has a number of unattractive features. Estimation of the
708: function $s_\GG$ according to the preceding scheme and without
709: confidence intervals, may be feasible, but testing whether treatment
710: affects the outcome is complicated.  The models used to specify
711: $s_\GG$ will usually be rough approximations, and the null hypothesis
712: of no treatment effect will be a complex function of all parameters.
713: Standard statistical software may not apply, and in large datasets the
714: null hypothesis will usually be rejected, just because of model
715: misspecification (cf.\ Robins~(1986, 1987, 1988, 1989)). In this paper
716: we take a different approach, based on a reparameterization of the
717: joint distribution of the observations
718: $\left(\overline{L}_K,\overline{A}_K,T\right)$ using \emph{structural
719: nested failure time models (SNFTM)}.
720: 
721: SNFTMs are models for the causal effect of skipping a ``last''
722: treatment dose given the past, thus reverting to the ``baseline
723: treatment''.  To make this precise, suppose that there is a certain
724: baseline treatment regime, which we shall refer to as ``no
725: treatment''. This could for instance be ``zero medication'', and
726: consequently we shall let a zero in the sets $\overline{\mathcal A}_k$ of
727: treatment dosages refer to treatment under the baseline treatment
728: regime.
729: 
730: At any time point $\tau_k$ a doctor could switch a patient to the
731: baseline regime, at least conceptually, and leave her there. Let
732: $\left(\overline{a}_k,\overline{0}\right)$ be an abbreviation for the
733: treatment regime $\GG=\left(a_0,\ldots,a_k,0,\ldots,0\right)$, i.e.\
734: the $m$th coordinate function of $\GG$ is given by
735: \begin{equation*}
736: g_m\left(\overline{l}_m\right)= \left\{\begin{array}{ll} a_m & {\rm
737: for} \;{\rm any}\;{\rm value}\;{\rm of}\; {\rm the}\; {\rm
738: covariate}\;{\rm vector}\;\overline{l}_m \; {\rm if}\; m\leq k,\\
739: 0 & {\rm if}\; m>k. \end{array}\right.
740: \end{equation*}
741: Henceforth, we shall always assume that Assumptions~\ref{cons}
742: (consistency) and~\ref{ass} (no unmeasured confounding) are
743: satisfied. Then, by Theorem~\ref{Gcomp}, if the treatment regime
744: $(\overline a_k,\overline 0)$ is evaluable, the function 
745: $$t\mapsto s_{\overline{l}_k,\left(\overline{a}_k,\overline 0\right)}(t)$$
746: (by
747: definition the right side of (\ref{Gfc}) with $\GG=(\overline
748: a_k,\overline 0)$) is the conditional survival function of the
749: counterfactual survival time
750: $T^{\left(\overline{a}_k,\overline{0}\right)}$ given the treatment-
751: and covariate history $\left(\overline{l}_k,\overline{a}_{k-1}\right)$
752: up to time $\tau_k$, and given that
753: $T^{\left(\overline{a}_k,\overline{0}\right)}>\tau_k$.  Define
754: ``shift-functions'' $\gamma$ by
755: \begin{equation}\label{gd}
756: \gamma_{\overline{l}_k,\overline{a}_k}(t)=
757: s^{-1}_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}
758: \circ s_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)} (t),
759: \end{equation}
760: where the inverse $s^{-1}$ is the quantile function of the corresponding
761: survival function.
762: 
763: The functions $\gamma$ map percentiles of the distribution of the
764: random variable $T^{\left(\overline{a}_k,\overline{0}\right)}$ into
765: those of the distribution of the random variable
766: $T^{\left(\overline{a}_{k-1},\overline{0}\right)}$,
767: \begin{equation}
768: \label{gdalt}
769: s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}
770: \circ \gamma_{\overline{l}_k,\overline{a}_k}
771: =s_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)}.
772: \end{equation}
773: The functions $\gamma$ thus measure the effect of skipping the ``last''
774: treatment dose $a_k$ given the covariate and treatment history
775: $(\overline l_k,\overline a_{k-1})$. We assume that the survival functions
776: are continuous and strictly decreasing, so that (\ref{gd}) and (\ref{gdalt})
777: give equivalent definitions.
778: 
779: If the ``last treatment'' $a_k$ has no effect, then the functions
780: $s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}$ and
781: $s_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)}$ are
782: identical, and the function $\gamma_{\overline{l}_k,\overline{a}_k}$
783: is the identity function. More generally, the function
784: $\gamma_{\overline{l}_k,\overline{a}_k}$ can be seen to measure the
785: effect of the treatment $a_k$ given in
786: $\left[\tau_k,\tau_{k+1}\right)$ on (counterfactual) survival.  This
787: is illustrated in Figure~\ref{gamf}.
788: 
789: \begin{figure}[htb!]
790: \begin{picture}(350,170)
791: \put(40,10){\line(1,0){250}}
792: \put(40,10){\line(0,1){160}}
793: \put(32,10){\makebox(0,0){$0$}}
794: \put(32,160){\makebox(0,0){$1$}}
795: \put(38,160){\line(1,0){4}}
796: \put(40,2){\makebox(0,0){$\tau_k$}}
797: \qbezier[4000](40,160)(75,30)(290,25)
798: \qbezier[4000](40,160)(115,35)(290,32)
799: \put(245,2){\makebox(0,0){$t$}}
800: \put(245,8){\line(0,1){4}}
801: \put(195,2){\makebox(0,0){$\gamma_{\overline{l}_k,\overline{a}_k}(t)$}}
802: \put(195,8){\line(0,1){4}}
803: \put(328,20){\makebox(0,0){$s_{\overline{l}_k,\left(\overline{a}_{k-1},
804: \overline{0}\right)}$}}
805: \put(323,37){\makebox(0,0){$s_{\overline{l}_k,\left(\overline{a}_k,
806: \overline{0}\right)}$}}
807: \put(245,15){\line(0,1){4}}
808: \put(245,23){\line(0,1){4}}
809: \put(245,31){\line(0,1){4}}
810: %\put(255,10){\line(0,1){125}}
811: %\qbezier[42](255,10)(255,72)(255,135)
812: %\put(189,10){\line(0,1){125}}
813: %\qbezier[42](189,10)(189,72)(189,135)
814: \put(195,15){\line(0,1){4}}
815: \put(195,23){\line(0,1){4}}
816: \put(195,31){\line(0,1){4}}
817: %\put(38,35){\line(1,0){4}}
818: %\put(26,35){\makebox(0,0){$0.17$}}
819: %\put(40,135){\line(1,0){215}}
820: \put(40,35){\line(1,0){5}}
821: \put(49,35){\line(1,0){4}}
822: \put(57,35){\line(1,0){4}}
823: \put(65,35){\line(1,0){4}}
824: \put(73,35){\line(1,0){4}}
825: \put(81,35){\line(1,0){4}}
826: \put(89,35){\line(1,0){4}}
827: \put(97,35){\line(1,0){4}}
828: \put(105,35){\line(1,0){4}}
829: \put(113,35){\line(1,0){4}}
830: \put(121,35){\line(1,0){4}}
831: \put(129,35){\line(1,0){4}}
832: \put(137,35){\line(1,0){4}}
833: \put(145,35){\line(1,0){4}}
834: \put(153,35){\line(1,0){4}}
835: \put(161,35){\line(1,0){4}}
836: \put(169,35){\line(1,0){4}}
837: \put(177,35){\line(1,0){4}}
838: \put(185,35){\line(1,0){4}}
839: \put(193,35){\line(1,0){4}}
840: \put(201,35){\line(1,0){4}}
841: \put(209,35){\line(1,0){4}}
842: \put(217,35){\line(1,0){4}}
843: \put(225,35){\line(1,0){4}}
844: \put(233,35){\line(1,0){4}}
845: \put(241,35){\line(1,0){4}}
846: %\qbezier[72](20,135)(128,135)(235,135)
847: \end{picture}
848: \caption{Illustration of the shift-function $\gamma$.
849: In this picture the 
850: function $s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}$
851: lies to the left of the function 
852: $s_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)}$, indicating
853: that skipping the treatment $a_k$ decreases survival for patients
854: with covariate and treatment history $(\overline{l}_k,\overline{a}_{k-1})$.
855: In this case the 
856: function $\gamma_{\overline{l}_k,\overline{a}_k}$ is below the identity.}
857: \label{gamf}
858: \end{figure}
859: 
860: Conversely, if the shift function
861: $\gamma_{\overline{l}_k,\overline{a}_k}$ is equal to the identity
862: function, then the distribution of the counterfactual variables
863: $T^{\left(\overline{a}_k,\overline{0}\right)}$ and
864: $T^{\left(\overline{a}_{k-1},\overline{0}\right)}$ coincide for
865: patients with past covariate- and treatment history $\overline{l}_k$
866: and $\overline{a}_{k-1}$.  This suggests that, if
867: $\gamma_{\overline{l}_k,\overline{a}_k}$ is the identity function for
868: all values of $\overline{l}_k$, $\overline{a}_k$ and $k$, then
869: treatment does not affect the outcome of interest: skipping the last
870: treatment does not affect the outcome of interest, next skipping the
871: second-last treatment does not affect the outcome of interest,
872: etcetera.
873: 
874: For a rigorous proof of this conclusion it is necessary that
875: sufficiently many treatment regimes are evaluable, because the
876: functions $s_{\overline l_k,g}$ (defined in terms of the distribution
877: of the observable data by the right side of (\ref{Gfc})) are equal to
878: the counterfactual survival distributions only if the treatment regime
879: $g$ is evaluable.  For instance, the treatment regime
880: $g=\left(\overline{a}_k,\overline{0}\right)$ need not be evaluable for
881: all $\overline{a}_k$ and hence the distributions of the counterfactual
882: variables $T^{\left(\overline{a}_k,\overline{0}\right)}$ and/or
883: $T^{\left(\overline{a}_{k-1},\overline{0}\right)}$ may not be
884: identifiable from the observed data.  To overcome this difficulty we
885: assume that the baseline treatment regime $\overline{0}$ is
886: ``admissible''. A treatment regime is called ``admissible'' if in
887: \emph{every} situation there is a positive probability for this regime
888: to be implemented in the next step.  As applied to the baseline regime
889: $\overline 0$, this property takes the form of the following
890: assumption.
891: 
892: \begin{assu}\label{base} \emph{(admissible baseline treatment regime).}
893: For each $k$, each
894: $\overline{l}_k\in\overline{{\cal L}}_k$ and each
895: $\overline{a}_{k-1}\in\overline{{\cal A}}_{k-1}$,
896: \begin{equation*} P\left(\overline{L}_k=\overline{l}_k,
897: \overline{A}_{k-1}=\overline{a}_{k-1}, T>\tau_k\right)>0 \Rightarrow
898: P\left(\overline{L}_k=\overline{l}_k,\overline{A}_{k-1}=\overline{a}_{k-1},
899: A_k=0, T>\tau_k\right)>0.
900: \end{equation*}
901: \end{assu}
902: 
903: Under this assumption the shift functions
904: $\gamma_{\overline{l}_k,\overline{a}_k}$ are identifiable for all
905: values of $(k,\overline{l}_k, \overline{a}_k)$ with
906: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=
907: \overline{a}_k,T>\tau_k\right)>0$, and fully characterize the
908: potential effect of any treatment regime.  This is the content of the
909: following theorem, whose proof is deferred to Appendix~\ref{appid}.
910: (As shown in Lok~(2001, Section 2.12), Assumption~\ref{base} can
911: be avoided if one allows $\overline{0}$ to be a so-called
912: admissible baseline course of treatment, which may not only depend on
913: past covariate- but also on past treatment history. Some
914: admissible baseline course of treatment, which has a positive
915: probability of occurring after any observed treatment- and covariate
916: history, always exists.)
917: 
918: \begin{thm}\label{geq}
919: Under Assumptions~\ref{ass} (no unmeasured confounding), \ref{cons}
920: (consistency) and \ref{base} (admissible baseline treatment regime),
921: the distribution of $T^\GG$ is the same under all evaluable treatment
922: regimes $\GG$ if and only if the shift-function
923: $\gamma_{\overline{l}_k,\overline{a}_k}$ is the identity for all $(k,
924: \overline{l}_k, \overline{a}_k)$ with
925: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,
926: T>\tau_k\right)>0$.
927: \end{thm}
928: 
929: It follows that the functions $\gamma_{\overline{l}_k,\overline{a}_k}$
930: characterize the null hypothesis of no treatment effect.
931: Because they also possess an easy interpretation in terms of the
932: effect of a ``last blip'' of treatment, it is attractive 
933: to model these functions rather than the set of conditional
934: distributions in (\ref{plm}) and (\ref{ptm}). 
935: A \emph{structural nested failure time model} is a parametrized
936: family of functions used to  model the functions
937: $\gamma_{\overline{l}_k,\overline{a}_k}$. Each of the model
938: functions is an increasing function on $[\tau_k,\infty)$ (that
939: can arise as a quantile-distribution function), with the identity
940: function referring to the absence of the treatment effect.
941: 
942: With the parameter denoted by $\psi=(\psi_1,\psi_2,\psi_3)$,
943: one example of an SNFTM would be 
944: \begin{equation*}
945: \gamma^\psi_{\overline{l}_k,\overline{a}_k}\left(t\right)=\tau_k
946: +\left(\min\left\{\tau_{k+1},t\right\}-\tau_k\right)
947: e^{\psi_1 a_k +\psi_2 a_k a_{k-1} +\psi_3 a_k l_k}+
948: \left(t-\tau_{k+1}\right)1_{\left\{t>\tau_{k+1}\right\}}.
949: \end{equation*}
950: If $\psi=0$, then this function reduces to the identity function,
951: indicating that the parameter value $\psi=0$ corresponds to
952: the absence of a treatment effect. For nonzero values of $\psi$
953: the model corresponds to a 
954: ``change of time scale''  depending on present and
955: past treatment $\left(a_k,a_{k-1}\right)$ and present covariate
956: ($l_k$). The variable $L_k$ might for instance be
957: the univariate covariate CD4 lymphocyte count at $\tau_k$, and
958: the variable $A_k$ the AZT prescription. Then the given model
959: allows for interaction between CD4 lymphocyte count and
960: treatment, and could of course be extended with other factors.
961: Figure~\ref{figureSNFTM} shows two typical functions
962: $\gamma$ following this model.
963: 
964: \begin{figure}
965: \qquad\resizebox{4in}{!}{\includegraphics{SNFTM.eps}}
966: \caption{Examples of shift functions. The picture shows the identity
967: function (dashed) and the function 
968: $t\mapsto \tau_k
969: +\left(\min\left\{\tau_{k+1},t\right\}-\tau_k\right)
970: 0.5+\left(t-\tau_{k+1}\right)1_{\left\{t>\tau_{k+1}\right\}}$
971: for $\tau_k=1<\tau_{k+1}=2$, which corresponds to decreasing
972: survival by skipping the treatment in the interval $(\tau_k, \tau_{k+1}]$.}
973: \label{figureSNFTM}
974: \end{figure}
975: 
976: \section{Mimicking counterfactual outcomes}
977: \label{mimsec}
978: In the next two sections we present two methods for estimating the
979: parameter $\psi$ in a structural nested failure time
980: model. Theorem~\ref{Tog} below is basic for both methods. Consider the
981: following transformation of the observation
982: $\left(\overline{L}_K,\overline{A}_K,T\right)$, using the ``true'' shift
983: functions $\gamma$ (given by (\ref{gd})):
984: \begin{equation} \label{Togd}
985: T_0^\gamma=\gamma_{\overline{L}_0,\overline{A}_0}
986: \circ \gamma_{\overline{L}_1,\overline{A}_1}
987: \circ \cdots
988: \circ \gamma_{\overline{L}_{p\left(T\right)},
989: \overline{A}_{p\left(T\right)}}\left(T\right),
990: \end{equation}
991: where $p(T)=\max\left\{k:\tau_k<T\right\}$.
992: %By its definition
993: The application of the function
994: $\gamma_{\overline{L}_{p\left(T\right)},\overline{A}_{p\left(T\right)}}$
995: to $T$ annihilates the effect of the last treatment
996: $A_{p\left(T\right)}$, and each further application of a shift
997: function annihilates the effect of an earlier treatment.  This
998: explains the following theorem, which is proved in
999: Appendix~\ref{appTog}.
1000: 
1001: \begin{thm} \label{Tog}\emph{(mimicking counterfactual outcomes).}
1002: %Under Assumptions~\ref{ass} (no unmeasured confounding), \ref{cons}
1003: %(consistency) and~\ref{base} (admissible baseline treatment regime),
1004: %we have that
1005: The variable $T_0^\gamma$ defined in (\ref{Togd})
1006: possesses survival function $s_{\overline{0}}$.
1007: Furthermore, for every $k\geq0$,
1008: \begin{equation}\label{indep}
1009: A_k \cip T_0^\gamma |\overline{L}_k,\overline{A}_{k-1},T>\tau_k.
1010: \end{equation}
1011: \end{thm}
1012: 
1013: The variable $T_0^\gamma$ is a (deterministic) function of the data
1014: vector $\left(\overline{L}_K,\overline{A}_K,T\right)$, through the
1015: (unknown) family of shift-functions $\gamma$. If the shift functions
1016: $\gamma$ would be known, then we would be able to ``mimic'' the
1017: survival time without treatment by calculating the transformation
1018: $T_0^\gamma$. By the preceding theorem this variable is distributed
1019: according to $s_{\overline 0}$ and hence under the conditions of 
1020: Theorem~\ref{Gcomp} possesses the same distribution as $T^{\GG}$ for
1021: $\GG=\overline{0}$, the null treatment.
1022: 
1023: Equation (\ref{indep}) shows that the variable $T_0^\gamma$ also
1024: shares the ``no unmeasured confounding'' property
1025: (Assumption~\ref{ass}) of counterfactual variables (in a slightly
1026: stronger form).
1027: 
1028: \section{Maximum likelihood estimation}
1029: \label{mlesec}
1030: In this section we consider likelihood based inference for the
1031: parameter $\psi$ in a given SNFTM. Clearly this requires that we make
1032: the parameter $\psi$ visible in the density of the observation
1033: $\left(\overline{L}_K,\overline{A}_K,T\right)$. We first show that this
1034: can be achieved using the transformation $T_0^\gamma = T_0^\gamma
1035: \left(T,\overline{L}_K,\overline{A}_K\right)$ defined in
1036: (\ref{Togd}), which will depend on $\psi$ if we use a
1037: SNFTM for $\gamma$.
1038: 
1039: \begin{thm}\label{mleth}\emph{(the likelihood rewritten).}
1040: Suppose that Assumption~\ref{base} (admissible baseline treatment
1041: regime) holds.
1042: %\footnote{check must be more assumptions at least to be able to
1043: %apply Theorem~\ref{Tog}}
1044: Suppose moreover that $\left(T,\overline{L}_K,\overline{A}_K\right)$ has a
1045: Lebesgue density, and that the function $t\mapsto
1046: s_{\left(\overline{l}_k,\left(\overline{a}_k,\overline{0}\right)\right)}
1047: \left(t\right)$ is continuously differentiable in $t$, for all
1048: $\overline{l}_k$, $\overline{a}_k$ with
1049: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,
1050: T>\tau_k \right)>0$, with strictly negative derivative except for at
1051: most finitely many points. Then the joint density of
1052: $\left(T,\overline{L},\overline{A}\right)$ can be rewritten as
1053: \begin{eqnarray*}
1054: \lefteqn{f_{T,\overline{L},\overline{A}}\left(t,\overline{l},
1055: \overline{a}\right)}\\
1056: &=&\frac{\partial}{\partial t} t_0^\gamma
1057: \left(t,\overline{l}_p,\overline{a}_p\right)
1058: f_{T_0^\gamma}\left(t_0^\gamma\left(t,\overline{l}_p,\overline{a}_p\right)
1059: \right)
1060: P\left(L_0=l_0|T_0^\gamma=t_0^\gamma\right)
1061: P\left(A_0=a_0|L_0=l_0\right)\nonumber\\
1062: &&\prod_{k=0}^p \Big\{P\left(L_k=l_k|\overline{L}_{k-1}=\overline{l}_{k-1},
1063: \overline{A}_{k-1}=\overline{a}_{k-1},T>\tau_k, T_0^\gamma=t_0^\gamma\right)
1064: \\&&\hspace{3cm}
1065: P\left(A_k=a_k|\overline{L}_{k}=\overline{l}_{k},
1066: \overline{A}_{k-1}=\overline{a}_{k-1},T>\tau_k\right)\Big\},
1067: \end{eqnarray*}
1068: where $\tau_p<t\le \tau_{p+1}$ and
1069: \begin{equation*}
1070: t_0^\gamma\left(t,\overline{l}_p,\overline{a}_p\right)=
1071: \gamma_{\overline{l}_0,\overline{a}_0}
1072: \circ \gamma_{\overline{l}_1,\overline{a}_1}
1073: \circ \cdots
1074: \circ \gamma_{\overline{l}_{p},
1075: \overline{a}_{p}}\left(t\right).
1076: \end{equation*}
1077: \end{thm}
1078: 
1079: \proof
1080: Under the conditions of Theorem~\ref{mleth},
1081: \begin{equation*}
1082: \left(T,\overline{L},\overline{A}\right)\mapsto
1083: \left(T_0^\gamma,\overline{L},\overline{A}\right)=
1084: \left(t_0^\gamma\left(T\right),\overline{L},\overline{A}\right)
1085: \end{equation*}
1086: is a one-to-one mapping. Thus if $t_0^\gamma$ were continuously
1087: differentiable everywhere, then the identity
1088: \begin{equation}\label{lik1}
1089: f_{T,\overline{L},\overline{A}}\left(t,\overline{l},\overline{a}\right)
1090: =\frac{\partial}{\partial t} t_0^\gamma
1091: \left(t,\overline{l},\overline{a}\right)
1092: f_{T_0^\gamma,\overline{L},\overline{A}}
1093: \left(t_0^\gamma\left(t,\overline{l},\overline{a}\right),\overline{l},
1094: \overline{a}\right)
1095: \end{equation}
1096: would be immediate from the change of variables formula.  We show that
1097: (\ref{lik1}) holds under the conditions of Theorem~\ref{mleth}
1098: too. Next the assertion of the theorem follows by repeated
1099: conditioning and using Theorem~\ref{Tog}.
1100: 
1101: To prove (\ref{lik1}) in general, note that the probability space
1102: consists of countably many sets of the form
1103: $\left(\overline{L}_K=\overline{l}_K,\overline{A}_K=\overline{a}_K\right)$, so
1104: that by countable additivity of measures it suffices to prove
1105: (\ref{lik1}) on each of these sets that has probability greater than
1106: $0$. On each of these sets, $t_0^\gamma$ is one-to-one and
1107: continuously differentiable except for at finitely many points: it is
1108: the composition of finitely many functions
1109: $\gamma_{\overline{l}_k,\overline{a}_k}$ and under the assumptions of
1110: Theorem~\ref{mleth},
1111: \begin{equation*}\gamma'_{\overline{l}_k,\overline{a}_k}\left(t\right)
1112: =\bigl(s^{-1}_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}
1113: \circ 
1114: s_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)}\bigr)'(t)
1115: =\frac{1}{s'_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}
1116: \bigl(\gamma_{\overline{l}_k,\overline{a}_k}\left(t\right)\bigr)}
1117: s'_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)}\left(t\right)
1118: \end{equation*}
1119: exists and is continuous except for at most finitely many $t$.
1120: %\footnote{$f\circ f^{-1}(x)=x$, so
1121: %$f'\left(f^{-1}(x)\right) \cdot f^{-1}'(x)=1$, so
1122: %$f^{-1}'(x)=1/f'\left(f^{-1}(x)\right)$.
1123: %$\gamma=f^{-1}\circ g$, so $\gamma'(x)=1/(f'(\gamma(x))) \cdot g'(x)$.
1124: Thus, from the change of variables formula, equation (\ref{lik1}) is
1125: true on each set
1126: $\left(\overline{L}_K=\overline{l}_K,\overline{A}_K=\overline{a}_K\right)$, as
1127: we needed to show. 
1128: \Endproof
1129: 
1130: 
1131: Regarding the conditions of Theorem~\ref{mleth} we note that the
1132: baseline treatment regime $\overline{0}$ may not be constant, whence
1133: the death rate under $\overline{0}$ may change at the time points
1134: $\tau_m$. However, it will often be reasonable to assume
1135: differentiability of the function
1136: $s_{\left(\overline{l}_k,\left(\overline{a}_k,\overline{0}\right)\right)}
1137: \left(t\right)$ on all intervals $\left(\tau_m,\tau_{m+1}\right)$.
1138: 
1139: For likelihood inference concerning the parameter $\psi$ of an SNFTM, we
1140: shall generally drop the factors
1141: \begin{equation}
1142: \label{condlawtreatment}
1143: P\left(A_k=a_k|\overline{L}_k=\overline{l}_k,
1144: \overline{A}_{k-1}=\overline{a}_{k-1}\right)
1145: \end{equation}
1146: from the likelihood. All other terms involve $\psi$ through
1147: $T_0^\gamma$ and we will need to specify models for these terms in
1148: order to proceed, typically involving additional parameters.  Given
1149: such models we can estimate $\psi$ by the corresponding coordinate of
1150: the maximum likelihood estimator obtained by maximizing the likelihood
1151: over all parameters. Of course finding this maximizer may be a
1152: formidable task.
1153: 
1154: Since the null hypothesis of no treatment effect is equivalent to the
1155: functions $\gamma_{\overline{l}_{k},\overline{a}_k}$ being equal to
1156: the identity function, by Theorem~\ref{geq}, this hypothesis can be
1157: fully expressed in the parameter $\psi$. For instance, we could, by
1158: convention, construct our SNFTM in such a way that this null
1159: hypothesis is equivalent to $H_0: \psi=0$. Then we can obtain a
1160: likelihood-based test for the null hypothesis of no treatment effect
1161: using the Wald, score or likelihood ratio test for $H_0: \psi=0$.
1162: 
1163: \section{G--estimation}
1164: \label{Gest}
1165: The likelihood methods of the preceding section require the 
1166: specification of models for the conditional laws of the covariates,
1167: among others, next to a specification of an SNFTM. In this section we
1168: present an alternative approach to testing and estimation of the
1169: parameter in a SNFTM, called \emph{G--estimation} in
1170: Robins~(1998). This approach is based on models for the conditional
1171: distributions of the treatment variables given in
1172: (\ref{condlawtreatment}). It can be considered a semiparametric
1173: approach, where the parametric component refers to the laws
1174: (\ref{condlawtreatment}) and all other laws appearing in
1175: Theorem~\ref{mleth} form the nonparametric, unspecified
1176: component. From a practical perspective modelling the distributions
1177: (\ref{condlawtreatment}) is more attractive than modelling the
1178: remaining laws in Theorem~\ref{mleth}, as it may be expected that
1179: doctors have clear ideas, at least qualitatively, about how they reach
1180: their decisions about treatment.
1181: 
1182: The method of G--estimation is based on the conditional independence
1183: of the ``blipped-up'' variable $T_0^\gamma$ defined in (\ref{Togd})
1184: and the treatment variable $A_k$ given the variables $\overline L_k$
1185: and $\overline A_{k-1}$, for each $k$, asserted by
1186: Theorem~\ref{Tog}. Consider first testing the null hypothesis $H_0:
1187: \gamma=\gamma_0$ for a given shift function $\gamma_0$. 
1188: Theorem~\ref{Tog} gives, under the null hypothesis, that, for each $k$,
1189: \begin{equation}\label{ind}
1190: A_k \cip T_0^{\gamma_0}\,|\,\overline{L}_k,\overline{A}_{k-1},T>\tau_k.
1191: \end{equation}
1192: This is an assertion about the observed data vector
1193: $(\overline L_K,\overline A_K, T)$ only. Any test for the validity
1194: of (\ref{ind}) is therefore a test for the null hypothesis 
1195: $H_0: \gamma=\gamma_0$.
1196: 
1197: In order to operationalize this idea we adopt for each $k$ a model
1198: \begin{equation*}
1199: P_\theta\left(A_k=a_k|\overline{L}_k=\overline{l}_k,
1200: \overline{A}_{k-1}=\overline{a}_{k-1},T>\tau_k\right)
1201: \end{equation*}
1202: for the prediction of treatment given the past, indexed by some
1203: parameter $\theta$. Such a model tries to explain the treatment $A_k$
1204: by the values of the covariates up to time $\tau_k$ and the preceding
1205: treatment history.  Formula (\ref{ind}) implies that, under the null
1206: hypothesis, inclusion of the variable $T_0^{\gamma_0}$ as an extra
1207: explanatory variable is useless for the prediction of $A_k$, if past
1208: covariate- and treatment information $\overline{L}_k$ and
1209: $\overline{A}_{k-1}$ are known. Thus given a term of the form
1210: $\alpha\, T_0^{\gamma_0}$ in the prediction model with $\alpha$ a
1211: parameter, the true value of $\alpha$ must be equal to $0$, because of
1212: (\ref{ind}). It follows that we can test the null hypothesis $H_0:
1213: \gamma=\gamma_0$ by adding a term $\alpha T_0^{\gamma_0}$ anyway, and
1214: next test the null hypothesis $H_0: \alpha=0$ in the model indexed by
1215: the overall parameter $(\theta,\alpha)$.  Depending on the chosen
1216: types of model such a test, for instance a Wald, score or the
1217: likelihood ratio test, can be performed by standard statistical
1218: software.
1219: 
1220: This procedure is particularly simple for testing the
1221: null hypothesis of no treatment effect. In view of 
1222: Theorem~\ref{geq}, this is equivalent to testing whether the function
1223: $\gamma$ is equal to the identity function, i.e.\ we take $\gamma_0$ in the
1224: preceding equal to the identity function. In this case
1225: the variable $T_0^{\gamma_0}$ is equal to $T$, and hence
1226: the G--estimation procedure reduces to testing the null hypothesis
1227: $H_0: \alpha=0$ in a regression model that tries to explain the variable
1228: $A_k$ by the variables $\overline L_k$, $\overline A_{k-1}$ and $\alpha T$. 
1229: The null hypothesis of no treatment effect can be tested
1230: in this way without specifying a model for the shift function $\gamma$.
1231: 
1232: For a specific example, suppose that the treatment variables $A_k$ are
1233: binary-valued.  Then a logistic regression model is a standard choice
1234: for modelling the probabilities (\ref{condlawtreatment}).  We might
1235: add the variable $\alpha T_0^{\gamma}$ to a logistic regression model
1236: to form the model
1237: \begin{equation*}
1238: P_{\theta,\alpha}\left(A_k=a_k|\overline{L}_k,\overline{A}_{k-1},T>\tau_k,T_0^{\gamma}\right)
1239: =\frac{1}{1+e^{\theta\cdot f_k(\overline{L}_k,\overline{A}_{k-1})
1240: +\alpha g_k(T_0^{\gamma})}},
1241: \end{equation*}
1242: for given, known functions $f_k$ and $g_k$, and unknown parameters
1243: $\theta$ and $\alpha$. A test for the null hypothesis $H_0: \alpha=0$
1244: can be carried out by standard software for logistic regression.
1245: 
1246: Given an SNFTM $\psi\mapsto \gamma_\psi$ for the shift functions
1247: $\gamma$, indexed by a parameter $\psi$, we can extend the preceding
1248: testing methods to full inference on the parameter $\psi$.  First, we
1249: can obtain confidence regions for $\psi$ by inverting the tests for
1250: the null hypotheses $H_0: \gamma=\gamma_{\psi}$ in the usual way: the
1251: value $\psi$ belongs to the confidence region if the corresponding
1252: null hypothesis $H_0$ is not rejected.
1253: 
1254: A natural estimator of $\psi$ would be the center of a confidence
1255: set, or, alternatively, a value of $\psi$ for
1256: which $T_0^{\gamma_\psi}$ contributes the
1257: least to the prediction model for treatment given the
1258: past. That is, the $\psi$ for which the fitted model for
1259: \begin{equation}\label{akpred}
1260: P_{\theta,\alpha}\left(A_k=a_k|\overline{L}_k,
1261: \overline{A}_{k-1},T>\tau_k,\alpha T_0^{\gamma_\psi}\right).
1262: \end{equation}
1263: does not include the variable $T_0^{\gamma_\psi}$, i.e.\ $\alpha=0$.
1264: For each given value of the parameter $\psi$ of the SNFTM we may
1265: obtain estimators $\hat{\theta}(\psi)$ and $\hat{\alpha}(\psi)$ for
1266: the parameters $\theta$ and $\alpha$, based on the observations
1267: $(\overline{L}_K^i,\overline{A}_K^i,T^i)$ on $n$ persons.  Then we
1268: define $\hat{\psi}$ as the solution of the equation
1269: \begin{equation*}
1270: \hat{\alpha}\left(\psi\right)=0.
1271: \end{equation*}
1272: If we use a logistic regression model, then the estimators
1273: $\hat\theta$  and $\hat\alpha$ can be obtained with
1274: standard software, for each given value of $\psi$.
1275: The estimator $\hat\psi$ can next be found by a grid search method.
1276: Alternatively, we can implement a direct numerical method for
1277: estimating $\psi$.
1278: 
1279: The procedures just outlined may appear a bit unusual, in view of
1280: their indirect nature. However, in most cases they can also be
1281: interpreted in a standard way. For instance, the procedure for
1282: estimating $\alpha$ for given $\psi$ will often be equivalent to
1283: solving $\hat{\alpha}=\hat{\alpha}\left(\psi\right)$ from an
1284: estimating equation of the type
1285: \begin{equation*}
1286: \sum_{i=1}^n h_{\alpha,\psi}\bigl(\overline{L}_K^i,\overline{A}_K^i,T^i\bigr)
1287: =0.
1288: \end{equation*}
1289: Then $\hat{\psi}$ satisfying
1290: $\hat{\alpha}\bigl(\hat{\psi}\bigr)=0$ will satisfy
1291: the estimating equation
1292: \begin{equation*}
1293: \sum_{i=1}^n h_{0,\hat{\psi}}\bigl(\overline{L}_K^i,\overline{A}_K^i,T^i\bigr)
1294: =0.
1295: \end{equation*}
1296: Because $\alpha(\psi_0)=0$ for the true value $\psi_0$ of $\psi$,
1297: the true value of $\psi$ is a solution to the equation
1298: \begin{equation*}
1299: E h_{0,\psi}\left(\overline{L}_K,\overline{A}_K,T\right)=0.
1300: \end{equation*}
1301: In other words, $\hat{\psi}$ will be the solution of an unbiased
1302: estimating equation, whence the (asymptotic) properties of $\hat\psi$
1303: can be ascertained with the usual theory for M-estimators
1304: (e.g.\ Van der Vaart~(1998)). For
1305: instance, we may expect the sequence
1306: $\sqrt{n}\bigl(\hat{\psi}-\psi\bigr)$ to be asymptotically (as
1307: $n\rightarrow\infty$) normal with mean zero and variance
1308: \begin{equation*}
1309: \frac{E h_{0,\psi}^2\left(\overline{L}_K,\overline{A}_K,T\right)}
1310: {\bigl(\frac{\partial}{\partial \psi} E 
1311: h_{0,\psi}\left(\overline{L}_K,\overline{A}_K,T\right)\bigr)^2}.
1312: \end{equation*}
1313: Lok~(1991) has studied the validity of these results, and has thus
1314: justified the preceding procedures.
1315: 
1316: \section{Summary and extensions}
1317: We have shown that the AZT treatment regime-specific, counterfactual
1318: AIDS-free survival curves $P\left(T^g>t\right)$ are identified for all
1319: evaluable treatment regimes $g$ if our maintained assumption of no
1320: unmeasured confounding, Assumption~\ref{ass}, is met. This assumption
1321: will hold if the investigator has succeeded in recording in
1322: $\overline{l}_k$ data on all covariates that, conditional on past AZT
1323: history $\overline{a}_{k-1}$, predict both the AZT dosage rate $a_k$
1324: in $\left(\tau_k,\tau_{k+1}\right]$ and the random variables $T^g$
1325: representing time to AIDS had, contrary to fact, all subjects followed
1326: an AZT treatment history consistent with regime $g$.
1327: 
1328: Further, we have shown that, under the assumption of no unmeasured
1329: confounding, Assumption~\ref{ass}, the shift functions $\gamma$ of an
1330: SNFTM are the identity function if and only if the G--null hypothesis
1331: of no causal effect of AZT on time to AIDS is true. We have expressed
1332: the likelihood of the observable random variables
1333: $\left(T,\overline{L}_K,\overline{A}_K\right)$ in terms of the transformed
1334: random variables $\left(T_0^\gamma,
1335: \overline{L}_K,\overline{A}_K\right)$. We then developed parametric
1336: likelihood based tests of the hypothesis $\gamma={\rm id}$ by specifying fully
1337: parametric models for the joint distribution of
1338: $\left(T,\overline{L}_K,\overline{A}_K\right)$ in terms of the
1339: transformed random variables $\left(T_0^\gamma,
1340: \overline{L}_K,\overline{A}_K\right)$.
1341: 
1342: Even in the absence of censoring or missing data, a major limitation
1343: of the fully parametric likelihood-based tests of the null hypothesis
1344: $\gamma={\rm id}$ from Section~\ref{mlesec} is that misspecification of the
1345: parametric models for the distribution of $L_k$ given
1346: $\overline{L}_{k-1}$, $\overline{A}_{k-1}$ and $T_0^\gamma$, or for
1347: the distribution of $T^{\overline{0}}$, can cause the true
1348: $\alpha$-level of the test to deviate from the nominal
1349: $\alpha$-level. This limitation raised the question of whether it is
1350: possible to construct $\alpha$-level tests of the null hypothesis
1351: $\gamma={\rm id}$ and of more general hypotheses concerning $\gamma$, which
1352: are asymptotically distribution-free. A closely related question is
1353: whether there exist $n^{1/2}$-consistent asymptotically normal
1354: estimators of the parameter $\psi$ of a correctly specified structural
1355: nested failure time model if the joint distribution of the observables
1356: $\left(\overline{L}_K,\overline{A}_K,T\right)$ is otherwise unspecified,
1357: i.e.\ if the distribution of $L_k$ given $\overline{L}_{k-1}$,
1358: $\overline{A}_{k-1}$ and $T_0^\gamma$ and the distribution of the variable
1359: $T^{\overline{0}}$ are left completely unspecified. In
1360: Section~\ref{Gest} we showed that one only needs to specify a parametric
1361: model for the shift function $\gamma$, which models the causal effect
1362: of one treatment dosage given the past, and a parametric model for the
1363: distribution of actual treatment dosage given past treatment- and
1364: covariate history. Doctors will usually have clear ideas about this
1365: latter distribution of treatment decisions. Moreover, the doctors'
1366: interest will often be in the causal effect of one treatment dosage
1367: given the past.
1368: 
1369: If the null hypothesis of no treatment effect has been rejected and
1370: the parameter $\psi$ of the shift function $\gamma$ has been
1371: estimated, one might wish to estimate the survival distribution
1372: $t\mapsto P\bigl(T^\GG>t\bigr)$ of the outcome under specific
1373: treatment regimes $\GG$ in a way consistent with the estimator
1374: $\hat{\psi}$. This can be done by estimating the distribution of
1375: $T^{\overline{0}}$ (e.g.\ by the empirical distribution of
1376: $T_0^{\gamma^\psi}$) and the empirical distribution of $L_k$ given
1377: $\overline{L}_{k-1}$, $\overline{A}_{k-1}$ and $T_0^\gamma$
1378: ($k=0,\ldots,K$) for histories $\overline{L}_{k-1}$,
1379: $\overline{A}_{k-1}$ consistent with $\GG$.  An approximate sample
1380: $\tilde{T}^\GG_i$ ($i=1,2,\ldots$) from the distribution of $T^\GG$
1381: could then be generated by using these estimated distributions: first
1382: draw $T'_0$ from the distribution of $T^{\overline{0}}$, then draw
1383: $L'_0$ from the distribution of $L_0$ given $T_0^\gamma=T'_0$, then
1384: put $A'_0=\GG\bigl(L'_0\bigr)$, then draw $L'_1$ from the distribution
1385: of $L_1$ given $T_0^\gamma=T'_0$, $A_0=A'_0$ and $L_0=L'_0$, etcetera.
1386: Finally put
1387: \begin{equation*}\tilde{T}^\GG=
1388: {\gamma^{\hat{\psi}}_{\overline{L}'_K,\overline{A}'_K}}^{-1}\circ\ldots\circ
1389: {\gamma^{\hat{\psi}}_{\overline{L}'_0,\overline{A}'_0}}^{-1}\bigl(T'_0\bigr).
1390: \end{equation*}
1391: This variable will be generated from the desired distribution.
1392: 
1393: Extensions of the results of this paper that allow for censoring and
1394: missing data are discussed in Robins~(1988, 1992, 1993, 1998), and Robins
1395: et al~(1992). The extension of G--tests and estimators to continuous
1396: $L_k$ and $A_k$ are discussed in Robins~(1992, 1993), Robins et
1397: al.~(1992), and Gill and Robins~(2001). Robins~(1998) and Lok~(2001)
1398: show that the results in this paper can be extended to allow for jumps
1399: in the treatment- and covariate processes in continuous time.
1400: 
1401: \begin{appendix}
1402: 
1403: \section{Alternative formulation of the null hypothesis}
1404: \label{appid}
1405: In this appendix we prove Theorem~\ref{geq} through two lemmas.  The
1406: first lemma shows that if all functions $\gamma$ are equal to the
1407: identity function, then all survival curves $P\left(T^g>t\right)$ for
1408: evaluable treatment regimes are the same. The second lemma shows the
1409: reverse.
1410: 
1411: \begin{lem}
1412: \label{alem1}
1413: Suppose that Assumptions~\ref{ass} (no unmeasured confounding),
1414: \ref{cons} (consistency) and~\ref{base} (admissible baseline treatment
1415: regime) hold. If $\gamma_{\overline{l}_k,\overline{a}_k}$ is the
1416: identity function for all $k$, $\overline{l}_k\in\overline{{\cal
1417: L}}_k$ and $\overline{a}_k\in\overline{A}_k$ with
1418: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,
1419: T>\tau_k\right)>0$, then
1420: all survival curves $P\left(T^\GG>t\right)$ for evaluable treatment
1421: regimes $\GG$ are the same.
1422: \end{lem}
1423: 
1424: \proof
1425: We show that for all evaluable treatment regimes $\GG$ and
1426: all $\overline{l}_k$ with \linebreak
1427: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=
1428: \overline{\GG}\left(\overline{l}_k\right),T>\tau_k\right)>0$,
1429: the conditional distributions of the counterfactual variables
1430: $T^\GG$ and 
1431: $T^{\left(\overline{\GG}_{k-1}\left(\overline{l}_{k-1}\right),
1432: \overline{0}\right)}$
1433: given $\overline{L}_k=\overline{l}_k,\overline{A}_{k-1}
1434: =\overline{\GG}\left(\overline{l}_{k-1}\right),T>\tau_k$
1435: are the same, i.e., for $t\ge \tau_k$,
1436: \begin{equation}\label{kweg}
1437: s_{\overline{l}_k,\GG}(t)=
1438: s_{\overline{l}_k,\left(\overline{\GG}_{k-1}\left(\overline{l}_{k-1}\right),
1439: \overline{0}\right)}(t).
1440: \end{equation}
1441: For $k=-1$ this should be read as $s_\GG\left(t\right)=s_{\overline{0}}(t)$,
1442: which implies Lemma~\ref{alem1}.
1443: 
1444: We prove (\ref{kweg}) by backward induction on $k$, for $t$ fixed.
1445: With $\tau_p$ the last clinic visit time strictly before $t$, we start
1446: with $k=p$ and end with $k=0$. The statement for $k=-1$ follows from
1447: the statement for $k=0$ by summation over $l_0$.
1448: 
1449: Basis: For $k=p$, by the definition of $s$ as the right side
1450: of (\ref{Gfc}),
1451: $$s_{\overline{l}_p,\GG}(t)
1452: =P\left(T>t|\overline{L}_p=\overline{l}_p,\overline{A}_p=
1453: \overline{\GG}_p\left(\overline{l}_p\right),T>\tau_p\right)
1454: =s_{\overline{l}_p,\left(\overline{\GG}_p\left(\overline{l}_p\right),
1455: \overline{0}\right)}(t),$$
1456: by another application of the definition of $s$.
1457: The right side is equal to 
1458: $s_{\overline{l}_p,\left(\overline{\GG}_{p-1}\left(\overline{l}_{p-1}\right),
1459: \overline{0}\right)}(t)$ by the assumption that
1460: the function $\gamma_{\overline{l}_p,\overline{a}_p}$ with 
1461: $\overline a_p=\overline g_p(\overline l_p)$, is the identity function
1462: is the identity.
1463: 
1464: Induction step: we suppose that (\ref{kweg}) is true for $k\ge 1$ and
1465: establish (\ref{kweg}) for $k-1$. By straightforward algebra using the
1466: definition of $s_{\overline l_{k-1},g}$,
1467: \begin{eqnarray*} s_{\overline{l}_{k-1},\GG}\left(t\right)
1468: &=& P\left(T>\tau_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},
1469: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),
1470: T>\tau_{k-1}\right)\\
1471: &&\hspace{0.5cm}\sum_{l_{k}}
1472: P\left(L_{k}=l_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},
1473: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),
1474: T>\tau_{k}\right) s_{\overline{l}_{k},\GG}\left(t\right).
1475: \end{eqnarray*}
1476: Here we can replace $s_{\overline l_{k},g}$ using the induction
1477: hypothesis, giving that the preceding display is equal to
1478: \begin{eqnarray*}
1479: &&P\left(T>\tau_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},
1480: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),
1481: T>\tau_{k-1}\right)\\
1482: &&\hspace{0.5cm}
1483: \sum_{l_{k}}
1484: P\left(L_{k}=l_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},
1485: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),
1486: T>\tau_{k}\right)
1487: s_{\overline{l}_{k},\left(\overline{\GG}_{k-1}\left(\overline{l}_{k-1}\right),
1488: \overline{0}\right)}\left(t\right)\\
1489: &&\hspace{1cm}= s_{\overline{l}_{k-1},\left(\overline{\GG}_{k-1}\left(\overline{l}_{k-1}
1490: \right), \overline{0}\right)}\left(t\right)\\
1491: &&\hspace{1cm}= s_{\overline{l}_{k-1},\left(\overline{\GG}_{k-2}\left(\overline{l}_{k-2}
1492: \right),\overline{0}\right)}\left(t\right),
1493: \end{eqnarray*}
1494: where we use the definition of $s$ in the first equality, and the
1495: assumption that $\gamma_{\overline{l}_{k-1},\overline{a}_{k-1}}$,
1496: for  $\overline{a}_{k-1}=\overline{\GG}_{k-1}(\overline{l}_{k-1})$, is the
1497: identity function in the second.  
1498: \Endproof
1499: 
1500: \begin{lem}\label{alem2}
1501: Suppose that Assumptions~\ref{ass} (no unmeasured confounding),
1502: \ref{cons} (consistency) and~\ref{base} (admissible baseline treatment
1503: regime) hold. If the survival curves
1504: $P\left(T^\GG>t\right)$ are the same for all evaluable treatment regimes
1505: $\GG$, then the shift function
1506: $\gamma_{\overline{l}_k,\overline{a}_k}$ is the identity for all
1507: $k$, $\overline{l}_k\in\overline{{\cal L}}_k$ and
1508: $\overline{a}_k\in\overline{A}_k$ with
1509: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,
1510: T>\tau_k\right)>0$.
1511: \end{lem}
1512: 
1513: \proof
1514: Let fixed $\overline{l}_k$, $\overline{a}_k$ with
1515: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,
1516: T>\tau_k\right)>0$ be given. To prove that
1517: $\gamma_{\overline{l}_k,\overline{a}_k}$ is the identity we need to
1518: show that, for all $t>\tau_k$,
1519: \begin{equation}\label{akweg}
1520: s_{\overline{l}_k,\left(\overline{a}_k,\overline{0}\right)}(t)=
1521: s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}(t).
1522: \end{equation}
1523: Define a treatment regime $\GG^1$ by the coordinate functions
1524: $\GG_m^1\bigl(\overline{\tilde{l}}_m\bigr)=a_m$ if
1525: $\overline{\tilde{l}}_m$ is the initial part of $\overline{l}_k$,
1526: and by $\GG_m^1\bigl(\overline{\tilde{l}}_m\bigr)=0$ otherwise. Define
1527: a second treatment regime $\GG^2$ by 
1528: and $\GG^2=\bigl(\overline{\GG^1}_{k-1},\overline{0}\bigr)$.
1529: Because of Assumption~\ref{base} and because 
1530: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,
1531: T>\tau_k\right)>0$, the treatment
1532: regimes $\GG^1$ and $\GG^2$ are evaluable.   Thus, by assumption,
1533: we have that $P\left(T^{\GG_1}>t\right)=P\left(T^{\GG_2}>t\right)$, and these
1534: probabilities are given by the G--computation formula,
1535: given in Theorem~\ref{Gcomp}. For the first regime this formula
1536: can be written in the form
1537: \begin{eqnarray*}
1538: \lefteqn{P\left(T^{\GG_1}>t\right)}\\
1539: &=&
1540: \sum_{\tilde{l}_0}\cdots
1541: \sum_{\tilde{l}_k} 1_{\overline{\tilde{l}}_k\neq\overline{l}_k}
1542: \prod_{m=0}^k\Big\{ P\bigl(T>\tau_m|\overline{L}_{m-1}=
1543: \overline{\tilde{l}}_{m-1},
1544: \overline{A}_{m-1}=\overline{\GG^1}\bigl(\overline{\tilde{l}}_{m-1}\bigr),
1545: T>\tau_{m-1}\bigr)\\
1546: &&\hspace{1.5cm}
1547: P\bigl(L_m=\tilde{l}_m| \overline{L}_{m-1}=\overline{\tilde{l}}_{m-1},
1548: \overline{A}_{m-1}=\overline{\GG^1}\bigl(\overline{\tilde{l}}_{m-1}\bigr),
1549: T>\tau_{m}\bigr)
1550: \Big\}s_{\overline{\tilde{l}}_k,\GG^1}(t)\\
1551: &&+\bigg[\prod_{m=0}^k
1552: \Big\{P\bigl(T>\tau_m|\overline{L}_{m-1}=\overline{l}_{m-1},
1553: \overline{A}_{m-1}=\overline{\GG^1}\left(\overline{l}_{m-1}\right),
1554: T>\tau_{m-1}\bigr)\\
1555: &&\hspace{1.5cm}
1556: P\bigl(L_m=l_m| \overline{L}_{m-1}=\overline{l}_{m-1},
1557: \overline{A}_{m-1}=\overline{\GG^1}\bigl(\overline{l}_{m-1}\bigr),
1558: T>\tau_{m}\bigr)\Big\}\bigg] s_{\overline{l}_k,\GG^1}(t).
1559: \end{eqnarray*}
1560: A similar expression holds for the treatment regime $\GG^2$.  Because
1561: the regimes $\GG^1$ and $\GG^2$ are constructed to be the same up to
1562: time $\tau_{k-1}$, only the second terms of the summs differs between these
1563: two expressions. Even there, the product preceding
1564: $s_{\overline{l}_k,\GG^1}(t)$ and $s_{\overline{l}_k,\GG^2}(t)$ is the
1565: same for $\GG^1$ and $\GG^2$. Moreover, this factor is strictly
1566: positive, since
1567: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,
1568: T>\tau_k\right)>0$ by assumption. The equality of
1569: $P\left(T^{\GG_1}>t\right)$ and $P\left(T^{\GG_2}>t\right)$ therefore
1570: implies the equality of $s_{\overline{l}_k,\GG^1}(t)$ and
1571: $s_{\overline{l}_k,\GG^2}(t)$. By construction of $\GG^1$ and $\GG^2$,
1572: equation (\ref{akweg}) and hence Lemma~\ref{alem2} follow.  
1573: \Endproof
1574: 
1575: 
1576: \section{Mimicking counterfactual outcomes} \label{appTog}
1577: For $t>0$ define $p(t)$ by $\tau_{p(t)}<t\le \tau_{p(t)+1}$,
1578: i.e.\ $\tau_{p(t)}$ is the last clinic visit time strictly before $t$.
1579: For $k\geq 0$ with $k\le p(T)$ we define a random variable by 
1580: \begin{equation*}
1581: T_k^\gamma=\gamma_{\overline{L}_k,\overline{A}_k}\circ
1582: \cdots\circ\gamma_{\overline{L}_p(T),\overline{A}_{p(T)}}(T).
1583: \end{equation*}
1584: For $k>p(T)$ we interprete the (empty) composition of transformations
1585: on the right as the identity and define $T_k^\gamma=T$.
1586: 
1587: In this appendix we prove the following theorem, which generalizes the
1588: first part of Theorem~\ref{Tog}. This theorem implies the second part,
1589: since $T_0^\gamma$ is a function of
1590: $\bigl(\overline{L}_{k-1},\overline{A}_{k-1},T_k^\gamma\bigr)$.
1591: 
1592: \begin{thm}
1593: For $t>\tau_k$ and every $\overline l_k$, $\overline a_k$ with
1594: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=
1595: \overline{a}_k,T>\tau_k\right)>0$,
1596: \begin{eqnarray*}
1597: P\left(T^\gamma_k>t|\overline{L}_k=\overline{l}_k,\overline{A}_k=
1598: \overline{a}_k,T>\tau_k\right)
1599: &=&P\left(T^\gamma_k>t|\overline{L}_k=\overline{l}_k,
1600: \overline{A}_{k-1}=\overline{a}_{k-1},T>\tau_k\right)\\
1601: &=&s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}(t).
1602: \end{eqnarray*}
1603: \end{thm}
1604: 
1605: \proof
1606: We use backward induction on $k$, starting with $k=K$ and ending with $k=0$.
1607: For $k=K$,
1608: \begin{eqnarray*}
1609: P\bigl(T_K^\gamma>t|\overline{L}_K=\overline{l}_K,
1610: \overline{A}_K=\overline{a}_K,T>\tau_K\bigr)&=&
1611: P\bigl(\gamma_{\overline{l}_K,\overline{a}_K}\left(T\right)>t|
1612: \overline{L}_K=\overline{l}_K,\overline{A}_K=\overline{a}_K,T>\tau_K\bigr)\\
1613: &=&P\bigl(T>\gamma^{-1}_{\overline{l}_K,\overline{a}_K}(t)|
1614: \overline{L}_K=\overline{l}_K,\overline{A}_K=\overline{a}_K,T>\tau_K\bigr)\\
1615: &=&s_{\overline{l}_K,\left(\overline{a}_K,\overline{0}\right)}
1616: \bigl(\gamma^{-1}_{\overline{l}_K,\overline{a}_K}(t)\bigr)\\
1617: &=&s_{\overline{l}_K,\left(\overline{a}_{K-1},\overline{0}\right)}(t).
1618: \end{eqnarray*}
1619: Here the first equality is immediate from the definition of
1620: $T_K^\gamma$, the second follows by the strict monotonicity of the
1621: functions $\gamma$, the third by definition of $s$ and the last by
1622: definition of $\gamma$.
1623: 
1624: Induction step: we show that if the theorem is true for $k+1$, then it
1625: is also true for $k$. Just as for $k=K$,
1626: \begin{equation*}
1627: P\bigl(T_k^\gamma>t|\overline{L}_k=\overline{l}_k,
1628: \overline{A}_k=\overline{a}_k,T>\tau_k\bigr)
1629: =P\bigl(T_{k+1}^\gamma>\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)|
1630: \overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,T>\tau_k\bigr).
1631: \end{equation*}
1632: Now we distinguish two possibilities:
1633: $\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)\leq\tau_{k+1}$ and
1634: $\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)>\tau_{k+1}$. In the
1635: first case, the right side of the preceding display is equal to 
1636: \begin{eqnarray*}
1637: &&P\bigl(T>\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)|
1638: \overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,T>\tau_k\bigr)\\
1639: &&\hspace{1cm}=s_{\overline{l}_k,\left(\overline{a}_k,\overline{0}\right)}
1640: \bigl(\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)\bigr)\\
1641: &&\hspace{1cm}=s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}(t),
1642: \end{eqnarray*}
1643: where the first equality holds because for
1644: $s\in\left(\tau_k,\tau_{k+1}\right]$ we have that
1645: $\left\{T_{k+1}^\gamma>s\right\}=\left\{T>s\right\}$ by the construction of
1646: $T_{k+1}^\gamma$, and the last equality holds by the definition of $\gamma$.
1647: In the second possibility, i.e.\ if
1648: $\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)>\tau_{k+1}$,
1649: \begin{eqnarray*}
1650: \lefteqn{P\bigl(T_{k+1}^\gamma>\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)|
1651: \overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,T>\tau_k\bigr)}\\
1652: &=&P\bigl(T_{k+1}^\gamma>\tau_{k+1}|\overline{L}_k=\overline{l}_k,
1653: \overline{A}_k=\overline{a}_k,T>\tau_k\bigr)\\
1654: &&\hspace{0.5cm} P\bigl(T_{k+1}^\gamma>\gamma^{-1}_{\overline{l}_k,
1655: \overline{a}_k}(t)|\overline{L}_k=\overline{l}_k,
1656: \overline{A}_k=\overline{a}_k,T>\tau_k,T_{k+1}^\gamma>\tau_{k+1}\bigr)\\
1657: &=&P\left(T>\tau_{k+1}|\overline{L}_k=\overline{l}_k,
1658: \overline{A}_k=\overline{a}_k,T>\tau_k\right)\\
1659: &&\hspace{0.5cm} \sum_{l_{k+1}}\Big\{P\left(L_{k=1}=l_{k+1}|
1660: \overline{L}_{k}=\overline{l}_{k},\overline{A}_k=\overline{a}_k,
1661: T>\tau_{k+1}\right)\\
1662: &&\hspace{0.5cm} \hspace{0.95cm}
1663: P\bigl(T_{k+1}^\gamma>\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)|
1664: \overline{L}_{k+1}=\overline{l}_{k+1},\overline{A}_k=\overline{a}_k,
1665: T>\tau_{k+1}\bigr)\Big\}\\
1666: &=&P\left(T>\tau_{k+1}|\overline{L}_k=\overline{l}_k,
1667: \overline{A}_k=\overline{a}_k,T>\tau_k\right)\\
1668: &&\hspace{0.5cm} \sum_{l_{k+1}}\Big\{
1669: P\left(L_{k=1}=l_{k+1}|\overline{L}_{k}=\overline{l}_{k},
1670: \overline{A}_k=\overline{a}_k,T>\tau_{k+1}\right)
1671: s_{\overline{l}_{k+1},\left(\overline{a}_k,\overline{0}\right)}
1672: \bigl(\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)\bigr)\Big\}\\
1673: &=&P\left(T>\tau_{k+1}|\overline{L}_k=\overline{l}_k,
1674: \overline{A}_k=\overline{a}_k,T>\tau_k\right)\\
1675: &&\hspace{0.5cm} \sum_{l_{k+1}}\Big\{
1676: P\left(L_{k=1}=l_{k+1}|\overline{L}_{k}=\overline{l}_{k},
1677: \overline{A}_k=\overline{a}_k,T>\tau_{k+1}\right)
1678: s_{\overline{l}_{k+1},\left(\overline{a}_{k-1},\overline{0}\right)}
1679: \left(t\right)\Big\}\\
1680: &=&s_{\overline{l}_{k},\left(\overline{a}_{k-1},\overline{0}\right)}(t),
1681: \end{eqnarray*}
1682: where in the first step we condition on $T_{k+1}^\gamma>\tau_{k+1}$,
1683: in the second we use that
1684: $\left\{T_{k+1}^\gamma>\tau_{k+1}\right\}=\left\{T>\tau_{k+1}\right\}$
1685: and we condition on $L_{k+1}$, the fourth is the induction step, the
1686: fifth follows from the definition of $\gamma$ and the last from
1687: the definition of
1688: $s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}$.
1689: \Endproof
1690: 
1691: 
1692: \end{appendix}
1693: 
1694: \bigskip\noindent
1695: {\bf Acknowledgement.} This paper is based on an earlier manuscript
1696: by the first author.
1697: 
1698: %\addcontentsline{toc}{chapter}{Bibliography}
1699: %\bibliographystyle{harry}
1700: %\bibliography{ref}
1701: 
1702: \begin{thebibliography}{13}
1703: \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi
1704: 
1705: \bibitem[{Dawid(1979)}]{Dawid}
1706: Dawid, A.~P. (1979).
1707: \newblock {Conditional independence in statistical theory (with discussion)}.
1708: \newblock {\em Journal of the Royal Statistical Society\/} {\bf B 41}, 1--31.
1709: 
1710: \bibitem[{Gill and Robins(2001)}]{RJ}
1711: Gill, R.~D. and Robins, J.~M. (2001).
1712: \newblock {Causal inference for complex longitudinal data: the continuous
1713:   case}.
1714: \newblock {\em Annals of Statistics\/} {\bf 29}(6), 1785--1811.
1715: 
1716: \bibitem[{Lok(2001)}]{Lok}
1717: Lok, J.~J. (2001).
1718: \newblock {\em {Statistical modelling of causal effects in time}\/}.
1719: \newblock Ph.D. thesis, Division of Mathematics and Computer Science, 
1720: Vrije Universiteit Amsterdam.
1721: 
1722: \bibitem[{Robins(1986)}]{R86}
1723: Robins, J.~M. (1986).
1724: \newblock {A new approach to causal inference in mortality studies with a
1725:   sustained exposure period -- Applications to control of the healthy worker
1726:   survivor effect}.
1727: \newblock {\em Mathematical Modelling\/} {\bf 7}, 1393--1512.
1728: 
1729: \bibitem[{Robins(1987{\natexlab{a}})}]{R87a}
1730: Robins, J.~M. (1987{\natexlab{a}}).
1731: \newblock {A graphical approach to the identification and estimation of causal
1732:   parameters in mortality studies with sustained exposure periods}.
1733: \newblock {\em Journal of Chronic Disease\/} {\bf 40}(Suppl. 2), 139S--161S.
1734: 
1735: \bibitem[{Robins(1987{\natexlab{b}})}]{R87b}
1736: Robins, J.~M. (1987{\natexlab{b}}).
1737: \newblock {Addendum to ``A new approach to causal inference in mortality
1738:   studies with a sustained exposure period -- Application to control of the
1739:   healthy worker survivor effect}.
1740: \newblock {\em Computers and Mathematics with Applications\/} {\bf 14},
1741:   923--945.
1742: 
1743: \bibitem[{Robins(1988{\natexlab{a}})}]{R89}
1744: Robins, J.~M. (1988{\natexlab{a}}).
1745: \newblock {The analysis of randomized and nonrandomized AIDS treatment trials
1746:   using a new approach to causal inference in longitudinal studies}.
1747: \newblock In {\em Health service research methodology: a focus on AIDS\/}, pp.
1748:   113--159. NCHSR, U.S. Publc Health Service, Washington.
1749: 
1750: \bibitem[{Robins(1988{\natexlab{b}})}]{R88b}
1751: Robins, J.~M. (1988{\natexlab{b}}).
1752: \newblock {The control of confounding by intermediate variables}.
1753: \newblock {\em Statistics in Medicine\/} {\bf 8}, 679--701.
1754: 
1755: \bibitem[{Robins(1992)}]{R92}
1756: Robins, J.~M. (1992).
1757: \newblock {Estimation of the time-dependent accelerated failure time model in
1758:   the presence of confounding factors}.
1759: \newblock {\em Biometrika\/} {\bf 78}, 321--334.
1760: 
1761: \bibitem[{Robins(1993)}]{R93}
1762: Robins, J.~M. (1993).
1763: \newblock {Analysis methods for HIV treatment and cofactor effects}.
1764: \newblock In {D.G. Ostrow and R. Kessler}, ed., {\em Methodological issues of
1765:   AIDS behavioral research\/}, pp. 113--159. Plenum Press, New York.
1766: 
1767: \bibitem[{Robins(1998)}]{R98}
1768: Robins, J.~M. (1998).
1769: \newblock {Structural nested failure time models}.
1770: \newblock In {P.K. Andersen and N. Keiding}, ed., {\em Survival Analysis\/},
1771:  volume~6 of {\em Encyclopedia of Biostatistics\/}, pp. 4372--4389. 
1772: John Wiley and Sons, New York.
1773: 
1774: \bibitem[{Robins et~al.(1992)Robins, Blevins, Ritter and Wulfsohn}]{Aids}
1775: Robins, J.~M., Blevins, J.~M., Ritter, G. and Wulfsohn, M. (1992).
1776: \newblock {G-estimation of the effect of prophylaxis therapy for pneumocystis
1777:   carinii pneumonia on the survival of AIDS patients}.
1778: \newblock {\em Epidemiology\/} {\bf 3}, 319--336.
1779: 
1780: \bibitem[{Rubin(1978)}]{Rubin}
1781: Rubin, D.~B. (1978).
1782: \newblock {Bayesian inference for causal effects: the role of randomization}.
1783: \newblock {\em Annals of Statistics\/} {\bf 6}, 34--58.
1784: 
1785: \bibitem[{Vandervaart(1998)}]{Vaart}
1786: Van der Vaart, A.W. (1998).
1787: \newblock {\em Asymptotic Statistics}.
1788: \newblock {Cambridge University Press}.
1789: 
1790: \end{thebibliography}
1791: 
1792: \bigskip\noindent
1793: Corresponding author:\\
1794: {\sl Aad van der Vaart}\\
1795: {\sl Department of Mathematics}\\
1796: {\sl Faculty of Sciences}\\
1797: {\sl Vrije Universiteit}\\
1798: {\sl De Boelelaan 1081 a}\\
1799: {\sl 1081 HV Amsterdam}\\
1800: {\sl The Netherlands}\\
1801: 
1802: 
1803: 
1804: \end{document}
1805: Specifically, if $l_k$ and $a_k$
1806: are both discrete for all $k$, the class of G--null tests discussed in
1807: \citet{R86,R87a,R89} are asymptotically distribution-free tests of the
1808: null hypothesis $\psi_0=0$. The class of G--tests and estimators
1809: introduced in \citet{R89} are, respectively, asymptotically
1810: distribution-free tests of the null hypothesis $\psi_0=0$ and
1811: $n^{1/2}$-consistent semiparametric estimators of $\psi_0$.
1812: 
1813: 
1814: 
1815: The following corollary is obvious.
1816: 
1817: \begin{cor} Suppose that Assumptions~\ref{ass} (no
1818: unmeasured confounding) and~\ref{cons} (consistency) hold. Then the
1819: survival curves $P\left(T^{\GG}>t\right)$ are the same for all
1820: evaluable treatment regimes $\GG$ if and only if the functions
1821: $s_{\GG}$ do not depend on $\GG$.
1822: \end{cor}
1823: