0409:math0409165/typ.tex

1: \documentclass[12pt,notitlepage]{article}[1995/06/26]

2: %\usepackage{a4wide,epsfig,latexsym,amsfonts,amssymb,enumerate,amsmath}

3: \usepackage{a4wide,latexsym,amsfonts,amssymb,enumerate,amsmath}

4: \usepackage[dvips]{graphics}

5: %\usepackage[pdftex]{graphicx}

6: \usepackage{natbib}

7: \usepackage[english]{babel}

8:

9:

10: \newtheorem{defn}{Definition}[section]

11: \newtheorem{lem}[defn]{Lemma}

12: \newtheorem{thm}[defn]{Theorem}

13: \newtheorem{cor}[defn]{Corollary}

14: \newtheorem{assu}[defn]{Assumption}

15: %Dawid's conditional independence symbol.

16: \newcommand{\cip}{\mbox{$\perp\!\!\!\perp$}}

17: \newcommand{\proof}{\bigskip\noindent{\bf Proof.\enskip}}

18: \newcommand{\proofof}[1]{\bigskip\noindent{\bf Proof of #1.\enskip}}

19: \newcommand{\Endproof}{\ \hfill$\Box$\bigskip}

20: \newcommand{\GG}{g}

21:

22: \begin{document}

23:

24: \title{Estimating the causal effect of a time-varying treatment on

25: time-to-event using structural nested failure time models}

26:

27: \author{Judith Lok, Richard Gill, Aad van der Vaart and James Robins\\

28: University of Leiden, Utrecht University, Vrije Universiteit Amsterdam\\

29: and Harvard University}

30:

31: \date{July 2003}

32:

33: \maketitle

34:

35: \begin{abstract}

36: In this paper we review an approach to estimating the causal

37: effect of a time-varying treatment on time

38: to some event of interest. This approach is designed for

39: the situation where the treatment may have been repeatedly adapted to

40: patient characteristics, which themselves may

41: also be time-dependent. In this situation the effect of

42: the treatment cannot simply be estimated by conditioning on the patient

43: characteristics, as these may themselves be indicators of the

44: treatment effect. This so-called time-dependent confounding is typical

45: in observational studies. We discuss a new class of failure

46: time models, structural nested failure time models, which can be used

47: to estimate the causal effect of a time-varying treatment,

48: and present methods for estimating and testing the parameters of these models.

49: \end{abstract}

50:

51: \section{Introduction}

52: This paper offers a new approach to estimating, from observational

53: data, the causal effect of a time-dependent treatment on time to an

54: event of interest in the presence of time-dependent confounding

55: variables. This approach is based on a new class of failure time

56: models, the \emph{structural nested failure time models} (SNFTM). The

57: primary goal of this paper is to motivate the need for structural

58: nested failure time models. To achieve this goal in the most

59: straightforward manner, we shall assume that the event times are

60: observed without censoring, and that there is no missing or

61: misclassified data. Additional complications that arise when these

62: assumptions are not satisfied are discussed in Robins et al.~(1992)

63: and Robins~(1993).

64:

65: The approach using SNFTMs will be useful in any observational study in

66: which there exist time-dependent risk factors that are also predictive

67: for subsequent exposure to the treatment under study, i.e.\ in any

68: study where there are time-dependent covariates that correlate with

69: the final outcome of the treatment, but also with the amount or type

70: of treatment over time.  This situation arises in any observational

71: study in which there is ``treatment by indication'', i.e.\ the

72: treatment is not predetermined by the investigator, but adapted to the

73: current condition of the patient.  The problem then is to distinguish

74: between treatment effect and selection bias (i.e.\ confounding). For

75: example, in an observational study of the effect of AZT treatment on

76: HIV-infected subjects, subjects with low CD4 lymphocyte counts at a

77: given time are subsequently at increased risk of developing AIDS and

78: are for that reason more likely to be treated with AZT. Thus the

79: covariate ``low CD4-count'' is a risk factor for AIDS, but

80: is also a predictor of subsequent treatment with AZT. The problem is

81: then to isolate the effect of AZT treatment as given according to a

82: predetermined plan (which may take into account covariates) from the

83: confounding effect of CD4-count.  As a second example, many physicians

84: withdraw women from exogenous estrogens at the time they develop an

85: elevated blood cholesterol, since both exogenous estrogens and

86: elevated blood cholesterol are considered possible cardiac risk

87: factors. Therefore, in a study of the effect of postmenopausal

88: estrogen on cardiac mortality, the covariate ``cholesterol

89: level'' is a predictor of subsequent exposure to estrogens, but also

90: correlates with the outcome ``cardiac mortality''. As a third example,

91: in observational studies of the efficacy of cervical cancer screening

92: on mortality, women who have had operative removal of their cervix due

93: to invasive disease are no longer at risk for further screening (i.e.\

94: exposure), but are at increased risk for death. Therefore, the

95: covariate, ``operative removal of the cervix'', is an independent risk

96: factor for death, but also a predictor of subsequent exposure. As a

97: final epidemiologic example, in occupational mortality studies,

98: unhealthy workers who terminate employment early are at increased risk

99: of death compared to other workers and receive no further exposure to

100: the chemical agent under study. Therefore, the time-dependent

101: covariate ``employment status'' is an independent risk factor for

102: death, and a predictor of exposure to the study agent.

103:

104: Epidemiologists refer to the covariates in the preceding

105: examples as ``time-dependent confounders''. It may be

106: important to analyze the data from any of the above studies using the

107: approach presented in this paper.

108:

109: For pedagogic purposes, we shall illustrate our models and assumptions

110: throughout the paper by the problem of estimating, from data obtained

111: in an observational study, the effect of treatment with the drug AZT

112: on time to clinical AIDS in asymptomatic subjects with newly diagnosed

113: human immunodeficiency virus (HIV) infection. We shall suppose that

114: measurements on current AZT dosage as well as on various

115: time-dependent covariates, such as weight, temperature, hematocrit,

116: and CD4-lymphocyte count, are recorded at regularly spaced time points,

117: until the development of clinical AIDS. These time points, which we

118: denote by $0=\tau_0<\tau_1<\tau_2<\cdots<\tau_K$, may for

119: instance correspond to clinic visits at which the measurements are

120: obtained, with time defined as time since the diagnosis of HIV

121: infection.

122:

123: Our goal will be to identify and estimate, for each \emph{treatment

124: regime}, the time-to-AIDS distribution that would have been observed

125: if (typically \emph{counter to fact}) each study subject had followed

126: the AZT treatment history prescribed by the regime. We shall call each

127: such distribution an AZT treatment regime-specific, counterfactual,

128: time-to-AIDS distribution. The treatment regimes we study need not be

129: static. A \emph{treatment regime} is a rule that assigns to each

130: possible covariate history through time $\tau_k$, an AZT dosage rate

131: $a_k$ to be taken in the interval $\left(\tau_k,\tau_{k+1}\right]$. A

132: simple example of a treatment regime is ``take an AZT dosage $a_k$ of

133: $1{,}000$ milligrams daily in the interval

134: $\left(\tau_k,\tau_{k+1}\right]$ if the hematocrit measured at

135: $\tau_k$ exceeds $30$; otherwise take no AZT in the interval''.

136:

137: Our interest in AZT treatment regime-specific, counterfactual

138: time-to-AIDS distributions is based on the following

139: considerations. Suppose, after the completion of the study, a further

140: individual with newly diagnosed HIV infection, whom we shall call

141: ``the infected subject'', wishes to use the data from the completed

142: study to select the AZT dosage schedule that will maximize his

143: expected or median number of years of AIDS-free survival. If the

144: ``infected subject'' is considered exchangeable with the subjects in

145: the trial, then he would wish to follow the AZT treatment regime whose

146: regime-specific, counterfactual time-to-AIDS distribution has the

147: largest expected or median value.

148:

149: In Section~\ref{Gcomps} we show that the AZT treatment

150: regime-specific, counterfactual time-to-AIDS distributions are

151: identified from the observed data under the assumption that the

152: investigator has succeeded in recording sufficient data on the history

153: of all covariates to ensure that, at each time $\tau_k$, given the

154: covariate history and the AZT treatment history up till $\tau_k$, the

155: AZT dosage rate in $\left(\tau_k,\tau_{k+1}\right]$ is independent of

156: the regime-specific, counterfactual time-to-AIDS. Robins~(1992)

157: refers to this assumption as the assumption of \emph{no unmeasured

158: confounding factors}. In other words, under this assumption at each

159: time point the treatment can be viewed as depending only on recorded

160: information up till that point and external factors that are not

161: predictive of (counterfactual) survival.

162:

163: In Section~\ref{repars} we introduce \emph{structural nested failure

164: time models (SNFTM)}. An SNFTM models the magnitude of the causal

165: effect of a (final) blip of AZT treatment in the interval

166: $\left(\tau_k,\tau_{k+1}\right]$ on time-to-AIDS, as a function of

167: past AZT and covariate history.  We show that, under the assumption of

168: no unmeasured confounding, the null hypothesis of no causal effect of

169: AZT on time-to-AIDS is equivalent to the null hypothesis that the

170: parameter vector of any SNFTM is $0$.

171:

172: The term ``structural'' in SNFTM derives from terminology used

173: in the social science and econometric literature (e.g.\ Rubin~(1978)).

174: Our models are ``structural'', because they

175: directly model regime-specific, counterfactual time-to-AIDS

176: distributions. In Sections~\ref{mlesec} and~\ref{Gest} we discuss

177: two different methods to fit SNFTMs and to use them for inference.

178:

179: In Section~\ref{mlesec} we show that, under the assumption of no

180: unmeasured confounding, SNFTMs can be understood as a component of a

181: particular reparameterization of the joint distribution of the

182: observables. We use this reparameterization to develop

183: likelihood-based tests of the causal null hypothesis of no effect of

184: AZT-exposure on time-to-AIDS. We also show how to estimate the

185: AZT-treatment regime-specific, counterfactual time-to-AIDS

186: distributions, in the case that the null hypothesis of no causal

187: effect of AZT on time-to-AIDS is rejected.

188:

189: In Section~\ref{Gest} we present an alternative, semiparametric

190: approach to test the null hypothesis of no treatment effect and to

191: estimate the parameters in an SNFTM. This approach,

192: \emph{G--estimation}, has the advantage of avoiding the need for

193: parameterization of the distributions appearing in the

194: likelihood-based approach of Section~\ref{mlesec} (e.g.\ the

195: conditional distributions of covariates given past treatment- and

196: covariate history). Instead, G--estimation uses a model

197: for the SNFTM and for the conditional distribution of treatment given

198: past treatment- and covariate history. Tests and estimators based on

199: G--estimation have the additional advantage that they can often be

200: calculated with standard software.

201:

202: \section{Formalization of the problem}

203: We fix a discrete time frame $\tau_0=0<\tau_1<\tau_2<\ldots<\tau_K$

204: throughout the paper, where $\tau_0$ is the time of enrollment in the

205: study (and possibly also initiation of treatment), $\tau_1,\tau_2,\ldots$ are

206: the times of the clinic visits, and $\tau_K$ can be the time of the

207: last clinic visit, or can be chosen past the upper support point of

208: the time-to-AIDS distribution. For simplicity the times of the clinic

209: visits are assumed to be the same for all patients (as long as they

210: are alive).

211:

212: At each time point $\tau_k$ we measure a covariate vector $L_k$ for

213: each patient, where $L_0$ may also contain time-independent covariates

214: and information collected before time $\tau_0$, and we register the

215: treatment given in the interval $(\tau_k, \tau_{k+1}]$ in a variable

216: $A_k$, for instance the AZT dosage, assumed constant during the

217: interval. Besides covariates $L_k$ and treatments $A_k$, we observe

218: for each person a positive time $T$, for instance the time from

219: enrollment to the development of clinical AIDS. Thus the data observed

220: on one person is a vector $(\overline L_K,\overline A_K,T)$, where,

221: for each $k=0,1,\ldots,K$,

222: \begin{eqnarray*}

223: \overline L_k&=&(L_0,L_1,\ldots, L_k),\\

224: \overline A_k&=&(A_0,A_1,\ldots, A_k).

225: \end{eqnarray*}

226: For time instances $\tau_k>T$ the values $L_k$ and $A_k$ may be

227: interpreted to be empty. For simplicity we assume that the variables

228: $L_k$ and $A_k$ take their values in countable sets, denoted by

229: ${\mathcal L}_k$ and ${\mathcal A}_k$. The total set of observations

230: is a sample of $n$ independent and identically distributed (i.i.d.)

231: observations from the distribution of the random vector $(\overline

232: L_K,\overline A_K,T)$.

233:

234: As is clear from the preceding display we use the overline notation

235: $\overline{}$ to denote a ``cumulative vector''. For simplicity of

236: notation, it will be understood that whenever two expressions such as

237: $\overline l_k$ and $\overline l_{k-1}$ occur together, then

238: $\overline l_{k-1}$ is the initial part of $\overline l_k$.

239:

240: A ``treatment regime'' is a prescription for the treatment dosages

241: fixed at the times $\tau_k$, where at each time instant the prescribed

242: treatment may depend on the observed covariate history until this

243: time. We make this precise in the following definition.

244:

245: \begin{defn} \emph{(treatment regimes).}

246: A treatment regime $\GG$ is a vector  $\GG=(\GG_0,\ldots, \GG_K)$

247: of functions

248: $\GG_k: {\mathcal L}_0\times\cdots\times{\mathcal L}_k\to {\mathcal A}_k$.

249: \end{defn}

250:

251: The value $a_k=\GG_k(\overline l_k)$ of the $k$th coordinate

252: of the treatment regime $\GG$ at covariate $\overline l_k$

253: is interpreted as the dosage

254: prescribed by treatment regime $\GG$ in the interval $(\tau_k,\tau_{k+1}]$

255: to a patient with covariate history $\overline l_k$ following

256: this regime (up to time $\tau_k$). The treatment at time $\tau_k$

257: may depend on the full covariate history $\overline l_k=(l_0,\ldots,l_k)$

258: until time $\tau_k$, not just on $l_k$.

259: We define maps

260: $\overline g_k: {\mathcal L}_0\times\cdots\times{\mathcal L}_k\to

261: {\mathcal A}_0\times\cdots\times{\mathcal A}_k$ by

262: $$\overline g_k(\overline l_k) =\bigl(g_0(l_0),g_1(\overline

263: l_1),\ldots, g_k(\overline l_k)\bigr).$$

264: To alleviate notation we may drop the subscripts $k$ or the overline

265: in $g_k$ or $\overline g_k$ if the value of $k$ is clear from the

266: context.  In particular $g(\overline l_K)=\overline g(\overline l_K)= \overline

267: g_K(\overline l_K)$ are equivalent notations for the complete

268: treatment history.
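
For illustration, the example regime from the introduction can be
written in this notation. Assume, for this sketch only, that the
covariate vector $l_k$ contains the hematocrit measured at $\tau_k$ as
one of its components, denoted here by $h_k$ (a symbol introduced for
this illustration and not used elsewhere). The regime ``take
$1{,}000$ milligrams of AZT daily if the hematocrit exceeds $30$;
otherwise take no AZT'' then corresponds to
$$\GG_k(\overline l_k)=\left\{\begin{array}{ll}
1000 & \mbox{if } h_k>30,\\
0 & \mbox{otherwise,}
\end{array}\right.\qquad k=0,1,\ldots,K.$$
This particular regime depends on $\overline l_k$ only through its
last component $l_k$. It is nevertheless not static, because the
prescribed dose changes with the evolving covariate.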

269:

270: We wish to study the effect of treatment using the observed data.

271: Depending on this data not all treatment regimes may be accessible to

272: analysis. We call a treatment regime ``evaluable'' (relative to the

273: distribution of the data vector $(\overline L_K,\overline A_K,T)$) if

274: whenever the regime was followed until some time $\tau_k$ by some

275: positive fraction of the population, then it is also followed in the

276: interval $(\tau_k, \tau_{k+1}]$ by a positive fraction of the population.

277:

278: \begin{defn} \emph{(evaluable treatment regimes).}

279: A treatment regime $\GG$ is called evaluable if for each $k$ and

280: each $\overline{l}_k\in\overline{{\mathcal L}}_k$,

281: \begin{equation*}

282: P\left(\overline{L}_k=\overline{l}_k,\overline{A}_{k-1}=

283: \overline{\GG}\left(\overline{l}_{k-1}\right),T>\tau_k\right)>0 \Rightarrow

284: P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=

285: \overline{\GG}\left(\overline{l}_k\right),T>\tau_k\right)>0.

286: \end{equation*}

287: \end{defn}
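
As a sketch of how this condition can fail, consider $k=0$. Since
$T>\tau_0=0$ with probability one and $\overline A_{-1}$ is empty,
the condition then reads
$$P\left(L_0=l_0\right)>0 \Rightarrow
P\bigl(L_0=l_0,A_0=\GG_0(l_0)\bigr)>0.$$
Hence, if some baseline covariate value $l_0$ occurs with positive
probability, but no patient with that value receives the dose
$\GG_0(l_0)$, then the regime $\GG$ is not evaluable. In that case the
data carry no information on the outcome of such patients under
$\GG$.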

288:

289: Next we introduce \emph{counterfactual variables}. These will be

290: instrumental both to express the aims of the statistical analysis, and

291: to formulate our assumptions. In our mathematical model the

292: counterfactual variables are ordinary random variables $T^\GG$,

293: one for each treatment regime $\GG$, that are assumed to be defined on the

294: same probability space as the data vector $(\overline L_K,\overline

295: A_K,T)$.  The variable $T^{\GG}$ should be thought of as a patient's

296: time to clinical AIDS had she been treated according to treatment

297: regime $\GG$. Because in actual fact the patient receives treatment

298: $\overline A_K$ (resulting in time to AIDS $T$), the variable $T^\GG$ is ``counter

299: to fact''. However, it gives a useful notation to express the

300: distribution of interest, and will be related to the observable

301: variables by two assumptions.

302:

303: Counterfactual variables referring to different subjects are

304: assumed independent (cf.\ Rubin~(1978)), and hence we can formulate

305: our set-up in terms of the set of random variables $(T^\GG, T, \overline

306: L_K,\overline A_K)$ referring to one person. We shall not be

307: interested in the joint distribution of counterfactual variables

308: corresponding to different treatment regimes.  We also do not need

309: counterfactual versions of the covariates or treatments.

310:

311: We describe the aims of the statistical analysis in terms of the

312: counterfactual variables.  The \emph{G--null hypothesis} of no effect

313: of AZT on time-to-AIDS is the hypothesis that

314: \begin{equation*}

315: P\left(T^{g_1}>t\right)=P\left(T^{g_2}>t\right)\hspace{0.7cm}

316: {\rm for}\;{\rm all}\; {\rm treatment}\;{\rm regimes}\;g_1\;{\rm and}\;g_2.

317: \label{nte}

318: \end{equation*}

319: In Section~\ref{mlesec} we derive fully parametric

320: likelihood-based tests of this G--null hypothesis based on  a random

321: sample from the distribution of the

322: observables $\left(\overline{L}_K,\overline{A}_K,T\right)$,

323: and a parametric model for their joint

324: distribution. In Section~\ref{Gest} we develop an alternative,

325: semi-parametric procedure with the same aim.

326:

327: If the G--null hypothesis is rejected, then the next goal is to

328: identify and estimate, for each treatment regime $g$, the survival

329: curve $t\mapsto P\left(T^\GG>t\right)$, i.e.\ the survival curve that

330: would have been observed had a subject followed regime

331: $g$. Specifically, if our infected subject outside of the study

332: mentioned in the introduction wishes to maximize his expected years of

333: AIDS-free survival, he would follow the regime $g$ that maximized $E

334: T^\GG=\int_0^\infty P\left(T^\GG>t\right)\, dt$. Inference regarding

335: the distribution of counterfactual variables is referred to as

336: \emph{causal inference}, as the outcomes $T^\GG$ are interpreted

337: as being the effect of the treatment regime $\GG$.
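
The integral representation of $E T^\GG$ used above is the standard
identity for the mean of a nonnegative random variable; we sketch it
here under the assumption, natural for a survival time, that
$T^\GG\ge 0$. Writing $T^\GG$ as the integral of an indicator and
interchanging expectation and integration (Fubini's theorem) gives
$$E T^\GG=E\int_0^\infty 1_{\{T^\GG>t\}}\,dt
=\int_0^\infty P\left(T^\GG>t\right)\,dt.$$
Thus comparing regimes by expected AIDS-free survival amounts to
comparing the areas under their counterfactual survival curves.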

338:

339: Clearly it is impossible to make inference about the

340: counterfactual survival distributions $P(T^\GG>t)$ based on the

341: observed data unless the variables $T^\GG$ and

342: $(\overline L_K,\overline A_K,T)$

343: are related.  The assumed coupling of these variables

344: on a given underlying probability space allows us to make

345: the following assumptions relating counterfactual and factual

346: variables.

347:

348: \begin{assu}\emph{(consistency).} \label{cons}

349: For any treatment

350: regime $\GG$, $\overline{l}_k\in\overline{{\mathcal L}}_k$ and

351: $t\in\left(\tau_k,\tau_{k+1}\right]$,

352: \begin{equation*}

353: \left\{T^{\GG}>t,\overline{L}_k=\overline{l}_k,\overline{A}_k=

354: \overline{\GG}\left(\overline{l}_k\right),T> \tau_k\right\} =

355: \left\{T>t,\overline{L}_k=\overline{l}_k,\overline{A}_k=

356: \overline{\GG}\left(\overline{l}_k\right),T> \tau_k\right\}.

357: \end{equation*}

358: \end{assu}

359: \begin{assu}\emph{(no unmeasured confounding).} \label{ass}

360: For any treatment regime $\GG$, for any time $\tau_k$ and for

361: any $\overline{l}_k\in \overline{{\mathcal L}}_k$,

362: \begin{equation*} A_k \cip T^{\GG} | \overline{L}_k=\overline{l}_k,

363: \overline{A}_{k-1} = \overline{\GG}\left(\overline{l}_{k-1}\right).

364: \end{equation*}

365: \end{assu}

366:

367: Here the notation $X\cip Y|Z=z$, borrowed from Dawid~(1979),

368: means that the random variables $X$ and $Y$ are conditionally

369: independent given the event $Z=z$.

370:

371: The consistency assumption, Assumption~\ref{cons},  couples the true and

372: counterfactual survival times $T$ and $T^{\GG}$ by merely stating that

373: if until some time $\tau_k$ a patient is treated exactly as prescribed

374: by regime $g$, then she would die at some time in the interval

375: $(\tau_k,\tau_{k+1}]$ under regime $g$ if and only if she actually

376: died at the same time. This implies in particular that if all patients

377: were treated according to a predetermined treatment regime, then

378: counterfactual and actual survival times coincide. This is the customary

379: situation in clinical trials, but may fail to be the case

380: in an observational study.

381:

382: The assumption of no unmeasured confounding, Assumption~\ref{ass}, can

383: be expected to hold if the observed covariate history $\overline L_K$

384: contains sufficient information, so that at each time $\tau_k$ the

385: treatment $A_k$ can be assumed to depend on the covariate history

386: $\overline L_k$ of a patient up till that time and no other relevant

387: information. The assumption would for instance hold if at each time

388: $\tau_k$ the treatment in the interval $(\tau_k,\tau_{k+1}]$ is

389: assigned through randomization within fixed levels of the covariate history

390: $\overline L_k$ and earlier treatments.

391:

392: More specifically, in our AIDS example Assumption~\ref{ass} may be

393: expected to hold if the following information is recorded in

394: $\overline{L}_k$: all risk factors (i.e.\ predictors) of

395: regime-specific, counterfactual time-to-AIDS, other than prior

396: AZT-history $\overline{A}_{k-1}$, that are used by physicians and

397: patients to determine the dose $A_k$ of AZT in

398: $\left(\tau_k,\tau_{k+1}\right]$. Then, given $\overline{L}_k$ and

399: $\overline{A}_{k-1}=\GG\bigl(\overline{L}_{k-1}\bigr)$, the treatment

400: $A_k$ in the interval $(\tau_k,\tau_{k+1}]$ may

401: be thought of as depending only on external factors unrelated to

402: the patient's prognosis regarding time-to-AIDS, and hence as being

403: independent of $T^\GG$. For example, since it is known that physicians

404: tend to prescribe AZT to subjects with low CD4-counts and a low

405: CD4-count is an independent predictor of time-to-AIDS, the assumption

406: of no unmeasured confounding would be false if $\overline{L}_k$ does

407: not contain CD4-count history.

408:

409: It is a basic objective of epidemiologists conducting an observational

410: study to collect data on a sufficient number of covariates to ensure

411: that Assumption~\ref{ass} will be true. In this paper, we assume this

412: objective has been realized, while recognizing that, in practice, this

413: may only approximately be the case.

414:

415: \section{G--computation}\label{Gcomps}

416: We are interested in the distribution of the counterfactual, and hence

417: unobservable, variables $T^{\GG}$, as they indicate the success or

418: failure of applying the treatment regime $\GG$.  In this section we

419: show that, under Assumptions~\ref{cons} and~\ref{ass}, the

420: distribution of $T^\GG$ is identifiable from the distribution of the

421: observed data $\bigl(\overline{L}_K,\overline{A}_K,T\bigr)$ for each

422: evaluable treatment regime $\GG$.  As a consequence, given a random

423: sample from the latter distribution, the distribution of $T^\GG$ is

424: estimable, in principle.

425:

426: In fact, the following \emph{G--computation formula} gives an explicit

427: expression for $P\left(T^{\GG}>t\right)$, as well as several

428: conditional survival functions, in terms of the distribution of the

429: data $\left(\overline{L}_K,\overline{A}_K,T\right)$.

430:

431: \begin{thm}\label{Gcomp}

432: \emph{(G--computation-formula).}

433: Suppose that Assumptions~\ref{ass} (no

434: unmeasured confounding) and~\ref{cons} (consistency) hold, and that

435: $\GG$ is an evaluable treatment regime. Then for any $t>0$,

436: with $p$ defined by $\tau_p<t\le \tau_{p+1}$,

437: \begin{eqnarray}\label{Gf}

438: P\left(T^{\GG}>t\right) &=& \sum_{l_{0}} \cdots\sum_{l_{p-1}}\sum_{l_p}

439: \Bigg[P\Bigl(T>t| \overline{L}_{p}=\overline{l}_{p},

440: \overline{A}_{p}=\overline{\GG}\left(\overline{l}_{p}\right),

441: T>\tau_{p}\Bigr)

442: \nonumber\\

443: &&\hspace{2.0cm}\times\prod_{m=0}^{p}\Big\{

444: P\Bigl(T>\tau_m|\overline{L}_{m-1}=\overline{l}_{m-1},

445: \overline{A}_{m-1}=\overline{\GG}\left(\overline{l}_{m-1}\right),

446: T>\tau_{m-1}\Bigr)\nonumber\\

447: &&\hspace{2.2cm}

448: \times P\Bigl(L_m=l_m|

449: \overline{L}_{m-1}=\overline{l}_{m-1},

450: \overline{A}_{m-1}=\overline{\GG}\left(\overline{l}_{m-1}\right),

451: T>\tau_{m}\Bigr) \Big\}\Bigg].

452: \end{eqnarray}

453: \end{thm}

454:

455: In the preceding theorem we interpret variables indexed by $-1$ as not

456: present, and events concerning only such variables as being empty. For

457: instance, the conditional probability

458: $P\bigl(L_m=l_m| \overline{L}_{m-1}=\overline{l}_{m-1},

459: \overline{A}_{m-1}=\overline{\GG}\left(\overline{l}_{m-1}\right),

460: T>\tau_{m}\bigr)$ is to be read as the probability $P(L_0=l_0)$ when

461: $m=0$.
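
As an illustration of these conventions, consider the case
$\tau_1<t\le\tau_2$, so that $p=1$. The factor with $m=0$ reduces to
$P(L_0=l_0)$, because $T>\tau_0=0$ with probability one, and the
formula of Theorem~\ref{Gcomp} can be written out as
\begin{eqnarray*}
P\left(T^{\GG}>t\right)&=&\sum_{l_0}\sum_{l_1}
P\bigl(T>t| \overline{L}_1=\overline{l}_1,
\overline{A}_1=\overline{\GG}\left(\overline{l}_1\right),T>\tau_1\bigr)\\
&&\hspace{1.0cm}\times P\bigl(T>\tau_1| L_0=l_0,A_0=\GG_0(l_0)\bigr)\\
&&\hspace{1.0cm}\times P\bigl(L_1=l_1| L_0=l_0,A_0=\GG_0(l_0),
T>\tau_1\bigr)\,P(L_0=l_0).
\end{eqnarray*}
This expansion is only a special case spelled out for orientation.
Read from the last factor to the first, it follows a patient forward
in time: draw the baseline covariate, survive the first interval under
the prescribed dose, draw the next covariate among survivors, and
survive past $t$ under the dose prescribed for the second interval.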

462:

463: All conditional probabilities on the right side concern

464: observable variables. Hence the theorem gives an explicit

465: description of the survival function of the counterfactual

466: variable $T^\GG$ in terms of the distribution of

467: the data $(\overline L_K, \overline A_K, T)$.

468:

469: It is instructive to evaluate the formula in the simple case that

470: $K=1$, when there exists only one treatment $A_0$ applied in the

471: single interval $(0,\tau_1]$. Then the G--computation formula yields,

472: for $t>0$,

473: $$P(T^\GG>t)=\sum_{l_0} P\bigl(T>t|L_0=l_0,A_0=g(l_0)\bigr)

474: \,P(L_0=l_0).$$

475: This shows that in general the distribution of the counterfactual

476: variable $T^\GG$ differs from the distribution of $T$, which can be

477: written in the form

478: $$P(T>t)=\sum_{l_0} P\bigl(T>t|L_0=l_0\bigr)\,P(L_0=l_0).$$

479: This difference is not too surprising, because the variable $T^\GG$

480: refers to the treatment regime $\GG$, whereas $T$ relates to the

481: observed outcomes under the actual treatments. Had all patients

482: received treatment $g$, then the two distributions would coincide.

483: More notable is the difference between the conditional distribution of

484: $T$ given $A_0=a_0$ and the distribution of $T^\GG$ for the fixed

485: treatment regime $\GG$ that assigns all patients to treatment $a_0$,

486: i.e.\ $g(l_0)=a_0$.  These two survival distributions can be written

487: \begin{eqnarray*}

488: P(T^{a_0}>t)&=&\sum_{l_0} P\bigl(T>t|L_0=l_0,A_0=a_0\bigr)\,P(L_0=l_0),\\

489: P\bigl(T>t| A_0=a_0\bigr)&=&\sum_{l_0} P\bigl(T>t|L_0=l_0,A_0=a_0\bigr)

490: \,P\bigl(L_0=l_0| A_0=a_0\bigr).

491: \end{eqnarray*}

492: The conditional distribution of $T$ given $A_0=a_0$ is estimable, in

493: principle, by taking only those patients into account who happened to

494: receive treatment $a_0$. The outcome distribution of this subset of

495: patients may however be different from the distribution of the

496: counterfactual variable $T^{a_0}$, as a result of ``selection bias''.

497: In the actual world some patients may be assigned other treatments

498: than $a_0$, where the assignment $A_0$ may correlate with the

499: covariate $L_0$. Therefore, the conditional distribution of $L_0$ given

500: $A_0=a_0$ may differ from its marginal distribution, and consequently so may

501: the right hand sides of the display. It is the counterfactual survival

502: function $t\mapsto P(T^{a_0}>t)$ that is the relevant one to judge the

503: causal effect of treatment $a_0$. Randomization of treatment over

504: patients within fixed levels of the covariate would have made $L_0$

505: and $A_0$ independent, and the difference would disappear.  The

506: protocol of a controlled experiment may include such randomization,

507: but in an observational study it cannot be taken for granted. The

508: G--computation formula then shows, under some assumptions, how we can

509: still compute the relevant outcome distributions from the observed

510: data distribution.
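
A small numerical sketch may make the selection bias concrete. All
numbers are hypothetical and chosen only for illustration. Let
$L_0\in\{0,1\}$, with $L_0=1$ indicating a low CD4-count, and suppose
that $P(L_0=1)=0.5$, that $P(A_0=a_0| L_0=1)=0.8$ and
$P(A_0=a_0| L_0=0)=0.2$, and that, for some fixed $t\in(0,\tau_1]$,
$P(T>t| L_0=0,A_0=a_0)=0.9$ and $P(T>t| L_0=1,A_0=a_0)=0.5$. By Bayes'
rule $P(L_0=1| A_0=a_0)=0.4/(0.4+0.1)=0.8$, so that, under
Assumptions~\ref{cons} and~\ref{ass},
\begin{eqnarray*}
P(T^{a_0}>t)&=&0.9\times 0.5+0.5\times 0.5=0.70,\\
P\bigl(T>t| A_0=a_0\bigr)&=&0.9\times 0.2+0.5\times 0.8=0.58.
\end{eqnarray*}
In this sketch the patients who actually received $a_0$ are
predominantly those with a low CD4-count, so their observed survival
understates the survival that would be seen if everybody were given
$a_0$.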

511:

512: We can make further comparisons after deriving

513: a similar representation for conditional probabilities involving

514: the counterfactual variables.

515:

516: \begin{thm}\label{Gcompcond}

517: \emph{(G--computation-formula).}

518: Under the assumptions of Theorem~\ref{Gcomp},

519: for any $k\in \{0,1,2,\ldots,K\}$ and any $\overline l_k$ such that

520: $P\left(\overline{L}_k=\overline{l}_k,

521: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),

522: T>\tau_k\right)>0$,

523: for any $t>\tau_k$, and with $p\ge k$ defined by $\tau_p<t\le \tau_{p+1}$,

524: \begin{eqnarray}\label{Gfc}

525: \lefteqn{P\Bigl(T^{\GG}>t|\overline{L}_k=\overline{l}_k,

526: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),

527: T>\tau_k\Bigr)}\nonumber\\ &=&\sum_{l_{k+1}} \cdots\sum_{l_{p-1}}\sum_{l_p}

528: \Bigg[P\Bigl(T>t|\overline{L}_{p}=\overline{l}_{p},

529: \overline{A}_{p}=\overline{\GG}\left(\overline{l}_{p}\right),

530: T>\tau_{p}\Bigr)\nonumber\\

531: &&\hspace{1.9cm}\times\prod_{m=k+1}^{p}\Big\{

532: P\Bigl(T>\tau_m|\overline{L}_{m-1}=\overline{l}_{m-1},

533: \overline{A}_{m-1}=\overline{\GG}\left(\overline{l}_{m-1}\right),

534: T>\tau_{m-1}\Bigr)\nonumber\\ &&\hspace{1.9cm}

535: \times P\Bigl(L_m=l_m|

536: \overline{L}_{m-1}=\overline{l}_{m-1},

537: \overline{A}_{m-1}=\overline{\GG}\left(\overline{l}_{m-1}\right),

538: T>\tau_{m}\Bigr) \Big\}\Bigg].

539: \end{eqnarray}

540: \end{thm}

541:

542: Again variables indexed by $-1$ should be read as not being present.

543: Furthermore, a repeated summation of the form $\sum_{l_{k+1}}\cdots

544: \sum_{l_p}a_{k,p}(\overline l_{k},l_{k+1},\ldots,l_p)$ is considered

545: to be the single term $a_{k,k}(\overline l_k)$ if $k=p$, whereas the

546: product $\prod_{k+1}^p$ is to be read as 1 in this case. The summation

547: may be restricted to terms whose conditioning events have positive

548: probability.

549:

550: Again we may evaluate this formula in the simple case

551: of a single treatment interval. Then the formula in the

552: preceding theorem (with $k=0=p, K=1$) reduces to

553: $$P\bigl(T^\GG>t| L_0=l_0\bigr) =

554: P\bigl(T>t|L_0=l_0,A_0=g(l_0)\bigr).$$

555: The right side is precisely the conditional distribution of the actual

556: survival time for a subject with covariate $l_0$ following the

557: treatment regime $g$. Intuitively, the conditional probabilities

558: $P\bigl(T>t|L_0=l_0,A_0=g(l_0)\bigr)$ are the correct ones for

559: evaluating the quality of treatment $g$ for a subject with covariate

560: value $l_0$, and the equality in the preceding display is actually a

561: direct consequence of the Assumptions~\ref{cons} and~\ref{ass}

562: relating the counterfactual and factual survival times. (We may add

563: $A_0=g(l_0)$ in the conditioning event on the left by

564: Assumption~\ref{ass}, and next use Assumption~\ref{cons} to see that

565: $T^\GG$ may be replaced by $T$.)

566:
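As a further illustration, consider the case of two treatment
intervals and a time point $\tau_1<t\le\tau_2$, so that $k=0$ and
$p=1$ in (\ref{Gfc}). Writing out the single sum and the single factor
of the product, the formula then reads
\begin{eqnarray*}
\lefteqn{P\bigl(T^\GG>t| L_0=l_0,T>\tau_0\bigr)}\\
&=&\sum_{l_1}\Big[
P\bigl(T>t|\overline{L}_1=\overline{l}_1,
\overline{A}_1=\overline{\GG}\left(\overline{l}_1\right),T>\tau_1\bigr)\\
&&\hspace{1cm}\times
P\bigl(T>\tau_1|L_0=l_0,A_0=\GG_0(l_0),T>\tau_0\bigr)\\
&&\hspace{1cm}\times
P\bigl(L_1=l_1|L_0=l_0,A_0=\GG_0(l_0),T>\tau_1\bigr)\Big].
\end{eqnarray*}
Thus the counterfactual survival probability is obtained by following
the subjects who were actually treated according to $\GG$: first the
probability of surviving the interval $(\tau_0,\tau_1]$, next the law
of the new covariate $L_1$ among the survivors, and finally the
probability of surviving from $\tau_1$ up to $t$.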

567: Henceforth, we shall denote the right side of (\ref{Gfc}) by

568: $s_{\overline{l}_k,\GG}\left(t\right)$. For $k=-1$ this reduces to the

569: right side in Theorem~\ref{Gcomp}, and we write it as

570: $s_{\GG}(t)$, interpreting $\overline{l}_{-1}$ as empty.

571: Then Theorems~\ref{Gcomp}-\ref{Gcompcond}

572: can be reformulated as saying that under

573: Assumptions~\ref{cons} (consistency) and~\ref{ass} (no unmeasured

574: confounding), for every evaluable treatment regime $\GG$,

575: \begin{equation*}P\left(T^{\GG}>t\right)=s_{\GG}(t)

576: \end{equation*}

577: and, for every $k=0,1,\ldots, K$,

578: \begin{equation*}

579: P\Bigl(T^{\GG}>t|\overline{L}_k=\overline{l}_k,

580: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),

581: T>\tau_k\Bigr)=s_{\overline{l}_k,\GG}(t).

582: \end{equation*}

583: These functions are survival functions of distributions

584: that concentrate on $(\tau_k,\infty)$.

585:

586: Inspection of the G--computation formula shows that

587: $s_{\overline{l}_k,\GG}$ is a (complicated) function of the

588: distribution of the data vector

589: $\left(\overline{L}_K,\overline{A}_K,T\right)$ and depends on this

590: distribution only through the conditional distributions of the

591: covariates and the survival time given the past, given by

592: \begin{equation}\label{plm}

593: P\left(L_m=l_m|\overline{L}_{m-1}=\overline{l}_{m-1},

594: \overline{A}_{m-1}=\overline{a}_{m-1},T>\tau_m\right),

595: \end{equation}

596: and

597: \begin{equation}\label{ptm}

598: P\left(T>t|\overline{L}_{m-1}=\overline{l}_{m-1},

599: \overline{A}_{m-1}=\overline{a}_{m-1},

600: T>\tau_{m-1}\right).

601: \end{equation}

602: In particular, the functions $s_{\overline{l}_k,\GG}$ do not depend on

603: conditional laws of the treatment variables $A_m$ given the past.

604:

605: \proofof{Theorems~\ref{Gcomp} and~\ref{Gcompcond}}

606: We prove Theorems~\ref{Gcomp} and~\ref{Gcompcond} by backward

607: induction on $k$, for fixed $t$ (and hence also fixed $p$).  Formula

608: (\ref{Gfc}) with $k=-1$ can be read as the formula given by

609: Theorem~\ref{Gcomp}, so we restrict to proving (\ref{Gfc}).

610:

611: For $k=p$ the left side of (\ref{Gfc}) is equal to

612: \begin{eqnarray*}

613: \lefteqn{P\left(T^{\GG}>t|\overline{L}_p=\overline{l}_p, \overline{A}_{p-1}

614: =\overline{\GG}\left(\overline{l}_{p-1}\right),T>\tau_p\right)}\\

615: &&\hspace{2.5cm}=

616: P\left(T^{\GG}>t|\overline{L}_p=\overline{l}_p, \overline{A}_{p}

617: =\overline{\GG}\left(\overline{l}_{p}\right),T>\tau_p\right)\\

618: &&\hspace{2.5cm}=

619: P\left(T>t|\overline{L}_p=\overline{l}_p, \overline{A}_{p}

620: =\overline{\GG}\left(\overline{l}_{p}\right),T>\tau_p\right),

621: \end{eqnarray*}

622: where in the first equality we can add

623: $A_p=\GG_p\left(\overline{l}_p\right)$ in

624: the conditioning event by Assumption~\ref{ass} of no unmeasured confounding,

625: and in the second equality we can replace the event $T^{\GG}>t$

626: by the event $T>t$, because of the Assumption~\ref{cons}

627: of consistency.

628:

629: The induction step is proved by similar arguments. Supposing

630: that (\ref{Gfc}) holds for some $k\le p$,

631: we shall deduce that it also holds for $k-1$. We have

632: \begin{eqnarray*}

633: \lefteqn{P\left(T^{\GG}>t|\overline{L}_{k-1}=\overline{l}_{k-1},

634: \overline{A}_{k-2}=\overline{\GG}\left(\overline{l}_{k-2}\right),

635: T>\tau_{k-1}\right)}\\

636: &=&

637: P\left(T^{\GG}>t|\overline{L}_{k-1}=\overline{l}_{k-1},\overline{A}_{k-1}=

638: \overline{\GG}\left(\overline{l}_{k-1}\right),T>\tau_{k-1}\right)\\

639: &=&

640: P\left(T^{\GG}>\tau_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},

641: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),

642: T>\tau_{k-1}\right)\\

643: &&\hspace{1.5cm} \times P\left(T^{\GG}>t|\overline{L}_{k-1}=\overline{l}_{k-1},

644: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),

645: T>\tau_{k-1},T^{\GG}>\tau_{k}\right).

646: \end{eqnarray*}

647: The first equality follows by the assumption of no unmeasured

648: confounding, while the second follows by conditioning on the

649: event $T^{\GG}>\tau_{k}$, where we note that $t>\tau_{k}$, because

650: $t>\tau_p\ge\tau_{k}$.  By the consistency assumption we can replace

651: the event $T^{\GG}>\tau_{k}$ by the event $T>\tau_{k}$ without

652: changing the events or probabilities.  Next we can rewrite the second

653: probability as a sum by conditioning on the variable $L_{k}$, to

654: obtain that the preceding display is equal to

655: \begin{eqnarray*}

656: \lefteqn{\sum_{l_{k}}\Big[

657: P\left(T>\tau_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},\overline{A}_{k-1}=

658: \overline{\GG}\left(\overline{l}_{k-1}\right),T>\tau_{k-1}\right)} \\

659: &&\hspace{1.5cm}

660: \times P\left(T^{\GG}>t|\overline{L}_{k}=\overline{l}_{k},

661: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),

662: T>\tau_{k}\right)\\

663: &&\hspace{1.5cm}

664: \times P\left(L_{k}=l_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},

665: \overline{A}_{k-1}=

666: \overline{\GG}\left(\overline{l}_{k-1}\right),T>\tau_{k}\right)\Big].

667: \end{eqnarray*}

668: Finally we replace the probability involving the counterfactual

669: variable $T^\GG$ by the right side of (\ref{Gfc}), which is

670: permitted in view of the induction hypothesis.

671: This yields the right side of (\ref{Gfc}) for $k-1$,

672: and concludes the induction step.

673: \Endproof

674:

675: %\footnote{Proposal: we could put some interpretation and something

676: %like Example~2.6.1 from my thesis here, e.g.\ just copy Section~2.6 from

677: %thesis, or rather with a more realistic example. Try to find one in

678: %\citet{Rlat} or ask Jamie. Decided: no, may become too long; if they want

679: %it let them ask; otherwise risk it needs to get out again and waste of

680: %time}

681:

682: \section{Reparameterization}

683: \label{repars}

684: To investigate the effect of a given treatment regime $\GG$ on

685: survival, it suffices to know the conditional distributions given in

686: (\ref{plm}) and (\ref{ptm}). Given these distributions we can compute

687: the counterfactual survival functions by using the G--computation

688: formula, given by Theorem~\ref{Gcomp}.

689:

690: Because carrying out this computation may be a formidable task, we may

691: perform the calculation by simulation methods, rather than by

692: analytical calculation.  Robins~(1986, 1987, 1988) provides a Monte

693: Carlo algorithm, called the ``Monte Carlo G--computation algorithm'',

694: for evaluating the functions $s_{\GG}$ that satisfactorily resolves

695: potential difficulties with the analytical computation. We refer the

696: reader to these papers for further discussion.

697:

698: A difficulty is that the distributions in (\ref{plm}) and

699: (\ref{ptm}) will typically be unknown and must be estimated from the

700: data. One possibility is to specify models for (\ref{plm}) and

701: (\ref{ptm}), for instance logistic or Cox models, and next estimate

702: the unknown parameters from the data. The function $s_\GG$ can then be

703: estimated using the Monte Carlo G--computation algorithm with model

704: derived estimates. Robins~(1986, 1987) provides several worked

705: examples of this approach.

706:

707: This approach has a number of unattractive features. Estimation of the

708: function $s_\GG$ according to the preceding scheme, without

709: confidence intervals, may be feasible, but testing whether treatment

710: affects the outcome is complicated.  The models used to specify

711: $s_\GG$ will usually be rough approximations, and the null hypothesis

712: of no treatment effect will be a complex function of all parameters.

713: Standard statistical software may not apply, and in large datasets the

714: null hypothesis will usually be rejected, just because of model

715: misspecification (cf.\ Robins~(1986, 1987, 1988, 1989)). In this paper

716: we take a different approach, based on a reparameterization of the

717: joint distribution of the observations

718: $\left(\overline{L}_K,\overline{A}_K,T\right)$ using \emph{structural

719: nested failure time models (SNFTM)}.

720:

721: SNFTMs are models for the causal effect of skipping a ``last''

722: treatment dose given the past, thus reverting to the ``baseline

723: treatment''.  To make this precise, suppose that there is a certain

724: baseline treatment regime, which we shall refer to as ``no

725: treatment''. This could for instance be ``zero medication'', and

726: consequently we shall let a zero in the sets $\overline{\mathcal A}_k$ of

727: treatment dosages refer to treatment under the baseline treatment

728: regime.

729:

730: At any time point $\tau_k$ a doctor could switch a patient to the

731: baseline regime, at least conceptually, and leave her there. Let

732: $\left(\overline{a}_k,\overline{0}\right)$ be an abbreviation for the

733: treatment regime $\GG=\left(a_0,\ldots,a_k,0,\ldots,0\right)$, i.e.\

734: the $m$th coordinate function of $\GG$ is given by

735: \begin{equation*}

736: g_m\left(\overline{l}_m\right)= \left\{\begin{array}{ll} a_m & {\rm

737: for} \;{\rm any}\;{\rm value}\;{\rm of}\; {\rm the}\; {\rm

738: covariate}\;{\rm vector}\;\overline{l}_m \; {\rm if}\; m\leq k,\\

739: 0 & {\rm if}\; m>k. \end{array}\right.

740: \end{equation*}

741: Henceforth, we shall always assume that Assumptions~\ref{cons}

742: (consistency) and~\ref{ass} (no unmeasured confounding) are

743: satisfied. Then, by Theorem~\ref{Gcompcond}, if the treatment regime

744: $(\overline a_k,\overline 0)$ is evaluable, the function

745: $$t\mapsto s_{\overline{l}_k,\left(\overline{a}_k,\overline 0\right)}(t)$$

746: (by

747: definition the right side of (\ref{Gfc}) with $\GG=(\overline

748: a_k,\overline 0)$) is the conditional survival function of the

749: counterfactual survival time

750: $T^{\left(\overline{a}_k,\overline{0}\right)}$ given the treatment-

751: and covariate history $\left(\overline{l}_k,\overline{a}_{k-1}\right)$

752: up to time $\tau_k$, and given that

753: $T^{\left(\overline{a}_k,\overline{0}\right)}>\tau_k$.  Define

754: ``shift-functions'' $\gamma$ by

755: \begin{equation}\label{gd}

756: \gamma_{\overline{l}_k,\overline{a}_k}(t)=

757: s^{-1}_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}

758: \circ s_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)} (t),

759: \end{equation}

760: where the inverse $s^{-1}$ is the quantile function of the corresponding

761: survival function.

762:

763: The functions $\gamma$ map percentiles of the distribution of the

764: random variable $T^{\left(\overline{a}_k,\overline{0}\right)}$ into

765: those of the distribution of the random variable

766: $T^{\left(\overline{a}_{k-1},\overline{0}\right)}$,

767: \begin{equation}

768: \label{gdalt}

769: s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}

770: \circ \gamma_{\overline{l}_k,\overline{a}_k}

771: =s_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)}.

772: \end{equation}

773: The functions $\gamma$ thus measure the effect of skipping the ``last''

774: treatment dose $a_k$ given the covariate and treatment history

775: $(\overline l_k,\overline a_{k-1})$. We assume that the survival functions

776: are continuous and strictly decreasing, so that (\ref{gd}) and (\ref{gdalt})

777: give equivalent definitions.

778:
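As a purely illustrative example (the exponential form and the rates
$\lambda_0,\lambda_1>0$ are assumptions made here for the sake of the
calculation, and are not part of the model), suppose that for
$t>\tau_k$
\begin{equation*}
s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}(t)
=e^{-\lambda_0(t-\tau_k)},
\qquad
s_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)}(t)
=e^{-\lambda_1(t-\tau_k)}.
\end{equation*}
The quantile function of the first survival function is
$u\mapsto \tau_k-\lambda_0^{-1}\log u$, so that (\ref{gd}) gives
\begin{equation*}
\gamma_{\overline{l}_k,\overline{a}_k}(t)
=\tau_k+\frac{\lambda_1}{\lambda_0}\left(t-\tau_k\right).
\end{equation*}
In this example the shift function is the identity if and only if
$\lambda_1=\lambda_0$, and it lies below the identity if
$\lambda_1<\lambda_0$, i.e.\ if skipping the treatment $a_k$ increases
the death rate.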

779: If the ``last treatment'' $a_k$ has no effect, then the functions

780: $s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}$ and

781: $s_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)}$ are

782: identical, and the function $\gamma_{\overline{l}_k,\overline{a}_k}$

783: is the identity function. More generally, the function

784: $\gamma_{\overline{l}_k,\overline{a}_k}$ can be seen to measure the

785: effect of the treatment $a_k$ given in

786: $\left[\tau_k,\tau_{k+1}\right)$ on (counterfactual) survival.  This

787: is illustrated in Figure~\ref{gamf}.

788:

789: \begin{figure}[htb!]

790: \begin{picture}(350,170)

791: \put(40,10){\line(1,0){250}}

792: \put(40,10){\line(0,1){160}}

793: \put(32,10){\makebox(0,0){$0$}}

794: \put(32,160){\makebox(0,0){$1$}}

795: \put(38,160){\line(1,0){4}}

796: \put(40,2){\makebox(0,0){$\tau_k$}}

797: \qbezier[4000](40,160)(75,30)(290,25)

798: \qbezier[4000](40,160)(115,35)(290,32)

799: \put(245,2){\makebox(0,0){$t$}}

800: \put(245,8){\line(0,1){4}}

801: \put(195,2){\makebox(0,0){$\gamma_{\overline{l}_k,\overline{a}_k}(t)$}}

802: \put(195,8){\line(0,1){4}}

803: \put(328,20){\makebox(0,0){$s_{\overline{l}_k,\left(\overline{a}_{k-1},

804: \overline{0}\right)}$}}

805: \put(323,37){\makebox(0,0){$s_{\overline{l}_k,\left(\overline{a}_k,

806: \overline{0}\right)}$}}

807: \put(245,15){\line(0,1){4}}

808: \put(245,23){\line(0,1){4}}

809: \put(245,31){\line(0,1){4}}

810: %\put(255,10){\line(0,1){125}}

811: %\qbezier[42](255,10)(255,72)(255,135)

812: %\put(189,10){\line(0,1){125}}

813: %\qbezier[42](189,10)(189,72)(189,135)

814: \put(195,15){\line(0,1){4}}

815: \put(195,23){\line(0,1){4}}

816: \put(195,31){\line(0,1){4}}

817: %\put(38,35){\line(1,0){4}}

818: %\put(26,35){\makebox(0,0){$0.17$}}

819: %\put(40,135){\line(1,0){215}}

820: \put(40,35){\line(1,0){5}}

821: \put(49,35){\line(1,0){4}}

822: \put(57,35){\line(1,0){4}}

823: \put(65,35){\line(1,0){4}}

824: \put(73,35){\line(1,0){4}}

825: \put(81,35){\line(1,0){4}}

826: \put(89,35){\line(1,0){4}}

827: \put(97,35){\line(1,0){4}}

828: \put(105,35){\line(1,0){4}}

829: \put(113,35){\line(1,0){4}}

830: \put(121,35){\line(1,0){4}}

831: \put(129,35){\line(1,0){4}}

832: \put(137,35){\line(1,0){4}}

833: \put(145,35){\line(1,0){4}}

834: \put(153,35){\line(1,0){4}}

835: \put(161,35){\line(1,0){4}}

836: \put(169,35){\line(1,0){4}}

837: \put(177,35){\line(1,0){4}}

838: \put(185,35){\line(1,0){4}}

839: \put(193,35){\line(1,0){4}}

840: \put(201,35){\line(1,0){4}}

841: \put(209,35){\line(1,0){4}}

842: \put(217,35){\line(1,0){4}}

843: \put(225,35){\line(1,0){4}}

844: \put(233,35){\line(1,0){4}}

845: \put(241,35){\line(1,0){4}}

846: %\qbezier[72](20,135)(128,135)(235,135)

847: \end{picture}

848: \caption{Illustration of the shift-function $\gamma$.

849: In this picture the

850: function $s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}$

851: lies to the left of the function

852: $s_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)}$, indicating

853: that skipping the treatment $a_k$ decreases survival for patients

854: with covariate and treatment history $(\overline{l}_k,\overline{a}_{k-1})$.

855: In this case the

856: function $\gamma_{\overline{l}_k,\overline{a}_k}$ is below the identity.}

857: \label{gamf}

858: \end{figure}

859:

860: Conversely, if the shift function

861: $\gamma_{\overline{l}_k,\overline{a}_k}$ is equal to the identity

862: function, then the distributions of the counterfactual variables

863: $T^{\left(\overline{a}_k,\overline{0}\right)}$ and

864: $T^{\left(\overline{a}_{k-1},\overline{0}\right)}$ coincide for

865: patients with past covariate- and treatment history $\overline{l}_k$

866: and $\overline{a}_{k-1}$.  This suggests that, if

867: $\gamma_{\overline{l}_k,\overline{a}_k}$ is the identity function for

868: all values of $\overline{l}_k$, $\overline{a}_k$ and $k$, then

869: treatment does not affect the outcome of interest: skipping the last

870: treatment does not affect the outcome of interest, next skipping the

871: second-last treatment does not affect the outcome of interest,

872: etcetera.

873:

874: For a rigorous proof of this conclusion it is necessary that

875: sufficiently many treatment regimes are evaluable, because the

876: functions $s_{\overline l_k,g}$ (defined in terms of the distribution

877: of the observable data by the right side of (\ref{Gfc})) are equal to

878: the counterfactual survival distributions only if the treatment regime

879: $g$ is evaluable.  For instance, the treatment regime

880: $g=\left(\overline{a}_k,\overline{0}\right)$ need not be evaluable for

881: all $\overline{a}_k$ and hence the distributions of the counterfactual

882: variables $T^{\left(\overline{a}_k,\overline{0}\right)}$ and/or

883: $T^{\left(\overline{a}_{k-1},\overline{0}\right)}$ may not be

884: identifiable from the observed data.  To overcome this difficulty we

885: assume that the baseline treatment regime $\overline{0}$ is

886: ``admissible''. A treatment regime is called ``admissible'' if in

887: \emph{every} situation there is a positive probability for this regime

888: to be implemented in the next step.  As applied to the baseline regime

889: $\overline 0$, this property takes the form of the following

890: assumption.

891:

892: \begin{assu}\label{base} \emph{(admissible baseline treatment regime).}

893: For each $k$, each

894: $\overline{l}_k\in\overline{{\cal L}}_k$ and each

895: $\overline{a}_{k-1}\in\overline{{\cal A}}_{k-1}$,

896: \begin{equation*} P\left(\overline{L}_k=\overline{l}_k,

897: \overline{A}_{k-1}=\overline{a}_{k-1}, T>\tau_k\right)>0 \Rightarrow

898: P\left(\overline{L}_k=\overline{l}_k,\overline{A}_{k-1}=\overline{a}_{k-1},

899: A_k=0, T>\tau_k\right)>0.

900: \end{equation*}

901: \end{assu}

902:

903: Under this assumption the shift functions

904: $\gamma_{\overline{l}_k,\overline{a}_k}$ are identifiable for all

905: values of $(k,\overline{l}_k, \overline{a}_k)$ with

906: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=

907: \overline{a}_k,T>\tau_k\right)>0$, and fully characterize the

908: potential effect of any treatment regime.  This is the content of the

909: following theorem, whose proof is deferred to Appendix~\ref{appid}.

910: (As shown in Lok~(2001, Section 2.12), Assumption~\ref{base} can

911: be avoided if one allows $\overline{0}$ to be a so-called

912: admissible baseline course of treatment, which may not only depend on

913: past covariate- but also on past treatment history. Some

914: admissible baseline course of treatment, which has a positive

915: probability of occurring after any observed treatment- and covariate

916: history, always exists.)

917:

918: \begin{thm}\label{geq}

919: Under Assumptions~\ref{ass} (no unmeasured confounding), \ref{cons}

920: (consistency) and \ref{base} (admissible baseline treatment regime),

921: the distribution of $T^\GG$ is the same under all evaluable treatment

922: regimes $\GG$ if and only if the shift-function

923: $\gamma_{\overline{l}_k,\overline{a}_k}$ is the identity for all $(k,

924: \overline{l}_k, \overline{a}_k)$ with

925: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,

926: T>\tau_k\right)>0$.

927: \end{thm}

928:

929: It follows that the functions $\gamma_{\overline{l}_k,\overline{a}_k}$

930: characterize the null hypothesis of no treatment effect.

931: Because they also possess an easy interpretation in terms of the

932: effect of a ``last blip'' of treatment, it is attractive

933: to model these functions rather than the set of conditional

934: distributions in (\ref{plm}) and (\ref{ptm}).

935: A \emph{structural nested failure time model} is a parametrized

936: family of functions used to  model the functions

937: $\gamma_{\overline{l}_k,\overline{a}_k}$. Each of the model

938: functions is an increasing function on $[\tau_k,\infty)$ (that

939: can arise as a quantile-distribution function), with the identity

940: function corresponding to the absence of a treatment effect.

941:

942: With the parameter denoted by $\psi=(\psi_1,\psi_2,\psi_3)$,

943: one example of an SNFTM would be

944: \begin{equation*}

945: \gamma^\psi_{\overline{l}_k,\overline{a}_k}\left(t\right)=\tau_k

946: +\left(\min\left\{\tau_{k+1},t\right\}-\tau_k\right)

947: e^{\psi_1 a_k +\psi_2 a_k a_{k-1} +\psi_3 a_k l_k}+

948: \left(t-\tau_{k+1}\right)1_{\left\{t>\tau_{k+1}\right\}}.

949: \end{equation*}

950: If $\psi=0$, then this function reduces to the identity function,

951: indicating that the parameter value $\psi=0$ corresponds to

952: the absence of a treatment effect. For nonzero values of $\psi$

953: the model corresponds to a

954: ``change of time scale''  depending on present and

955: past treatment $\left(a_k,a_{k-1}\right)$ and present covariate

956: ($l_k$). The variable $L_k$ might for instance be

957: the univariate covariate CD4 lymphocyte count at $\tau_k$, and

958: the variable $A_k$ the AZT prescription. Then the given model

959: allows for interaction between CD4 lymphocyte count and

960: treatment, and could of course be extended with other factors.

961: Figure~\ref{figureSNFTM} shows two typical functions

962: $\gamma$ following this model.

963:
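To see how this model acts, it may help to evaluate it numerically.
Take $\tau_k=1$, $\tau_{k+1}=2$ and suppose that the parameter and the
covariate and treatment values are such that the exponential factor
equals $0.5$ (the values used in Figure~\ref{figureSNFTM}; any other
positive factor could be used in the same way). Then
\begin{equation*}
\gamma^\psi_{\overline{l}_k,\overline{a}_k}(1.5)=1+(1.5-1)\times 0.5=1.25,
\qquad
\gamma^\psi_{\overline{l}_k,\overline{a}_k}(3)=1+(2-1)\times 0.5+(3-2)=2.5.
\end{equation*}
Thus time spent in the interval $(\tau_k,\tau_{k+1}]$ is rescaled by
the factor $0.5$, whereas time spent after $\tau_{k+1}$ is left
unchanged: for $t>\tau_{k+1}$ the function is the translation
$t\mapsto t-(\tau_{k+1}-\tau_k)(1-0.5)=t-0.5$.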

964: \begin{figure}

965: \qquad\resizebox{4in}{!}{\includegraphics{SNFTM.eps}}

966: \caption{Examples of shift functions. The picture shows the identity

967: function (dashed) and the function

968: $t\mapsto \tau_k

969: +\left(\min\left\{\tau_{k+1},t\right\}-\tau_k\right)

970: 0.5+\left(t-\tau_{k+1}\right)1_{\left\{t>\tau_{k+1}\right\}}$

971: for $\tau_k=1<\tau_{k+1}=2$, which corresponds to decreasing

972: survival by skipping the treatment in the interval $(\tau_k, \tau_{k+1}]$.}

973: \label{figureSNFTM}

974: \end{figure}

975:

976: \section{Mimicking counterfactual outcomes}

977: \label{mimsec}

978: In the next two sections we present two methods for estimating the

979: parameter $\psi$ in a structural nested failure time

980: model. Theorem~\ref{Tog} below is basic for both methods. Consider the

981: following transformation of the observation

982: $\left(\overline{L}_K,\overline{A}_K,T\right)$, using the ``true'' shift

983: functions $\gamma$ (given by (\ref{gd})):

984: \begin{equation} \label{Togd}

985: T_0^\gamma=\gamma_{\overline{L}_0,\overline{A}_0}

986: \circ \gamma_{\overline{L}_1,\overline{A}_1}

987: \circ \cdots

988: \circ \gamma_{\overline{L}_{p\left(T\right)},

989: \overline{A}_{p\left(T\right)}}\left(T\right),

990: \end{equation}

991: where $p(T)=\max\left\{k:\tau_k<T\right\}$.

992: %By its definition

993: The application of the function

994: $\gamma_{\overline{L}_{p\left(T\right)},\overline{A}_{p\left(T\right)}}$

995: to $T$ annihilates the effect of the last treatment

996: $A_{p\left(T\right)}$, and each further application of a shift

997: function annihilates the effect of an earlier treatment.  This

998: explains the following theorem, which is proved in

999: Appendix~\ref{appTog}.

1000:

1001: \begin{thm} \label{Tog}\emph{(mimicking counterfactual outcomes).}

1002: %Under Assumptions~\ref{ass} (no unmeasured confounding), \ref{cons}

1003: %(consistency) and~\ref{base} (admissible baseline treatment regime),

1004: %we have that

1005: The variable $T_0^\gamma$ defined in (\ref{Togd})

1006: possesses survival function $s_{\overline{0}}$.

1007: Furthermore, for every $k\geq0$,

1008: \begin{equation}\label{indep}

1009: A_k \cip T_0^\gamma |\overline{L}_k,\overline{A}_{k-1},T>\tau_k.

1010: \end{equation}

1011: \end{thm}

1012:

1013: The variable $T_0^\gamma$ is a (deterministic) function of the data

1014: vector $\left(\overline{L}_K,\overline{A}_K,T\right)$, through the

1015: (unknown) family of shift-functions $\gamma$. If the shift functions

1016: $\gamma$ would be known, then we would be able to ``mimic'' the

1017: survival time without treatment by calculating the transformation

1018: $T_0^\gamma$. By the preceding theorem this variable is distributed

1019: according to $s_{\overline 0}$ and hence under the conditions of

1020: Theorem~\ref{Gcomp} possesses the same distribution as $T^{\GG}$ for

1021: $\GG=\overline{0}$, the null treatment.

1022:
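To make the transformation (\ref{Togd}) concrete, consider the example
SNFTM of Section~\ref{repars} and write, as a shorthand introduced
here only for this illustration,
$\eta_k=\psi_1 A_k+\psi_2 A_kA_{k-1}+\psi_3 A_kL_k$ (with the term
involving $A_{-1}$ read as absent for $k=0$). For a subject with
$\tau_1<T\le\tau_2$ we have $p(T)=1$ and
\begin{equation*}
\gamma_{\overline{L}_1,\overline{A}_1}(T)=\tau_1+(T-\tau_1)e^{\eta_1}
>\tau_1,
\end{equation*}
so that a second application of a shift function yields
\begin{equation*}
T_0^\gamma
=\gamma_{\overline{L}_0,\overline{A}_0}\bigl(\tau_1+(T-\tau_1)e^{\eta_1}\bigr)
=\tau_0+(\tau_1-\tau_0)e^{\eta_0}+(T-\tau_1)e^{\eta_1}.
\end{equation*}
Under this particular model the mimicked baseline survival time is
therefore obtained by rescaling the time lived in each treatment
interval by the factor $e^{\eta_k}$ belonging to that interval.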

1023: Equation (\ref{indep}) shows that the variable $T_0^\gamma$ also

1024: shares the ``no unmeasured confounding'' property

1025: (Assumption~\ref{ass}) of counterfactual variables (in a slightly

1026: stronger form).

1027:

1028: \section{Maximum likelihood estimation}

1029: \label{mlesec}

1030: In this section we consider likelihood based inference for the

1031: parameter $\psi$ in a given SNFTM. Clearly this requires that we make

1032: the parameter $\psi$ visible in the density of the observation

1033: $\left(\overline{L}_K,\overline{A}_K,T\right)$. We first show that this

1034: can be achieved using the transformation $T_0^\gamma = T_0^\gamma

1035: \left(T,\overline{L}_K,\overline{A}_K\right)$ defined in

1036: (\ref{Togd}), which will depend on $\psi$ if we use an

1037: SNFTM for $\gamma$.

1038:

1039: \begin{thm}\label{mleth}\emph{(the likelihood rewritten).}

1040: Suppose that Assumption~\ref{base} (admissible baseline treatment

1041: regime) holds.

1042: %\footnote{check must be more assumptions at least to be able to

1043: %apply Theorem~\ref{Tog}}

1044: Suppose moreover that $\left(T,\overline{L}_K,\overline{A}_K\right)$ has a

1045: Lebesgue density, and that the function $t\mapsto

1046: s_{\overline{l}_k,\left(\overline{a}_k,\overline{0}\right)}

1047: \left(t\right)$ is continuously differentiable in $t$, for all

1048: $\overline{l}_k$, $\overline{a}_k$ with

1049: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,

1050: T>\tau_k \right)>0$, with strictly negative derivative except for at

1051: most finitely many points. Then the joint density of

1052: $\left(T,\overline{L},\overline{A}\right)$ can be rewritten as

1053: \begin{eqnarray*}

1054: \lefteqn{f_{T,\overline{L},\overline{A}}\left(t,\overline{l},

1055: \overline{a}\right)}\\

1056: &=&\frac{\partial}{\partial t} t_0^\gamma

1057: \left(t,\overline{l}_p,\overline{a}_p\right)

1058: f_{T_0^\gamma}\left(t_0^\gamma\left(t,\overline{l}_p,\overline{a}_p\right)

1059: \right)

1060: P\left(L_0=l_0|T_0^\gamma=t_0^\gamma\right)

1061: P\left(A_0=a_0|L_0=l_0\right)\nonumber\\

1062: &&\prod_{k=1}^p \Big\{P\left(L_k=l_k|\overline{L}_{k-1}=\overline{l}_{k-1},

1063: \overline{A}_{k-1}=\overline{a}_{k-1},T>\tau_k, T_0^\gamma=t_0^\gamma\right)

1064: \\&&\hspace{3cm}

1065: P\left(A_k=a_k|\overline{L}_{k}=\overline{l}_{k},

1066: \overline{A}_{k-1}=\overline{a}_{k-1},T>\tau_k\right)\Big\},

1067: \end{eqnarray*}

1068: where $\tau_p<t\le \tau_{p+1}$ and

1069: \begin{equation*}

1070: t_0^\gamma\left(t,\overline{l}_p,\overline{a}_p\right)=

1071: \gamma_{\overline{l}_0,\overline{a}_0}

1072: \circ \gamma_{\overline{l}_1,\overline{a}_1}

1073: \circ \cdots

1074: \circ \gamma_{\overline{l}_{p},

1075: \overline{a}_{p}}\left(t\right).

1076: \end{equation*}

1077: \end{thm}

1078:

1079: \proof

1080: Under the conditions of Theorem~\ref{mleth},

1081: \begin{equation*}

1082: \left(T,\overline{L},\overline{A}\right)\mapsto

1083: \left(T_0^\gamma,\overline{L},\overline{A}\right)=

1084: \left(t_0^\gamma\left(T\right),\overline{L},\overline{A}\right)

1085: \end{equation*}

1086: is a one-to-one mapping. Thus if $t_0^\gamma$ were continuously

1087: differentiable everywhere, then the identity

1088: \begin{equation}\label{lik1}

1089: f_{T,\overline{L},\overline{A}}\left(t,\overline{l},\overline{a}\right)

1090: =\frac{\partial}{\partial t} t_0^\gamma

1091: \left(t,\overline{l},\overline{a}\right)

1092: f_{T_0^\gamma,\overline{L},\overline{A}}

1093: \left(t_0^\gamma\left(t,\overline{l},\overline{a}\right),\overline{l},

1094: \overline{a}\right)

1095: \end{equation}

1096: would be immediate from the change of variables formula.  We show that

1097: (\ref{lik1}) holds under the conditions of Theorem~\ref{mleth}

1098: too. Next the assertion of the theorem follows by repeated

1099: conditioning and using Theorem~\ref{Tog}.

1100:

1101: To prove (\ref{lik1}) in general, note that the probability space

1102: is a union of countably many sets of the form

1103: $\left(\overline{L}_K=\overline{l}_K,\overline{A}_K=\overline{a}_K\right)$, so

1104: that by countable additivity of measures it suffices to prove

1105: (\ref{lik1}) on each of these sets that has probability greater than

1106: $0$. On each of these sets, $t_0^\gamma$ is one-to-one and

1107: continuously differentiable except at finitely many points: it is

1108: the composition of finitely many functions

1109: $\gamma_{\overline{l}_k,\overline{a}_k}$ and under the assumptions of

1110: Theorem~\ref{mleth},

1111: \begin{equation*}\gamma'_{\overline{l}_k,\overline{a}_k}\left(t\right)

1112: =\bigl(s^{-1}_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}

1113: \circ

1114: s_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)}\bigr)'(t)

1115: =\frac{1}{s'_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}

1116: \bigl(\gamma_{\overline{l}_k,\overline{a}_k}\left(t\right)\bigr)}

1117: s'_{\overline{l}_k,\left(\overline{a}_{k},\overline{0}\right)}\left(t\right)

1118: \end{equation*}

1119: exists and is continuous except for at most finitely many $t$.

1120: %\footnote{$f\circ f^{-1}(x)=x$, so

1121: %$f'\left(f^{-1}(x)\right) \cdot f^{-1}'(x)=1$, so

1122: %$f^{-1}'(x)=1/f'\left(f^{-1}(x)\right)$.

1123: %$\gamma=f^{-1}\circ g$, so $\gamma'(x)=1/(f'(\gamma(x))) \cdot g'(x)$.

1124: Thus, from the change of variables formula, equation (\ref{lik1}) is

1125: true on each set

1126: $\left(\overline{L}_K=\overline{l}_K,\overline{A}_K=\overline{a}_K\right)$, as

1127: we needed to show.

1128: \Endproof

1129:

1130:

1131: Regarding the conditions of Theorem~\ref{mleth} we note that the

1132: baseline treatment regime $\overline{0}$ may not be constant, whence

1133: the death rate under $\overline{0}$ may change at the time points

1134: $\tau_m$. However, it will often be reasonable to assume

1135: differentiability of the function

1136: $s_{\overline{l}_k,\left(\overline{a}_k,\overline{0}\right)}

1137: \left(t\right)$ on all intervals $\left(\tau_m,\tau_{m+1}\right)$.

1138:

1139: For likelihood inference concerning the parameter $\psi$ of an SNFTM, we

1140: shall generally drop the factors

1141: \begin{equation}

1142: \label{condlawtreatment}

1143: P\left(A_k=a_k|\overline{L}_k=\overline{l}_k,

1144: \overline{A}_{k-1}=\overline{a}_{k-1}\right)

1145: \end{equation}

1146: from the likelihood. All other terms involve $\psi$ through

1147: $T_0^\gamma$ and we will need to specify models for these terms in

1148: order to proceed, typically involving additional parameters.  Given

1149: such models we can estimate $\psi$ by the corresponding coordinate of

1150: the maximum likelihood estimator obtained by maximizing the likelihood

1151: over all parameters. Of course finding this maximizer may be a

1152: formidable task.

1153:

1154: Since the null hypothesis of no treatment effect is equivalent to the

1155: functions $\gamma_{\overline{l}_{k},\overline{a}_k}$ being equal to

1156: the identity function, by Theorem~\ref{geq}, this hypothesis can be

1157: fully expressed in the parameter $\psi$. For instance, we could, by

1158: convention, construct our SNFTM in such a way that this null

1159: hypothesis is equivalent to $H_0: \psi=0$. Then we can obtain a

1160: likelihood-based test for the null hypothesis of no treatment effect

1161: using the Wald, score or likelihood ratio test for $H_0: \psi=0$.

1162:
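As a sketch of the last of these tests, write $\ell(\psi,\beta)$ for
the log likelihood of the observations, where $\beta$ collects the
additional parameters of the models for the remaining factors (the
symbols $\ell$ and $\beta$ are notation introduced here for
illustration only). The likelihood ratio statistic is then
\begin{equation*}
\Lambda=2\Bigl\{\sup_{\psi,\beta}\ell(\psi,\beta)
-\sup_{\beta}\ell(0,\beta)\Bigr\}.
\end{equation*}
Under the usual regularity conditions for parametric models, which we
do not verify here, $\Lambda$ is approximately chi-square distributed
under $H_0$, with degrees of freedom equal to the dimension of $\psi$
(three in the example SNFTM of Section~\ref{repars}).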

1163: \section{G--estimation}

1164: \label{Gest}

1165: The likelihood methods of the preceding section require the

1166: specification of models for the conditional laws of the covariates,

1167: among others, next to a specification of an SNFTM. In this section we

1168: present an alternative approach to testing and estimation of the

1169: parameter in an SNFTM, called \emph{G--estimation} in

1170: Robins~(1998). This approach is based on models for the conditional

1171: distributions of the treatment variables given in

1172: (\ref{condlawtreatment}). It can be considered a semiparametric

1173: approach, where the parametric component refers to the laws

1174: (\ref{condlawtreatment}) and all other laws appearing in

1175: Theorem~\ref{mleth} form the nonparametric, unspecified

1176: component. From a practical perspective modelling the distributions

1177: (\ref{condlawtreatment}) is more attractive than modelling the

1178: remaining laws in Theorem~\ref{mleth}, as it may be expected that

1179: doctors have clear ideas, at least qualitatively, about how they reach

1180: their decisions about treatment.

1181:

1182: The method of G--estimation is based on the conditional independence

1183: of the ``blipped-up'' variable $T_0^\gamma$ defined in (\ref{Togd})

1184: and the treatment variable $A_k$ given the variables $\overline L_k$

1185: and $\overline A_{k-1}$, for each $k$, asserted by

1186: Theorem~\ref{Tog}. Consider first testing the null hypothesis $H_0:

1187: \gamma=\gamma_0$ for a given shift function $\gamma_0$.

1188: Theorem~\ref{Tog} gives, under the null hypothesis, that, for each $k$,

1189: \begin{equation}\label{ind}

1190: A_k \cip T_0^{\gamma_0}\,|\,\overline{L}_k,\overline{A}_{k-1},T>\tau_k.

1191: \end{equation}

1192: This is an assertion about the observed data vector

1193: $(\overline L_K,\overline A_K, T)$ only. Any test for the validity

1194: of (\ref{ind}) is therefore a test for the null hypothesis

1195: $H_0: \gamma=\gamma_0$.

1196:

1197: In order to operationalize this idea we adopt for each $k$ a model

1198: \begin{equation*}

1199: P_\theta\left(A_k=a_k|\overline{L}_k=\overline{l}_k,

1200: \overline{A}_{k-1}=\overline{a}_{k-1},T>\tau_k\right)

1201: \end{equation*}

1202: for the prediction of treatment given the past, indexed by some

1203: parameter $\theta$. Such a model tries to explain the treatment $A_k$

1204: by the values of the covariates up to time $\tau_k$ and the preceding

1205: treatment history.  Formula (\ref{ind}) implies that, under the null

1206: hypothesis, inclusion of the variable $T_0^{\gamma_0}$ as an extra

1207: explanatory variable is useless for the prediction of $A_k$, if past

1208: covariate- and treatment information $\overline{L}_k$ and

1209: $\overline{A}_{k-1}$ are known. Thus given a term of the form

1210: $\alpha\, T_0^{\gamma_0}$ in the prediction model with $\alpha$ a

1211: parameter, the true value of $\alpha$ must be equal to $0$, because of

1212: (\ref{ind}). It follows that we can test the null hypothesis $H_0:

1213: \gamma=\gamma_0$ by adding a term $\alpha T_0^{\gamma_0}$ anyway, and

1214: next test the null hypothesis $H_0: \alpha=0$ in the model indexed by

1215: the overall parameter $(\theta,\alpha)$.  Depending on the chosen

1216: types of model, such a test, for instance a Wald, score or

1217: likelihood ratio test, can be performed by standard statistical

1218: software.

1219:

1220: This procedure is particularly simple for testing the

1221: null hypothesis of no treatment effect. In view of

1222: Theorem~\ref{geq}, this is equivalent to testing whether the function

1223: $\gamma$ is equal to the identity function, i.e.\ we take $\gamma_0$ in the

1224: preceding equal to the identity function. In this case

1225: the variable $T_0^{\gamma_0}$ is equal to $T$, and hence

1226: the G--estimation procedure reduces to testing the null hypothesis

1227: $H_0: \alpha=0$ in a regression model that tries to explain the variable

1228: $A_k$ by the variables $\overline L_k$, $\overline A_{k-1}$ and $\alpha T$.

1229: The null hypothesis of no treatment effect can be tested

1230: in this way without specifying a model for the shift function $\gamma$.

1231:

1232: For a specific example, suppose that the treatment variables $A_k$ are

1233: binary-valued.  Then a logistic regression model is a standard choice

1234: for modelling the probabilities (\ref{condlawtreatment}).  We might

1235: add the variable $\alpha T_0^{\gamma}$ to a logistic regression model

1236: to form the model

1237: \begin{equation*}

1238: P_{\theta,\alpha}\left(A_k=a_k|\overline{L}_k,\overline{A}_{k-1},T>\tau_k,T_0^{\gamma}\right)

1239: =\frac{1}{1+e^{\theta\cdot f_k(\overline{L}_k,\overline{A}_{k-1})

1240: +\alpha g_k(T_0^{\gamma})}},

1241: \end{equation*}

1242: for given, known functions $f_k$ and $g_k$, and unknown parameters

1243: $\theta$ and $\alpha$. A test for the null hypothesis $H_0: \alpha=0$

1244: can be carried out by standard software for logistic regression.

1245:
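To make this test concrete, the score statistic in the logistic example can be written out. The following is only a sketch under an assumed convention, namely that the model is parametrized so that the logit of $P_{\theta,\alpha}\left(A_k=1|\cdot\right)$ equals $\theta\cdot f_k+\alpha g_k$; a different sign convention only changes the sign of the statistic. The symbols $U_n$ and $p_k^i$ are introduced here for illustration. The derivative of the log likelihood with respect to $\alpha$, evaluated at $\alpha=0$, is
\begin{equation*}
U_n(\theta)=\sum_{i=1}^n\ \sum_{k:\,T^i>\tau_k}
\Bigl(A_k^i-p_k^i(\theta)\Bigr)\,g_k\bigl(T_0^{\gamma,i}\bigr),
\qquad
p_k^i(\theta)=\frac{e^{\theta\cdot f_k(\overline{L}_k^i,\overline{A}_{k-1}^i)}}
{1+e^{\theta\cdot f_k(\overline{L}_k^i,\overline{A}_{k-1}^i)}}.
\end{equation*}
If the treatment model is correctly specified, then under the null hypothesis each summand has conditional mean zero at the true value of $\theta$, by (\ref{ind}). A value of $U_n\bigl(\hat\theta\bigr)$ that is far from zero relative to its standard error, with $\hat\theta$ fitted without the extra term, is therefore evidence against $H_0$.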

1246: Given an SNFTM $\psi\mapsto \gamma_\psi$ for the shift functions

1247: $\gamma$, indexed by a parameter $\psi$, we can extend the preceding

1248: testing methods to full inference on the parameter $\psi$.  First, we

1249: can obtain confidence regions for $\psi$ by inverting the tests for

1250: the null hypotheses $H_0: \gamma=\gamma_{\psi}$ in the usual way: the

1251: value $\psi$ belongs to the confidence region if the corresponding

1252: null hypothesis $H_0$ is not rejected.

1253:

1254: A natural estimator of $\psi$ would be the center of a confidence

1255: set, or, alternatively, a value of $\psi$ for

1256: which $T_0^{\gamma_\psi}$ contributes the

1257: least to the prediction model for treatment given the

1258: past. That is, the $\psi$ for which the fitted model for

1259: \begin{equation}\label{akpred}

1260: P_{\theta,\alpha}\left(A_k=a_k|\overline{L}_k,

1261: \overline{A}_{k-1},T>\tau_k,\alpha T_0^{\gamma_\psi}\right)

1262: \end{equation}

1263: does not include the variable $T_0^{\gamma_\psi}$, i.e.\ $\alpha=0$.

1264: For each given value of the parameter $\psi$ of the SNFTM we may

1265: obtain estimators $\hat{\theta}(\psi)$ and $\hat{\alpha}(\psi)$ for

1266: the parameters $\theta$ and $\alpha$, based on the observations

1267: $(\overline{L}_K^i,\overline{A}_K^i,T^i)$ on $n$ persons.  Then we

1268: define $\hat{\psi}$ as the solution of the equation

1269: \begin{equation*}

1270: \hat{\alpha}\left(\psi\right)=0.

1271: \end{equation*}

1272: If we use a logistic regression model, then the estimators

1273: $\hat\theta$  and $\hat\alpha$ can be obtained with

1274: standard software, for each given value of $\psi$.

1275: The estimator $\hat\psi$ can next be found by a grid search method.

1276: Alternatively, we can implement a direct numerical method for

1277: estimating $\psi$.

1278:
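As an illustration of such a search, consider a hypothetical one-parameter SNFTM. It is an assumption made here for concreteness and is not necessarily a model used elsewhere in this paper. Suppose that the baseline treatment is coded as $a_k=0$, that $\tau_0=0$ and $\tau_{K+1}=\infty$, and that
\begin{equation*}
\gamma^{\psi}_{\overline{l}_k,\overline{a}_k}(t)=
\begin{cases}
\tau_k+e^{\psi a_k}\left(t-\tau_k\right), & \tau_k<t\le\tau_{k+1},\\
t+\bigl(e^{\psi a_k}-1\bigr)\left(\tau_{k+1}-\tau_k\right), & t>\tau_{k+1}.
\end{cases}
\end{equation*}
These functions are continuous and strictly increasing, and they reduce to the identity if $\psi=0$ or $a_k=0$. Composing them as in (\ref{Togd}) gives, for $\tau_p<T\le\tau_{p+1}$,
\begin{equation*}
T_0^{\gamma_\psi}=\sum_{m=0}^{p-1}e^{\psi A_m}\left(\tau_{m+1}-\tau_m\right)
+e^{\psi A_p}\left(T-\tau_p\right).
\end{equation*}
Under this assumed model the variable $T_0^{\gamma_\psi}$ can be computed directly from the observed data for every value of $\psi$ on the grid. One would then refit the treatment model for each grid value and retain the value of $\psi$ for which $\hat{\alpha}\left(\psi\right)$ is closest to zero.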

1279: The procedures just outlined may appear a bit unusual, in view of

1280: their indirect nature. However, in most cases they can also be

1281: interpreted in a standard way. For instance, the procedure for

1282: estimating $\alpha$ for given $\psi$ will often be equivalent to

1283: solving $\hat{\alpha}=\hat{\alpha}\left(\psi\right)$ from an

1284: estimating equation of the type

1285: \begin{equation*}

1286: \sum_{i=1}^n h_{\alpha,\psi}\bigl(\overline{L}_K^i,\overline{A}_K^i,T^i\bigr)

1287: =0.

1288: \end{equation*}

1289: Then $\hat{\psi}$ satisfying

1290: $\hat{\alpha}\bigl(\hat{\psi}\bigr)=0$ will satisfy

1291: the estimating equation

1292: \begin{equation*}

1293: \sum_{i=1}^n h_{0,\hat{\psi}}\bigl(\overline{L}_K^i,\overline{A}_K^i,T^i\bigr)

1294: =0.

1295: \end{equation*}

1296: Because $\alpha(\psi_0)=0$ for the true value $\psi_0$ of $\psi$,

1297: the true value of $\psi$ is a solution to the equation

1298: \begin{equation*}

1299: E h_{0,\psi}\left(\overline{L}_K,\overline{A}_K,T\right)=0.

1300: \end{equation*}

1301: In other words, $\hat{\psi}$ will be the solution of an unbiased

1302: estimating equation, whence the (asymptotic) properties of $\hat\psi$

1303: can be ascertained with the usual theory for M-estimators

1304: (e.g.\ Van der Vaart~(1998)). For

1305: instance, we may expect the sequence

1306: $\sqrt{n}\bigl(\hat{\psi}-\psi\bigr)$ to be asymptotically (as

1307: $n\rightarrow\infty$) normal with mean zero and variance

1308: \begin{equation*}

1309: \frac{E h_{0,\psi}^2\left(\overline{L}_K,\overline{A}_K,T\right)}

1310: {\bigl(\frac{\partial}{\partial \psi} E

1311: h_{0,\psi}\left(\overline{L}_K,\overline{A}_K,T\right)\bigr)^2}.

1312: \end{equation*}

1313: Lok~(2001) has studied the validity of these results, and has thus

1314: justified the preceding procedures.

1315:
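In practice the asymptotic variance displayed above would be estimated by replacing expectations with empirical averages. A minimal sketch, assuming that $\psi$ is one-dimensional and that $h_{0,\psi}$ is differentiable in $\psi$ (both are assumptions made here for illustration, and the symbol $\hat\sigma^2$ is not used elsewhere in this paper), is
\begin{equation*}
\hat\sigma^2=
\frac{\frac{1}{n}\sum_{i=1}^n h_{0,\hat\psi}^2\bigl(\overline{L}_K^i,\overline{A}_K^i,T^i\bigr)}
{\Bigl(\frac{1}{n}\sum_{i=1}^n \frac{\partial}{\partial\psi}
h_{0,\psi}\bigl(\overline{L}_K^i,\overline{A}_K^i,T^i\bigr)\Big|_{\psi=\hat\psi}\Bigr)^2}.
\end{equation*}
To the extent that the asymptotic normality above applies, this suggests the approximate $95\%$ confidence interval $\hat\psi\pm 1.96\,\hat\sigma/\sqrt{n}$ as an alternative to inverting the tests.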

1316: \section{Summary and extensions}

1317: We have shown that the AZT treatment regime-specific, counterfactual

1318: AIDS-free survival curves $P\left(T^g>t\right)$ are identified for all

1319: evaluable treatment regimes $g$ if our maintained assumption of no

1320: unmeasured confounding, Assumption~\ref{ass}, is met. This assumption

1321: will hold if the investigator has succeeded in recording in

1322: $\overline{l}_k$ data on all covariates that, conditional on past AZT

1323: history $\overline{a}_{k-1}$, predict both the AZT dosage rate $a_k$

1324: in $\left(\tau_k,\tau_{k+1}\right]$ and the random variables $T^g$

1325: representing time to AIDS had, contrary to fact, all subjects followed

1326: an AZT treatment history consistent with regime $g$.

1327:

1328: Further, we have shown that, under the assumption of no unmeasured

1329: confounding, Assumption~\ref{ass}, the shift functions $\gamma$ of an

1330: SNFTM are the identity function if and only if the G--null hypothesis

1331: of no causal effect of AZT on time to AIDS is true. We have expressed

1332: the likelihood of the observable random variables

1333: $\left(T,\overline{L}_K,\overline{A}_K\right)$ in terms of the transformed

1334: random variables $\left(T_0^\gamma,

1335: \overline{L}_K,\overline{A}_K\right)$. We then developed parametric

1336: likelihood based tests of the hypothesis $\gamma={\rm id}$ by specifying fully

1337: parametric models for the joint distribution of

1338: $\left(T,\overline{L}_K,\overline{A}_K\right)$ in terms of the

1339: transformed random variables $\left(T_0^\gamma,

1340: \overline{L}_K,\overline{A}_K\right)$.

1341:

1342: Even in the absence of censoring or missing data, a major limitation

1343: of the fully parametric likelihood-based tests of the null hypothesis

1344: $\gamma={\rm id}$ from Section~\ref{mlesec} is that misspecification of the

1345: parametric models for the distribution of $L_k$ given

1346: $\overline{L}_{k-1}$, $\overline{A}_{k-1}$ and $T_0^\gamma$, or for

1347: the distribution of $T^{\overline{0}}$, can cause the true

1348: $\alpha$-level of the test to deviate from the nominal

1349: $\alpha$-level. This limitation raised the question of whether it is

1350: possible to construct $\alpha$-level tests of the null hypothesis

1351: $\gamma={\rm id}$ and of more general hypotheses concerning $\gamma$, which

1352: are asymptotically distribution-free. A closely related question is

1353: whether there exist $n^{1/2}$-consistent asymptotically normal

1354: estimators of the parameter $\psi$ of a correctly specified structural

1355: nested failure time model if the joint distribution of the observables

1356: $\left(\overline{L}_K,\overline{A}_K,T\right)$ is otherwise unspecified,

1357: i.e.\ if the distribution of $L_k$ given $\overline{L}_{k-1}$,

1358: $\overline{A}_{k-1}$ and $T_0^\gamma$ and the distribution of the variable

1359: $T^{\overline{0}}$ are left completely unspecified. In

1360: Section~\ref{Gest} we showed that one only needs to specify a parametric

1361: model for the shift function $\gamma$, which models the causal effect

1362: of one treatment dosage given the past, and a parametric model for the

1363: distribution of actual treatment dosage given past treatment- and

1364: covariate history. Doctors will usually have clear ideas about this

1365: latter distribution of treatment decisions. Moreover, the doctors'

1366: interest will often be in the causal effect of one treatment dosage

1367: given the past.

1368:

1369: If the null hypothesis of no treatment effect has been rejected and

1370: the parameter $\psi$ of the shift function $\gamma$ has been

1371: estimated, one might wish to estimate the survival distribution

1372: $t\mapsto P\bigl(T^\GG>t\bigr)$ of the outcome under specific

1373: treatment regimes $\GG$ in a way consistent with the estimator

1374: $\hat{\psi}$. This can be done by estimating the distribution of

1375: $T^{\overline{0}}$ (e.g.\ by the empirical distribution of

1376: $T_0^{\gamma^\psi}$) and the empirical distribution of $L_k$ given

1377: $\overline{L}_{k-1}$, $\overline{A}_{k-1}$ and $T_0^\gamma$

1378: ($k=0,\ldots,K$) for histories $\overline{L}_{k-1}$,

1379: $\overline{A}_{k-1}$ consistent with $\GG$.  An approximate sample

1380: $\tilde{T}^\GG_i$ ($i=1,2,\ldots$) from the distribution of $T^\GG$

1381: could then be generated by using these estimated distributions: first

1382: draw $T'_0$ from the distribution of $T^{\overline{0}}$, then draw

1383: $L'_0$ from the distribution of $L_0$ given $T_0^\gamma=T'_0$, then

1384: put $A'_0=\GG\bigl(L'_0\bigr)$, then draw $L'_1$ from the distribution

1385: of $L_1$ given $T_0^\gamma=T'_0$, $A_0=A'_0$ and $L_0=L'_0$, etcetera.

1386: Finally put

1387: \begin{equation*}\tilde{T}^\GG=

1388: {\gamma^{\hat{\psi}}_{\overline{L}'_K,\overline{A}'_K}}^{-1}\circ\ldots\circ

1389: {\gamma^{\hat{\psi}}_{\overline{L}'_0,\overline{A}'_0}}^{-1}\bigl(T'_0\bigr).

1390: \end{equation*}

1391: This variable will be generated from the desired distribution.

1392:

1393: Extensions of the results of this paper that allow for censoring and

1394: missing data are discussed in Robins~(1988, 1992, 1993, 1998), and Robins

1395: et al.~(1992). Extensions of G--tests and estimators to continuous

1396: $L_k$ and $A_k$ are discussed in Robins~(1992, 1993), Robins et

1397: al.~(1992), and Gill and Robins~(2001). Robins~(1998) and Lok~(2001)

1398: show that the results in this paper can be extended to allow for jumps

1399: in the treatment- and covariate processes in continuous time.

1400:

1401: \begin{appendix}

1402:

1403: \section{Alternative formulation of the null hypothesis}

1404: \label{appid}

1405: In this appendix we prove Theorem~\ref{geq} through two lemmas.  The

1406: first lemma shows that if all functions $\gamma$ are equal to the

1407: identity function, then all survival curves $P\left(T^g>t\right)$ for

1408: evaluable treatment regimes are the same. The second lemma shows the

1409: reverse.

1410:

1411: \begin{lem}

1412: \label{alem1}

1413: Suppose that Assumptions~\ref{ass} (no unmeasured confounding),

1414: \ref{cons} (consistency) and~\ref{base} (admissible baseline treatment

1415: regime) hold. If $\gamma_{\overline{l}_k,\overline{a}_k}$ is the

1416: identity function for all $k$, $\overline{l}_k\in\overline{{\cal

1417: L}}_k$ and $\overline{a}_k\in\overline{A}_k$ with

1418: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,

1419: T>\tau_k\right)>0$, then

1420: all survival curves $P\left(T^\GG>t\right)$ for evaluable treatment

1421: regimes $\GG$ are the same.

1422: \end{lem}

1423:

1424: \proof

1425: We show that for all evaluable treatment regimes $\GG$ and

1426: all $\overline{l}_k$ with \linebreak

1427: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=

1428: \overline{\GG}\left(\overline{l}_k\right),T>\tau_k\right)>0$,

1429: the conditional distributions of the counterfactual variables

1430: $T^\GG$ and

1431: $T^{\left(\overline{\GG}_{k-1}\left(\overline{l}_{k-1}\right),

1432: \overline{0}\right)}$

1433: given $\overline{L}_k=\overline{l}_k,\overline{A}_{k-1}

1434: =\overline{\GG}\left(\overline{l}_{k-1}\right),T>\tau_k$

1435: are the same, i.e., for $t\ge \tau_k$,

1436: \begin{equation}\label{kweg}

1437: s_{\overline{l}_k,\GG}(t)=

1438: s_{\overline{l}_k,\left(\overline{\GG}_{k-1}\left(\overline{l}_{k-1}\right),

1439: \overline{0}\right)}(t).

1440: \end{equation}

1441: For $k=-1$ this should be read as $s_\GG\left(t\right)=s_{\overline{0}}(t)$,

1442: which implies Lemma~\ref{alem1}.

1443:

1444: We prove (\ref{kweg}) by backward induction on $k$, for $t$ fixed.

1445: With $\tau_p$ the last clinic visit time strictly before $t$, we start

1446: with $k=p$ and end with $k=0$. The statement for $k=-1$ follows from

1447: the statement for $k=0$ by summation over $l_0$.

1448:

1449: Basis: For $k=p$, by the definition of $s$ as the right side

1450: of (\ref{Gfc}),

1451: $$s_{\overline{l}_p,\GG}(t)

1452: =P\left(T>t|\overline{L}_p=\overline{l}_p,\overline{A}_p=

1453: \overline{\GG}_p\left(\overline{l}_p\right),T>\tau_p\right)

1454: =s_{\overline{l}_p,\left(\overline{\GG}_p\left(\overline{l}_p\right),

1455: \overline{0}\right)}(t),$$

1456: by another application of the definition of $s$.

1457: The right side is equal to

1458: $s_{\overline{l}_p,\left(\overline{\GG}_{p-1}\left(\overline{l}_{p-1}\right),

1459: \overline{0}\right)}(t)$ by the assumption that

1460: the function $\gamma_{\overline{l}_p,\overline{a}_p}$, with

1461: $\overline a_p=\overline g_p(\overline l_p)$, is the identity

1462: function.

1463:

1464: Induction step: we suppose that (\ref{kweg}) is true for $k\ge 1$ and

1465: establish (\ref{kweg}) for $k-1$. By straightforward algebra using the

1466: definition of $s_{\overline l_{k-1},g}$,

1467: \begin{eqnarray*} s_{\overline{l}_{k-1},\GG}\left(t\right)

1468: &=& P\left(T>\tau_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},

1469: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),

1470: T>\tau_{k-1}\right)\\

1471: &&\hspace{0.5cm}\sum_{l_{k}}

1472: P\left(L_{k}=l_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},

1473: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),

1474: T>\tau_{k}\right) s_{\overline{l}_{k},\GG}\left(t\right).

1475: \end{eqnarray*}

1476: Here we can replace $s_{\overline l_{k},g}$ using the induction

1477: hypothesis, giving that the preceding display is equal to

1478: \begin{eqnarray*}

1479: &&P\left(T>\tau_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},

1480: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),

1481: T>\tau_{k-1}\right)\\

1482: &&\hspace{0.5cm}

1483: \sum_{l_{k}}

1484: P\left(L_{k}=l_{k}|\overline{L}_{k-1}=\overline{l}_{k-1},

1485: \overline{A}_{k-1}=\overline{\GG}\left(\overline{l}_{k-1}\right),

1486: T>\tau_{k}\right)

1487: s_{\overline{l}_{k},\left(\overline{\GG}_{k-1}\left(\overline{l}_{k-1}\right),

1488: \overline{0}\right)}\left(t\right)\\

1489: &&\hspace{1cm}= s_{\overline{l}_{k-1},\left(\overline{\GG}_{k-1}\left(\overline{l}_{k-1}

1490: \right), \overline{0}\right)}\left(t\right)\\

1491: &&\hspace{1cm}= s_{\overline{l}_{k-1},\left(\overline{\GG}_{k-2}\left(\overline{l}_{k-2}

1492: \right),\overline{0}\right)}\left(t\right),

1493: \end{eqnarray*}

1494: where we use the definition of $s$ in the first equality, and the

1495: assumption that $\gamma_{\overline{l}_{k-1},\overline{a}_{k-1}}$,

1496: for  $\overline{a}_{k-1}=\overline{\GG}_{k-1}(\overline{l}_{k-1})$, is the

1497: identity function in the second.

1498: \Endproof

1499:

1500: \begin{lem}\label{alem2}

1501: Suppose that Assumptions~\ref{ass} (no unmeasured confounding),

1502: \ref{cons} (consistency) and~\ref{base} (admissible baseline treatment

1503: regime) hold. If the survival curves

1504: $P\left(T^\GG>t\right)$ are the same for all evaluable treatment regimes

1505: $\GG$, then the shift function

1506: $\gamma_{\overline{l}_k,\overline{a}_k}$ is the identity for all

1507: $k$, $\overline{l}_k\in\overline{{\cal L}}_k$ and

1508: $\overline{a}_k\in\overline{A}_k$ with

1509: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,

1510: T>\tau_k\right)>0$.

1511: \end{lem}

1512:

1513: \proof

1514: Let fixed $\overline{l}_k$, $\overline{a}_k$ with

1515: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,

1516: T>\tau_k\right)>0$ be given. To prove that

1517: $\gamma_{\overline{l}_k,\overline{a}_k}$ is the identity we need to

1518: show that, for all $t>\tau_k$,

1519: \begin{equation}\label{akweg}

1520: s_{\overline{l}_k,\left(\overline{a}_k,\overline{0}\right)}(t)=

1521: s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}(t).

1522: \end{equation}

1523: Define a treatment regime $\GG^1$ by the coordinate functions

1524: $\GG_m^1\bigl(\overline{\tilde{l}}_m\bigr)=a_m$ if

1525: $\overline{\tilde{l}}_m$ is the initial part of $\overline{l}_k$,

1526: and by $\GG_m^1\bigl(\overline{\tilde{l}}_m\bigr)=0$ otherwise. Define

1527: a second treatment regime $\GG^2$ by

1528: $\GG^2=\bigl(\overline{\GG^1}_{k-1},\overline{0}\bigr)$.

1529: Because of Assumption~\ref{base} and because

1530: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,

1531: T>\tau_k\right)>0$, the treatment

1532: regimes $\GG^1$ and $\GG^2$ are evaluable.   Thus, by assumption,

1533: we have that $P\left(T^{\GG^1}>t\right)=P\left(T^{\GG^2}>t\right)$, and these

1534: probabilities are given by the G--computation formula,

1535: given in Theorem~\ref{Gcomp}. For the first regime this formula

1536: can be written in the form

1537: \begin{eqnarray*}

1538: \lefteqn{P\left(T^{\GG^1}>t\right)}\\

1539: &=&

1540: \sum_{\tilde{l}_0}\cdots

1541: \sum_{\tilde{l}_k} 1_{\overline{\tilde{l}}_k\neq\overline{l}_k}

1542: \prod_{m=0}^k\Big\{ P\bigl(T>\tau_m|\overline{L}_{m-1}=

1543: \overline{\tilde{l}}_{m-1},

1544: \overline{A}_{m-1}=\overline{\GG^1}\bigl(\overline{\tilde{l}}_{m-1}\bigr),

1545: T>\tau_{m-1}\bigr)\\

1546: &&\hspace{1.5cm}

1547: P\bigl(L_m=\tilde{l}_m| \overline{L}_{m-1}=\overline{\tilde{l}}_{m-1},

1548: \overline{A}_{m-1}=\overline{\GG^1}\bigl(\overline{\tilde{l}}_{m-1}\bigr),

1549: T>\tau_{m}\bigr)

1550: \Big\}s_{\overline{\tilde{l}}_k,\GG^1}(t)\\

1551: &&+\bigg[\prod_{m=0}^k

1552: \Big\{P\bigl(T>\tau_m|\overline{L}_{m-1}=\overline{l}_{m-1},

1553: \overline{A}_{m-1}=\overline{\GG^1}\left(\overline{l}_{m-1}\right),

1554: T>\tau_{m-1}\bigr)\\

1555: &&\hspace{1.5cm}

1556: P\bigl(L_m=l_m| \overline{L}_{m-1}=\overline{l}_{m-1},

1557: \overline{A}_{m-1}=\overline{\GG^1}\bigl(\overline{l}_{m-1}\bigr),

1558: T>\tau_{m}\bigr)\Big\}\bigg] s_{\overline{l}_k,\GG^1}(t).

1559: \end{eqnarray*}

1560: A similar expression holds for the treatment regime $\GG^2$.  Because

1561: the regimes $\GG^1$ and $\GG^2$ are constructed to be the same up to

1562: time $\tau_{k-1}$, only the second terms of the sums differ between these

1563: two expressions. Even there, the product preceding

1564: $s_{\overline{l}_k,\GG^1}(t)$ and $s_{\overline{l}_k,\GG^2}(t)$ is the

1565: same for $\GG^1$ and $\GG^2$. Moreover, this factor is strictly

1566: positive, since

1567: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,

1568: T>\tau_k\right)>0$ by assumption. The equality of

1569: $P\left(T^{\GG^1}>t\right)$ and $P\left(T^{\GG^2}>t\right)$ therefore

1570: implies the equality of $s_{\overline{l}_k,\GG^1}(t)$ and

1571: $s_{\overline{l}_k,\GG^2}(t)$. By construction of $\GG^1$ and $\GG^2$,

1572: equation (\ref{akweg}) and hence Lemma~\ref{alem2} follow.

1573: \Endproof

1574:

1575:

1576: \section{Mimicking counterfactual outcomes} \label{appTog}

1577: For $t>0$ define $p(t)$ by $\tau_{p(t)}<t\le \tau_{p(t)+1}$,

1578: i.e.\ $\tau_{p(t)}$ is the last clinic visit time strictly before $t$.

1579: For $k\geq 0$ with $k\le p(T)$ we define a random variable by

1580: \begin{equation*}

1581: T_k^\gamma=\gamma_{\overline{L}_k,\overline{A}_k}\circ

1582: \cdots\circ\gamma_{\overline{L}_{p(T)},\overline{A}_{p(T)}}(T).

1583: \end{equation*}

1584: For $k>p(T)$ we interpret the (empty) composition of transformations

1585: on the right as the identity and define $T_k^\gamma=T$.

1586:

1587: In this appendix we prove the following theorem, which generalizes the

1588: first part of Theorem~\ref{Tog}. This theorem implies the second part,

1589: since $T_0^\gamma$ is a function of

1590: $\bigl(\overline{L}_{k-1},\overline{A}_{k-1},T_k^\gamma\bigr)$.

1591:

1592: \begin{thm}

1593: For $t>\tau_k$ and every $\overline l_k$, $\overline a_k$ with

1594: $P\left(\overline{L}_k=\overline{l}_k,\overline{A}_k=

1595: \overline{a}_k,T>\tau_k\right)>0$,

1596: \begin{eqnarray*}

1597: P\left(T^\gamma_k>t|\overline{L}_k=\overline{l}_k,\overline{A}_k=

1598: \overline{a}_k,T>\tau_k\right)

1599: &=&P\left(T^\gamma_k>t|\overline{L}_k=\overline{l}_k,

1600: \overline{A}_{k-1}=\overline{a}_{k-1},T>\tau_k\right)\\

1601: &=&s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}(t).

1602: \end{eqnarray*}

1603: \end{thm}

1604:

1605: \proof

1606: We use backward induction on $k$, starting with $k=K$ and ending with $k=0$.

1607: For $k=K$,

1608: \begin{eqnarray*}

1609: P\bigl(T_K^\gamma>t|\overline{L}_K=\overline{l}_K,

1610: \overline{A}_K=\overline{a}_K,T>\tau_K\bigr)&=&

1611: P\bigl(\gamma_{\overline{l}_K,\overline{a}_K}\left(T\right)>t|

1612: \overline{L}_K=\overline{l}_K,\overline{A}_K=\overline{a}_K,T>\tau_K\bigr)\\

1613: &=&P\bigl(T>\gamma^{-1}_{\overline{l}_K,\overline{a}_K}(t)|

1614: \overline{L}_K=\overline{l}_K,\overline{A}_K=\overline{a}_K,T>\tau_K\bigr)\\

1615: &=&s_{\overline{l}_K,\left(\overline{a}_K,\overline{0}\right)}

1616: \bigl(\gamma^{-1}_{\overline{l}_K,\overline{a}_K}(t)\bigr)\\

1617: &=&s_{\overline{l}_K,\left(\overline{a}_{K-1},\overline{0}\right)}(t).

1618: \end{eqnarray*}

1619: Here the first equality is immediate from the definition of

1620: $T_K^\gamma$, the second follows by the strict monotonicity of the

1621: functions $\gamma$, the third by definition of $s$ and the last by

1622: definition of $\gamma$.

1623:

1624: Induction step: we show that if the theorem is true for $k+1$, then it

1625: is also true for $k$. Just as for $k=K$,

1626: \begin{equation*}

1627: P\bigl(T_k^\gamma>t|\overline{L}_k=\overline{l}_k,

1628: \overline{A}_k=\overline{a}_k,T>\tau_k\bigr)

1629: =P\bigl(T_{k+1}^\gamma>\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)|

1630: \overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,T>\tau_k\bigr).

1631: \end{equation*}

1632: Now we distinguish two possibilities:

1633: $\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)\leq\tau_{k+1}$ and

1634: $\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)>\tau_{k+1}$. In the

1635: first case, the right side of the preceding display is equal to

1636: \begin{eqnarray*}

1637: &&P\bigl(T>\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)|

1638: \overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,T>\tau_k\bigr)\\

1639: &&\hspace{1cm}=s_{\overline{l}_k,\left(\overline{a}_k,\overline{0}\right)}

1640: \bigl(\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)\bigr)\\

1641: &&\hspace{1cm}=s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}(t),

1642: \end{eqnarray*}

1643: where the first equality holds because for

1644: $s\in\left(\tau_k,\tau_{k+1}\right]$ we have that

1645: $\left\{T_{k+1}^\gamma>s\right\}=\left\{T>s\right\}$ by the construction of

1646: $T_{k+1}^\gamma$, and the last equality holds by the definition of $\gamma$.

1647: In the second possibility, i.e.\ if

1648: $\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)>\tau_{k+1}$,

1649: \begin{eqnarray*}

1650: \lefteqn{P\bigl(T_{k+1}^\gamma>\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)|

1651: \overline{L}_k=\overline{l}_k,\overline{A}_k=\overline{a}_k,T>\tau_k\bigr)}\\

1652: &=&P\bigl(T_{k+1}^\gamma>\tau_{k+1}|\overline{L}_k=\overline{l}_k,

1653: \overline{A}_k=\overline{a}_k,T>\tau_k\bigr)\\

1654: &&\hspace{0.5cm} P\bigl(T_{k+1}^\gamma>\gamma^{-1}_{\overline{l}_k,

1655: \overline{a}_k}(t)|\overline{L}_k=\overline{l}_k,

1656: \overline{A}_k=\overline{a}_k,T>\tau_k,T_{k+1}^\gamma>\tau_{k+1}\bigr)\\

1657: &=&P\left(T>\tau_{k+1}|\overline{L}_k=\overline{l}_k,

1658: \overline{A}_k=\overline{a}_k,T>\tau_k\right)\\

1659: &&\hspace{0.5cm} \sum_{l_{k+1}}\Big\{P\left(L_{k+1}=l_{k+1}|

1660: \overline{L}_{k}=\overline{l}_{k},\overline{A}_k=\overline{a}_k,

1661: T>\tau_{k+1}\right)\\

1662: &&\hspace{0.5cm} \hspace{0.95cm}

1663: P\bigl(T_{k+1}^\gamma>\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)|

1664: \overline{L}_{k+1}=\overline{l}_{k+1},\overline{A}_k=\overline{a}_k,

1665: T>\tau_{k+1}\bigr)\Big\}\\

1666: &=&P\left(T>\tau_{k+1}|\overline{L}_k=\overline{l}_k,

1667: \overline{A}_k=\overline{a}_k,T>\tau_k\right)\\

1668: &&\hspace{0.5cm} \sum_{l_{k+1}}\Big\{

1669: P\left(L_{k+1}=l_{k+1}|\overline{L}_{k}=\overline{l}_{k},

1670: \overline{A}_k=\overline{a}_k,T>\tau_{k+1}\right)

1671: s_{\overline{l}_{k+1},\left(\overline{a}_k,\overline{0}\right)}

1672: \bigl(\gamma^{-1}_{\overline{l}_k,\overline{a}_k}(t)\bigr)\Big\}\\

1673: &=&P\left(T>\tau_{k+1}|\overline{L}_k=\overline{l}_k,

1674: \overline{A}_k=\overline{a}_k,T>\tau_k\right)\\

1675: &&\hspace{0.5cm} \sum_{l_{k+1}}\Big\{

1676: P\left(L_{k+1}=l_{k+1}|\overline{L}_{k}=\overline{l}_{k},

1677: \overline{A}_k=\overline{a}_k,T>\tau_{k+1}\right)

1678: s_{\overline{l}_{k+1},\left(\overline{a}_{k-1},\overline{0}\right)}

1679: \left(t\right)\Big\}\\

1680: &=&s_{\overline{l}_{k},\left(\overline{a}_{k-1},\overline{0}\right)}(t),

1681: \end{eqnarray*}

1682: where in the first step we condition on $T_{k+1}^\gamma>\tau_{k+1}$,

1683: in the second we use that

1684: $\left\{T_{k+1}^\gamma>\tau_{k+1}\right\}=\left\{T>\tau_{k+1}\right\}$

1685: and we condition on $L_{k+1}$, the third is the induction hypothesis, the

1686: fourth follows from the definition of $\gamma$ and the last from

1687: the definition of

1688: $s_{\overline{l}_k,\left(\overline{a}_{k-1},\overline{0}\right)}$.

1689: \Endproof

1690:

1691:

1692: \end{appendix}

1693:

1694: \bigskip\noindent

1695: {\bf Acknowledgement.} This paper is based on an earlier manuscript

1696: by the first author.

1697:

1698: %\addcontentsline{toc}{chapter}{Bibliography}

1699: %\bibliographystyle{harry}

1700: %\bibliography{ref}

1701:

1702: \begin{thebibliography}{13}

1703: \expandafter\ifx\csname natexlab\endcsname\relax\def\natexlab#1{#1}\fi

1704:

1705: \bibitem[{Dawid(1979)}]{Dawid}

1706: Dawid, A.~P. (1979).

1707: \newblock {Conditional independence in statistical theory (with discussion)}.

1708: \newblock {\em Journal of the Royal Statistical Society\/} {\bf B 41}, 1--31.

1709:

1710: \bibitem[{Gill and Robins(2001)}]{RJ}

1711: Gill, R.~D. and Robins, J.~M. (2001).

1712: \newblock {Causal inference for complex longitudinal data: the continuous

1713:   case}.

1714: \newblock {\em Annals of Statistics\/} {\bf 29}(6), 1785--1811.

1715:

1716: \bibitem[{Lok(2001)}]{Lok}

1717: Lok, J.~J. (2001).

1718: \newblock {\em {Statistical modelling of causal effects in time}\/}.

1719: \newblock Ph.D. thesis, Division of Mathematics and Computer Science,

1720: Vrije Universiteit Amsterdam.

1721:

1722: \bibitem[{Robins(1986)}]{R86}

1723: Robins, J.~M. (1986).

1724: \newblock {A new approach to causal inference in mortality studies with a

1725:   sustained exposure period -- Applications to control of the healthy worker

1726:   survivor effect}.

1727: \newblock {\em Mathematical Modelling\/} {\bf 7}, 1393--1512.

1728:

1729: \bibitem[{Robins(1987{\natexlab{a}})}]{R87a}

1730: Robins, J.~M. (1987{\natexlab{a}}).

1731: \newblock {A graphical approach to the identification and estimation of causal

1732:   parameters in mortality studies with sustained exposure periods}.

1733: \newblock {\em Journal of Chronic Diseases\/} {\bf 40}(Suppl. 2), 139S--161S.

1734:

1735: \bibitem[{Robins(1987{\natexlab{b}})}]{R87b}

1736: Robins, J.~M. (1987{\natexlab{b}}).

1737: \newblock {Addendum to ``A new approach to causal inference in mortality

1738:   studies with a sustained exposure period -- Application to control of the

1739:   healthy worker survivor effect''}.

1740: \newblock {\em Computers and Mathematics with Applications\/} {\bf 14},

1741:   923--945.

1742:

1743: \bibitem[{Robins(1988{\natexlab{a}})}]{R89}

1744: Robins, J.~M. (1988{\natexlab{a}}).

1745: \newblock {The analysis of randomized and nonrandomized AIDS treatment trials

1746:   using a new approach to causal inference in longitudinal studies}.

1747: \newblock In {\em Health service research methodology: a focus on AIDS\/}, pp.

1748:   113--159. NCHSR, U.S. Public Health Service, Washington.

1749:

1750: \bibitem[{Robins(1988{\natexlab{b}})}]{R88b}

1751: Robins, J.~M. (1988{\natexlab{b}}).

1752: \newblock {The control of confounding by intermediate variables}.

1753: \newblock {\em Statistics in Medicine\/} {\bf 8}, 679--701.

1754:

1755: \bibitem[{Robins(1992)}]{R92}

1756: Robins, J.~M. (1992).

1757: \newblock {Estimation of the time-dependent accelerated failure time model in

1758:   the presence of confounding factors}.

1759: \newblock {\em Biometrika\/} {\bf 79}, 321--334.

1760:

1761: \bibitem[{Robins(1993)}]{R93}

1762: Robins, J.~M. (1993).

1763: \newblock {Analysis methods for HIV treatment and cofactor effects}.

1764: \newblock In {D.G. Ostrow and R. Kessler}, eds., {\em Methodological issues of

1765:   AIDS behavioral research\/}, pp. 113--159. Plenum Press, New York.

1766:

1767: \bibitem[{Robins(1998)}]{R98}

1768: Robins, J.~M. (1998).

1769: \newblock {Structural nested failure time models}.

1770: \newblock In {P.K. Andersen and N. Keiding}, eds., {\em Survival Analysis\/},

1771:  volume~6 of {\em Encyclopedia of Biostatistics\/}, pp. 4372--4389.

1772: John Wiley and Sons, New York.

1773:

1774: \bibitem[{Robins et~al.(1992)Robins, Blevins, Ritter and Wulfsohn}]{Aids}

1775: Robins, J.~M., Blevins, J.~M., Ritter, G. and Wulfsohn, M. (1992).

1776: \newblock {G-estimation of the effect of prophylaxis therapy for pneumocystis

1777:   carinii pneumonia on the survival of AIDS patients}.

1778: \newblock {\em Epidemiology\/} {\bf 3}, 319--336.

1779:

1780: \bibitem[{Rubin(1978)}]{Rubin}

1781: Rubin, D.~B. (1978).

1782: \newblock {Bayesian inference for causal effects: the role of randomization}.

1783: \newblock {\em Annals of Statistics\/} {\bf 6}, 34--58.

1784:

1785: \bibitem[{Van der Vaart(1998)}]{Vaart}

1786: Van der Vaart, A.~W. (1998).

1787: \newblock {\em Asymptotic Statistics}.

1788: \newblock {Cambridge University Press}.

1789:

1790: \end{thebibliography}

1791:

1792: \bigskip\noindent

1793: Corresponding author:\\

1794: {\sl Aad van der Vaart}\\

1795: {\sl Department of Mathematics}\\

1796: {\sl Faculty of Sciences}\\

1797: {\sl Vrije Universiteit}\\

1798: {\sl De Boelelaan 1081 a}\\

1799: {\sl 1081 HV Amsterdam}\\

1800: {\sl The Netherlands}

1801:

1802:

1803:

1804: \end{document}

1805: Specifically, if $l_k$ and $a_k$

1806: are both discrete for all $k$, the G--null tests discussed in

1807: \citet{R86,R87a,R89} are asymptotically distribution-free tests of the

1808: null hypothesis $\psi_0=0$. The G--tests and estimators

1809: introduced in \citet{R89} are, respectively, asymptotically

1810: distribution-free tests of the null hypothesis $\psi_0=0$ and

1811: $n^{1/2}$-consistent semiparametric estimators of $\psi_0$.

1812:

1813:

1814:

1815: The following corollary is obvious.

1816:

1817: \begin{cor} Suppose that Assumptions~\ref{ass} (no

1818: unmeasured confounding) and~\ref{cons} (consistency) hold. Then the

1819: survival curves $P\left(T^{\GG}>t\right)$ are the same for all

1820: evaluable treatment regimes $\GG$ if and only if the functions

1821: $s_{\GG}$ do not depend on $\GG$.

1822: \end{cor}

1823: