% 0403:physics0403148/cp.tex
% abstract1.tex

\documentclass[12pt]{article}
\usepackage{Input/spie,graphicx,xspace,amsmath,amssymb}
\graphicspath{{figures_changepoint/}}
\textheight 8.74in
\textwidth 6.74in

%--------------------------- MACROS -------------------------------
\input{Input/alphabet}
\input{Input/abrege}
\input{Input/abrmath}
\input{Input/beginend}
%--------------------------- MACROS -------------------------------
\def\bm#1{\mbox{\boldmath #1}}
\def\d#1{\,\hbox{d}#1}
\def\bs{\bar{s}}
\def\us{\underline{s}}
\def\fmin{f_{\mbox{min}}}
\def\fmax{f_{\mbox{max}}}
\def\expf#1{\mbox{exp}\left\{#1\right\}}
\def\argmin#1#2{\mbox{arg}\min_{#1}\left\{#2\right\}}
\def\argmax#1#2{\mbox{arg}\max_{#1}\left\{#2\right\}}

\def\lra{\longrightarrow}
\def\fh{\widehat{f}}
\def\fbh{\widehat{\fb}}
\def\Rbh{\widehat{\Rb}}
\def\Sigmabh{\widehat{\Sigmab}}

\def\rbh{\widehat{\rb}}
\def\qbh{\widehat{\qb}}
\def\xbh{\widehat{\xb}}
\def\tbh{\widehat{\tb}}
\def\thetabh{\widehat{\thetab}}

\def\disp{\displaystyle}
\def\vsm{\vspace*{-12pt}}
\def\hsm{\hspace*{-3em}}

\def\pmata#1#2{\left(\barr{c} #1 \\ #2 \earr\right)}
\def\pmatb#1#2#3#4{\left(\barr{cc} #1 & #2 \\ #3 & #4 \earr\right)}

\def\th{\widehat{t}}
\def\xh{\widehat{x}}
\def\lambdah{\widehat{\lambda}}
\def\muh{\widehat{\mu}}
\def\sigmah{\widehat{\sigma}}
\def\rhoh{\widehat{\rho}}
\def\betah{\widehat{\beta}}
\def\alphah{\widehat{\alpha}}
\def\mubh{\widehat{\mub}}
\def\sigmabh{\widehat{\sigmab}}
\def\rhobh{\widehat{\rhob}}
\def\oneb{\mbox{\bf 1}}

%----------- The document -------------------------------------------

\title{A Bayesian approach to change point analysis of discrete time series}

\author{Ali Mohammad-Djafari and Olivier F\'eron\\[.4cm]
  Laboratoire des Signaux et Syst\`emes,\\
  Unit\'e mixte de recherche 8506 (CNRS-Sup\'elec-UPS) \\
  Sup\'elec, Plateau de Moulon, 91192 Gif-sur-Yvette, France\\
  E-mails: \{djafari,feron\}@lss.supelec.fr
}
\authorinfo{\noindent \hspace*{-0.5cm}${}^\ast$Correspondence:~E-mail:
djafari@lss.supelec.fr}
\date{}

\bdoc
\maketitle

\begin{abstract}
In this work we consider time series with a finite number of discrete change points. We assume that the data in each segment follow a different probability density function (pdf).
We focus on the case where the data in all segments are modeled by Gaussian probability density functions with different means, variances and correlation lengths.
We put prior laws on the change point instants (Poisson process) as well as on these parameters (conjugate priors), and give the expression of the posterior probability distributions of the change points. The computations are done with an appropriate Markov Chain Monte Carlo (MCMC) technique.

As stated, the problem can also be considered as an unsupervised classification and/or segmentation of the time series.
This analogy allows us to propose alternative models and computations of change points, which are more appropriate for multivariate signals, for example in image processing.
\\ ~\\
{\bf Key words:}~ Bayesian change point estimation, classification and segmentation.
\end{abstract}

\section{Introduction}

Figure~\ref{fig1} shows typical change point problems considered in this work.
Very often, only problems with a single change point are considered \cite{Basseville88}. Here we consider the more general problem with an arbitrary number of change points. Moreover, change point analysis problems often require online or real-time detection algorithms \cite{Wax91,Kormylo82,Chi85,Goutsias88}, whereas here we focus only on off-line methods: we assume that all the data have been gathered and we want to analyse them to detect the change points that occurred during the observation time.
Also, even though we consider here change point estimation for 1-D time series, the proposed method can be extended to multivariate data, for example images, where the change point problem becomes equivalent to segmentation.
One more point that positions this work is that the models used in change point problems very often assume that the model of the signal in each segment is perfectly known, \ie a linear or nonlinear regression model \cite{Goutsias88,Oliver96,Hughes99,Fitzgibbon00,Fitzgibbon02}, whereas here we use a probabilistic model for the signal in each segment, which gives more generality and applicability when those models are not perfectly known.

\bfig[hbt]
\bcc
\includegraphics[width=150mm,height=75mm]{cp1}
\ecc
\caption[Change point problems description.]{Change point problems description:
in the first row, only the means of the different segments differ. In the second row, only the variances change. In the third row, only the correlation strengths change. In the fifth row, the whole shape of the probability distribution changes. The last row shows the change points $t_n$.}
\label{fig1}
\efig

More specifically, we model the time series by a hierarchical Gauss-Markov model with hidden variables which are themselves modeled by a Markov model. Within each segment, which corresponds to a particular value of the hidden variable, the time series is modeled by a stationary Gauss-Markov model. We chose a simple parametric model defined by only three parameters: the mean $\mu$, the variance $\sigma^2=1/\tau$ and a parameter $\rho$ measuring the local
correlation strength of neighboring samples.

The choice of the hidden variable is also important. We have studied three different models: i) the change point time instants $t_n$, ii) classification labels $z_n$,
or iii) a Bernoulli variable $q_n$ which is always equal to zero except when a change point occurs.

The rest of the paper is organized as follows. In the next section we introduce the notation and set the objectives of the paper. In section~3 we consider the model with explicit change point times as the hidden variables, and propose a particular model for them and an MCMC algorithm to compute their \apost probabilities. In section~4 we consider the two other aforementioned models. Finally, we show some simulation results and present our conclusions and perspectives.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newpage
\section{Notations and modeling}
We denote by $\xb=[x(t_0), \cdots, x(t_0+T)]'$ the vector containing the data observed from time $t_0$ to $t_0+T$, by
$\tb=[t_1,\cdots,t_N]'$ the unknown change points, and we write
$\xb=[\xb_0, \xb_1, \cdots , \xb_N]'$,
where $\xb_n=[x(t_n), x(t_n+1), \cdots, x(t_{n+1})]', \quad n=0,\cdots,N$,
represents the data samples of segment $n$. In the following we set $t_{N+1}=T$.

We model the data $\xb_n$ in each segment by a Gauss-Markov chain:
\beqn
p(x(t_n))&=&\Nc(\mu_n, \sigma_n^2) \nonumber \\
p(x(t_n+l)|x(t_n+l-1))&=&\Nc(\rho_n \, x(t_n+l-1)+(1-\rho_n)\mu_n, \sigma_n^2(1-\rho_n^2)),
\quad l=1,\cdots,l_n -1 \nonumber \\
\mbox{with~~~}&&
l_n=t_{n+1}-t_n+1=\dim{[\xb_n]}
\label{eq1}
\eeqn
Then we have
\beqn
p(\xb_n) &=&p(x(t_n)) \prod_{l=1}^{l_n-1} p(x(t_n+l)|x(t_n+l-1)) \nonumber \\
p(\xb_n) &\propto& \expf{-\frac{1}{2\sigma_n^2} (x(t_n)-\mu_n)^2} \nonumber\\
         &&\expf{-\frac{1}{2\sigma_n^2(1-\rho_n^2)} \sum_{l=1}^{l_n-1}
[x(t_n+l)- \rho_n x(t_n+l-1)-(1-\rho_n)\mu_n]^2} \nonumber\\
p(\xb_n)&=&\Nc(\mu_n\oneb, \Sigmab_n)
\mbox{~~with~~} \Sigmab_n=\sigma_n^2 \,
\mbox{Toeplitz}([1, \rho_n, \rho_n^2,\cdots, \rho_n^{l_n-1}])
\label{eq2}
\eeqn

Denoting by $\tb=[t_1,\cdots,t_N]$ the vector of the change points and
assuming that the samples from any two segments are independent, we can write:
\beq \label{eq3}
p(\xb|\tb,\thetab, N)=\prod_{n=0}^N \Nc(\mu_n\oneb, \Sigmab_n)
=\pth{\prod_{n=0}^N \frac{|\Sigmab_n|^{-1/2}}{(2\pi)^{l_n/2}}}
\expf{-\frac{1}{2} \sum_{n=0}^N  (\xb_n-\mu_n\oneb)'\Sigmab_n^{-1}(\xb_n-\mu_n\oneb)}
\eeq
where
$\thetab=\acc{\mu_n,\sigma_n^2,\rho_n,\; n=0,\cdots,N}$.

Note that
\beq \label{eq4}
-\ln p(\xb|\tb,\thetab,N)
=\sum_{n=0}^N \frac{l_n}{2} \ln (2\pi)
+\frac{1}{2} \sum_{n=0}^N \ln {|\Sigmab_n|}
+\frac{1}{2} \sum_{n=0}^N  (\xb_n-\mu_n\oneb)'\Sigmab_n^{-1}(\xb_n-\mu_n\oneb)
\eeq
and when the data are \iid, \ie $\Sigmab_n=\sigma_n^2\Ib$, this becomes
\beq \label{eq5}
-\ln p(\xb|\tb,\thetab,N)
=\sum_{n=0}^N \frac{l_n}{2}\ln (2\pi)
+\sum_{n=0}^N \frac{l_n}{2} \ln {\sigma_n^2}
+\sum_{n=0}^N  \frac{\|\xb_n-\mu_n\oneb\|^2}{2\sigma_n^2}.
\eeq
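\par
As a numerical check of equations (\ref{eq1})--(\ref{eq4}), the following minimal Python/NumPy sketch (not part of the original method; all function names are hypothetical) evaluates $-\ln p(\xb_n|\mu_n,\sigma_n^2,\rho_n)$ for one segment in two ways: with the full Toeplitz covariance $\Sigmab_n$, and with the sequential Gauss-Markov factorization. For simplicity it assumes non-overlapping segments given as half-open index ranges, whereas the text lets consecutive segments share their boundary sample.
\begin{verbatim}
```python
import numpy as np

def ar1_covariance(l, sigma2, rho):
    """Sigma_n = sigma2 * Toeplitz([1, rho, ..., rho^(l-1)])."""
    idx = np.arange(l)
    return sigma2 * rho ** np.abs(np.subtract.outer(idx, idx))

def neg_loglik_toeplitz(x, mu, sigma2, rho):
    """-ln N(x; mu*1, Sigma_n) computed with the full covariance matrix."""
    l = len(x)
    S = ar1_covariance(l, sigma2, rho)
    r = x - mu
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (l * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(S, r))

def neg_loglik_markov(x, mu, sigma2, rho):
    """Same quantity from p(x(t_n)) * prod_l p(x(t_n+l) | x(t_n+l-1))."""
    r = x - mu
    v = sigma2 * (1.0 - rho ** 2)               # conditional variance
    nll = 0.5 * (np.log(2 * np.pi * sigma2) + r[0] ** 2 / sigma2)
    innov = r[1:] - rho * r[:-1]                # x(l) - rho x(l-1) - (1-rho) mu
    return nll + 0.5 * np.sum(np.log(2 * np.pi * v) + innov ** 2 / v)

def neg_loglik_segments(x, bounds, params):
    """Sum over independent segments; bounds = [(start, stop), ...] (half-open),
    params = [(mu_n, sigma2_n, rho_n), ...]."""
    return sum(neg_loglik_markov(x[a:b], *p) for (a, b), p in zip(bounds, params))
```
\end{verbatim}
With $\rho_n=0$ both functions reduce to the \iid expression (\ref{eq5}); the sequential form costs $O(l_n)$ per segment instead of the $O(l_n^3)$ of a dense solve.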

Then, the inference problems we will face are the following:
\ben
\item Infer $\thetab$ given $\xb$ and $\tb$;
\item Infer $\tb$ given  $\xb$ and $\thetab$;
\item Infer $\tb$ and $\thetab$ given $\xb$;
\item Infer $\thetab$ given $\xb$;
\item Infer $\tb$ given  $\xb$.
\een
The first problem is clearly the easiest.

The classical maximum likelihood estimation (MLE) approach can handle only the
first three problems, by maximizing $p(\xb|\tb,\thetab)$ with respect to $\thetab$, to $\tb$ and jointly to $(\tb,\thetab)$, respectively:

\bit
\item Estimating $\thetab$ given $\xb$ and $\tb$: \quad
\(
\thetabh=\argmax{\thetab}{p(\xb|\tb,\thetab)}
\)

\item Estimating $\tb$ given  $\xb$ and $\thetab$: \quad
\(
\tbh=\argmax{\tb}{p(\xb|\tb,\thetab)}
\)

\item Estimating $\tb$ and $\thetab$ given $\xb$: \quad
\(
(\tbh,\thetabh)=\argmax{(\tb,\thetab)}{p(\xb|\tb,\thetab)}
\)
\eit
However, we must check that the likelihood function is bounded before using any optimization algorithm: for instance, a segment reduced to a single sample lets $\sigma_n\to 0$ and drives the likelihood to infinity.
The optimization with respect to $\thetab$ when $\tb$ is known can be done easily, but the optimization with respect to $\tb$ is very hard and computationally costly.
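\par
The following Python/NumPy sketch illustrates the simplest case of the second and third problems: a single change point ($N=1$) and \iid segments ($\rho_n=0$). For a fixed $\tb$, the maximum likelihood estimates of $\mu_n$ and $\sigma_n^2$ are the empirical mean and variance of each segment, which gives a profile likelihood that is then maximized over $t_1$ by exhaustive search. The minimum segment length of two samples (a hypothetical parameter \texttt{min\_len}) keeps the likelihood bounded. With $N$ change points the number of configurations grows like $\binom{T}{N}$, which is why such a brute-force search does not scale.
\begin{verbatim}
```python
import numpy as np

def profile_nll_iid(seg):
    """-ln p(x_n | mu_hat, sigma2_hat) for an iid Gaussian segment, where
    mu_hat and sigma2_hat are the ML estimates (sample mean and variance)."""
    l = len(seg)
    s2 = np.var(seg)                     # ML variance: divides by l
    return 0.5 * l * (np.log(2 * np.pi * s2) + 1.0)

def ml_single_changepoint(x, min_len=2):
    """Exhaustive ML search for one change point: x[:t] and x[t:] are the segments."""
    best_t, best_nll = None, np.inf
    for t in range(min_len, len(x) - min_len + 1):
        nll = profile_nll_iid(x[:t]) + profile_nll_iid(x[t:])
        if nll < best_nll:
            best_t, best_nll = t, nll
    return best_t, best_nll
```
\end{verbatim}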

The last two problems cannot be handled easily, because they require the likelihood functions $p(\xb|\thetab)$ and $p(\xb|\tb)$, obtained by summing $p(\xb|\tb,\thetab)$ over $\tb$ or integrating it over $\thetab$. Analytical expressions for these marginals may not be available, and the integrals may not even exist.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Bayesian estimation of the change point time instants}

In the Bayesian approach, one assigns prior probability laws to both $\tb$ and $\thetab$ and uses the posterior probability law $p(\tb,\thetab|\xb)$ as the tool for any inference. Choosing a prior law for $\tb$ is also usual in the classical approach. A simple model is the following:
\beq \label{eq6}
t_n=t_{n-1}+\epsilon_n \quad \mbox{with}\quad \epsilon_n\sim\Pc(\lambda),
\eeq
%\beq \label{eq6}
%t_n=t_{n-1}+\epsilon \quad \mbox{with}\quad \epsilon\sim\Ec(\lambda)
%\eeq

\noindent
where the $\epsilon_n$ are assumed \iid and $\lambda$ is the \aprio mean value of the time intervals $(t_n-t_{n-1})$. If $N$ is the number of change points, we can take $\lambda=\frac{T}{N+1}$. With this model we have:
\rem{
can either be a constant or equal to the mean value of the past intervals:
$\lambda_n=\frac{1}{n-1}\sum_{k=1}^n (t_n-t_{n-1}), \quad n>2$
}
\beq \label{eq7}
\barr{l}
p(\tb|\lambda)=\prod_{n=1}^{N+1} \Pc(t_n-t_{n-1}|\lambda)=
\prod_{n=1}^{N+1} e^{-\lambda} \frac{\lambda^{(t_n-t_{n-1})}}{(t_n-t_{n-1})!}
\\
\ln p(\tb|\lambda)=
 -(N+1)\lambda + \ln(\lambda)\sum_{n=1}^{N+1} (t_n-t_{n-1})-\sum_{n=1}^{N+1} \ln((t_n-t_{n-1})!)%
\earr
\eeq
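\par
Equation (\ref{eq7}) can be evaluated directly. The short Python function below (a hypothetical helper, not from the paper) takes the full vector $[t_0,t_1,\dots,t_N,t_{N+1}]$ and sums the Poisson log-probabilities of the increments, using $\ln k!=\ln\Gamma(k+1)$ to avoid overflow. As in (\ref{eq7}), a Poisson increment may be zero, \ie two coincident change points have a non-zero prior probability; a practical implementation may prefer a law shifted to strictly positive increments.
\begin{verbatim}
```python
import math

def log_prior_changepoints(t, lam):
    """ln p(t | lambda) of eq. (7): t = [t_0, t_1, ..., t_N, t_{N+1}],
    increments t_n - t_{n-1} iid Poisson(lambda)."""
    gaps = [b - a for a, b in zip(t[:-1], t[1:])]
    if any(g < 0 for g in gaps):
        return -math.inf                 # change points must be ordered
    return sum(-lam + g * math.log(lam) - math.lgamma(g + 1) for g in gaps)
```
\end{verbatim}
For instance, with $T=200$ and $N=4$ one would take $\lambda=T/(N+1)=40$.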

%\beq \label{eq7}
%\barr{l}
%p(\tb)=p(\tb|\lambda)=\prod_{n=1}^N \Ec(t_n-t_{n-1}|\lambda)=
%\prod_{n=1}^N \lambda e^{-\lambda(t_n-t_{n-1})}
%\\
%\ln p(\tb)=
% N \ln(\lambda)- \lambda\sum_{n=1}^N(t_n-t_{n-1})
%\earr
%\eeq

With this prior, we have
%\beq
%p(\xb,\tb|\thetab,\lambda)=p(\xb|\tb,\thetab) \, p(\tb|\lambda)
%\eeq
\beq
p(\xb,\tb|\thetab,\lambda,N)=p(\xb|\tb,\thetab,N) \, p(\tb|\lambda,N)
\eeq
and
%\beq \label{eq9}
%p(\tb|\xb,\thetab,\lambda)\propto p(\xb|\tb,\thetab) \, p(\tb|\lambda)
%\eeq
\beq \label{eq9}
p(\tb|\xb,\thetab,\lambda,N)\propto p(\xb|\tb,\thetab,N) \, p(\tb|\lambda,N).
\eeq

In the Bayesian approach, one goes one step further by assigning a prior probability law to the parameters $\thetab$, %and $\lambda$,
\ie $p(\thetab)$, %and $p(\lambda)$
and then one writes the joint \apost law:
%\beq
%p(\tb,\thetab,\lambda|\xb) \propto p(\xb|\tb,\thetab)\, p(\tb|\lambda) \, %p(\thetab) \, p(\lambda)
%\eeq
\beq
p(\tb,\thetab|\xb,\lambda,N) \propto p(\xb|\tb,\thetab,N)\, p(\tb|\lambda,N) \, p(\thetab|N)
\eeq
where
$\thetab=\acc{\mu_n,\sigma^2_n,\rho_n,\; n=0,\cdots,N}$.

To go further into detail, we need to assign $p(\thetab)$.%, $p(\lambda)$ and $p(N)$.
 Our choice is the following:
\beqnx
p(\mu_n) &=& \Nc(\mu_0,\sigma_0^2)\\
p(\sigma_n^2) &=& {\cal IG}(\alpha_0,\beta_0) \\
p(\rho_n) &=& \Uc([0,1]) %\\
%p(\lambda) &=& \Ec(\lambda_0)=\lambda_0\expf{-\lambda/\lambda_0}
\eeqnx
which correspond mainly to conjugate or reference priors.

Given all these, we propose the following Gibbs MCMC algorithm, in which each conditional law is evaluated at the current values of the other variables:
\[
\barr{lllll}
\mbox{Iterate until convergence} \\
%\mbox{.~~sample~~} \lambda &\mbox{using}& p(\lambda|\xb,\tb,\thetab) =p(\lambda|\tb)\propto p(\tb|\lambda) \, p(\lambda) &
%(\mbox{equations~} \ref{eq6} \mbox{~~and~~} \ref{eq7})
%\\
\mbox{.~~sample~~} \tb &\mbox{using}& p(\tb|\xb,\thetab,N)
%(\mbox{equation~} \ref{eq9}
\\
\mbox{.~~sample~~} \theta_n : \\
%\mbox{   -~~sample~~}
\qquad \mu_n &\mbox{using}& p(\mu_n|\xb,\tb,N) \\
%\mbox{   -~~sample~~}
\qquad \sigma_n^2 &\mbox{using}& p(\sigma_n^2|\xb,\tb,N) \\
%\mbox{   -~~sample~~}
\qquad \rho_n &\mbox{using}& p(\rho_n|\xb,\tb,N) \\
\earr
\]

\subsection{Sampling $\tb$ using $p(\tb|\xb,\thetab,N)$}
P. Fearnhead showed \cite{Fearnhead} that it is possible to perform perfect simulation from $p(\tb|\xb,\thetab,N)$ when the segments of data separated by a change point are assumed independent. This simulation is obtained by a recursion on the change points, and an approximation of this recursion yields an algorithm whose computational cost is linear in the number of observations. The main principle of this algorithm is to compute the following probabilities.\\
\noindent
Let $\xb_{t:s}=[x(t),x(t+1),\dots,x(s)]$, and
\begin{eqnarray*}
R(t,s|\lambda) & = & p(\xb_{t:s}|t,s \mbox{ in the same segment},\lambda) \\
Q(t|\lambda) & = & p(\xb_{t:T}|\mbox{ change point at } t-1,\lambda), \quad Q(1|\lambda)=p(\xb|\lambda)
\end{eqnarray*}
\noindent
Let also $F(\cdot|\lambda)$ denote the cumulative distribution function associated with the prior $\mathcal P(t_n-t_{n-1}|\lambda)$ defined by (\ref{eq7}).\\
\noindent
We compute $R(t,s|\lambda)$ with the following relation:
\beq
R(t,s|\lambda)=\int p(\xb_{t:s}|\thetab,\lambda)\,p(\thetab) \d{\thetab}
\nonumber
\eeq
The computation of $Q(t|\lambda)$ can be done recursively, backward in time for $t=T,T-1,\dots,1$, using the following result:
\beq
Q(t|\lambda)=\sum^{T-1}_{s=t}R(t,s|\lambda)Q(s+1|\lambda)\mathcal P(s+1-t|\lambda)+R(t,T|\lambda)(1-F(T-t|\lambda)).
\nonumber
\eeq
\noindent
This result is shown by P. Fearnhead \cite{Fearnhead}, who also shows that, with the convention of the definition of $Q$ (a change point at $t$ means that a new segment starts at $t+1$), the posterior distribution of $t_n$ given $t_{n-1}$ is
\begin{eqnarray*}
p(t_n|t_{n-1},\xb,\lambda)=\frac{R(t_{n-1}+1,t_n|\lambda)\,Q(t_n+1|\lambda)\, \mathcal P(t_n-t_{n-1}|\lambda)}{Q(t_{n-1}+1|\lambda)}
\end{eqnarray*}
\noindent
and the posterior probability of no further change point is
\beq
p(t_n=T|t_{n-1},\xb,\lambda)=\frac{R(t_{n-1}+1,T|\lambda)\,(1-F(T-t_{n-1}-1|\lambda))}{Q(t_{n-1}+1|\lambda)}.
\nonumber
\eeq
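\par
The backward recursion and the forward sampling can be sketched as follows in Python/NumPy, with 0-indexed samples (so the positions of the text are shifted by one). To keep $R(t,s|\lambda)$ in closed form, the sketch \emph{replaces} the Gauss-Markov segment model by a simpler one, assumed only for illustration: \iid samples $\Nc(\mu,\sigma^2)$ with known $\sigma^2$ and a prior $\mu\sim\Nc(\mu_0,\sigma_0^2)$, for which the integral over $\mu$ is analytic. Computations are done in the log domain and all $O(T^2)$ pairs $(t,s)$ are used, \ie without the approximation that makes the cost linear. Function names are hypothetical.
\begin{verbatim}
```python
import math
import numpy as np

def make_log_R(x, sigma2, mu0, s02):
    """ln R(t, s) for x[t..s] (inclusive). ASSUMED segment model: iid N(mu, sigma2),
    sigma2 known, mu ~ N(mu0, s02) integrated out analytically."""
    c1 = np.concatenate([[0.0], np.cumsum(x)])
    c2 = np.concatenate([[0.0], np.cumsum(x ** 2)])
    def log_R(t, s):
        n = s - t + 1
        xbar = (c1[s + 1] - c1[t]) / n
        S = (c2[s + 1] - c2[t]) - n * xbar ** 2     # sum of squared deviations
        v = sigma2 + n * s02
        return (-0.5 * n * math.log(2 * math.pi * sigma2) - S / (2 * sigma2)
                + 0.5 * math.log(sigma2 / v) - n * (xbar - mu0) ** 2 / (2 * v))
    return log_R

def backward_Q(T, lam, log_R):
    """ln Q[t] = ln p(x[t:] | new segment starts at t), for t = T-1, ..., 0."""
    lpmf = np.array([-lam + d * math.log(lam) - math.lgamma(d + 1) for d in range(T + 1)])
    lsf = np.log(np.maximum(1.0 - np.cumsum(np.exp(lpmf)), 1e-300))  # ln(1 - F(d))
    lnQ = np.empty(T)
    for t in range(T - 1, -1, -1):
        terms = [log_R(t, s) + lnQ[s + 1] + lpmf[s + 1 - t] for s in range(t, T - 1)]
        terms.append(log_R(t, T - 1) + lsf[T - 1 - t])   # no further change point
        lnQ[t] = np.logaddexp.reduce(terms)
    return lnQ, lpmf, lsf

def sample_changepoints(x, lam, log_R, rng):
    """One exact draw from the posterior; returns the start index of each new segment."""
    T = len(x)
    lnQ, lpmf, lsf = backward_Q(T, lam, log_R)
    t, starts = 0, []
    while True:
        logw = [log_R(t, s) + lnQ[s + 1] + lpmf[s + 1 - t] for s in range(t, T - 1)]
        logw.append(log_R(t, T - 1) + lsf[T - 1 - t])
        w = np.exp(np.array(logw) - lnQ[t])
        k = rng.choice(len(w), p=w / w.sum())
        if k == len(w) - 1:                      # last option: no further change
            return starts
        t += k + 1                               # segment x[t..t+k] ends here
        starts.append(t)
```
\end{verbatim}
Each call of \texttt{sample\_changepoints} returns one independent posterior draw of the change points, so repeated calls directly give the histograms of $\tb$ used in the simulations.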

\subsection{Sampling $\theta_n$ using $p(\theta_n|\xb,\tb,N)$}
Thanks to conjugacy, we have (each law being conditional on the current values of the other parameters of segment $n$):
\beqnx
%p(\lambda|\tbh)&=&\Gc(\rhoh,\betah) \mbox{~~with~~}
%\rhoh=1+\sum_{n=1}^N (\th_n-\th_{n-1}), \quad
%\betah=N+(1/\lambda_0).
%\\
p(\mu_n|\xb,\tb)&=&\Nc(\muh_n,\sigmah_n^2) \mbox{~~with~~}
\left\{\barr{l}
\muh_n= \sigmah_n^2 \left[ \frac{\mu_0}{\sigma_0^2}+\oneb'\Sigmab_n^{-1} \xb_n \right]\\
\sigmah_n^2= \left( \oneb'\Sigmab_n^{-1}\oneb + \frac{1}{\sigma_0^2} \right)^{-1}
\earr\right.
\\
p(\sigma_n^2|\xb,\tb)&=&{\cal IG}(\alphah_n,\betah_n)  \mbox{~~with~~}
\left\{\barr{l}
\alphah_n= \alpha_0 + \frac{l_n}{2} \\
\betah_n= \beta_0 + \frac{1}{2}(\xb_n-\mu_n\oneb)'\Rb_n^{-1}(\xb_n-\mu_n\oneb),
\earr\right.
%\\
%p(\rho_n|\xb,\tb)&=&\delta(\rho_n-\rhoh_n),  \mbox{~~with~~}
%\rhoh_n=\frac{1}{l_n-1} \sum_{j=2}^{l_n} (\xh(\th_n+j)-\xh(\th_n+j-1))
\eeqnx
\noindent
where $\Rb_n = \mbox{Toeplitz}([1, \rho_n, \rho_n^2,\cdots, \rho_n^{l_n-1}])$, so that $\Sigmab_n=\sigma_n^2\Rb_n$. Both laws are standard, so sampling them is straightforward.
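\par
For one segment, the two conjugate updates above can be sketched as follows (Python/NumPy; hypothetical function names, given the current value of $\rho_n$). The inverse-gamma draw uses the standard fact that if $Y$ follows a Gamma law with shape $\alphah_n$ and rate $\betah_n$, then $1/Y\sim{\cal IG}(\alphah_n,\betah_n)$; NumPy's generator is parameterized by the scale $1/\betah_n$. Matrices are inverted directly, which is acceptable only for short segments.
\begin{verbatim}
```python
import numpy as np

def toeplitz_R(l, rho):
    """R_n = Toeplitz([1, rho, ..., rho^(l-1)])."""
    idx = np.arange(l)
    return rho ** np.abs(np.subtract.outer(idx, idx))

def sample_mu(xn, sigma2, rho, mu0, s02, rng):
    """Draw mu_n ~ N(mu_hat, s2_hat) given sigma_n^2 and rho_n."""
    Sinv = np.linalg.inv(sigma2 * toeplitz_R(len(xn), rho))
    one = np.ones(len(xn))
    s2_hat = 1.0 / (one @ Sinv @ one + 1.0 / s02)
    mu_hat = s2_hat * (mu0 / s02 + one @ Sinv @ xn)
    return rng.normal(mu_hat, np.sqrt(s2_hat)), mu_hat, s2_hat

def sample_sigma2(xn, mu, rho, a0, b0, rng):
    """Draw sigma_n^2 ~ IG(a_hat, b_hat) given mu_n and rho_n."""
    r = xn - mu
    a_hat = a0 + 0.5 * len(xn)
    b_hat = b0 + 0.5 * r @ np.linalg.solve(toeplitz_R(len(xn), rho), r)
    return 1.0 / rng.gamma(a_hat, 1.0 / b_hat), a_hat, b_hat
```
\end{verbatim}
Both functions also return the posterior parameters, which makes it easy to check, for $\rho_n=0$, that they reduce to the familiar \iid updates.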

\noindent
The conditional law of $\rho_n$ is not a classical law. Since the segments are independent, it factorizes over the segments:
\begin{eqnarray*}
p(\rhob|\xb,\tb,N) & = & \prod_{n=0}^N p(\rho_n|\xb_n,\mu_n,\sigma_n^2) \\
p(\rho_n|\xb_n,\mu_n,\sigma_n^2) & \propto & \left( \frac{1}{\sigma_n^2(1-\rho_n^2)}\right)^{\frac{l_n-1}{2}} \exp \left\{- \frac{1}{2\sigma_n^2} (\xb_n-\mu_n\oneb)'\Rb_n^{-1}(\xb_n-\mu_n\oneb) \right\} \\
& \propto & \left( \frac{1}{\sigma_n^2(1-\rho_n^2)}\right)^{\frac{l_n-1}{2}} \exp \left\{- \frac{1}{2\sigma_n^2(1-\rho_n^2)} \sum_{l=1}^{l_n-1} (x(t_n+l)-\rho_n x(t_n+l-1)-(1-\rho_n)\mu_n)^2 \right\}
\end{eqnarray*}
for $0\le\rho_n\le 1$ (uniform prior). The factor $(1-\rho_n^2)^{-(l_n-1)/2}$ comes from $|\Rb_n|=(1-\rho_n^2)^{l_n-1}$, and in the last line the contribution of the first sample, which does not depend on $\rho_n$, has been dropped.
\noindent
This density cannot be sampled directly. \\
\noindent
We therefore use, in this step, a Metropolis-Hastings algorithm. As instrumental density we use a Gaussian approximation of the posterior density, \ie we estimate the mean $m_{\rho_n}$ and the variance $\sigma^2_{\rho_n}$ of $p(\rho_n|\xb,\tb,N)$ and draw a candidate $\rho_n^\ast$ from $\mathcal N(m_{\rho_n},\sigma^2_{\rho_n})$. Since this proposal does not depend on the current value $\rho_n$, the candidate is accepted with probability
\[
\min\left\{1,\frac{p(\rho_n^\ast|\xb,\tb,N)\,\mathcal N(\rho_n;m_{\rho_n},\sigma^2_{\rho_n})}{p(\rho_n|\xb,\tb,N)\,\mathcal N(\rho_n^\ast;m_{\rho_n},\sigma^2_{\rho_n})}\right\},
\]
so that candidates outside $[0,1]$ are always rejected. In practice we compute $m_{\rho_n}$ and $\sigma^2_{\rho_n}$ by a numerical approximation of their definitions:
\begin{eqnarray*}
m_{\rho_n} & = & \int_0^1 \rho_n \, p(\rho_n|\xb,\tb,N) \d{\rho_n} \\
\sigma^2_{\rho_n} & = & \int_0^1 \rho_n^2 \, p(\rho_n|\xb,\tb,N) \d{\rho_n} - m_{\rho_n}^2
\end{eqnarray*}

%Another way to find approximation of $m_{\rho_n}$ and $\sigma^2_{\rho_n}$ is to %use empirical estimators :
%\begin{eqnarray*}
%\hat{m}_{\rho_n} & = & \frac{1}{ln} \sum_{l=1}^{ln} %\frac{x(tn+l)-\mu_n}{x(tn+l-1)-\mu_n} \\
%\hat{\sigma}^2_{\rho_n} & = & \frac{1}{ln} \sum_{l=1}^{ln} \left( %\frac{x(tn+l)-\mu_n}{x(tn+l-1)-\mu_n} - \hat{m}_{\rho_n}\right)^2
%\end{eqnarray*}
%\\ \\

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newpage
\section{Other formulations}
Other formulations are also possible.
We introduce two sets of hidden variables

\centerline{
$\zb=[z(t_0), \cdots, z(t_0+T)]'$ and
$\qb=[q(t_0), \cdots, q(t_0+T)]'$
}
where
\beq
\barr{l}
q(t)=\left\{\barr{ll}
1 & \mbox{if~}  z(t)\not= z(t-1) \\
0 & \mbox{elsewhere}
\earr\right.
=\left\{\barr{ll}
1 & \mbox{if~}  t=t_n, n=0,\cdots,N \\
0 & \mbox{elsewhere}
\earr\right.
\earr
\eeq
and where $z(t)$ takes the integer value $k$ in the $k$-th segment, $k=1,\dots,N+1$. With these two related hidden variables, we can propose two other models for change point analysis. For example, $\qb$ can be modeled by a Bernoulli process
\[
P(\Qb=\qb)=\lambda^{\sum_j q_j} (1-\lambda)^{\sum_j (1-q_j)}
=\lambda^{\sum_j q_j} (1-\lambda)^{T+1-\sum_j q_j}
\]
and $\zb$ can be modeled by a Markov chain, \ie
$\{z(t), t=1,\cdots,T\}$ forms a Markov chain:
\beq
\barr{l}
P(z(t)=k)=p_k, \quad k=1,\cdots,K,\\
P(z(t)=k|z(t-1)=l)=p_{kl}, \quad\mbox{with~~} \sum_k p_{kl}=1.
\earr
\eeq
These two models are related. In the first one, $\lambda$ is the probability of a change at each time instant, so that $1/\lambda$ plays the role of the mean value of the segment lengths; in the second one, $p_k$ and $p_{kl}$ give a more precise control of the segment lengths.
In the multivariate case, or more precisely in the bivariate case (image processing), $\qb$ may represent the contours and $\zb$ the labels of the regions in the image.
We may then also give a Markov model for them. For example, if we denote by
$r\in \Sc$ the position of a pixel, by $\Sc$ the set of pixel positions and
by $\Vc(r)$ the set of pixels in the neighborhood of the pixel position $r$,
we may use an Ising model for $\qb$
\beq
P(\Qb=\qb)\propto \expf{\rho \sum_{r\in\Sc} \sum_{s\in\Vc(r)} \delta(q(r)-q(s))}
\eeq
or a Potts model for $\zb$:
\beq
P(\zb)\propto \expf{\rho \sum_{r\in\Sc} \sum_{s\in\Vc(r)}
\delta(z(r)-z(s))},
\eeq
where $\rho$ (not to be confused with the correlation coefficients $\rho_n$) controls, in the first model, the mean length of the contours in the image and, in the second, the mean surface of the regions in the image.
More complex models are also possible.
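\par
In one dimension, the relation between the two hidden processes can be sketched as follows (Python/NumPy, hypothetical function names): labels $z(t)$ are simulated from a $K$-state Markov chain whose transition matrix follows the convention of the text, $P[k,l]=P(z(t)=k|z(t-1)=l)$ (columns summing to one), $q(t)$ is derived from $z(t)$, and the Bernoulli log-prior of $\qb$ is evaluated. As an assumption of the sketch, $q(t_0)$ is set to 1, since $t_0$ opens the first segment. When every state keeps its label with the same probability $p_{kk}=1-\lambda$, the $q(t)$, $t>t_0$, are \iid Bernoulli($\lambda$) and the segment lengths are geometric with mean $1/\lambda$.
\begin{verbatim}
```python
import numpy as np

def simulate_labels(P, p0, T, rng):
    """z(0..T-1) from a Markov chain with P[k, l] = P(z(t) = k | z(t-1) = l)."""
    z = np.empty(T, dtype=int)
    z[0] = rng.choice(len(p0), p=p0)
    for t in range(1, T):
        z[t] = rng.choice(P.shape[0], p=P[:, z[t - 1]])   # column of the previous state
    return z

def labels_to_q(z):
    """q(t) = 1 if z(t) != z(t-1), else 0; q(0) = 1 because t_0 opens a segment."""
    return np.concatenate([[1], (z[1:] != z[:-1]).astype(int)])

def log_bernoulli_prior(q, lam):
    """ln P(Q = q) = sum_j [q_j ln(lam) + (1 - q_j) ln(1 - lam)]."""
    k = int(np.sum(q))
    return k * np.log(lam) + (len(q) - k) * np.log(1.0 - lam)
```
\end{verbatim}
Segments then correspond to the runs of constant $z(t)$, and $\sum_j q_j-1$ counts the change points.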

With these auxiliary variables, and neglecting the correlation inside the segments (\ie $\rho_n=0$), we can write
\beq
p(\xb|\zb,\thetab)=\prod_{j} \Nc(x_j;\mu_{z_j},\sigma_{z_j}^2),
\qquad
p(x_j|\thetab)=\sum_{n=1}^N P(z_j=n)\, \Nc(x_j;\mu_n,\sigma_n^2)
=\sum_{n=1}^N p_n\, \Nc(x_j;\mu_n,\sigma_n^2)
\eeq
if we choose $K=N$. Here,
$\thetab=\acc{N,\acc{\mu_n,\sigma_n, p_n,\; n=1,\cdots,N}, \pth{p_{kl}, \; k,l=1,\cdots,N}}$, and the marginal model of each sample is a mixture of Gaussians.

We can again assign appropriate prior laws to $\thetab$, give the expression of $p(\zb,\thetab|\xb)$ and do any inference on $\zb$ and $\thetab$.

Finally, we can also use $\qb$ as the auxiliary variable. Assuming that the first sample of each segment follows $\Nc(\mu_n,\sigma_n^2)$ and that, inside a segment, $x_j$ follows $\Nc(x_{j-1},\sigma_n^2)$, and denoting by $n(j)$ the index of the segment containing sample $j$, we can write
\beqn
p(\xb|\qb,\thetab)&=&
\cro{(2\pi)^{-N/2}
\pth{\prod_{n=1}^N \frac{1}{\sigma_n}}
\expf{-\sum_{n=1}^N \frac{\pth{x(t_n)-\mu_n}^2}{2\sigma_n^2}}} \nonumber \\
&\times&
\cro{(2\pi)^{-(T-N)/2}
\pth{\prod_{n=1}^N \frac{1}{\sigma_n^{l_n-1}}}
\expf{-\sum_{j=1}^T \frac{(1-q_j) \pth{x_{j}-x_{j-1}}^2}{2\sigma_{n(j)}^2}}} \nonumber \\
&=&
(2\pi)^{-T/2}
\pth{\prod_{n=1}^N \frac{1}{\sigma_n^{l_n}}}
\expf{-\sum_{j=1}^T \frac{1}{2\sigma_{n(j)}^2}
\cro{(1-q_j) \pth{x_{j}-x_{j-1}}^2
+q_j\pth{x_{j}-\mu_{n(j)}}^2}} \nonumber
\eeqn
and again assign appropriate prior laws to $\thetab$, give the expression of $p(\qb,\thetab|\xb)$ and do any inference on $\qb$ and $\thetab$. We are still working on these auxiliary hidden variables, particularly for applications to data fusion in image processing, and will report on this work soon.
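\par
The model behind $p(\xb|\qb,\thetab)$ can be evaluated sample by sample, as in the following sketch (Python/NumPy, hypothetical function name): at a change point ($q_j=1$) the sample is compared with the segment mean $\mu_n$, elsewhere with the previous sample, both with the variance $\sigma_n^2$ of the current segment. The segment index of each sample is obtained as the cumulative sum of $\qb$, which assumes that the first sample opens the first segment ($q_0=1$); segments are indexed from 0 in the code.
\begin{verbatim}
```python
import numpy as np

def log_lik_q(x, q, mu, sigma2):
    """ln p(x | q, theta). q[j] = 1 opens a new segment (q[0] must be 1);
    mu[n], sigma2[n] are the parameters of segment n (0-based)."""
    seg = np.cumsum(q) - 1                          # segment index n(j) of each sample
    prev = np.concatenate([[0.0], x[:-1]])          # x_{j-1} (unused where q_j = 1)
    m = np.where(q == 1, mu[seg], prev)             # mean: mu_n at a change, else x_{j-1}
    v = sigma2[seg]
    return -0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)
```
\end{verbatim}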

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\newpage
\section{Simulation results}
To test the feasibility and to measure the performance of the proposed algorithms, we generated a few simple cases, each corresponding to a change of only one of the three parameters $\mu_n$, $\sigma^2_n$ and $\rho_n$. \\
\noindent
In each case we present the data and the histograms of the \apost samples of $\tb$ at the first and the last iterations of the MCMC algorithm. For each case we also give the values of the parameters used to simulate the data, the values estimated when the change points are known, and the values estimated by the proposed method.

\newpage
\subsection{Change of the means}

We can see in figure~\ref{fig_mean} that the positions of the change points are precisely recovered. In the case of a change of the means, the algorithm converges very fast to the correct solution: it needs only a few iterations (about 5). This is mainly because the means have a strong influence on the likelihood $p(\xb|\tb,\thetab,N)$. \\
We can also see in Table~1 that the estimates of the means are very precise, particularly when the segment is long.
\bfig[hbt]
\bcc
\includegraphics[width=150mm,height=75mm]{result_moyenne}
\ecc
\caption[Different means.]{Change in the means. From top to bottom: simulated data, histogram at the 50th iteration, histogram at the first iteration, true positions of the change points.}
\label{fig_mean}
\efig

\begin{table}[h]
\begin{center}
\begin{tabular}{|c|c|c|} \hline
$m$ & $\hat{m}|\xb,\tb$ & $\hat{m}|\xb$ \\
\hline \hline
1.5 & 1.4966 & 1.4969  \\
1.7 & 1.7084 & 1.7013  \\
1.5 & 1.4912 & 1.5015  \\
1.7 & 1.6940 & 1.6929  \\
1.9 & 1.9012 & 1.8915  \\
\hline
\end{tabular}
\caption{Estimated values of the means: true value $m$, estimate with known change points ($\hat{m}|\xb,\tb$) and estimate by the proposed method ($\hat{m}|\xb$).}
\end{center}
\end{table}

\newpage
\subsection{Change in the variances}

We can see in figure~\ref{fig_var} that we again obtain good results on the positions of the change points. However, for small differences of variance, the algorithm shows some uncertainty on the exact position of the change point. This can be explained by the fact that the simulated data themselves carry this uncertainty. \\
In Table~2 we can again see good estimates of the variances of each segment.
\bfig[hbt]
\bcc
\includegraphics[width=150mm,height=75mm]{result_variance}
\ecc
\caption[Different variances.]{Change in the variances. From top to bottom: simulated data, histogram at the 50th iteration, histogram at the first iteration, true positions of the change points.}
\label{fig_var}
\efig

\begin{table}[h]
\begin{center}
\begin{tabular}{|c|c|c|} \hline
$\sigma^2$ & $\hat{\sigma}^2|\xb,\tb$ & $\hat{\sigma}^2 |\xb$ \\
\hline \hline
0.01 & 0.0083 & 0.0081 \\
1 & 0.9918 & 0.9598 \\
0.001 & 0.0007 & 0.0026 \\
0.1 & 0.0945 & 0.0940 \\
0.01 & 0.0079 & 0.0107 \\
\hline
\end{tabular}
\caption{Estimated values of the variances: true value $\sigma^2$, estimate with known change points ($\hat{\sigma}^2|\xb,\tb$) and estimate by the proposed method ($\hat{\sigma}^2|\xb$).}
\end{center}
\end{table}

\newpage
\subsection{Change in the correlation coefficient}
The results shown in figure~\ref{fig_corr_coef} are worse than in the first two cases. The positions of the change points are less precise, and a spurious change point appears. This affects the estimation of the correlation coefficient in the third segment, because the algorithm alternates between two positions of this change point. This problem can be explained by the fact that a correlation coefficient close to 1 locally produces slow variations that resemble a change of the mean, which the algorithm can interpret as a change point. This problem also appears when the segment sizes are far from the \aprio size $\lambda$.
\bfig[hbt]
\bcc
\includegraphics[width=150mm,height=75mm]{result_coeff_a}
\ecc
\caption[Different correlation coefficients.]{Change in the correlation coefficient. From top to bottom: simulated data, histogram at the 50th iteration, histogram at the first iteration, true positions of the change points.}
\label{fig_corr_coef}
\efig

\begin{table}[h]
\begin{center}
\begin{tabular}{|c|c|} \hline
$\rho$  & $\hat{\rho}|\xb$ \\
\hline \hline
0 & 0.0988 \\
0.9 & 0.7875 \\
0.1 & 0.3737 \\
0.8 & 0.8071 \\
0.2 & 0.1710 \\
\hline
\end{tabular}
\caption{Estimated values of the correlation coefficients: true value $\rho$ and estimate by the proposed method ($\hat{\rho}|\xb$).}
\end{center}
\end{table}

\newpage
\subsection{Influence of the prior law}
\noindent
In this section we study the influence of the \aprio on $\lambda$, \ie on the size of the segments. We keep the number of change points as before and change the \aprio segment size to $\lambda_0=\frac{\lambda}{2}$ and $\lambda_1=2\lambda$. We then apply our algorithm to the case of a change of the correlation coefficient.
\bfig[hbt]
\bcc
\includegraphics[width=150mm,height=75mm]{result_coeff_a2}
\ecc
\caption[Change in the correlation coefficients]{Change in the correlation coefficient with $\lambda_0=\frac{1}{2}\frac{T}{N+1}$. From top to bottom: simulated data, histogram at the 50th iteration, histogram at the first iteration, true positions of the change points.}
\label{fig_corr_coef_1}
\efig

\bfig[hbt]
\bcc
\includegraphics[width=150mm,height=75mm]{result_coeff_a3}
\ecc
\caption[Change in the correlation coefficients.]{Change in the correlation coefficient with $\lambda_1=2\frac{T}{N+1}$. From top to bottom: simulated data, histogram at the 50th iteration, histogram at the first iteration, true positions of the change points.}
\label{fig_corr_coef_2}
\efig

\noindent
In figure~\ref{fig_corr_coef_1}, we can see that the algorithm has detected additional change points, forming segments whose size is close to $\lambda_0$. This result shows the importance of the \aprio when the data are not informative enough. The same conclusion can be drawn from figure~\ref{fig_corr_coef_2}, where only three change points are detected, forming segments whose size is again close to $\lambda_1$. We can also remark that fixing the \aprio segment size $\lambda$ amounts to fixing the number of change points. Our algorithm therefore gives good results provided that a good \aprio on the number of change points is available.

\section{Conclusions}

\small
%\def\bibdir{/home/djafari/Tex/Inputs/bib/}
\bibliographystyle{ieeetr}
%\bibliography{revuedef,biben,\bibdir baseAJ,\bibdir baseKZ}
\bibliography{revuedef,biben,baseAJ,baseKZ,cp}

\edoc