physics0403148/cp.tex
1: % abstract1.tex
2: \documentclass[12pt]{article}
3: \usepackage{Input/spie,graphicx,xspace,amsmath,amssymb}
4: \graphicspath{{figures_changepoint/}} 
5: \textheight 8.74in 
6: \textwidth 6.74in 
7: %--------------------------- MACROS -------------------------------
8: \input{Input/alphabet}   
9: \input{Input/abrege}     
10: \input{Input/abrmath}    
11: \input{Input/beginend}   
12: %--------------------------- MACROS -------------------------------
13: \def\bm#1{\mbox{\boldmath #1}}
14: \def\d#1{\,\hbox{d}#1}
15: \def\bs{\bar{s}}
16: \def\us{\underline{s}}
17: \def\fmin{f_{\mbox{min}}}
18: \def\fmax{f_{\mbox{max}}}
19: \def\expf#1{\mbox{exp}\left\{#1\right\}}
20: \def\argmin#1#2{\mbox{arg}\min_{#1}\left\{#2\right\}}
21: \def\argmax#1#2{\mbox{arg}\max_{#1}\left\{#2\right\}}
22: 
23: \def\lra{\longrightarrow}
24: \def\fh{\widehat{f}}
25: \def\fbh{\widehat{\fb}}
26: \def\Rbh{\widehat{\Rb}}
27: \def\Sigmabh{\widehat{\Sigmab}}
28: 
29: \def\rbh{\widehat{\rb}}
30: \def\qbh{\widehat{\qb}}
31: \def\xbh{\widehat{\xb}}
32: \def\tbh{\widehat{\tb}}
33: \def\thetabh{\widehat{\thetab}}
34: 
35: \def\disp{\displaystyle}
36: \def\vsm{\vspace*{-12pt}}
37: \def\hsm{\hspace*{-3em}}
38: 
39: \def\pmata#1#2{\left(\barr{c} #1 \\ #2 \earr\right)}
40: \def\pmatb#1#2#3#4{\left(\barr{cc} #1 & #2 \\ #3 & #4 \earr\right)}
41: 
42: \def\th{\widehat{t}}
43: \def\xh{\widehat{x}}
44: \def\lambdah{\widehat{\lambda}}
45: \def\muh{\widehat{\mu}}
46: \def\sigmah{\widehat{\sigma}}
47: \def\rhoh{\widehat{\rho}}
48: \def\betah{\widehat{\beta}}
49: \def\alphah{\widehat{\alpha}}
50: \def\tbh{\widehat{\tb}}
51: \def\mubh{\widehat{\mub}}
52: \def\sigmabh{\widehat{\sigmab}}
53: \def\rhobh{\widehat{\rhob}}
54: \def\oneb{\mbox{\bf 1}}
55: %----------- Le document  -------------------------------------------
56: 
57: \title{A Bayesian approach to change point analysis of discrete time series}
58: 
59: \author{Ali Mohammad-Djafari and Olivier F\'eron\\[.4cm]
60:   Laboratoire des Signaux et Syst\`emes,\\  
61:   Unit\'e mixte de recherche 8506 (CNRS-Sup\'elec-UPS) \\  
62:   Sup\'elec, Plateau de Moulon, 91192 Gif-sur-Yvette, France\\ 
63:   emails = {djafari,feron@lss.supelec.fr} 
64: }
65: \authorinfo{\noindent \hspace*{-0.5cm}${}^\ast$Correspondence:~E-mail:
66: djafari@lss.supelec.fr}
67: \date{}
68: 
69: \bdoc
70: \maketitle
71: 
72: \begin{abstract}
73: In this work we consider time series with a finite number of discrete change points. We assume that the data in each segment follow a different probability density function (pdf). 
74: We focus on the case where the data in all segments are modeled by Gaussian probability density functions with different means, variances and correlation lengths. 
75: We put a prior law on the change point instants (a Poisson process) as well as on these parameters (conjugate priors) and give the expression of the posterior probability distributions of the change points. The computations are done using an appropriate Markov Chain Monte Carlo (MCMC) technique. 
76: 
77: The problem as stated can also be viewed as an unsupervised classification and/or segmentation of the time series. 
78: This analogy lets us propose alternative models and computations of change points that are better suited to multivariate signals, for example in image processing.
79: \\ ~\\ 
80: {\bf Key words:}~ Bayesian change-point estimation, classification and segmentation. 
81: \end{abstract}
82: 
83: 
84: \section{Introduction}
85: 
86: Figure~\ref{fig1} shows typical change point problems we consider in this work. 
87: Note that very often one considers problems with only one change point \cite{Basseville88}. Here we propose to consider more general problems with any number of change points. Moreover, change point analysis very often requires online or real-time detection algorithms \cite{Wax91,Kormylo82,Chi85,Goutsias88}, while here we focus only on off-line methods, where we assume that all the data have been gathered and we want to analyse them to detect the change points that occurred during the observation time. 
88: Also, even though we consider here change point estimation for 1-D time series, the proposed method can be extended to multivariate data, for example images, where the change point problem becomes equivalent to segmentation. 
89: One more point to position this work: very often the models used in change point problems assume perfect knowledge of the model of the signal in each segment, \ie a linear or nonlinear regression model \cite{Goutsias88,Oliver96,Hughes99,Fitzgibbon00,Fitzgibbon02}, while here we use a probabilistic model for the signal in each segment, which probably gives more generality and applicability when those models are not perfectly known. 
90: 
91: 
92: \bfig[hbt]
93: \bcc
94: \includegraphics[width=150mm,height=75mm]{cp1} 
95: \ecc
96: \caption[Change point problems description.]{Change point problems description: 
97: In the first row, only the mean values of the segments differ. In the second row, only the variances change. In the third row, only the correlation strengths change. In the fifth row, the whole shape of the probability distribution changes. The last row shows the change points $t_n$.}
98: \label{fig1}
99: \efig
100: 
101: More specifically, we model the time series by a hierarchical Gauss-Markov model with hidden variables which are themselves modeled by a Markov model. Thus, in each segment, which corresponds to a particular value of the hidden variable, the time series is assumed to follow a stationary Gauss-Markov model. We chose a simple parametric model defined by only three parameters: the mean $\mu$, the variance $\sigma^2=1/\tau$ and a parameter $\rho$ measuring the local 
102: correlation strength of the neighboring samples. 
103: 
104: The choice of the hidden variable is also important. We have studied three different models: i) change point time instants $t_n$, ii) classification labels $z_n$ 
105: or iii) a Bernoulli variable $q_n$ which is always equal to zero except when a change point occurs. 
106: 
107: The rest of the paper is organized as follows. In the next section we introduce the notation and fix the objectives of the paper. In section 3 we consider the model with explicit change point times as the hidden variables and propose a particular model for them and an MCMC algorithm to compute their \apost probabilities. In section 4 we consider the two other aforementioned models. Finally, we show some simulation results and present our conclusions and perspectives.  
108: 
109: 
110: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
111: 
112: \newpage
113: \section{Notations and modeling}
114: We denote by $\xb=[x(t_0), \cdots, x(t_0+T)]'$ the vector containing the data observed from time $t_0$ to $t_0+T$. We denote by 
115: $\tb=[t_1,\cdots,t_N]'$ the unknown change points and write  
116: $\xb=[\xb_0, \xb_1, \cdots , \xb_N]'$ 
117: where $\xb_n=[x(t_n), x(t_n+1), \cdots, x(t_{n+1})]', \quad n=0,\cdots,N$ 
118: represents the data samples in each segment. In the following we set $t_{N+1}=T$.
119: 
120: We model the data 
121: $\xb_n=[x(t_n), x(t_n+1), \cdots, x(t_{n+1})]', \quad n=0,\cdots,N$ 
122: in each segment by a Gauss-Markov chain:
123: 
124: \beqn
125: p(x(t_n))&=&\Nc(\mu_n, \sigma_n^2) \nonumber \\ 
126: p(x(t_n+l)|x(t_n+l-1))&=&\Nc(\rho_n \, x(t_n+l-1)+(1-\rho_n)\mu_n, \sigma_n^2(1-\rho_n^2)), 
127: \quad l=1,\cdots,l_n -1 \nonumber \\ 
128: \mbox{with~~~}&& 
129: l_n=t_{n+1}-t_n+1=\dim{[\xb_n]}  
130: \label{eq1}
131: \eeqn
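
\noindent
As a brief check (our own remark, not part of the original derivation), the conditional law in (\ref{eq1}) preserves the marginal $\Nc(\mu_n,\sigma_n^2)$, which is why the chain is stationary within a segment. Writing $x_l=x(t_n+l)$ as a shorthand introduced here, and assuming $x_{l-1}\sim\Nc(\mu_n,\sigma_n^2)$:
\begin{eqnarray*}
\mbox{E}[x_l] &=& \rho_n \mu_n + (1-\rho_n)\mu_n = \mu_n, \\
\mbox{Var}[x_l] &=& \rho_n^2 \sigma_n^2 + \sigma_n^2(1-\rho_n^2) = \sigma_n^2, \\
\mbox{Cov}[x_l,x_{l-1}] &=& \rho_n \sigma_n^2 .
\end{eqnarray*}
By induction, the covariance between two samples $k$ steps apart is $\rho_n^k\sigma_n^2$.
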
132: Then we have 
133: \beqn
134: p(\xb_n) &=&p(x(t_n)) \prod_{l=1}^{l_n-1} p(x(t_n+l)|x(t_n+l-1)) \nonumber \\ 
135: p(\xb_n) &\propto& \expf{-\frac{1}{2\sigma_n^2} (x(t_n)-\mu_n)^2} \nonumber\\ 
136:          &&\expf{-\frac{1}{2(\sigma_n^2(1-\rho_n^2))} \sum_{l=1}^{l_n-1} 
137: [x(t_n+l)- \rho_n x(t_n+l-1)-(1-\rho_n)\mu_n]^2} \nonumber\\ 
138: p(\xb_n)&=&\Nc(\mu_n\oneb, \Sigmab_n) 
139: \mbox{~~with~~} \Sigmab_n=\sigma_n^2 \, 
140: \mbox{Toeplitz}([1, \rho_n, \rho_n^2,\cdots, \rho_n^{l_n-1}])
141: \label{eq2}
142: \eeqn
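
\noindent
To make the structure of $\Sigmab_n$ concrete, here is a small worked case for a segment of three samples (the size is chosen only for illustration). Writing $\Sigmab_n=\sigma_n^2\Rb_n$, we have
\begin{eqnarray*}
\Rb_n &=& \left(\barr{ccc} 1 & \rho_n & \rho_n^2 \\ \rho_n & 1 & \rho_n \\ \rho_n^2 & \rho_n & 1 \earr\right),
\qquad |\Rb_n|=(1-\rho_n^2)^2, \\
\Rb_n^{-1} &=& \frac{1}{1-\rho_n^2}\left(\barr{ccc} 1 & -\rho_n & 0 \\ -\rho_n & 1+\rho_n^2 & -\rho_n \\ 0 & -\rho_n & 1 \earr\right).
\end{eqnarray*}
The inverse is tridiagonal, which reflects the first-order Markov property. In practice this means that the quadratic form $(\xb_n-\mu_n\oneb)'\Sigmab_n^{-1}(\xb_n-\mu_n\oneb)$ can be evaluated as the sum of squared one-step prediction errors appearing in the exponent above, at a cost linear in the segment length.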
143: 
144: Denoting by $\tb=[t_1,\cdots,t_N]$ the vector of the change points and 
145: assuming that the samples from any two segments are independent, we can write:
146: \beq \label{eq3}
147: p(\xb|\tb,\thetab, N)=\prod_{n=0}^N \Nc(\mu_n\oneb, \Sigmab_n)
148: =\pth{\prod_{n=0}^N \frac{|\Sigmab_n|^{-1/2}}{(2\pi)^{(l_n/2)}}}
149: \expf{-\frac{1}{2} \sum_{n=0}^N  (\xb_n-\mu_n\oneb)'\Sigmab_n^{-1}(\xb_n-\mu_n\oneb)}
150: \eeq
151: where we denote   
152: $\thetab=\acc{\mu_n,\sigma_n,\rho_n,\; n=0,\cdots,N}$. 
153: 
154: Note that 
155: \beq \label{eq4}
156: -\ln p(\xb|\tb,\thetab,N)
157: =\sum_{n=0}^N (l_n/2) \ln (2\pi)
158: +\frac{1}{2} \sum_{n=0}^N \ln {|\Sigmab_n|}
159: +\frac{1}{2} \sum_{n=0}^N  (\xb_n-\mu_n\oneb)'\Sigmab_n^{-1}(\xb_n-\mu_n\oneb)
160: \eeq
161: and when the data are \iid, ($\Sigmab_n=\sigma_n^2\Ib$) this becomes 
162: \beq \label{eq5}
163: -\ln p(\xb|\tb,\thetab,N)
164: =(T/2)\ln (2\pi)
165: +\sum_{n=0}^N (l_n/2) \ln {\sigma_n^2}
166: + \sum_{n=0}^N  \frac{\|(\xb_n-\mu_n\oneb)\|^2}{2\sigma_n^2} 
167: \eeq
168: 
169: Then, the inference problems we face are the following:
170: \ben
171: \item Infer on $\thetab$ given $\xb$ and $\tb$;
172: \item Infer on $\tb$ given  $\xb$ and $\thetab$;
173: \item Infer on $\tb$ and $\thetab$ given $\xb$;
174: \item Infer on $\thetab$ given $\xb$;  
175: \item Infer on $\tb$ given  $\xb$. 
176: \een
177: It is clear that the first problem is the easiest. 
178: 
179: The classical maximum likelihood estimation (MLE) approach can handle only the 
180: first three problems, by maximizing $p(\xb|\tb,\thetab)$ with respect to $\thetab$, with respect to $\tb$, and jointly with respect to $(\tb,\thetab)$, respectively: 
181: 
182: \bit
183: \item Estimating $\thetab$ given $\xb$ and $\tb$: \quad 
184: \(
185: \thetabh=\argmax{\thetab}{p(\xb|\tb,\thetab)}
186: \)
187: 
188: \item Estimating $\tb$ given  $\xb$ and $\thetab$: \quad 
189: \(
190: \tbh=\argmax{\tb}{p(\xb|\tb,\thetab)}
191: \)
192: 
193: \item Estimating $\tb$ and $\thetab$ given $\xb$: \quad 
194: \(
195: (\tbh,\thetabh)=\argmax{(\tb,\thetab)}{p(\xb|\tb,\thetab)}
196: \)
197: \eit
198: However, we must be careful to check the boundedness of the likelihood function before using any optimization algorithm. 
199: The optimization with respect to $\thetab$ when $\tb$ is known can be done easily, but the optimization with respect to $\tb$ is very hard and computationally costly. 
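
\noindent
For instance, in the \iid case ($\Sigmab_n=\sigma_n^2\Ib$) with $\tb$ known, maximizing the Gaussian likelihood over $\mu_n$ and $\sigma_n^2$ gives the familiar closed forms (a standard result, recalled here only as an illustration):
\[
\muh_n=\frac{1}{l_n}\,\oneb'\xb_n, \qquad \sigmah_n^2=\frac{1}{l_n}\,\|\xb_n-\muh_n\oneb\|^2 .
\]
This also illustrates the boundedness issue: if a candidate segment contains a single sample, then $\sigmah_n^2=0$ and the likelihood is unbounded, so the joint maximization over $(\tb,\thetab)$ is ill-posed unless a minimum segment length or a prior is imposed.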
200: 
201: The last two problems cannot be handled easily because they require the marginal likelihood functions $p(\xb|\thetab)$ and $p(\xb|\tb)$, which are obtained by integrating $p(\xb|\tb,\thetab)$ with respect to $\tb$ or $\thetab$. It may not be possible to find analytical expressions for these integrals, which may not even exist. 
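
\noindent
Written out, and assuming weighting laws $p(\tb)$ and $p(\thetab)$ that the classical approach does not provide (they are introduced as priors in the next section), these two functions read
\[
p(\xb|\thetab)=\sum_{\tb} p(\xb|\tb,\thetab)\,p(\tb), \qquad
p(\xb|\tb)=\int p(\xb|\tb,\thetab)\,p(\thetab)\d{\thetab}.
\]
As a rough indication of the cost, if the $N$ change points are distinct integer instants strictly between $t_0$ and $t_0+T$ (an assumption made here for counting only), the sum runs over $\binom{T-1}{N}$ configurations.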
202: 
203: 
204: 
205: 
206: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
207: 
208: 
209: 
210: 
211: \section{Bayesian estimation of the change point time instants}
212: 
213: In the Bayesian approach, one assigns prior probability laws to both $\tb$ and $\thetab$ and uses the posterior probability law $p(\tb,\thetab|\xb)$ as a tool for any inference. Choosing a prior pdf for $\tb$ is also usual in the classical approach. A simple model is the following:
214: \beq \label{eq6}
215: t_n=t_{n-1}+\epsilon_n \quad \mbox{with}\quad \epsilon_n\sim\Pc(\lambda),
216: \eeq
217: %\beq \label{eq6}
218: %t_n=t_{n-1}+\epsilon \quad \mbox{with}\quad \epsilon\sim\Ec(\lambda)
219: %\eeq
220: 
221: \noindent
222: where the $\epsilon_n$ are assumed \iid and $\lambda$ is the \aprio mean value of the time intervals $(t_n-t_{n-1})$. If $N$ is the number of change points, we can take $\lambda=\frac{T}{N+1}$. With this model we have:
223: \rem{
224: can either be a constant or equal to the mean value of the past intervals:  
225: $\lambda_n=\frac{1}{n-1}\sum_{k=1}^n (t_n-t_{n-1}), \quad n>2$
226: }
227: \beq \label{eq7}
228: \barr{l}
229: p(\tb|\lambda)=\prod_{n=1}^{N+1} \Pc(t_n-t_{n-1}|\lambda)=
230: \prod_{n=1}^{N+1} e^{-\lambda} \frac{\lambda^{(t_n-t_{n-1})}}{(t_n-t_{n-1})!}
231: \\
232: \ln p(\tb|\lambda)=
233:  -(N+1)\lambda + \ln(\lambda)\sum_{n=1}^{N+1} (t_n-t_{n-1})-\sum_{n=1}^{N+1} \ln((t_n-t_{n-1})!)%
234: \earr
235: \eeq
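
\noindent
Since the sum of the intervals telescopes, $\sum_{n=1}^{N+1}(t_n-t_{n-1})=t_{N+1}-t_0$, the log-prior in (\ref{eq7}) can be written more compactly. The short derivation below is our own rearrangement, not an additional modeling assumption:
\begin{eqnarray*}
\ln p(\tb|\lambda) &=& -(N+1)\lambda + (t_{N+1}-t_0)\ln\lambda - \sum_{n=1}^{N+1}\ln((t_n-t_{n-1})!), \\
\frac{\partial \ln p(\tb|\lambda)}{\partial\lambda} &=& -(N+1)+\frac{t_{N+1}-t_0}{\lambda}=0
\quad\Longrightarrow\quad \lambda=\frac{t_{N+1}-t_0}{N+1}.
\end{eqnarray*}
With $t_0=0$ and $t_{N+1}=T$ this is the value $\lambda=T/(N+1)$ suggested above, \ie the average segment length.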
236: 
237: %\beq \label{eq7}
238: %\barr{l}
239: %p(\tb)=p(\tb|\lambda)=\prod_{n=1}^N \Ec(t_n-t_{n-1}|\lambda)=
240: %\prod_{n=1}^N \lambda e^{-\lambda(t_n-t_{n-1})} 
241: %\\ 
242: %\ln p(\tb)=
243: % N \ln(\lambda)- \lambda\sum_{n=1}^N(t_n-t_{n-1})
244: %\earr
245: %\eeq
246: 
247: 
248: With this prior selection, we have
249: %\beq
250: %p(\xb,\tb|\thetab,\lambda)=p(\xb|\tb,\thetab) \, p(\tb|\lambda)
251: %\eeq
252: \beq
253: p(\xb,\tb|\thetab,\lambda,N)=p(\xb|\tb,\thetab,N) \, p(\tb|\lambda,N)
254: \eeq
255: and
256: %\beq \label{eq9}
257: %p(\tb|\xb,\thetab,\lambda)\propto p(\xb|\tb,\thetab) \, p(\tb|\lambda)
258: %\eeq
259: \beq \label{eq9}
260: p(\tb|\xb,\thetab,N)\propto p(\xb|\tb,\thetab,N) \, p(\tb|\lambda,N)
261: \eeq
262: 
263: In the Bayesian approach, one goes one step further by assigning prior probability laws to the hyperparameters $\thetab$, %and $\lambda$, 
264: \ie $p(\thetab)$ %and $p(\lambda)$ 
265: and then one writes the joint \apost:
266: 
267: %\beq
268: %p(\tb,\thetab,\lambda|\xb) \propto p(\xb|\tb,\thetab)\, p(\tb|\lambda) \, %p(\thetab) \, p(\lambda) 
269: %\eeq
270: \beq
271: p(\tb,\thetab|\xb,\lambda,N) \propto p(\xb|\tb,\thetab,N)\, p(\tb|\lambda,N) \, p(\thetab|N)   
272: \eeq
273: where here we denote   
274: $\thetab=\acc{\mu_n,\sigma^2_n,\rho_n,\; n=0,\cdots,N}$. 
275: 
276: To go into further detail, we need to assign $p(\thetab)$.%, $p(\lambda)$ and $p(N)$. 
277:  The following is our selection:
278: \beqnx
279: p(\mu_n) &=& \Nc(\mu_0,\sigma_0^2)\\ 
280: p(\sigma_n^2) &=& {\cal IG}(\alpha_0,\beta_0) \\ 
281: p(\rho_n) &=& \Uc([0,1]) %\\ 
282: %p(\lambda) &=& \Ec(\lambda_0)=\lambda_0\expf{-\lambda/\lambda_0}  
283: \eeqnx 
284: which correspond mainly to conjugate or reference priors. 
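
\noindent
For reference, the parameterization of the inverse gamma law assumed in this sketch is the usual shape and scale convention (the text does not state it explicitly):
\[
{\cal IG}(\sigma_n^2|\alpha_0,\beta_0)=\frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}\,(\sigma_n^2)^{-\alpha_0-1}\expf{-\frac{\beta_0}{\sigma_n^2}}, \qquad \sigma_n^2>0,
\]
whose mean is $\beta_0/(\alpha_0-1)$ for $\alpha_0>1$. Multiplying this density by a Gaussian likelihood, seen as a function of $\sigma_n^2$, gives again an inverse gamma law, which is what makes this prior conjugate.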
285: 
286: Given all these, we propose the following Gibbs MCMC algorithm:
287: \[
288: \barr{lllll}
289: \mbox{Iterate until convergence} \\ 
290: %\mbox{.~~sample~~} \lambda &\mbox{using}& p(\lambda|\xb,\tb,\thetab) =p(\lambda|\tb)\propto p(\tb|\lambda) \, p(\lambda) &
291: %(\mbox{equations~} \ref{eq6} \mbox{~~and~~} \ref{eq7})  
292: %\\ 
293: \mbox{.~~sample~~} \tb &\mbox{using}& p(\tb|\xb,\thetab,N)  
294: %(\mbox{equation~} \ref{eq9} 
295: \\ 
296: \mbox{.~~sample~~} \theta_n : \\
297: %\mbox{   -~~sample~~} 
298: \qquad \mu_n &\mbox{using}& p(\mu_n|\xb,\tb,N) \\ 
299: %\mbox{   -~~sample~~} 
300: \qquad \sigma_n^2 &\mbox{using}& p(\sigma_n^2|\xb,\tb,N) \\ 
301: %\mbox{   -~~sample~~} 
302: \qquad \rho_n &\mbox{using}& p(\rho_n|\xb,\tb,N) \\ 
303: \earr
304: \]
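
\noindent
In a Gibbs sampler each block is drawn conditionally on the current values of all the others. Writing the iteration index as a superscript $(k)$ (notation introduced here), one sweep of the scheme above can be sketched as
\begin{eqnarray*}
\tb^{(k+1)} &\sim& p(\tb|\xb,\thetab^{(k)},N), \\
\mu_n^{(k+1)} &\sim& p(\mu_n|\xb,\tb^{(k+1)},\sigma_n^{2\,(k)},\rho_n^{(k)}), \\
\sigma_n^{2\,(k+1)} &\sim& p(\sigma_n^2|\xb,\tb^{(k+1)},\mu_n^{(k+1)},\rho_n^{(k)}), \\
\rho_n^{(k+1)} &\sim& p(\rho_n|\xb,\tb^{(k+1)},\mu_n^{(k+1)},\sigma_n^{2\,(k+1)}),
\end{eqnarray*}
for $n=0,\cdots,N$. The conditioning on the other parameters of the same segment is left implicit in the scheme above, and the order of the updates within a sweep is our assumption.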
305: 
306: \subsection{Sampling $\tb$ using $p(\tb|\xb,\thetab,N)$}
307: P. Fearnhead showed \cite{Fearnhead} that it is possible to perform perfect simulation from $p(\tb|\xb,\thetab,N)$ when the segments of data separated by a change point $t_n$ are assumed independent. This simulation is obtained by a method based on a recursion on the change points. An approximation of this method yields an algorithm whose computational cost is linear in the number of observations. The main principle of this algorithm is to compute the following probabilities:\\
308: \noindent
309: Let us denote $\xb_{t:s}=[x(t),x(t+1),\dots,x(s)]$, and
310: \begin{eqnarray*}
311: R(t,s|\lambda) & = & p(\xb_{t:s}|t,s \mbox{ in the same segment},\lambda) \\
312: Q(t|\lambda) & = & p(\xb_{t:T}|\mbox{ changepoint at } t-1,\lambda), \quad Q(1|\lambda)=p(\xb|\lambda) 
313: \end{eqnarray*} 
314: \noindent
315: Let us also denote by $F(t|\lambda)$ the cumulative distribution function associated with the prior density $\mathcal P(t_n-t_{n-1}|\lambda)$ defined in (\ref{eq7}). \\
316: \noindent
317: We compute $R(t,s|\lambda)$ with the following relation:
318: \beq
319: R(t,s|\lambda)=\int p(\xb_{t:s}|\thetab,\lambda)\,p(\thetab) \d{\thetab}
320: \nonumber
321: \eeq 
322: The computation of $Q(t|\lambda)$ can be done recursively using the following result: for $t=1,\dots,T$,
323: \beq
324: Q(t|\lambda)=\sum^{T-1}_{s=t}R(t,s|\lambda)Q(s+1|\lambda)\mathcal P(s+1-t|\lambda)+R(t,T|\lambda)(1-F(T-t|\lambda)),
325: \nonumber
326: \eeq
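
\noindent
To see how the recursion starts, consider the last two instants. This is a worked illustration under the Poisson prior (\ref{eq7}), for which $\mathcal P(d|\lambda)=e^{-\lambda}\lambda^d/d!$ and $F(d|\lambda)=\sum_{j=0}^{d}\mathcal P(j|\lambda)$. For $t=T$ the sum is empty, and for $t=T-1$ it has a single term:
\begin{eqnarray*}
Q(T|\lambda) &=& R(T,T|\lambda)\,(1-F(0|\lambda)) = R(T,T|\lambda)\,(1-e^{-\lambda}), \\
Q(T-1|\lambda) &=& R(T-1,T-1|\lambda)\,Q(T|\lambda)\,\lambda e^{-\lambda} + R(T-1,T|\lambda)\,(1-(1+\lambda)e^{-\lambda}).
\end{eqnarray*}
The values $Q(t|\lambda)$ are thus filled in backward, from $t=T$ down to $t=1$, each step reusing the values already computed.
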
327: \noindent
328: This result is due to P. Fearnhead \cite{Fearnhead}, who also shows that the posterior distribution of $t_n$ given $t_{n-1}$ is
329: \begin{eqnarray*}
330: p(t_n|t_{n-1},\xb,\lambda)=\frac{R(t_{n-1},t_n|\lambda)Q(t_n+1|\lambda) \mathcal P(t_n-t_{n-1}|\lambda)}{Q(t_{n-1}|\lambda)}
331: \end{eqnarray*}
332: \noindent
333: and the posterior probability of no further change point is given by
334: \beq
335: p(t_n=T|t_{n-1},\xb,\lambda)=\frac{R(t_{n-1},T|\lambda)(1-F(T-t_{n-1}-1|\lambda))}{Q(t_{n-1}|\lambda)}
336: \nonumber
337: \eeq
338: 
339: 
340: \subsection{Sampling $\theta_n$ using $p(\theta_n|\xb,\tb,N)$}
341: Note that, thanks to conjugacy, we have: 
342: \beqnx
343: %p(\lambda|\tbh)&=&\Gc(\rhoh,\betah) \mbox{~~with~~} 
344: %\rhoh=1+\sum_{n=1}^N (\th_n-\th_{n-1}), \quad 
345: %\betah=N+(1/\lambda_0). 
346: %\\ 
347: p(\mu_n|\xb,\tb)&=&\Nc(\muh_n,\sigmah_n^2) \mbox{~~with~~} 
348: \left\{\barr{l}
349: \muh_n= \sigmah_n^2 \left[ \frac{\mu_0}{\sigma_0^2}+\oneb'\Sigmab_n^{-1} \xb_n \right]\\ 
350: \sigmah_n^2= \left( \oneb'\Sigmab_n^{-1}\oneb + \frac{1}{\sigma_0^2} \right)^{-1}
351: \earr\right. 
352: \\ 
353: p(\sigma_n^2|\xb,\tb)&=&{\cal IG}(\alphah_n,\betah_n)  \mbox{~~with~~}  
354: \left\{\barr{l}
355: \alphah_n= \alpha_0 + \frac{l_n}{2} \\ 
356: \betah_n= \beta_0 + \frac{1}{2}(\xb_n-\mu_n\oneb)'\Rb_n^{-1}(\xb_n-\mu_n\oneb),
357: \earr\right. 
358: %\\ 
359: %p(\rho_n|\xb,\tb)&=&\delta(\rho_n-\rhoh_n),  \mbox{~~with~~}
360: %\rhoh_n=\frac{1}{l_n-1} \sum_{j=2}^{l_n} (\xh(\th_n+j)-\xh(\th_n+j-1)) 
361: \eeqnx
362: \noindent
363: where $\Rb_n = \mbox{Toeplitz}([1, \rho_n, \rho_n^2,\cdots, \rho_n^{l_n-1}])$. Sampling from these densities is then straightforward.\\
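\noindent
As a sanity check of the first result (our own specialization), take $\rho_n=0$, so that $\Sigmab_n=\sigma_n^2\Ib$, $\oneb'\Sigmab_n^{-1}\oneb=l_n/\sigma_n^2$ and $\oneb'\Sigmab_n^{-1}\xb_n=\oneb'\xb_n/\sigma_n^2$. Writing $\bar{x}_n=\oneb'\xb_n/l_n$ for the segment average (notation introduced here), the posterior parameters become
\[
\muh_n=\frac{\sigma_n^2\,\mu_0+l_n\sigma_0^2\,\bar{x}_n}{\sigma_n^2+l_n\sigma_0^2},
\qquad
\sigmah_n^2=\frac{\sigma_n^2\,\sigma_0^2}{\sigma_n^2+l_n\sigma_0^2},
\]
\ie the posterior mean is a weighted average of the prior mean and the empirical mean, and it tends to $\bar{x}_n$ as $l_n$ grows.
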
364: \noindent
365: The conditional law $p(\rho_n|\xb,\tb)$ is not a standard law. Its expression is given by: 
366: \begin{eqnarray*}
367: p(\rho_n|\xb,\tb,N) & = & p(\rho_n|\xb_n,\tb,N) \\
368: & \propto & \left( \frac{1}{\sigma_n^2(1-\rho_n^2)}\right)^{\frac{l_n-1}{2}} \exp \left\{- \frac{1}{2\sigma_n^2} (\xb_n-\mu_n\oneb)'\Rb_n^{-1}(\xb_n-\mu_n\oneb) \right\} \\
369: & \propto & \left( \frac{1}{\sigma_n^2(1-\rho_n^2)}\right)^{\frac{l_n-1}{2}} \exp \left\{- \frac{1}{2\sigma_n^2(1-\rho_n^2)} \sum_{l=1}^{l_n-1} (x(t_n+l)-\rho_n x(t_n+l-1)-(1-\rho_n)\mu_n)^2 \right\}
370: \end{eqnarray*}
371: \noindent
372: We therefore cannot easily sample from this density. \\
373: \noindent
374: The solution we propose is to use, in this step, a Hastings-Metropolis algorithm to sample from this density. As an instrumental density we propose a Gaussian approximation of the posterior density, \ie we estimate the mean $m_{\rho_n}$ and the variance $\sigma^2_{\rho_n}$ of $p(\rho_n|\xb,\tb,N)$ and draw a candidate sample from the Gaussian law $\mathcal N(m_{\rho_n},\sigma^2_{\rho_n})$. This candidate is then accepted or rejected according to $p(\rho_n|\xb,\tb,N)$. In practice we compute $m_{\rho_n}$ and $\sigma^2_{\rho_n}$ by numerical approximation of their definitions:
375: \begin{eqnarray*}
376: m_{\rho_n} & \lra & \int_0^1 \rho_n \; p(\rho_n|\xb,\tb,N) \d{\rho_n} \\
377: \sigma^2_{\rho_n} & \lra & \int_0^1 \rho_n^2 \; p(\rho_n|\xb,\tb,N) \d{\rho_n} - m_{\rho_n}^2
378: \end{eqnarray*} 
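\noindent
For completeness, the acceptance step of an independence Hastings-Metropolis sampler can be written as follows. This is the generic textbook form, given here as a sketch; $q$ denotes the Gaussian instrumental density, $\rho_n$ the current value and $\rho_n^\ast$ the candidate (symbols introduced here):
\[
\rho_n^\ast\sim q(\cdot)=\mathcal N(m_{\rho_n},\sigma^2_{\rho_n}), \qquad
\alpha(\rho_n,\rho_n^\ast)=\min\left\{1,\;
\frac{p(\rho_n^\ast|\xb,\tb,N)\,q(\rho_n)}{p(\rho_n|\xb,\tb,N)\,q(\rho_n^\ast)}\right\}.
\]
The candidate is accepted with probability $\alpha$; otherwise the current value is kept. Since the target is supported on $[0,1]$, any candidate outside this interval has zero posterior density and is rejected, and the normalizing constant of $p(\rho_n|\xb,\tb,N)$ cancels in the ratio.
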
379: \noindent
380: %Another way to find approximation of $m_{\rho_n}$ and $\sigma^2_{\rho_n}$ is to %use empirical estimators :
381: %\begin{eqnarray*}
382: %\hat{m}_{\rho_n} & = & \frac{1}{ln} \sum_{l=1}^{ln} %\frac{x(tn+l)-\mu_n}{x(tn+l-1)-\mu_n} \\
383: %\hat{\sigma}^2_{\rho_n} & = & \frac{1}{ln} \sum_{l=1}^{ln} \left( %\frac{x(tn+l)-\mu_n}{x(tn+l-1)-\mu_n} - \hat{m}_{\rho_n}\right)^2
384: %\end{eqnarray*}
385: %\\ \\
386: 
387: 
388: 
389: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
390: 
391: \newpage
392: \section{Other formulations}
393: Other formulations are also possible. 
394: We introduce two sets of hidden variables 
395:  
396: \centerline{
397: $\zb=[z(t_0), \cdots, z(t_0+T)]'$ and  
398: $\qb=[q(t_0), \cdots, q(t_0+T)]'$ 
399: }
400: where 
401: \beq
402: \barr{l}
403: q(t)=\left\{\barr{ll} 
404: 1 & \mbox{if~}  z(t)\not= z(t-1) \\ 
405: 0 & \mbox{elsewhere} 
406: \earr\right.
407: =\left\{\barr{ll} 
408: 1 & \mbox{if~}  t=t_n, n=0,\cdots,N \\ 
409: 0 & \mbox{elsewhere} 
410: \earr\right.
411: \earr.
412: \eeq
413: and where $z(t)$ takes an integer value $k$ in each segment: $k=1,\dots,N+1$. With these two related hidden variables, we can propose two other models to be used in change point analysis. For example, $\qb$ can be modeled by a Bernoulli process
414: \[
415: P(\Qb=\qb)=\lambda^{\sum_j q_j} (1-\lambda)^{\sum_j (1-q_j)}
416: =\lambda^{\sum_j q_j} (1-\lambda)^{T -\sum_j q_j}
417: \]
418: and $\zb$ can be modeled by a Markov chain, \ie 
419: $\{z(t), t=1,\cdots,T\}$ forms a Markov chain:
420: \beq
421: \barr{l}
422: P(z(t)=k)=p_k, \quad k=1,\cdots,K,\\ 
423: P(z(t)=k|z(t-1)=l)=p_{kl}, \quad\mbox{with~~} \sum_k p_{kl}=1. 
424: \earr
425: \eeq
426: These two models are related. In the first one, $\lambda$ controls the mean value of the segment lengths, and in the second, $p_k$ and $p_{kl}$ give more precise control of the segment lengths.
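
\noindent
To make this link explicit, here is a short side computation. It assumes that the $q_j$ are independent with $P(q_j=1)=\lambda$ and that the chain leaves state $k$ with probability $1-p_{kk}$ at each step. The interval $d$ between two successive changes is then geometric in both models:
\[
P(d)=\lambda(1-\lambda)^{d-1},\quad \mbox{E}[d]=\frac{1}{\lambda}
\qquad\mbox{and}\qquad
P(d|z=k)=(1-p_{kk})\,p_{kk}^{\,d-1},\quad \mbox{E}[d|z=k]=\frac{1}{1-p_{kk}} .
\]
The Markov chain therefore allows a different mean length for each label $k$, whereas the Bernoulli process has a single one.
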
427: In the multivariate case, or more precisely in the bivariate case (image processing), $\qb$ may represent the contours and $\zb$ the labels of the regions in the image. 
428: Then, we may also give a Markov model for them. For example, if we denote by 
429: $r\in \Sc$ the position of a pixel, by $\Sc$ the set of pixel positions and 
430: by $\Vc(r)$ the set of pixels in the neighborhood of the pixel position $r$, 
431: we may use an Ising model for $\qb$
432: \beq
433: P(\Qb=\qb)\propto \expf{-\rho \sum_{r\in\Sc} \sum_{s\in\Vc(r)} \delta(q(r)-q(s))}
434: \eeq
435: or a Potts model for $\zb$: 
436: \beq
437: P(\zb)\propto \expf{-\rho \sum_{r\in\Sc} \sum_{s\in\Vc(r)} 
438: \delta(z(r)-z(s))}. 
439: \eeq
440: where $\rho$ in the first controls the mean length of the contours in the image and in the second the mean surface of the regions in the image.
441: Other more complex models are also possible. 
442: 
443: With these auxiliary variables, we can write
444: \beq
445: p(\xb|\zb,\thetab)=\sum_{n=1}^N P(z_j=n) \Nc(\mu_n\oneb, \Sigmab_n)
446: =\sum_{n=1}^N p_n \Nc(\mu_n\oneb, \Sigmab_n)
447: \eeq
448: if we choose $K=N$. Here, 
449: $\thetab=\acc{N,\acc{\mu_n,\sigma_n, p_n,\; n=1,\cdots,N}, \pth{p_{kl}, \; k,l=1,\cdots,N}}$ and the model is a mixture of Gaussians. 
450: 
451: We can again assign an appropriate prior law on $\thetab$, give the expression of $p(\zb,\thetab|\xb)$, and do any inference on $\zb$ and $\thetab$. 
452: 
453: Finally, we can also use $\qb$ as the auxiliary variable and write
454: \beqn
455: p(\xb|\qb,\thetab)&=&
456: (2\pi)^{-N/2} 
457: \pth{\prod_{n=1}^N 1/\sigma_n} 
458: \expf{-\frac{1}{2\sigma_n^2} \sum_{n=1}^N \pth{x(t_n)-\mu_n}^2} \nonumber \\ 
459: &\times&
460: (2\pi)^{-(T-N)/2} 
461: \pth{\prod_{n=1}^N 1/\sigma_n^{(l_n-1)}}
462: \expf{-\frac{1}{2\sigma_n^2} \sum_{j=1}^T 
463: (1-q_j) \pth{x_{j}-x_{j-1}}^2} \nonumber \\ 
464: &=&
465: (2\pi)^{-T/2} 
466: \pth{\prod_{n=1}^N 1/\sigma_n^{(l_n)}}
467: \expf{-\frac{1}{2\sigma_n^2} \sum_{j=1}^T 
468: \cro{(1-q_j) \pth{x_{j}-x_{j-1}}^2
469: +q_j\pth{x_{j}-\mu_n}^2}} \nonumber \\ 
470: \eeqn
471: and again assign an appropriate prior law on $\thetab$, give the expression of $p(\qb,\thetab|\xb)$, and do any inference on $\qb$ and $\thetab$. We are still working on using these auxiliary hidden variables, particularly for applications in data fusion in image processing, and we will report on this work soon.
472: 
473: 
474: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
475: 
476: \newpage
477: \section{Simulation results}
478: To test the feasibility and to measure the performance of the proposed algorithms, we generated a few simple cases in which only one of the three parameters $\mu_n$, $\sigma^2_n$ and $\rho_n$ changes. \\
479: \noindent
480: In each case we present the data and the histograms of the \apost samples of $\tb$ during the first and the last iterations of the MCMC algorithm. For each case we also give the values of the parameters used to simulate the data, the estimated values when the change points are known, and the values estimated by the proposed method.
481: 
482: \newpage
483: \subsection{Change of the means}
484: 
485: We can see in figure \ref{fig_mean} that we obtain precise results on the positions of the change points. In the case of a change of means, the algorithm converges very quickly to the correct solution: it needs only a few iterations (about 5). The main reason for this result is the importance of the means in the likelihood $p(\xb|\tb,\thetab,N)$. \\
486: We can also see in table 1 that the estimates of the means are very precise, particularly when the segment is long.
487: \bfig[hbt]
488: \bcc
489: \includegraphics[width=150mm,height=75mm]{result_moyenne} 
490: \ecc
491: \caption[Different means.]{Change in the means. From top to bottom: simulated data, histogram at the 50th iteration, histogram at the first iteration, true positions of the change points.}
492: \label{fig_mean}
493: \efig
494: 
495: 
496: \begin{table}[h]
497: \begin{center}
498: \begin{tabular}{|c|c|c|} \hline
499: m & $\hat{m}|\xb,\tb$ & $\hat{m}|\xb$ \\
500: \hline \hline
501: 1.5 & 1.4966 & 1.4969  \\
502: 1.7 & 1.7084 & 1.7013  \\
503: 1.5 & 1.4912 & 1.5015  \\
504: 1.7 & 1.6940 & 1.6929  \\
505: 1.9 & 1.9012 & 1.8915  \\
506: \hline
507: \end{tabular}
508: \caption{Estimated values of the means}
509: \end{center}
510: \end{table}
511: 
512: \newpage
513: \subsection{Change in the variances}
514: 
515: We can see in figure \ref{fig_var} that we again obtain good results on the positions of the change points. However, when the difference between the variances is small, the algorithm leaves some uncertainty on the exact position of the change point. This can be explained by the fact that the simulated data themselves carry this uncertainty. \\
516: In table 2 we again see good estimates of the variances in each segment. 
517: \bfig[hbt]
518: \bcc
519: \includegraphics[width=150mm,height=75mm]{result_variance} 
520: \ecc
521: \caption[Different variances.]{Change in the variances. From top to bottom: simulated data, histogram at the 50th iteration, histogram at the first iteration, true positions of the change points.}
522: \label{fig_var}
523: \efig
524: 
525: \begin{table}[h]
526: \begin{center}
527: \begin{tabular}{|c|c|c|} \hline
528: $\sigma^2$ & $\hat{\sigma}^2|\xb,\tb$ & $\hat{\sigma}^2 |\xb$ \\
529: \hline \hline
530: 0.01 & 0.0083 & 0.0081 \\
531: 1 & 0.9918 & 0.9598 \\
532: 0.001 & 0.0007 & 0.0026 \\
533: 0.1 & 0.0945 & 0.0940 \\
534: 0.01 & 0.0079 & 0.0107 \\
535: \hline
536: \end{tabular}
537: \caption{Estimated values of the variances}
538: \end{center}
539: \end{table}
540: 
541: \newpage
542: \subsection{Change in the correlation coefficient}
543: The results shown in figure \ref{fig_corr_coef} are worse than in the first two cases. The positions of the change points are less precise, and we can see that an additional change point appears. This affects the estimation of the correlation coefficient in the third segment because the algorithm alternates between two change point positions. This problem can be explained by the fact that a correlation coefficient near 1 implies a local change of the mean, which the algorithm can interpret as a change point. This problem also appears when the sizes of the segments are far from the \aprio size $\lambda$. 
544: \bfig[hbt]
545: \bcc
546: \includegraphics[width=150mm,height=75mm]{result_coeff_a} 
547: \ecc
548: \caption[Different correlation coefficients.]{Change in the correlation coefficient. From top to bottom: simulated data, histogram at the 50th iteration, histogram at the first iteration, true positions of the change points.}
549: \label{fig_corr_coef}
550: \efig
551: 
552: \begin{table}[h]
553: \begin{center}
554: \begin{tabular}{|c|c|} \hline
555: $a$  & $\hat{a}|\xb$ \\
556: \hline \hline
557: 0 & 0.0988 \\
558: 0.9 & 0.7875 \\
559: 0.1 & 0.3737 \\
560: 0.8 & 0.8071 \\
561: 0.2 & 0.1710 \\
562: \hline
563: \end{tabular}
564: \caption{Estimated values of the correlation coefficients}
565: \end{center}
566: \end{table}
567: 
568: \newpage
569: \subsection{Influence of the prior law}
570: \noindent
571: In this section we study the influence of the \aprio on $\lambda$, \ie the size of the segments. In the following we fix the number of change points as before and change the \aprio size of the segments to $\lambda_0=\frac{\lambda}{2}$ and $\lambda_1=2\lambda$. We then apply our algorithm to the case of a change in the correlation coefficient.
572: \bfig[hbt]
573: \bcc
574: \includegraphics[width=150mm,height=75mm]{result_coeff_a2} 
575: \ecc
576: \caption[Change in the correlation coefficients]{Different correlation coefficients with $\lambda_0=\frac{1}{2}\frac{T}{N+1}$. From top to bottom: simulated data, histogram at the 50th iteration, histogram at the first iteration, true positions of the change points.}
577: \label{fig_corr_coef_1}
578: \efig
579: 
580: \bfig[hbt]
581: \bcc
582: \includegraphics[width=150mm,height=75mm]{result_coeff_a3} 
583: \ecc
584: \caption[Change in the correlation coefficients.]{Different correlation coefficients with $\lambda_1=2\frac{T}{N+1}$. From top to bottom: simulated data, histogram at the 50th iteration, histogram at the first iteration, true positions of the change points.}
585: \label{fig_corr_coef_2}
586: \efig
587: 
588: \noindent
589: In figure \ref{fig_corr_coef_1}, we can see that the algorithm has detected additional change points, forming segments whose size is near $\lambda_0$. This result shows the importance of the \aprio when the data are not informative enough. We reach the same conclusion in figure \ref{fig_corr_coef_2}, where only three change points are detected, forming segments whose size is again near $\lambda_1$. We can also remark that fixing the \aprio size $\lambda$ amounts to fixing the number of change points. Our algorithm thus gives good results if, for instance, we have good \aprio knowledge of the number of change points.
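
\noindent
This last remark can be made explicit by inverting the choice $\lambda=\frac{T}{N+1}$ made earlier. The relation below is a back-of-the-envelope consequence of that choice, not a result of the simulations: a prior mean segment length $\lambda$ corresponds to about
\[
N_\lambda \approx \frac{T}{\lambda}-1
\]
change points, so that halving $\lambda$ roughly doubles the number of segments $N_\lambda+1$, and doubling $\lambda$ roughly halves it.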
590: 
591: \section{Conclusions}
592: 
593: \small 
594: %\def\bibdir{/home/djafari/Tex/Inputs/bib/}
595: \bibliographystyle{ieeetr}
596: %\bibliography{revuedef,biben,\bibdir baseAJ,\bibdir baseKZ}
597: \bibliography{revuedef,biben,baseAJ,baseKZ,cp}
598: 
599: \edoc
600: