nlin0308032/ppg.tex
1: \documentclass[twocolumn,showpacs,preprintnumbers,amsmath,amssymb]{revtex4}
2: 
3: \usepackage{epsf}
4: \def\DoubleR{{\rm\bf R}}
5: 
6: \begin{document}
7: 
8: 
9: \title{Estimating the distribution of dynamic invariants: illustrated
10: with an application to human photo-plethysmographic time series}
11: \author{Michael Small\thanks{Tel: +852 2766 4744 , 
12: Fax: +852 2362 8439, email: {\tt ensmall@polyu.edu.hk}.}}
13: \affiliation{Department of Electronic and Information
14: Engineering\\ Hong Kong Polytechnic University, Hung Hom, Kowloon,
15: Hong Kong} 
16: 
17: \date{\today} 
18: 
19: \begin{abstract}
20: Dynamic invariants are often estimated from experimental time series
21: with the aim of differentiating between different physical states in the
22: underlying system. The most popular schemes for estimating dynamic
23: invariants are capable of estimating confidence intervals; however, such
24: confidence intervals do not reflect variability in the underlying
25: dynamics. In this communication we propose a surrogate based method to
26: estimate the expected distribution of values under the null hypothesis
27: that the underlying deterministic dynamics are stationary. We
28: demonstrate the application of this method by considering four
29: recordings of human pulse waveforms in differing physiological states
30: and provide conclusive evidence that correlation dimension is capable of
31: differentiating between three (but not all four) of these states.
32: \end{abstract}
33: 
34: % insert suggested PACS numbers in braces on next line
35: \pacs{05.45.-a, 05.45.Tp, 05.10.-a}
36: 
37: \maketitle
38: 
39: Various dynamic invariants are often estimated from time series in a
40: wide variety of scientific disciplines. It has long been known that
41: these estimates (and in particular correlation dimension
42: estimates) alone are not sufficient to differentiate between chaos and noise. Most
43: notably, the method of surrogate data \cite{jT92} was introduced in an
44: attempt to reduce the rate of false positives during the hunt for physical
45: examples of chaotic dynamics. Although it is not possible to find
46: conclusive evidence of chaos through estimation of dynamic invariants,
47: surrogate methods are used to generate a distribution of statistic
48: (i.e. the estimates of the dynamic invariant) values under the
49: hypothesis of linear noise. In the most general form, the standard
50: surrogate methods can generate the distribution of statistic values under
51: the null hypothesis of a static monotonic nonlinear transformation of
52: linearly filtered noise.
53: 
54: In this communication, we introduce a significant generalisation of a recent
55: surrogate generation algorithm \cite{cyclsurr,pps2}. The {\em pseudo-periodic
56: surrogate}  (PPS) algorithm allows one to generate data consistent with
57: the null hypothesis of a noise driven periodic orbit --- provided the
58: data exhibits pseudo-periodic dynamics. This algorithm has been
59: applied to differentiate between a noisy limit cycle, and deterministic
60: chaos. By modifying this algorithm and applying it to noisy time series
61: data, we are able to generate surrogate time series that are independent
62: trajectories of the same deterministic system.
63: 
64: This ensemble of {\em attractor trajectory surrogates} (ATS) can then be used
65: to estimate the distribution of values of
66: any statistic derived from these time series. The statistics of greatest
67: interest
68: to us are dynamic invariants of the underlying attractor, and in
69: particular correlation dimension and entropy estimates provided by the
70: {\em Gaussian kernel algorithm} (GKA) \cite{cD96,effgka}. Our choice of the
71: GKA is arbitrary, and is based only on our familiarity with this
72: particular algorithm.
73: 
74: An important application for the ATS technique is to determine whether
75: dynamic invariants estimated from distinct time series are significantly
76: different. The question this technique can address is whether (for
77: example) a correlation dimension of 2.3 measured during normal
78: electrocardiogram activity is really distinct from the correlation
79: dimension of 2.4 measured during an episode of ventricular tachycardia
80: \cite{csf,cic4}. Estimates of dynamic invariants (including the GKA
81: \cite{cD96,effgka}) often come with confidence intervals. But these
82: confidence intervals can only be based on uncertainty in the
83: least-mean-square fit, not the underlying dynamics. Conversely, it is
84: standard practice to obtain a large number of representative time series
85: for each (supposedly distinct) physical state, and compare the
86: distribution of statistic values derived from these. However, this approach
87: is not always feasible: in \cite{csf,cic4} for example, the problem is not
88: merely that these physiological states are both difficult and dangerous to
89: replicate, but that inter-patient variability makes doing so infeasible.
90: 
91: In the remainder of this communication we describe the new ATS algorithm
92: and demonstrate that it can be used to estimate the distribution of
93: dynamic invariant estimates from a single time series of a known
94: dynamical system (the chaotic R\"ossler system). We then apply this same
95: method to four recordings of human pulse waveforms, measured via
96: photo-plethysmography \cite{jB99,jB01}. Each of the four recordings
97: corresponds to a distinct physiological state. We compute correlation
98: dimension and entropy using the GKA method and show that the expected
99: distribution of correlation dimension estimates is
100: sufficient to differentiate between three of these four physiological states.
101: 
102: The ATS algorithm may be described as follows. Embed a scalar time
103: series $\{x_t\}$ to obtain a vector time series $\{z_t\}$ (of length
104: $N$). The choice of embedding is arbitrary, but has been adequately
105: discussed in the literature (\cite{window} for example). From the
106: embedded time series, the surrogate is obtained as follows. Choose an
107: initial condition, $w_1\in\{z_t|t=1,\ldots,N\}$. Then, at each step,
108: choose the successor to $w_t$ with probability
109: \begin{eqnarray}
110: \label{switch}
111: P(w_{t+1}=z_{i+1}) & \propto &
112: \exp\left(-\frac{\|w_t-z_i\|}{\rho}\right)
113: \end{eqnarray}
114: where the {\em noise radius} $\rho$ is an as-yet unspecified
115: constant. In other words, the successor to $w_t$ is the successor of a
116: randomly chosen neighbour of $w_t$. Finally, from the vector time series
117: $\{w_t\}$ the ATS $\{s_t\}$ is obtained by projecting $w_t$ onto
118: $[1\;0\;0\;0\;\cdots\;0]$ (the first coordinate). Hence
119: \begin{eqnarray}
120: s_t & = & w_t\cdot[1\;0\;0\;0\;\cdots\;0]
121: \end{eqnarray}
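
To make (\ref{switch}) concrete, the proportionality can be written with
its normalising constant. This is only a sketch: it assumes (neither is
stated above) that $\|\cdot\|$ is the Euclidean norm and that the
candidate neighbours are all embedded points $z_j$ that possess a
successor $z_{j+1}$, with the sum running over those candidates.
\begin{eqnarray*}
P(w_{t+1}=z_{i+1}) & = &
\frac{\exp\left(-\|w_t-z_i\|/\rho\right)}
{\sum_{j}\exp\left(-\|w_t-z_j\|/\rho\right)}
\end{eqnarray*}
Under this reading, as $\rho\to 0$ the nearest candidate is chosen almost
surely (for $w_t=z_i$ this is $z_i$ itself, so the surrogate simply
retraces the data), while for large $\rho$ every candidate becomes nearly
equally likely and the temporal structure is lost.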
122: 
123: In \cite{cyclsurr,pps2} this algorithm was shown to be capable of
124: differentiating between deterministic chaos and a noisy periodic
125: orbit. In the context of the current communication we assume that
126: $\{x_t\}$ is contaminated by additive (but possibly dynamic) noise and
127: we choose the noise radius $\rho$ such that the observed noise
128: is replaced by an independent realisation of the same noise
129: process. Furthermore, we assume
130: that the deterministic dynamics are preserved by suitable choice of
131: embedding parameters. Under these two assumptions, $\{z_t\}$ and
132: $\{w_t\}$ have the same invariant density and $\{x_t\}$ and $\{s_t\}$
133: are therefore (noisy) realisations of the same dynamical system with (for
134: suitable choice of $\rho$) the same noise distribution.
135: 
136: As in \cite{cyclsurr,pps2}, the remaining problem is the correct choice of
137: $\rho$. How $\rho$ is chosen is the major difference between the ATS described here and the PPS of
138: \cite{cyclsurr,pps2}. However, since the null hypothesis we wish to
139: address is different from (and more general than) that of the PPS,
140: choice of $\rho$ for the ATS is less restrictive. For $t=T$ given, one
141: can compute $P(w_{t+1}\neq z_{i+1}
142: \wedge \|w_t-z_i\|=0 | t=T)$ directly from the data by applying (\ref{switch}). Assuming the
143: process is ergodic \footnote{This assumption is sufficient rather than
144: necessary.} one can then sum
145: \begin{eqnarray}
146: \lefteqn{P(w_{t+1}\neq z_{i+1} \wedge \|w_t-z_i\|=0) = }\\
147: \nonumber && \frac{1}{N}\sum_{T=1}^N
148: P(w_{t+1}\neq z_{i+1} \wedge \|w_t-z_i\|=0 | t=T)
149: \end{eqnarray}
150: to get the probability of a temporal discontinuity
151: \footnote{By temporal discontinuity we mean that $w_t=z_i$ but
152: $w_{t+1}\neq z_{i+1}$.} in the
153: surrogate at any time instant. There is a 1:1 correspondence between a
154: value $p=P(w_{t+1}\neq z_{i+1} \wedge \|w_t-z_i\|=0)$ and $\rho$, and we
155: choose to implement (\ref{switch}) for a particular value of $p$ (i.e. a
156: particular transition probability) rather than a specific noise
157: level. In what follows we find that studying intermediate values of $p$
158: ($p\in[0.05,0.95]$) is sufficient. However, the significant point is
159: that $p\in[0.05,0.95]$ corresponds to a very narrow range of values of
160: $\rho$.
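
The correspondence between $p$ and $\rho$ can be made explicit. As a
sketch, assume that (\ref{switch}) is normalised over a fixed set of $M$
candidate points, that the embedded points are distinct, and that the
event $\|w_t-z_i\|=0$ at $t=T$ means the surrogate currently occupies the
data point $z_T$. The symbols $Z_T$, $q_T$ and $M$ are introduced here
only for illustration, with $q_T$ denoting
$P(w_{t+1}\neq z_{i+1} \wedge \|w_t-z_i\|=0 | t=T)$. The neighbour $z_T$
itself then has weight $\exp(0)=1$, so that
\begin{eqnarray*}
Z_T(\rho) & = & \sum_{j}\exp\left(-\frac{\|z_T-z_j\|}{\rho}\right), \\
q_T(\rho) & = & 1-\frac{1}{Z_T(\rho)}, \\
p(\rho) & = & \frac{1}{N}\sum_{T=1}^N q_T(\rho).
\end{eqnarray*}
Every term of $Z_T(\rho)$ with $z_j\neq z_T$ is strictly increasing in
$\rho$, so $p(\rho)$ increases monotonically from $0$ (as $\rho\to 0$)
towards $1-1/M$ (as $\rho\to\infty$). This is consistent with the 1:1
correspondence noted above and, under these assumptions, the value of
$\rho$ for a prescribed $p$ could be located by a simple bisection search.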
161: 
162: 
163: \begin{figure}
164:   \[\epsfxsize 75mm \epsfbox{fig1a.eps}\]
165:   \[\epsfxsize 75mm \epsfbox{fig1b.eps}\]
166: \caption{{\bf Distribution of statistics $D$, $K$ and $S$ for
167: short and noisy realisations of the R\"ossler system.} The histogram
168: shows the distribution of statistic estimates ($D$, $K$ and $S$) for
169: $500$ ATS time series generated from a $1000$ point realisation of the
170: R\"ossler system. The solid vertical line on each plot is the comparable
171: value for the data and the stars marked on the horizontal axes are for
172: $20$ independent realisations of the same process. The top row of
173: figures depicts results for the R\"ossler system with observational
174: noise only, the bottom row of figures has both observational and dynamic
175: noise. Panels (a) and (d) show correlation dimension estimates, (b) and
176: (e) are entropy, and (c) and (f) are noise level.}
177: \label{rossler}
178: \end{figure} 
179: 
180: We now demonstrate the applicability of this method for noisy time
181: series data simulated from the R\"ossler differential equations (during
182: ``broad-band'' chaos). We integrated ($1000$ points with a time step of
183: $0.2$) the R\"ossler equations both with and without multidimensional dynamic noise at
184: $5\%$ of the standard deviation of the data. We then studied the
185: $x$-component after the addition of $5\%$ observational noise. We
186: selected embedding parameters using the standard methods ($d_e=3$ and
187: $\tau=8$) and then computed ATS time series for various exchange
188: probabilities $p=0.05,0.1,0.15,\ldots,0.95$. For the data set and each
189: ensemble of surrogates we then estimated correlation dimension $D$,
190: entropy $K$ and noise level $S$ using the GKA
191: \cite{cD96,effgka} (GKA embedding using embedding dimension $m=2,3,\ldots,10$ and
192: embedding lag of $1$). Figure \ref{rossler} depicts the results when the GKA is
193: applied with embedding dimension $m=4$ and the exchange probability is
194: $p=0.35$. Other values of $m$ gave equivalent results, as did various
195: values of $p$ in the range $[0.2,0.8]$.
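
For reference, the R\"ossler equations in their standard form are
\begin{eqnarray*}
\dot{x} & = & -y - z, \\
\dot{y} & = & x + a y, \\
\dot{z} & = & b + z(x - c),
\end{eqnarray*}
where $a$, $b$ and $c$ are parameters. The values used to obtain
``broad-band'' chaos are not stated in this communication, so none are
assumed here. If the time step of $0.2$ is read as the sampling interval,
each $1000$ point realisation spans $1000\times 0.2=200$ time units.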
196: 
197: For $p\in[0.2,0.8]$ we found that the estimate of noise $S$ from the GKA
198: coincided for data and surrogates, but this was often not the
199: case for extreme values of $p$. Therefore, this estimate of signal noise
200: content is a good indicator of the accuracy of the dynamics reproduced
201: by the ATS time series. Furthermore, to confirm the spread of the data we
202: also estimated $D$, $K$, and $S$ for $20$ further
203: realisations of the same R\"ossler system (with different initial
204: conditions). In each case, as expected, the range of these values lies
205: well within the range predicted by the ATS scheme.
206: 
207: \begin{figure}
208:   \[\epsfxsize 75mm \epsfbox{fig2.eps}\]
209: \caption{{\bf Human pulse waveform recorded with photo-plethysmography.}
210: Four recordings of human pulse waveform (61 Hz) in four different
211: physiological conditions. The four time series correspond to: (a)
212: normal, (b) quasi-stable, (c) unstable, and (d) post-operative (stable).}
213: \label{ppg}
214: \end{figure} 
215: 
216: We now consider the application of this method to photo-plethysmographic
217: recordings of human pulse dynamics over a short time period (about 16.3
218: seconds). We have access to only a limited amount of data representative
219: of each of four different dynamic regimes. In any case, we would expect
220: the system dynamics to change if measured over a significantly longer
221: time frame. The data collection and processing with the methods of
222: nonlinear time series analysis are described in
223: \cite{jB99,jB01}. Previously, we have studied nonlinear determinism in
224: cardiac dynamics measured with electrocardiogram (ECG)
225: \cite{csf,cic4}. Although we do not consider ECG data here, this data would
226: be another useful system to examine with these methods
227: \footnote{Actually, the problem here is that we have too much data and
228: it is therefore difficult to select a ``representative'' small number of
229: short time series. However, we intend to examine this data more
230: carefully in forthcoming work.}. The four data sets we examine in this
231: communication are depicted in figure \ref{ppg}.
232: 
233: \begin{figure}
234:   \[\epsfxsize 75mm \epsfbox{fig3.eps}\]
235: \caption{{\bf Distribution of statistics $D$, $K$ and $S$ for
236: human pulse waveforms.} The histogram shows the distribution of
237: statistic estimates ($D$, $K$ and $S$) for $500$ ATS time series
238: generated from each of the four time series depicted in figure
239: \ref{ppg}. The solid vertical line on each plot is the comparable value for the data
240: and the stars marked on the horizontal axes are for the (limited)
241: subsequent data recorded from each patient. In each case only two or
242: three subsequent contiguous but non-overlapping time series were
243: available. The figures are: (a) correlation dimension ($D$), (b) entropy
244: ($K$), and (c) noise ($S$) for the normal rhythm; (d) $D$, (e) $K$, and
245: (f) $S$ for the quasi-stable rhythm; (g) $D$, (h) $K$, and (i) $S$ for
246: the unstable rhythm; and (j) $D$, (k) $K$, and (l) $S$ for the
247: post-operative stable rhythm.}
248: \label{ppgres1}
249: \end{figure} 
250: 
251: For each data set we repeated the analysis described for the R\"ossler
252: time series. Results for GKA embedding dimension $m=4$ and $p=0.35$
253: are depicted in figure \ref{ppgres1}. As with the R\"ossler system,
254: variation of the parameters $m$ and $p$ did not significantly change
255: the results. We find that in every case (except for $p\notin[0.2,0.8]$)
256: the distribution of $D$, $K$ and $S$ estimated from the ATS data using
257: the GKA included the true value. Most significantly, this indicates that
258: the range of values of $p$ is appropriate. Moreover, these results are
259: consistent with the hypotheses that
260: the noise is effectively additive and can be modelled with this simple
261: scheme, and that the underlying deterministic dynamics can be
262: approximated with a local constant modelling scheme. 
263: 
264: We also estimated the statistics $D$, $K$ and $S$ for
265: additional available data (subsequent, contiguous, but non-overlapping)
266: from each of the four rhythms. This small amount of data afforded us two
267: or three additional estimates of each statistic for each rhythm. For the
268: unstable and quasi-stable rhythms we observed good agreement. For the
269: stable (normal and post-operative) rhythms, this is not the case. On
270: examination of the data we find that this result is to be
271: expected. Both the stable rhythms undergo a change in amplitude and
272: baseline subsequent to the end of the original $16$ second recording,
273: and this non-stationarity is reflected in the results. This same
274: non-stationarity has also been observed independently by Bhattacharya
275: and co-workers \cite{jB99,jB01}.
276: 
277: \begin{figure}
278:   \[\epsfxsize 85mm \epsfbox{fig4.eps}\]
279: \caption{{\bf Discriminating power of the statistics $D$, $K$ and $S$ for
280: human pulse waveforms.} The distribution (a binned histogram) of
281: statistic values estimated via the ATS method (as described in figure
282: \ref{ppgres1}) for each of the four distinct physiological waveforms is
283: shown. The four rhythms are labelled `a', `b', `c', and `d'
284: corresponding to the same labelling in figure \ref{ppg}.  These figures
285: show that correlation dimension alone is sufficient to differentiate
286: between three of these four physiological states. The exception is that
287: these three statistics are insufficient to differentiate between the
288: normal and post-operative states.}
289: \label{ppgres2}
290: \end{figure} 
291: 
292: We now return to the question that the ATS test was designed to address:
293: can we differentiate between these four rhythms based on the GKA?
294: Figure \ref{ppgres2} provides the answer. In figure \ref{ppgres2} we see
295: the estimated distribution of statistic values ($D$, $K$ and $S$) for
296: each of the four rhythms shown in figure \ref{ppg}. Clearly (and not
297: surprisingly), the correlation dimension estimate and noise level of the
298: unstable rhythm are significantly different from the other three
299: rhythms. More significantly, the quasi-stable rhythm is also observed to
300: be distinct from the other three regimes. Furthermore, we observe that
301: in the quasi-stable state the correlation dimension estimate is
302: significantly less than one, while for the unstable state it is
303: significantly larger than one. For example, the quasi-stable state may
304: be characterised by a noise driven stable focus
305: \footnote{Due to the discretisation necessary to digitise this data, a
306: noise driven stable periodic orbit is also a plausible cause of the
307: observed data. To distinguish these two, a more detailed study is
308: required.}, while the unstable state exhibits high dimensional
309: (i.e. $D>1$) deterministic dynamics. Both these regimes exhibit
310: significantly more (additive Gaussian) noise than the stable states.
311: 
312: The two stable states (panels (a) and (d) of figure \ref{ppg}) are
313: harder to distinguish: both visually and using the statistics $D$, $K$,
314: and $S$. While the individual data sets we depict in figure \ref{ppg}
315: exhibit different statistic values (for example $D=1.06$ and $D=1.01$),
316: we find that the ATS analysis indicates that these statistics are not
317: significantly different. Both regimes exhibit a correlation dimension of
318: about one, and a similar noise level. The variation in observed results
319: is smaller in the post-operative stable regime, but the distributions do
320: overlap.
321: 
322: Finally, we find that the entropy $K$ estimated with the GKA is of
323: no use in differentiating between these four rhythms.
324: 
325: The results of this analysis are in general agreement with those
326: presented in \cite{jB99,jB01}. Independent linear surrogate analysis
327: \cite{jT92} has confirmed that each of these four rhythms is inconsistent
328: with a monotonic nonlinear transformation of linearly filtered noise
329: \footnote{These calculations are routine, and not presented in this
330: communication.}. The only significant difference is that the correlation
331: dimension estimates we present here are significantly lower than those in
332: \cite{jB99,jB01}. This is due to the different correlation dimension
333: algorithm. Unlike the algorithm employed in \cite{jB99,jB01}, the GKA
334: separates the data into purely deterministic and stochastic components,
335: and hence estimates both $D$ and $S$. The correlation dimension
336: estimated in \cite{jB99,jB01} is the combined effect of both components
337: of the GKA.
338: 
339: Although we have considered the specific application of human pulse
340: dynamics, the algorithm we have proposed may be applied to a wide
341: variety of problems. We have shown that provided time delay embedding
342: parameters can be estimated adequately, and an appropriate value of the
343: exchange probability is chosen, the ATS algorithm generates independent
344: trajectories from the same dynamical system. When applied to data from
345: the R\"ossler system we confirm this result, and we demonstrate its
346: application to experimental data. 
347: 
348: When the ATS algorithm is applied to generate independent realisations for a
349: hypothesis test, one is able to construct a test for non-stationarity. If
350: two data sets do not fit the same distribution of ATS data then they cannot
351: be said to be from the same deterministic dynamical
352: system. Unfortunately, the converse is not always true and the power of
353: the test depends on the choice of statistic. The utility of this
354: technique as a test for stationarity remains a subject for future investigation.
355: 
356: %\vspace{0.5cm}
357: \section*{Acknowledgments}
358: 
359: This research was supported by a Hong Kong Polytechnic University
360: Research Grant (No. A-PE46). The author wishes to thank J. Bhattacharya
361: for supplying the photo-plethysmographic time series.
362: 
363: \bibliographystyle{unsrt}
364: \begin{thebibliography}{10}
365: 
366: \bibitem{jT92}
367: J. Theiler {\em et al.}
368: %\newblock Testing for nonlinearity in time series: The method of surrogate
369: %  data.
370: \newblock {\em Physica D}, 58:77--94, 1992.
371: 
372: \bibitem{cyclsurr}
373: M. Small, D. Yu, and R.G. Harrison.
374: %\newblock A surrogate test for pseudo-periodic time series data.
375: \newblock {\em Physical Review Letters}, 87:188101, 2001.
376: 
377: \bibitem{pps2}
378: M. Small and C.K. Tse.
379: %\newblock Applying the method of surrogate data to cyclic time series.
380: \newblock {\em Physica D}, 164:187--201, 2002.
381: 
382: \bibitem{cD96}
383: C.~Diks.
384: %\newblock Estimating invariants of noisy attractors.
385: \newblock {\em Physical Review E}, 53:R4263--R4266, 1996.
386: 
387: \bibitem{effgka}
388: D. Yu {\em et al.}
389: %\newblock Efficient implementation of the {G}aussian kernel algorithm in
390: %  estimating invariants and noise level from noisy time series data.
391: \newblock {\em Physical Review E}, 61:3750--3756, 2000.
392: 
393: \bibitem{csf}
394: M. Small {\em et al.}
395: %\newblock Uncovering nonlinear structure in human {ECG} recordings.
396: \newblock {\em Chaos, Solitons and Fractals}, 13:1755--1762, 2001.
397: 
398: \bibitem{cic4}
399: M. Small {\em et al.}
400: %\newblock Automatic identification and recording of cardiac arrhythmia.
401: \newblock {\em Computers in Cardiology}, 27:355--358, 2000.
402: 
403: \bibitem{jB99}
404: J.~Bhattacharya and P.P. Kanjilal.
405: %\newblock Assessing determinism of photo-plethysmographic signal.
406: \newblock {\em IEEE Transactions on Systems, Man and Cybernetics A},
407:   29:406--410, 1999.
408: 
409: \bibitem{jB01}
410: J.~Bhattacharya, P.P. Kanjilal, and V.~Muralidhar.
411: %\newblock Analysis and characterization of photo-plethysmographic signal.
412: \newblock {\em IEEE Transactions on Biomedical Engineering}, 48:5--11, 2001.
413: 
414: \bibitem{window}
415: M. Small and C.K. Tse.
416: %\newblock Optimal embedding parameters: A modelling paradigm.
417: \newblock {\em Physica D}, 2003.
418: \newblock To appear.
419: 
420: \end{thebibliography}
421: 
422: \end{document}