physics0604167/ms.tex
1: \documentclass[aps,showkeys,groupedaddress,twocolumn,showpacs]{revtex4/revtex4}
2: \usepackage{graphicx}
3: \usepackage[ansinew]{inputenc}
4: \usepackage[tbtags]{amsmath}
5: \usepackage{amssymb}
6: 
7: \newcommand{\erfc}{\mbox{erfc}}
8: \newcommand{\joint}{$\rho^{\Theta}(x_{n},x_{n+1},d)$ }  
9: \newcommand{\ja}{$\rho^{\Theta}_j(x_{n+1},x_n,a,\eta)$ }
10: \newcommand{\marg}{$\rho^{\Theta}_m(x_n,d)$ }
11: \newcommand{\ma}{$\rho(x_n|X(\eta),a)$ }
12: \newcommand{\ptotal}{$\rho_{\Theta}(a,\eta)$ }
13: \newcommand{\antima}{$\rho(x_n|\overline{X(\eta)},a)$ }
14: \begin{document}
15: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
16: \title{Precursors of extreme increments}
17: \author{Sarah Hallerberg, Eduardo G. Altmann, Detlef Holstein, Holger Kantz}  
18: \affiliation{Max Planck Institute for the Physics of Complex Systems\\
19: N\"othnitzer Str.\ 38, D 01187 Dresden, Germany\\}
20: \date{\today}
21: %%%%%%
22: \begin{abstract}
23: We investigate precursors and predictability of extreme increments in a time
24: series. The events we are focusing on consist in large increments within
25: successive time steps. We are especially interested in understanding how the quality of the predictions depends on the strategy to choose precursors,
26: on the size of the event and on the correlation strength. We study the
27: prediction of extreme increments analytically in an AR(1) process, and
28: numerically in wind speed recordings and long-range correlated ARMA data.
29: We evaluate the success of predictions via
30: receiver operating characteristic curves (ROC-curves). 
31: Furthermore, we observe an increase of the quality of predictions with
32: increasing event size and with decreasing correlation in all examples. Both effects can be understood by using the likelihood ratio as a summary index for smooth ROC-curves.
33: \end{abstract}
34: %%%%%%%%%%%
35: \pacs{02.50.-r,05.45.Tp}
36: \keywords{time series analysis, extreme events, extreme increments,
37: precursors, ROC-curve, likelihood ratio}
38: \maketitle
39: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
40: \section{Introduction}
41: %
42: 
43: Extreme value statistics \cite{Coles} is a well established approach to predict
44: the relative frequency of rare extreme events, but does not include forecasts
45: of when the next event will occur.
46: %
47: There have been many attempts to employ time series
48: strategies for the latter purpose. These strategies usually investigate a record of
49: historical data about the phenomenon under study and try to infer knowledge
50: about the future. A standard approach is to search for precursors, i.e.,
51: typical signatures preceding an extreme event. Such precursors have
52: been discussed, e.g., in the literature about earthquakes \cite{Jackson}, 
53: epileptic seizures  \cite{Kapiris}, and stock market crashes
54: \cite{Sornette2,stocks,Sornette3}. As the above listed examples illustrate, the definition of
55: what an extreme event is depends on the context. Frequently, one encounters
56: extremely large values of some observable, or some drastic changes. It is the
57: latter which is the focus of this paper where we discuss large increments
58: motivated by stock markets or by turbulent gusts in wind speed data.
59: 
60: One might expect that the more extreme an event is, the more difficult it is
61: to predict it, simply because more extreme events are usually also much
62: rarer.  However, it has been reported in the literature of wind speed
63: predictions \cite{Physa}, precipitation forecast \cite{Goeber}, multi agent games \cite{Johnson1} and earthquakes
64: \cite{Fatemeh} that more extreme events
65: are better predictable than small events. Therefore one particular goal of
66: this contribution is to investigate how the predictability of large increments
67: depends on the size of the increment. 
68:  
69: In this contribution we study predictions in a simple autoregressive process
70: of order 1 (AR(1) process) \cite{Box-Jen} analytically in order to obtain a detailed understanding of
71: some questions on precursors and predictions. The AR(1) process is a simple
72: stationary stochastic model process, that might not reflect all features of
73: more complex processes occurring in nature, but it admits a fully analytic
74: treatment. Additionally, we study similar prediction procedures numerically
75: in long-range correlated data and in wind speed data, verifying the same
76: quantitative results.
77: The questions which we intend to answer are the following:
78: \begin{itemize}
79: \item [{\bf (Q1)}] How to choose a precursor in order to obtain good
80: predictions? 
81: \item [{\bf (Q2)}] Are extreme increments more predictable the more
82: extreme they are?
83: \item [{\bf (Q3)}] How does the correlation of the data influence the predictability
84: of extreme increments?
85: \end{itemize}
86: 
87: The paper is organized as follows.
88: In Sec.\ \ref{pre} we discuss two strategies which can be used to choose
89: precursory structures and in Sec.\ \ref{roc} we introduce a method to evaluate
90: the predictive power of precursors.  The extreme events we discuss in this
91: contribution are defined in Sec.\ \ref{def} and we show how to obtain their
92: joint PDFs analytically in Sec.\ \ref{tfilter}. We apply these procedures to 
93: AR(1) correlated stochastic processes in Sec.\ \ref{AR1}, to wind speed
94: measurements in Sec.\ \ref{wind} and to long-range correlated data in Sec.\ \ref{long}. Conclusions appear in Sec.\ \ref{con}.
95: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
96: \section{Definitions and set-up\label{sec2}}
97: The considerations in this introductory section are made for general dynamical
98: systems
99: with a complex time evolution. They might be purely deterministic (and then high-dimensional and
100: chaotic), or they might be stochastic. In any case we assume that the
101: time evolution of the system cannot be easily modeled and hence one tries to
102: extract information about the future from time series data.
103:  This means that through some experimental observation one can record a usually
104: univariate time series, i.e., a set of measurements $x_n$ at discrete
105: times $t_n$, where $t_n = t_0 + n\Delta$ with a sampling
106: interval $\Delta$. The recording should contain sufficiently many extreme events so that we are able to extract statistical
107: information about them. We also assume that the event of
108: interest can be identified on the basis of the observations, e.g., by the
109: value of the observation function exceeding some threshold, by a sudden
110: increase, or by its variance exceeding again some threshold.
111: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
112: \subsection{The choice of the precursor \label{pre}}
113: Ideally, a precursor is a typical signature in the data preceding {\em every}
114: individual event. Unfortunately the time evolution of
115: most systems is usually too irregular to demand this, so one would call
116: a precursor a data structure which is {\em typically} preceding an event,
117: allowing deviations from the given structure, but also allowing events without
118: preceding structure. This interpretation of a precursor allows us to determine
119: the specific values of the precursory structure by statistical considerations. 
120: 
121: In order to predict an event occurring at the time $(n+1)$ we compare the last
122: $k$ observations ${\mathbf x}_{(n,k)}= (x_{n-k+1},x_{n-k+2}, ..., x_{n-1},x_n)$
123: with a specific precursory structure ${\mathbf x}_{pre}= (x_{n-k+1}^{pre},
124: x_{n-k+2}^{pre}, ...,x_{n-1}^{pre}, x_n^{pre})$.
125: 
126: This precursory structure can be chosen according to different strategies. The
127: two possible strategies which we address here, represent the
128: most fundamental choices. They consist in using either the
129: maximum of the {\sl a posteriori PDF} or the maximum of the {\sl
130: likelihood}  \cite{Bernado}.
131: %
132: In more applied examples one looks for precursors which minimize or maximize
133: more sophisticated quantities, e.g., discriminant functions or loss matrices.
134: These quantities are usually functions of the posterior PDF or the
135: likelihood, but they take into account the additional demands of the
136: specific problem, e.g., minimizing the loss due to a false prediction \cite{Bishop}.
137: The two strategies studied in this contribution are thus fundamental in the
138: sense that they enter into most of the more sophisticated quantities which
139: are used for predictions and decision making.
140: 
141: 
142: The a posteriori PDF $\rho({\mathbf x}_{(n,k)}|X)$ takes
143: into account all events of size $X$ and provides the
144: probability density to find a specific precursory structure before an observed
145: event. 
146: 
147: \begin{itemize}
148: \item [(I)]  Hence strategy I consists in defining the precursors in a
149: retrospective or {\sl a posteriori} way: once the extreme event $X$ has been
150: identified, one asks for the signals right before it.
151: Formally, this implies that the precursory structure consists of the global maxima in
152:  each component $(x_{n-k+1}^{*}, x_{n-k+2}^{*},...,x_{n-1}^{*}, x_n^{*})$ of
153: the a posteriori PDF.
154: \end{itemize}
155: %%%%%%%%%%%%%%%%%%%%%%%%%
156: 
157: %%%%%%%%%%%%%%%%%%%%%%%%%
158: The likelihood  $\rho(X|{\mathbf x}_{(n,k)})$ takes into account all possible
159: values of precursory structures, and provides the probability density that an
160: event of size $X$ will follow them. Note that the likelihood is thus not a
161: density function with respect to the precursory structure, but with respect to
162: the event size $X$. The precursory structure enters into the likelihood only as a
163: parameter. 
164: 
165: \begin{itemize}
166: \item [(II)] Strategy II consists in determining those values of each
167: component $x_i$ of the condition ${\mathbf x}_{(n,k)}$ for which the
168: likelihood has a global maximum.
169: \end{itemize}
170: Note that the a posteriori PDF and the likelihood are linked via Bayes's
171: theorem
172: \begin{eqnarray}
173:  \rho({\mathbf x}_{(n,k)}, X) & = & \rho({\mathbf x}_{(n,k)})\, \rho(X|{\mathbf x}_{(n,k)}) =
174: \rho({\mathbf x}_{(n,k)}|X)\,\rho(X), \nonumber 
175: \end{eqnarray}
176: where $\rho({\mathbf x}_{(n,k)})$ represents the marginal PDF to find the
177: precursory structure ${\mathbf x}_{(n,k)}$  and $\rho(X)$ represents the marginal
178: PDF to find events of size $X$.
179: 
180: In summary the possible values of precursors are given by
181: %
182: \begin{eqnarray}
183: {\mathbf x}_{pre} & = & \left\{ \begin{array}{l} {{\mathbf x}_{I}}, \\
184: {{\mathbf x}_{II}},
185:  \end{array}\label{precursor} \right.\\
186: \mbox{where}\quad
187: {\mathbf x}_{I} & := &  \left( x_{n-k+1}^{*},
188: x_{n-k+2}^{*},...,x_{n-1}^{*},x_n^{*}\right), \nonumber\\
189:   \mbox{and} \quad {\mathbf x}_{II} & := &  \left(x_{n-k+1}^{\dagger}, x_{n-k+2}^{\dagger},...,x_{n-1}^{\dagger},x_n^{\dagger}\right),\nonumber
190: \end{eqnarray}
191: where $x_i^*$ are the points in which $\rho({\mathbf x}_{(n,k)}|X)$ has a global maximum
192: and $ x_i^{\dagger}$  are the points in which $\rho(X|{\mathbf x}_{(n,k)})$ has
193: its largest
194:  maximum, with $n-k+1\leq i\leq n$. In both cases the event size $X$ is assumed to be fixed.
195: %
196: Once the precursory structure  ${\mathbf x}_{pre}$ is determined, we give an alarm for an extreme event when we find the last $k$ observations
197: ${\mathbf x}_{(n,k)}$ in the volume 
198: %
199: \begin{widetext}
200: \begin{eqnarray}
201: V_{pre}(\delta) & = &
202: \left(x_{n-k+1}^{pre}-\frac{\delta}{2},x_{n-k+1}^{pre}+\frac{\delta}{2}\right)\times \left(
203: x_{n-k+2}^{pre}- \frac{\delta}{2} , x_{n-k+2}^{pre}+ \frac{\delta}{2} \right) \times
204: ... \times \left(x_n^{pre}-\frac{\delta}{2},x_n^{pre}+\frac{\delta}{2}\right). \label{vol}
205: \end{eqnarray}
206: \end{widetext}
207: %
208: This method of determining the precursor is especially useful if the PDF of
209: a process has one clearly defined maximum. For multimodal PDFs the strategy of
210: using only the global maxima can surely be
211: improved by considering also the influence of smaller maxima of the
212: PDF. In this case the precursory volume could, e.g., consist of ${\mathbf x}_{(n,k)}$ for which the PDFs have
213: values above a certain threshold. In this case $V_{pre}(\delta)$ might not be
214: simply connected, but apart from this the procedure of predicting should not
215: be different. However, we restrict ourselves to unimodal PDFs in this contribution.
216: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
217: \subsection{Testing for predictive power\label{roc}}
218: A common method to verify a hypothesis or test the quality of a prediction is
219: the receiver operating characteristic curve (ROC-plot) \cite{Swets1,Egan}.
220: %In the 1980s it became popular for medical diagnostic testing, nowadays there %are many other fields of applications as well. 
221: The idea of the ROC-curve consists simply in comparing the rate of correctly
222: predicted events $r_{c}$ with the rate of false alarms $r_{f}$ by plotting
223: $r_c$ vs. $r_f$. The resulting curve in the unit-square of the $r_f$-$r_c$
224: plane  approaches the origin for $\delta \rightarrow0$ and the point $(1,1)$ in the limit $\delta
225: \rightarrow \infty$, where $\delta$ accounts for the size of the precursor
226: volume $V_{pre}(\delta)$ (see Eq.\ (\ref{vol})). 
227: 
228: The shape of the curve characterizes the significance of the prediction. A
229: curve above the diagonal reveals that the corresponding strategy of prediction
230: is better than a random prediction which is characterized by the
231: diagonal. Furthermore we are interested in curves which converge as fast as
232: possible to $r_c=1$, since this scenario tells us that we reach the highest
233: possible rate of correct prediction without having a large rate of false
234: alarms.
235: 
236:  There are various so called  {\it summary indices} \cite{Pepe}
237: which quantify the behavior of the ROC. 
238: %
239: %The
240: %most popular of them consists in measuring the area under the ROC-curve, but
241: %there are other concepts like the Kolmogorov-Smirnov index, which measures the% largest distance of the ROC-curve from the
242: %diagonal. 
243: %
244: In this contribution we use the so called {\it likelihood ratio} \cite{Egan} in
245: order to quantify the ROC-curve. The likelihood ratio
246: is identical to the slope $m$ of the ROC-curve. For the usage as a summary index,
247: we consider the slope in the vicinity of the
248: origin which implies $\delta \rightarrow 0 $.
249: 
250: The term likelihood ratio results from signal detection theory in which context
251:  the term ``a posteriori PDF'' refers to the PDF
252: which we call likelihood in the context of predictions, and vice versa. This is
253: due to the fact that the aim of signal detection is to identify a signal
254: which was already observed in the past, whereas predictions are made about
255: future events. Thus the
256: ``likelihood ratio'' is in our case in fact a ratio of the posterior PDFs, as
257: defined by
258: %
259: \begin{eqnarray}
260: m & = & \frac{\Delta r_c}{\Delta r_f} \sim \left. \frac{\rho({\mathbf x}_{(n,k)}|X)}{\rho({\mathbf x}_{(n,k)}|\overline{X})}
261: \right|_{\delta \approx 0} + \mathcal{O}(\delta)\label{defm},
262: \end{eqnarray}
263: % 
264: where $\rho({\mathbf x}_{(n,k)}|\overline{X})$ denotes the a posteriori PDF for non-events.
265: However, we will use the common name likelihood ratio throughout the text.
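As a brief sketch of why the slope near the origin reduces to this ratio (assuming that both posterior PDFs are continuous and non-zero at ${\mathbf x}_{pre}$, and that the rates are the integrals of the respective posterior PDFs over $V_{pre}(\delta)$): the volume of $V_{pre}(\delta)$ is $\delta^k$, so that to leading order in $\delta$
\begin{eqnarray}
r_c(\delta) & \simeq & \rho({\mathbf x}_{pre}|X)\,\delta^k, \nonumber\\
r_f(\delta) & \simeq & \rho({\mathbf x}_{pre}|\overline{X})\,\delta^k, \nonumber
\end{eqnarray}
and the common factor $\delta^k$ cancels in the ratio $r_c/r_f$.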
266: 
267: The {likelihood ratio} can be
268: expressed in terms of the likelihood $\rho\bigl(X|{\mathbf x}_{(n,k)}\bigr)$ and the total probability to find events $\rho\bigl(X\bigr)$
269: %
270: \begin{eqnarray}
271: m({\mathbf x}_{(n,k)},X) & \sim & \frac{\Bigr(1-\rho(X)\Bigr)}{\rho(X)}
272: \frac{\rho(X|{\mathbf x}_{(n,k)})}{\Bigl(1 - \rho(X|{\mathbf x}_{(n,k)})\Bigr)}. \label{mint}
273: \end{eqnarray}
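The intermediate step behind Eq.\ (\ref{mint}) can be sketched as follows, treating $\rho(X)$ and $\rho(X|{\mathbf x}_{(n,k)})$ as the probabilities of an event, so that their complements describe non-events. Applying Bayes's theorem to events and to non-events separately gives
\begin{eqnarray}
\rho({\mathbf x}_{(n,k)}|X) & = & \frac{\rho(X|{\mathbf x}_{(n,k)})\,\rho({\mathbf x}_{(n,k)})}{\rho(X)}, \nonumber\\
\rho({\mathbf x}_{(n,k)}|\overline{X}) & = & \frac{\Bigl(1-\rho(X|{\mathbf x}_{(n,k)})\Bigr)\,\rho({\mathbf x}_{(n,k)})}{1-\rho(X)}. \nonumber
\end{eqnarray}
Dividing the first expression by the second, the marginal PDF $\rho({\mathbf x}_{(n,k)})$ cancels and Eq.\ (\ref{mint}) remains.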
274: %
275: If we assume that the events we are observing are quite rare and hence $\rho(X),
276: \rho(X|{\mathbf x}_{(n,k)}) \ll 1$, the likelihood ratio is approximately given by
277: 
278: \begin{eqnarray}
279: m ({\mathbf x}_{(n,k)},X)&\sim& \frac{\rho(X|{\mathbf x}_{(n,k)})}{\rho(X)} =
280: \frac{\rho({\mathbf x}_{(n,k)}|X)}{\rho({\mathbf x}_{(n,k)})} 
281: %%%=\frac{\rho({\mathbf x}_{(n,k)},X)}{\rho({\mathbf x}_{(n,k)})\rho(X)}
282: \quad \label{shortm}
283: \end{eqnarray}
284: Eq.\ (\ref{shortm}) already suggests answers to questions {\bf (Q1)} and
285: {\bf (Q2)}, by considering $m ({\mathbf x}_{(n,k)},X)$ as a summary index.
286: {\bf ad (Q1):} This asymptotic form of the likelihood ratio allows us to
287: compare different strategies of prediction. Looking for the maximum
288: of $\rho({\mathbf x}_{(n,k)}|X)$ in ${\mathbf x}_{(n,k)}$, according to
289: strategy I, there is always the influence of the denominator
290: $\rho({\mathbf x}_{(n,k)})$ which will keep the likelihood ratio small, even if
291: $\rho({\mathbf x}_{(n,k)}|X)$ in ${\mathbf x}_{(n,k)}$ is maximized. This is
292: due to the fact that $\rho({\mathbf x}_{(n,k)}|X)$ cannot be large without
293: $\rho({\mathbf x}_{(n,k)})$ being large. Strategy II, which uses
294: the maximum of $\rho(X|{\mathbf x}_{(n,k)})$ in $ {\mathbf x}_{(n,k)} $ should thus be superior, since
295: the denominator $\rho(X)$ is independent of the chosen precursor. The examples
296: which are studied in Sec.\ \ref{AR1}, Sec.\ \ref{wind} and Sec.\ \ref{long} support this idea.
297: 
298: {\bf ad (Q2):}
299: According to Eq.\ (\ref{shortm}), the likelihood ratio is larger than unity, if
300: $\rho({\mathbf x}_{(n,k)},X) > \rho({\mathbf x}_{(n,k)})\rho(X) $, i.e., if $ {\mathbf x}_{(n,k)}$
301: and $X$ are correlated. This condition can be also written as $
302: \rho(X|{\mathbf x}_{(n,k)}) > \rho(X)$ or as $\rho({\mathbf x}_{(n,k)}|X) >
303: \rho({\mathbf x}_{(n,k)})$ using Bayes's theorem. The latter expression states
304: that the a posteriori PDF $\rho({\mathbf x}_{(n,k)}|X)$, i.e., the probability to
305: find the precursor prior to an event should be larger than the probability to
306: find the precursor prior to an arbitrary value.
307: Thus, the condition is
308: fulfilled by choosing the precursor in a reasonable way, e.g.,
309: using the maximum of $\rho({\mathbf x}_{(n,k)}|X)$  in ${\mathbf x}_{(n,k)}$ or the
310: maximum of $\rho(X|{\mathbf x}_{(n,k)})$.
311: 
312: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
313: \subsection{Definition of Extreme Increments \label{def}}
314:  In this contribution we will concentrate on extreme events which
315: consist in a sudden increase (or decrease) of the observed variable within a
316: few time steps. Examples of this kind of extreme events are the increases in
317: wind speed in \cite{Physa,Euromech}, but also stock market
318: crashes \cite{stocks,Sornette2} which consist in sudden decreases.
319: 
320: 
321: We define our extreme event by an increment $x_{n+1}-x_n$
322: exceeding a given threshold $d$
323: %
324: \begin{equation}
325:  x_{n+1}-x_n \geq   d, \label{e0}
326: \end{equation}
327: %
328: where $x_{n}$ and $x_{n+1}$ denote the observed values at two consecutive time
329: steps.
330: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
331: 
332: \subsection{Obtaining the analytic expression of the posterior PDFs \label{tfilter}}
333: 
334: A mathematical expression for a filter, which selects the PDF of our
335: extreme events out of the PDFs of the underlying stochastic process can be
336: obtained through the  Heaviside function $ \Theta( x_{n+1} - x_{n}
337: -d)$. This filter is then applied to the joint PDF of a stochastic process.
338: 
339: Since only the time steps $(x_n, x_{n+1})$ are of relevance for the filtering,
340: we can neglect all previous time steps and apply the filter simply to the
341: joint PDF for $(x_n, x_{n+1})$, which has then the form $\rho(x_n, x_{n+1}) =
342: \rho(x_n) \rho(x_{n+1}|x_n)$.
343:  This implies that we can regard all previous time-steps $x_0, x_1, ...,
344: x_{n-1}$, on which $\rho_n$ and $\rho_{n+1}$ might depend, as
345: parameters.
346: 
347: The joint PDF of the extreme events $\rho^\Theta(x_{n+1},x_n,d)$ can then be
348: obtained by multiplication with $\Theta( x_{n+1} - x_{n}-d)$. If the resulting
349: expression is non zero, the condition of the extreme event (\ref{e0}) is
350: fulfilled and for $x_{n+1}$ and $x_{n}$ the following relation holds:
351: % 
352: \begin{eqnarray}
353: x_{n+1} & = & x_{n} + d + \gamma \label{gammadef} \quad (\gamma \in  \mathbb R,
354: \gamma \geq 0) \quad. \label{gamma0}
355: \end{eqnarray}
356: %
357: Hence it is possible to express the joint probability density in terms of
358: $x_{n}$ or $x_{n+1}$ with the new random variable $\gamma$. We can then use the integral representation of the Heaviside function with appropriate
359: substitutions to obtain: 
360: %
361: \begin{eqnarray}
362: f^\Theta(x_{n+1},x_{n},d)&=& \rho(x_{n})\int_{0}^{\infty} \rho(x_{n} + d + \gamma|x_n)
363: \nonumber\\
364: &\it{}&\quad \delta((x_{n+1} - x_{n}
365: - d) - \gamma)\;d\gamma.\quad\label{int1} 
366: \end{eqnarray}
367: %
368: By normalizing this expression with the total probability $\rho_{\Theta} (d)$ to find extreme
369: events of size $d$ or larger
370: we obtain the joint PDF \joint of all values of $x_{n}$ and $x_{n+1}$ which are part of an extreme event.
371: Integrating the resulting joint PDF \joint over $x_{n+1}$ we find the
372: following expression for the marginal distribution, i.e., the a posteriori PDF: 
373: %
374: \begin{eqnarray}
375:  \rho(x_n|X(d))
376:  & = &  
377: \frac{\rho(x_{n})}{\rho_{\Theta}(d)}\int_{0}^{\infty}d\gamma\;\rho(x_{n}
378: + d + \gamma|x_n). \nonumber\\ \label{marginal}
379: \end{eqnarray}
380: %
381: 
382: Analogously $\rho(x_{n}|\overline{X(d)})$ denotes the a posteriori PDF to
383: observe the value $x_{n}$ before a non-event, i.e., before an increment which
384: is smaller than $d$.
385: 
386: %
387: \begin{eqnarray}
388: \rho(x_{n}|\overline{X(d)}) & = &
389: \frac{\rho(x_{n})}{(1-\rho_{\Theta}(d))}\int_{-\infty}^{\infty} dx_{n+1}\; \Bigl(1- \nonumber\\
390: &\it{}& \;\;\;\;\Theta(x_{n+1}-x_{n}-d)\Bigr)\, \rho_{n+1}(x_{n+1}|x_{n}). \nonumber \\ 
391: \end{eqnarray}
392: %
393: 
394: If for a given process the joint PDF of two consecutive events is known, we can hence analytically
395: determine $\rho(x_n|X(d))$, $\rho(x_{n}|\overline{X(d)})$ and
396: $\rho_{\Theta}(d)$.
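For instance, the normalization constant can be written out explicitly. As a sketch, requiring that Eq.\ (\ref{marginal}) integrates to unity over $x_n$ fixes
\begin{eqnarray}
\rho_{\Theta}(d) & = & \int_{-\infty}^{\infty} dx_n\; \rho(x_n) \nonumber\\
&\it{}& \quad \int_{0}^{\infty} d\gamma\; \rho(x_n + d + \gamma|x_n), \nonumber
\end{eqnarray}
which is the total probability that an increment is of size $d$ or larger.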
397: 
398: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
399: \section{Extreme increments in the AR(1) model \label{AR1}}
400: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
401: \subsection{AR(1) model}
402: %%%%%%%%%%%%%%%%%%%%%
403: \begin{figure}[t!!!]
404: \includegraphics[width=6cm, angle= -90]{figure1.eps} 
405: \caption[]{\small \label{fig:data+} (Color online)
406: Parts of the time series of the AR(1) process for different values of $a$.}
407: \end{figure}
408: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
409: We assume that the time-series $\{x_n\}$ is generated by an auto-regressive model of order 1
410: (AR(1)) (see e.g., \cite{Box-Jen}) 
411: %
412: \begin{equation}
413: x_{n+1}= a x_n + \xi_n,
414: \end{equation}
415: %
416: where $\xi_n$ are uncorrelated Gaussian random numbers with unit variance and $-1 < a < 1$ is a constant
417: which represents the coupling strength. The size and the sign of the coupling
418: strength determine whether successive values of $x_n$ are clustered or
419: spread, as illustrated in Fig.\ \ref{fig:data+}. 
420: 
421: In the case $a =0$ the process reduces to uncorrelated random numbers with
422: mean $\mu =0$ and variance $\sigma^2 =1$, whereas generally the process is
423: exponentially correlated $\langle x_n x_{n+k}\rangle = a^k <1 $ and has the
424: marginal PDF
425:  \begin{equation}
426: \rho(x_n) = \sqrt{\frac{1-a^2}{2\pi}}\exp\left(-\frac{1-a^2}{2}{x_n}^2 \right).
427: \end{equation}  
428: Since the size of the events is naturally measured in units of the standard deviation $\sigma(a)$ we introduce a new scaled variable $\eta =
429: \frac{d}{\sigma(a)} =d\sqrt{1-a^2}$.
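The value of $\sigma(a)$ can be checked in one line. As a sketch, assuming stationarity and that $\xi_n$ is independent of $x_n$, squaring and averaging the AR(1) equation gives
\begin{eqnarray}
\sigma^2 & = & a^2 \sigma^2 + 1 \quad \Rightarrow \quad \sigma^2(a) = \frac{1}{1-a^2}. \nonumber
\end{eqnarray}
For example, $a=0.75$ yields $\sigma^2 = 1/0.4375 \approx 2.29$ and $\sigma \approx 1.51$, so that $\eta = 2$ corresponds to a threshold $d = \eta\,\sigma(a) \approx 3.02$.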
430: 
431:  Applying the filter mechanism developed in Sec.
432: \ref{tfilter} we obtain the following expressions for the posterior PDF of extreme events and the posterior PDF of non-extreme events
433: %
434: \begin{eqnarray}
435: \rho(x_n|X(\eta),a)
436: &=&\frac{\sqrt{1-a^2}}{2\sqrt{2\pi}\rho^{\Theta}(a,\eta)}\exp\left(-\frac{1-a^2}{2}x_{n}^2\right)
437: \nonumber\\
438: &\it{}&\quad \mbox{erfc}\left(\frac{(1-a)x_{n}}{\sqrt{2}} +
439: \frac{\eta}{\sqrt{2}\sqrt{1-a^2}}\right), \label{marga}\\
440: \rho(x_n|\overline{X(\eta)},a)
441: &=&\frac{\sqrt{1-a^2}}{2\sqrt{2\pi}(1-\rho^{\Theta}(a,\eta))}\exp\left(-\frac{1-a^2}{2}x_{n}^2\right)
442: \nonumber\\
443: &\it{}& \left( 1 + \mbox{erf}\left(\frac{(1-a)x_{n}}{\sqrt{2}} +
444: \frac{\eta}{\sqrt{2}\sqrt{1-a^2}}\right)\right).\nonumber\\
445: \label{antima}
446: \end{eqnarray}
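The complementary error function in these expressions can be traced back to the Gaussian noise. As a sketch, the AR(1) equation with unit-variance Gaussian $\xi_n$ implies the conditional PDF
\begin{eqnarray}
\rho(x_{n+1}|x_n) & = & \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x_{n+1}-a x_n)^2}{2}\right), \nonumber
\end{eqnarray}
so that the integral over $\gamma$ in Eq.\ (\ref{marginal}) becomes
\begin{eqnarray}
\int_{0}^{\infty} d\gamma\; \rho(x_n + d + \gamma|x_n) & = & \frac{1}{2}\,
\mbox{erfc}\left(\frac{(1-a)x_n + d}{\sqrt{2}}\right), \nonumber
\end{eqnarray}
with $d = \eta/\sqrt{1-a^2}$. Multiplying by $\rho(x_n)$ and dividing by the total probability of events reproduces the form of Eq.\ (\ref{marga}).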
447: %
448: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
449: \subsection{Determining the precursor value\label{Det}}
450: Because of the Markov-property of the AR(1) model the probability for an event
451: at time $n+1$ depends only on the last value $x_n$, hence $k=1$ in Eq.\ (\ref{precursor}). Thus, we give an alarm for an extreme
452: event when an observed value $x_n$ is in an interval $ V_{pre}  =  [x_{pre} - \delta/2,x_{pre} + \delta/2] ; \label{I1}$
453: around the precursor value $x_{pre}$. We compute the precursor values $x_{I}$ and $x_{II}$ defined by Eq.\ (\ref{precursor}) according to the strategies
454: described in Sec.\ \ref{pre}. 
455: 
456: The maximum ${x_I}$ of \ma is given by the solution of the transcendental equation
457: %
458: \begin{eqnarray}
459: {x_I}(\eta) & = & -\frac{\sqrt{2}}{\sqrt{\pi} (1+a)} \frac{\exp \left(-
460: \frac{1}{2}\left((1-a){x_I}  +\frac{\eta}{\sqrt{1-a^2}}\right)^{2}
461: \right)}{\mbox{erfc}\left({\frac{(1-a){x_I}}{\sqrt{2}} +
462: \frac{\eta}{\sqrt{2}\sqrt{1-a^2}}}\right)}. \nonumber \\ \label{trans}
463: \end{eqnarray}
464: %
465: Inserting the asymptotic expansion for large arguments of the complementary error function
466: %
467: \begin{eqnarray}
468: \erfc(z) &\sim &  \frac{\exp(-z^2)}{\sqrt{\pi}z} \left(1 + \sum_{m=1}^{\infty}
469: (-1)^m \frac{1 \cdot 3 ...(2m-1)}{(2z^2)^m }\right),\nonumber\\ 
470: &\it{}& \left( z \rightarrow \infty, |
471: \mbox{arg} z | < \frac{3\pi}{4} \right)\label{erfcapprox}
472: \end{eqnarray}
473: %
474: which can be found in \cite{Abram}, we obtain:
475: %
476: \begin{eqnarray}
477: {x_{I}}(\eta) &\sim & \frac{-\eta}{2\sqrt{1-a^2} \left( 1 + 
478: \mathcal{O}
479: \left(\frac{1}{\eta ^2} \right)\right)}, \label{dhalbe}
480: \quad (\eta \rightarrow \infty).
481: \end{eqnarray}
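The leading-order step can be sketched as follows, writing $z = \bigl((1-a)x_I + d\bigr)/\sqrt{2}$ with $d = \eta/\sqrt{1-a^2}$ (the shorthand $z$ is introduced here only for this illustration). Setting the derivative of Eq.\ (\ref{marga}) with respect to $x_n$ to zero gives the stationarity condition
\begin{eqnarray}
-(1+a)\,x_I\; \mbox{erfc}(z) & = & \sqrt{\frac{2}{\pi}}\, \exp(-z^2). \nonumber
\end{eqnarray}
Keeping only the first term of Eq.\ (\ref{erfcapprox}), i.e., $\exp(-z^2)/\mbox{erfc}(z) \simeq \sqrt{\pi}\,z$, this reduces to
\begin{eqnarray}
-(1+a)\,x_I & \simeq & (1-a)\,x_I + d \quad \Rightarrow \quad x_I \simeq -\frac{d}{2}, \nonumber
\end{eqnarray}
which is the leading term $-\eta/(2\sqrt{1-a^2})$.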
482: %
483: %%%%%%%%%%%%%%%%
484: \begin{figure}[t!!!]
485: \includegraphics[width=6cm, angle= -90]{figure2.eps} 
486: \caption[]{\small\label{fig:marginala-} (Color online)
487: The a posteriori PDFs for the AR(1) process for different values of $a<0$ and
488: $\eta$. The vertical lines represent the means. The PDFs become asymmetric for
489: $a \rightarrow -1$. (For $a=-0.99$ and $\eta \rightarrow \infty$ the marginal
490: PDFs become very flat and can hence not be distinguished from the x-axis in this figure).}
491: \end{figure}
492: %%%%%%%%%%%%%%%
493: Fig.\ \ref{fig:marginala-}  shows the posterior PDFs \ma according to
494: Eq.\ (\ref{marga}) for different values of $a$ and $\eta$. One can see
495: that the maximum of \ma moves  towards $- \infty$ with increasing size of
496: $\eta$ and $a \rightarrow 1$.
497: % Note that  \ma becomes asymmetric if  $a
498: %\rightarrow -1$ and its variance increases immensely if $|a| \rightarrow
499: %1$.
500: Although we can always formally define the maximum $x_I$ and the
501: mean~$\langle x_n \rangle$ as precursor values, one can argue that the
502: maximum of the distribution has no predictive power if  $a \rightarrow
503: 1$. Since the variance of the posterior PDF increases immensely in this limit, the value of \ma in its
504: maximum does not considerably differ from the values in any other
505: point.
506: 
507: For large values of $\eta$ we can also assume that the maximum and the mean of \ma nearly coincide, i.e., 
508: %
509: \begin{equation}
510:  \langle x_n \rangle  \simeq {x_{I}} \sim  
511: \frac{-\eta}{2\sqrt{1-a^2} \left(1 + \mathcal{O} \left(\frac{1}{\eta^2} \right)\right)},\quad (\eta \rightarrow \infty), \label{dhalbe2}
512: \end{equation}
513: %
514: provided that \ma is not too asymmetric (i.e., $a$ is not close to $-1$). 
515: In the numerical tests in Sec.\ \ref{AR1_1} we will hence use the mean of the posterior
516: PDF as a precursor for strategy I, since it can be calculated explicitly by
517: evaluating the corresponding integral.
518: 
519: 
520: In order to determine $x_{II}$, the precursor for strategy II, we have to find
521: the maximum in $x_n$ of the likelihood 
522: %
523: \begin{eqnarray}
524: \rho(X(\eta)|x_n,a) 
525: & = &\frac{1}{2} \erfc\left(
526: \frac{(1-a)x_n}{\sqrt{2}} +
527: \frac{\eta}{\sqrt{2}\sqrt{1-a^2}}\right). \nonumber\\
528: \label{erfcII}
529: \end{eqnarray}
530: %
531: Since the complementary error function is a monotonically decreasing function of
532: $x_n$, we see that we do not have a well-defined maximum~$x_{II}$ (we will
533: thus formally write $x_{II}=-\infty$) and that
534: the interval $V_-=\left( -\infty,x_- \right]$ with the upper limit $x_{-}$ represents the interval for raising alarms according to
535: strategy II.
536: 
537: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
538: \subsection{Testing the Performance of the Precursors \label{AR1_1}}
539: 
540: In order to test for the predictive power of the precursors specified above,
541: we  used two different methods to create
542: ROC-curves (see Sec.\ \ref{roc}). The first method consists in evaluating the integrals which lead to the rate
543: of correct and false predictions
544: \begin{eqnarray}
545: r_c(x_{pre},\eta,\delta) & = & \int_{V(\delta)} dx_{n}
546: \;\rho(x_{n}|X(\eta),a), \label{rcar} \\
547: r_f(x_{pre},\eta,\delta) & = & \int_{V(\delta)} dx_{n} \;\rho(x_{n}|\overline{X(\eta)},a). \label{rfar}
548: \end{eqnarray}
549: The second method consists in simply performing predictions on a time
550: series of $10^7$ AR(1) data and counting the number of extreme increments,
551: which could be predicted by using the precursors specified above.
552: %
553: For
554: different values of the correlation coefficients the data sets contained the
555: following numbers of extreme increments:
556: \begin{center}
557: \begin{tabular}{|c||c|c|c|c|}
558: \hline 
559:  & \multicolumn{4}{c|}{ number of increments of size}\\ \hline
560: $a$ & $\eta \geq 0$ & $\eta \geq 2$ & $\eta \geq 4$ & $\eta \geq 8$\\ \hline
561: \hline
562: -0.99 & 5000059 & 1579103 & 222858 & 310 \\ \hline
563: -0.75 & 5000563 & 1425146 & 162405 & 107 \\ \hline
564: 0 & 5000417 & 786355 & 23370 & 0 \\ \hline
565: 0.75 & 5000818 & 23377 & 0 & 0 \\ \hline
566: 0.99 & 5001081 & 0 & 0 & 0 \\ \hline
567: \end{tabular}
568: \end{center}
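As a plausibility check on these counts (a sketch that is not part of the derivation above, assuming stationarity and that $\xi_n$ is independent of $x_n$): the increment $x_{n+1}-x_n = \xi_n - (1-a)x_n$ is Gaussian with zero mean and variance $1 + (1-a)^2/(1-a^2) = 2/(1+a)$, so the expected fraction of increments of size $\eta$ or larger is
\begin{eqnarray}
\frac{1}{2}\,\mbox{erfc}\left(\frac{d\sqrt{1+a}}{2}\right) & = &
\frac{1}{2}\,\mbox{erfc}\left(\frac{\eta}{2\sqrt{1-a}}\right). \nonumber
\end{eqnarray}
For $a=0$ and $\eta = 2$ this gives $\frac{1}{2}\mbox{erfc}(1) \approx 0.0786$, i.e., about $7.9 \times 10^5$ of the $10^7$ data points. For $a=0$, $\eta=4$ and for $a=0.75$, $\eta=2$ the argument equals $2$ in both cases, giving $\frac{1}{2}\mbox{erfc}(2) \approx 0.0023$, i.e., about $2.3 \times 10^4$ increments. Both estimates are in line with the corresponding table entries.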
569: %
570: In all cases where the AR(1) correlated data sets contain increments, the empirically determined rates agree very well with the rates
571: obtained via the evaluation of Eqs.\ (\ref{rcar}) and (\ref{rfar}). For those
572: values of $a$ and $\eta$, which were not accessible for the numerical test, we evaluated the integrals in Eqs.\ (\ref{rcar}) and (\ref{rfar}).   
573: 
In the numerical tests for both strategies, and also in the evaluation of the
integrals in Eqs.\ (\ref{rcar}) and (\ref{rfar}) according to strategy I, the
size of the precursory volume ranged from $10^{-6}$ to $4$,
measured in units of the standard deviation of the marginal PDF of the AR(1)
process, $\sigma (a) = 1/ \sqrt{1-a^2}$. As precursors according to strategy I
we used the means of the a posteriori PDF. For the empirically created
ROC-plots according to strategy II we used the smallest values of the data
sets as precursors.
582: 
The evaluation of the integrals in Eqs.\ (\ref{rcar}) and
(\ref{rfar}) was done in a slightly different way for strategy II. Since there
were no events in the data sets for certain values of $a$ and $\eta$ (as indicated
in the table above), one could argue that the data sets also did not
contain any precursor. From the previous section, we know that the theoretical precursor value according to
strategy II should be $x_{II} =- \infty$. Thus, we used a sufficiently small
value as a precursor and adjusted the size of the prediction interval in order
to capture all events. The resulting ROC-curves for strategy II nevertheless
coincided with the curves obtained empirically, as far as the latter were available.
592: %%%%%%%%%%%%%%%%%%%%%%%% 
593: \begin{figure}[t!!!]
594: \includegraphics[width=6cm, angle= -90]{figure3.eps} 
595: \caption[]{\small \label{fig:newstraroc} (Color online)
The ROC-curves obtained for the precursors of strategies I and II. The lines
represent the results of strategy I, the symbols correspond to predictions
made according to strategy II. In both cases the predictions were made within
$10^7$ AR(1)-correlated data points. For the values of $\eta$ and $a$ where the
data sets contained no extreme increments, we created the ROC-curves by
evaluating the integrals in Eqs.\ (\ref{rcar}) and (\ref{rfar}).}
602: \end{figure}
603: %%%%%%%%%%%%%%%%%%%%%%%%%%%%
604: 
605: The resulting ROC-curves in Fig.\ \ref{fig:newstraroc} display the following
606: properties:
607: 
608: {\bf ad (Q1):}\hspace{3mm} The predictions according
609: to strategy II are better than the predictions according to strategy I for all
610: values of $a$ and $\eta$.
611: %% A detailed discussion of this phenomenon will be
612: %%provided in Sec.\ \ref{analyse}.
613: 
614: {\bf ad (Q2):}\hspace{3mm}
615: The ROC-curves display an increase of the quality of our
616: prediction with increasing size of the events $\eta$.
617: % Explanations for this
618: %effect will be provided by the asymptotic expression for the likelihood ratio %in Sec.\ \ref{analyse}.
619: 
620: 
621: {\bf ad (Q3):}\hspace{3mm}
The ROC-curves in Fig.\ \ref{fig:newstraroc} show that the quality of the
predictions increases with decreasing correlation strength $a$.
In particular for $a=0$, when the predictions were made within completely
uncorrelated random numbers, the ROC-curves are far better
than the ROC-curve of any random prediction.
 This is in agreement
with results reported %% by Sornette et\ al.\ 
in \cite{Sornette1} for
the prediction of signs of increments in uncorrelated random numbers, i.e.,
the case ($a=0$, $\eta =0$).
632: %Theprediction of the sign of an increment within uncorrelated random numbers
633: %corresponds to the special case $\eta =0$ and $a=0$ of the AR(1) process we di%scuss here.
634: 
Intuitively, the result for {\bf (Q3)} can be understood easily by considering that increments are
not independent of the last observation. More precisely, $x_{n+1} - x_n =
(a-1)x_n + \xi_n$, so that the known part of the increment, $(a-1)x_n$, is
larger in magnitude the smaller $a$ is.
In other words: if we consider a very small value of $x_n$ (small compared to
the mean) in an uncorrelated process, the probability that the next value will
be closer to the mean and hence lead to a large increment is high. Positive correlation hinders this effect,
since it causes successive values to be closer to each other.
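
As a rough numerical illustration of this argument (a sketch under stated assumptions, not a result taken from the simulations), assume that $\xi_n$ is Gaussian with unit variance and that an event of size $\eta$ means $x_{n+1}-x_n \geq \eta\,\sigma(a)$ with $\sigma(a)=1/\sqrt{1-a^2}$, consistent with the argument of the error functions in Eq.\ (\ref{ratio}). The symbol $P$ is introduced only for this remark. For a last observation $x_n=-2\sigma(a)$ and $\eta=2$, the event requires $\xi_n \geq \eta\,\sigma(a) + (1-a)x_n = 2a\,\sigma(a)$, so that
\begin{eqnarray*}
P\left(X(\eta)\,|\,x_n\right) & = & \frac{1}{2}\,\erfc\left(\frac{2a}{\sqrt{2}\sqrt{1-a^2}}\right) \\
& \approx & \left\{ \begin{array}{ll} 0.988, & a=-0.75,\\ 0.5, & a=0,\\ 0.012, & a=0.75. \end{array}\right.
\end{eqnarray*}
The same relative excursion below the mean thus announces an increment of size $\eta=2$ almost with certainty for $a=-0.75$, but only in about one percent of the cases for $a=0.75$.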
643: %%An increment 
644: %%results in this cases from the random part $\xi_n$ and is therefore less
645: %%predictable. Hence the predictability decreases with increasing correlation st%%%rength. 
646: 
647: A formal explanation of the results {\bf (Q1)-(Q3)} is also given by an
648: asymptotic expression for the slope~$m(a,\eta, x_{pre})$ in the following section.
649: %\end{list}
650: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
651: \subsection{Analytical discussion of the Precursor Performance
652: \label{analyse}}
In this section, we try to understand the effects shown by the ROC-curves
in the previous section in more detail. To this end, we evaluate the asymptotic structure of the likelihood ratio as defined by Eq.\ (\ref{defm}) for different scenarios.
655: 
656: In the case of the AR(1) process the slope of the ROC-curve in the vicinity of
657: the origin is given by 
658: % 
659: \begin{eqnarray}
660: m(a,\eta, x_{pre}) \sim \quad
661: \frac{\left(1-\rho_{\Theta}(\eta)\right)}{\rho_{\Theta}(\eta)}\, r\bigl(x_{pre},\eta\bigr) ,\label{mAR}\\
662: \mbox{with}\quad r\bigl(x_{pre},\eta\bigr) =  \frac{\erfc\left(\frac{(1-a)x_{pre}}{\sqrt{2}} +
663: \frac{\eta}{\sqrt{2}\sqrt{1-a^2}}\right)}{1
664: +\mbox{erf}\left(\frac{(1-a)x_{pre}}{\sqrt{2}} +
665: \frac{\eta}{\sqrt{2}\sqrt{1-a^2}}\right)}.  \label{ratio}
666:  \end{eqnarray} 
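A brief consistency check may help to read Eq.\ (\ref{ratio}). It is a sketch which assumes Gaussian noise $\xi_n$ of unit variance and events defined by $x_{n+1}-x_n \geq \eta/\sqrt{1-a^2}$; the symbol $P$ is introduced only for this remark. Abbreviating the argument of the error functions by $z$, the probability that an event follows the observation $x_n=x_{pre}$ is $P(X(\eta)|x_{pre}) = \frac{1}{2}\erfc(z)$, and since $1+\mbox{erf}(z) = 2-\erfc(z)$,
\begin{eqnarray*}
r\bigl(x_{pre},\eta\bigr) & = & \frac{\frac{1}{2}\erfc(z)}{1-\frac{1}{2}\erfc(z)} \\
& = & \frac{P(X(\eta)|x_{pre})}{1-P(X(\eta)|x_{pre})}.
\end{eqnarray*}
Read this way, $r$ is the odds of an event given the precursor, and Eq.\ (\ref{mAR}) has the form of Bayes' theorem for the ratio of the two conditional PDFs.
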
667: %
668: {\bf ad (Q1):}\hspace{3mm}
We will first consider the behavior of the precursor according to strategy
II. As we saw in Sec.\ \ref{Det}, the optimal precursor value of strategy II is
the limiting case $x_{II}=-\infty$.

Since  
$\lim_{x_{pre} \rightarrow -\infty} r\bigl(x_{pre},\eta\bigr) =
\infty $ we find $\lim_{x_{pre} \rightarrow -\infty} m (a,\eta, x_{pre}) = \infty$. Thus, we should expect ROC-curves made with $x_{II} =-\infty$ to be tangent to the vertical axis of the ROC-plot at the origin and hence to represent ideal
predictability for all event sizes and all possible correlation
strengths. However, for any finite precursor value of strategy I or strategy II we find non-ideal ROC-curves.
678: 
679: %%%%%%%%%%%%%%%%%%%%%%
680: \begin{figure}[t!!!]
681: \includegraphics[width=5.5cm, angle= -90]{figure4.eps} 
682: \caption[]{\small\label{fig:dreids} (Color online) \ma and \antima for $a
683: =-0.75$. The maximum of the posterior PDF to observe extreme events \ma which
684:    is used as precursor, moves towards $-\infty$ with increasing $\eta$
since $x_I \sim -\eta/(2\sqrt{1-a^2})$. Because the maximum of the failure
686: posterior PDF \antima remains at the origin, the values of
687: \antima which are observed at the precursor value ${x_{I}}$ decrease
688: according to the decrease of \antima as $x_n \rightarrow -\infty$.   
689: }
690: \end{figure}
691: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
692: 
693: Another way to understand the superiority of strategy II is to analyze the asymptotic behavior of the rate of correct predictions
694: \ma and the rate of false alarms,
695: \antima at the precursor value of
696: strategy I. For the following calculations we use an
697: approximation for the total probability to observe extreme events
698: %
699: %\begin{widetext}
700: \begin{eqnarray}
701: \rho_{\Theta} (\eta,a) & \sim &
702: \frac{\sqrt{1-a}}{\sqrt{\pi} }\;\frac{1}{\eta}\;
703: \exp\left(-\frac{\eta^2}{4(1-a)}  \right)\nonumber\\
704: &\it{}& 
705: \left( 1 +  \mathcal{O} \left(\frac{1}{\eta^2} \right)\right), \quad (\eta
706: \rightarrow \infty), \label{P}
707: \end{eqnarray}
708: %\end{widetext}
709: which is derived in Appendix \ref{Ap1}.
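
The leading order of Eq.\ (\ref{P}) can be made plausible by a short calculation, sketched here under the assumptions that the noise of the AR(1) process is Gaussian with unit variance and that the event is defined by $x_{n+1}-x_n\geq \eta\,\sigma(a)$; it does not replace the derivation in the appendix. The increment is then Gaussian with zero mean and variance $2\sigma^2(a)(1-a) = 2/(1+a)$, so that
\begin{eqnarray*}
\rho_{\Theta}(\eta,a) & = & \frac{1}{2}\,\erfc\left(\frac{\eta}{2\sqrt{1-a}}\right)\\
& \sim & \frac{\sqrt{1-a}}{\sqrt{\pi}}\,\frac{1}{\eta}\,\exp\left(-\frac{\eta^2}{4(1-a)}\right),\\
& & \quad (\eta\rightarrow\infty),
\end{eqnarray*}
where the second line uses $\erfc(u)\sim \exp(-u^2)/(u\sqrt{\pi})$ for $u\rightarrow\infty$. For $a=0$ and $\eta=2$ the first line gives $\frac{1}{2}\erfc(1)\approx 0.0786$, i.e., about $7.9\times 10^5$ events among $10^7$ data points, which is close to the $786355$ increments listed in the table of Sec.\ \ref{AR1_1}.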
710: 
711: Inserting the asymptotic expression  for $\rho_{\Theta}(\eta,a)$, the approximation of $x_I$ in
712: Eq.\ (\ref{dhalbe2}) and the asymptotic expansion of the complementary error function Eq.\ (\ref{erfcapprox})  into Eqs.\ (\ref{marga}) and
713: (\ref{antima}), we find the following expressions
714: %
715: \begin{eqnarray}
716: \rho \left(x_I|X(\eta),a \right) 
717: &\sim& \frac{\sqrt{1-a^2}\sqrt{1+a}}{\sqrt{\pi}} \frac{\left(1+\mathcal{O}\left(\frac{1}{\eta^2} \right)\right)}{\left(1+a +
718: \mathcal{O}\left(\frac{1}{\eta^2} \right)\right)},\nonumber\\
719: &\it{}&\quad  (\eta \rightarrow \infty).
720:  \label{qasym}  \\
\rho\left(x_I|\overline{X(\eta)},a\right) & \sim &
722: \frac{\sqrt{1-a^2}}{\sqrt{2\pi}} \exp \left( -\frac{\eta^2}{8} \frac{1}{\left( 1 + 
723: \mathcal{O}\left(\frac{1}{\eta^2}
724: \right)\right)}\right) ,\nonumber\\
725: &\it{}& \quad
726: (\eta \rightarrow \infty)\quad .  \label{qbarapp}
727: \end{eqnarray}
Hence the value of \ma at the precursor value approaches a constant for large
$\eta$, whereas the values of \antima decrease exponentially in this limit.
Fig.\ \ref{fig:dreids} illustrates this effect for
the case $a=-0.75$. The maximum of the failure PDF
remains at the origin for $\eta \rightarrow \infty$. Thus the
values of this PDF which are observed at the decreasing
precursor value $x_{I}\propto \frac{-\eta}{2\sqrt{1-a^2}}$ decrease
according to the shape of the distribution. This also explains the success of strategy II. Since the precursor value obtained by
strategy II is the smallest possible value, strategy II effectively focuses on the
minimization of the failure rate. Note that by ``minimization of
the failure rate'' we mean here a minimization of the integrand in Eq.\
(\ref{rfar}), while the alarm interval of size $\delta$ remains constant.
The fact that at this point the corresponding value of \ma is also far away
from the maximum of \ma does apparently not influence the outcome of the
prediction.
743: 
744: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
745: 
746: {\bf ad (Q2):}\hspace{3mm} In the following calculation we will obtain the
747: asymptotic form of the likelihood ratio for large events. 
748: %
749:  Inserting the asymptotic form of the probability $\rho_{\Theta}(\eta,a)$ provided by
750: Eq.\ (\ref{P}),
751: and using the asymptotic expansion of
752: the complementary error function in Eq.\ (\ref{erfcapprox}), the likelihood ratio reads 
753: %
754: \begin{widetext}
755: \begin{eqnarray}
756: m(a,\eta, x_{pre}) & \sim & \frac{1}{2\sqrt{1-a}}\;\;  \frac{\eta\exp
757: \left(\frac{\eta^2}{4(1-a)} -z(\eta,a)^2 \right) \left(1+\mathcal{O} \left(\frac{1}{\eta^2} \right)
758: \right)} {z(\eta,a)\left(1 +
759: \mathcal{O}\left(\frac{1}{\eta^2} \right)\right) +
760: \mathcal{O}\left(\exp(-z(\eta,a)^2)\right)}
761: %\nonumber\\
762: %&\it{} & 
763: +\mathcal{O}\left(\frac{\exp(-z(\eta,a)^2)}{z} \right),\nonumber\\
764: &\it{} & \quad \quad (\eta \rightarrow \infty), \;(z(\eta,a)
765: \rightarrow \infty)
766: %\nonumber\\ 
767: \quad \quad \mbox{with} \quad z(\eta,a)  = \frac{(1-a)}{\sqrt{2}}x_{pre} + \frac{\eta}{\sqrt{2}\sqrt{1-a^2}}.\label{m3}
768: \end{eqnarray} 
769: \end{widetext}
770: %
771: Note that the limit $z(\eta,a)\rightarrow \infty$ corresponds to the
772: limit $\eta \rightarrow \infty$ in the context of (Q2), but we can
773: also interpret it as the limit $a \rightarrow \pm 1$ in the context of (Q3) if
774: $\eta\neq0$.
775: 
776: The expression in Eq.\ (\ref{m3}) tends to infinity in the limit $\eta
777: \rightarrow \infty$, if the argument of the exponential function in Eq.\ (\ref{m3})
778: %
779: {\small 
780: \begin{eqnarray}
781: f(x_{pre},a,\eta) & = & \frac{\eta^2}{4(1-a)} -\left( \frac{(1-a)x_{pre}}{\sqrt{2}} +
782: \frac{\eta}{\sqrt{2}\sqrt{1-a^2}}\right)^2, \nonumber\\\label{arg}
783: \end{eqnarray}
784: }
785: %
is positive. This is indeed the case for every precursor value $x_{pre}<0$. Therefore, for both strategies of prediction, the slope $m(a,\eta,x_{pre})$ increases as
a squared exponential with increasing size of the events $\eta$ according to
Eq.\ (\ref{m3}). Hence, the considerations of
Sec.\ \ref{roc} hold for our example, according to which an event is the more predictable
the rarer it is.
791: 
792: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
{\bf ad (Q3):}\hspace{3mm} One can also calculate the asymptotic behavior of
the likelihood ratio for $a \rightarrow \pm 1$. The limit
$z(\eta,a) \rightarrow \infty$, which is relevant for the asymptotic
form in  Eq.\ (\ref{m3}), can also be interpreted as the limit $a \rightarrow
\pm 1$. We assume that $\eta$ is large enough, e.g., $\eta >2$,
such that Eq.\ (\ref{P}), which enters into Eq.\ (\ref{m3}), is a useful approximation. One can now discuss again the argument of the exponential function in Eq.\ (\ref{arg}).
799: 
Inserting the precursor of strategy I (as given by Eq.\ (\ref{dhalbe2})), one obtains
$f(x_{I},a,\eta)  =   \frac{\eta^2}{8}$, hence
802: \begin{eqnarray}
803:  m(a,\eta, x_{I})&\rightarrow &  \sqrt{\frac{2}{1+a}}\exp
804: \left(\frac{\eta^2}{8}\right), 
805:  \quad  (z(\eta,a)
806: \rightarrow \infty).\nonumber\\ \label{straIa}
807: \end{eqnarray}
As $a\rightarrow 1$, this expression converges to  $\exp\left(\eta^2/8\right)$. As $a\rightarrow -1$, it
diverges as $m(a,\eta, x_{I}) \sim 1/\sqrt{1+a}$.
Fig.\ \ref{fig:adepm}(a) illustrates this behavior.
Fig.\ \ref{fig:adepm}(b) shows that the asymptotic expression in Eq.\ (\ref{straIa}) becomes
more accurate in the limit $\eta\rightarrow \infty$, since in this limit the higher
order terms of the approximation vanish even faster.
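
The value $f(x_I,a,\eta)=\eta^2/8$ and the prefactor of Eq.\ (\ref{straIa}) can be checked in a few lines. The following sketch assumes the large-$\eta$ approximation $x_I \approx -\eta/(2\sqrt{1-a^2})$ quoted above and keeps only the leading order of Eq.\ (\ref{m3}):
\begin{eqnarray*}
z(\eta,a)\big|_{x_{pre}=x_I} & = & \frac{\eta}{\sqrt{2}\sqrt{1-a^2}}\left(1-\frac{1-a}{2}\right)\\
& = & \frac{\eta\,(1+a)}{2\sqrt{2}\sqrt{1-a^2}},\\
z^2 & = & \frac{\eta^2(1+a)}{8(1-a)},\\
f(x_I,a,\eta) & = & \frac{\eta^2}{4(1-a)} - \frac{\eta^2(1+a)}{8(1-a)} = \frac{\eta^2}{8},\\
\frac{1}{2\sqrt{1-a}}\,\frac{\eta}{z} & = & \sqrt{\frac{2}{1+a}}.
\end{eqnarray*}
The last line is the prefactor that multiplies $\exp(\eta^2/8)$ in Eq.\ (\ref{straIa}).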
814: 
815: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
816: \begin{figure}[t!!!]
817: \includegraphics[width=6cm, angle= -90]{figure5.eps} 
818: \caption[]{\small\label{fig:adepm} (Color online)  The bold lines show the
819: dependence of the slope $m(x_I,a,\eta)$ on the coupling strength according to
Eq.\ (\ref{m3}). The thin lines display the asymptotic behavior given by
Eq.\ (\ref{straIa}). The constant lines represent the values obtained from
Eq.\ (\ref{straIa}) in the limit $a\rightarrow 1$. Panel (b) illustrates that
this asymptotic expression becomes more accurate in the limit $\eta\rightarrow
\infty$, since in this limit the higher order terms in the approximation vanish even faster.
825: }
826: \end{figure}
827: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
If the theoretical precursor of strategy II, $x_{II} = -\infty$, could be used, the slope would be independent of the coupling strength.
For any real precursor value
of strategy II, $x_{II} = \mbox{const.} <0$,
Eq.\ (\ref{arg}) reads
832: \begin{eqnarray}
833: f(x_{II},a,\eta) & \sim & \frac{\eta^2}{2(1-a)} \left( \frac{1}{2} - \frac{1}{1+a}
834: \right)\nonumber\\
835: &\it{}&  + \mathcal{O}\left((1-a)
836: \right), \quad(a \rightarrow 1).
837: \end{eqnarray}
This expression approaches a small negative value close to zero at the point $a
= 1$. Hence, we find $m(a,\eta,x_{II})  \sim 1$, as $a\rightarrow 1$.
840: 
841: In the limit $a \rightarrow -1$ and for any finite precursor value $x_{II}=\mbox{const.}<0$ Eq.\ (\ref{arg}) reads
842: \begin{eqnarray}
843: f(x_{II},a,\eta)  &\sim& \frac{\eta^2}{4} \left(\frac{1}{2} -
844: \frac{1}{1-a^2} \right) -\frac{2x_{II}\eta}{\sqrt{1-a^2}} -2x_{II}^2\nonumber\\
845: &\sim& -\frac{1}{1-a^2}\frac{\eta^2}{4} -\frac{2x_{II}\eta}{\sqrt{1-a^2}} -2x_{II}^2.
846: \end{eqnarray}
If the precursor is sufficiently small, e.g., $x_{II} <-\eta/(4\sqrt{1-a^2})$,
this expression is positive and hence $m(a,\eta,x_{II}) \rightarrow
\infty$, as $a \rightarrow -1$.
850: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
851: \begin{figure}[t!!!]
852: \includegraphics[width=6cm]{figure6.eps} 
853: \caption[]{\small\label{fig:addepende} (Color online) The asymptotic dependence of the slope
854: $m(x_I,a,\eta)$ on the coupling strength and the event size, if the
855: precursor of strategy I is used.}
856: \end{figure}
857: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Hence, the asymptotic expressions of the likelihood ratio are able to describe the
behavior of the ROC-curves shown in the previous
section. Fig.\ \ref{fig:addepende} combines the dependence of the likelihood ratio
on the event size and the correlation strength. One can see that the
influence of the event size on the likelihood ratio dominates as long as
one does not approach the singularity for $a \rightarrow -1$.
864: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
865: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
866: \section{Application: wind speed measurements}\label{wind}
867: 
As an illustration of the preceding considerations and also in order
to demonstrate the usefulness of the benchmarks derived for AR(1)
processes, we study here time series data of wind speed measurements. The
data were recorded at 30\,m above ground by a cup anemometer with a
sampling rate of 8 Hz at the Lammefjord site of the Ris\o\ research
center \cite{winddata}. Wind speed data are evidently non-stationary
and strongly correlated, so
that, e.g., the principle of persistence yields surprisingly accurate
forecasts: the very simple prediction scheme $\hat{x}_{n+1}=x_n$ is
almost as accurate as an AR(20) model fitted on moving windows (in
order to take non-stationarity into account) or order-10 Markov
chains \cite{Euromech}. The amplitude of the fluctuations around a time
local mean value is proportional to this mean value, i.e., there is
statistical evidence that the noise in this process is
multiplicative. However, when subtracting the time local mean (more
precisely, performing a high-pass filtering with a Gaussian kernel 
with a standard deviation of 75 time steps), we obtain data for which
it is reasonable to fit an AR(1) process. When doing so, we find a
coefficient $a\approx 0.94$.
887: 
Turbulent gusts, i.e., sudden increases of the
wind speed, are relevant events, e.g., for the safe operation of wind
turbines, for aircraft during take-off and landing, and for all
wind-driven sports activities. In previous work \cite{Physa} we were
therefore concerned with their prediction, where we studied the
performance of a Markov chain model. Here, we will restrict ourselves
to the simpler (and less appropriate) AR(1)-philosophy: The current
state of the process generating the wind time series is assumed to be
fully specified by the last observation $x_n$, and the event is
assumed to be characterized by an upward jump of the wind speed in a
single time step by more than $g$ m/s.
899: 
900: \subsection{Determining the precursor value}
901: If we extract from the data set
902: all subsequences of data where such a jump is present, then we can, in
903: principle, construct empirically the distribution $p(x_n|g)$, which
904: corresponds to $\rho({\mathbf x}_{(n,k)}|X)$ of strategy I. 
905: %%%%%%%%%%%%%%%%
906: \begin{figure}[h!!!]
907: \centerline{\includegraphics[width=8cm]{figure7.ps}}
908: \caption{\label{fig:gustprofiles} 
909: The profiles obtained from the mean of $p(x_{n+k}|g)$ for gust events of
910: amplitude $g$. Also shown is the theoretical 
profile for an AR(1) process with $a=0.94$.}
912: \end{figure}
913: %%%%%%%%%%%%%%%%%%%
914: %%%%%%%%%%%%%%%%%%%
915: \begin{figure}[h!!!]
916:    \centerline{\includegraphics[width=8cm]{figure8.ps}}
917: \caption{\label{fig:precursors2}
918: The profiles obtained from the maxima of $p(g|x_{n+k})$ for gust events of
919: amplitude $g$. Also shown is the theoretical profile for an AR(1) process
920: with $a=0.94$.}
921: \end{figure}
922: %%%%%%%%%%%%%%%%%
In Fig.\ \ref{fig:gustprofiles} we show instead the mean value of
$p(x_{n+k}|g)$ for $k=-20,\ldots,20$, i.e., we show the mean profile
of gusts of strength $g$. In other words, this is an average of all
those time series segments which (in shifted time) fulfill
$x_1-x_0>g$, so that the part of these segments with $k\le 0$ is what
one would naively call a precursor of a gust event. This has to be
compared to the values $x_{n+k}$ which we find when we focus on the
maximum $x_{II}$ in $x_n$ of $p(g|x_n)$, which corresponds to the conditional
probability $\rho(X|x_n)$ of strategy II.
More specifically, in Fig.\ \ref{fig:precursors2} we show the profiles $\langle
x_{n+k}\rangle|_{x_n=x_{II}}$, where $x_{II}$ is defined by
$p(g|x_{II})=\max_{x_{n}} p(g|x_n)$. Put differently, the value plotted at $k=0$ is the
value $x_n$ for which $p(g|x_n)$ is maximal, and at the preceding and succeeding
time steps we show the average over all time series segments which fulfill
$x_n=x_{II}$ within some precision. These profiles differ from the precursors shown
938: before, as we have to expect for an AR(1)-model: In a perfect AR(1) process,
939: the precursors equivalent to those in Fig.\ \ref{fig:gustprofiles} would show a
940: jump larger than
$g$ from $k=0$ to $k=1$, with $x_0=-x_1$, and with $x_k=a^{|k|} x_0$ for
$k<0$, and $x_k=a^{k-1} x_1$ for $k>1$. For the same idealized process, one
943: expects Fig.\ \ref{fig:precursors2} to show curves given by
$x_k=a^{|k|} x_{II}$ for all $k$. Evidently, the wind data show a qualitatively very
similar behavior, although additional correlations are visible.
946: 
947: \subsection{Testing for predictive power}
The ROC-curves for the two prediction strategies are shown in
Figs.\ \ref{fig:ROCwind1} and \ref{fig:ROCwind2}. As expected, the minimization
of false alarms (strategy II) is superior here, as strategy I has no predictive
power. The latter is consistent with the observed value $a\approx 0.94$ and
the results for the AR(1) process.
953: 
954: %%%%%%%%%%%%%%%%%%%%
955: \begin{figure}[h!!!]
956: \centerline{\includegraphics[width=6cm, angle =-90]{figure9.eps}}
957: \caption{\label{fig:ROCwind1} (Color online)
958: The ROC curves using strategy I, exploiting $p(x_n|X)$ and maximizing the hit
959: rate. Evidently, the rate of false alarms exceeds the hit rate.}
960: \end{figure}
961: %%%%%%%%%%%%%%%%%%%%%%%%%%
962: 
963: In order to compute the ROC-curves we use 
964: the following numerically expensive but 
965: theoretically best justified algorithm:
In theory, we want to generate an alarm if the current observation
$x_n$ lies in an interval $V$ which is defined as the
subset of ${\mathbb R}$ where either $p(g|x_n)$ or $p(x_n|g)$ exceeds
some threshold $0\le p_c \le 1$. We assume that both 
conditional PDFs are smooth in $x_n$.
971: 
We can locally approximate
$p(g|x_n)$ by searching all similar states 
$x_j$ with $|x_n-x_j|<\epsilon$ and counting the relative number 
of events in this set of states. When this number exceeds $p_c$, 
we give an alarm and can then check whether it is a hit or a false alarm.
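
In formulas, this local estimate can be written as follows. The notation $\hat{p}$ and the counting symbol $\#$ are introduced here only for illustration, and the index set over which $j$ runs is left open:
\begin{eqnarray*}
\hat{p}(g|x_n) & = & \frac{\#\{j:\; |x_n-x_j|<\epsilon,\; x_{j+1}-x_j>g\}}{\#\{j:\; |x_n-x_j|<\epsilon\}}.
\end{eqnarray*}
An alarm is given at time $n$ whenever $\hat{p}(g|x_n)$ exceeds $p_c$, and varying $p_c$ between 0 and 1 traces out the ROC-curve.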
977: 
In order to evaluate $p(x_n|g)$ we first create the set of all states $x_{e}$
which precede an event, and then compute the fraction of
these which is $\epsilon$-close to the current state $x_n$. Since
this fraction evidently depends on the value of $\epsilon$, we should
introduce a normalization. However, in order to create the ROC statistics we just have to introduce a threshold which runs from 0 to
the largest value thus found. Both schemes can be straightforwardly
generalized to situations where the current state of the process is
defined by a sequence ${\mathbf x}_{(n,k)}$ of $k$ past measurements
$(x_{n-k+1}, x_{n-k+2}, \ldots,x_{n-1},x_n)$, e.g., for an AR(2) model
$k=2$, whereas in \cite{Physa} we used $k=10$ for a Markov chain of
order 10.
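
Written out for the scalar case, and again with notation that is only assumed here for illustration ($N_e$ denotes the number of states $x_e$ that precede an event), the unnormalized estimate reads
\begin{eqnarray*}
\hat{p}_{\epsilon}(x_n|g) & = & \frac{1}{N_e}\,\#\{e:\; |x_n-x_e|<\epsilon\}.
\end{eqnarray*}
Dividing by the interval length $2\epsilon$ would turn this fraction into an estimate of the density $p(x_n|g)$. Since this is the same constant factor for every $x_n$, it only rescales the threshold and leaves the ROC-curve unchanged, which is why the normalization can be omitted.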
989: 
990: Since the wind speed data are strongly correlated, $a\approx 0.94$, it is not
991: possible to predict the increments of the data sufficiently well.  This
992: corresponds to the previously derived results for the AR(1) model in the
993: limit $a \rightarrow 1$. However, we also find deviations from the theoretical
994: ROC-curve for $a=0.94$, which is additionally plotted in Figs.\ \ref{fig:ROCwind1}
995: and \ref{fig:ROCwind2}. These deviations show that the AR(1) model is not
996: able to describe the wind data completely.
997: 
998: The wind data also show the increase of predictability with increasing event
999: size. This suggests that this effect is more general and not
1000: limited to the class of AR(1) models. Again, we also observe that strategy II
1001: is superior to strategy I.
1002: %%%%%%%%%%%%%%%%%%%%%%%%
1003: \begin{figure}[t!!!]
1004: \centerline{\includegraphics[width=6cm, angle =-90]{figure10.eps}}
1005: \caption{\label{fig:ROCwind2} (Color online)
1006: The ROC-curves for the prediction of jumps of amplitude larger than $g$
1007: for the wind data. Strategy II exploits $p(X|x_n)$ which minimizes 
1008: the false alarm rate and
1009: performs the better the larger $g$.
1010: }
1011: \end{figure}
1012: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%5
1013: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1014: \section{Extreme Increments in long-range correlated Processes \label{long}} 
We also studied the questions described above in
long-range correlated processes. Since the precursors we were interested in
live on a very short time scale (one step before the event), one should not
expect long-range correlations to lead to qualitatively different results for
the aspects we were interested in. The results obtained in this section
support this assumption.
1021: 
There are various definitions of long-range
correlation. Typically, long-range correlation in a time series is characterized by the exponent $0<\gamma_c<1$ of
the power-law decay of the autocorrelation function as a function of the time
lag $t$,
\begin{eqnarray}
C_x(t) & = & \langle x_n x_{n+t}\rangle = \frac{1}{N-t} \sum_{n=1}^{N-t} x_n
x_{n+t} \sim t^{-\gamma_c}.
\end{eqnarray}
The correlation exponent $\gamma_c$ controls how fast the correlations decay.
1031: 
1032: We study the predictability of increments numerically by applying the
1033: prediction strategies described in Sec.\ \ref{pre}. 
The data used for this numerical study were generated as described in
\cite{Eduardosref} and used in \cite{Eduardo1}: Imposing a power-law decay on the Fourier spectrum,
1037: \begin{eqnarray}
1038: f_x(k) \propto k^{-\beta}
1039: \end{eqnarray}
with $0<\beta<0.5$ and choosing phase angles at random, one obtains through an
inverse Fourier transform the long-range correlated time series in $x$ with
$\gamma_c=1-2\beta$. The data are Gaussian distributed with $ \langle x \rangle
=0$, $\sigma=1$. Specifying the power spectrum or, equivalently, the
autocorrelation function for sequences of Gaussian random numbers
fixes all parameters of a linear stochastic process. Hence, in principle
the coefficients of an autoregressive or moving average process can be
uniquely determined, where, due to the power-law nature of the spectrum and
autocorrelation function, the order of either of these models has to be
infinite \cite{Brockwell,Box-Jen}. 
Thus, the effects which we observed for this
ARMA($\infty$, $\infty$) model should be valid for the whole class of linear long-term
correlated processes.
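
The relation $\gamma_c=1-2\beta$ can be made plausible as follows. This is a sketch which assumes that $f_x(k)$ denotes the amplitude of the Fourier coefficients, so that the power spectrum is $S_x(k)\propto |f_x(k)|^2$ (the symbol $S_x$ is introduced only for this remark). An autocorrelation function decaying as $t^{-\gamma_c}$ with $0<\gamma_c<1$ corresponds to a power spectrum that diverges as $k^{-(1-\gamma_c)}$ for small $k$, hence
\begin{eqnarray*}
S_x(k) \propto k^{-2\beta} \sim k^{-(1-\gamma_c)} & \Rightarrow & \gamma_c = 1-2\beta.
\end{eqnarray*}
Under this reading, the values $\gamma_c=0.2$ and $\gamma_c=0.8$ used in Fig.\ \ref{fig:longrange} correspond to $\beta=0.4$ and $\beta=0.1$, respectively, both within the range $0<\beta<0.5$.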
1053: %%%%%%%%%%%%%%%%%%%%%%%%%
1054: \begin{figure}
1055: \includegraphics[width=6cm, angle= -90]{figure11.eps} 
1056: \caption[]{\small\label{fig:longrange} (Color online)
1057: ROC-curves for the ARMA($\infty$,$ \infty$) processes  with
$\gamma_c=0.2$ and $\gamma_c=0.8$.}
1059: \end{figure}
1060: %%%%%%%%%%%%%%%%%%%%%%%%%
The ROC-curves in Fig.\ \ref{fig:longrange}, which are generated from the
long-range correlated data, are very similar to the ones for the AR(1)
process with respect to the questions we want to study.
1064: 
1065: {\bf ad (Q1):}\hspace{3mm}
1066: The ROC-curves obtained by
1067: using strategy II are superior to the curves resulting from strategy I.
1068: 
1069: {\bf ad (Q2) and (Q3):}\hspace{3mm}
1070:  The quality of the prediction also
1071: increases with increasing event size and decreasing correlation. 
1072: %The decrease of the correlation is explicitely
1073: %shown in Fig.\ \ref{fig:correlation} 
1074: %\begin{figure}[t!!!]
1075: %\includegraphics[width=5cm, angle= -90]{figure12.eps} 
1076: %\caption[]{\small\label{fig:correlation} 
1077: %The autocorrelation function of the ARMA($\infty$,$ \infty$) processes with
1078: %$\gamma_c=0.2$ and $\gamma_c=0.8$ .}
1079: %\end{figure}
1080: 
Hence we observe in a long-range correlated ARMA($\infty$, $\infty$) process the same effects which we described before for the AR(1)
process and the wind speed data.
1083: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1084: 
1085: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1086: \section{Conclusions\label{con}}
1087: We studied the predictability of extreme increments in an AR(1) correlated
1088: process, in wind speed data and in a long-range correlated ARMA process. 
1089: To measure the quality of the prediction we used the ROC-curve and additionally
1090: the slope of the ROC-curve in the vicinity of the
origin as a summary index. This so-called {\sl likelihood ratio}
characterizes in particular the behavior in the limit of low false-alarm rates.
1093: 
1094: In the case of the AR(1) process we could construct the posterior PDF and the
1095: likelihood analytically
1096: from a given joint PDF and hence we were able to obtain the asymptotic
1097: behavior of the likelihood ratio analytically. In the case of the two other
1098: examples, we constructed the posterior PDFs numerically. The resulting distributions were then
1099: used to determine precursors according to two different strategies of prediction. 
1100: 
In all examples we studied the following aspects: {\bf (Q1)} Which is
the best strategy to choose precursors? {\bf (Q2)} How does the
predictability depend on the event size? {\bf (Q3)} How does the
predictability depend on the correlation? The results can be summarized as follows:
1105: 
1106: {\bf ad (Q1):}\hspace{3mm} 
1107: Strategy I, the a posteriori approach,  maximizes  the rate of
1108: correct predictions, while strategy II focuses on the
1109: minimization of the rate of false alarms.
1110: %(Note that the terms maximization
1111: %and minimization refer to changes in the integrand, which enters into the
1112: %alarm rates as given by Eqs.\ (\ref{rcar}) and (\ref{rfar}) and not to changes% in the integration ranges $V_{pre}$ and $V_{-}$.)
1113: For the example of 
1114: the AR(1) process one can show that strategy II is the optimal strategy to make predictions. 
For other stochastic processes, it is
not clear in general which of the two strategies leads to a better
predictability. However, the application to the prediction of wind speeds and
the numerical study within long-range correlated data reveal that also for
these examples better results are obtained by predicting according to strategy
II.
1121: 
1122: 
1123: {\bf ad (Q2):}\hspace{3mm} 
For all examples studied, we
observe an increase of predictability with increasing size of the events. This
phenomenon, which is also reported in the literature
\cite{Physa,Goeber,Johnson1}, can be better studied by investigating the asymptotic behavior of our summary index. In the case of the AR(1) process we showed explicitly that the likelihood ratio
increases as a squared exponential with increasing event size. 
In Sec.\ \ref{roc} we discussed for a general stochastic process that this
effect appears if the PDFs of the studied process fulfill certain conditions.
1131: 
1132: 
1133: 
1134: {\bf ad (Q3):}\hspace{3mm} 
For the AR(1) process and the long-range correlated data we observe that the quality of the predictions decreases with increasing correlation of
the data. The
ROC-curves for the wind data, which we assume to be a strongly correlated AR(1)
process with correlation strength $a=0.94$, also display poor
predictability. This effect is due to the special definition of the events as
increments. The asymptotic expression for the likelihood ratio in Eq.\
(\ref{m3}) also provides us with a formal understanding of the $a$-dependence.
1142: 
All the considerations made in this contribution are made for a very simple
but general method. In order to make predictions, we use the largest maximum of
the posterior PDF or the likelihood. For multimodal distributions, one can
think of more sophisticated methods, which also take into account other
maxima of the distribution. Furthermore, we investigate only stationary
processes in this contribution. It remains to be studied whether the answers obtained to the questions
({\bf Q1})-({\bf Q3}) are also valid for non-stationary processes or multimodal
distributions. \\
1151: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1152: \begin{acknowledgements}
1153: E. G. A. was supported by CAPES (Brazil).
1154: \end{acknowledgements}
1155: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1156: \begin{appendix}
\section{Obtaining an asymptotic form of the total probability to find
increments of size $\eta$ \label{Ap1}}
The total probability $\rho_{\Theta}(\eta,a)$ to find increments of size $\eta$ can be obtained by
integrating the pre-form of the posterior probability in Eq.\ (\ref{int1}). For the
example of the AR(1) process the corresponding integral reads
\begin{eqnarray}
\rho_{\Theta}(\eta,a)&=&\int_{-\infty}^{\infty}dx_{n}\,\frac{\sqrt{1-a^2}}{2\sqrt{2\pi}}\exp\left(-\frac{1-a^2}{2}x_{n}^2\right)
\nonumber\\
&&\quad \mbox{erfc}\left(\frac{(1-a)x_{n}}{\sqrt{2}} +
\frac{\eta}{\sqrt{2}\sqrt{1-a^2}}\right).
\end{eqnarray}
In the special case $\eta=0$ one can find the analytical form of the total
probability
$\rho_{\Theta}(0,a)$, again using an integral identity from \cite{Pru}. The resulting value $ \rho_{\Theta}(0,a) = 1/2$ agrees with
intuition, since for $\eta=0$ the condition of our
extreme event is fulfilled whenever $x_{n+1}$ is larger than
$x_{n}$. This special case of predicting the sign of increments in
uncorrelated data is discussed in \cite{Sornette1}.
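
As a cross-check, the integral above can also be evaluated without tables. The following is only a sketch and rests on assumptions suggested by the integrand but not restated in this appendix: that the AR(1) process reads $x_{n+1}=a x_n+\xi_n$ with unit-variance Gaussian white noise $\xi_n$, and that the event is $x_{n+1}-x_n\geq \eta/\sqrt{1-a^2}$. Under these assumptions the increment $x_{n+1}-x_n=\xi_n-(1-a)x_n$ is a zero-mean Gaussian variable with variance $1+(1-a)^2/(1-a^2)=2/(1+a)$, so that
\begin{equation*}
\rho_{\Theta}(\eta,a)=\frac{1}{2}\,\mbox{erfc}\left(\frac{\eta}{2\sqrt{1-a}}\right).
\end{equation*}
For $\eta=0$ this reduces to $1/2$, and for large $\eta$ the standard expansion $\mbox{erfc}(z)\sim \exp(-z^2)/(\sqrt{\pi}\,z)$ gives the same leading behavior as Eq.\ (\ref{P}).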
1175: 
For $\eta \neq 0$, we can find an asymptotic form of the total probability $
\rho_{\Theta}(\eta,a)$ by
evaluating the mean of the posterior PDF.
An analytic expression for the mean can be obtained using an integral representation from \cite{Pru},
1180: %
1181: \begin{equation}
1182: \langle x_{n}\rangle = \frac{-\exp \left(-\frac{\eta^2}{4(1-a)}  \right)}{2 \sqrt{\pi}
1183: \sqrt{1+a}\; \rho_{\Theta}(\eta, a)}, \label{anamean_ar}
1184: \end{equation}
1185: %
1186: %where $ \rho_{\Theta}(\eta,a)$ denotes the total probability to find events of
1187: %size $\eta$.
1188: For large values of $\eta$ we can also assume that the maximum and the mean of \ma nearly coincide, i.e., 
1189: %
1190: \begin{equation}
1191:  \langle x_n \rangle  \simeq {x_{I}} \sim  
1192: \frac{-\eta}{2\sqrt{1-a^2} \left(1 + \mathcal{O} \left(\frac{1}{\eta^2} \right)\right)},\quad (\eta \rightarrow \infty), \label{dhalbe2}
1193: \end{equation}
1194: %
1195: provided that \ma is not too asymmetric (i.e., $a$ is not close to
1196: $-1$). Using this approximation, we find the following asymptotic form of the
1197: total probability to find increments of size $\eta$
1198: 
1199: \begin{eqnarray}
1200: \rho_{\Theta} (\eta,a) & \sim &
1201: \frac{\sqrt{1-a}}{\sqrt{\pi} }\;\frac{1}{\eta}\;
1202: \exp\left(-\frac{\eta^2}{4(1-a)}  \right)\nonumber\\
1203: &\it{}& 
1204: \left( 1 +  \mathcal{O} \left(\frac{1}{\eta^2} \right)\right), \quad (\eta
1205: \rightarrow \infty). \label{P}
1206: \end{eqnarray}
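
The step leading to Eq.\ (\ref{P}) can be written out explicitly. As a sketch that keeps only the leading order in $1/\eta$, we equate the mean in Eq.\ (\ref{anamean_ar}) with the leading term $-\eta/(2\sqrt{1-a^2})$ of Eq.\ (\ref{dhalbe2}) and solve for the total probability,
\begin{eqnarray*}
\rho_{\Theta}(\eta,a) & \sim &
\frac{\exp\left(-\frac{\eta^2}{4(1-a)}\right)}{2\sqrt{\pi}\sqrt{1+a}}\;
\frac{2\sqrt{1-a^2}}{\eta}\\
& = & \frac{\sqrt{1-a}}{\sqrt{\pi}}\;\frac{1}{\eta}\;
\exp\left(-\frac{\eta^2}{4(1-a)}\right),
\end{eqnarray*}
where $\sqrt{1-a^2}/\sqrt{1+a}=\sqrt{1-a}$ was used in the last line.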
1207: 
1208: 
1209: 
1210: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1211: \section{Transformation of extreme increments into extreme values}
1212: \label{trafo}
We show how to relate the
results obtained using the definition of extreme events as extreme
increments~($x_{n+1}-x_n \geq d$, as in Eq.~(\ref{e0})) to the case in which
extreme events are defined as extreme values ($y_{n+1} \geq d$) that exceed a
certain threshold $d$, for ARMA(p,q) processes.  An ARMA(p,q) model is defined
as~\cite{Box-Jen}
1219: %
1220: \begin{equation}\label{eq.arma}
1221: \Phi(B) x_n = \theta(B) \xi_n,
1222: \end{equation}
1223: %
where~$\{\xi\}$ corresponds to white noise and
1225: %
1226: \begin{eqnarray*}
1227: \Phi(B) & = &  1 - \Phi_1 B - \Phi_2 B^2 - ... - \Phi_p
1228: B^p,\\
1229: \theta(B) & = &  1 + \theta_1 B + \theta_2 B^2 + ... +
1230: \theta_q B^q, 
1231: \end{eqnarray*}
1232: %
with $B^j x_n =  x_{n-j}$. Searching for extreme increments in a time
series~$\{x\}$ is equivalent to searching for extreme values in the time
series~$\{y\}$, defined through the transformation
1236: %
1237: \begin{equation}\label{eq.transf}
1238: y_{n+1}  =  x_{n+1} - x_n.
1239: \end{equation}
1240: 
Assuming that $\{x\}$ is described by an ARMA(p,q) process defined by
Eq.~(\ref{eq.arma}), and applying the differencing operator $1-B$ of Eq.~(\ref{eq.transf}) to both sides of
Eq.~(\ref{eq.arma}), one obtains that $\{y\}$ is described by an
ARMA(p,q+1) model, $\Phi(B)\, y_n = (1-B)\,\theta(B)\, \xi_n$, with
the following transformed coefficients (where $\theta_0 \equiv 1$)
%
\begin{eqnarray}\label{eq.identification}
\Phi^\dagger_i & = &\Phi_i \quad i=1,2,...p\quad, \nonumber\\
\theta^\dagger_i &=& \theta_i - \theta_{i-1} \quad i=1,2,...q \quad,\nonumber\\
 \theta_{q+1}^\dagger & = & -\theta_q \quad.
\end{eqnarray}
 Due to the transformation~(\ref{eq.transf}), the precursory structure
 equivalent to the one used in Sec.~\ref{AR1} is obtained by choosing\footnote{We
 assume $x_0=0$,  which is the mean value of $\{x\}$. This assumption is
 irrelevant for large values of~$n$.}
1256: %
1257: \begin{equation}\label{eq.pre2}
y_{pre}=\sum_{j=1}^{n} y_j + x_0= x_n.
1259: \end{equation}
1260: %
With this choice of precursory structure and the corresponding
transformation of the process (Eq.\ (\ref{eq.transf})), the results obtained for
extreme increments can be transferred to the case of extreme values. In
particular, for the case of AR(1) processes (which correspond to ARMA(1,0))
discussed in Sec.~\ref{AR1}, all results are also valid for an ARMA(1,1)
process with the precursor given by~(\ref{eq.pre2}) and events defined
as extreme values. For example, the alarm strategies in this case consist of raising
an alarm whenever $y_{pre}$ falls near the precursor values given in
Eq.~(\ref{precursor}).
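
The AR(1) case can be written out by hand as a short illustration. This is a sketch under the assumption that the AR(1) process of Sec.~\ref{AR1} takes the form of Eq.~(\ref{eq.arma}) with $p=1$, $q=0$ and $\Phi_1=a$, i.e., $(1-aB)\,x_n=\xi_n$. Applying $1-B$ to both sides and using $y_n=(1-B)\,x_n$ gives
\begin{equation*}
(1-aB)\,y_n=(1-B)\,\xi_n, \quad \mbox{i.e.,} \quad y_n = a\,y_{n-1}+\xi_n-\xi_{n-1},
\end{equation*}
which is an ARMA(1,1) model with autoregressive coefficient $a$ and moving-average coefficient $-1$.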
1270: \end{appendix}
1271: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
1272: \begin{thebibliography}{99}
1273: %1
1274: \bibitem{Coles} S. Coles, {\sc An introduction to Statistical Modeling of
1275: Extreme Values}, Springer, 2001.
1276: %2
1277: \bibitem{Jackson} David D. Jackson, {\sl Hypothesis testing and earthquake
1278: prediction}, Proc. Natl. Acad. Sci. USA {\bf 93} [3772-3775] (1996).
1279: %3
\bibitem{Kapiris} F. Mormann, T. Kreuz, C. Rieke, R. G. Andrzejak, A. Kraskov,
P. David, C. E. Elger, K. Lehnertz, {\sl On the predictability of epileptic
seizures}, Clin. Neurophysiol. (2005).
1283: %4
1284: \bibitem {Sornette2} A. Johansen and D. Sornette,
1285: {\sl Stock market crashes are outliers}, European Physical Journal B 1,
1286: 141-143 (1998).
1287: %5
1288: \bibitem{stocks}  N. Vandewalle, M. Ausloos, P. Boveroux, et al.,
1289: {\sl How the financial crash of October 1997 could have been predicted},
1290: European Physical Journal B {\bf 4} (2): [139-141] (1998).
1291: %6
\bibitem{Sornette3}
 D. Sornette,
{\sl Predictability of catastrophic events: material rupture, earthquakes,
turbulence, financial crashes and human birth}, Proc. Natl. Acad. Sci. USA
{\bf 99} (Suppl.\ 1) [2522-2529] (2002).
1297: %7
1298: \bibitem{Box-Jen} G. E. P. Box, G. M. Jenkins, G. C. Reinsel, {\sc Time Series
1299: Analysis}, Prentice-Hall, Inc. (1994).
1300: %8
1301: \bibitem{Brockwell} P.J. Brockwell, R.A. Davis, {\sc Time Series: Theory and
1302: Methods}, Springer (1998).
1303: %9
1304: \bibitem{Physa}{H. Kantz, D. Holstein, M. Ragwitz, N. K. Vitanov, {\sl Markov
1305: chain model for turbulent wind speed data}, Physica {\bf A 342} (2004)
1306: 315-321}.
1307: %10
1308: \bibitem{Goeber} M. G{\"o}ber, C. A. Wilson, S.F. Milton, D.B. Stephenson, {\sl
1309: Fairplay in the verification of operational quantitative precipitation
1310: forecasts}, Journal of Hydrology {\bf 288} (2004) [225-236].
1311: %11
1312: \bibitem{Johnson1} D. Lamper, S.D. Howison, N.F. Johnson, {\sl Predictability
1313: of Large Future Changes in a Competitive Evolving Population},
1314: Phys. Rev. Lett. {\bf 88}(1)(2002).
1315: %12
\bibitem{Fatemeh} M. Reza Rahimi Tabar, M. Sahimi, F. Ghasemi, K. Kaviani,
M. Allamehzadeh, J. Peinke, M. Mokhtaru, M. Vesaghi,
M. D. Niry, A. Bahraminasab, S. Tabatabai, S. Fayazbakhsh and M. Akbari, {\sl
Short-Term Prediction of Medium- and Large-Size Earthquakes Based on Markov
and Extended Self-Similarity Analysis of Seismic Data}, arXiv:physics/0510043 v1 (2005).
1321: %13
1322: \bibitem{Johnson2} P. Jefferies, D. Lamper, N.F. Johnson, {\sl Anatomy of
1323: extreme events in a complex adaptive system}, Physica A {\bf 318}:[592-600] (2003).
1324: %14
\bibitem{Bernado} {J. M. Bernardo, A. F. M. Smith, {\sc Bayesian Theory}, Wiley, New
York, 1994}.
%15
\bibitem{Bishop}{ C. M. Bishop, {\sc Neural Networks for Pattern Recognition},
Oxford University Press, 1995}.
1330: %
1331: %16
\bibitem{Swets1}{D. M. Green and J. A. Swets, {\sc Signal detection theory and
psychophysics},  Wiley, New York, 1966.}
%17
\bibitem{Egan}{J. P. Egan, {\sc Signal detection theory and ROC analysis},
Academic Press, New York, 1975.}
1337: %18
1338: \bibitem{Pepe} M. S. Pepe, {\sc The Statistical Evaluation of Medical Tests for
1339: Classification and Prediction}, Oxford University Press, 2003. 
1340: %19
1341: \bibitem{Euromech}
1342: Holger Kantz, Detlef Holstein, Mario Ragwitz, Nikolay K. Vitanov, {\sl  Short
1343: time prediction of wind speeds from local measurements}, in: {\sc Wind Energy
1344: -- Proceedings of the EUROMECH Colloquium}, eds. J. Peinke, P. Schaumann, S. Barth, Springer, 2006.
1345: %20
1346: \bibitem{Abram} M. Abramowitz, and I. A. Stegun, {\sc Handbook of Mathematical
1347: Functions}, (Dover, New York, 1972).
1348: %21
\bibitem{Pru} A. P. Prudnikov, Yu. A. Brychkov, O. I. Marichev, {\sc Integrals
and series Vol. II. Special functions}, Gordon and Breach Science Publ., New
York.
1352: %22
1353: \bibitem{Sornette1} D. Sornette and J.V. Andersen, {\sl Increments of
1354: Uncorrelated Time Series Can Be Predicted With a Universal 75\% Probability of
1355: Success}, Int. J. Mod. Phys. C  {\bf 11} (4), 713-720 (2000).
1356: %23
\bibitem{winddata} The wind-speed data were recorded at the Ris\o\
national research laboratory in Denmark {\ttfamily http://www.risoe.dk/vea};
see also {\ttfamily http://winddata.com }.
1360: %24
\bibitem{Eduardosref} S. Prakash, S. Havlin, M. Schwartz, and H. E. Stanley,
{\sl Structural and dynamical properties of long-range correlated percolation},
Phys. Rev. A {\bf 46}, R1724 (1992).
1364: %25
\bibitem{Eduardo1} E. G. Altmann, H. Kantz, {\sl Recurrence time analysis,
long-term correlations and extreme events}, Phys. Rev. E {\bf 71}, 056106 (2005).
1367: \end{thebibliography}
1368: \end{document}
1369: 
1370: 
1371: