physics0410039/ssc.tex
1: \documentclass{article}
2: 
3: \usepackage{graphicx}
4: \usepackage{psfig}
5: \usepackage{epsfig}
6: \usepackage[round]{natbib}
7: 
8: \setlength{\hoffset}{-1in}\setlength{\oddsidemargin}{2.5cm}
9: \setlength{\textwidth}{16cm} \setlength{\voffset}{-1in}
10: %\setlength{\topmargin}{1cm} \setlength{\textheight}{11cm}
11: \setlength{\topmargin}{1cm} \setlength{\textheight}{25cm}
12: \setlength{\unitlength}{1cm} \setlength{\parindent}{0cm}
13: 
14: \newcommand{\bx}[1]{\fbox{\begin{minipage}{15.8cm}#1\end{minipage}}}
15: \newcommand{\bxx}[1]{\fbox{\begin{minipage}{12.0cm}#1\end{minipage}}}
16: 
17: \bibliographystyle{plainnat}
18: 
19: \title{Probabilistic forecasts of temperature: measuring the utility of the ensemble spread}
20: 
21: \author{Stephen Jewson\footnote{\emph{Correspondence address}: RMS, 10 Eastcheap, London, EC3M 1AJ, UK.
22: Email: \texttt{x@stephenjewson.com}}}
23: 
24: \begin{document}
25: 
26: \maketitle
27: 
28: \begin{abstract}
29: The spread of ensemble weather forecasts contains information
30: about the spread of possible future weather scenarios. But how much
31: information does it contain, and how useful is that information in
32: predicting the probabilities of future temperatures? One
33: traditional answer to this question is to calculate the
34: spread-skill correlation. We discuss the spread-skill correlation
35: and how it interacts with some simple calibration schemes. We then
36: point out why it is not, in fact, a useful measure for the amount
37: of information in the ensemble spread, and discuss a number of
38: other measures that are more useful.
39: \end{abstract}
40: 
41: \section{Introduction}
42: 
43: Forecasts of the expected surface air temperature
44: over the next 15 days are readily available from commercial forecast vendors.
45: The best of these forecasts have been proven to be consistently better than climatology
46: and such forecasts are widely used within industry.
47: There is also demand within industry for \emph{probabilistic} forecasts of temperature,
48: i{.}e{.} forecasts that predict the whole distribution of temperatures.
49: Such forecasts are much more useful than forecasts of the expectation alone
50: in situations where the ultimate
51: variables being predicted are a non-linear function of temperature, as is commonly the case.
52: 
53: Probabilistic forecasts of temperature can be made rather easily
54: from forecasts of the expected temperature
55: using linear regression.
56: The parameters of the regression model are derived using past forecasts and
57: past observations after these forecasts and observations have been converted
58: to standardized anomalies using the climatological mean and standard deviation.
59: Probabilistic forecasts made in this way provide a standard against which
60: forecasts made using more sophisticated methods should be compared, and it turns
61: out that they are hard to beat (our own attempts to beat regression, which have
62: more or less failed, are summarised in~\citet{jewson04l}).
63: 
64: Regression-based probabilistic forecasts have a skill that doesn't vary with weather state.
65: It has been shown, however, that the uncertainty around forecasts of the expectation
66: \emph{does} vary with weather state and that these variations are predictable, to a certain
67: extent, using
68: the spread of ensemble forecasts (see, for example,~\citet{kalnay} and many others).
69: What is not clear is whether the level of predictability
70: in the variations of the uncertainty is useful in any material sense or whether the beneficial
71: effect on the final forecast of the temperature distribution is too small to be relevant.
72: How might we investigate this question of how much useful information there is in the ensemble spread?
73: 
74: One method that is frequently used to assess the amount of information in the spread
75: from ensemble forecasts is the spread-skill correlation (SSC), defined in a number
76: of different ways (see for example~\citet{barker91}, \citet{whitaker98} and~\citet{hou01}).
77: SSC is usually calculated before the ensemble forecast has been calibrated
78: (i{.}e{.} before it has been turned into a probabilistic forecast).
79: However, it is the properties of the forecast \emph{after} calibration that we really care about.
80: In this article we investigate some of the properties of the spread-skill correlation, and in particular
81: how it interacts with the calibration procedure.
82: We will show that, under certain combinations of the definition of the SSC and the
83: calibration procedure, the SSC is the same before and after
84: the calibration, implying that pre-calibration estimates of the SSC
85: can be used to predict post-calibration values.
86: 
87: However we also note that even the post-calibration SSC is not a particularly good indicator
88: of the level of useful information that can be derived from the ensemble spread
89: and we describe how it can be possible
90: that the SSC is high but the ensemble spread is effectively useless as a predictor
91: of the future temperature distribution.
92: 
93: Finally we present some simple measures that improve on the SSC and
94: that can be used to ascertain whether the information in the ensemble spread
95: is really useful or not.
96: 
97: \section{The linear anomaly correlation}
98: 
99: We start by reviewing some of the properties of the linear anomaly correlation (LAC).
100: This will help us understand how to think about the properties of the SSC.
101: 
102: The amount of information in a temperature forecast from an NWP model is commonly measured using the
103: LAC between the forecast and an analysis.
104: One of the reasons that the LAC is a useful measure is that it is conserved under linear transformations,
105: and so if the forecast is calibrated using a linear transformation (such as linear regression)
106: then the LAC post-calibration is the same as the LAC pre-calibration. This means that
107: one doesn't actually have
108: to perform the calibration to know what the post-calibration LAC is going to be.
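
The conservation property can be checked in one line. As a short sketch, write $f$ for the raw forecast anomaly
and $o$ for the analysis anomaly (symbols introduced here purely for illustration), and suppose that the
calibrated forecast is $f'=a+bf$ with $b \neq 0$. Then

\begin{equation}
 \mbox{corr}(f',o)
 =\frac{\mbox{cov}(a+bf,o)}{\sqrt{\mbox{var}(a+bf)\,\mbox{var}(o)}}
 =\frac{b\,\mbox{cov}(f,o)}{|b|\sqrt{\mbox{var}(f)\,\mbox{var}(o)}}
 =\mbox{sign}(b)\,\mbox{corr}(f,o)
\end{equation}

so the LAC is unchanged whenever the slope $b$ is positive, and changes only in sign if $b$ is negative.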
109: 
110: \section{The spread-skill correlation}
111: 
112: In a similar way the SSC is commonly applied to the output from NWP models to assess the ability of the model
113: to capture variations in the uncertainty (see for example~\citet{buizza97}).
114: 
115: Four commonly used definitions of SSC are:
116: 
117: \begin{eqnarray}
118:  \mbox{SSC}_1&=&\mbox{linear correlation}(|e|,s)\\
119:  \mbox{SSC}_2&=&\mbox{linear correlation}(e^2,s)\\
120:  \mbox{SSC}_3&=&\mbox{linear correlation}(|e|,s^2)\\
121:  \mbox{SSC}_4&=&\mbox{linear correlation}(e^2,s^2)
122: \end{eqnarray}
123: 
124: where $e$ is the forecast error and $s$ is the ensemble spread.
125: 
126: In the same way that predictions of the mean temperature must be calibrated, so must predictions of the uncertainty.
127: In~\citet{jewsonbz03a} we argued that both an offset and a scaling are needed in this calibration:
128: this allows for both the
129: mean level of uncertainty and the amplitude of the variability of the uncertainty to be set correctly.
130: We have proposed and tested various models that can be used for this calibration: a summary of our
131: results is given in~\citet{jewson04l}.
132: All the models we propose are generalisations of linear regression.
133: The two models of most relevance to the current discussion are standard deviation and variance
134: based \emph{spread regression} models defined by:
135: 
136: \begin{equation}\label{sr1}
137:  T_i \sim N (\mbox{mean}=\alpha+\beta m_i, \mbox{standard deviation}=\gamma+\delta s_i)
138: \end{equation}
139: 
140: and
141: 
142: \begin{equation}\label{sr2}
143:  T_i \sim N (\mbox{mean}=\alpha+\beta m_i, \mbox{variance}=\gamma^2+\delta^2 s_i^2)
144: \end{equation}
145: 
146: where $T_i$ is the temperature anomaly on day $i$,
147: $m_i$ is the ensemble mean anomaly on day $i$,
148: $s_i$ is the ensemble spread anomaly on day $i$
149: and where anomalies are defined by subtracting a climatological seasonal cycle in the
150: mean and dividing by a climatological seasonal cycle in the spread.
151: $\alpha, \beta, \gamma$ and $\delta$ are
152: free parameters:
153: we call $\gamma$ and $\gamma^2$ the \emph{spread-skill bias correction} and $\delta$ and
154: $\delta^2$ the
155: \emph{spread-skill regression coefficient}, while we call $\gamma+\delta \overline{s}$  and
156: $\gamma^2+\delta^2 \overline{s^2}$ the \emph{spread-skill offset}.
157: 
158: 
159: Which of the standard deviation or variance based calibration models is better
160: is not clear a priori
161: but can be answered for any particular data set by comparing the in-sample or out-of-sample
162: likelihoods achieved by the two models.
163: 
164: It would be very useful if the SSC (for any of the above definitions)
165: were the same before and after
166: calibration (for either of the above calibration methods).
167: Then, as with linear correlation, the pre-calibration SSC could be used to predict
168: the post-calibration SSC and we would not actually have to perform the calibration
169: to calculate the post-calibration SSC.
170: 
171: We now investigate whether the SSC has this useful property, which we will call conservation.
172: 
173: \section{Conservation properties of the spread-skill correlation}
174: 
175: The conservation properties of the SSC are straightforward
176: and somewhat obvious. They can be derived based on the
177: observation that the linear correlation between two variables is not affected by linear transformations
178: of either variable (up to a change of sign if the slope of the transformation is negative).
179: 
180: Under the standard deviation based spread regression model
181: the spread-skill correlation defined as either SSC$_1$ or SSC$_2$
182: will be conserved because these measures base the SSC on $s$ and
183: the calibration of $s$ is simply a linear transformation.
184: The SSC measures based on $s^2$ will not, however, be conserved when using
185: standard deviation based spread regression.
186: 
187: Alternatively, under the variance based spread regression model
188: the spread-skill correlation defined as either SSC$_3$ or SSC$_4$
189: will be conserved because these measures base the SSC on $s^2$
190: and the calibration of $s^2$ is now a linear transformation.
191: However SSC$_1$ and SSC$_2$ will not be conserved under the variance based spread regression model.
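
The reason for the lack of conservation in the mismatched cases can be sketched as follows, writing $\sigma_i$
for the calibrated uncertainty on day $i$ (a symbol we introduce here for illustration).
Under the standard deviation based model of equation~\ref{sr1} we have $\sigma_i=\gamma+\delta s_i$, and so

\begin{equation}
 \sigma_i^2=\gamma^2+2\gamma\delta s_i+\delta^2 s_i^2
\end{equation}

which is a linear function of $s_i^2$ only in the special case $\gamma=0$ (or, trivially, $\delta=0$).
Similarly, under the variance based model of equation~\ref{sr2} the calibrated standard deviation is

\begin{equation}
 \sigma_i=\sqrt{\gamma^2+\delta^2 s_i^2}
\end{equation}

which is a linear function of $s_i$ only when $\gamma=0$.
In general, then, the mismatched transformations are non-linear and the correlations before and after
calibration will differ.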
192: 
193: Together these results suggest
194: that the choice of which SSC measure to choose is not arbitrary but should be influenced
195: by whichever of the calibration models works better for the data in hand.
196: 
197: \section{The offset problem}
198: 
199: We have shown that the SSC can be conserved during calibration as long as the definition of SSC is
200: chosen to match the method used for the calibration.
201: There is, however, a problem with the SSC as a measure for the amount of information
202: in a probabilistic forecast.
203: This problem is caused by the spread-skill offset given by $\gamma+\delta \overline{s}$
204: in equation~\ref{sr1} and by $\gamma^2+\delta^2 \overline{s^2}$ in equation~\ref{sr2}.
205: 
206: When the offset is large relative to the amplitude of the variability of the uncertainty we find ourselves
207: in a situation in which predictions of the variations of the uncertainty are more or less irrelevant, even
208: if they are very good, simply because they don't contribute much as a fraction of the total uncertainty.
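
To make this concrete, the calibrated standard deviation in equation~\ref{sr1} can be split into the offset
and the variations about the offset:

\begin{equation}
 \gamma+\delta s_i=(\gamma+\delta \overline{s})+\delta (s_i-\overline{s})
\end{equation}

As a purely illustrative example, with numbers invented for this sketch rather than taken from any data,
suppose that the offset $\gamma+\delta \overline{s}$ is $2.0$ and that the second term
$\delta (s_i-\overline{s})$ ranges between $-0.1$ and $+0.1$.
The calibrated standard deviation then only varies between $1.9$ and $2.1$, a change of $\pm 5\%$,
however well $s_i$ is correlated with the size of the forecast errors.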
209: 
210: In such cases the SSC may be large but the ensemble spread could be ignored without reducing the skill of the
211: calibrated forecast: linear regression would work as well as spread regression.
212: We clearly need other measures to assess whether the spread is really useful that take into account
213: the \emph{size} of the calibrated variations in uncertainty.
214: Since this question depends crucially on the offset
215: and the offset can only be derived during the calibration procedure
216: it will not be possible to estimate the usefulness
217: of the spread before calibration has taken place.
218: 
219: This is a fundamental difference between forecasts of spread and forecasts of the expectation, since, as we
220: have seen, it \emph{is} possible to estimate the information in a forecast of the expectation before the calibration
221: has taken place. This difference arises because when we predict the mean temperature we are
222: concerned with predicting changes from the normal while when we predict the uncertainty we are only interested
223: in the extent to which our estimate of the uncertainty improves the forecast of the temperature distribution.
224: Thus we are interested in actual values of the uncertainty rather than just departures from normal.
225: 
226: \section{Other measures of the utility of ensemble spread}
227: 
228: Because of the offset problem with the SSC we now suggest some alternative methods
229: for measuring the usefulness of the ensemble spread.
230: All of these measures can only be calculated \emph{after} calibration,
231: as explained above.
232: 
233: \subsection{Coefficient of variation of spread}
234: 
235: Our first measure is the \emph{coefficient of variation of spread} defined as:
236: 
237: \begin{equation}
238: \mbox{COVS}=\frac{\sigma_\sigma}{\mu_\sigma}
239: \end{equation}
240: 
241: where $\sigma_\sigma$ is the standard deviation of variations in the uncertainty or the spread,
242: and $\mu_\sigma$ is the mean level in the uncertainty or the spread.
243: 
244: COVS was introduced in~\citet{jewsonbz03a} and measures the size of the variations of the
245: spread relative to the mean spread. Values for the COVS versus lead time for ECMWF ensemble
246: forecasts for London Heathrow are given in that paper.
247: 
248: If the post-calibration COVS is small then that implies that the variations
249: in the uncertainty are small relative to the mean uncertainty, and, depending on the
250: level of accuracy required, that it may be reasonable to ignore
251: the variations in the uncertainty completely and model it as constant, i{.}e{.} that linear
252: regression may be as good as spread regression.
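
As a sketch of how the COVS relates to the calibration parameters, assume that the calibration has been performed
using the standard deviation based model of equation~\ref{sr1}, and write $\sigma_s$ for the standard deviation
of the $s_i$ over the sample (a symbol introduced here for illustration).
The calibrated uncertainty is $\gamma+\delta s_i$, which has mean $\gamma+\delta \overline{s}$ and standard
deviation $|\delta| \sigma_s$, and so

\begin{equation}
 \mbox{COVS}=\frac{|\delta| \sigma_s}{\gamma+\delta \overline{s}}
\end{equation}

The denominator is the spread-skill offset: for given $\delta$ and $\sigma_s$, the larger the offset, the smaller
the COVS, whatever the value of the SSC.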
253: 
254: 
255: \subsection{Spread mean variability ratio}
256: 
257: The limitation of using the COVS to understand the importance of
258: variations in the ensemble spread is that it does not take into account the size of the
259: variations in the mean temperature.
260: One can imagine the following two limiting cases:
261: 
262: \begin{enumerate}
263: 
264:     \item The expected temperature is the same every day but the standard deviation
265:     of possible temperatures varies. In this case forecasts of the uncertainty
266:     of temperature would be very useful. We call this a `\emph{mean constant spread varies}' world.
267: 
268:     \item The expected temperature varies from day to day but the standard
269:     deviation of possible temperatures is constant. In this case forecasts
270:     of the uncertainty of temperature would not be useful.
271:     We call this a `\emph{mean varies spread constant}' world.
272: 
273: \end{enumerate}
274: 
275: In order to distinguish between these two scenarios we define the
276: \emph{spread-mean variability ratio} as:
277: 
278: \begin{equation}
279:  \mbox{SMVR}_1=\frac{\sigma_\sigma}{\sigma_\mu}
280: \end{equation}
281: 
282: where $\sigma_\sigma$ is the standard deviation of variations in the uncertainty or the spread
283: and $\sigma_\mu$ is the standard deviation of variations in the expected temperature.
284: 
285: An alternative definition based on variance would be:
286: \begin{equation}
287:  \mbox{SMVR}_2=\frac{\sigma^2_\sigma}{\sigma^2_\mu}
288: \end{equation}
289: 
290: The SMVR measures the size of variations of the spread relative to the size of the variations of the mean.
291: Small values of the SMVR imply that we are close to the mean-varies-spread-constant world while large
292: values of SMVR imply that we are close to the mean-constant-spread-varies world.
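
As a sketch, assume again that the forecast has been calibrated using the standard deviation based model of
equation~\ref{sr1}, and write $\sigma_m$ and $\sigma_s$ for the standard deviations of $m_i$ and $s_i$ over
the sample (symbols introduced here for illustration).
The calibrated mean $\alpha+\beta m_i$ has standard deviation $|\beta| \sigma_m$ and the calibrated uncertainty
$\gamma+\delta s_i$ has standard deviation $|\delta| \sigma_s$, and so

\begin{equation}
 \mbox{SMVR}_1=\frac{|\delta| \sigma_s}{|\beta| \sigma_m}
\end{equation}

From the definitions above, $\mbox{SMVR}_2$ is simply the square of $\mbox{SMVR}_1$.
Note that, unlike the COVS, the SMVR does not depend on $\gamma$.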
293: 
294: Figure~\ref{f:f1} shows the post-calibration SMVR$_1$ for the forecasts used in~\citet{jewsonbz03a}.
295: We see that the SMVR is small at all leads, with smallest values at the shortest leads.
296: We thus see that we are much closer to the mean-varies-spread-constant world than we are to the
297: mean-constant-spread-varies world, and hence that predicting variations in the uncertainty
298: is likely to be less useful than it would be in a world in which the SMVR  were larger.
299: 
300: %\subsection{Quantile-mean variability ratio}
301: %
302: %How much impact variations in the uncertainty have on final predictions depends in part
303: %on the point on the distribution that is being predicted.
304: %For example consider that we are predicting a quantile from a normal distribution. The prediction
305: %is given by:
306: %\begin{equation}
307: %q=\mu+a \sigma
308: %\end{equation}
309: %where $a$ specifies which quantile is being predicted. Small values of $a$ predict
310: %quantiles near the mean while large values of $a$ predict quantiles in the tails.
311: %If we consider the variance of $q$ we see that:
312: %\begin{equation}
313: % var(q)=var(\mu)+a^2 var(\sigma)
314: %\end{equation}
315: %We see from this that variations in the spread become more important the more
316: %extreme the quantiles that we are trying to predict. It thus makes
317: %sense to measure the relevance of variations in the spread for each quantile
318: %separately: small variations in the spread may be unimportant for predictions
319: %near the mean but important for predictions in the tails.
320: %
321: %Thus for a given quantile specified by a value of $a$ we can define
322: %
323: %\begin{equation}
324: % \mbox{QMVR_1}=\frac{\sigma_q}{\sigma_\mu}
325: %\end{equation}
326: %
327: %or
328: %
329: %\begin{equation}
330: % \mbox{QMVR}=\frac{var(q)}{\sigma^2_\mu}
331: %\end{equation}
332: %
333: %Small values of the QMVR suggest that the spread is not important for predicting this particular
334: %quantile. Large values of the QMVR suggest that it is. For large enough values of $a$, even tiny
335: %variations in the spread become important.
336: 
337: \subsection{Impact on the log-likelihood}
338: 
339: The final measure of the utility of forecasts of spread that we present is simply the change
340: in the cost function that is being used to calibrate and evaluate the forecast. We ourselves prefer
341: to evaluate probabilistic forecasts of temperature using the log-likelihood from classical
342: statistics~\citep{fisher1912,jewson03d}
343: and hence we consider the change in the log-likelihood due to the
344: inclusion of information from the ensemble spread as a measure of how useful that information is.
345: When we evaluated
346: the usefulness of the spread in temperature forecasts derived from the ECMWF ensemble using
347: this method we found that the spread was not very important~\citep{jewson04l}.
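
For reference, the comparison can be sketched as follows; this is a generic statement of the normal
log-likelihood and is not intended to reproduce the exact procedure used in~\citet{jewson04l}.
For $N$ forecast days, and writing $\sigma_i=\gamma+\delta s_i$ for the calibrated standard deviation from
equation~\ref{sr1} (assumed positive), the log-likelihood of the spread regression model is

\begin{equation}
 \ell_{SR}=-\frac{1}{2}\sum_{i=1}^{N}\left[\log (2\pi\sigma_i^2)+\frac{(T_i-\alpha-\beta m_i)^2}{\sigma_i^2}\right]
\end{equation}

Linear regression is the special case $\delta=0$, with log-likelihood $\ell_{LR}$, and the measure of the
usefulness of the spread is then

\begin{equation}
 \Delta \ell=\ell_{SR}-\ell_{LR}
\end{equation}

where each model is evaluated using its own fitted parameters.
Because the two models are nested, the in-sample value of $\Delta \ell$ cannot be negative when both are fitted
by maximum likelihood, and so out-of-sample values are the more informative test.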
348: 
349: One aspect of our comparison of forecasts using log-likelihoods in~\citet{jewson04l} is that
350: we calculated log-likelihood based on the whole
351: distribution of future temperatures. This was deliberate: it is predicting the whole distribution of
352: temperature that we are interested in.
353: However, if instead we were mainly interested in the tails of the
354: distribution then a version of the log-likelihood based only on the tails
355: would be more appropriate and the ensemble spread would perhaps be more useful.
356: 
357: \section{Summary}
358: 
359: We have considered how to measure the importance of variations in the ensemble spread when making probabilistic
360: temperature forecasts.
361: First we have considered the interaction between measures of the spread-skill correlation (SSC) and the
362: methods used to calibrate the forecast. We find that certain definitions of SSC are conserved through the
363: calibration process for certain calibration algorithms, implying that the choice of SSC measure to
364: be used should be linked to the choice of calibration method.
365: 
366: However we also discuss why the SSC is not a particularly useful measure of the information in the ensemble
367: spread and explain how a high value of the SSC does not necessarily
368: mean that the spread improves the quality of the final forecast because of the possibility of a large
369: offset in the calibrated uncertainty.
370: 
371: We have discussed some alternative and preferable diagnostics that focus on the role the spread plays in the final
372: calibrated forecast.
373: The first of these diagnostics measures the size of variations
374: in the uncertainty relative to the mean uncertainty
375: and the second measures the size of variations in the uncertainty relative to the size of the variations in the
376: expected temperature.
377: We calculate the latter for a year of forecast data and find that we are much closer to a world
378: in which the mean varies and the spread is fixed than we are to a world in which the
379: spread varies and the mean is fixed. This seems to partly explain why we see so little improvement
380: in the skill of probabilistic forecasts when we add the ensemble spread as an extra predictor.
381: 
382: \section{Acknowledgements}
383: 
384: Thanks to Jeremy Penzer and Christine Ziehmann for some interesting discussions on this topic.
385: 
386: \section{Legal statement}
387: 
388: SJ was employed by RMS at the time that this article was written.
389: 
390: However, neither the research behind this article nor the writing
391: of this article were in the course of his employment, (where `in
392: the course of their employment' is within the meaning of the
393: Copyright, Designs and Patents Act 1988, Section 11), nor were
394: they in the course of his normal duties, or in the course of
395: duties falling outside his normal duties but specifically assigned
396: to him (where `in the course of his normal duties' and `in the
397: course of duties falling outside his normal duties' are within the
398: meanings of the Patents Act 1977, Section 39). Furthermore the
399: article does not contain any proprietary information or trade
400: secrets of RMS. As a result, the author is the owner of all the
401: intellectual property rights (including, but not limited to,
402: copyright, moral rights, design rights and rights to inventions)
403: associated with and arising from this article. The author reserves
404: all these rights. No-one may reproduce, store or transmit, in any
405: form or by any means, any part of this article without the
406: author's prior written permission. The moral rights of the author
407: have been asserted.
408: 
409: The contents of this article reflect the author's personal
410: opinions at the point in time at which this article was submitted
411: for publication. However, by the very nature of ongoing research,
412: they do not necessarily reflect the author's current opinions. In
413: addition, they do not necessarily reflect the opinions of the
414: author's employer.
415: 
416: \clearpage
417: \begin{figure}[!htb]
418:   \begin{center}
419:    \includegraphics[scale=0.7,angle=0]{fig1.ps}
420:   \end{center}
421:   \caption{
422: The SMVR$_1$ calculated from one year of ECMWF ensemble forecasts
423: for London Heathrow calibrated using the standard deviation based
424: spread regression model.
425:   }
426:   \label{f:f1}
427: \end{figure}
428: 
429: \bibliography{jewson}
430: 
431: \end{document}
432: