
1: \documentclass{article}

2:

3: \usepackage{graphicx}

4: \usepackage{psfig}

5: \usepackage{epsfig}

6: \usepackage[round]{natbib}

7:

8: \setlength{\hoffset}{-1in}\setlength{\oddsidemargin}{2.5cm}

9: \setlength{\textwidth}{16cm} \setlength{\voffset}{-1in}

10: %\setlength{\topmargin}{1cm} \setlength{\textheight}{11cm}

11: \setlength{\topmargin}{1cm} \setlength{\textheight}{25cm}

12: \setlength{\unitlength}{1cm} \setlength{\parindent}{0cm}

13:

14: \newcommand{\bx}[1]{\fbox{\begin{minipage}{15.8cm}#1\end{minipage}}}

15: \newcommand{\bxx}[1]{\fbox{\begin{minipage}{12.0cm}#1\end{minipage}}}

16:

17: \bibliographystyle{plainnat}

18:

19: \title{Probabilistic forecasts of temperature: measuring the utility of the ensemble spread}

20:

21: \author{Stephen Jewson\footnote{\emph{Correspondence address}: RMS, 10 Eastcheap, London, EC3M 1AJ, UK.

22: Email: \texttt{x@stephenjewson.com}}}

23:

24: \begin{document}

25:

26: \maketitle

27:

28: \begin{abstract}

29: The spread of ensemble weather forecasts contains information

30: about the spread of possible future weather scenarios. But how much

31: information does it contain, and how useful is that information in

32: predicting the probabilities of future temperatures? One

33: traditional answer to this question is to calculate the

34: spread-skill correlation. We discuss the spread-skill correlation

35: and how it interacts with some simple calibration schemes. We then

36: point out why it is not, in fact, a useful measure for the amount

37: of information in the ensemble spread, and discuss a number of

38: other measures that are more useful.

39: \end{abstract}

40:

41: \section{Introduction}

42:

43: Forecasts of the expected surface air temperature

44: over the next 15 days are readily available from commercial forecast vendors.

45: The best of these forecasts have been proven to be consistently better than climatology

46: and such forecasts are widely used within industry.

47: There is also demand within industry for \emph{probabilistic} forecasts of temperature,

48: i{.}e{.} forecasts that predict the whole distribution of temperatures.

49: Such forecasts are much more useful than forecasts of the expectation alone

50: in situations where the ultimate

51: variables being predicted are a non-linear function of temperature, as is commonly the case.

52:

53: Probabilistic forecasts of temperature can be made rather easily

54: from forecasts of the expected temperature

55: using linear regression.

56: The parameters of the regression model are derived using past forecasts and

57: past observations after these forecasts and observations have been converted

58: to standardized anomalies using the climatological mean and standard deviation.

59: Probabilistic forecasts made in this way provide a standard against which

60: forecasts made using more sophisticated methods should be compared, and it turns

61: out that they are hard to beat (our own attempts to beat regression, which have

62: more or less failed, are summarised in~\citet{jewson04l}).

63:

64: Regression-based probabilistic forecasts have a skill that doesn't vary with weather state.

65: It has been shown, however, that the uncertainty around forecasts of the expectation

66: \emph{does} vary with weather state and that these variations are predictable, to a certain

67: extent, using

68: the spread of ensemble forecasts (see, for example,~\citet{kalnay}, and many others).

69: What is not clear is whether the level of predictability

70: in the variations of the uncertainty is useful in any material sense or whether the beneficial

71: effect on the final forecast of the temperature distribution is too small to be relevant.

72: How might we investigate this question of how much useful information there is in the ensemble spread?

73:

74: One method that is frequently used to assess the amount of information in the spread

75: from ensemble forecasts is the spread-skill correlation (SSC), defined in a number

76: of different ways (see for example~\citet{barker91}, \citet{whitaker98} and~\citet{hou01}).

77: SSC is usually calculated before the ensemble forecast has been calibrated

78: (i{.}e{.} before it has been turned into a probabilistic forecast).

79: However, it is the properties of the forecast \emph{after} calibration that we really care about.

80: In this article we investigate some of the properties of the spread-skill correlation, and in particular

81: how it interacts with the calibration procedure.

82: We will show that, under certain combinations of the definition of the SSC and the

83: calibration procedure, the SSC is the same before and after

84: the calibration, implying that pre-calibration estimates of the SSC

85: can be used to predict post-calibration values.

86:

87: However, we also note that even the post-calibration SSC is not a particularly good indicator

88: of the level of useful information that can be derived from the ensemble spread

89: and we describe how it can be possible

90: that the SSC is high but the ensemble spread is effectively useless as a predictor

91: of the future temperature distribution.

92:

93: Finally we present some simple measures that improve on the SSC and

94: that can be used to ascertain whether the information in the ensemble spread

95: is really useful or not.

96:

97: \section{The linear anomaly correlation}

98:

99: We start by reviewing some of the properties of the linear anomaly correlation (LAC).

100: This will help us understand how to think about the properties of the SSC.

101:

102: The amount of information in a temperature forecast from an NWP model is commonly measured using the

103: LAC between the forecast and an analysis.

104: One of the reasons that the LAC is a useful measure is that it is conserved under linear transformations,

105: and so if the forecast is calibrated using a linear transformation (such as linear regression)

106: then the LAC post-calibration is the same as the LAC pre-calibration. This means that

107: one doesn't actually have

108: to perform the calibration to know what the post-calibration LAC is going to be.
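
The conservation property can be made explicit with a short derivation. The symbols $f$, $o$, $a$ and $b$ are introduced here for illustration only: $f$ is the raw forecast anomaly, $o$ is the analysis, and $a+bf$ is the linearly calibrated forecast. Then

\begin{equation}
 \mbox{corr}(a+bf,o)
 =\frac{\mbox{cov}(a+bf,o)}{\sqrt{\mbox{var}(a+bf)\,\mbox{var}(o)}}
 =\frac{b\,\mbox{cov}(f,o)}{|b|\sqrt{\mbox{var}(f)\,\mbox{var}(o)}}
 =\mbox{sign}(b)\,\mbox{corr}(f,o)
\end{equation}

so the LAC is unchanged whenever the slope $b$ is positive, which is the case when the slope is fitted by regression to a forecast that is positively correlated with the analysis.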

109:

110: \section{The spread-skill correlation}

111:

112: In a similar way the SSC is commonly applied to the output from NWP models to assess the ability of the model

113: to capture variations in the uncertainty (see for example~\citet{buizza97}).

114:

115: Four commonly used definitions of SSC are:

116:

117: \begin{eqnarray}

118:  \mbox{SSC}_1&=&\mbox{linear correlation}(|e|,s)\\

119:  \mbox{SSC}_2&=&\mbox{linear correlation}(e^2,s)\\

120:  \mbox{SSC}_3&=&\mbox{linear correlation}(|e|,s^2)\\

121:  \mbox{SSC}_4&=&\mbox{linear correlation}(e^2,s^2)

122: \end{eqnarray}

123:

124: where $e$ are the forecast errors and $s$ is the ensemble spread.
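
As an illustration, the first of these definitions can be written out as a sample correlation. We write $e_i$ and $s_i$ for the error and the spread on day $i$, with $i=1,\ldots,n$; the index $i$, the sample size $n$ and the overbars for sample means are notation assumed for this sketch:

\begin{equation}
 \mbox{SSC}_1=\frac{\sum_{i=1}^{n}\left(|e_i|-\overline{|e|}\right)\left(s_i-\overline{s}\right)}
 {\sqrt{\sum_{i=1}^{n}\left(|e_i|-\overline{|e|}\right)^2\,\sum_{i=1}^{n}\left(s_i-\overline{s}\right)^2}}
\end{equation}

The other three definitions follow by replacing $|e_i|$ with $e_i^2$, $s_i$ with $s_i^2$, or both.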

125:

126: In the same way that predictions of the mean temperature must be calibrated, so must predictions of the uncertainty.

127: In~\citet{jewsonbz03a} we argued that both an offset and a scaling are needed in this calibration:

128: this allows for both the

129: mean level of uncertainty and the amplitude of the variability of the uncertainty to be set correctly.

130: We have proposed and tested various models that can be used for this calibration: a summary of our

131: results is given in~\citet{jewson04l}.

132: All the models we propose are generalisations of linear regression.

133: The two models of most relevance to the current discussion are standard deviation and variance

134: based \emph{spread regression} models defined by:

135:

136: \begin{equation}\label{sr1}

137:  T_i \sim N (\mbox{mean}=\alpha+\beta m_i, \mbox{standard deviation}=\gamma+\delta s_i)

138: \end{equation}

139:

140: and

141:

142: \begin{equation}\label{sr2}

143:  T_i \sim N (\mbox{mean}=\alpha+\beta m_i, \mbox{variance}=\gamma^2+\delta^2 s_i^2)

144: \end{equation}

145:

146: where $T_i$ is the temperature anomaly on day $i$,

147: $m_i$ is the ensemble mean anomaly on day $i$,

148: $s_i$ is the ensemble spread anomaly on day $i$

149: and where anomalies are defined by subtracting a climatological seasonal cycle in the

150: mean and dividing by a climatological seasonal cycle in the spread.

151: $\alpha, \beta, \gamma$ and $\delta$ are

152: free parameters:

153: we call $\gamma$ and $\gamma^2$ the \emph{spread-skill bias correction} and $\delta$ and

154: $\delta^2$ the

155: \emph{spread-skill regression coefficient}, while we call $\gamma+\delta \overline{s}$  and

156: $\gamma^2+\delta^2 \overline{s^2}$ the \emph{spread-skill offset}.

157:

158:

159: Which of the standard deviation or variance based calibration models is better

160: is not clear a priori

161: but can be answered for any particular data set by comparing the in-sample or out-of-sample

162: likelihoods achieved by the two models.

163:

164: It would be very useful if the SSC (for any of the above definitions)

165: were the same before and after

166: calibration (for either of the above calibration methods).

167: Then, as with linear correlation, the pre-calibration SSC could be used to predict

168: the post-calibration SSC and we would not actually have to perform the calibration

169: to calculate the post-calibration SSC.

170:

171: We now investigate whether the SSC has this useful property, which we will call conservation.

172:

173: \section{Conservation properties of the spread-skill correlation}

174:

175: The conservation properties of the SSC are straightforward

176: and somewhat obvious. They can be derived based on the

177: observation that linear correlations are not affected by linear transformations

178: of either variable.

179:

180: Under the standard deviation based spread regression model

181: the spread-skill correlation defined as either SSC$_1$ or SSC$_2$

182: will be conserved because these measures base the SSC on $s$ and

183: the calibration of $s$ is simply a linear transformation.

184: The SSC measures based on $s^2$ will not, however, be conserved when using

185: standard deviation based spread regression.
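
The argument can be sketched in one line. We write the calibrated spread from equation~\ref{sr1} as $\gamma+\delta s$ and treat the errors $e$ as given; this is an assumption of the sketch, which looks only at the effect of calibrating the spread. For $\delta>0$ we then have

\begin{equation}
 \mbox{linear correlation}(|e|,\gamma+\delta s)
 =\frac{\delta\,\mbox{cov}(|e|,s)}{\delta\sqrt{\mbox{var}(|e|)\,\mbox{var}(s)}}
 =\mbox{linear correlation}(|e|,s)
\end{equation}

and the same steps with $e^2$ in place of $|e|$ give the result for SSC$_2$. No such cancellation occurs for SSC$_3$ and SSC$_4$, because $(\gamma+\delta s)^2=\gamma^2+2\gamma\delta s+\delta^2 s^2$ is not a linear function of $s^2$ unless $\gamma=0$.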

186:

187: Alternatively under the variance based spread regression model

188: the spread-skill correlation defined as either SSC$_3$ or SSC$_4$

189: will be conserved because these measures base the SSC on $s^2$

190: and the calibration of $s^2$ is now a linear transformation.

191: However, SSC$_1$ and SSC$_2$ will not be conserved under the variance based spread regression model.
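
To see where the non-conservation comes from in this case, note that the calibrated variance in equation~\ref{sr2}, $\gamma^2+\delta^2 s^2$, is linear in $s^2$ with slope $\delta^2\geq 0$, whereas the calibrated standard deviation is

\begin{equation}
 \sqrt{\gamma^2+\delta^2 s^2}
\end{equation}

which reduces to a multiple of $s$ only when $\gamma=0$. For $\gamma\neq 0$ and $\delta\neq 0$ it is a non-linear function of $s$, so a linear correlation against $s$ before calibration need not equal the linear correlation against the calibrated standard deviation afterwards. This is a sketch of the reasoning rather than a formal proof.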

192:

193: Together these results suggest

194: that the choice of SSC measure is not arbitrary but should be influenced

195: by whichever of the calibration models works better for the data in hand.

196:

197: \section{The offset problem}

198:

199: We have shown that the SSC can be conserved during calibration as long as the definition of SSC is

200: chosen to match the method used for the calibration.

201: There is, however, a problem with the SSC as a measure for the amount of information

202: in a probabilistic forecast.

203: This problem is caused by the spread-skill offset given by $\gamma+\delta \overline{s}$

204: in equation~\ref{sr1} and by $\gamma^2+\delta^2 \overline{s^2}$ in equation~\ref{sr2}.

205:

206: When the offset is large relative to the amplitude of the variability of the uncertainty we find ourselves

207: in a situation in which predictions of the variations of the uncertainty are more or less irrelevant, even

208: if they are very good, simply because they don't contribute much as a fraction of the total uncertainty.
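
A hypothetical numerical illustration may help; the numbers below are invented for this sketch and are not taken from any forecast data. Suppose the standard deviation based model of equation~\ref{sr1} has been fitted with $\gamma=0.9$ and $\delta=0.1$, and that the spread anomaly $s$ has mean $\overline{s}=1.0$ and standard deviation $0.5$. Then

\begin{eqnarray}
 \mbox{mean of calibrated uncertainty}&=&\gamma+\delta\overline{s}=0.9+0.1\times 1.0=1.0\\
 \mbox{standard deviation of calibrated uncertainty}&=&\delta\times 0.5=0.1\times 0.5=0.05
\end{eqnarray}

so the calibrated standard deviation typically varies by only about $5\%$ around its mean value. The correlation between spread and error does not depend on the values of $\gamma$ and $\delta$ (for $\delta>0$) and could be high, yet replacing the calibrated uncertainty by the constant $1.0$ would change the forecast distribution very little.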

209:

210: In such cases the SSC may be large but the ensemble spread could be ignored without reducing the skill of the

211: calibrated forecast: linear regression would work as well as spread regression.

212: We clearly need other measures of the usefulness of the spread that take into account

213: the \emph{size} of the calibrated variations in uncertainty.

214: Since this question depends crucially on the offset,

215: and the offset can only be derived during the calibration procedure,

216: it is not possible to estimate the usefulness

217: of the spread before calibration has taken place.

218:

219: This is a fundamental difference between forecasts of spread and forecasts of the expectation, since, as we

220: have seen, it \emph{is} possible to estimate the information in a forecast of the expectation before the calibration

221: has taken place. This difference arises because when we predict the mean temperature we are

222: concerned with predicting changes from the normal while when we predict the uncertainty we are only interested

223: in the extent to which our estimate of the uncertainty improves the forecast of the temperature distribution.

224: Thus we are interested in actual values of the uncertainty rather than just departures from normal.

225:

226: \section{Other measures of the utility of ensemble spread}

227:

228: Because of the offset problem with the SSC we now suggest some alternative methods

229: for measuring the usefulness of the ensemble spread.

230: All of these measures can only be calculated \emph{after} calibration,

231: as explained above.

232:

233: \subsection{Coefficient of variation of spread}

234:

235: Our first measure is the \emph{coefficient of variation of spread} defined as:

236:

237: \begin{equation}

238: \mbox{COVS}=\frac{\sigma_\sigma}{\mu_\sigma}

239: \end{equation}

240:

241: where $\sigma_\sigma$ is the standard deviation of variations in the uncertainty or the spread,

242: and $\mu_\sigma$ is the mean level in the uncertainty or the spread.

243:

244: COVS was introduced in~\citet{jewsonbz03a} and measures the size of the variations of the

245: spread relative to the mean spread. Values for the COVS versus lead time for ECMWF ensemble

246: forecasts for London Heathrow are given in that paper.

247:

248: If the post-calibration COVS is small then that implies that the variations

249: in the uncertainty are small relative to the mean uncertainty, and, depending on the

250: level of accuracy required, that it may be reasonable to ignore

251: the variations in the uncertainty completely and model it as constant, i{.}e{.} that linear

252: regression may be as good as spread regression.
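
Under the standard deviation based model of equation~\ref{sr1}, the post-calibration COVS can be written directly in terms of the fitted parameters. As a sketch, writing $\sigma_s$ for the standard deviation of the spread anomaly $s$ (a symbol introduced here for illustration) and assuming $\delta>0$, the calibrated uncertainty $\gamma+\delta s_i$ has standard deviation $\delta\sigma_s$ and mean $\gamma+\delta\overline{s}$, so that

\begin{equation}
 \mbox{COVS}=\frac{\delta\,\sigma_s}{\gamma+\delta\overline{s}}
\end{equation}

The denominator is the spread-skill offset, which makes explicit how a large offset drives the COVS towards zero.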

253:

254:

255: \subsection{Spread-mean variability ratio}

256:

257: The limitation of using the COVS to understand the importance of

258: variations in the ensemble spread is that it doesn't take into account the size of the

259: variations in the mean temperature.

260: One can imagine the following two limiting cases:

261:

262: \begin{enumerate}

263:

264:     \item The expected temperature is the same every day but the standard deviation

265:     of possible temperatures varies. In this case forecasts of the uncertainty

266:     of temperature would be very useful. We call this a `\emph{mean-constant-spread-varies}' world.

267:

268:     \item The expected temperature varies from day to day but the standard

269:     deviation of possible temperatures is constant. In this case forecasts

270:     of the uncertainty of temperature would not be useful.

271:     We call this a `\emph{mean-varies-spread-constant}' world.

272:

273: \end{enumerate}

274:

275: In order to distinguish between these two scenarios we define the

276: \emph{spread-mean variability ratio} as:

277:

278: \begin{equation}

279:  \mbox{SMVR}_1=\frac{\sigma_\sigma}{\sigma_\mu}

280: \end{equation}

281:

282: where $\sigma_\sigma$ is the standard deviation of variations in the uncertainty or the spread

283: and $\sigma_\mu$ is the standard deviation of variations in the expected temperature.

284:

285: An alternative definition based on variance would be:

286: \begin{equation}

287:  \mbox{SMVR}_2=\frac{\sigma^2_\sigma}{\sigma^2_\mu}

288: \end{equation}

289:

290: The SMVR measures the size of variations of the spread relative to the size of the variations of the mean.

291: Small values of the SMVR imply that we are close to the mean-varies-spread-constant world while large

292: values of SMVR imply that we are close to the mean-constant-spread-varies world.
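
For the standard deviation based model of equation~\ref{sr1} the SMVR can also be expressed through the fitted parameters. As a sketch, writing $\sigma_m$ and $\sigma_s$ for the standard deviations of the ensemble mean anomaly and the ensemble spread anomaly (symbols introduced here for illustration), the calibrated mean $\alpha+\beta m_i$ has standard deviation $|\beta|\sigma_m$ and the calibrated uncertainty $\gamma+\delta s_i$ has standard deviation $|\delta|\sigma_s$, so that

\begin{equation}
 \mbox{SMVR}_1=\frac{|\delta|\,\sigma_s}{|\beta|\,\sigma_m},
 \qquad
 \mbox{SMVR}_2=\left(\mbox{SMVR}_1\right)^2
\end{equation}

Unlike the COVS, this ratio does not involve the parameters $\alpha$ and $\gamma$: it compares only the amplitudes of the two calibrated signals.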

293:

294: Figure~\ref{f:f1} shows the post-calibration SMVR$_1$ for the forecasts used in~\citet{jewsonbz03a}.

295: We see that the SMVR is small at all leads, with smallest values at the shortest leads.

296: We thus see that we are much closer to the mean-varies-spread-constant world than we are to the

297: mean-constant-spread-varies world, and hence that predicting variations in the uncertainty

298: is likely to be less useful than it would be in a world in which the SMVR were larger.

299:

300: %\subsection{Quantile-mean variability ratio}

301: %

302: %How much impact variations in the uncertainty have on final predictions depends in part

303: %on the point on the distribution that is being predicted.

304: %For example consider that we are predicting a quantile from a normal distribution. The prediction

305: %is given by:

306: %\begin{equation}

307: %q=\mu+a \sigma

308: %\end{equation}

309: %where $a$ specifies which quantile is being predicted. Small values of $a$ predict

310: %quantiles near the mean while large values of $a$ predict quantiles in the tails.

311: %If we consider the variance of $q$ we see that:

312: %\begin{equation}

313: % var(q)=var(\mu)+a^2 var(\sigma)

314: %\end{equation}

315: %We see from this that variations in the spread become more important the more

316: %extreme the quantiles that we are trying to predict. It thus makes

317: %sense to measure the relevance of variations in the spread for each quantile

318: %separately: small variations in the spread may be unimportant for predictions

319: %near the mean but important for predictions in the tails.

320: %

321: %Thus for a given quantile specified by a value of $a$ we can define

322: %

323: %\begin{equation}

324: % \mbox{QMVR_1}=\frac{\sigma_q}{\sigma_\mu}

325: %\end{equation}

326: %

327: %or

328: %

329: %\begin{equation}

330: % \mbox{QMVR}=\frac{var(q)}{\sigma^2_\mu}

331: %\end{equation}

332: %

333: %Small values of the QMVR suggest that the spread is not important for predicting this particular

334: %quantile. Large values of the QMVR suggest that it is. For large enough values of $a$, even tiny

335: %variations in the spread become important.

336:

337: \subsection{Impact on the log-likelihood}

338:

339: The final measure of the utility of forecasts of spread that we present is simply the change

340: in the cost function that is being used to calibrate and evaluate the forecast. We ourselves prefer

341: to evaluate probabilistic forecasts of temperature using the log-likelihood from classical

342: statistics~\citep{fisher1912,jewson03d}

343: and hence we consider the change in the log-likelihood due to the

344: inclusion of information from the ensemble spread as a measure of how useful that information is.

345: When we evaluated

346: the usefulness of the spread in temperature forecasts derived from the ECMWF ensemble using

347: this method we found that the spread was not very important~\citep{jewson04l}.
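
For concreteness, the comparison can be written out for the normal models used above. This is our own explicit statement, assuming that the forecast days are treated as independent; the details of the fitting procedure in~\citet{jewson04l} may differ. With $\mu_i=\alpha+\beta m_i$ the calibrated mean and $\sigma_i$ the calibrated standard deviation on day $i$, the log-likelihood of $n$ observations is

\begin{equation}
 \ell=-\frac{n}{2}\log(2\pi)-\sum_{i=1}^{n}\log\sigma_i-\sum_{i=1}^{n}\frac{(T_i-\mu_i)^2}{2\sigma_i^2}
\end{equation}

Linear regression is the special case of equation~\ref{sr1} with $\delta=0$, in which $\sigma_i=\gamma$ is constant. The measure described here is then the difference

\begin{equation}
 \Delta\ell=\ell_{\mbox{\scriptsize spread regression}}-\ell_{\mbox{\scriptsize linear regression}}
\end{equation}

with each term evaluated at its own fitted parameters. A value of $\Delta\ell$ close to zero indicates that the spread adds little to the forecast of the distribution.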

348:

349: One aspect of our comparison of forecasts using log-likelihoods in~\citet{jewson04l} is that

350: we calculated log-likelihood based on the whole

351: distribution of future temperatures. This was deliberate: it is predicting the whole distribution of

352: temperature that we are interested in.

353: However, if instead we were mainly interested in the tails of the

354: distribution then a version of the log-likelihood based only on the tails

355: would be more appropriate and the ensemble spread would perhaps be more useful.

356:

357: \section{Summary}

358:

359: We have considered how to measure the importance of variations in the ensemble spread when making probabilistic

360: temperature forecasts.

361: First we have considered the interaction between measures of the spread-skill correlation (SSC) and the

362: methods used to calibrate the forecast. We find that certain definitions of SSC are conserved through the

363: calibration process for certain calibration algorithms, implying that the choice of SSC measure to

364: be used should be linked to the choice of calibration method.

365:

366: However, we also discuss why the SSC is not a particularly useful measure of the information in the ensemble

367: spread and explain how a high value of the SSC does not necessarily

368: mean that the spread improves the quality of the final forecast because of the possibility of a large

369: offset in the calibrated uncertainty.

370:

371: We have discussed some alternative and preferable diagnostics that focus on the role the spread plays in the final

372: calibrated forecast.

373: The first of these diagnostics measures the size of variations

374: in the uncertainty relative to the mean uncertainty

375: and the second measures the size of variations in the uncertainty relative to the size of the variations in the

376: expected temperature.

377: We calculate the latter for a year of forecast data and find that we are much closer to a world

378: in which the mean varies and the spread is fixed than we are to a world in which the

379: spread varies and the mean is fixed. This seems to partly explain why we see so little improvement

380: in the skill of probabilistic forecasts when we add the ensemble spread as an extra predictor.

381:

382: \section{Acknowledgements}

383:

384: Thanks to Jeremy Penzer and Christine Ziehmann for some interesting discussions on this topic.

385:

386: \section{Legal statement}

387:

388: SJ was employed by RMS at the time that this article was written.

389:

390: However, neither the research behind this article nor the writing

391: of this article were in the course of his employment, (where 'in

392: the course of their employment' is within the meaning of the

393: Copyright, Designs and Patents Act 1988, Section 11), nor were

394: they in the course of his normal duties, or in the course of

395: duties falling outside his normal duties but specifically assigned

396: to him (where 'in the course of his normal duties' and 'in the

397: course of duties falling outside his normal duties' are within the

398: meanings of the Patents Act 1977, Section 39). Furthermore the

399: article does not contain any proprietary information or trade

400: secrets of RMS. As a result, the author is the owner of all the

401: intellectual property rights (including, but not limited to,

402: copyright, moral rights, design rights and rights to inventions)

403: associated with and arising from this article. The author reserves

404: all these rights. No-one may reproduce, store or transmit, in any

405: form or by any means, any part of this article without the

406: author's prior written permission. The moral rights of the author

407: have been asserted.

408:

409: The contents of this article reflect the author's personal

410: opinions at the point in time at which this article was submitted

411: for publication. However, by the very nature of ongoing research,

412: they do not necessarily reflect the author's current opinions. In

413: addition, they do not necessarily reflect the opinions of the

414: author's employer.

415:

416: \clearpage

417: \begin{figure}[!htb]

418:   \begin{center}

419:    \includegraphics[scale=0.7,angle=0]{fig1.ps}

420:   \end{center}

421:   \caption{

422: The SMVR$_1$ calculated from one year of ECMWF ensemble forecasts

423: for London Heathrow calibrated using the standard deviation based

424: spread regression model.

425:   }

426:   \label{f:f1}

427: \end{figure}

428:

429: \bibliography{jewson}

430:

431: \end{document}

432: