Article

Probabilistic Forecasts: Scoring Rules and Their Decomposition and Diagrammatic Representation via Bregman Divergences

by Gareth Hughes *,† and Cairistiona F.E. Topp †
Crop and Soil Systems, SRUC, West Mains Road, Edinburgh EH9 3JG, UK
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2015, 17(8), 5450-5471; https://doi.org/10.3390/e17085450
Submission received: 18 May 2015 / Revised: 27 July 2015 / Accepted: 28 July 2015 / Published: 31 July 2015
(This article belongs to the Special Issue Applications of Information Theory in the Geosciences)

Abstract
A scoring rule is a device for evaluation of forecasts that are given in terms of the probability of an event. In this article we will restrict our attention to binary forecasts. We may think of a scoring rule as a penalty attached to a forecast after the event has been observed. Thus a relatively small penalty will accrue if a high probability forecast that an event will occur is followed by occurrence of the event. On the other hand, a relatively large penalty will accrue if this forecast is followed by non-occurrence of the event. Meteorologists have been foremost in developing scoring rules for the evaluation of probabilistic forecasts. Here we use a published meteorological data set to illustrate diagrammatically the Brier score and the divergence score, and their statistical decompositions, as examples of Bregman divergences. In writing this article, we have in mind environmental scientists and modellers for whom meteorological factors are important drivers of biological, physical and chemical processes of interest. In this context, we briefly draw attention to the potential for probabilistic forecasting of the within-season component of nitrous oxide emissions from agricultural soils.

1. Introduction

A probabilistic forecast provides a forecast probability p that an event will subsequently occur. Probabilistic forecasts are used extensively in meteorology, so it is there that we will look for example scenarios and data. Now, qualitatively, a forecast of “rain tomorrow” with probability p = 0.7 means that on the basis of the forecast scheme, rain is rather more likely than not. Of course, we require definitions of “rain” and “tomorrow” in order to be able to properly interpret the forecast, but let us assume these are available. Then, given these definitions, we are able, subsequent to the forecast, to make an observation of whether or not there was rainfall in sufficient quantity to be designated “rain” during the hours designated “tomorrow”. If we view the event as binary, the outcome is either true (it rained) or false (it did not rain). Suppose it rained. From the point of view of forecast evaluation, it would be natural to give a better rating to a preceding forecast—as above—that rain was rather more likely than not (p = 0.7), than one that rain was less likely (i.e., a smaller p). Quantitative methods for the calculation of such ratings in the context of forecast evaluation are called scoring rules [1]. This article discusses scoring rules for probabilistic forecasts. We will restrict our attention to the evaluation of forecasts for events with binary outcomes. Note that meteorologists often refer to forecast evaluation as forecast verification (e.g., [2]).
It is convenient to think of a scoring rule as a means of attaching a penalty score to a forecast; the better the forecast, the smaller the penalty (e.g., [3]). Returning to the example of a forecast of rain tomorrow with probability p = 0.7, the Brier score [4] is (1 − p)² = 0.09 if rain is subsequently observed and (0 − p)² = 0.49 if not. The logarithmic score (an early discussion is given in [5]) is −ln(p) = 0.36 if rain is subsequently observed, and −ln(1 − p) = 1.20 if not (we will use natural logarithms throughout). In practice, meteorologists are usually interested in the evaluation of a forecast scheme based on the average score for a data set comprising a sequence of forecasts and the corresponding observations. The Brier score and the logarithmic score apply different penalties; most notably, the logarithmic score attaches larger penalties than does the Brier score to forecasts for which p is close to 0 or 1 when the outcome viewed as unlikely on the basis of the forecast turns out subsequently to be the case. However, both scoring rules are “strictly proper” [6,7].
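As a quick numerical check of these penalties, the following short sketch (our illustration in Python, not code from the original analysis) evaluates both scoring rules for the p = 0.7 example:

```python
import math

def brier_score(p, o):
    """Brier score for a single binary forecast: squared difference between outcome and forecast."""
    return (o - p) ** 2

def log_score(p, o):
    """Logarithmic score for a single binary forecast (natural logarithms)."""
    return -math.log(p) if o == 1 else -math.log(1 - p)

p = 0.7
print(brier_score(p, 1), brier_score(p, 0))  # ~0.09 if rain observed, ~0.49 if not
print(log_score(p, 1), log_score(p, 0))      # ~0.36 if rain observed, ~1.20 if not
```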
In the case of binary events, strictly proper scoring rules allow a statistical decomposition of the overall score into terms that further characterize a forecast [8]. Murphy [9] provided a statistical decomposition of the Brier score into three components, which he termed uncertainty, reliability and resolution (see also [10]). Weijs et al. [11,12] provided a further analysis of the logarithmic score, resulting in the divergence score and its statistical decomposition into the equivalent three components. The cited articles discuss uncertainty, reliability and resolution in detail.
Gneiting and Katzfuss [13] provide an analytical overview of probabilistic forecasting. One way of looking at the present article is as a complement to recent analytical innovations in forecast evaluation [11,12]. Using Bregman divergences, we provide a new calculation template for analysis of the Brier score and the divergence score, and new explanatory diagrams. Our objective in so doing is to provide an analysis with a straightforward diagrammatic interpretation as a basis for the evaluation of probabilistic forecasts in environmental applications where meteorological factors are important drivers of biological, physical and chemical processes of interest.
The present article is set out as follows. We introduce an example meteorological data set that is available in the public domain, and review the original analysis based on the Brier score. Following a brief discussion of the use of zero and one as probability forecasts, there is further analysis of both the Brier score and the divergence score for this data set. We then introduce our approach to the Brier score and the divergence score based on Bregman divergences, and provide examples of the calculations of the scores and their statistical decompositions. In a final discussion, we briefly mention the potential application of probabilistic forecasting to modelling of N2O emissions from agricultural soils at the within-season time-scale.

2. Methods

2.1. Data, Terminology, Notation

In the interests of producing an analysis that allows a straightforward diagrammatic representation, we will restrict our attention here to binary outcomes. We discuss the evaluation of probability forecasts using a data set that is in the public domain. The full data set comprises 24-h and 48-h forecasts for probability of daily precipitation in the city of Tampere in south-central Finland, as made by the Finnish Meteorological Institute during 2003, together with the corresponding daily rainfall records [14]. Our analysis here is based on the 24-h rainfall forecasts. The forecasts given in [14] were made for three rainfall categories, but here, as in the original analysis, the two higher-rainfall categories were combined in order to produce a binary forecast: probability of no-rain (≤0.2 mm rainfall) and probability of rain (otherwise). The observations were recorded as mm precipitation but for the purpose of forecast evaluation (again as in the original analysis) the observed rainfall data were combined into the same two categories as the forecasts: observation of no-rain (≤0.2 mm rainfall) and observation of rain (otherwise). After excluding days for which data were missing, the full record comprised N = 346 probability forecasts (denoted p_t) and the corresponding observations (o_t), t = 1, …, N, with o_t = 0 for observation of no-rain and o_t = 1 for observation of rain.
The Brier score for an individual forecast is (o_t − p_t)² and the overall Brier score for a data set comprising a series of forecasts and the corresponding observations is the average of the individual scores: BS = (1/N)·Σ_{t=1}^{N} (o_t − p_t)². This is the definition given in the original data analysis, retained for consistency. For the original data analysis the probability forecasts utilized eleven “allowed probability” forecast categories: for k = 1, …, 11; p_k = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1 (p_k denotes the forecast probability of rain in category k, thus the forecast probability of no-rain is its complement 1 − p_k). The number of observations in each category is denoted n_k and the number of observations of rain in each category is denoted o_k. The average frequency of rain observations in category k is ō_k = o_k/n_k. Also Σ_k n_k = N, Σ_k o_k = O, and the overall average frequency of rain observations is ō = O/N. The components of the decomposition of the Brier score are as follows: reliability, REL_BS = (1/N)·Σ_k n_k (ō_k − p_k)²; resolution, RES_BS = (1/N)·Σ_k n_k (ō_k − ō)²; uncertainty, UNC_BS = ō(1 − ō) (which is the Bernoulli variance); and then BS = REL_BS − RES_BS + UNC_BS. For the original data set, we calculate the Brier score: BS = 0.1445 (all calculations are shown correct to 4 d.p.). The components of the decomposition of the Brier score are: reliability, REL_BS = 0.0254; resolution, RES_BS = 0.0602; uncertainty, UNC_BS = 0.1793. As required, REL_BS − RES_BS + UNC_BS = BS, and the summary of results provided along with the original data set [14] is thus reproduced.
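These figures can be reproduced directly from the category counts. The sketch below (Python, ours; the three arrays are the category summaries from [14], see Table 1, with the original allowed probabilities 0 and 1 retained) recalculates the Brier score and its three components. Replacing the two extreme entries of p by 0.05 and 0.95 reproduces the adjusted values reported in Section 2.3.

```python
import numpy as np

# Category summaries from [14] (see Table 1 for n_k and o_k); here the original
# allowed probabilities, including 0 and 1, are used.
p = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0])
n = np.array([46, 55, 59, 41, 19, 22, 22, 34, 24, 11, 13])   # forecasts per category
o = np.array([1, 1, 5, 5, 4, 8, 6, 16, 16, 8, 11])           # rain observations per category

N = n.sum()                  # 346 forecast-observation pairs
obar_k = o / n               # average frequency of rain in each category
obar = o.sum() / N           # overall average frequency of rain (0.2341)

BS  = (o * (1 - p) ** 2 + (n - o) * p ** 2).sum() / N   # overall Brier score
REL = (n * (obar_k - p) ** 2).sum() / N                 # reliability
RES = (n * (obar_k - obar) ** 2).sum() / N              # resolution
UNC = obar * (1 - obar)                                 # uncertainty (Bernoulli variance)

print(round(BS, 4), round(REL, 4), round(RES, 4), round(UNC, 4))
# 0.1445 0.0254 0.0602 0.1793, with REL - RES + UNC = BS
```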

2.2. Probability Forecasts of Zero and One

In the original data set, the probability forecasts include p_k = 0 (for category k = 1) and p_k = 1 (for category k = 11); in words, respectively, “it is certain there will be no rain tomorrow” and “it is certain there will be rain tomorrow”. Such forecasts can present problems from the point of view of evaluation. Whereas probability forecasts 0 < p_k < 1 explicitly leave open the chance that an erroneous forecast may be made, probability forecasts p_k = 0 and p_k = 1 do not. The question that then arises is how to evaluate a forecast that was made with certainty but then proves to have been erroneous. This is not a hypothetical issue, as can be seen in the original data set. For category k = 1 (p_k = 0), we note that 1 out of the 46 forecasts made with certainty was erroneous, while for category k = 11 (p_k = 1), we note that 2 out of 13 forecasts made with certainty were erroneous [14]. If such an outcome were to occur when the logarithmic (or divergence) score was in use, an indefinitely large penalty score would apply. In routine practice our preference is to avoid the use of probability forecasts p_k = 0 and p_k = 1 (as a rule of thumb: only use a probability forecast of zero or one when there is absolute certainty of the outcome). There is a price to be paid for taking this point of view, which we discuss later. Notwithstanding, for further analysis in the present article, we will replace the probability forecast for category k = 1 by p_k = 0.05 (instead of zero) and the probability forecast for category k = 11 by p_k = 0.95 (instead of one) (the observations remain unchanged). A summary of the data set incorporating this adjustment (to be used exclusively from this point on) is given in Table 1.
Table 1. Summary of the data set. a
| k   | 1    | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11   |
|-----|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|
| p_k | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 0.95 |
| o_k | 1    | 1   | 5   | 5   | 4   | 8   | 6   | 16  | 16  | 8   | 11   |
| n_k | 46   | 55  | 59  | 41  | 19  | 22  | 22  | 34  | 24  | 11  | 13   |

a Notation: k, forecast category index; p_k, probability forecast (rain) (probability of no-rain is the complement); o_k, number of rain observations; n_k, number of observations.

2.3. The Brier Score and its Decomposition

For the adjusted data set (i.e., with probability forecasts p_k = 0.05, 0.95 instead of 0, 1 for categories k = 1, 11 respectively) we recalculate the Brier score: BS = 0.1440. Then we recalculate the components of the decomposition of the Brier score as follows: reliability, REL_BS = 0.0249; resolution, RES_BS = 0.0602; uncertainty, UNC_BS = 0.1793. As before, REL_BS − RES_BS + UNC_BS = BS (for full details see Appendix, Table 2).

2.4. The Divergence Score and its Decomposition

Weijs et al. [11,12] provide informative background on the provenance of the divergence score, and a detailed analysis of its derivation. We refer interested readers to this work, and present here only enough details to illustrate a template calculation of the score and its reliability-resolution-uncertainty decomposition. The divergence score is based on the Kullback-Leibler divergence, a kind of measure of distance between two probability distributions [15,16]. For binary forecasts and the corresponding observations, all the distributions required for calculating the divergence score and its decomposition are Bernoulli, so we can write:
$$D_{KL}(x_c \,\|\, x_r) = x_c \ln\!\left[\frac{x_c}{x_r}\right] + (1 - x_c)\ln\!\left[\frac{1 - x_c}{1 - x_r}\right] \qquad (1)$$
where variable x is a place-holder and, in our analysis, represents particular comparison and reference values (here, x_c and x_r, respectively) that will be replaced by a probability or a frequency, ranged between zero and one. The distribution (x_c, 1 − x_c) is referred to as the comparison distribution, and the distribution (x_r, 1 − x_r) is referred to as the reference distribution. Note that D_KL(x_c ‖ x_r) ≥ 0 and that the divergence is not necessarily symmetric with respect to the arguments. For the purpose of numerical calculation, recall that lim_{x→0} x·ln(x) = 0; we therefore take 0·ln(0) = 0.
The divergence score for an individual forecast is the Kullback-Leibler divergence between the observation (comparison) distribution and the forecast (reference) distribution: D_KL(o_t ‖ p_t) = o_t·ln(o_t/p_t) + (1 − o_t)·ln((1 − o_t)/(1 − p_t)). For the adjusted data set we can now calculate the overall divergence score as the average of the individual scores: DS = (1/N)·Σ_{t=1}^{N} D_KL(o_t ‖ p_t) = 0.4471. The components of the decomposition of the divergence score are calculated as follows: reliability, REL_DS = (1/N)·Σ_k n_k D_KL(ō_k ‖ p_k) = 0.0712; resolution, RES_DS = (1/N)·Σ_k n_k D_KL(ō_k ‖ ō) = 0.1683; uncertainty (which in this case is characterized by the binary Shannon entropy [17]), UNC_DS = u(ō) = −[ō·ln(ō) + (1 − ō)·ln(1 − ō)] = 0.5442. Then we have (for full details see Appendix, Table 2):
$$REL_{DS} - RES_{DS} + UNC_{DS} = DS \qquad (2)$$
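As with the Brier score, these numbers can be checked directly from the category summaries in Table 1. The sketch below (Python, ours; the helper kl and the array names are our own) computes the divergence score and its decomposition for the adjusted data set:

```python
import numpy as np

def kl(c, r):
    """Kullback-Leibler divergence between Bernoulli(c) and Bernoulli(r), with 0*ln(0) = 0."""
    c, r = np.asarray(c, float), np.asarray(r, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (np.where(c > 0, c * np.log(c / r), 0.0)
                + np.where(c < 1, (1 - c) * np.log((1 - c) / (1 - r)), 0.0))

p = np.array([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95])  # adjusted forecasts
n = np.array([46, 55, 59, 41, 19, 22, 22, 34, 24, 11, 13])
o = np.array([1, 1, 5, 5, 4, 8, 6, 16, 16, 8, 11])
N, obar_k, obar = n.sum(), o / n, o.sum() / n.sum()

DS  = (o * kl(1.0, p) + (n - o) * kl(0.0, p)).sum() / N         # 0.4471
REL = (n * kl(obar_k, p)).sum() / N                             # 0.0712
RES = (n * kl(obar_k, obar)).sum() / N                          # 0.1683
UNC = -(obar * np.log(obar) + (1 - obar) * np.log(1 - obar))    # 0.5442 (binary Shannon entropy)
print(round(float(DS), 4), round(float(REL), 4), round(float(RES), 4), round(float(UNC), 4))
```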

3. Forecast Evaluation via Bregman Divergences

Here we discuss forecast evaluation for the example data set via the Brier score and the divergence score, but using a different route through the calculations. Using Bregman divergences [18,19], our calculations lead to identical numerical results to those outlined above, in terms of the scores and their decompositions. What we gain by the analysis presented here is a set of diagrams which usefully complement those used by Weijs et al. [11,12] to illustrate the statistical decomposition both of the Brier score and the divergence score. This is possible because of the availability of a simple diagrammatic format for the illustration of Bregman divergences (e.g., [19,20]). So, by expressing reliability, resolution and score as Bregman divergences, we are able to illustrate these quantities directly as distances on graphical plots. In addition, this approach enables us to write down the Brier score and the divergence score and their corresponding decompositions in a common format, thus clearly demonstrating their analytical equivalence.
Bregman divergences are properties of convex functions. In particular, the squared Euclidean distance (on which the Brier score is based) is the Bregman divergence associated with f(x) = x² and the Kullback-Leibler divergence (on which the divergence score is based) is the Bregman divergence associated with f(x) = x∙ln(x) + (1 − x)∙ln(1 − x) (the negative of the binary Shannon entropy function).
Generically, a tangent to the curve f(x) is drawn at x_r (the reference value). The Bregman divergence between the tangent and the curve at x_c (the comparison value) is then, for scalar arguments:
$$D_B(x_c \,\|\, x_r) = f(x_c) - f(x_r) - (x_c - x_r)\,f'(x_r) \qquad (3)$$
in which f'(x_r) is the slope of the tangent at x_r. Recall that 0 ≤ x_c ≤ 1, 0 ≤ x_r ≤ 1; and note that D_B(x_c ‖ x_r) ≥ 0 and that the divergence is not necessarily symmetric with respect to the arguments. Where necessary for calculation purposes, we take 0·ln(0) = 0 as previously.
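To make Equation (3) concrete, here is a minimal sketch (Python, ours) in which the generic Bregman divergence is paired with each of the two convex functions just mentioned; the function names are our own.

```python
import math

def bregman(xc, xr, f, fprime):
    """Bregman divergence D_B(xc || xr) = f(xc) - f(xr) - (xc - xr) * f'(xr)."""
    return f(xc) - f(xr) - (xc - xr) * fprime(xr)

# Convex function behind the Brier score: f(x) = x^2, f'(x) = 2x.
sq, sq_prime = (lambda x: x * x), (lambda x: 2 * x)

# Convex function behind the divergence score: the negative binary Shannon entropy,
# f(x) = x*ln(x) + (1 - x)*ln(1 - x), with the convention 0*ln(0) = 0; f'(x) is the logit.
def negent(x):
    return sum(t * math.log(t) for t in (x, 1 - x) if t > 0)
negent_prime = lambda x: math.log(x / (1 - x))

# Reference value p_k = 0.4 with comparison values o = 0 and o = 1, as in Figure 1:
print(bregman(0, 0.4, sq, sq_prime), bregman(1, 0.4, sq, sq_prime))                  # ~0.16 and ~0.36
print(bregman(0, 0.4, negent, negent_prime), bregman(1, 0.4, negent, negent_prime))  # ~0.5108 and ~0.9163
```

The same two calls, with the comparison value replaced by ō_k or ō, generate the divergence columns of Tables 3–6.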

3.1. Scoring Rules as Bregman Divergences

3.1.1. Brier Score and Divergence Score Diagrams for Individual Forecast Categories

Figure 1 shows examples of scoring rules as Bregman divergences in diagrammatic form, for p_k = 0.4 and an observation o ∈ {0, 1} (see Appendix, Table 3 and Table 4, category k = 5, for details of calculations based on Equation (3)). For individual forecasts, smaller divergences (scores) are better, and from Figure 1A (Brier score) we can see that for reference value p_k = 0.4 the score for comparison value o = 0 (D_B = 0.16, Table 3A, Appendix) is smaller than the score for comparison value o = 1 (D_B = 0.36, Table 3B, Appendix). From Figure 1B (divergence score) we can see that for reference value p_k = 0.4 the score for comparison value o = 0 (D_B = 0.5108, Table 4A, Appendix) is smaller than the score for comparison value o = 1 (D_B = 0.9163, Table 4B, Appendix). In each case this is as we require, because the forecast probability p_k = 0.4 is closer to o = 0 than to o = 1. That is, a forecast of p_k = 0.4 gets a better evaluation score if o = 0 is subsequently observed than if o = 1 is subsequently observed.
To calculate the divergence scores for individual forecast categories directly as Kullback-Leibler divergences, as illustrated in Figure 1B, we have:
  • for o = 0, D_KL(0 ‖ p_k) = 0·ln(0/0.4) + 1·ln((1 − 0)/(1 − 0.4)) = 0.5108;
  • for o = 1, D_KL(1 ‖ p_k) = 1·ln(1/0.4) + 0·ln((1 − 1)/(1 − 0.4)) = 0.9163.

3.1.2. Overall Scores

For the Brier score, the Bregman divergence for each individual forecast category (as calculated via Equation (3)) is the squared Euclidean distance between o (the comparison value, where the divergence is calculated) and p_k (the reference value, where the tangent is drawn) (Appendix, Table 3). For the divergence score, the Bregman divergence for each individual forecast category (as calculated via Equation (3)) is the Kullback-Leibler divergence between o (the comparison value, where the divergence is calculated) and p_k (the reference value, where the tangent is drawn) (Appendix, Table 4). In each case, the overall score for a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences, the sum running over both panels of the relevant table. For the Brier score, we have BS = (1/N)·Σ_k n_k D_B(o ‖ p_k) = 49.8375/346 = 0.1440; for the divergence score we have DS = (1/N)·Σ_k n_k D_B(o ‖ p_k) = 154.6859/346 = 0.4471 (for full details see Appendix, Table 3 and Table 4).
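A numerical check (again a sketch in Python under our own naming) confirms that this weighted-average-of-Bregman-divergences route reproduces the overall scores obtained in Section 2:

```python
import numpy as np

def bregman(xc, xr, f, fprime):
    # D_B(xc || xr) = f(xc) - f(xr) - (xc - xr) * f'(xr)
    return f(xc) - f(xr) - (xc - xr) * fprime(xr)

def negent(x):
    # negative binary Shannon entropy, with the convention 0*ln(0) = 0
    x = np.asarray(x, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(x > 0, x * np.log(x), 0.0) + np.where(x < 1, (1 - x) * np.log(1 - x), 0.0)

logit = lambda x: np.log(x / (1 - x))

p = np.array([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95])
n = np.array([46, 55, 59, 41, 19, 22, 22, 34, 24, 11, 13])
o = np.array([1, 1, 5, 5, 4, 8, 6, 16, 16, 8, 11])
N = n.sum()

# Weighted averages of the individual Bregman divergences (both panels of Tables 3 and 4):
BS = ((n - o) * bregman(0.0, p, np.square, lambda x: 2 * x)
      + o * bregman(1.0, p, np.square, lambda x: 2 * x)).sum() / N
DS = ((n - o) * bregman(0.0, p, negent, logit)
      + o * bregman(1.0, p, negent, logit)).sum() / N
print(f"{BS:.4f} {DS:.4f}")   # 0.1440 0.4471
```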
Figure 1. Scoring rules as Bregman divergences. The long-dashed curve is a convex function of p, the solid line is a tangent to the convex function at the reference value of p (p_k) indicated by a short-dashed line between the curve and the horizontal axis. The short-dashed lines between the curve and the tangent indicate the Bregman divergence at the comparison values of o (these lines coincide with sections of the vertical axes of the graphs, at comparison values o = 0 and o = 1). (A) Brier score (for calculations see Appendix, Table 3, k = 5). For this example, a tangent to the convex function f(p) = p² is drawn at probability forecast of rain p_k = 0.4. The score for this forecast depends on the subsequent observation. If no-rain is observed, the score is the Bregman divergence at o = 0, which is 0.16. If rain is observed, the score is the Bregman divergence at o = 1, which is 0.36. Bregman divergences for other forecast-observation combinations are given in the Appendix, Table 3. The overall score for a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences; (B) Divergence score (for calculations see Appendix, Table 4, k = 5). For this example, a tangent to the convex function f(p) = p∙ln(p) + (1 − p)∙ln(1 − p) is drawn at probability forecast of rain p_k = 0.4. The score for this forecast depends on the subsequent observation. If no-rain is observed, the score is the Bregman divergence at o = 0, which is 0.5108. If rain is observed, the score is the Bregman divergence at o = 1, which is 0.9163. Bregman divergences for other forecast-observation combinations are given in the Appendix, Table 4. The overall score for a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences.

3.2. Reliability

3.2.1. Reliability Diagrams for Individual Forecast Categories

Figure 2 shows examples of reliability components as Bregman divergences in diagrammatic form, for reference value p_k = 0.6 and comparison value ō_k = 0.2727 (see also Appendix, Table 5, category k = 7, for details of calculations based on Equation (3)). From Figure 2A (for the Brier score reliability component) D_B(ō_k ‖ p_k) = 0.1071. From Figure 2B (for the divergence score reliability component) D_B(ō_k ‖ p_k) = 0.2198. The corresponding calculation for this divergence score reliability component directly as a Kullback-Leibler divergence is as follows:
$$D_{KL}(\bar{o}_k \,\|\, p_k) = 0.2727\ln\!\left(\frac{0.2727}{0.6}\right) + (1 - 0.2727)\ln\!\left(\frac{1 - 0.2727}{1 - 0.6}\right) = 0.2198$$
Figure 2. Reliability as a Bregman divergence. The long-dashed curve is a convex function of p, the solid line is a tangent to the convex function at the reference value of p (p_k) indicated by a short-dashed line between the curve and the horizontal axis. A second short-dashed line, between the curve and the tangent, indicates the Bregman divergence at the comparison value of o (for calculations see Appendix, Table 5). Overall reliability for a forecast-observation data set is calculated as a weighted average of individual Bregman divergences. (A) Brier score reliability. For this example, a tangent to the convex function f(p) = p² is drawn at probability forecast of rain p_k = 0.6. The reliability component depends on the corresponding ō_k, the average frequency of rain observations following such forecasts, which is 0.2727 for the example data set. The reliability component is the Bregman divergence at ō_k = 0.2727, which is 0.1071; (B) Divergence score reliability. For this example, a tangent to the convex function f(p) = p∙ln(p) + (1 − p)∙ln(1 − p) is drawn at probability forecast of rain p_k = 0.6. The reliability component depends on the corresponding ō_k, which is 0.2727 for the example data set. The reliability component is the Bregman divergence at ō_k = 0.2727, which is 0.2198.

3.2.2. Overall Reliability

For the Brier score reliability, the Bregman divergence for each individual forecast category (as calculated via Equation (3)) is the squared Euclidean distance between ō_k (the comparison value, where the divergence is calculated) and p_k (the reference value, where the tangent is drawn) (see Appendix, Table 5A). For the divergence score reliability, the Bregman divergence for each individual forecast category (as calculated via Equation (3)) is the Kullback-Leibler divergence between ō_k and p_k (see Appendix, Table 5B). In each case, the overall reliability score for a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences. For the Brier score, we have REL_BS = (1/N)·Σ_k n_k D_B(ō_k ‖ p_k) = 8.6204/346 = 0.0249; for the divergence score, we have REL_DS = (1/N)·Σ_k n_k D_B(ō_k ‖ p_k) = 24.6440/346 = 0.0712 (for full details see Appendix, Table 5).
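A brief check (Python sketch, ours) of the overall reliability figures from the category data in Table 1:

```python
import numpy as np

p = np.array([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95])
n = np.array([46, 55, 59, 41, 19, 22, 22, 34, 24, 11, 13])
o = np.array([1, 1, 5, 5, 4, 8, 6, 16, 16, 8, 11])
obar_k, N = o / n, n.sum()

# Per-category Bregman divergences D_B(obar_k || p_k) for each convex function
# (all obar_k lie strictly between 0 and 1, so the logarithms are safe):
rel_bs_k = (obar_k - p) ** 2                                                             # squared Euclidean
rel_ds_k = obar_k * np.log(obar_k / p) + (1 - obar_k) * np.log((1 - obar_k) / (1 - p))  # Kullback-Leibler

print(f"{(n * rel_bs_k).sum() / N:.4f}")  # 0.0249
print(f"{(n * rel_ds_k).sum() / N:.4f}")  # 0.0712
```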

3.2.3. Interpreting Reliability

First, recall that reliability is defined so that smaller is better: perfect reliability corresponds to an overall reliability score equal to zero. From the formulation of the Bregman divergence D_B(ō_k ‖ p_k), we can see that this occurs when ō_k = p_k for all k categories (see Appendix, Table 5). In fact, since D_B(ō_k ‖ p_k) ≥ 0, we require ō_k = p_k for all k categories for an overall reliability score equal to zero. What this tells us is that for perfect reliability of our probability forecast, the average frequency of rain observations in each category must be equal to the probability forecast for that category. In practice, we typically accept (small) deviations of ō_k from p_k that contribute a small D_B(ō_k ‖ p_k) to the overall calculation of REL_BS or REL_DS.

3.3. Resolution

3.3.1. Resolution Diagrams for Individual Forecast Categories

Figure 3 shows examples of resolution components as Bregman divergences (as calculated via Equation (3)) in diagrammatic form, for reference value ō = 0.2341 and comparison value ō_k = 0.6667 (see Appendix, Table 6, category k = 9). From Figure 3A (for the Brier score resolution component) D_B(ō_k ‖ ō) = 0.1871. From Figure 3B (for the divergence score resolution component) D_B(ō_k ‖ ō) = 0.4204. The corresponding calculation for this divergence score resolution component directly as a Kullback-Leibler divergence is as follows:
$$D_{KL}(\bar{o}_k \,\|\, \bar{o}) = 0.6667\ln\!\left(\frac{0.6667}{0.2341}\right) + (1 - 0.6667)\ln\!\left(\frac{1 - 0.6667}{1 - 0.2341}\right) = 0.4204$$

3.3.2. Overall Resolution

For the Brier score resolution, each individual Bregman divergence (as calculated via Equation (3)) is the squared Euclidean distance between ō_k (the comparison value, where the divergence is calculated) and ō (the reference value, where the tangent is drawn) (see Appendix, Table 6A). For the divergence score resolution, each individual Bregman divergence (as calculated via Equation (3)) is the Kullback-Leibler divergence between ō_k and ō (see Appendix, Table 6B). In each case, the overall resolution score for a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences. For the Brier score, we have RES_BS = (1/N)·Σ_k n_k D_B(ō_k ‖ ō) = 20.8205/346 = 0.0602; for the divergence score, we have RES_DS = (1/N)·Σ_k n_k D_B(ō_k ‖ ō) = 58.2471/346 = 0.1683 (for full details see Appendix, Table 6).
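The corresponding check for overall resolution (Python sketch, ours; only the reference value changes relative to the reliability calculation):

```python
import numpy as np

n = np.array([46, 55, 59, 41, 19, 22, 22, 34, 24, 11, 13])
o = np.array([1, 1, 5, 5, 4, 8, 6, 16, 16, 8, 11])
obar_k, obar, N = o / n, o.sum() / n.sum(), n.sum()

# Per-category Bregman divergences D_B(obar_k || obar) for each convex function:
res_bs_k = (obar_k - obar) ** 2                                                                # squared Euclidean
res_ds_k = obar_k * np.log(obar_k / obar) + (1 - obar_k) * np.log((1 - obar_k) / (1 - obar))   # Kullback-Leibler

print(f"{(n * res_bs_k).sum() / N:.4f}")  # 0.0602
print(f"{(n * res_ds_k).sum() / N:.4f}")  # 0.1683
```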
Figure 3. Resolution as a Bregman divergence. The long-dashed curve is a convex function of o, the solid line is a tangent to the convex function at the reference value of o (ō) indicated by a short-dashed line between the curve and the horizontal axis. A second short-dashed line, between the curve and the tangent, indicates the Bregman divergence at the comparison value of o (for calculations see Appendix, Table 6). Overall resolution based on a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences. (A) Brier score resolution. For this example, a tangent to the convex function f(o) = o² is drawn at the overall average frequency of rain observations, ō = 0.2341. The components of resolution are calculated for each particular ō_k, the average frequency of rain observations in each category. For k = 9, ō_k = 0.6667 for the example data set. The corresponding resolution component is the Bregman divergence at ō_k = 0.6667, which is 0.1871; (B) Divergence score resolution. For this example, a tangent to the convex function f(o) = o∙ln(o) + (1 − o)∙ln(1 − o) is drawn at the overall average frequency of rain observations, ō = 0.2341. The components of resolution are calculated for each particular ō_k, the average frequency of rain observations in each category. For k = 9, ō_k = 0.6667 for the example data set. The corresponding resolution component is the Bregman divergence at ō_k = 0.6667, which is 0.4204.

3.3.3. Interpreting Resolution

Recall that resolution is defined so that larger is better. If forecasts and observations were independent (which is least desirable), resolution would be equal to zero; if forecasts were perfect (which is most desirable), resolution would be equal to uncertainty. Note that the conditions under which resolution is equal to uncertainty also fulfil the conditions for perfect reliability, equal to zero (as above, in the context of interpreting reliability).
Resolution depends on our ability to define forecast categories for which the observed frequencies ō_k are different from the overall average frequency ō, such that the average for a forecast category provides a better prediction of the eventual outcome than the average over all forecast categories. For both the Brier score and the divergence score, if any ō_k is equal to ō, then the corresponding resolution component is equal to zero. If ō_k = ō for all k, then overall resolution is equal to zero.
Consider first the scenario in which, as in the initial analysis of the original data set, probability forecasts of p_k = 0 and p_k = 1 are allowed. Further, let us suppose that all 265 observations of no-rain followed forecasts of p_k = 0 (in which case ō_k = 0) and all 81 observations of rain followed forecasts of p_k = 1 (so ō_k = 1). Recall ō = 0.2341. If we calculate resolution based on squared Euclidean distance, we have RES_BS = (1/N)·[265·(0 − ō)² + 81·(1 − ō)²] = 62.0366/346 = 0.1793 = UNC_BS. Alternatively, if we calculate resolution based on the Kullback-Leibler divergence, we have RES_DS = (1/N)·[265·D_KL(0 ‖ ō) + 81·D_KL(1 ‖ ō)] = 188.2875/346 = 0.5442 = UNC_DS. That is to say, if we were to allow probability forecast categories p_k = 0 and p_k = 1, then use them exclusively in making forecasts and do so without error, resolution would be equal to uncertainty (i.e., RES_BS = UNC_BS and RES_DS = UNC_DS).
Now consider instead the scenario in which, as in our analysis of the adjusted data set, the most extreme allowed probabilities are p_k = 0.05 and p_k = 0.95. Now, the best resolution we can achieve is if all 265 observations of no-rain followed forecasts of p_k = 0.05 (in which case ō_k = 0.05) and all 81 observations of rain followed forecasts of p_k = 0.95 (so ō_k = 0.95). If we calculate resolution based on squared Euclidean distance, we have RES_BS = (1/N)·[265·(0.05 − ō)² + 81·(0.95 − ō)²] = 50.4960/346 = 0.1459. Alternatively, if we calculate resolution based on the Kullback-Leibler divergence, we have RES_DS = (1/N)·[265·D_KL(0.05 ‖ ō) + 81·D_KL(0.95 ‖ ō)] = 130.5177/346 = 0.3772. Thus, the price we pay for restricting the extreme allowed probabilities to p_k = 0.05 and p_k = 0.95 is to reduce the achievable upper limit of resolution.
In the present example the notional upper limit is reduced to about 80% of uncertainty for calculations based on squared Euclidean distance, and about 70% of uncertainty for calculations based on Kullback-Leibler divergence. The difference arises because of the larger penalty score that accrues with extreme discrepancies between forecast and observation for the divergence score compared with the Brier score (as mentioned in the Introduction).
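The two scenarios can be verified with a few lines of arithmetic (a Python sketch, ours; the helper kl is our own naming):

```python
import math

def kl(c, r):
    """Kullback-Leibler divergence between Bernoulli(c) and Bernoulli(r), with 0*ln(0) = 0."""
    out = 0.0
    if c > 0:
        out += c * math.log(c / r)
    if c < 1:
        out += (1 - c) * math.log((1 - c) / (1 - r))
    return out

N, no_rain, rain = 346, 265, 81
obar = rain / N   # 0.2341

for lo, hi in [(0.0, 1.0), (0.05, 0.95)]:
    res_bs = (no_rain * (lo - obar) ** 2 + rain * (hi - obar) ** 2) / N
    res_ds = (no_rain * kl(lo, obar) + rain * kl(hi, obar)) / N
    print(f"extremes ({lo}, {hi}): RES_BS = {res_bs:.4f}, RES_DS = {res_ds:.4f}")
# extremes (0.0, 1.0):  RES_BS = 0.1793, RES_DS = 0.5442  (resolution equal to uncertainty)
# extremes (0.05, 0.95): RES_BS = 0.1459, RES_DS = 0.3772 (the ceiling is lowered)
```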
We note in passing that overall resolution, as formulated, may be characterized as a Jensen gap [21] for a convex function. Banerjee et al. [22] refer to this as the Bregman information. Thus, generically, the mean of f(x) is at least f of the mean of x, and in particular here, [(1/N)·Σ_k n_k f(ō_k)] − f(ō) = RES. Then, with f(x) = x² (for the Brier score) we have RES = (1/N)·Σ_k n_k (ō_k − ō)², the sample variance (e.g., [3]). With f(x) = x∙ln(x) + (1 − x)∙ln(1 − x) (for the divergence score) we have RES = (1/N)·Σ_k n_k D_KL(ō_k ‖ ō), the expected mutual information (see also [11,12]).
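A brief check (Python sketch, ours) shows that this Jensen-gap formulation returns exactly the resolution values calculated earlier:

```python
import numpy as np

n = np.array([46, 55, 59, 41, 19, 22, 22, 34, 24, 11, 13])
o = np.array([1, 1, 5, 5, 4, 8, 6, 16, 16, 8, 11])
obar_k, obar, N = o / n, o.sum() / n.sum(), n.sum()

f_sq  = lambda x: x ** 2                                    # convex function for the Brier score
f_ent = lambda x: x * np.log(x) + (1 - x) * np.log(1 - x)   # negative entropy; all arguments here lie in (0, 1)

for f in (f_sq, f_ent):
    jensen_gap = (n * f(obar_k)).sum() / N - f(obar)        # mean of f minus f of the mean
    print(f"{jensen_gap:.4f}")
# 0.0602 then 0.1683: the same resolution values as before
```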

3.4. Uncertainty

We select an uncertainty function appropriate for the analysis, depending on the chosen convex function and its associated Bregman divergence. For the Brier score, uncertainty is calculated as the value of the uncertainty function (the Bernoulli variance) at ō: UNC_BS = u(ō) = ō(1 − ō) = 0.1793 (Figure 4A). For the divergence score, uncertainty is calculated as the value of the uncertainty function (the binary Shannon entropy) at ō: UNC_DS = u(ō) = −[ō·ln(ō) + (1 − ō)·ln(1 − ō)] = 0.5442 (Figure 4B). We interpret uncertainty as a quantification of our state of knowledge in the absence of a forecast, so based only on the data set from which the overall average frequency of rain observations ō is calculated.
Figure 4. Uncertainty functions. The long-dashed curves are uncertainty functions, u(o); the short-dashed lines indicate ō (= 0.2341 for the example data set) and the corresponding value of u(ō). (A) The Bernoulli variance u(o) = o∙(1 − o). For the example data set, u(ō) = 0.1793; (B) The Shannon entropy u(o) = −(o∙ln(o) + (1 − o)∙ln(1 − o)). For the example data set, u(ō) = 0.5442.

3.5. Overview

Theil [23] used a logarithmic scoring rule to describe the inaccuracy of predictions, but also found it convenient to write prediction errors directly in terms of the difference between the observed and forecast probabilities. This was achieved by use of a Taylor series expansion to write a logarithmic scoring rule in terms of a quadratic approximation. More recently, Benedetti [24] has attributed the lasting application of the Brier score in forecast evaluation to its being an approximation of the logarithmic score; however, an analysis leading to the Brier score as an approximation of the logarithmic score does not reveal a hierarchy in which the latter is in some way more fundamental than the former (cf. [25]).
For an individual probability forecast, with p_k an allowed probability and o ∈ {0, 1} the corresponding observation, we can calculate the scoring rule:
$$D_B(o \,\|\, p_k) = f(o) - f(p_k) - (o - p_k)\,f'(p_k) \qquad (4)$$
(see Figure 1). Equation (4) calculates either the Brier score or the divergence score, depending on our choice of convex function on which to base the Bregman divergence. For a data set comprising a number of forecasts and corresponding observations, we calculate the overall score as (1/N)·Σ_k n_k D_B(o ‖ p_k) for either the Brier score or the divergence score. On this basis, neither scoring rule is inherently superior to the other. However, it is possible to establish further criteria against which the properties of such scoring rules may be judged [24].
The statistical decomposition of the scoring rule in Equation (4) also has a common format:
$$\left.\begin{aligned} REL_k &= D_B(\bar{o}_k \,\|\, p_k) = f(\bar{o}_k) - f(p_k) - (\bar{o}_k - p_k)\,f'(p_k)\\ RES_k &= D_B(\bar{o}_k \,\|\, \bar{o}) = f(\bar{o}_k) - f(\bar{o}) - (\bar{o}_k - \bar{o})\,f'(\bar{o})\\ UNC &= u(\bar{o}) \end{aligned}\right\} \qquad (5)$$
(see Figure 2 and Figure 3, respectively, for example illustrations of components of REL and RES; and Figure 4 for an illustration of UNC, which does not vary with k). Again, it is only the choice of convex function (and corresponding choice of an appropriate uncertainty function) that distinguishes the calculation of the components of the Brier score from those of the divergence score. For a data set comprising a number of forecasts and the corresponding observations, we calculate the overall reliability and overall resolution scores, respectively, as (1/N)·Σ_k n_k D_B(ō_k ‖ p_k) and (1/N)·Σ_k n_k D_B(ō_k ‖ ō).
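Gathering Equations (4) and (5) into one routine makes the common format explicit. The sketch below (Python, ours; function and variable names are our own, not the authors') returns score, reliability, resolution and uncertainty for either choice of convex function:

```python
import numpy as np

def negent(x):
    """Negative binary Shannon entropy, with the convention 0*ln(0) = 0."""
    x = np.asarray(x, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(x > 0, x * np.log(x), 0.0) + np.where(x < 1, (1 - x) * np.log(1 - x), 0.0)

def decompose(p, n, o, f, fprime, u):
    """Overall score, reliability, resolution and uncertainty for binary category data,
    given a convex function f, its derivative fprime and the matching uncertainty function u."""
    N, obar_k, obar = n.sum(), o / n, o.sum() / n.sum()
    d = lambda xc, xr: f(xc) - f(xr) - (xc - xr) * fprime(xr)   # Bregman divergence, Equation (3)
    score = ((n - o) * d(0.0, p) + o * d(1.0, p)).sum() / N     # Equation (4), averaged
    rel   = (n * d(obar_k, p)).sum() / N                        # Equation (5), reliability
    res   = (n * d(obar_k, obar)).sum() / N                     # Equation (5), resolution
    return score, rel, res, u(obar)

p = np.array([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95])
n = np.array([46, 55, 59, 41, 19, 22, 22, 34, 24, 11, 13])
o = np.array([1, 1, 5, 5, 4, 8, 6, 16, 16, 8, 11])

brier      = decompose(p, n, o, np.square, lambda x: 2 * x, lambda x: x * (1 - x))
divergence = decompose(p, n, o, negent, lambda x: np.log(x / (1 - x)), lambda x: -negent(x))
print(["%.4f" % float(v) for v in brier])       # ['0.1440', '0.0249', '0.0602', '0.1793']
print(["%.4f" % float(v) for v in divergence])  # ['0.4471', '0.0712', '0.1683', '0.5442']
```

Only the pair of functions (f, u) changes between the two calls, which is the sense in which neither scoring rule is analytically privileged over the other.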
We can compare the information-theoretic analysis of a boundary-line model by Topp et al. [26] with the present analysis. When, as in [26], forecast probabilities are based on retrospectively-calculated relative frequencies, reliability is equal to zero (i.e., perfect reliability), uncertainty is equal to the Shannon entropy, and resolution is equal to the expected mutual information. In such a retrospective analysis, a normalized version of expected mutual information may be calculated as a measure of the proportion of uncertainty in the observations that is explained by the forecasts.

4. Discussion

Figure 5 shows a diagrammatic summary of the overall divergence score and its components (see also Equation (2)), based on calculations using the example data set. Here, uncertainty (UNC) is characterized by the binary Shannon entropy at the overall average frequency of rain observations, ō = 0.2341. In this context, we can think of entropy as a measure of the extent of our uncertainty before use of the forecaster. A useful intuitive interpretation of reliability (REL) can be gained from the data summary set out in Table 1. There, the probabilities p_k represent the allowed probability forecasts for rain. For a perfectly reliable forecaster, the observed frequencies of rain events, ō_k = o_k/n_k, will be equal to p_k in each category k; then REL = 0. Resolution (RES) is a measure of the extent to which the forecaster accounts for uncertainty (but not reliability), i.e., RES ≤ UNC. As mentioned above, in the case of the divergence score, resolution is characterized by expected mutual information. Then, the divergence score (DS) characterizes the uncertainty not accounted for by the forecaster (UNC − RES) together with the reliability (REL), so that DS = UNC − RES + REL.
Figure 5. The overall divergence score and its components. The overall divergence score is denoted DS, with components uncertainty (UNC), reliability (REL) and resolution (RES), such that DS = UNC − RES + REL, with RES ≤ UNC as indicated by the vertical dashed line.
The evaluation of probabilistic weather forecasts is primarily of interest to meteorologists, of course; but the methodology for evaluation of probabilistic forecasts is also applicable more widely in those situations where weather factors are identified as drivers of processes contributing to risk. Weather factors are important drivers of N2O emissions from agricultural soils, but studies of management interventions aimed at greenhouse gas mitigation have mainly been concerned with emissions inventory, and mitigation options tend to be assessed on an integrated seasonal time-scale [27,28]. An interesting example of the potential for a probabilistic approach to describing short-term N2O flux dynamics was offered in discussion of a modelling study by Hawkins et al. [29], as follows: “The model depicts a realistic positive emissions response to soil moisture at the mean values of the other factors. This reflects the general understanding that N efficiency, in terms of lower N2O emission, may be promoted by drier conditions. The WETTEST and DRIEST scenarios were simulated to investigate the magnitude of this efficiency difference. Although these scenarios are hypothetical because in practice the wettest or driest day in a week in terms of soil moisture is not known until the end of the week, they are analogous to spreading fertiliser before or after a rainfall event.” We note here that although the wettest and driest day in a week in terms of soil moisture may only be known retrospectively, weather forecasts provide (probabilistic) advance warning of rainfall events.
Rees et al. [28] highlight the importance of reducing the supply of nitrogen in the context of greenhouse gas mitigation, so that management interventions with potential to increase nitrogen-use efficiency are of interest. Increasing nitrogen-use efficiency ought to represent a contribution to measures that, in relation to mitigation, reduce both greenhouse gas emissions and farm costs, constituting a “win-win” scenario [30]. The goal therefore is practical implementation of meteorological information, in the form of forecasts that could be incorporated into decision making for within-season environmental management interventions. This depends first on our ability to show that such forecasts have the required levels of reliability and resolution, using appropriate evaluation methodology as outlined here.

Acknowledgments

SRUC receives grant-in-aid from the Scottish Government.

Author Contributions

Both authors have contributed equally to this manuscript. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix

The Appendix contains the tables of results referred to in the text.
Table 2. Decomposition of the Brier score and the divergence score.a
| k  | p_k  | n_k | o_k | ō_k    | n_k/N  | REL_BS,k | RES_BS,k | REL_DS,k | RES_DS,k |
|----|------|-----|-----|--------|--------|----------|----------|----------|----------|
| 1  | 0.05 | 46  | 1   | 0.0217 | 0.1329 | 0.0367   | 2.0745   | 0.4862   | 8.6362   |
| 2  | 0.1  | 55  | 1   | 0.0182 | 0.1590 | 0.3682   | 2.5642   | 2.9939   | 10.8561  |
| 3  | 0.2  | 59  | 5   | 0.0847 | 0.1705 | 0.7837   | 1.3162   | 2.9746   | 4.5399   |
| 4  | 0.3  | 41  | 5   | 0.1220 | 0.1185 | 1.2998   | 0.5157   | 3.6576   | 1.6589   |
| 5  | 0.4  | 19  | 4   | 0.2105 | 0.0549 | 0.6821   | 0.0106   | 1.5491   | 0.0302   |
| 6  | 0.5  | 22  | 8   | 0.3636 | 0.0636 | 0.4091   | 0.3691   | 0.8286   | 0.9292   |
| 7  | 0.6  | 22  | 6   | 0.2727 | 0.0636 | 2.3564   | 0.0328   | 4.8346   | 0.0883   |
| 8  | 0.7  | 34  | 16  | 0.4706 | 0.0983 | 1.7894   | 1.9014   | 3.8702   | 4.5244   |
| 9  | 0.8  | 24  | 16  | 0.6667 | 0.0694 | 0.4267   | 4.4907   | 1.1695   | 10.0892  |
| 10 | 0.9  | 11  | 8   | 0.7273 | 0.0318 | 0.3282   | 2.6754   | 1.3052   | 5.9706   |
| 11 | 0.95 | 13  | 11  | 0.8462 | 0.0376 | 0.1402   | 4.8699   | 0.9745   | 10.9241  |
| Column sums b | | 346 | 81 | | 1.0000 | 8.6204 | 20.8205 | 24.6439 | 58.2471 |

a Notation: k, forecast category index; p_k, probability forecast (rain) (probability forecast of no-rain is the complement); n_k, number of observations; o_k, number of rain observations; ō_k, average frequency of rain observations = o_k/n_k; n_k/N, normalized frequency of observations; REL_BS,k (components of REL_BS) = n_k·(p_k − ō_k)²; RES_BS,k (components of RES_BS) = n_k·(ō_k − ō)²; REL_DS,k (components of REL_DS) = n_k·D_KL(ō_k ‖ p_k); RES_DS,k (components of RES_DS) = n_k·D_KL(ō_k ‖ ō); with ō = O/N = 0.2341 (see footnote b).
b Column sums: Σ_k n_k = N = 346; Σ_k o_k = O = 81; Σ_k n_k/N = 1; Σ_k n_k·(p_k − ō_k)² = 8.6204; Σ_k n_k·(ō_k − ō)² = 20.8205; Σ_k n_k·D_KL(ō_k ‖ p_k) = 24.6439; Σ_k n_k·D_KL(ō_k ‖ ō) = 58.2471.
Table 3. Brier score calculation via Bregman divergence.a
A. Observation = no-rain (o = 0)

| k   | p_k  | o | n_k | f'(p_k) | f(o) | f(p_k) | (o − p_k)·f'(p_k) | D_B(0 ‖ p_k) |
|-----|------|---|-----|---------|------|--------|-------------------|--------------|
| 1   | 0.05 | 0 | 45  | 0.10    | 0    | 0.0025 | −0.0050           | 0.0025       |
| 2   | 0.1  | 0 | 54  | 0.20    | 0    | 0.0100 | −0.0200           | 0.0100       |
| 3   | 0.2  | 0 | 54  | 0.40    | 0    | 0.0400 | −0.0800           | 0.0400       |
| 4   | 0.3  | 0 | 36  | 0.60    | 0    | 0.0900 | −0.1800           | 0.0900       |
| 5 b | 0.4  | 0 | 15  | 0.80    | 0    | 0.1600 | −0.3200           | 0.1600       |
| 6   | 0.5  | 0 | 14  | 1.00    | 0    | 0.2500 | −0.5000           | 0.2500       |
| 7   | 0.6  | 0 | 16  | 1.20    | 0    | 0.3600 | −0.7200           | 0.3600       |
| 8   | 0.7  | 0 | 18  | 1.40    | 0    | 0.4900 | −0.9800           | 0.4900       |
| 9   | 0.8  | 0 | 8   | 1.60    | 0    | 0.6400 | −1.2800           | 0.6400       |
| 10  | 0.9  | 0 | 3   | 1.80    | 0    | 0.8100 | −1.6200           | 0.8100       |
| 11  | 0.95 | 0 | 2   | 1.90    | 0    | 0.9025 | −1.8050           | 0.9025       |
B. Observation = rain (o = 1)

| k   | p_k  | o | n_k | f'(p_k) | f(o) | f(p_k) | (o − p_k)·f'(p_k) | D_B(1 ‖ p_k) |
|-----|------|---|-----|---------|------|--------|-------------------|--------------|
| 1   | 0.05 | 1 | 1   | 0.10    | 1    | 0.0025 | 0.0950            | 0.9025       |
| 2   | 0.1  | 1 | 1   | 0.20    | 1    | 0.0100 | 0.1800            | 0.8100       |
| 3   | 0.2  | 1 | 5   | 0.40    | 1    | 0.0400 | 0.3200            | 0.6400       |
| 4   | 0.3  | 1 | 5   | 0.60    | 1    | 0.0900 | 0.4200            | 0.4900       |
| 5 b | 0.4  | 1 | 4   | 0.80    | 1    | 0.1600 | 0.4800            | 0.3600       |
| 6   | 0.5  | 1 | 8   | 1.00    | 1    | 0.2500 | 0.5000            | 0.2500       |
| 7   | 0.6  | 1 | 6   | 1.20    | 1    | 0.3600 | 0.4800            | 0.1600       |
| 8   | 0.7  | 1 | 16  | 1.40    | 1    | 0.4900 | 0.4200            | 0.0900       |
| 9   | 0.8  | 1 | 16  | 1.60    | 1    | 0.6400 | 0.3200            | 0.0400       |
| 10  | 0.9  | 1 | 8   | 1.80    | 1    | 0.8100 | 0.1800            | 0.0100       |
| 11  | 0.95 | 1 | 11  | 1.90    | 1    | 0.9025 | 0.0950            | 0.0025       |

a Notation: k, forecast category index; p_k, probability forecast for rain (reference value, at which the tangent is calculated), probability forecast for no-rain is the complement; o, comparison value, at which the divergence is calculated; n_k, number of observations (total no-rain observations = 265, total rain observations = 81); f'(p_k), slope of the tangent to f(p) at p_k; f(o) − f(p_k) − (o − p_k)·f'(p_k) = D_B(0 ‖ p_k) (no-rain, o = 0), or D_B(1 ‖ p_k) (rain, o = 1). b See Figure 1A.
Table 4. Divergence score calculation via Bregman divergence.a
A. Observation = no-rain (o = 0)

| k   | p_k  | o | n_k | f'(p_k)  | f(o) | f(p_k)  | (o − p_k)·f'(p_k) | D_B(0 ‖ p_k) |
|-----|------|---|-----|----------|------|---------|-------------------|--------------|
| 1   | 0.05 | 0 | 45  | −2.9444  | 0    | −0.1985 | 0.1472            | 0.0513       |
| 2   | 0.1  | 0 | 54  | −2.1972  | 0    | −0.3251 | 0.2197            | 0.1054       |
| 3   | 0.2  | 0 | 54  | −1.3863  | 0    | −0.5004 | 0.2773            | 0.2231       |
| 4   | 0.3  | 0 | 36  | −0.8473  | 0    | −0.6109 | 0.2542            | 0.3567       |
| 5 b | 0.4  | 0 | 15  | −0.4055  | 0    | −0.6730 | 0.1622            | 0.5108       |
| 6   | 0.5  | 0 | 14  | 0.0000   | 0    | −0.6931 | 0.0000            | 0.6931       |
| 7   | 0.6  | 0 | 16  | 0.4055   | 0    | −0.6730 | −0.2433           | 0.9163       |
| 8   | 0.7  | 0 | 18  | 0.8473   | 0    | −0.6109 | −0.5931           | 1.2040       |
| 9   | 0.8  | 0 | 8   | 1.3863   | 0    | −0.5004 | −1.1090           | 1.6094       |
| 10  | 0.9  | 0 | 3   | 2.1972   | 0    | −0.3251 | −1.9775           | 2.3026       |
| 11  | 0.95 | 0 | 2   | 2.9444   | 0    | −0.1985 | −2.7972           | 2.9957       |

B. Observation = rain (o = 1)

| k   | p_k  | o | n_k | f'(p_k)  | f(o) | f(p_k)  | (o − p_k)·f'(p_k) | D_B(1 ‖ p_k) |
|-----|------|---|-----|----------|------|---------|-------------------|--------------|
| 1   | 0.05 | 1 | 1   | −2.9444  | 0    | −0.1985 | −2.7972           | 2.9957       |
| 2   | 0.1  | 1 | 1   | −2.1972  | 0    | −0.3251 | −1.9775           | 2.3026       |
| 3   | 0.2  | 1 | 5   | −1.3863  | 0    | −0.5004 | −1.1090           | 1.6094       |
| 4   | 0.3  | 1 | 5   | −0.8473  | 0    | −0.6109 | −0.5931           | 1.2040       |
| 5 b | 0.4  | 1 | 4   | −0.4055  | 0    | −0.6730 | −0.2433           | 0.9163       |
| 6   | 0.5  | 1 | 8   | 0.0000   | 0    | −0.6931 | 0.0000            | 0.6931       |
| 7   | 0.6  | 1 | 6   | 0.4055   | 0    | −0.6730 | 0.1622            | 0.5108       |
| 8   | 0.7  | 1 | 16  | 0.8473   | 0    | −0.6109 | 0.2542            | 0.3567       |
| 9   | 0.8  | 1 | 16  | 1.3863   | 0    | −0.5004 | 0.2773            | 0.2231       |
| 10  | 0.9  | 1 | 8   | 2.1972   | 0    | −0.3251 | 0.2197            | 0.1054       |
| 11  | 0.95 | 1 | 11  | 2.9444   | 0    | −0.1985 | 0.1472            | 0.0513       |

a Notation: k, forecast category index; p_k, probability forecast for rain (reference value, at which the tangent is calculated), probability forecast of no-rain is the complement; o, comparison value, at which the divergence is calculated; n_k, number of observations (total no-rain observations = 265, total rain observations = 81); f'(p_k), slope of the tangent to f(p) at p_k; f(o) − f(p_k) − (o − p_k)·f'(p_k) = D_B(0 ‖ p_k) (no-rain, o = 0), or D_B(1 ‖ p_k) (rain, o = 1). b See Figure 1B.
Table 5. Reliability calculation via Bregman divergence.a
A. Brier score

| k   | p_k  | ō_k    | n_k | f'(p_k) | f(ō_k) | f(p_k) | (ō_k − p_k)·f'(p_k) | D_B(ō_k ‖ p_k) |
|-----|------|--------|-----|---------|--------|--------|---------------------|----------------|
| 1   | 0.05 | 0.0217 | 46  | 0.1     | 0.0005 | 0.0025 | −0.0028             | 0.0008         |
| 2   | 0.1  | 0.0182 | 55  | 0.2     | 0.0003 | 0.0100 | −0.0164             | 0.0067         |
| 3   | 0.2  | 0.0847 | 59  | 0.4     | 0.0072 | 0.0400 | −0.0461             | 0.0133         |
| 4   | 0.3  | 0.1220 | 41  | 0.6     | 0.0149 | 0.0900 | −0.1068             | 0.0317         |
| 5   | 0.4  | 0.2105 | 19  | 0.8     | 0.0443 | 0.1600 | −0.1516             | 0.0359         |
| 6   | 0.5  | 0.3636 | 22  | 1.0     | 0.1322 | 0.2500 | −0.1364             | 0.0186         |
| 7 b | 0.6  | 0.2727 | 22  | 1.2     | 0.0744 | 0.3600 | −0.3927             | 0.1071         |
| 8   | 0.7  | 0.4706 | 34  | 1.4     | 0.2215 | 0.4900 | −0.3212             | 0.0526         |
| 9   | 0.8  | 0.6667 | 24  | 1.6     | 0.4444 | 0.6400 | −0.2133             | 0.0178         |
| 10  | 0.9  | 0.7273 | 11  | 1.8     | 0.5289 | 0.8100 | −0.3109             | 0.0298         |
| 11  | 0.95 | 0.8462 | 13  | 1.9     | 0.7160 | 0.9025 | −0.1973             | 0.0108         |

B. Divergence score

| k   | p_k  | ō_k    | n_k | f'(p_k)  | f(ō_k)  | f(p_k)  | (ō_k − p_k)·f'(p_k) | D_B(ō_k ‖ p_k) |
|-----|------|--------|-----|----------|---------|---------|---------------------|----------------|
| 1   | 0.05 | 0.0217 | 46  | −2.9444  | −0.1047 | −0.1985 | 0.0832              | 0.0106         |
| 2   | 0.1  | 0.0182 | 55  | −2.1972  | −0.0909 | −0.3251 | 0.1798              | 0.0544         |
| 3   | 0.2  | 0.0847 | 59  | −1.3863  | −0.2902 | −0.5004 | 0.1598              | 0.0504         |
| 4   | 0.3  | 0.1220 | 41  | −0.8473  | −0.3708 | −0.6109 | 0.1509              | 0.0892         |
| 5   | 0.4  | 0.2105 | 19  | −0.4055  | −0.5147 | −0.6730 | 0.0768              | 0.0815         |
| 6   | 0.5  | 0.3636 | 22  | 0.0000   | −0.6555 | −0.6931 | 0.0000              | 0.0377         |
| 7 b | 0.6  | 0.2727 | 22  | 0.4055   | −0.5860 | −0.6730 | −0.1327             | 0.2198         |
| 8   | 0.7  | 0.4706 | 34  | 0.8473   | −0.6914 | −0.6109 | −0.1944             | 0.1138         |
| 9   | 0.8  | 0.6667 | 24  | 1.3863   | −0.6365 | −0.5004 | −0.1848             | 0.0487         |
| 10  | 0.9  | 0.7273 | 11  | 2.1972   | −0.5860 | −0.3251 | −0.3795             | 0.1187         |
| 11  | 0.95 | 0.8462 | 13  | 2.9444   | −0.4293 | −0.1985 | −0.3058             | 0.0750         |

a Notation: k, forecast category index; p_k, probability forecast for rain (reference value, at which the tangent is calculated), probability forecast for no-rain is the complement; ō_k, average frequency of rain observations (comparison value, at which the divergence is calculated); n_k, number of observations; f'(p_k), slope of the tangent to f(p) at p_k; f(ō_k) − f(p_k) − (ō_k − p_k)·f'(p_k) = D_B(ō_k ‖ p_k). b See Figure 2.
Table 6. Resolution calculation via Bregman divergence.a
A. Brier score

| k   | ō      | ō_k    | n_k | f'(ō)  | f(ō_k) | f(ō)   | (ō_k − ō)·f'(ō) | D_B(ō_k ‖ ō) |
|-----|--------|--------|-----|--------|--------|--------|-----------------|--------------|
| 1   | 0.2341 | 0.0217 | 46  | 0.4682 | 0.0005 | 0.0548 | −0.0994         | 0.0451       |
| 2   | 0.2341 | 0.0182 | 55  | 0.4682 | 0.0003 | 0.0548 | −0.1011         | 0.0466       |
| 3   | 0.2341 | 0.0847 | 59  | 0.4682 | 0.0072 | 0.0548 | −0.0699         | 0.0223       |
| 4   | 0.2341 | 0.1220 | 41  | 0.4682 | 0.0149 | 0.0548 | −0.0525         | 0.0126       |
| 5   | 0.2341 | 0.2105 | 19  | 0.4682 | 0.0443 | 0.0548 | −0.0110         | 0.0006       |
| 6   | 0.2341 | 0.3636 | 22  | 0.4682 | 0.1322 | 0.0548 | 0.0606          | 0.0168       |
| 7   | 0.2341 | 0.2727 | 22  | 0.4682 | 0.0744 | 0.0548 | 0.0181          | 0.0015       |
| 8   | 0.2341 | 0.4706 | 34  | 0.4682 | 0.2215 | 0.0548 | 0.1107          | 0.0559       |
| 9 b | 0.2341 | 0.6667 | 24  | 0.4682 | 0.4444 | 0.0548 | 0.2025          | 0.1871       |
| 10  | 0.2341 | 0.7273 | 11  | 0.4682 | 0.5289 | 0.0548 | 0.2309          | 0.2432       |
| 11  | 0.2341 | 0.8462 | 13  | 0.4682 | 0.7160 | 0.0548 | 0.2866          | 0.3746       |

B. Divergence score

| k   | ō      | ō_k    | n_k | f'(ō)    | f(ō_k)  | f(ō)    | (ō_k − ō)·f'(ō) | D_B(ō_k ‖ ō) |
|-----|--------|--------|-----|----------|---------|---------|-----------------|--------------|
| 1   | 0.2341 | 0.0217 | 46  | −1.1853  | −0.1047 | −0.5442 | 0.2517          | 0.1877       |
| 2   | 0.2341 | 0.0182 | 55  | −1.1853  | −0.0909 | −0.5442 | 0.2559          | 0.1974       |
| 3   | 0.2341 | 0.0847 | 59  | −1.1853  | −0.2902 | −0.5442 | 0.1770          | 0.0769       |
| 4   | 0.2341 | 0.1220 | 41  | −1.1853  | −0.3708 | −0.5442 | 0.1329          | 0.0405       |
| 5   | 0.2341 | 0.2105 | 19  | −1.1853  | −0.5147 | −0.5442 | 0.0279          | 0.0016       |
| 6   | 0.2341 | 0.3636 | 22  | −1.1853  | −0.6555 | −0.5442 | −0.1535         | 0.0422       |
| 7   | 0.2341 | 0.2727 | 22  | −1.1853  | −0.5860 | −0.5442 | −0.0458         | 0.0040       |
| 8   | 0.2341 | 0.4706 | 34  | −1.1853  | −0.6914 | −0.5442 | −0.2803         | 0.1331       |
| 9 b | 0.2341 | 0.6667 | 24  | −1.1853  | −0.6365 | −0.5442 | −0.5127         | 0.4204       |
| 10  | 0.2341 | 0.7273 | 11  | −1.1853  | −0.5860 | −0.5442 | −0.5845         | 0.5428       |
| 11  | 0.2341 | 0.8462 | 13  | −1.1853  | −0.4293 | −0.5442 | −0.7255         | 0.8403       |

a Notation: k, forecast category index; ō, overall average frequency of rain observations (see Table 2) (reference value, at which the tangent is calculated); ō_k, average frequency of rain observations (comparison value, at which the divergence is calculated); n_k, number of observations; f'(ō), slope of the tangent to f(o) at ō; f(ō_k) − f(ō) − (ō_k − ō)·f'(ō) = D_B(ō_k ‖ ō). b See Figure 3.

References

  1. Lindley, D.V. Making Decisions, 2nd ed.; Wiley: Chichester, UK, 1985. [Google Scholar]
  2. Jolliffe, I.T.; Stephenson, D.B. (Eds.) Forecast Verification: A Practitioner’s Guide in Atmospheric Science, 2nd ed.; Wiley: Chichester, UK, 2012.
  3. Broecker, J. Probability forecasts. In Forecast Verification: A Practitioner’s Guide in Atmospheric Science, 2nd ed.; Jolliffe, I.T., Stephenson, D.B., Eds.; Wiley: Chichester, UK, 2012; pp. 119–139. [Google Scholar]
  4. Brier, G.W. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 1950, 78, 1–3. [Google Scholar] [CrossRef]
  5. Good, I.J. Rational decisions. J. Roy. Stat. Soc. B 1952, 14, 107–114. [Google Scholar]
  6. DeGroot, M.W.; Fienberg, S.E. The comparison and evaluation of forecasters. The Statistician 1983, 32, 12–22. [Google Scholar] [CrossRef]
  7. Bröcker, J.; Smith, L.A. Scoring probabilistic forecasts: the importance of being proper. Weather Forecast. 2007, 22, 382–388. [Google Scholar] [CrossRef]
  8. Bröcker, J. Reliability, sufficiency, and the decomposition of proper scores. Q. J. R. Meteorol. Soc. 2009, 135, 1512–1519. [Google Scholar] [CrossRef]
  9. Murphy, A.H. A new vector partition of the probability score. J. Appl. Meteorol. 1973, 12, 595–600. [Google Scholar] [CrossRef]
  10. Wilks, D.S. Statistical Methods in the Atmospheric Sciences, 3rd ed.; Academic Press: Oxford, UK, 2011. [Google Scholar]
  11. Weijs, S.V.; Schoups, G.; van de Giesen, N. Why hydrological predictions should be evaluated using information theory. Hydrol. Earth Syst. Sci. 2010, 14, 2545–2558. [Google Scholar] [CrossRef]
  12. Weijs, S.V.; van Nooijen, R.; van de Giesen, N. Kullback-Leibler divergence as a forecast skill score with classic reliability-resolution-uncertainty decomposition. Mon. Weather Rev. 2010, 138, 3387–3399. [Google Scholar] [CrossRef]
  13. Gneiting, T.; Katzfuss, M. Probabilistic forecasting. Annu. Rev. Stat. Appl. 2014, 1, 125–151. [Google Scholar] [CrossRef]
  14. Verifying probability of precipitation—an example from Finland. http://www.cawcr.gov.au/projects/verification/POP3/POP3.html (accessed on 18 June 2015).
  15. Kullback, S. Information Theory and Statistics, 2nd ed.; Dover: New York, NY, USA, 1968. [Google Scholar]
  16. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: Hoboken, NJ, USA, 2006. [Google Scholar]
  17. Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; University of Illinois Press: Urbana, IL, USA, 1949. [Google Scholar]
  18. Bregman, L.M. The relaxation method of finding the common points of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 1967, 7, 200–217. [Google Scholar] [CrossRef]
  19. Gneiting, T.; Raftery, A.E. Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 2007, 102, 359–378. [Google Scholar] [CrossRef]
  20. Adamčik, M. The information geometry of Bregman divergences and some applications in multi-expert reasoning. Entropy 2014, 16, 6338–6381. [Google Scholar] [CrossRef]
  21. Reid, M.D.; Williamson, R.C. Information, divergence and risk for binary experiments. J. Mach. Learn. Res. 2011, 12, 731–817. [Google Scholar]
  22. Banerjee, A.; Merugu, S.; Dhillon, I.S.; Ghosh, J. Clustering with Bregman divergences. J. Mach. Learn. Res. 2005, 6, 1705–1749. [Google Scholar]
  23. Theil, H. Statistical Decomposition Analysis; North-Holland Publishing Company: Amsterdam, The Netherlands, 1972. [Google Scholar]
  24. Benedetti, R. Scoring rules for forecast verification. Mon. Weather Rev. 2010, 138, 203–211. [Google Scholar] [CrossRef]
  25. Tödter, J.; Ahrens, B. Generalization of the ignorance score: continuous ranked version and its decomposition. Mon. Weather Rev. 2012, 140, 2005–2017. [Google Scholar] [CrossRef]
  26. Topp, C.F.E.; Wang, W.; Cloy, J.M.; Rees, R.M.; Hughes, G. Information properties of boundary line models for N2O emissions from agricultural soils. Entropy 2013, 15, 972–987. [Google Scholar] [CrossRef]
  27. Cardenas, L.M.; Gooday, R.; Brown, L.; Scholefield, D.; Cuttle, S.; Gilhespy, S.; Matthews, R.; Misselbrook, T.; Wang, J.; Li, C.; Hughes, G.; Lord, E. Towards an improved inventory of N2O from agriculture: model evaluation of N2O emission factors and N fraction leached from different sources in UK agriculture. Atmos. Environ. 2013, 79, 340–348. [Google Scholar] [CrossRef]
  28. Rees, R.M.; Augustin, J.; Alberti, G.; Ball, B.C.; Boeckx, P.; Cantarel, A.; Castaldi, S.; Chirinda, N.; Chojnicki, B.; Giebels, M.; Gordon, H.; Grosz, B.; Horvath, L.; Juszczak, R.; Klemedtsson, Å.K.; Klemedtsson, L.; Medinets, S.; Machon, A.; Mapanda, F.; Nyamangara, J.; Olesen, J.E.; Reay, D.S.; Sanchez, L.; Sanz Cobena, A.; Smith, K.A.; Sowerby, A.; Sommer, J.M.; Soussana, J.F.; Stenberg, M.; Topp, C.F.E.; van Cleemput, O.; Vallejo, A.; Watson, C.A.; Wuta, M. Nitrous oxide emissions from European agriculture—an analysis of variability and drivers of emissions from field experiments. Biogeosciences 2013, 10, 2671–2682. [Google Scholar]
  29. Hawkins, M.J.; Hyde, B.P.; Ryan, M.; Schulte, R.P.O.; Connolly, J. An empirical model and scenario analysis of nitrous oxide emissions from a fertilised and grazed grassland site in Ireland. Nutr. Cycl. Agroecosyst. 2007, 79, 93–101. [Google Scholar] [CrossRef]
  30. Moran, D.; Lucas, A.; Barnes, A. Mitigation win-win. Nat. Clim. Change 2013, 3, 611–613. [Google Scholar] [CrossRef]
