#### **1. Introduction**

Forecasting using two categories of actual status and two categories of forecast is common in many scientific and technical applications where evidence-based risk assessment is required as a basis for decision-making, including plant pathology and clinical medicine. The statistical evaluation of probabilistic disease forecasts often involves the calculation of metrics defined conditionally on actual disease status. For the purpose of disease management decision making, metrics defined conditionally on forecast outcomes (i.e., predictive values) are also of interest, although these are less frequently reported. Here we introduce a new diagrammatic format for disease forecasts with two categories of actual status and two categories of forecast. The format displays relative entropies, functions of predictive values that characterize expected information provided by disease forecasts. Our aims in introducing a new diagrammatic format are two-fold. First, we wish to highlight that performance metrics conditioned on forecast outcomes have a useful role in the overall evaluation of diagnostic tests and disease forecasters; second, bearing in mind the first aim, we wish to demonstrate that performance metrics based on information theoretic quantities can help distinguish characteristics of such tests and forecasters that may not be apparent from probability-scale metrics. The new diagrammatic format we introduce is intended to provide a generic approach that can be applied in any suitable context.

Diagrammatic formats are useful for summarizing the processes of evaluation and comparison of disease forecasts in plant pathology and other disciplines where decisions about a subject must often be taken based on a proxy risk variable rather than knowledge of a subject's actual status. The receiver operating characteristic (ROC) curve [1] is one such well-known format. In plant pathology, ROC curves are widely applied to characterize disease forecasters in terms of probabilities defined conditionally on actual disease status. Calculating the new diagrammatic format that we describe here has the same data requirements as the calculation of the ROC curve, but relates to relative entropy, an information theoretic metric that quantifies the expected amount of diagnostic information consequent on probability revision from prior to posterior arising from application of a disease forecaster. That is to say, it depicts (functions of) probabilities defined conditionally on the forecast. Even when the full underlying ROC curve data are not available, the new format can be constructed simply from ROC curve summary statistics.

The new diagrammatic format is linked analytically to other formats in ways that may not always be obvious simply from the resulting diagrams. We describe other formats and the links between them and the new format, using example data from a previously published study. In a general discussion, we consider the complementarity of metrics defined conditionally on the actual disease status and metrics defined conditionally on the outcome of the forecast.

#### **2. Methods**

We discuss information graphs for disease forecasters with two categories of actual status for subjects and two categories of forecast. In the present article, the terms 'forecast' and 'prediction' are synonymous. We place our discussion in the context of plant pathology, but the information graphs we describe likely have wider application. We are not concerned here with the detailed experimental and analytical methodology that underlies the development of disease forecasters. Readers seeking a description of such work are referred to Yuen et al. [2], Twengström et al. [3], and Yuen and Hughes [4], for example. Rather, we will describe some graphical methods for the comparison and evaluation of forecasters, and will outline some terminology and notation accordingly.

We need forecasters for support in crop protection decision making because the stage of the growing season at which disease management decisions are taken is usually much earlier than an assessment of actual (or 'gold standard') disease status could be made. For the purpose of development of a forecaster, two disease assessments are made on each of a series of experimental crops during the growing season. The actual status of each crop is characterized by an assessment of yield, or of disease intensity, at the end of the growing season. Crops are classified as cases ('*c*') or non-cases ('*nc*'), based on whether or not the gold standard end-of-season assessment indicates economically significant damage, respectively. Because the end-of-season assessment takes place too late to provide a basis for crop protection decision-making, an earlier assessment of disease risk is made, at a stage of the growing season when appropriate action can still be taken, if necessary. This earlier risk assessment may take the form of observation of a single variable that provides a risk score for the crop in question, or observation of a set of variables that are then combined to provide a risk score [5]. The risk score is a proxy variable, related to the actual status of the crop, that can be obtained at an appropriately early stage of the growing season for use in crop protection decision-making. Risk scores are usually calibrated so that higher scores are indicative of greater risk.

Now, consider the introduction of a threshold on the risk score scale. Scores above the threshold are designated '+', indicative of (predicted) need for a crop protection intervention. Scores at or below the threshold are designated '−', indicative of (predicted) no need for a crop protection intervention. The considerations underlying the adoption of a specific threshold risk score for use in a particular crop protection setting are beyond the scope of this article. Madden [6] discusses this in connection with an example data set that we consider in more detail below. In all settings, an adopted threshold characterizes the operational classification rule that is used as a basis for predictions of the need or otherwise for a crop protection intervention. The variable that provides the risk score and the adopted threshold that defines the operational classification rule together characterize what we may refer to as a (binary) 'test' ('forecaster' and 'predictor' are synonymous). A prediction-realization table [7] encapsulates the cross-classified experimental data underlying such a test. The data provide estimates of probabilities as shown in Table 1. Then, from Table 1 via Bayes' Rule, we can write $\hat{p}_{i \cap j} = \hat{p}_{j \cap i} = \hat{p}_{i|j} \cdot \hat{p}_{j} = \hat{p}_{j|i} \cdot \hat{p}_{i}$, with $i = +, -$ (for the predictions) and $j = c, nc$ (for the realizations). The $\hat{p}_{j}$ are taken as the Bayesian prior probabilities of case ($j = c$) or non-case ($j = nc$) status, such that $\hat{p}_{nc} = 1 - \hat{p}_{c}$. Note also that the $\hat{p}_{i}$ for intervention required ($i = +$) and intervention not required ($i = -$) can be written as $\hat{p}_{i} = \hat{p}_{i|c} \cdot \hat{p}_{c} + \hat{p}_{i|nc} \cdot \hat{p}_{nc}$ via the Law of Total Probability.
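The construction of a prediction-realization table from thresholded risk scores can be sketched as follows. This is a minimal illustration only: the risk scores, the case labels, the threshold of 50 and the function name `prediction_realization_table` are all hypothetical and are not taken from the data sets discussed in this article.

```python
from collections import Counter

def prediction_realization_table(scores, is_case, threshold):
    """Estimate joint probabilities keyed by (prediction, actual status)."""
    counts = Counter()
    for score, case in zip(scores, is_case):
        prediction = "+" if score > threshold else "-"  # at or below -> '-'
        status = "c" if case else "nc"
        counts[(prediction, status)] += 1
    n = len(scores)
    keys = [("+", "c"), ("+", "nc"), ("-", "c"), ("-", "nc")]
    return {key: counts[key] / n for key in keys}

# Hypothetical risk scores and end-of-season status (illustration only)
scores = [12, 25, 31, 38, 44, 47, 52, 60, 67, 75]
is_case = [False, False, False, True, False, False, True, True, False, True]

table = prediction_realization_table(scores, is_case, threshold=50)

p_c = table[("+", "c")] + table[("-", "c")]      # prior probability of case status
p_nc = 1.0 - p_c
p_plus = table[("+", "c")] + table[("+", "nc")]  # marginal probability of a '+'

# Law of Total Probability: p(+) = p(+|c) p(c) + p(+|nc) p(nc)
p_plus_given_c = table[("+", "c")] / p_c
p_plus_given_nc = table[("+", "nc")] / p_nc
assert abs(p_plus - (p_plus_given_c * p_c + p_plus_given_nc * p_nc)) < 1e-12

print(table)
```

The four joint probabilities sum to one, and the row and column sums give the marginal probabilities of the predictions and of the actual status, respectively.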

**Table 1.** The prediction-realization table for a test with two categories of realized (actual) status (*c*, *nc*) and two categories of prediction (+, −). In the body of the table are the joint probabilities.


The posterior probability of (gold standard) case status (*c*) given a + prediction on using a test is $p_{c|+}$, referred to as the *positive predictive value*. Here, this refers to correct predictions of the need for a crop protection intervention; the complement $p_{nc|+} = 1 - p_{c|+}$ refers to incorrect predictions of the need for an intervention. The posterior probability of (gold standard) non-case (*nc*) status given a − prediction on using a test is $p_{nc|-}$, referred to as the *negative predictive value*. Here, this refers to correct predictions of no need for an intervention; the complement $p_{c|-} = 1 - p_{nc|-}$ refers to incorrect predictions of no need for an intervention. If we think of $p_{j}$ ($j = c, nc$) as representing the Bayesian prior probabilities (i.e., before the test is used to make a prediction), the $p_{j|i}$ ($i = +, -$) then represent the corresponding posteriors (i.e., after obtaining the prediction). Predictive values are metrics defined conditionally on forecast outcomes.

The proportion of + predictions made for cases is referred to as the true positive proportion, or *sensitivity*, and provides an estimate of the conditional probability *p*+|*c*. The complementary false negative proportion is an estimate of *p*−|*c*. The proportion of + predictions made for non-cases is referred to as the false positive proportion, and provides an estimate of *p*+|*nc*. The complementary true negative proportion, or *specificity*, is an estimate of *p*−|*nc*. *Sensitivity* and *specificity* are metrics defined conditionally on actual disease status. The ROC curve, which has become a familiar device in crop protection decision support following the pioneering work of Jonathan Yuen and colleagues [2,3], is a graphical plot of *sensitivity* against 1−*specificity* for a set of possible binary tests, based on the disease assessments made during the growing season and derived by varying the threshold on the risk score scale. Since *sensitivity* and *specificity* values are linked, a disease forecaster based on a particular threshold represents values chosen to achieve an appropriate balance [8].
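The relationship between metrics conditioned on actual status (*sensitivity*, *specificity*) and metrics conditioned on the forecast (predictive values) can be sketched in a few lines of Python. The function name `predictive_values` is hypothetical; the inputs are the rounded *sensitivity* and *specificity* (0.833, 0.844) and the three prior probabilities (0.36, 0.05, 0.85) used in the example scenarios of Section 3.1.

```python
def predictive_values(sens, spec, prior_c):
    """Return (positive predictive value, negative predictive value)."""
    prior_nc = 1.0 - prior_c
    # Marginal probabilities of each prediction (Law of Total Probability)
    p_plus = sens * prior_c + (1.0 - spec) * prior_nc
    p_minus = (1.0 - sens) * prior_c + spec * prior_nc
    # Bayes' Rule
    ppv = sens * prior_c / p_plus
    npv = spec * prior_nc / p_minus
    return ppv, npv

# Same sensitivity and specificity, three different priors
for prior_c in (0.36, 0.05, 0.85):
    ppv, npv = predictive_values(0.833, 0.844, prior_c)
    print(f"prior={prior_c:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Although *sensitivity* and *specificity* are held fixed, the predictive values change markedly with the prior, which is why metrics conditioned on the forecast carry information that the ROC-scale metrics alone do not.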

#### **3. Results**

#### *3.1. Biggerstaff's Analysis*

We denote the likelihood ratio of a + prediction as *L*+, estimated by:

$$\hat{L}_{+} = \frac{\hat{p}_{+|c}}{\hat{p}_{+|nc}} \tag{1}$$

(in words, the expression on the RHS is the true positive proportion divided by the false positive proportion or *sensitivity*/(1–*specificity*)). We denote the likelihood ratio of a − prediction as *L*−, estimated by:

$$\hat{L}_{-} = \frac{\hat{p}_{-|c}}{\hat{p}_{-|nc}} \tag{2}$$

(in words, the expression on the RHS is the false negative proportion divided by the true negative proportion or (1–*sensitivity*)/*specificity*). Likelihood ratios are properties of a predictor (i.e., they are independent of prior probabilities) [9]. Values $L_{+} > 1$ and $0 < L_{-} < 1$ are the minimum requirements for a useful binary test; within these ranges, larger positive values of $L_{+}$ and smaller positive values of $L_{-}$ are desirable. $L_{+}$ characterizes the extent to which a + prediction is more likely from *c* crops than from *nc* crops; $L_{-}$ characterizes the extent to which a − prediction is less likely from *c* crops than from *nc* crops.
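Equations (1) and (2) translate directly into code. The sketch below uses a hypothetical helper, `likelihood_ratios`, with the rounded *sensitivity* and *specificity* values quoted later in this section for Scenarios B and C; because the inputs are rounded to three decimals, the results may differ slightly in the third decimal place from values computed from unrounded data.

```python
def likelihood_ratios(sens, spec):
    """Return (L+, L-) as in Equations (1) and (2)."""
    lr_pos = sens / (1.0 - spec)   # true positive / false positive proportion
    lr_neg = (1.0 - sens) / spec   # false negative / true negative proportion
    return lr_pos, lr_neg

def is_useful(lr_pos, lr_neg):
    """Minimum requirements for a useful binary test."""
    return lr_pos > 1.0 and 0.0 < lr_neg < 1.0

for name, sens, spec in [("B", 0.833, 0.844), ("C", 0.390, 0.990)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"Scenario {name}: L+={lr_pos:.3f}  L-={lr_neg:.3f}  "
          f"useful={is_useful(lr_pos, lr_neg)}")
```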

Now, working in terms of odds (*o*) rather than probability (*p*) (with *o* = *p*/(1−*p*)), we can write versions of Bayes' Rule, for example:

$$
\hat{o}_{c|+} = \hat{o}_{c} \cdot \hat{L}_{+} \tag{3}
$$

and:

$$
\hat{o}_{c|-} = \hat{o}_{c} \cdot \hat{L}_{-}. \tag{4}
$$

Thus, a + prediction increases the posterior odds of *c* status relative to the prior odds by a factor of $\hat{L}_{+}$ and a − prediction decreases the posterior odds of *c* status relative to the prior odds by a factor of $\hat{L}_{-}$. Biggerstaff [10] used Equations (3) and (4) to make pairwise comparisons of binary tests (with both tests applied at the same prior odds), premised on the availability only of the sensitivities and specificities corresponding to the two tests' operational classification rules (for example, when considering tests for application based on their published ROC curve summary statistics, *sensitivity* and *specificity*).
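The odds form of Bayes' Rule in Equations (3) and (4) can be sketched as below. The helper names are hypothetical; the prior (0.05) and the likelihood ratios (5.333 and 0.198 for the reference test, 39.0 and 0.616 for the comparison test) are the values reported for Scenarios B and C in this section.

```python
def to_odds(p):
    return p / (1.0 - p)

def to_prob(o):
    return o / (1.0 + o)

def posterior_prob(prior_c, likelihood_ratio):
    """Posterior probability of case status via Equations (3)/(4)."""
    return to_prob(to_odds(prior_c) * likelihood_ratio)

prior_c = 0.05
tests = {"B (reference)": (5.333, 0.198), "C (comparison)": (39.0, 0.616)}
for name, (lr_pos, lr_neg) in tests.items():
    print(f"Scenario {name}: "
          f"p(c|+)={posterior_prob(prior_c, lr_pos):.3f}  "
          f"p(c|-)={posterior_prob(prior_c, lr_neg):.3f}")
```

Because both tests are applied at the same prior odds, ranking them by posterior odds is equivalent to ranking them by likelihood ratio, which is the basis of the pairwise comparison.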

At this point, we refer to a previously published phytopathological data set [11] in order to illustrate our analysis. Note, however, that the analysis we present is generic, and is not restricted to application in one particular pathosystem. Table 2 summarizes data for five different scenarios, based in essence on five different normalized prediction-realization tables, derived from the original data set and discussed previously in [6] in the context of decision making in epidemiology.

**Table 2.** Example data set. See [6,11] for full details.


$\hat{p}_{c}$: prior probability of an epidemic or of the need for a control intervention, estimated by disease prevalence. $\hat{p}_{+|c}$: estimated probability of an actual epidemic being correctly predicted on using a test (as defined by a prediction-realization table). Referred to as *sensitivity*. $\hat{p}_{-|nc}$: estimated probability of an actual non-epidemic being correctly predicted on using a test (as defined by a prediction-realization table). Referred to as *specificity*. $\hat{p}_{c|+}$: estimated posterior probability of an epidemic given that one is predicted on using a test (as defined by a prediction-realization table). Referred to as *positive predictive value*. $\hat{p}_{nc|-}$: estimated posterior probability of no epidemic given that one is not predicted on using a test (as defined by a prediction-realization table). Referred to as *negative predictive value*.

Recall that we are interested in probability (or odds) revision calculated on the basis of a forecast. For illustration, we first consider the pairwise comparison of the tests derived from Scenario B (reference) and Scenario C (comparison) made at $\hat{p}_{c} = 0.05$ (Table 2). Madden [6] gives a detailed comparison based on knowledge of the full ROC curve derived from field experimentation. Biggerstaff's analysis essentially represents an attempt to reverse engineer a similar comparison based only on knowledge of the tests' published sensitivities and specificities. Scenario B yields *sensitivity* = 0.833 and *specificity* = 0.844, so we have $\hat{L}_{+} = 5.333$ and $\hat{L}_{-} = 0.198$. Scenario C yields *sensitivity* = 0.390 and *specificity* = 0.990, so we have $\hat{L}_{+} = 39.000$ and $\hat{L}_{-} = 0.616$. Thus, Scenario C's test is superior in terms of $\hat{L}_{+}$ values but inferior in terms of $\hat{L}_{-}$ values (even though its *sensitivity* is lower and *specificity* higher than that of the reference test). As long as we restrict ourselves to pairwise comparisons of binary tests at the same prior probability we have a simple analysis that leads, via calculation of likelihood ratios, to an evaluation of tests made on the basis of Bayesian posteriors (directly in terms of posterior odds, but these are easily converted to posterior probabilities if so desired). The diagrammatic version of this comparison is shown in Figure 1. The likelihood ratios graph comprises two single-point ROC curves. A similar analysis for Scenario D (reference) and Scenario E (comparison) (Figure 2) shows that Scenario E's test is inferior in terms of $\hat{L}_{+}$ values but superior in terms of $\hat{L}_{-}$ values (even though its *sensitivity* is higher and *specificity* lower than that of the reference test).

**Figure 1.** Biggerstaff's likelihood ratios graph for Scenario B (reference) and Scenario C (comparison). The graph for Scenario B consists of a single point at 1–*specificity* = 0.156, *sensitivity* = 0.833 (see Table 2). The solid red line through (0, 0) and (0.156, 0.833) has slope = *sensitivity*/(1–*specificity*) = 5.333 = $\hat{L}_{+}$. The dashed red line through (0.156, 0.833) and (1, 1) has slope = (1–*sensitivity*)/*specificity* = 0.198 = $\hat{L}_{-}$. The graph for Scenario C consists of a single point at 1–*specificity* = 0.01, *sensitivity* = 0.39 (see Table 2). The solid green line through (0, 0) and (0.01, 0.39) has slope = *sensitivity*/(1–*specificity*) = 39.0 = $\hat{L}_{+}$. The dashed green line through (0.01, 0.39) and (1, 1) has slope = (1–*sensitivity*)/*specificity* = 0.616 = $\hat{L}_{-}$.

Referring back to Table 2, the likelihood ratios, and corresponding graphs, for Scenarios A, B and D would be numerically identical. It is in this context that the information theoretic properties of likelihood ratios graphs (not pursued by Biggerstaff) are of interest. To elaborate further, we will require an estimate of the prior probability $\hat{p}_{c}$. This is beyond what Biggerstaff's analysis allowed, but it is not so unlikely that such an estimate might be available. For example, a $\hat{p}_{c}$ value is provided for any test for which a numerical version of the prediction-realization table (see Table 1) is accessible.

For information quantities, the specified unit depends on the choice of logarithmic base: bits for log base 2, nats for log base *e*, and hartleys (abbreviation: Hart) for log base 10 [12]. Our preference is to use base *e* logarithms, symbolized ln, where we need derivatives, following Theil [7]. In this article, we will also make use of base 10 logarithms, symbolized $\log_{10}$, where this serves to make our presentation straightforwardly compatible with previously published work, specifically that of Johnson [13]. To convert from hartleys to nats, divide by $\log_{10}(e)$; or to convert from nats to hartleys, divide by $\ln(10)$. When logarithms are symbolized just by log, as immediately following, this indicates use of a generic format such that specification of a particular logarithmic base is not required until the formula in question is used in calculation.
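The unit conversions just described can be written as a small set of helpers. The function names are hypothetical; the conversion to bits is included only for completeness and is not used elsewhere in this article.

```python
import math

def hart_to_nat(x_hart):
    """Convert hartleys to nats: divide by log10(e)."""
    return x_hart / math.log10(math.e)

def nat_to_hart(x_nat):
    """Convert nats to hartleys: divide by ln(10)."""
    return x_nat / math.log(10)

def nat_to_bit(x_nat):
    """Convert nats to bits: divide by ln(2)."""
    return x_nat / math.log(2)

one_hart_in_nats = hart_to_nat(1.0)
print(f"1 Hart = {one_hart_in_nats:.4f} nats")
print(f"1 nat  = {nat_to_hart(1.0):.4f} Hart")
```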

**Figure 2.** Biggerstaff's likelihood ratios graph for Scenario D (reference) and Scenario E (comparison). The graph for Scenario D consists of a single point at 1–*specificity* = 0.156, *sensitivity* = 0.833 (see Table 2). The solid red line through (0, 0) and (0.156, 0.833) has slope = *sensitivity*/(1–*specificity*) = 5.333 = $\hat{L}_{+}$. The dashed red line through (0.156, 0.833) and (1, 1) has slope = (1–*sensitivity*)/*specificity* = 0.198 = $\hat{L}_{-}$. The graph for Scenario E consists of a single point at 1–*specificity* = 0.344, *sensitivity* = 0.944 (see Table 2). The solid blue line through (0, 0) and (0.344, 0.944) has slope = *sensitivity*/(1–*specificity*) = 2.744 = $\hat{L}_{+}$. The dashed blue line through (0.344, 0.944) and (1, 1) has slope = (1–*sensitivity*)/*specificity* = 0.085 = $\hat{L}_{-}$.

We start with disease prevalence as an estimate of the prior probability $\hat{p}_{c}$ of need for a crop protection intervention, and seek to update this by application of a predictor. The information required for certainty (i.e., when the posterior probability of need for an intervention is equal to one) is then $\log(1/\hat{p}_{c})$ denominated in the appropriate information units. However, a predictor typically does not provide certainty, but instead updates $\hat{p}_{c}$ to $\hat{p}_{c|i} < 1$. The information still required for certainty is then $\log(1/\hat{p}_{c|i})$ in the appropriate information units. We see from $\log(1/\hat{p}_{c}) - \log(1/\hat{p}_{c|i}) = \log(\hat{p}_{c|i}/\hat{p}_{c})$ that the term $\log(\hat{p}_{c|i}/\hat{p}_{c})$ represents the information content of prediction $i$ in relation to actual status *c* in the appropriate information units. Provided the prediction is correct (i.e., in this case, $i = +$), the posterior probability is larger than the prior, and thus the information content of the *positive predictive value* is > 0. In general, the information content of correct predictions is > 0. Predictions that result in a posterior unchanged from the prior have zero information content and incorrect predictions have information content < 0.
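The sign behaviour of the information content can be checked with a short sketch. The function name `information_content` is hypothetical, and the posterior values used below (0.219 after a + prediction and 0.0103 after a − prediction, both from a prior of 0.05) are illustrative figures obtained by applying the likelihood ratios 5.333 and 0.198 at that prior.

```python
import math

def information_content(posterior, prior, log=math.log):
    """log(posterior / prior); the unit depends on the logarithm supplied."""
    return log(posterior / prior)

prior_c = 0.05
# '+' prediction in relation to status c: posterior exceeds prior
gain_plus = information_content(0.219, prior_c)
# Posterior unchanged from the prior: zero information content
gain_none = information_content(prior_c, prior_c)
# '-' prediction in relation to status c: posterior falls below prior
gain_minus = information_content(0.0103, prior_c)

print(f"{gain_plus:.3f} nats, {gain_none:.3f} nats, {gain_minus:.3f} nats")
```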

Here, we consider the information content of a particular forecast, averaged over the possible actual states. These quantities are *expected* information contents, often referred to as relative entropies. For a binary test:

$$\hat{I}_{+} = \sum_{j=c,nc} \hat{p}_{j|+} \cdot \log\left[\frac{\hat{p}_{j|+}}{\hat{p}_{j}}\right] \tag{5}$$

for the forecast *i* = + and:

$$\hat{I}_{-} = \sum_{j=c,nc} \hat{p}_{j|-} \cdot \log\left[\frac{\hat{p}_{j|-}}{\hat{p}_{j}}\right] \tag{6}$$

for the forecast *i* = −. Relative entropies measure expected information consequent on probability revision from prior $\hat{p}_{j}$ to posterior $\hat{p}_{j|i}$ after obtaining a forecast. Relative entropies are ≥ 0, with equality only if the posterior probabilities are the same as the priors. Larger values of both $\hat{I}_{+}$ and $\hat{I}_{-}$ are preferable, as being indicative of forecasts that, on average, provide more diagnostic information.
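Equations (5) and (6) share the same form, so a single function suffices when the posterior and prior probabilities of case status are known. The sketch below is illustrative: the function name `relative_entropy` is hypothetical, and the example posteriors (0.75 after a + prediction and 0.10 after a − prediction, at a prior of 0.36) are derived values used here as an assumption rather than quoted from Table 2.

```python
import math

def relative_entropy(posterior_c, prior_c, log=math.log):
    """Expected information content of a forecast, as in Equations (5)/(6).

    posterior_c is p(c | forecast); the non-case terms are the complements.
    """
    pairs = [(posterior_c, prior_c), (1.0 - posterior_c, 1.0 - prior_c)]
    # Terms with zero posterior contribute nothing (limit of q*log(q) is 0)
    return sum(q * log(q / p) for q, p in pairs if q > 0.0)

prior_c = 0.36
i_plus = relative_entropy(0.75, prior_c)    # after a '+' forecast
i_minus = relative_entropy(0.10, prior_c)   # after a '-' forecast
i_null = relative_entropy(prior_c, prior_c) # posterior equals prior

print(f"I+={i_plus:.3f} nats  I-={i_minus:.3f} nats  null={i_null:.3f}")
```

Passing `log=math.log10` returns the same quantities in hartleys.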

We can write the relative entropies $\hat{I}_{+}$ and $\hat{I}_{-}$ in terms of *sensitivity*, *specificity* and (constant) prior probability. Working here in natural logarithms, and recalling that $\hat{p}_{-|c} = 1 - \hat{p}_{+|c}$, $\hat{p}_{-|nc} = 1 - \hat{p}_{+|nc}$, and $\hat{p}_{nc} = 1 - \hat{p}_{c}$, we have:

$$\begin{array}{rcl} \hat{I}_{+} &=& \frac{\hat{p}_{+|c} \cdot \hat{p}_{c}}{\hat{p}_{+|c} \cdot \hat{p}_{c} + \hat{p}_{+|nc} \cdot \hat{p}_{nc}} \cdot \ln\left[\frac{\hat{p}_{+|c}}{\hat{p}_{+|c} \cdot \hat{p}_{c} + \hat{p}_{+|nc} \cdot \hat{p}_{nc}}\right] \\ && + \frac{\hat{p}_{+|nc} \cdot \hat{p}_{nc}}{\hat{p}_{+|c} \cdot \hat{p}_{c} + \hat{p}_{+|nc} \cdot \hat{p}_{nc}} \cdot \ln\left[\frac{\hat{p}_{+|nc}}{\hat{p}_{+|c} \cdot \hat{p}_{c} + \hat{p}_{+|nc} \cdot \hat{p}_{nc}}\right] \end{array} \tag{7}$$

in nats and:

$$\begin{array}{rcl} \hat{I}_{-} &=& \frac{\hat{p}_{-|c} \cdot \hat{p}_{c}}{\hat{p}_{-|c} \cdot \hat{p}_{c} + \hat{p}_{-|nc} \cdot \hat{p}_{nc}} \cdot \ln\left[\frac{\hat{p}_{-|c}}{\hat{p}_{-|c} \cdot \hat{p}_{c} + \hat{p}_{-|nc} \cdot \hat{p}_{nc}}\right] \\ && + \frac{\hat{p}_{-|nc} \cdot \hat{p}_{nc}}{\hat{p}_{-|c} \cdot \hat{p}_{c} + \hat{p}_{-|nc} \cdot \hat{p}_{nc}} \cdot \ln\left[\frac{\hat{p}_{-|nc}}{\hat{p}_{-|c} \cdot \hat{p}_{c} + \hat{p}_{-|nc} \cdot \hat{p}_{nc}}\right] \end{array} \tag{8}$$

again in nats. Now we can use these formulas to plot sets of iso-information contours for constant relative entropies $\hat{I}_{+}$ and $\hat{I}_{-}$ on the graph with axes *sensitivity* and 1 – *specificity*, for given prior probabilities. From Equation (7), holding $\hat{I}_{+}$ constant, we obtain:

$$\frac{\mathrm{d}\hat{p}_{+|c}}{\mathrm{d}\hat{p}_{+|nc}} = \frac{\hat{p}_{+|c}}{\hat{p}_{+|nc}} \tag{9}$$

the solution of which is the straight line $\hat{p}_{+|c} = a \cdot \hat{p}_{+|nc}$, which yields $a = \hat{L}_{+}$. From Equation (8) we obtain:

$$\frac{\mathrm{d}\hat{p}_{+|c}}{\mathrm{d}\hat{p}_{+|nc}} = \frac{1 - \hat{p}_{+|c}}{1 - \hat{p}_{+|nc}} \tag{10}$$

the solution of which is the straight line $\hat{p}_{+|c} = (1 - b) + b \cdot \hat{p}_{+|nc}$, which yields $b = \hat{L}_{-}$. Thus, we find that iso-information contours for $\hat{I}_{+}$ and $\hat{I}_{-}$ are straight lines on the graph with axes *sensitivity* and 1 – *specificity*, i.e., Biggerstaff's likelihood ratios graph (see Figure 3).
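The straight-line form of the iso-information contours can be checked numerically. The sketch below evaluates the relative entropies at several points along a line through (0, 0) with slope $a$ and along a line through (1, 1) with slope $b$, for a fixed prior. The function names are hypothetical, and the slopes (5.333, 0.198) and prior (0.36) are taken from the example data as illustrative inputs.

```python
import math

def _entropy(post_c, prior_c):
    post_nc, prior_nc = 1.0 - post_c, 1.0 - prior_c
    return (post_c * math.log(post_c / prior_c)
            + post_nc * math.log(post_nc / prior_nc))

def i_plus(sens, fpr, prior_c):
    """Relative entropy of a '+' forecast; fpr = 1 - specificity."""
    p_plus = sens * prior_c + fpr * (1.0 - prior_c)
    return _entropy(sens * prior_c / p_plus, prior_c)

def i_minus(sens, fpr, prior_c):
    """Relative entropy of a '-' forecast."""
    p_minus = (1.0 - sens) * prior_c + (1.0 - fpr) * (1.0 - prior_c)
    return _entropy((1.0 - sens) * prior_c / p_minus, prior_c)

prior_c, a, b = 0.36, 5.333, 0.198
fprs = (0.05, 0.10, 0.156)

# Along sens = a * fpr the '+' relative entropy should not change
plus_values = [i_plus(a * fpr, fpr, prior_c) for fpr in fprs]
# Along sens = (1 - b) + b * fpr the '-' relative entropy should not change
minus_values = [i_minus((1.0 - b) + b * fpr, fpr, prior_c) for fpr in fprs]

print([round(v, 6) for v in plus_values])
print([round(v, 6) for v in minus_values])
```

Each list contains the same value repeated (up to floating-point rounding), consistent with the lines being contours of constant expected information content at the chosen prior.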

**Figure 3.** Biggerstaff's likelihood ratios graphs for Scenarios A, B and D (Table 2). The slopes of the lines are the likelihood ratios $\hat{L}_{+} = 5.333$ and $\hat{L}_{-} = 0.198$, calculated from Table 2. Analysis shows that the lines themselves are also iso-information contours for the expected information contents of + and − forecasts. However, the calculated values of these expected information contents depend on the prior probability as well as on *sensitivity* and *specificity*. Making use of the available data on the prior probabilities allows us to calculate relative entropies in order to distinguish analytically between scenarios, but the likelihood ratios graph does not distinguish visually between scenarios with the same *sensitivity* and *specificity*.

Now consider Scenarios A, B and D; from the data in Table 2, we calculate likelihood ratios $\hat{L}_{+} = 5.333$ and $\hat{L}_{-} = 0.198$ for all three scenarios (these are the slopes of the lines shown in Figure 3). However, the three scenarios differ in their prior probabilities: $\hat{p}_{c} = 0.36, 0.05, 0.85$ for A, B, and D respectively. This situation may arise in practice when a test is developed and used in one geographical location, and then subsequently evaluated with a view to application in other locations where the disease prevalence is different. The difference in test performance is reflected by the relative entropy calculations. For Scenario A, we calculate relative entropies $\hat{I}_{+} = 0.315$ and $\hat{I}_{-} = 0.179$ (both in nats; these characterize the lines shown in Figure 3 interpreted as iso-information contours for the expected information contents of + and − forecasts respectively). For Scenario B, we calculate $\hat{I}_{+} = 0.171$ and $\hat{I}_{-} = 0.024$ nats. For Scenario D, $\hat{I}_{+} = 0.076$ and $\hat{I}_{-} = 0.289$ nats. Thus we may view Biggerstaff's likelihood ratios graph from an information theoretic perspective. While likelihood ratios are independent of prior probability, relative entropies are functions of prior probability. There is further discussion of relative entropies, including calculations for Scenarios C and E, in Section 3.3.
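The relative entropies quoted for Scenarios A, B and D can be approximately reproduced from *sensitivity*, *specificity* and prior probability using the forms of Equations (7) and (8). This is a sketch with hypothetical function names; it uses the rounded inputs 0.833 and 0.844, so the output may differ from the values in the text in the third decimal place.

```python
import math

def i_plus(sens, spec, prior_c):
    """Equation (7), in nats."""
    fpr, prior_nc = 1.0 - spec, 1.0 - prior_c
    p_plus = sens * prior_c + fpr * prior_nc
    return ((sens * prior_c / p_plus) * math.log(sens / p_plus)
            + (fpr * prior_nc / p_plus) * math.log(fpr / p_plus))

def i_minus(sens, spec, prior_c):
    """Equation (8), in nats."""
    fnr, prior_nc = 1.0 - sens, 1.0 - prior_c
    p_minus = fnr * prior_c + spec * prior_nc
    return ((fnr * prior_c / p_minus) * math.log(fnr / p_minus)
            + (spec * prior_nc / p_minus) * math.log(spec / p_minus))

sens, spec = 0.833, 0.844
priors = {"A": 0.36, "B": 0.05, "D": 0.85}
results = {name: (i_plus(sens, spec, p), i_minus(sens, spec, p))
           for name, p in priors.items()}

for name, (ip, im) in results.items():
    print(f"Scenario {name}: I+={ip:.3f} nats  I-={im:.3f} nats")
```

The three scenarios share one point on the likelihood ratios graph, yet the expected information from a + forecast is largest at the intermediate prior (A) and the expected information from a − forecast is largest at the highest prior (D).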

#### *3.2. Johnson's Analysis*

Johnson [13] suggested transformation of the likelihood ratios graph (e.g., Figures 1–3), such that the axes of the graph are denominated in log likelihood ratios. At the outset, note that Johnson works in base 10 logarithms and that this choice is duplicated here, for the sake of compatibility. Thus, although Johnson's analysis is not explicitly information theoretic, where we use it as a basis for characterizing information theoretic quantities, these quantities will have units of hartleys. Note also that Johnson calculates $\log_{10} \hat{L}_{+}$ and $\log_{10} \hat{L}_{-}$ but here we take account of the signs of the log likelihood ratios. Fosgate's [14] correction of Johnson's terminology is noted, although this does not affect our analysis at all.

From Equation (3), we write:

$$\log_{10} \hat{o}_{c|+} = \log_{10} \hat{o}_{c} + \log_{10} \hat{L}_{+} \tag{11}$$

and from Equation (4):

$$
\log_{10} \hat{o}_{c|-} = \log_{10} \hat{o}_{c} + \log_{10} \hat{L}_{-} \tag{12}
$$

with $\log_{10} \hat{L}_{+} > 0$ (larger positive values are better) and $\log_{10} \hat{L}_{-} < 0$ (more negative values are better) for any useful test. As previously, the objective is to make pairwise comparisons of binary tests (with both tests applied at the same prior odds), premised on the availability only of the sensitivities and specificities corresponding to the two tests' operational classification rules.

With Scenario B as the reference test and Scenario C as the comparison test, we find Scenario C's test is superior in terms of $\log_{10} \hat{L}_{+}$ values but inferior in terms of $\log_{10} \hat{L}_{-}$ values (Figure 4). With Scenario D as the reference test and Scenario E as the comparison test, we find Scenario E's test is inferior in terms of $\log_{10} \hat{L}_{+}$ values, but superior in terms of $\log_{10} \hat{L}_{-}$ values (Figure 4). Moreover, we find that the transformed likelihood ratios graph still does not distinguish visually between Scenarios A, B and D (Figure 4). Thus, the initial findings from the analysis of the scenarios in Table 2 are the same as previously.
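The pairwise comparisons on the log scale can be sketched as follows. The helper name `log10_likelihood_ratios` is hypothetical. The *sensitivity* and *specificity* values are the rounded figures given in Section 3.1 (for Scenario E, *specificity* = 0.656 is inferred from 1–*specificity* = 0.344), so results may differ slightly in the third decimal place from values based on unrounded data.

```python
import math

def log10_likelihood_ratios(sens, spec):
    """Return (log10 L+, log10 L-), keeping the signs."""
    return (math.log10(sens / (1.0 - spec)),
            math.log10((1.0 - sens) / spec))

tests = {
    "A, B, D": (0.833, 0.844),
    "C": (0.390, 0.990),
    "E": (0.944, 0.656),
}
logs = {name: log10_likelihood_ratios(s, sp) for name, (s, sp) in tests.items()}

for name, (lp, ln) in logs.items():
    print(f"Scenario {name}: log10 L+={lp:.3f}  log10 L-={ln:.3f}")

# Comparison test C against reference B: better '+', worse '-'
c_better_plus = logs["C"][0] > logs["A, B, D"][0]
c_better_minus = logs["C"][1] < logs["A, B, D"][1]
# Comparison test E against reference D: worse '+', better '-'
e_better_plus = logs["E"][0] > logs["A, B, D"][0]
e_better_minus = logs["E"][1] < logs["A, B, D"][1]
print(c_better_plus, c_better_minus, e_better_plus, e_better_minus)
```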

Now, as with Biggerstaff's [10] original analysis, we seek to view Johnson's analysis from an information theoretic perspective. As before, we will require an estimate of the prior probability $\hat{p}_{c}$. After some rearrangement, we obtain from Equation (11):

$$\log_{10}\left[\frac{\hat{p}_{c|+}}{\hat{p}_{c}}\right] - \log_{10}\left[\frac{\hat{p}_{nc|+}}{\hat{p}_{nc}}\right] = \log_{10} \hat{L}_{+} \ \text{Hart} \tag{13}$$

where $\log_{10}[\hat{p}_{c|+}/\hat{p}_{c}]$ (> 0) and $\log_{10}[\hat{p}_{nc|+}/\hat{p}_{nc}]$ (< 0) on the LHS are information contents (as outlined in Section 3.1) with units of hartleys. From Equation (12):

$$
\log_{10}\left[\frac{\hat{p}_{c|-}}{\hat{p}_{c}}\right] - \log_{10}\left[\frac{\hat{p}_{nc|-}}{\hat{p}_{nc}}\right] = \log_{10} \hat{L}_{-} \ \text{Hart} \tag{14}
$$

where $\log_{10}[\hat{p}_{c|-}/\hat{p}_{c}]$ (< 0) and $\log_{10}[\hat{p}_{nc|-}/\hat{p}_{nc}]$ (> 0) on the LHS are information contents, again with units of hartleys. Thus, we recognize that $\log_{10}$ likelihood ratios also have units of hartleys. Figure 5 shows the information theoretic characteristics of Johnson's analysis when data on priors are incorporated, by drawing $\log_{10}$-likelihood contours on a graphical plot that has information contents on the axes.

**Figure 4.** A version of Johnson's $\log_{10}$ likelihood ratios diagram for data from Table 2. Here $\log_{10} \hat{L}_{+} = 0.727$ and $\log_{10} \hat{L}_{-} = -0.704$ for Scenarios A, B and D. For Scenario C, $\log_{10} \hat{L}_{+} = 1.591$ and $\log_{10} \hat{L}_{-} = -0.208$. For Scenario E, $\log_{10} \hat{L}_{+} = 0.438$ and $\log_{10} \hat{L}_{-} = -1.071$. Valid comparisons (i.e., for scenarios with equal prior probabilities) are Scenario B (reference) with Scenario C (comparison) and Scenario D (reference) with Scenario E (comparison).

In Figure 5, both the $\log_{10}\hat{L}_+$ and $\log_{10}\hat{L}_-$ contours always have slope = 1. As the decompositions characterized in Equations (13) and (14) show, any (constant) log10 likelihood ratio is the difference of two information contents (equivalently, the sum of their magnitudes, since the two have opposite signs). Looking at the "north-west" corner of Figure 5 and taking Scenarios A, B, and D from Table 2 as examples, we have $\log_{10}[\hat{p}_{c|+}/\hat{p}_c]$ = 0.642, 0.319, 0.056 Hart and $\log_{10}[\hat{p}_{nc|+}/\hat{p}_{nc}]$ = −0.085, −0.408, −0.671 Hart for $\hat{p}_c$ = 0.05 (B), 0.36 (A), 0.85 (D), respectively. In each case, Equation (13) yields $\log_{10}\hat{L}_+ = 0.727$ Hart. Looking at the "south-east" corner of Figure 5, again taking Scenarios A, B, and D from Table 2 as examples, we have $\log_{10}[\hat{p}_{nc|-}/\hat{p}_{nc}]$ = 0.498, 0.148, 0.018 Hart and $\log_{10}[\hat{p}_{c|-}/\hat{p}_c]$ = −0.207, −0.556, −0.687 Hart for $\hat{p}_{nc}$ = 0.15 (D), 0.64 (A), 0.95 (B), respectively. In each case, Equation (14) yields $\log_{10}\hat{L}_- = -0.704$ Hart. Thus we have an information theoretic perspective on Johnson's analysis when data on priors are available, and this time one that separates Scenarios A, B and D visually (Figure 5).
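The decomposition in Equations (13) and (14) can be checked numerically. The following is a minimal sketch, assuming the rounded *sensitivity* and *specificity* values quoted for Scenarios A, B and D (0.833 and 0.844); the function name `information_contents` and the dictionary keys are illustrative choices, not part of the original analysis.

```python
import math


def information_contents(sensitivity, specificity, prior_c):
    """Return log10 information contents (hartleys), keyed by 'status|prediction'."""
    prior_nc = 1.0 - prior_c
    # Probability of a + prediction (law of total probability).
    p_pos = sensitivity * prior_c + (1.0 - specificity) * prior_nc
    p_neg = 1.0 - p_pos
    # Posterior probabilities of c after each prediction (Bayes' rule).
    post_c_pos = sensitivity * prior_c / p_pos
    post_c_neg = (1.0 - sensitivity) * prior_c / p_neg
    return {
        "c|+": math.log10(post_c_pos / prior_c),
        "nc|+": math.log10((1.0 - post_c_pos) / prior_nc),
        "c|-": math.log10(post_c_neg / prior_c),
        "nc|-": math.log10((1.0 - post_c_neg) / prior_nc),
    }


# Rounded values quoted in the text for Scenarios A, B and D (assumption:
# small rounding differences from the published figures are expected).
SENSITIVITY, SPECIFICITY = 0.833, 0.844

for label, prior_c in [("B", 0.05), ("A", 0.36), ("D", 0.85)]:
    info = information_contents(SENSITIVITY, SPECIFICITY, prior_c)
    log_l_pos = info["c|+"] - info["nc|+"]  # left-hand side of Equation (13)
    log_l_neg = info["c|-"] - info["nc|-"]  # left-hand side of Equation (14)
    print(f"{label}: c|+ {info['c|+']:.3f}, nc|+ {info['nc|+']:.3f}, "
          f"log10 L+ {log_l_pos:.3f}, log10 L- {log_l_neg:.3f}")
```

The individual information contents change with the prior, while the two differences stay constant across the three scenarios, which is why all three lie on the same pair of contours.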

**Figure 5.** The "north-west" region of the figure is characterized by Equation (13), so relates to + predictions (which are correct for *c* subjects and incorrect for *nc* subjects). $\log_{10} L_+$ contours are always straight lines with slope = 1. The solid red line indicates the contour for $\log_{10}\hat{L}_+ = 0.727$ Hart, corresponding to Scenarios A, B, and D (Table 2). A correct + prediction has a large information content when $\hat{p}_c$ is small (B), and a small information content when $\hat{p}_c$ is large (D) (the arrow indicates the direction of increasing $\hat{p}_c$ along the contour). As the information content $\log_{10}[\hat{p}_{c|+}/\hat{p}_c]$ (on the vertical axis) becomes decreasingly positive, the information content $\log_{10}[\hat{p}_{nc|+}/\hat{p}_{nc}]$ (on the horizontal axis) becomes increasingly negative. The "south-east" region of the figure is characterized by Equation (14), so relates to − predictions (which are correct for *nc* subjects and incorrect for *c* subjects). $\log_{10} L_-$ contours are always straight lines with slope = 1. The dashed red line indicates the contour for $\log_{10}\hat{L}_- = -0.704$ Hart, corresponding to Scenarios A, B, and D (Table 2). A correct − prediction has a large information content when $\hat{p}_{nc}$ is small (D), and a small information content when $\hat{p}_{nc}$ is large (B) (the arrow indicates the direction of increasing $\hat{p}_{nc}$ along the contour, $\hat{p}_{nc} = 1 - \hat{p}_c$). As the information content $\log_{10}[\hat{p}_{nc|-}/\hat{p}_{nc}]$ (on the horizontal axis) becomes decreasingly positive, the information content $\log_{10}[\hat{p}_{c|-}/\hat{p}_c]$ (on the vertical axis) becomes increasingly negative.

#### *3.3. A New Diagrammatic Format*

Biggerstaff's [10] diagrammatic format for binary predictors allows an information theoretic interpretation once the data on prior probabilities have been incorporated. This distinguishes predictors with the same likelihood ratios analytically, but not visually. Johnson's [13] transformed version of Biggerstaff's diagrammatic format also allows an information theoretic interpretation once data on prior probabilities are incorporated. This approach distinguishes predictors with the same likelihood ratios both analytically and visually, but does not contribute to the comparison and evaluation of predictive values of disease forecasters.

We now return to the information theoretic interpretation of Biggerstaff's likelihood ratios graph (and revert to working in natural logarithms for continuity with previous analysis based on Figure 3). In Figure 3, the likelihood ratios are the slopes of the lines on the graphical plot. The lines themselves are relative entropy contours, the value of which depends on prior probability. We can now visually separate scenarios that have the same likelihood ratios but different relative entropies (e.g., A, B, D in Table 2) by calculating the graph with relative entropies $\hat{I}_+$ and $\hat{I}_-$ on the axes of the plot (Figure 6). If we consider the predictor based on Scenario A as the reference, then the predictor based on Scenario B falls in the region of Figure 6 indicating comparatively less information is provided by both + and − predictions, while the predictor based on Scenario D falls in the region indicating comparatively less diagnostic information is provided by + predictions but comparatively more by − predictions.

**Figure 6.** Scenario A: from the data in Table 2, we calculate relative entropies $\hat{I}_+ = 0.315$, $\hat{I}_- = 0.179$ (both in nats) ($\hat{p}_c = 0.36$) (Equations (3) and (4)). Similarly, for Scenario B we calculate $\hat{I}_+ = 0.171$, $\hat{I}_- = 0.024$ nats ($\hat{p}_c = 0.05$) and for Scenario D, $\hat{I}_+ = 0.076$, $\hat{I}_- = 0.289$ nats ($\hat{p}_c = 0.85$).
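The relative entropies plotted in Figure 6 can be reproduced from *sensitivity*, *specificity* and the prior alone. The sketch below assumes the standard definition of relative entropy as the divergence of the posterior distribution from the prior, in nats, and uses the rounded values 0.833 and 0.844 quoted in the text; the function name `relative_entropies` is hypothetical, and the calculation assumes 0 < prior < 1 with *sensitivity* and *specificity* strictly between 0 and 1.

```python
import math


def relative_entropies(sensitivity, specificity, prior_c):
    """Return (I_plus, I_minus) in nats for a binary predictor."""
    prior_nc = 1.0 - prior_c
    p_pos = sensitivity * prior_c + (1.0 - specificity) * prior_nc
    p_neg = 1.0 - p_pos

    def divergence(post_c):
        # Relative entropy of the posterior (post_c, 1 - post_c) from the prior.
        post_nc = 1.0 - post_c
        return (post_c * math.log(post_c / prior_c)
                + post_nc * math.log(post_nc / prior_nc))

    i_plus = divergence(sensitivity * prior_c / p_pos)
    i_minus = divergence((1.0 - sensitivity) * prior_c / p_neg)
    return i_plus, i_minus


for label, prior_c in [("A", 0.36), ("B", 0.05), ("D", 0.85)]:
    i_plus, i_minus = relative_entropies(0.833, 0.844, prior_c)
    print(f"Scenario {label}: I+ = {i_plus:.3f} nats, I- = {i_minus:.3f} nats")
```

Because the inputs are rounded, the printed values may differ from the caption in the third decimal place.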

There is an alternative view of the diagrammatic format presented in Figure 6. Scenarios A, B and D all have the same likelihood ratios, $\hat{L}_+ = 5.333$ and $\hat{L}_- = 0.198$ (see Figure 3). What differs between scenarios is the prior probability $\hat{p}_c$. We can remove the gridlines indicating the relative entropies for Scenario A and plot the underlying prior probability contour (Figure 7). In Figure 7, starting at the origin and moving clockwise, prior probability increases as we move along the contour. The contour has maximum points with respect to both the horizontal axis and the vertical axis. The maximum value of the contour with respect to the horizontal axis is:

$$\hat{p}_c = \frac{\hat{p}_{+|nc} \cdot \left[\hat{p}_{+|c} \cdot \left(\ln\left[\frac{\hat{p}_{+|c}}{\hat{p}_{+|nc}}\right] - 1\right) + \hat{p}_{+|nc}\right]}{\left[\hat{p}_{+|c} - \hat{p}_{+|nc}\right]^2} \tag{15}$$

and the maximum value of the contour with respect to the vertical axis is:

$$\hat{p}_c = \frac{\hat{p}_{-|nc} \cdot \left[\hat{p}_{-|c} \cdot \left(\ln\left[\frac{\hat{p}_{-|c}}{\hat{p}_{-|nc}}\right] - 1\right) + \hat{p}_{-|nc}\right]}{\left[\hat{p}_{-|c} - \hat{p}_{-|nc}\right]^2}. \tag{16}$$

The corresponding values of $\hat{I}_+$ and $\hat{I}_-$, respectively, can then be calculated by substitution into Equations (7) and (8). The two maxima (together with the origin) divide the prior probability contour into three monotone segments (see Figure 7). As $\hat{p}_c$ increases, we observe a segment where $\hat{I}_+$ and $\hat{I}_-$ are both increasing (this includes Scenario B), then one where $\hat{I}_+$ is decreasing and $\hat{I}_-$ is increasing (this includes Scenario A), and then one where $\hat{I}_+$ and $\hat{I}_-$ are both decreasing (this includes Scenario D).
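Equations (15) and (16) share one functional form, with the conditional probabilities of a + prediction replaced by those of a − prediction. The following sketch evaluates that form for the rounded *sensitivity* = 0.833 and *specificity* = 0.844 and then evaluates the relative entropy at each maximizing prior. The function names are illustrative, and the relative entropy is written here directly as the divergence of posterior from prior rather than by reference to the article's numbered equations.

```python
import math


def prior_at_maximum(p_pred_given_c, p_pred_given_nc):
    """Prior p_c at which the relative entropy of one prediction type peaks.

    Common form of Equations (15) and (16):
    p_c = b * (a * (ln(a / b) - 1) + b) / (a - b)**2,
    with a = P(prediction | c), b = P(prediction | nc) and a != b.
    """
    a, b = p_pred_given_c, p_pred_given_nc
    return b * (a * (math.log(a / b) - 1.0) + b) / (a - b) ** 2


def relative_entropy(p_pred_given_c, p_pred_given_nc, prior_c):
    """Relative entropy (nats) of the posterior from the prior for one prediction."""
    prior_nc = 1.0 - prior_c
    p_pred = p_pred_given_c * prior_c + p_pred_given_nc * prior_nc
    post_c = p_pred_given_c * prior_c / p_pred
    post_nc = 1.0 - post_c
    return (post_c * math.log(post_c / prior_c)
            + post_nc * math.log(post_nc / prior_nc))


SENSITIVITY, SPECIFICITY = 0.833, 0.844  # rounded values quoted in the text

# + predictions: a = sensitivity, b = 1 - specificity (Equation (15)).
prior_max_plus = prior_at_maximum(SENSITIVITY, 1.0 - SPECIFICITY)
# - predictions: a = 1 - sensitivity, b = specificity (Equation (16)).
prior_max_minus = prior_at_maximum(1.0 - SENSITIVITY, SPECIFICITY)

i_plus_max = relative_entropy(SENSITIVITY, 1.0 - SPECIFICITY, prior_max_plus)
i_minus_max = relative_entropy(1.0 - SENSITIVITY, SPECIFICITY, prior_max_minus)

print(f"max I+ = {i_plus_max:.3f} nats at prior {prior_max_plus:.3f}")
print(f"max I- = {i_minus_max:.3f} nats at prior {prior_max_minus:.3f}")
```

Evaluating `relative_entropy` slightly to either side of each returned prior gives a quick check that the point is a maximum.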

From Figure 7, we see that for the predictor based on Scenarios A, B and D, a + prediction provides most diagnostic information around prior probability 0.2 < $\hat{p}_c$ < 0.3. A − prediction provides most diagnostic information around prior probability 0.7 < $\hat{p}_c$ < 0.8. Recall that this contour describes performance (in terms of diagnostic information provided) for predictors with *sensitivity* = 0.833 and *specificity* = 0.844 (Table 2) (alternatively expressed as likelihood ratios $\hat{L}_+ = 5.333$ and $\hat{L}_- = 0.198$). No additional data beyond *sensitivity* and *specificity* are required in order to produce this graphical plot; that is to say, by considering the whole range of prior probability we remove the requirement for any particular values. The point where the contour intersects the main diagonal of the plot is where $\hat{I}_+ = \hat{I}_-$. In this case, we find that $\hat{I}_+ = \hat{I}_-$ at prior probability ≈ 0.5 (Figure 7). At lower prior probabilities, + predictions provide more diagnostic information than − predictions, while at higher prior probabilities, the converse is the case. This contour's balance of relative entropies at prior probability ≈ 0.5 is noteworthy because it is not necessarily the case that there is always scope for such balance.

**Figure 7.** The prior probability $\hat{p}_c$ contour for Scenarios A, B, and D (solid red line). The contour is calibrated at 0.1 intervals of $\hat{p}_c$, clockwise from the origin, 0.1 to 0.9 (+ symbol on curve). Scenarios B ($\hat{p}_c$ = 0.05), A ($\hat{p}_c$ = 0.36), and D ($\hat{p}_c$ = 0.85) as characterized in Table 2 are indicated (-). Also indicated on the prior probability contour: maximum $\hat{I}_+ = 0.337$ nats () ($\hat{p}_c = 0.245$), maximum $\hat{I}_- = 0.317$ nats () ($\hat{p}_c = 0.749$), $\hat{I}_+ = \hat{I}_- = 0.251$ nats (•) ($\hat{p}_c = 0.513$).
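The balance point where the contour crosses the main diagonal has no closed form given in the text, but it can be located numerically. The sketch below uses plain bisection on the difference between the two relative entropies; the bracket [0.01, 0.99], the function names, and the assumption of at most one sign change inside the bracket are illustrative choices rather than part of the original analysis.

```python
import math


def relative_entropies(sensitivity, specificity, prior_c):
    """Return (I_plus, I_minus) in nats, for 0 < prior_c < 1."""
    prior_nc = 1.0 - prior_c
    p_pos = sensitivity * prior_c + (1.0 - specificity) * prior_nc
    p_neg = 1.0 - p_pos

    def divergence(post_c):
        post_nc = 1.0 - post_c
        return (post_c * math.log(post_c / prior_c)
                + post_nc * math.log(post_nc / prior_nc))

    return (divergence(sensitivity * prior_c / p_pos),
            divergence((1.0 - sensitivity) * prior_c / p_neg))


def balance_prior(sensitivity, specificity, lo=0.01, hi=0.99, tol=1e-9):
    """Bisection for the prior at which I_plus equals I_minus.

    Raises ValueError when I_plus - I_minus has the same sign at both ends
    of the bracket, i.e. when no balance point is detected.
    """
    def gap(prior_c):
        i_plus, i_minus = relative_entropies(sensitivity, specificity, prior_c)
        return i_plus - i_minus

    if gap(lo) * gap(hi) > 0:
        raise ValueError("no sign change on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid  # the sign change lies in the lower half
        else:
            lo = mid  # the sign change lies in the upper half
    return 0.5 * (lo + hi)


prior_star = balance_prior(0.833, 0.844)
i_plus_star, i_minus_star = relative_entropies(0.833, 0.844, prior_star)
print(f"I+ = I- = {i_plus_star:.3f} nats at prior {prior_star:.3f}")
```

With a predictor whose contour never meets the diagonal away from the origin, the function raises `ValueError` instead of returning a prior, which mirrors the remark that such balance is not always available.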

Recall from Section 3.1 that we start with disease prevalence as an estimate of the prior probability $\hat{p}_c$ of need for a crop protection intervention. The information required (from a predictor) for certainty is then $\log(1/\hat{p}_c)$ denominated in the appropriate information units. This is the amount of information that would result in a posterior probability of need for an intervention equal to one. Similarly, $\log(1/\hat{p}_{nc})$, denominated in the appropriate information units, is the amount of information that would result in a posterior probability of no need for an intervention equal to one. We can plot the contour for these information contents on the diagrammatic format of Figure 7. This contour, illustrated in Figure 8, indicates the upper limit for the performance of any binary predictor. No phytopathological data are required to calculate this contour.
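Since the upper-limit contour depends only on the prior, it can be tabulated in a few lines. This is a minimal sketch working in nats (natural logarithms), with the function name `certainty_information` chosen for illustration.

```python
import math


def certainty_information(prior_c):
    """Information (nats) needed for certainty of c and of nc, for 0 < prior_c < 1."""
    prior_nc = 1.0 - prior_c
    return math.log(1.0 / prior_c), math.log(1.0 / prior_nc)


# Tabulate the contour at the same 0.1 calibration intervals used in the figures.
for k in range(1, 10):
    prior_c = k / 10
    limit_c, limit_nc = certainty_information(prior_c)
    print(f"prior {prior_c:.1f}: ln(1/p_c) = {limit_c:.3f}, ln(1/p_nc) = {limit_nc:.3f}")
```

At a prior of 0.5 the two limits coincide at ln(2), and each grows without bound as its own prior probability approaches zero.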

The diagrammatic format of Figure 7 (for Scenarios A, B and D) can accommodate prior probability contours for other Scenarios (i.e., for predictors based on different *sensitivity* and *specificity* values). For example, Figure 9 shows, in addition, the prior probability contours for the predictors based on Scenario C (with *sensitivity* = 0.39 and *specificity* = 0.99) and on Scenario E (with *sensitivity* = 0.944 and *specificity* = 0.656). We observe that a predictor based on Scenario C's *sensitivity* and *specificity* values potentially provides a large amount of diagnostic information from a + prediction, but over a very narrow range of prior probabilities. Scenario C itself represents one such predictor. The amount of diagnostic information from − predictions is very low over the whole range of prior probabilities. A predictor based on Scenario E's *sensitivity* and *specificity* values potentially provides a large amount of diagnostic information from − predictions over a narrow range of prior probabilities. Scenario E itself represents one such predictor. The amount of diagnostic information from + predictions remains low over the whole range of prior probabilities.
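To compare contours for different predictors, one option is simply to sweep the prior over a fine grid and record both relative entropies at each step. The sketch below does this for the three *sensitivity* and *specificity* pairs quoted in the text; the grid resolution of 0.001 and the function names are assumptions made for illustration, and the grid maxima are only approximations to the analytic maxima.

```python
import math


def relative_entropies(sensitivity, specificity, prior_c):
    """Return (I_plus, I_minus) in nats, for 0 < prior_c < 1."""
    prior_nc = 1.0 - prior_c
    p_pos = sensitivity * prior_c + (1.0 - specificity) * prior_nc
    p_neg = 1.0 - p_pos

    def divergence(post_c):
        post_nc = 1.0 - post_c
        return (post_c * math.log(post_c / prior_c)
                + post_nc * math.log(post_nc / prior_nc))

    return (divergence(sensitivity * prior_c / p_pos),
            divergence((1.0 - sensitivity) * prior_c / p_neg))


def contour(sensitivity, specificity, steps=1000):
    """List of (prior_c, I_plus, I_minus) for prior_c = 1/steps, ..., (steps-1)/steps."""
    points = []
    for k in range(1, steps):
        prior_c = k / steps
        points.append((prior_c, *relative_entropies(sensitivity, specificity, prior_c)))
    return points


# Rounded sensitivity and specificity values quoted in the text.
SCENARIOS = {"A/B/D": (0.833, 0.844), "C": (0.39, 0.99), "E": (0.944, 0.656)}

for name, (sens, spec) in SCENARIOS.items():
    points = contour(sens, spec)
    best_plus = max(points, key=lambda pt: pt[1])    # grid point with largest I+
    best_minus = max(points, key=lambda pt: pt[2])   # grid point with largest I-
    print(f"{name}: max I+ = {best_plus[1]:.3f} at prior {best_plus[0]:.3f}; "
          f"max I- = {best_minus[2]:.3f} at prior {best_minus[0]:.3f}")
```

The list returned by `contour` can be passed directly to a plotting library, with the two relative entropies on the axes, to draw curves of the kind described for Figure 9.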

**Figure 8.** The dashed curve is the prior probability $\hat{p}_c$ contour showing the upper limit for performance of any binary predictor. The contour is calibrated at 0.1 intervals of $\hat{p}_c$ from upper left to lower right, 0.1 to 0.9 (+ symbol on curve). The maximum relative entropy for a + test result increases indefinitely as $\hat{p}_c$ approaches 0 while the maximum relative entropy for a − test result increases indefinitely as $\hat{p}_c$ approaches 1. The prior probability contour for Scenarios A, B, and D from Figure 7 (solid red line) is also shown, for reference (note the rescaled axes).

**Figure 9.** The prior probability contours for Scenarios C (solid green line) and E (solid blue line). Starting at the origin, the green prior probability contour passes through points (clockwise from origin): Scenario C, $\hat{I}_+ = 1.399$, $\hat{I}_- = 0.004$ (prior = 0.05) (-); maximum $\hat{I}_+ = 1.436$ (prior = 0.073) (); maximum $\hat{I}_- = 0.029$ (prior = 0.580) (). This contour does not coincide with the main diagonal of the plot other than at the origin. Starting at the origin, the blue prior probability contour passes through points (clockwise from origin): $\hat{I}_+ = \hat{I}_- = 0.080$ (•) (prior = 0.109); maximum $\hat{I}_+ = 0.126$ (prior = 0.337) (); Scenario E, $\hat{I}_+ = 0.039$, $\hat{I}_- = 0.700$ (prior = 0.850) (-); maximum $\hat{I}_- = 0.701$ (prior = 0.842) (point obscured from view). The prior probability contour for Scenarios A, B, and D (solid red line) is included here for reference; clockwise from origin, points marked indicate Scenarios B, A and D (see Figure 7 for details). The dashed curve shows the contour indicating the upper limit for performance of a binary predictor (see Figure 8 for details). Note the changes in the scales on the axes compared with Figures 7 and 8.

#### **4. Discussion**

Diagrammatic formats have the potential to aid interpretation in the evaluation and comparison of disease forecasts. Biggerstaff's [10] likelihood ratios graph is a particularly interesting example. This graph uses the format of the ROC curve, as widely applied in exhibiting and explaining *sensitivity* and *specificity* for binary tests. However, while *sensitivity* and *specificity* are defined conditionally on actual disease status, the likelihood ratios graph is used to compare tests on the basis of predictive values, defined conditionally on the forecast (when tests are applied at the same prior probability). As Biggerstaff notes, one is less interested in *sensitivity* and *specificity* when it comes to the application of a test, because the conditionality is in the wrong order. The predictive values, or some functions of them, are also important, and ideally one would be able to use these when assessing test performance in application (Figures 1 and 2).

Altman and Royston [15] discussed this idea in some detail and proposed PSEP as a metric for use in the assessment of predictor performance (in the binary case, PSEP = *positive predictive value* + *negative predictive value* – 1). Hughes and Burnett [16] later used an information theoretic analysis (including a diagrammatic representation) to show how PSEP was related to both the *Brier score* [17] and the information theoretic *divergence score* [18] methods of assessing predictor performance. In the current article, further analysis shows that Biggerstaff's likelihood ratios graph has underlying information theoretic properties that specifically relate to predictive values. The lines on the likelihood ratios graph are relative entropy contours, quantifying the expected information consequent on revising the prior probability of disease to the posterior probability after obtaining a forecast. However, the likelihood ratios graph does not visually distinguish relative entropy contours when predictors that have the same ROC curve summary statistics (sensitivities and specificities, or equivalently, likelihood ratios for both + and − predictions) are compared at different prior probabilities (Figure 3). A modified diagrammatic format that does so would therefore be of interest.
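As a small worked illustration of the PSEP definition quoted above, the sketch below computes the two predictive values by Bayes' rule and combines them. The inputs are the rounded Scenario A values used elsewhere in this article (*sensitivity* = 0.833, *specificity* = 0.844, prior = 0.36); the function names are hypothetical and the example is not taken from Altman and Royston's own analysis.

```python
def predictive_values(sensitivity, specificity, prior_c):
    """Return (positive predictive value, negative predictive value) via Bayes' rule."""
    prior_nc = 1.0 - prior_c
    true_pos = sensitivity * prior_c
    false_pos = (1.0 - specificity) * prior_nc
    true_neg = specificity * prior_nc
    false_neg = (1.0 - sensitivity) * prior_c
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


def psep(sensitivity, specificity, prior_c):
    """PSEP = positive predictive value + negative predictive value - 1 (binary case)."""
    ppv, npv = predictive_values(sensitivity, specificity, prior_c)
    return ppv + npv - 1.0


ppv_a, npv_a = predictive_values(0.833, 0.844, 0.36)
print(f"Scenario A: PPV = {ppv_a:.3f}, NPV = {npv_a:.3f}, "
      f"PSEP = {psep(0.833, 0.844, 0.36):.3f}")
```

Unlike *sensitivity* and *specificity*, all three quantities change when the prior changes, which is the dependence the relative entropy contours make visible.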

Johnson [13] provides a modified format, with log likelihood ratios on the axes of the graph (Figure 4), and suggests various possible advantages of this format. Our further analysis again shows that this modified format has underlying information theoretic properties. These properties relate to the statistical decomposition of log likelihood ratios (Figure 5; see also [5] for further discussion) but do not appear to be straightforwardly helpful as an aid to interpretation in the evaluation and comparison of disease forecasters based on predictive values.

Benish [19] applied information graphs for relative entropy to evaluate and compare clinical diagnostic tests. Here we derive relative entropies from Biggerstaff's likelihood ratios graph and present the results in a new diagrammatic format, with relative entropies for + and − predictions on the axes of the graph. Compared with the likelihood ratios graph, this visually distinguishes between predictors that have the same ROC curve summary statistics when compared at different (known) prior probabilities (Figure 6). So, referring to the scenarios listed in Table 2 with likelihood ratios $\hat{L}_+ = 5.333$ and $\hat{L}_- = 0.198$ (i.e., A, B, and D) we see that Scenario A has the highest relative entropy for a + prediction, then B, then D. Scenario D has the highest relative entropy for a − prediction, then A, then B. Recall that relative entropies are functions of the predictive values.

Suppose now that our aim is not to compare predictor performance in particular scenarios, but to evaluate performance over the range of possible scenarios. We can use our new format not just to plot relative entropies for a comparison of predictor performance for various known prior probability (disease prevalence) scenarios (Figure 6), but also to draw the contour showing how relative entropies change as prior probability of disease varies over the range from zero to one (Figure 7). This diagrammatic format requires no particular prior probabilities for calculation, only the ROC curve summary statistics. In the same way that the ROC curve relates to all predictors (by *sensitivity* and *specificity*) until a particular operational threshold is set, Figure 7 relates to all predictors (by relative entropies based on predictive values) until a particular prior probability value is specified. Maximum relative entropy points on the contour are calculable analytically in this format. Moreover, we can include the contours for predictors with different summary statistics. Figure 9 shows the contour that includes the predictor based on Scenario C and the contour that includes the predictor based on Scenario E, in addition to the contour that includes predictors based on Scenarios A, B and D from Figure 7. In this diagrammatic format, we can easily see the difference between contours that include predictors with high performance (in terms of relative entropies) in a narrow range of applicability (in terms of prior probabilities) when compared with a contour that balances predictor performance with a wider range of applicability. Unless we wish to evaluate and/or compare particular scenarios, in which case estimates of the corresponding prior probability (disease prevalence) values are, not unreasonably, required, producing the contour plot (Figures 7 and 9) has no data requirements beyond those for producing the ROC curve.

Figures 8 and 9 include the contour showing the upper limit for performance of a binary predictor. This upper limit serves as a qualitative visual calibration of predictor performance, rather in the way that we look at an ROC curve in relation to the upper left-hand corner of the ROC plot (where *sensitivity* and *specificity* are both equal to one). The contour cuts the main diagonal of the plot at prior probability $\hat{p}_c = 0.5$, when $\ln(1/\hat{p}_c) = \ln(2) = 0.693$ nats (Figure 8). This is the amount of information required to be certain of a binary outcome when the prior probability is equal to 0.5. However, the amount of information required to be certain of an outcome is not of any great practical significance in crop protection decision making. Rather than seeking certainty, a realistic objective is to develop predictors that provide enough information to enable better decisions, on average, than would be made with reliance only on prior probabilities. Thus we need to be able to consider predictor performance in terms of predictive values.

Perhaps the most important instrument available to the developer of a binary predictor is the placement of the threshold on the risk score scale [2,3,6,8]. This determines a predictor's *sensitivity* and *specificity*, and consequently the likelihood ratios for + and − predictions. However, this does not guarantee predictor performance in terms of predictive values. ROC curve analysis and diagrammatic formats that characterize predictive values (or functions of them) are therefore complementary aspects of predictor evaluation and comparison. For example, the appropriate placement of the threshold on the risk score scale may be informed by knowledge of disease prevalence for the scenario in which the predictor is intended for application. This in turn affords an evaluation of likely performance—in terms of predictive values—for the predictor in operation. Sometimes, however, we may wish to compare predictors' likely performances—perhaps in a novel scenario—when we are simply a potential user of the predictors in question, having had no development input but with access to the predictors' ROC curve summary statistics. In both settings, the diagrammatic formats we have discussed have potential application. They lead to information graphs that are visually distinct but analytically linked. All give extra insight via the predictive values of disease forecasts.

**Author Contributions:** Conceptualization, G.H., J.R. and N.M.; Formal analysis, G.H., J.R. and N.M.; Methodology, G.H., J.R. and N.M.; Writing–original draft, G.H.; Writing–review & editing, J.R. and N.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
