2.2.2. Prediction Quality

The prediction quality can be evaluated using the confidence intervals and error rates of the models. Table 3 shows that, for cross-validation, 97.6% (N = 4999) of the observed phoretic Varroa loads fell within their CI95% with model A, and 97.3% (N = 2328) with model B. These coverage rates are heterogeneous with respect to *Vpt*. They exceed the targeted values (95%, 70%, or 50%) when *Vpt* ≤ 3, match them when 3 < *Vpt* ≤ 10, and are significantly lower than them when *Vpt* > 10, a class that covers only about 5–10% of the hives. These results hold approximately for all cases considered (cross-validation and training validation; models A and B).
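As a minimal sketch (not the authors' code), the coverage rate reported in Table 3 can be computed per class of *Vpt*: each hive is assigned to one of the three classes used above (≤ 3, 3 to 10, > 10), and the share of hives whose observed *Vpt* falls inside its interval is counted within each class. The names `Interval`, `vpt_class`, and `coverage_by_class` are hypothetical, inclusive interval bounds are an assumption, and the toy values are illustrative only, not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical container for one hive's confidence interval of Vpt."""
    lower: float
    upper: float


def vpt_class(vpt: float) -> str:
    """Assign an observed Vpt (mites per 100 bees) to one of the three classes."""
    if vpt <= 3:
        return "Vpt <= 3"
    if vpt <= 10:
        return "3 < Vpt <= 10"
    return "Vpt > 10"


def coverage_by_class(observed, intervals):
    """Return {class: (coverage rate, number of hives)}.

    The coverage rate is the share of hives whose observed Vpt lies inside
    its interval (bounds included, an assumption of this sketch).
    """
    if len(observed) != len(intervals):
        raise ValueError("one interval per observed hive is required")
    hits: dict[str, int] = {}
    counts: dict[str, int] = {}
    for vpt, ci in zip(observed, intervals):
        key = vpt_class(vpt)
        counts[key] = counts.get(key, 0) + 1
        if ci.lower <= vpt <= ci.upper:
            hits[key] = hits.get(key, 0) + 1
    return {key: (hits.get(key, 0) / n, n) for key, n in counts.items()}


# Toy values for illustration only (not data from the study).
observed = [0.5, 2.0, 4.0, 12.0]
intervals = [Interval(0.0, 1.0), Interval(2.5, 5.0), Interval(1.0, 6.0), Interval(2.0, 8.0)]
print(coverage_by_class(observed, intervals))
# {'Vpt <= 3': (0.5, 2), '3 < Vpt <= 10': (1.0, 1), 'Vpt > 10': (0.0, 1)}
```

Comparing each class's rate with the nominal level of the interval (0.95, 0.70, or 0.50) shows where intervals over- or under-cover, which is the pattern described for high *Vpt* values.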


**Table 3.** Coverage rates of the confidence intervals (CI95%, CI70%, and CI50%) of *Vpt* for both evaluation approaches (cross-validation and training validation) and for models A and B. The coverage rate is the proportion of hives for which the CI contains the observed value of *Vpt*. For each method and each model, the number of observed hives is reported for each class of *Vpt*.

Predicted quantiles were used as an indicator of the accuracy of the prediction targeted by the model, i.e., the proportion of hives to be treated against Varroa. Predicting values by simulation can be seen either as minimizing the risk of an incorrect prediction (the risk of unnecessarily increasing the number of hives to be treated) or as a way to target the correct value more accurately (the risk of overlooking a proportion of hives that should be treated but will not be). For model B, outputs are based on an April load (*Vpt−x*) of 0.7 phoretic Varroa mites per 100 bees, the average reported in [30], and on a threshold of 3 phoretic Varroa mites per 100 bees for the load at the beginning of summer (*Vpt*) [10]. For each colony, the model indicates whether or not to treat, i.e., whether *Vpt* is predicted to exceed the threshold of three mites. Figure 1 describes two extreme situations that correspond to two treatment strategies. The first strategy, represented by Q97.5 and Q85, is a no-risk situation: the model indicates that all colonies are to be treated, so no colony exceeding the threshold of three is left untreated. In these cases, input costs are high, and 73% of colonies are treated unnecessarily. The second strategy (Q50) attempts to justify not treating and estimates the corresponding risk: it supports leaving 71% of colonies untreated, at the risk of not treating the 24% of colonies that need treatment. This could be seen as the price to pay for engaging in a process of reducing inputs. Intermediate quantiles allow beekeepers to choose suitable indicators based on an explicit trade-off between these two risks. For example, for Q72 (or Q71.5), 27% of the observed colonies exceed the threshold of three; the model recommends treatment for 11% of colonies that needed it (10% for Q71.5) and for 17% that did not (16% for Q71.5). Thus, for Q72, roughly as many colonies were treated unnecessarily (17%, in orange) as were left untreated when treatment was necessary (16%, in red), and the reverse occurred for Q71.5 (Figure 1).
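The quantile-based decision rule described above can be sketched as follows, assuming that the model's predictive distribution for each hive is available as simulated draws of *Vpt*. A hive is flagged for treatment when the chosen quantile of its draws (e.g., Q97.5, Q72, Q50) exceeds three mites, and each hive is then placed in one of the four categories discussed in the text (treated when necessary, treated when not necessary, untreated when necessary, untreated when not necessary). The function names, the data layout, and the empirical-quantile rule (linear interpolation, the same rule as NumPy's default `linear` method) are assumptions of this sketch, not necessarily what the study used; the toy inputs are illustrative only.

```python
import math

THRESHOLD = 3.0  # phoretic mites per 100 bees at the beginning of summer [10]


def empirical_quantile(draws, level):
    """Empirical quantile with linear interpolation between order statistics."""
    xs = sorted(draws)
    pos = level * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def treatment_outcomes(simulated, observed, level, threshold=THRESHOLD):
    """Percentage of hives in each of the four decision/outcome categories.

    simulated: one list of simulated Vpt draws per hive (hypothetical model output)
    observed:  observed Vpt per hive
    level:     quantile level, e.g. 0.975, 0.85, 0.72 or 0.50
    A hive is treated when its predicted quantile exceeds the threshold;
    treatment is "needed" when its observed Vpt exceeds the threshold.
    """
    if len(simulated) != len(observed) or not observed:
        raise ValueError("one set of draws per observed hive is required")
    counts = {"treated_needed": 0, "treated_unneeded": 0,
              "untreated_needed": 0, "untreated_unneeded": 0}
    for draws, vpt in zip(simulated, observed):
        treat = empirical_quantile(draws, level) > threshold
        needed = vpt > threshold
        key = ("treated" if treat else "untreated") + ("_needed" if needed else "_unneeded")
        counts[key] += 1
    return {key: 100.0 * c / len(observed) for key, c in counts.items()}


# Toy inputs for illustration only (not data from the study).
simulated = [[1, 2, 3, 4, 5], [0, 1, 2, 3, 4], [2, 3, 4, 5, 6], [0, 0, 1, 1, 2]]
observed = [4.0, 2.0, 2.5, 3.5]
for level in (0.975, 0.5):
    print(level, treatment_outcomes(simulated, observed, level))
```

Lowering the quantile level moves hives from the "treated" categories to the "untreated" ones, which is the trade-off that the intermediate quantiles Q72 and Q71.5 are meant to balance.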

This figure is based partly on Table S4 of the Supplementary Materials; Tables S4 and S5 give all results for models A and B under both model evaluations (cross-validation and training validation). For both models, the smaller the quantile, the lower the global error rate. For larger quantiles (Q97.5 and Q85), the models predicted *Vpt* better when the phoretic Varroa load at *t* exceeded the threshold of three mites. Model predictions of *Vpt* were relatively good when the earlier phoretic Varroa load (at *t−x*) was at most three. However, the models failed to predict correctly when the load at *t−x* exceeded three for model A or 0.7 for model B.

**Figure 1.** Five scenarios with increasing risk (from left to right) taken by the beekeeper: not treating when treatment was necessary, or treating when it was unnecessary. The risk increases as the quantile Q decreases. For each level of risk, four cases are represented: (1) hives with vp\_t\_x (i.e., *Vp* at *t* = 0) ≤ 0.7 and vp\_t (i.e., *Vp* three months later) > 3; (2) hives with vp\_t\_x ≤ 0.7 and vp\_t ≤ 3; (3) hives with vp\_t\_x > 0.7 and vp\_t > 3; (4) hives with vp\_t\_x > 0.7 and vp\_t ≤ 3. The first and third cases are the hives that need treatment. The percentages of these four categories are provided for each level of risk.
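Tables S4 and S5 are not reproduced here, so the definition of the global error rate used in this sketch is an assumption: the proportion of hives whose treat/no-treat decision disagrees with the observed outcome, i.e., hives treated unnecessarily plus hives left untreated when treatment was needed. The hypothetical helpers `global_error_rate` and `error_rates_by_level` compute it for several quantile levels so that the levels can be compared, as in the discussion of models A and B above; the toy predictions are illustrative only.

```python
THRESHOLD = 3.0  # phoretic mites per 100 bees


def global_error_rate(predicted, observed, threshold=THRESHOLD):
    """Share of hives whose treat / no-treat decision disagrees with the observed outcome.

    predicted: predicted Vpt quantile per hive (the hive is treated if it exceeds the threshold)
    observed:  observed Vpt per hive (treatment is needed if it exceeds the threshold)
    Errors = hives treated unnecessarily + hives left untreated when treatment was needed.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("one prediction per observed hive is required")
    errors = sum((p > threshold) != (v > threshold) for p, v in zip(predicted, observed))
    return errors / len(observed)


def error_rates_by_level(predicted_by_level, observed):
    """Global error rate for each quantile level, from the largest level to the smallest."""
    return {level: global_error_rate(pred, observed)
            for level, pred in sorted(predicted_by_level.items(), reverse=True)}


# Toy inputs for illustration only (not data from the study).
observed = [4.0, 1.0, 2.0, 2.5]
predicted_by_level = {
    0.975: [5.0, 3.5, 4.0, 3.2],  # every hive flagged for treatment
    0.50: [2.8, 0.9, 1.5, 3.1],
}
print(error_rates_by_level(predicted_by_level, observed))  # {0.975: 0.75, 0.5: 0.5}
```

Because this error rate weights both kinds of mistakes equally, it can favor smaller quantiles when most colonies stay below the threshold; a beekeeper who considers untreated infested colonies more costly would weight the two error types differently.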
