*4.1. Data Sampling*

Data were collected from 310 colonies from 2014 to 2016 in three regions of France (PACA, AURA, and Occitanie; "dataset1") and from 720 colonies in 2018 in three regions of France (PACA, Nouvelle Aquitaine, and Centre; "dataset2"). Most of the colonies were kept on 10-frame Dadant hives and contained hybrid *Apis mellifera* L. queens. Colonies belonged to commercial beekeepers and thus displayed different sizes, dynamics, and management styles, which allowed us to take into account the variability which exists between beekeepers and apiaries. No treatment against Varroa was applied during the sampling periods.

At each sampling point, the amount of capped brood (noted *Cb*) was determined according to the ColEval method [37], and the phoretic mite load was estimated by sam-

pling around three hundred bees (or 45 g) from a frame containing an uncapped brood. Sampled bees were washed with a detergent solution and the number of Varroa mites retrieved (noted *Vp*) was counted [38]. Finally, to take into account seasonality, a "date" variable (noted *D*) was also created in which days were reported on a perpetual calendar with day 1 starting on 15 March of each year. This variable described the number of days ran from an initial time, which corresponds to the beginning of the measurable increase in the Varroa population after wintering. In our case, it corresponded approximately to the middle of March.

Sampling points were repeated at 30-day intervals, except for apiaries R16 to R18 ("dataset1"), in which measurements were sometimes performed every 12 days to mimic the generation interval of capped broods.

#### *4.2. Statistical Methods*

### 4.2.1. Distribution Adjustment on "dataset1"

All statistics were performed using the statistical software R version 3.3.0 [39]. Estimation of model parameters was carried out using the "gamlss" function of the eponymous package (Rigby and Stasinopoulos, 2005). The response variable (number of observed Varroa mites per 100 bees) was modeled with a generalized additive model for location, scale, and shape (GAMLSS). GAMLSS is an extension of the generalized linear model and the generalized additive model. It is a distribution-based approach to semiparametric regression models, in which all the parameters of the assumed distribution for the response can be modeled as additive functions of the explanatory variables, such as the *location* (e.g., *mean* μ), the *scale* (e.g., variance σ2), the *shape* (skewness and kurtosis), and some *inflation* (e.g., at zero, ν). Moreover, we chose to use GAMLSS because it offers numerous choices for the distribution of the response variable and is suitable for time series data (Rigby and Stasinopoulos, 2001). GAMLSS was fitted to data using maximum (penalized) likelihood estimation implemented with the RS algorithm, which does not require accurate starting values for μ, σ, and ν to ensure convergence in comparison with the CG algorithm [40,41]. The most parsimonious model with the lowest corrected Akaike's information criterion (AICc) [42], was selected; models with differences in AICc values lower than or equal to two were considered to be equivalent. We chose this selection criterion because, it is the most suitable criterion to model selection in predictive models for ecology and time series applications including forecasting [43]. Thus, it allows for the selection of the model that will best predict the response variable, i.e., the model with the best predictive accuracy.

Variables, which were described above, were transformed as follows to comply with the scaling conditions during model fitting:

$$\mathbb{C}b = \frac{\mathbb{C}b \mathbb{O}}{100} \tag{1}$$

$$Vp = \frac{Vp0 \ast 100 \ast 0.14}{sw} \tag{2}$$

$$Vb = \log\left(\frac{Vp}{Cb + 130} \* 100 + 1\right) \* 50\tag{3}$$

where *Cb* (Equation (1)) is a scaled value of the number of capped brood cells *Cb*0; *Vp* (Equation (2)) is the normalized rational number of Varroa mites for 100 honey bees (called "phoretic Varroa" in the present study), knowing that the weight per bee is 0.14 g, and *sw* in Equation (2) is the sampling weight of bees; *Vb* is a variable called "varbrood", built to take into account the role of the amount of brood in the regulation of Varroa reproduction, and, more specifically, to integrate the fact that the more spread out the capped brood, the harder it is to capture phoretic Varroa mites hidden in the capped brood. The varbrood variable was thus obtained by taking the Neperian logarithm of the number of phoretic Varroa and dividing it by the number of capped brood cells. In Equation (3), 130 corresponds to the *Cb* median, 100 and 50 multipliers are necessary for the scale, and +1 is used to

avoid obtaining log(0). These three quantitative variables were mathematically reduced to the same scale, in order to be able to compare their respective weights during model adjustment. The date (measured as a number of days after the first measurement) was used without transformation.

The rational number of phoretic Varroa mites present at *t* (*Vpt)* was modeled in the GAMLSS framework by a zero-inflated beta distribution with mean μ, standard deviation σ, and inflation at zero ν. Different specifications for μ, σ, and ν were used (see Results section). Our models were designed to predict *Vpt* from explanatory variables typically collected at time *t*−*x*. Two horizons of prediction *x* were considered: a short-term horizon (*x* = 1 month, noted model A hereafter) and a long-term horizon (*x* = 3 months, noted model B hereafter). For *x* = 1 (model A), all data were used to fit the models (867 observations), whereas for *x* = 3 (model B), all the data providing this interval were used to avoid the use of time-overlapping pairs of observations (93 observations). Phoretic Varroa numbers, capped brood cell numbers, and varbrood present at *t*−*x*, as well as the date at *t*, were exploited as fixed factors; they are denoted by *Vp*t−x, *Cb*t−x, *Vb*t−x, and *D*t, respectively. Moreover, an «apiary» factor (noted *Ap*) was used as a random factor and includes the variability of the apiary, beekeeping management strategy, and year and region effects.
