*2.3. Statistical Analysis*

The statistical software R [22] and the package ggplot2 [23] were used for data analysis and the construction of graphics and plots. The code is available on GitHub (https://github.com/ HannesOberreiter/coloss\_honey\_bee\_colony\_losses\_austria and version 1.0 is archived [24]) under an open source MIT license. Calculation and estimation of loss rates and corresponding confidence intervals (CI) were computed with a quasibinominal generalized linear model (GZLM) link "logit" function [15]. These calculated estimates are plotted as boxed error bars, where the box represents the 95% CI.

The majority of analyses was done as a single factor with two possible different groups, i.e., yes and no questions. To identify significant differences, the confidence intervals between the variables (factors) were compared. If the confidence intervals did not overlap, we counted that as a significant difference. If the overlap was on a small margin and it was a comparison between two groups, the null deviance minus the residual deviance in the model was tested. If it significantly differed from zero, the analyzed factor in the model had a significant influence on colony survival (ANOVA with *χ*2 deviance, *p* < 0.05).

The analysis of young queens was only performed for participants with valid answers for the number of colonies and young queens at the onset of winter. The percentage was calculated with declared colonies and divided by young queens. The total young queen population was calculated with these valid answers by summarising all colonies and by dividing them by the total number of young queens going into the winter.

For the analysis of the combination of different varroa control methods, a usage histogram for each method was generated and grouped into spring, summer, and winter. After that, a vector of all applied methods with at least 15 answers and without drone brood removal was generated to minimize calculation time and to generate statistically relevant results. To calculate all possible combinations inside the given vector, the R function combn [25] was used to generate a matrix with up to a maximum of three different methods per row. Then, each row with less than 15 participants for the particular combination was dropped before calculating the loss rate as described before.

The plotted maps were made with R [22] and shapefiles (https://github.com/ginseng666/ GeoJSON-TopoJSON-Austria by Floo Perlot, latest commit 2017-11-06) under creative common licence. Aggregation of apiaries on the map was conducted with k-means cluster search method. To improve the accuracy in areas with lower density, the resulting clusters, which included only a single apiary or a low number of apiaries and high within cluster sum of squares, were removed and the original geolocation of the affected apiaries were used for plotting.
