Very generally, large environmental effects lower repeatability and limit the size of correlations. Behavior and the expression of resistance traits, but also Varroa reproductive success [15,16] and the drifting of mite-infested bees between colonies [17,18], are strongly influenced by environmental factors. The data analyzed in this study were collected at 24 apiaries in three countries over 4 years, which introduced large environmental differences. The differences between countries, years and apiaries were removed as described in Table 2, but the remaining environmental variation is still considerable and should be taken into account when interpreting low correlations.
4.1. Repeatability
The measurements of mite infestation on bees and of the behavioral traits were repeated so that their repeatability could be studied. As the behavioral traits can be assumed to reflect a constant genetic disposition, their repeatability provides information on the impact of environmental conditions and on the technical robustness of the measurement. Mite measurements, however, monitor a dynamic system: in addition to factors influencing the measurements themselves, divergent development paths decrease the correlation between repeated readings, so consecutive measurements can be expected to correlate more strongly with each other than measurements lying further apart in time. Arguably, mite infestations might also affect the behavioral traits, which would in turn influence the repeatability of their measurements.
The mite infestation measurements BINF1-3 were highly repeatable, with correlations ranging from above 0.50 to 0.85. This is in general agreement with, although slightly lower than, the high repeatability of BINF (0.85) reported by Büchler et al. [4]. For breeding colonies registered in BeeBreed with at least three mite infestation measurements, the correlation was 0.58 ± 0.01 for BINF1–BINF2, 0.47 ± 0.01 for BINF1–BINF3 and 0.45 ± 0.01 for BINF2–BINF3 (Hoppe, unpublished data 2020). Thus, the repeatability of BINF in BeeBreed data is similar to that in our study, although the standards of data acquisition are lower in BeeBreed; e.g., breeders can freely choose the frequency and timing of measurements.
Correlations between mite infestation readings late in the season (BINF4 and BINF5 in Austria) were considerably lower, with coefficients of 0.1 to 0.2. This may indicate that the factors controlling mite reproduction in summer, the period of strongest brood activity, differ from the factors affecting mite infestation later in the year, when brood activity is lower. One important factor is the increasing risk of mite transmission between colonies over the course of the season [19]. However, this interpretation should be treated with caution because it is based on few colonies.
As expected, consecutive BINF measurements are more highly correlated than nonadjacent measurements, with few exceptions. One exception is BINF4 in the Austrian data, where again the small number of observations must be considered. The other exception is BINF3 in Croatia; here, the correlations are much higher than in the other countries, and the variance of the measurement itself may be more relevant.
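The expectation that consecutive readings of a dynamic system correlate more strongly than distant ones can be illustrated with a toy model. The sketch below is an assumption for illustration only, not the model used in this study: each colony's log infestation starts from a colony-specific level and then accumulates independent, normally distributed disturbances between readings (a random walk). The number of colonies, the unit variances and the seed are arbitrary.

```python
import numpy as np

def simulate_log_infestation(n_colonies=5000, n_readings=3, seed=1):
    """Toy random-walk model (hypothetical): row t holds reading t + 1
    of the log mite infestation for every colony."""
    rng = np.random.default_rng(seed)
    start = rng.normal(0.0, 1.0, n_colonies)                  # colony-specific initial level
    steps = rng.normal(0.0, 1.0, (n_readings, n_colonies))    # independent disturbance per interval
    return start + np.cumsum(steps, axis=0)                   # accumulate disturbances over time

readings = simulate_log_infestation()
r = np.corrcoef(readings)   # rows = readings (variables), columns = colonies (observations)
r12, r13, r23 = r[0, 1], r[0, 2], r[1, 2]
print(f"r(1,2)={r12:.2f}  r(1,3)={r13:.2f}  r(2,3)={r23:.2f}")
```

Analytically, this toy model gives correlations of about 0.82, 0.71 and 0.87 for readings 1–2, 1–3 and 2–3, so the nonadjacent pair correlates least; in real data, differences in growth rate and measurement error would modify this pattern.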
For the repeatability of brood hygiene measured with the pin test, we found a large difference depending on which cells were counted. Counting only fully cleared cells (PINem), the correlation was between 0.1 and 0.3, while counting all cells that were at least opened (PINop), the correlations were considerably higher (between 0.3 and 0.4). We interpret this as a higher reliability of PINop, and these findings have prompted the German breeder association “Arbeitsgemeinschaft Toleranzzucht” (AGT) to replace PINem with PINop in its test protocol [20]. The repeatabilities of different methods were also compared by Hoffmann [21]. PINem had the highest repeatability, with 0.54, while the repeatability for cells with artificial mite infestations was 0.38, and the repeatability for freeze-killed brood just 0.10. In a continuation study, a similar repeatability of 0.55 was found in a genetically diverse Carnica population of 69 colonies [21]. Boecking et al. [22] reported a repeatability of 0.46.
Interestingly, the repeatability of PINem decreased over time, which may reflect the effect of selection for hygienic behavior: the time needed to remove 50% of the treated cells decreased on average from about 16 h in 1994 to about 8 h in recent years. While in 1994 the repeatability was reported as 0.54 [18] for colonies at the institute in Kirchhain (part of the AGT program), it was 0.28 in the continuation program (Büchler, unpublished data 2009), and only 0.20 in the present study.
The repeatability of recapping rates, measured in Croatia, ranged from 0.06 to 0.95 and was larger than 0.3 in most cases. For RECinf, the correlations were generally higher than for RECall; thus, the results support the hypothesis that RECinf is the more reliable trait. However, this result should be considered with caution, as fewer cells were investigated and fewer colonies were assessed, as also indicated by the reported standard errors. The repeatability of SMR was very low and, considering the standard errors, not substantially different from zero. The correlation between SMR1 and SMR3, based on a very small number of observations, was an exception to this rule. The good repeatability of REC compared with SMR might be explained by the fact that REC directly represents a behavior of the workers, while SMR is a more indirect measurement, as it is influenced by a variety of causes such as the social hygienic behavior of the workers, properties of the brood, recapping, and properties of the mites. The very low repeatability of SMR might also be due to an insufficient sample size. In our study, the average sample size was 24 single-infested cells per colony. According to a recent study by Mondet et al. [23], the true SMR values could then deviate by more than 20% in either direction from the observed scores, and at least 100 single-infested cells would be needed to score SMR with less than 12% deviation in either direction. However, analyzing such large numbers of cells would require a considerable amount of time and is not realistic for field performance tests. Data from 50 to 60 MiniPlus colonies tested repeatedly each year at the institute in Kirchhain revealed repeatability values of 0.35 to 0.70 for RECall and 0.01 to 0.09 for SMR (Büchler, unpublished data 2019). This is well in line with the findings of this study, even though these colonies were artificially infested with mites, whereas in the present study we worked with natural infestation levels. We are not aware of published data on the repeatability of REC or SMR thus far, except for a recent publication [24] in which the estimated repeatability of SMR was 0.43 ± 0.11 when readings were only 10 days apart, and 0.17 ± 0.09 when they were separated by a longer time and spread over the season. The latter estimate is close to ours, both in terms of timing and in the level of repeatability. It should be noted that the genetic background of the colonies in this recent study was very diverse (Eynard, S.E., personal communication), which probably increased the repeatability compared with our study and limits the value of that estimate in the context of performance testing and a selection program.
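To give an intuition for why 24 cells per colony yield noisy SMR scores, the sketch below uses the textbook normal approximation to the binomial for the 95% confidence half-width of a proportion. This is a simplifying assumption and not the method of Mondet et al. [23], so it need not reproduce their figures; p = 0.26 is taken from the average SMR level reported for Croatia, and the function names are hypothetical.

```python
import math

def smr_halfwidth(p, n_cells, z=1.96):
    """Approximate 95% half-width (normal approximation to the binomial)
    of an SMR proportion p scored on n_cells single-infested cells."""
    return z * math.sqrt(p * (1.0 - p) / n_cells)

def cells_needed(p, halfwidth, z=1.96):
    """Smallest number of cells whose approximate half-width is <= halfwidth."""
    return math.ceil(z**2 * p * (1.0 - p) / halfwidth**2)

for n in (24, 100):
    print(n, round(smr_halfwidth(0.26, n), 3))
```

For p = 0.26, the half-width is roughly ±0.18 (in absolute proportion) with 24 cells and about ±0.09 with 100 cells; because the half-width shrinks with the square root of n, halving the uncertainty requires about four times as many cells.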
Heritability estimates for SMR are, however, available in the literature, and repeatability sets the upper limit for heritability. Estimates of the heritability of SMR ranged from 0.06 ± 0.48 to 0.46 ± 0.59 [25], while higher values were found in other studies [2,26]. According to Harbo and Harris [27], SMR is closely linked with VSH. Villa et al. [28] measured the change in brood infestation during one week after introducing infested combs either into colonies selected for VSH or into unselected control colonies. They found a much higher repeatability in the group that was selected for VSH. If a similar effect holds for SMR, the low repeatability in the Croatian dataset may be due, in addition to the small sample size, to the low average SMR level of 26%. Note that in the other two populations in this experiment, the average SMR level was only slightly higher.
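The statement that repeatability sets the upper limit for heritability follows from the standard repeatability model, which assumes that repeated records measure the same trait (a genetic correlation of 1 between records). Partitioning the phenotypic variance into additive genetic ($\sigma^2_A$), permanent environmental ($\sigma^2_{PE}$, including non-additive genetic effects) and temporary environmental ($\sigma^2_{TE}$) components gives:

$$
R = \frac{\sigma^2_A + \sigma^2_{PE}}{\sigma^2_A + \sigma^2_{PE} + \sigma^2_{TE}}
\;\ge\;
h^2 = \frac{\sigma^2_A}{\sigma^2_A + \sigma^2_{PE} + \sigma^2_{TE}},
\qquad
R - h^2 = \frac{\sigma^2_{PE}}{\sigma^2_A + \sigma^2_{PE} + \sigma^2_{TE}} \ge 0 .
$$

For example, a repeatability of 0.55 combined with a heritability of 0.05 (the values for BINF discussed in Section 4.6) would attribute about 0.50 of the phenotypic variance to permanent environmental effects. If the genetic correlation between repeated records is below 1, as suggested for SMR, this bound no longer holds strictly.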
With repeatability values on the order of 0.15 to 0.35 for PIN and REC, the testing methods used in this study identify the pin test and recapping rates as stable properties that can be reproduced within the test season. In general, repeated measurements are recommended to increase the usefulness of these traits for performance testing [11], as they increase the accuracy of breeding value estimation. In contrast, SMR cannot be reproduced, and each reading must be considered a one-time assessment. This view of SMR seems supported by the high repeatability over short intervals and the low repeatability over longer intervals reported in [24].
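The benefit of repeated measurements can be quantified with the classical formula for the heritability of the mean of n records, $n h^2 / [1 + (n-1) R]$, which again assumes that all records measure the same trait. The sketch below implements it; the input values are hypothetical and only chosen to lie in the range of repeatabilities reported here for PIN and REC.

```python
def h2_of_mean(h2, rep, n):
    """Heritability of the mean of n repeated records under the standard
    repeatability model: n * h2 / (1 + (n - 1) * rep)."""
    if not 0.0 <= h2 <= rep <= 1.0:
        raise ValueError("expected 0 <= h2 <= repeatability <= 1")
    if n < 1:
        raise ValueError("need at least one record")
    return n * h2 / (1.0 + (n - 1) * rep)

# Hypothetical trait with h2 = 0.2 and repeatability 0.3
for n in (1, 2, 3, 5):
    print(n, round(h2_of_mean(0.2, 0.3, n), 3))
```

With h² = 0.2 and R = 0.3, three records raise the heritability of the mean from 0.20 to 0.375, and no number of records can exceed h²/R ≈ 0.67. For a trait such as SMR, whose readings behave like one-time assessments, the assumption behind the formula is violated, so this gain should not be expected.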
4.2. Correlations between Different Mite Infestation Traits and Design of Performance Test
We discuss the correlations between different mite infestation traits and the design of the performance test in the same section, because understanding these correlations and designing the test involve similar arguments.
First, correlations between traits based on the same readings must be distinguished from correlations between traits without a common reading. For instance, MPG and BINFa are both calculated from BINF and NMF; therefore, their high correlation is expected. The same is true for b3 and b5, which are also highly correlated. Other trait pairs partially overlap, for instance MPG and BINFa with b3 and b5, and the correlations found reflect this. Second, BINF1 … BINF5 and BRINF are absolute parameters describing an infestation, while MPG, BINFa, b3 and b5 are relative parameters indicating infestation growth.
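Why traits computed from a common reading are expected to correlate can be shown with a small simulation. The actual formulas of MPG, BINFa, b3 and b5 are defined in the Methods and are not reproduced here; the additive construction below (each trait = shared reading + its own reading, all independent with equal variance) is a hypothetical simplification under which the expected correlation is exactly 0.5, even though the non-shared components are unrelated.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20000                       # hypothetical number of colonies
shared = rng.normal(size=n)     # reading used by both traits
own_a = rng.normal(size=n)      # reading used only by trait A
own_b = rng.normal(size=n)      # reading used only by trait B

trait_a = shared + own_a
trait_b = shared + own_b
r = np.corrcoef(trait_a, trait_b)[0, 1]
print(round(r, 2))
```

The shared component contributes half of each trait's variance, which is exactly the expected correlation; the more a derived trait depends on the common reading, the higher this built-in correlation becomes.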
BRINF is not derived from the same readings as the other infestation traits; thus, its relatively high correlations with MPG (0.35) and BINFa (0.41) are particularly meaningful and indicate a close connection between the mites found in the brood (an absolute parameter) and infestation growth on bees. There are major differences between the countries here: while in Germany the correlation between BRINF and BINFa is very high (0.6), in Austria and Croatia it is low (0.13 and 0.16, respectively). This must be seen in the context of the average mite infestation levels in Germany, which were intentionally kept higher than in Austria and Croatia.
As emphasized by Guichard et al. [6], drifting is one of the main challenges in measuring mite infestation development. Pfeiffer and Crailsheim [29] estimated 13–42% alien bees in neighboring colonies, depending on their positions in the apiary and the season. Similar results were reported by Jay [30], who found drifting rates of 11.5–24.7% within 7 days and 24.4–40.5% within 21 days after brood emergence. Therefore, to optimize testing for mite development, much attention must be paid to the design of the apiaries where performance testing is carried out. Arranging colonies in squares with the entrances facing different cardinal directions reduced drifting compared with arranging them in rows; moreover, colored entrance boards also had a positive effect [30]. While this should be regarded as good practice in common test apiaries [11], greater distances between hives (e.g., 70 m) showed an additional benefit [17]. Such distances might be impossible to realize when a minimum number of colonies must at the same time be kept under comparable environmental conditions, as required to separate genetic and environmental effects.
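A toy model can illustrate how drifting blurs differences between colonies. Assume, hypothetically, that a fraction d of the bees sampled from a colony originate from other colonies, that their infestation equals the average of k other colonies, and that the colonies' own infestations are independent with equal variance. The correlation between a colony's observed and its own infestation is then $(1-d)/\sqrt{(1-d)^2 + d^2/k}$. The drift fractions below come from the 13–42% range reported by Pfeiffer and Crailsheim [29]; k and the independence assumption are purely illustrative.

```python
import math

def own_signal_correlation(d, k=1):
    """Correlation between observed and own infestation in a toy drift model:
    observed = (1 - d) * own + d * (mean of k other, independent colonies)."""
    if not 0.0 <= d < 1.0 or k < 1:
        raise ValueError("need 0 <= d < 1 and k >= 1")
    own = 1.0 - d
    # Var(observed) = own^2 + d^2 / k (unit variances); Cov(observed, own) = own
    return own / math.sqrt(own**2 + d**2 / k)

for d in (0.0, 0.13, 0.42):
    print(d, round(own_signal_correlation(d), 3))
```

With a single source colony (k = 1), the correlation drops to about 0.99 at d = 0.13 and to about 0.81 at d = 0.42. The model ignores any tendency of highly infested workers to drift selectively, which could make the effect larger.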
With regard to mite invasion, a clear seasonal pattern was found, with low mite invasion in spring but high values from summer until autumn [19]. A tendency of highly infested workers to enter other colonies was considered, which might equalize or even invert the infestation rates of colonies with different levels of mite resistance.
A significant effect of the infestation level on the invasion rate of mites might also explain why we found a negative correlation between BINFa and b3 in the untreated and more highly infested colonies in Germany, whereas a positive correlation was measured in the less infested test populations in Austria and Croatia (Supplementary Table S3).
In practice, the seriousness of drifting might be quantified by the repeatability of BINF. In our study, with readings spanning a period of 6 weeks, drifting might explain the somewhat lower repeatability of BINF in Germany compared with Croatia and Austria, which underlines the importance of repeated measurements.
A longer period of undisturbed infestation development is useful to identify differences between colonies. With increasing infestation levels, however, and especially from late summer to autumn, an increasing transfer of mites within the apiary has to be taken into account. It might therefore be useful to start sampling bees for mite infestation as soon as a minimum bee infestation (e.g., 1%) is observed in most of the colonies. If more measurements can be taken, they should preferably be taken early, because later measurements are more disturbed by secondary effects, as indicated by the low correlations of BINF4 and BINF5 with the earlier measurements. A larger number of samplings also increases the reliability of a growth factor (comparable to b5 for Austria in our study), which is an alternative to BINFa.
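One common way to turn several infestation readings into a growth factor is the least-squares slope of log infestation against sampling day, i.e., an exponential growth rate. The sketch below assumes this definition for illustration; it is not necessarily how b3, b5 or BINFa are defined in this study, and the readings are hypothetical, generated from a known daily rate of 0.03.

```python
import numpy as np

def log_growth_rate(days, infestation):
    """Daily exponential growth rate estimated as the least-squares slope
    of ln(infestation) on sampling day (needs >= 2 positive readings)."""
    days = np.asarray(days, dtype=float)
    infestation = np.asarray(infestation, dtype=float)
    if days.size < 2 or np.any(infestation <= 0):
        raise ValueError("need at least two positive readings")
    slope, _intercept = np.polyfit(days, np.log(infestation), 1)
    return slope

days = [0, 14, 28, 42]                                   # hypothetical sampling days
readings_pct = [1.0 * np.exp(0.03 * d) for d in days]    # hypothetical % bee infestation
print(round(log_growth_rate(days, readings_pct), 4))     # 0.03
```

Because the slope is fitted to all readings, each additional sampling reduces the influence of the measurement error of any single reading on the estimate.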
A third infestation parameter, BRINF, requires more effort to measure but might be less affected by external factors and could be more useful than previously assumed. Additional data will be required to establish its usefulness as a new parameter.
4.3. Correlations between Behavioral Traits
Correlations between the pairs of related behavioral traits, PINem with PINop and RECall with RECinf, are high, while the correlations between behavioral traits of different types are lower but, in the combined data, generally positive (Table 7). A significant positive correlation is found between PINop and RECinf, which could indicate that the underlying behaviors are more closely related than for other trait pairs, e.g., PINem and RECall. Indeed, the critical step in the removal of a damaged larva is its recognition, indicated by the start of brood cap removal [31]. Thus, the relatively close connection between the initiation of brood cap removal in the pin test (PINop) and selective recapping (RECinf) is not surprising.
Of the measured behavioral traits (PINem, PINop, RECall and RECinf), RECinf has the highest correlation with SMR, suggesting that a higher rate of recapping of mite-infested cells increases the proportion of non-reproducing mites (Table 7). Recent studies have indeed identified recapping of brood as a key resistance mechanism in several populations [32,33]. The presence of Varroa mites in a brood cell elicits hygienic removal of infested cells by workers [34,35], and REC and VSH, as different expressions of hygienic behavior, are closely linked to each other. However, opening the cell has been reported to play a crucial role, as both hygienic and non-hygienic bees are equally capable of recognizing and removing dead or diseased brood once the cell has been opened [36].
In conclusion, the different types of behavioral traits (pin test, recapping, SMR) are all connected with each other, albeit with relatively low correlations. Thus, the various mechanisms of brood hygiene expressed by honeybee colonies may contribute more or less independently to their overall resistance. A similar connection has been found between the removal of Varroa-infested brood (VSH) and the hygienic removal of dead brood, either freeze-killed [37] or pin-killed [22].
4.4. Relationships between Behavioral Parameters and Mite Infestation
With the exception of RECall, the behavioral traits are negatively correlated with the mite infestation traits, as expected under the hypothesis that stronger hygienic behavior and suppressed mite reproduction reduce mite population growth and, consequently, infestation. The correlations were only between −0.1 and −0.2; however, they were significantly different from zero. The strongest negative correlations for all countries combined were found for SMR and RECinf with BRINF, and for PINop with BINFa. The results differed considerably between the countries, however. For instance, the strong negative correlations of BRINF with both RECinf and SMR in the Austrian dataset were not found in the Croatian dataset, where they are not significantly different from zero, while in the German dataset they are significantly negative but much weaker. Additionally, the Croatian dataset shows a strongly negative correlation between PINop and BRINF (−0.37), which was not found in the German dataset. Our findings indicate that these correlations are based on causal connections that depend on specific conditions, such as the average infestation levels and the timing of sample collection.
This may also explain the inconsistent findings on the relevance of hygienic behavior for mite infestation development in the published literature. The negative correlations of PINem and PINop with infestation parameters in this study suggest a negative impact of hygienic behavior on mites, especially on the newly introduced mite population growth parameter BINFa. While Ibrahim et al. [38] found significant correlations between hygienic behavior and BRINF and, to a lesser extent, BINF, several other studies did not observe such correlations [39,40,41,42]. Comparing the correlations of PINem and PINop with the infestation traits in our study, we conclude that PINop is the more promising behavioral trait for predicting suppressed mite population growth. However, up to now, all other studies have used the proportion of completely removed cells to estimate hygienic behavior, in either the pin-killed or the freeze-killed brood assay.
Hygienic behavior does not seem to be a good indicator of possible resistance traits in unselected populations [43]. Although highly hygienic colonies may slow down mite population growth significantly [44], the proportion of such colonies in average populations seems to be small. Difficulties in associating hygienic behavior with mite infestation may also arise because bees selectively remove brood infested with mites carrying DWV, while mites with low viral loads may be ignored [45].
Recent experiments identified the recapping of brood cells as a key resistance mechanism in surviving populations [32,33]. Substantial correlations of REC with BINF and BRINF were found by Villegas and Villa [46]. An important finding of our study is that recapping of infested cells (RECinf) is clearly the more relevant trait, as it is independent of the infestation level, whereas recapping of all cells (RECall) is not. Moreover, the significant positive correlations of RECall with BRINF in Croatia and with BRINF and BINFa in Germany indicate that indiscriminate opening of brood cells may be counterproductive, and that colony resistance depends on the highly specific identification and recapping of Varroa-infested cells.
We calculated phenotypic correlations for breeding colonies registered in BeeBreed for which mite infestation and mite fall as well as PIN were measured. The correlation between PIN and BINFa was −0.06 ± 0.01, and −0.08 ± 0.01 after adjusting for the effect of season × apiary. The corresponding correlations between PIN and MPG were −0.02 ± 0.01 and −0.07 ± 0.01. Thus, the correlations are similarly low, but significantly different from zero because of the large number of observations.
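The point that very small correlations become significant with many observations can be made explicit with Fisher's z transformation, a standard large-sample test of H0: ρ = 0, together with the usual approximate standard error of r. The functions below are a minimal sketch; the sample sizes in the example are hypothetical and are not the numbers of BeeBreed records used here.

```python
import math

def corr_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via Fisher's z transformation
    (normal approximation, requires n > 3)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))

def corr_se(r, n):
    """Approximate large-sample standard error of a correlation coefficient."""
    return (1.0 - r * r) / math.sqrt(n - 1)

for n in (100, 10_000):   # hypothetical sample sizes
    print(n, f"p={corr_p_value(-0.06, n):.2g}", f"se={corr_se(-0.06, n):.3f}")
```

For r = −0.06, the p-value is far below 0.001 with 10,000 observations but about 0.55 with 100 observations; statistical significance therefore says little about the practical size of such correlations.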
Suppression of mite reproduction (SMR) is often seen as a crucial indicator of mite resistance [2,27], and several studies have reported negative correlations between SMR and mite infestation [26,47,48,49]. However, in an analysis of SMR in 13 European countries covering nine different genotypes, the correlations between SMR and brood infestation were not significantly different from zero [23]. Similarly, no significant correlations between SMR and mite infestation were found after two generations of bidirectional selection for mite population growth [16]. Harris et al. [15] reported that in non-resistant stocks, behavioral traits might explain only a small part of the variability in mite infestation. This might contribute to the low correlation coefficients in our study. Ranging from 26.3% in Croatia to 32.8% in Germany, the average SMR levels in our study are much lower than those reported for resistant populations on Gotland (Sweden) or in Avignon (France) [50].
4.5. Breeding Objective
The performance test of a honeybee colony establishes its phenotype for the traits that represent the overall breeding objective, which includes honey yield, gentleness, calmness, low swarming drive, disease resistance and overwintering strength. In this discussion, we focus on Varroa resistance, for which mite infestation is included in the breeding objective.
Varroa resistance has several aspects, including long-term survival under Varroa infestation pressure, honey yield and the absence of other diseases despite Varroa infestation, as well as sustainable reduction of the Varroa mite population. However, these aspects are too complex to capture fully, and simpler breeding objectives are needed that represent Varroa resistance as well as possible. Reduction of the mite population is clearly the most accessible of them. Parameters such as BINFa, MPG, b3, b5, BRINF or MFOA represent this breeding objective, while behavioral parameters such as SMR, REC and PIN may contribute as indirect selection parameters.
Concerning the mite infestation traits BINFa and MPG, our results suggest that BINFa should be preferred over MPG because of its higher repeatability and higher correlations with behavioral traits. MFOA was included in our study, as it might become an interesting parameter once beekeepers stop winter treatments and instead apply efficient mite reduction by summer brood interruption [51]. Ultimately, decisions on this issue depend strongly on heritabilities and genetic correlations, but also on the ease of measurement, as a balance must be found regarding which combination of measurements best captures Varroa resistance, given limited resources such as time and money.
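Finding the combination of measurements that best captures a breeding objective is the classical selection-index problem (Hazel's index), in which the index weights b solve $P b = G a$. The sketch below shows that calculation only as an illustration: it ignores the worker and queen effects of honeybee traits, and all numbers are hypothetical except the heritabilities of 0.05 for mite population growth and 0.52 for the pin test quoted in Section 4.6; the genetic correlation of −0.4 and the phenotypic correlation of −0.1 are invented for the example.

```python
import numpy as np

def index_weights(P, G, a):
    """Selection index weights b solving P @ b = G @ a.

    P: phenotypic (co)variance matrix of the measured traits (m x m)
    G: genetic covariances between measured traits (rows) and
       breeding-objective traits (columns) (m x k)
    a: economic weights of the objective traits (k,)
    """
    P = np.asarray(P, dtype=float)
    G = np.asarray(G, dtype=float)
    a = np.asarray(a, dtype=float)
    return np.linalg.solve(P, G @ a)

# Hypothetical example on a phenotypic-SD scale: the measured traits are
# BINFa and PIN; the objective is BINFa, where lower is better (a = -1).
h2_binfa, h2_pin = 0.05, 0.52
r_a = -0.4                      # assumed genetic correlation PIN-BINFa
P = [[1.0, -0.1],
     [-0.1, 1.0]]
G = [[h2_binfa],
     [r_a * np.sqrt(h2_pin * h2_binfa)]]
a = [-1.0]
b = index_weights(P, G, a)
print(b)
```

In this example, the index puts a negative weight on observed BINFa and a positive weight on PIN; in practice, the relative weights depend entirely on the genetic parameters, which is why their estimation matters.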
4.6. Genetic Parameters and Response to Indirect Selection
The repeatability of the measured parameters and the consistent correlations among them indicate their general suitability for selection. However, a strategy for sustained selection progress requires the determination of heritabilities, genetic correlations and reliable breeding value models. For a trait with high heritability, selection progress is relatively easy to achieve, while for a trait with lower heritability, a larger population and more consistent testing are needed. The precision of a breeding value model increases when both worker and queen effects are considered, which are mostly negatively correlated with each other. The more negative this correlation, the more important it is to select on both effects.
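Why negatively correlated worker and queen effects call for selection on both can be sketched under a simplifying assumption: that a colony's total genetic merit is the sum of its worker and queen effects. The function below is hypothetical, and the example uses standardized variances; it returns the correlation between the worker effect alone and that total.

```python
import math

def worker_vs_total_correlation(var_w, var_q, r_wq):
    """Correlation between the worker effect W and total merit T = W + Q,
    given var(W), var(Q) and corr(W, Q)."""
    cov_wq = r_wq * math.sqrt(var_w * var_q)
    var_total = var_w + var_q + 2.0 * cov_wq
    if var_w <= 0 or var_total <= 0:
        raise ValueError("variances must be positive")
    # cov(W, W + Q) = var(W) + cov(W, Q)
    return (var_w + cov_wq) / math.sqrt(var_w * var_total)

for r in (0.0, -0.5):
    print(r, round(worker_vs_total_correlation(1.0, 1.0, r), 3))
```

With equal variances, this correlation is about 0.71 when the two effects are uncorrelated but drops to 0.5 when their correlation is −0.5, so selecting on the worker effect alone captures less of the total merit as the correlation becomes more negative.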
For the pin test, several heritability estimates have been reported, most recently 0.52 by Hoppe et al. (2020, submitted). For SMR, reported heritabilities reach up to 0.46 [25]. Thus, the heritability of behavioral traits can be very high. For mite population growth, low heritabilities have been reported, e.g., 0.05 (Hoppe et al., 2020, submitted). Thus, in honeybee breeding for Varroa resistance, we face a choice between parameters that are easy to measure but contribute little to the breeding objective (because of low heritability or low genetic correlation with objective traits) and traits that are tedious to measure but contribute more (because of high heritability or high genetic correlation). To illustrate the relevance of heritability estimates and genetic correlations, we discuss the value of indirect selection, i.e., improving a breeding-objective trait by selecting for another trait, which gives an impression of the other trait's possible contribution. For this, genetic parameters are essential. The correlated response of a trait $X$ to selection on a trait $Y$, relative to the response to direct selection on $X$, can be written as [52]:

$$
\frac{CR_X}{R_X} = \frac{i_Y \, r_A \, h_Y}{i_X \, h_X} \qquad (7)
$$

where $CR_X$ is the correlated response to selection in $X$ when selecting for $Y$, and $R_X$ is the response to selection in $X$ when selecting for $X$ itself. Furthermore, $r_A$ stands for the additive genetic correlation between $X$ and $Y$, $i$ for the intensity of selection and $h$ for the square root of the heritability. In our case, $X$ includes BINFa and b3, which can be measured practically on very many colonies, while traits such as SMR and REC are usually measured on fewer colonies, such that their intensity of selection is substantially smaller than for BINFa and b3. For PIN, however, we can assume that both intensities of selection are equal, such that we only need estimates of the heritabilities of BINFa, b3 and PIN, and estimates of $r_A$. Recently, Hoppe et al. (2020, submitted) estimated the heritabilities of PINem and BINFa and their additive genetic correlation $r_A$ for the combination of worker and queen effects [53] in the main Carnica population within BeeBreed. The sign of $r_A$ is in the expected and desired direction: the higher the PINem, the lower the BINFa. Taking these values, $|CR_{\mathrm{BINFa}}/R_{\mathrm{BINFa}}| > 1$, such that indirect selection for PINem is more effective than direct selection for BINFa. To complete such an analysis, the actual question is not whether to select for BINFa or for PINem, but how to select for both. The repeatability of BINF of 0.55 in our German dataset and its heritability of 0.05 suggest that most of the repeatability is due to permanent environmental effects rather than to additive genetic effects. Indirect selection for BINFa through SMR alone will only be beneficial if SMR has a substantial heritability and a sufficiently high genetic correlation with BINFa. The very low repeatability we found for SMR does not exclude a substantial heritability, but it would imply that the genetic correlation between repeated measurements is close to zero, in line with the idea of a one-time measurement. Additionally, note that Equation (7) holds for selection on single phenotypes, whereas in practice, information on relatives is used, so the accuracy of selection is no longer $h$ but larger, and the accuracies of traits are more similar in size than their heritabilities. These exercises are nevertheless useful for deciding in which activity to invest, given limited resources.
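The ratio $CR_X/R_X = i_Y r_A h_Y / (i_X h_X)$ can be turned into a simple break-even rule: with equal selection intensities, indirect selection on $Y$ outperforms direct selection on $X$ when $|r_A| > h_X / h_Y$. The sketch below evaluates this using the heritabilities quoted above for mite population growth (0.05) and the pin test (0.52); whether these match the worker-plus-queen estimates used in the original comparison is an assumption, $r_A$ is treated as unknown, and the function names are hypothetical.

```python
import math

def relative_indirect_response(h2_x, h2_y, r_a, i_x=1.0, i_y=1.0):
    """Ratio CR_X / R_X = (i_Y * r_A * h_Y) / (i_X * h_X), with h = sqrt(h2)."""
    return (i_y * r_a * math.sqrt(h2_y)) / (i_x * math.sqrt(h2_x))

def break_even_correlation(h2_x, h2_y, i_x=1.0, i_y=1.0):
    """Absolute genetic correlation at which |CR_X / R_X| equals 1."""
    return (i_x * math.sqrt(h2_x)) / (i_y * math.sqrt(h2_y))

threshold = break_even_correlation(h2_x=0.05, h2_y=0.52)
print(round(threshold, 2))  # 0.31
```

With these heritabilities, any genetic correlation stronger than about ±0.31 would make indirect selection for the pin test more effective than direct selection for mite population growth. If $Y$ can only be measured on fewer colonies (a smaller $i_Y$, as for SMR or REC), the threshold rises by the factor $i_X / i_Y$.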
These findings underline the importance of estimating genetic parameters when considering the value of traits for selection. For SMR and REC, there are as yet no reliable estimates of genetic parameters. The extensive selection work of Arista Bee Research (aristabeeresearch.org) with several breeds and a project funded by the German Bundesanstalt für Ernährung und Landwirtschaft (https://service.ble.de/ptdb/index2.php?detail_id=2103579&site_key=293&stichw=SMR&zeilenzahl_zaehler=2#newContent) may provide the information needed to judge the value of these traits for selection in the near future.
It should be emphasized that genetic parameters need to be studied for specific populations, even though this is a serious problem for small datasets. The large main Carnica population in BeeBreed (with 10,000 records added annually) might serve as a reference population, making it possible to judge whether the genetic parameters in a specific population differ significantly from a consensus value derived from this reference population. For example, no additive genetic variance for MPG could be detected in Swiss Carnica and Mellifera datasets of about 1000 records each [54]. The question is whether this means that its heritability is zero or whether, given the fairly small datasets, a consensus value would be better for practical purposes.
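One simple way to judge whether a population-specific estimate differs from a reference (consensus) value, and to combine the two, is an approximate z-test followed by an inverse-variance weighted mean. The sketch below assumes independent, approximately normal estimates, which is questionable for heritabilities close to zero; the estimates and standard errors in the example are hypothetical.

```python
import math

def compare_to_reference(est, se, ref, ref_se, z_crit=1.96):
    """Approximate z-test of a population estimate against a reference estimate."""
    z = (est - ref) / math.sqrt(se**2 + ref_se**2)
    return z, abs(z) > z_crit

def precision_weighted(est, se, ref, ref_se):
    """Inverse-variance weighted mean of the two estimates."""
    w, w_ref = 1.0 / se**2, 1.0 / ref_se**2
    return (w * est + w_ref * ref) / (w + w_ref)

# Hypothetical: a small dataset yields h2 = 0.00 (SE 0.05);
# a large reference population yields h2 = 0.05 (SE 0.01).
z, differs = compare_to_reference(0.00, 0.05, 0.05, 0.01)
combined = precision_weighted(0.00, 0.05, 0.05, 0.01)
print(round(z, 2), differs, round(combined, 3))
```

Here the small-population estimate of zero is not significantly different from the reference value, and the combined estimate (about 0.048) stays close to the reference, illustrating why a consensus value can be preferable for small datasets.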