**3. Results**

#### *3.1. Possible Approaches to Achieve Standardisation or Characterisation*

Several approaches to achieve either standardisation or characterisation were considered, outlined in Table 1. The advantages and disadvantages of each approach were discussed before a consensus approach was developed (Section 3.2).

#### 3.1.1. Distribute the Same Well-Characterised Resistant and Susceptible Strains to All Test Facilities

Where bioassays need to be carried out at multiple facilities, one approach could be distributing a suitable resistant and susceptible strain to all facilities. Robust characterisation in the originating centre, and suitable quality control measures in receiving facilities, should remove strain differences as a variable in the assay. Strains could be maintained under the same selection and profiling regime, refreshed from a single facility if results of regular profiling start to differ, or refreshed every few generations from a single facility.

There are some practical limitations to this approach. The nature of a suitable strain may differ depending on the second AI under evaluation, and so this exercise may need to be repeated in parallel for each dual-AI ITN in a study. There may also be little benefit in terms of the workload of this approach over others. In the longer term, there are benefits to building the capacity of facilities to establish, maintain and characterise local strains. Regardless, this is the most straightforward approach to standardisation, provided the maintenance of the strain could be standardised between facilities.

However, there are two insurmountable issues. Firstly, it is unlikely that the same strain distributed and maintained in different facilities will remain static and comparable regarding its resistance profile. Even when under continued selective pressure, resistant phenotypes can shift over time [19,31,32], and there is likely to be a change in the resistance profile of strains associated with different genetic bottlenecks when moving between facilities, both as a result of small founding populations and as colonies adapt to their new environment. Even when rearing and selection are done under the same laboratory conditions, potential supplementary factors which are gaining more attention, such as the mosquito microbiome, may differ between insectaries and affect tolerance to insecticides [33–36]. Genetic drift will occur over time, even in well-controlled rearing facilities like the Malaria Research and Reference Reagent Resource Centre (MR4) at BEI Resources [37]. Changes in resistance can occur quite quickly [38]. So important differences may be missed before testing if the strain was not refreshed or re-characterised regularly to confirm that the resistance phenotype was still as expected.


**Table 1.** Possible approaches to achieve sufficient standardisation or characterisation of pyrethroid-resistant mosquitoes used for bioefficacy


Second, and perhaps most importantly, because of the biohazard risk inherent in transferring resistant mosquitoes between geographical regions, there are strong reservations concerning this approach. There would need to be strict containment and quality control measures in place in all receiving facilities, but even then, there is a major ethical consideration in moving a strain that is potentially more resistant than the wild populations surrounding the second test facility. The MR4, for example, will perform case-by-case hazard assessments before distributing *Anopheles* strains and would not distribute strains where there is a risk of laboratory/insectary escape and potential for introduction and establishment of a novel resistant population in a new environment [37]. In some situations, relevant parties may accept the idea, but the containment measures needed to make this approach safe may be too expensive or not feasible in practice. Alternatively, national, local or facility decision-makers may refuse to take on this responsibility and receive the mosquito strains. For these reasons, this is not a practical approach.

#### 3.1.2. Each Testing Facility Uses Its Own Characterised Resistant Strain with a Single Standardised Protocol

Before a trial begins, the bioassay methodology used for bioefficacy testing, for example, as part of durability monitoring, should be optimised and validated using new and twenty times washed dual-AI ITNs with a susceptible strain (e.g., Kisumu). An additional validation step could be added with a range of different well-characterised pyrethroid-resistant strains to demonstrate that the method is not sensitive to differences in resistance mechanisms or population differences. In this context, a well-characterised strain would be one for which the phenotypic resistance profile was known, ideally with some understanding of the target site mutations, level of expression of detoxifying enzymes and other known mechanisms. Most test facilities that perform durability monitoring already hold pyrethroid-resistant strains that take some effort to characterise. They will often maintain them under selective pressure to preserve the resistant phenotype. There is a growing desire in the community to increase the capacity of local institutions, so this will increasingly be the case. Therefore, a pragmatic approach to standardisation could be that each facility uses a characterised local strain and relies on the testing methodology's robustness to give consistent results between facilities. This might be an attractive option to National Malaria Control Programmes (NMCPs) that would like to see data against mosquitoes that are closest in phenotype and genotype to the local mosquitoes that are responsible for malaria transmission.

If this approach were to be adopted as a way to compare results between sites, the method would need to be tested against a sufficient number of genetically heterogeneous strains, which would need to be sufficiently different for the validation to provide convincing evidence that the results would be comparable no matter what strain was used. There is some precedent. The WHO's VCAG have suggested three strains be used to screen for cross-resistance [10]. Phase II efficacy trials of ITNs require testing in an area with mosquitoes susceptible to all compounds in the ITN under evaluation, followed by testing in an area with pyrethroid-resistant populations [39]. However, it is unclear whether testing a single sample of nets against three resistant strains would provide sufficient evidence. This would be a significant burden in the efforts to validate a method. Assuming that containment facilities are not available, a single test facility is unlikely to have strains covering a broad geographical range. It is also unlikely that all resistance mechanisms will be represented by the strains the facilities maintain, thus necessitating a multi-centre validation approach.

This solution assumes that the testing methodology is sufficiently robust and specific enough and that the pyrethroid resistance is sufficiently high in the strains used that it is truly a test of the efficacy of the second AI alone. Since evidence for differences in resistance levels within the class is weak [40], characterising resistance to one example pyrethroid, or perhaps one representative Type I and one Type II pyrethroid, would be sufficient. However, even with very resistant strains, some individuals are usually killed by exposure to pyrethroids, and mortality can vary substantially within, and between, bioassays [40]. So, some measure of the additional impact of the second AI is likely to be still needed, for example, a comparison to a pyrethroid-only ITN.

The group did not have confidence in comparability between data collected at different facilities with different strains. A bioassay is not likely to be validated sufficiently to give the same results no matter which strain is used for testing, because characterisation of strains will not be perfect: not all resistance mechanisms have been identified; those contributing most to resistance are not well understood; and not all markers are routinely screened for in all test facilities. There is evidence of this challenge in efforts by the WHO to set DCs for AIs by testing compounds against multiple strains of the same species and selecting a suitable dose based on the consensus of data [41]. The consensus opinion was that, although this is a pragmatic solution, the use of different strains with different resistance mechanisms and rearing methods is unlikely to give consistent results between test facilities or across time points, and so this was not the preferred option.

#### 3.1.3. Characterisation of the Resistant Strain in Parallel to the Durability Monitoring Testing

The resistance phenotype of mosquitoes used for bioefficacy testing of dual-AI ITNs could be characterised by the following to ensure that they are suitable to effectively provide the information needed: sufficient resistance to pyrethroids, such that a high enough proportion survive exposure to the pyrethroid that the effects of the second AI can be measured, and susceptibility to the second AI. WHO tube bioassays to assess the susceptibility of the proposed strain to the WHO DC [42] of the pyrethroid, as well as the second AI included on the dual-AI ITN under evaluation, where a DC and method for evaluation are available, would be appropriate for this purpose; a straightforward and familiar method. Resistance intensity or dose-response assays with the AIs of interest would provide some quantitative information to help in defining a strain. A clear definition of a strain suitable for use in testing would be required, and the rejection criteria would need to offer a balance between pragmatism and the need for robust results.
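The two acceptance criteria described above can be illustrated with a minimal Python sketch. It corrects observed mortality for control mortality using Abbott's formula, then checks a strain against two cut-offs. The function names and the 50% pyrethroid cut-off are hypothetical and are not taken from the protocol; the 98% default for the second AI follows the conventional WHO susceptibility threshold and is used here only as an assumption. WHO procedures apply the control correction only within a bounded range of control mortality, which this sketch does not enforce.

```python
def abbott_corrected_mortality(dead_test, total_test, dead_control, total_control):
    """Return control-corrected mortality (%) using Abbott's formula."""
    if total_test <= 0 or total_control <= 0:
        raise ValueError("group sizes must be positive")
    test = 100.0 * dead_test / total_test
    control = 100.0 * dead_control / total_control
    if control >= 100.0:
        raise ValueError("control mortality of 100% cannot be corrected")
    # Abbott: (test - control) / (100 - control) * 100, floored at zero
    return max(0.0, 100.0 * (test - control) / (100.0 - control))


def strain_is_suitable(pyrethroid_mortality, second_ai_mortality,
                       max_pyrethroid_mortality=50.0,   # hypothetical cut-off
                       min_second_ai_mortality=98.0):   # assumed threshold
    """Check both criteria: enough pyrethroid survivors, susceptible to second AI."""
    return (pyrethroid_mortality <= max_pyrethroid_mortality
            and second_ai_mortality >= min_second_ai_mortality)


# Example with made-up counts: 30/100 dead on pyrethroid, 10/100 dead in control
pyrethroid = abbott_corrected_mortality(30, 100, 10, 100)
print(round(pyrethroid, 1), strain_is_suitable(pyrethroid, 99.0))
```

Keeping the cut-offs as parameters reflects the point made above that rejection criteria would need to be agreed and would balance pragmatism against robustness.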

Further characterisation could be done to better understand the strain and aid in the interpretation of results. This could include, for example, DC assays with examples of Type I and Type II pyrethroids, and assays with all insecticide classes used locally for mosquito control, which could more fully characterise the strain's resistance profile. Clear guidance would be required on interpreting the bioassay results in the context of the strain characterisation. Testing for the presence of molecular markers associated with insecticide resistance would be informative, if the most informative or relevant molecular markers could be determined. This may not be practical on a routine basis, but strains held for bioefficacy testing would ideally be regularly screened for key molecular markers to provide a background understanding of the resistance profile of a strain and to support interpretation of data, e.g., response to PBO. In order to predict the efficacy of the product under evaluation, it would be helpful to confirm that a strain possessed the key resistance mechanisms against which the product under evaluation claims efficacy, as well as susceptibility to the second AI.

Beyond the resistance phenotype, there are multiple sources of variability between bioefficacy tests related to the mosquitoes used. Biological factors can affect observed levels of insecticide resistance, which may lead to differences between cohorts of mosquitoes from the same strain. For example, size [27], nutritional status [27,43,44], the temperature during rearing [28,45], and age [46,47] can all have an effect on mosquito fitness, and conditions during testing affect the results of bioassays ([46] and references therein). Routine quality control and use of rearing SOPs (e.g., [19,48,49]) would be a robust method of ensuring that suitable mosquitoes are used throughout the study and across facilities, and would ideally include fitness testing as a measure of the consistency of rearing methods and quality of the adults produced. When maintained in the absence of selective pressure, or under selective pressure from only a single insecticide, resistance phenotypes and genotypes can shift in a laboratory colony over time [19]; regular selection for insecticide resistance should form part of a programme of quality control in maintaining a resistant strain of mosquitoes. If a strain is intended for pyrethroid testing, it should be selected using a pyrethroid-only insecticide, whereas a multi-resistant strain could be periodically selected with different insecticide classes, though this would be a significant undertaking.

While good rearing and testing procedures minimise most sources of variation [50], it would be informative to include some fitness testing (for example, wing length, average weight) of a sample of the cohort of mosquitoes used for bioefficacy testing, or, if possible, a sample from individuals which were killed and from those that survived characterisation or durability monitoring bioassays. A sample of each cohort of test mosquitoes could be stored for future analysis, for example, detailed characterisation of resistance mechanisms if results from one facility, or one time point, varied from the others. It may be straightforward to compare changes in target site allele frequencies (e.g., kdr), but it may be more challenging for changes in metabolic gene expression, where a threshold of fold change, beyond which two populations are no longer considered comparable, would need to be defined. Snap freezing at −80 °C would be ideal, so that relatively high yields of DNA and RNA can be analysed from stored samples, but storage of dried individuals on silica would also be suitable for some further analyses.
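As an illustration of the allele frequency comparison mentioned above, the following Python sketch computes the frequency of a resistance allele from genotype counts in two stored cohorts and reports the shift between them. The function names and the counts are hypothetical, and no threshold for a meaningful shift is implied, since none is defined here.

```python
def resistant_allele_frequency(n_rr, n_rs, n_ss):
    """Frequency of the resistant allele from diploid genotype counts.

    n_rr: homozygous resistant, n_rs: heterozygous, n_ss: homozygous susceptible.
    """
    total = n_rr + n_rs + n_ss
    if total == 0:
        raise ValueError("no genotyped individuals")
    # Each individual carries two alleles; homozygotes contribute two copies
    return (2 * n_rr + n_rs) / (2 * total)


def frequency_shift(cohort_a, cohort_b):
    """Change in resistant allele frequency from cohort_a to cohort_b."""
    return resistant_allele_frequency(*cohort_b) - resistant_allele_frequency(*cohort_a)


# Made-up genotype counts (RR, RS, SS) for two stored cohorts
baseline = (40, 40, 20)
later = (20, 40, 40)
print(resistant_allele_frequency(*baseline), frequency_shift(baseline, later))
```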

If this approach were taken, differences in the resistance profile of strains used by different test facilities would still exist, and the strain or strains used may change in resistance phenotype during the study, but the robust characterisation and quality control should help to ensure that the key (known) parameters are similar enough, and that differences can be taken into account when interpreting data. This approach was selected for further development into the final recommended protocol. However, it was agreed that there would need to be a balance between the benefits of data robustness and the ability to reliably interpret results and compare across studies, and costs of additional workload required for extra bioassays and the ease of access to molecular characterisation.

#### 3.1.4. Sample and Rear Wild Resistant Populations for Each Round of Testing and Save Samples for Characterisation

Where test facilities do not have access to a well-characterised resistant strain, or where issues, such as colony collapse or loss of resistance, result in non-availability of a suitable strain, a pragmatic alternative approach, sometimes employed, is to collect and rear wild resistant field populations for bioefficacy testing. Since wild-caught mosquitoes are likely to demonstrate large variability in the level of resistance and general robustness between collections, a cohort could be stored from each testing point for molecular characterisation. If sufficient material was available then phenotypic resistance and measures of fitness, including wing length, could be measured in parallel. While this approach still requires the team to have the capacity to maintain and characterise strains, less long-term commitment of resources may be needed, compared to holding strains over the whole course of the study. In some settings it may be very challenging to establish stable resistant colonies and using material maintained in the insectary for a generation or two to complete a study might be more practical. The biosafety concerns of transporting resistant mosquitoes between facilities can be avoided using local strains.

However, there would be a concern, especially when using F1s, that the mosquitoes tested are a mix of different species; this could complicate interpretation of results, power calculations, and assay replicate requirements. The storage of samples for later analysis would also help with this element of characterising the testing cohort of mosquitoes. If a colony could be established for later testing points in the study, the strain could then be screened regularly and become better characterised.

For some purposes, using field-collected, or recently established, colonies of mosquitoes may be desirable. For example, it may be more predictive of field performance of an ITN than using established laboratory strains, since mosquito populations at different geographical sites may differ in their susceptibility to a given product [51], owing to the different resistance mechanisms they express, and potential for variability in levels of resistance across seasons [47]. Additional information would also be gained about the predicted ongoing efficacy of the nets locally by using locally-collected mosquitoes for durability monitoring, which possess field-relevant mixtures of resistance phenotypes. This may be important for NMCPs when making ITN procurement decisions, though this may not be the case if recently caught wild mosquitoes are being mixed in culture with previously colonised wild mosquitoes. However, bioefficacy testing for ITN durability monitoring requires capacity to detect a change over time, so reproducibility of results and consistent longitudinal use of a well characterised strain is critical. If tunnel tests are required for testing of a dual-AI net (e.g., Interceptor G2), wild collected mosquitoes are unlikely to be suitable, due to low levels of attraction to guinea pigs, which often results in low levels of blood-feeding success in untreated control tunnels. For this reason, the group saw this as a backup option rather than the primary approach for using resistant mosquitoes as part of durability monitoring or similar study.

#### 3.1.5. Conduct All Testing in a Few Chosen Test Facilities

Depending on the study design and available resources, it may be possible to standardise all bioefficacy testing by sending all sample ITNs to a single facility or to a small number of test facilities. In this way, the number of mosquito strains used across the study would be minimised, reducing variability between data sets. Other potential sources of variability are also controlled for, such as operator differences or the effect of different testing conditions. Comparing data between time points in a study would be easier than compiling data from multiple test facilities.

On the other hand, the need to test a large number of samples in a single test facility might cause a delay in processing the collected net samples, with the associated risks of changes to the resistance level of the mosquito strain between the start and the end of testing. Although net samples can be stored in refrigeration, there is also a risk of degradation during storage. This approach provides no control for the mosquito population changing between time points. Outsourcing testing to a single or small number of testing centres is unlikely to fit within country-specific NMCP capacity development objectives. It may present challenges to the funders of durability monitoring studies. This was not a preferred approach.

#### 3.1.6. Send All Samples to Several Labs for Repeat Testing in a Multi-Centre Study

Testing net samples in several laboratories could avoid the need to characterise mosquito strains in detail by testing the same samples against a different strain in each facility and evaluating result consensus by compiling the data, and assessing variability in results. Since the AIs may be unevenly distributed across a single ITN [52], giving different results from different samples of the same net, pieces should be cut along the same band to distribute to multiple facilities for parallel testing. Wherever possible, this testing would be done blinded.

This approach multiplies up the testing workload by the number of facilities. To reduce the additional workload, a study could circulate a sub-sample of net pieces to additional test facilities for confirmatory testing of the results obtained by the primary test facility, with careful consideration given to how to manage a situation where results did not match between facilities. Transporting ITN samples, particularly between countries, can be challenging. There is a risk of further degradation of samples, due to delay and during transport between facilities, and a requirement for each facility to colonise and maintain a resistant colony of mosquitoes. From a quality control point of view, it is good practice for a study to repeat testing on at least a subset of samples at different test facilities. It could be done intermittently as an additional level of quality control. However, from a logistics, and particularly from a cost, point of view, this would not be a feasible approach to standardisation for all studies.

#### 3.1.7. Measure the Added Effect of a Dual-AI ITN Relative to a Pyrethroid-Only Net

Although it would always be beneficial to understand the nature of the strain of resistant mosquito being used for testing, an additional or alternative approach to measuring the bioefficacy of the non-pyrethroid AI is simply to expose the mosquitoes to both a pyrethroid-only net and the dual-AI ITN under evaluation, and use the difference in mortality between the two as the endpoint. This approach would allow the comparison of the additional mortality induced by the second AI between time points to be used as a measure of continued bioefficacy. Where the endpoint caused by the second AI is different to the mortality caused by the pyrethroid, for example in the case of PPF, which causes sterilisation, no correction is needed and the level of sterilisation is the measure of the bioefficacy of the second AI. Evaluation of this effect is only possible by using a highly pyrethroid-resistant strain so that sufficient mosquitoes survive exposure to the dual-AI ITN and can be scored for fertility. This is conceptually a simple and attractive approach, removing pyrethroid content as a variable and controlling for variability between strains or changes in a strain over time, at least in terms of the pyrethroid resistance phenotype. However, this approach assumes that tolerance of a strain to the second AI does not change over time, so that even if the strain changes in its pyrethroid resistance, its response to the second AI remains constant. It also assumes that there is no cross-resistance, i.e., that the mechanisms conferring resistance to pyrethroids do not also confer resistance to the second AI, so that if susceptibility to one AI changes over time, susceptibility to the second AI remains unaffected. This may be true for some new insecticides, but there is evidence of cross-resistance mediated by cytochrome P450 enzymes [53,54], including between pyrethroids and pyriproxyfen [55], so it cannot be assumed.
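The difference-in-mortality endpoint can be sketched in a few lines of Python. This is an illustrative calculation only, assuming mortality is scored as simple counts pooled across replicates; the function name and the example counts are hypothetical. The interval is a simple Wald interval for a difference in proportions, which ignores clustering by replicate and day of testing and so is likely to understate the real uncertainty.

```python
import math


def added_mortality(dead_dual, n_dual, dead_pyr, n_pyr, z=1.96):
    """Difference in mortality (dual-AI minus pyrethroid-only) with a Wald CI.

    Returns (difference, lower, upper) as proportions.
    """
    if n_dual <= 0 or n_pyr <= 0:
        raise ValueError("group sizes must be positive")
    p_dual = dead_dual / n_dual
    p_pyr = dead_pyr / n_pyr
    diff = p_dual - p_pyr
    # Standard error of a difference between two independent proportions
    se = math.sqrt(p_dual * (1 - p_dual) / n_dual + p_pyr * (1 - p_pyr) / n_pyr)
    return diff, diff - z * se, diff + z * se


# Made-up counts: 80/100 dead on the dual-AI net, 20/100 on the pyrethroid-only net
diff, lower, upper = added_mortality(80, 100, 20, 100)
print(f"added mortality {diff:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Repeating the calculation at each time point, with the same pyrethroid-only control, would give the change in added mortality over time that this approach relies on.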

If there is an interaction between the pyrethroid and the second AI in the formulated dual-AI ITN, then it may not be possible to make a straightforward comparison; the two AIs may not act independently, making a direct comparison between mortality on the pyrethroid-only versus the dual-AI net samples problematic. If it was possible to obtain comparable ITNs treated with each AI alone to compare bioefficacy of each with bioefficacy of the dual-AI ITN, then a direct comparison could be made, and investigation of cross-resistance would be facilitated. On the other hand, the change in bioefficacy over time is relevant to a durability monitoring study. If the mortality caused by the pyrethroid-only net is sufficiently low, it should still be valid to compare the additional mortality caused by the dual-AI ITN sample between time points.

Pyrethroid content is only removed as a variable if the pyrethroid-only net is equivalent to the pyrethroid content on the dual-AI ITN, in terms of the identity and concentration of the pyrethroid, as well as factors that might affect bioavailability, such as ITN formulation and impregnation method. For example, incorporated and coated nets may have different surface concentrations of AI and consequent bioavailability even where the total insecticide content is the same. This comparison becomes complicated for combination nets, such as the PermaNet 3.0, where the pyrethroid content is different on the roof and on the side panels. The selected pyrethroid-only net should be as close as possible in all characteristics to the ITN under evaluation, particularly for insecticide dose and bleed rate (where known). For some dual-AI ITNs no suitable pyrethroid-only net is available. A specifically matched pyrethroid-only net would likely rely on manufacturers producing small batches specifically for the purpose. This is not realistic, without incentive such as making it a requirement as part of the WHO Vector Control Product Prequalification (PQ) process, for example, and so the closest matching net would have to be used. The positive control should be kept consistent between time points; it may not be essential to be consistent between facilities if the relative change in additional mortality from the second AI over time is measured. A definition of 'brand new' or positive control net would be needed, along with guidance on storage conditions, especially for newer brands of nets, a method for washing and washing interval for the insecticide's regeneration.

Validation of the method against different second AIs using a range of resistant strains would be needed to have confidence in this approach, including the development of guidelines for the interpretation of results, establishing the threshold of killing when comparing the two nets, including the target minimum mortality among the resistant strain when exposed to the pyrethroid-only net, and gaining an understanding of the level of variability inherent in the assay. Additional controls could include exposing a susceptible strain alongside the resistant strain, including an unused and unwashed dual-AI ITN, or a net sample which only contains the second AI. However, this would likely have to be produced specifically for the strain characterisation by the ITN manufacturers, and again this is unrealistic without incentive.

It was agreed that this approach does not give sufficient standardisation for the durability monitoring studies under consideration. It might be enough for other purposes, such as screening field populations known to be resistant to pyrethroids to inform deployment decisions, but the variation inherent in these tests would likely lead to such wide confidence intervals in the data that it would not be sufficient for providing evidence to the WHO PQ Unit for vector control products assessment (PQT/VCP) of continued bioefficacy as part of durability evaluation in a product dossier. However, the consensus was that including a pyrethroid-only net in durability monitoring bioassays as a control would be good practice, if suitable net samples are available. The specific characteristics of the control net (brand, batch number, age, polymer, insecticide type and concentration, coated or incorporated, storage conditions) should be reported alongside the assay results. Where a pyrethroid-only control is not equivalent to the pyrethroid content or presentation in the dual-AI ITN, it could still be used as a proxy indicator to help calibrate and interpret test results, rather than as an exact measure from which to infer directly the additional mortality induced by the second AI. The additional control of a brand-new dual-AI net would also be a way to control for the variability of the resistant strain over time, though with some of the same practical caveats as above. More generally, comparison between the bioefficacy of a pyrethroid-only ITN and a dual-AI ITN will help to inform procurement decisions.

#### 3.1.8. Perform Bioassays of Nets from Multiple Time Points Side by Side at the End of the Study

To control for variability in the resistant mosquitoes used for testing over time or between test facilities, all nets sampled during the study could be stored and then tested in a short period at the end of the study. The major disadvantage of this approach is that information about the expected performance of the nets would not be gathered in real time. Since durability monitoring is currently the main means of identifying quality issues with nets, this would have significant operational impact. There would also be the challenge of performing a large number of bioassays in a short period, rather than a smaller number at each time point, and the risk of a catastrophic event leading to loss of net samples from the whole study with no durability data being collected at all. Practical issues worthy of consideration are the potential for nets to degrade further during storage and the need for substantial storage space under specific controlled conditions. This approach to standardisation was agreed not to be suitable as a standalone standardisation measure.

A compromise would be to store a subsample of nets at each time point, after they have been collected back and used for bioassays, and repeat testing on this subsample at the end of the study, where resources allow. This has the advantage of confirming the results of bioassays conducted during the survey in side-by-side testing with minimal variation in the mosquitoes used, and could also be used to try to understand any unusual results observed during the study. If data collected during the study were felt to be robust enough, a decision could be made to scale back this final testing or not to continue with it at all, but the samples would be available as a backup. Additional standardisation measures would need to be taken during the initial bioassay testing performed during the study. Still, the group thought this could be a valuable addition to other characterisation or standardisation measures for WHO PQT/VCP studies, and monitoring of durability of nets in operational deployments. There is the opportunity to build this into existing durability monitoring protocols used by PMI-supported studies and others [56], where nets are removed from use, typically within six months and annually for three years, for durability monitoring and replaced with new nets. Currently, these replacement nets are excluded from any further monitoring, but by the end of the study would represent nets of ages corresponding to each time point of the study and could be collected at the end for a final confirmatory round of bioassays. Important caveats of this approach include: there would still be a large amount of testing to be done at the end of the study; a larger number of replacement nets would need to be distributed to account for attrition and leave a large enough sample for the final testing; and careful record-keeping would be required, as different batch numbers may be distributed at different time points, adding a layer of complexity. Nevertheless, the additional quality control of data could be used to justify the additional logistics and expense of this approach.

Information would need to be generated on the likely variability between original and replicate testing inherent in the bioassay, so that the results of the repeat testing could be interpreted judiciously. Consideration should be given to how to report results of this repeat testing, particularly if initial monitoring data have been distributed or published already; protocols published ahead of the trial could make it clear that this repeat testing is part of the study design and that careful interpretation and reporting of results which do not completely align will be required.

#### 3.1.9. Use a Model System Other Than a Conventional Bioassay Using Mosquitoes of the Target Species

Conventionally, the durability of an ITN is tested using defined measurements of physical integrity, insecticide content and bioefficacy. For bioefficacy, the cone bioassay, in which mosquitoes of the target species are exposed to a net sample and mortality is scored, is the accepted measure of field-collected ITNs over time [9]. Since the purpose of durability monitoring is to detect any change in bioefficacy of the net sample over time (i.e., ITN age), in a system that otherwise gives consistent results, the testing does not have to be against the vector species of interest. For bioefficacy testing in general, it is unnecessary to use the species against which a product will be targeted, as long as their relative sensitivities in a bioassay are understood. *Aedes* mosquitoes, particularly *Ae. aegypti*, can be reared in large numbers [57–60], with the added benefit that their eggs are resistant to desiccation and can be stockpiled until sufficient eggs have been produced for a round of testing. It may even be possible to use a model organism, such as *Drosophila melanogaster*, to replace mosquitoes altogether, which has less of a containment risk and is easier to maintain, possibly expanding the number of test facilities able to perform durability monitoring. Validation would be needed to show that the chosen bioassay was appropriate for another species and that the species was sensitive to a change in AI concentrations across the relevant range. Even then, there may be reluctance to rely on results from a non-target species to test the efficacy of a product primarily aimed at anophelines.

New technologies are emerging which might offer a valid alternative to conventional bioassays or mosquito strains established from field-collected material. Transgenic strains of *Anopheles gambiae* over-expressing specific P450 enzymes, known to be important in conferring pyrethroid resistance, can be used to detect and characterise cross-resistance between insecticide classes [61]. A strain could potentially be produced that over-expressed the enzymes known to cause resistance to the second insecticide in a dual-AI ITN, expressing a very tightly defined resistance mechanism in a known genetic background.

Measures of bioefficacy suffer from high variability due to inherent bioassay variation and biological variation between mosquito populations. Chemical analysis of the total insecticide content of a net sample, for example, by HPLC, may be more reproducible but is not sufficient as a measure of bioefficacy, since it is the availability of biologically active insecticide on the surface of a net that determines its efficacy against mosquitoes [39]. However, if it were possible to sample and quantify the amount of bioavailable insecticide on a net surface, this might be quicker than performing bioassays and an equally informative measure of bioefficacy. It would need to be correlated with the results of bioassays to be validated as a replacement method.

Novel techniques and approaches warrant further investigation, especially as ITNs continue to evolve. However, a method to monitor the residual bioefficacy of dual-AI ITNs is needed urgently, precluding much analysis of available options or the development of new systems.

#### *3.2. The Final Protocol: Characterisation of the Resistant Strain in Parallel with Bioassays*

The protocol for characterising the resistant mosquitoes used in bioefficacy testing with dual-AI ITNs is outlined in Figure 1, and a detailed standard operating procedure (SOP) is provided as Supplementary Information. The group agreed on this approach following several rounds of discussion on the merits of each of the proposed strategies and refinement of this preferred approach. The primary concern of the group was durability monitoring studies with dual-AI ITNs, but the protocol could be adapted to new types of ITNs as they are developed, to other product types, such as indoor residual spray (IRS) formulations, or attractive toxic sugar baits (ATSBs), and for other kinds of studies requiring resistant mosquito strains.

Since resistance changes over time, in both wild mosquito populations and laboratory strains, even when consistent selection pressure is applied, the only way to be confident in the resistance phenotype at the time of testing is to characterise the resistant strain simultaneously with bioassaying of the ITN samples. Depending on the study design, describing each cohort of mosquitoes used for bioassays on net samples could be laborious. Instead, a resistant strain could be characterised at the start and end of a study, for example, all net samples collected in a given year or from a given district. The following elements were considered to be key to the characterisation:

	- a. A positive control brand-new pyrethroid-only net, containing the same pyrethroid content as the dual-AI ITN, should be included, for several reasons. Firstly, it provides an additional measure to ensure the strain has sufficient pyrethroid resistance, and will help interpret the results from the sample ITN bioassays. Pyrethroid content is controlled for as a standard variable used to calibrate results between time points in the test net samples. Finally, exposing mosquitoes to a brand-new pyrethroid-only net alongside the dual-AI test nets would demonstrate the added benefit of the second AI. Where multiple brands of such nets are available, the net most similar to the test net without the addition of the second AI should be selected, where possible of the same material and with the correct AI applied similarly (incorporated/impregnated) at the target concentration and release (bleed) rate. The brand should be consistent across the study.


Where both AIs induce mortality, the endpoint measures for the two AIs should be the same. If the outcome for the second AI is delayed mortality, then mortality caused by the pyrethroid should be measured for the same period. For example, in products containing chlorfenapyr, mortality is typically measured to 72 h [30,42], and so mortality should also be measured to 72 h for the pyrethroid treatments in the strain characterisation. This would control for additional delayed mortality in the resistant strain caused by the pyrethroid, which has been measured in some, but not all, strains tested [62–64], which could mean that for the purposes of the study the strain was not sufficiently resistant. Scoring knock-down and 24 h mortality for the pyrethroid exposure as well might be useful for comparison with historical data. If the second AI induces a different endpoint, for example sterility, and the study is aiming to measure the effects of each AI separately, then it would be necessary to include investigation of pyrethroid exposure on that endpoint.

A number of additional measures were recommended by the group as general good practice and to further characterise and standardise the resistant mosquitoes used for bioefficacy testing:


	- ◦ Control for longitudinal variability in strains
	- ◦ Be a second measure of how much of the original bioefficacy has been lost over time in addition to the comparison between results obtained at the different time points
	- ◦ Allow the additional mortality caused by the dual-AI ITN over that of the pyrethroid-only net to be calculated at each time point and compared longitudinally
	- ◦ Control for any effects of declining content of the first AI over time. This is particularly important as the wash resistance of the pyrethroid and the second AI may be different, and so the additional benefit of the second AI may be lost before that of the pyrethroid
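
The additional-mortality comparison in the list above can be illustrated with a short calculation. The following Python sketch is illustrative only: the function name, the data layout and all mortality values are hypothetical and are not taken from the study or the SOP.

```python
def additional_mortality(dual_ai_pct, pyrethroid_only_pct):
    """Percentage points of mortality added by the dual-AI ITN sample
    over the brand-new pyrethroid-only net at a single time point."""
    return dual_ai_pct - pyrethroid_only_pct


# Hypothetical mortality (%) in the resistant strain, keyed by net age in months.
# Each tuple is (aged dual-AI ITN sample, brand-new pyrethroid-only control).
results = {0: (95.0, 20.0), 12: (80.0, 25.0), 24: (60.0, 22.0)}

# Additional mortality per time point, ready for longitudinal comparison.
added = {month: additional_mortality(dual, pyr)
         for month, (dual, pyr) in results.items()}
```

Because the pyrethroid-only control is tested with the same mosquito cohort at each time point, a fall in the added mortality is less likely to reflect drift in the strain alone.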

It is strongly recommended that the results of strain characterisation be presented alongside study data to aid the interpretation of bioefficacy results. An example of how this might be done is shown in Table 3.

As an additional standardisation measure, the group proposes that, for durability monitoring bioassays, a sub-set of dual-AI ITN samples is retained from each study time point so that bioefficacy testing, and characterisation of the strain, can be repeated with ITN samples from all time points in parallel at the end of the study. If the nets are stored appropriately to minimise degradation over time, this additional test allows a direct comparison between samples that minimises differences in the mosquito population and reconfirms the trend in mortality measured over time during the study. Since the bioassays were also performed during the study, a data set would still have been generated if storage conditions turned out to be unsuitable and samples were degraded, lost or damaged over time.

**Table 2.** Some key considerations for maintaining consistent quality of mosquitoes being reared for bioassays and conditions during bioassays, and steps that can be taken to monitor quality of mosquitoes.


1 In decreasing order of preference: composite fitness indices, wing morphometrics [69], wing length, dry weight, wet weight; 2 Mosquitoes may be reared on an adjusted light cycle to accommodate testing at a specific point in their circadian rhythm within working hours.

**Figure 1.** Overview of protocol for characterisation of a pyrethroid resistant strain for use in testing the bioefficacy of a dual-AI ITN, developed by consensus of a group of key stakeholders. Where delayed mortality (scored after more than 24 h) is the endpoint of interest for the second AI, mortality should be scored at this later time point for all elements of the characterisation; mortality may also be scored at 24 h.

An alternative to the retention and repeat testing of nets at the end of the study may be possible and has some advantages as an additional standardisation step. At each time point during a durability monitoring study, a sample of nets is collected from the field for destructive sampling (i.e., bioassay) and replaced with new nets of the same brand to prevent the household from being left unprotected. At the end of the study, a sample of these replacement nets could be collected alongside the nets being sampled for the final time point, and bioassays performed on all nets in parallel. In this way, nets of all ages could be tested side by side for a more direct comparison, with the same characterised strain of mosquitoes [56]. This approach avoids the risk of degradation of nets collected at each time point and held until the end of the study for parallel repeat testing. A more complicated study design is required, and additional nets would have to be distributed to ensure sufficient nets remained at 36 months, since nets get discarded as they wear out. In carefully conducted research studies that employ unique labelling of individual nets, this should be possible if the additional cost can be supported, but it is unlikely to be feasible for programmatic evaluations.

**Table 3.** Characteristics of a pyrethroid resistant and susceptible mosquito strain used for bioefficacy monitoring of dual-AI nets (an example is a strain used to monitor Interceptor G2, chlorfenapyr + alpha-cypermethrin ITN). This is the recommended format for presenting the results of strain characterisation, which should be provided alongside bioefficacy testing with dual-AI ITNs.


#### *3.3. Considerations and Points of Discussion in Deciding on the Final Protocol*

#### 3.3.1. Sample Size

When producing data to characterise a mosquito strain, the more mosquitoes tested, the more robust the result; this is achieved by increasing the number of replicate assays (cones, tubes or bottles). Given the inherent level of variability in the bioassays proposed, it would be desirable to recommend a minimum number of replicates on which a result should be based. The protocol proposed here uses the WHO test procedures for resistance monitoring [42] as a baseline measure of how many replicates are required for each assay, but as more data are produced using this protocol, more robust power calculations, or the application of modelling, can be used to refine the recommendation. However, if F1 mosquitoes are used for testing, upward adjustment of sample sizes might be considered because of greater inherent variability compared to (inbred) laboratory strains [40], and is essential if species mixtures are expected.
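
To illustrate why testing more mosquitoes gives a more robust estimate, the sketch below computes an approximate 95% Wilson score interval for a mortality proportion. This is a generic statistical illustration and not part of the proposed protocol: the function name and the counts are hypothetical, and pooling mosquitoes across replicates ignores between-replicate variation, so real intervals would likely be wider.

```python
import math


def wilson_interval(dead, exposed, z=1.96):
    """Approximate 95% Wilson score interval for a mortality proportion.

    Assumes mosquitoes respond independently (no replicate-level clustering).
    """
    if exposed <= 0:
        raise ValueError("exposed must be positive")
    p = dead / exposed
    denom = 1 + z**2 / exposed
    centre = (p + z**2 / (2 * exposed)) / denom
    half = z * math.sqrt(p * (1 - p) / exposed + z**2 / (4 * exposed**2)) / denom
    return centre - half, centre + half


# Same observed mortality (60%), different numbers of mosquitoes tested.
small_low, small_high = wilson_interval(30, 50)
large_low, large_high = wilson_interval(120, 200)

small_width = small_high - small_low
large_width = large_high - large_low
```

With the same observed mortality, the interval from 200 mosquitoes is roughly half as wide as the interval from 50, which is the sense in which additional replicates make the characterisation more robust.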

#### 3.3.2. Controlled Conditions during Characterisation of Strains

As with bioefficacy testing generally, it is necessary to control the climatic conditions during the strain characterisation bioassays. At minimum, the temperature, relative humidity, and time of day should be recorded and closely monitored in case of electricity cuts or other fluctuations. A reporting checklist would be helpful to encourage accurate reporting of whether the SOP was followed thoroughly or whether deviations occurred for whatever reason. This will aid downstream interpretation of the results, and if temperature or humidity variation is implicated in production of apparently aberrant results, repetition of tests that were conducted out of specified ranges is advisable. Additionally, depending on the nature of the second AI, the time of day at which the bioassays (both strain characterisation and durability testing) are conducted might be critical [26,70–72]; for example, in evaluating a dual-AI ITN containing chlorfenapyr [25]. Some key parameters to consider standardising when performing bioassays with mosquitoes are suggested in Table 2.

#### 3.3.3. The Approach Selected Must Be Applicable in Most or All Test Facilities

Proposed protocols must be practical, affordable, safe, and accessible in strain availability and facilities to maintain and characterise mosquitoes. The more criteria for suitable strains in place (multiple resistance mechanisms, resistance levels, characterisation methods), the more difficult it might be for test facilities to meet these criteria.

#### *3.4. Deciding on Criteria for a Suitable Resistant Strain*

The group's discussion over what criteria to set for a resistant strain was an attempt to strike a balance between a desire to characterise the strain in the greatest possible detail, to allow the best interpretation of data and comparison between data sets, and the pragmatic considerations of how much additional resource burden could be borne by programmes evaluating the dual-AI ITNs. The final criteria agreed by the group are highlighted in the protocol overview in Figure 1.

There was an agreement to recommend using a single well-characterised strain for all testing within a study to remove this as a possible source of variation. For ITN efficacy testing, it may be important to test against resistant strains of all significant *Anopheles* vector species, but this is not critical for durability monitoring. Indeed, the species used for the bioassays need not be a target of the ITN at all, as long as it is validated for the assay, that is, shown to be sensitive to changes in bioavailable AI across relevant concentrations of the second AI.

There was a preference to use a strain with resistance conferred by multiple mechanisms, to produce the most widely applicable results; however, to confirm the presence of multiple mechanisms is complex and beyond the capacity of many test facilities. Although resistance to the first AI, currently always a pyrethroid, in the dual-AI ITN is the relevant requirement of the mosquito strain, and sufficient susceptibility to the second AI is required, a broader resistance profile might be desirable. It is likely sufficient to demonstrate resistance to the specific pyrethroid in the product under evaluation. Still, there may be a benefit to knowing that multiple resistance mechanisms are acting and using a strain shown to be resistant to pyrethroids in general and other insecticide classes. The consensus was that more information may always be desirable, and would help explain variable results across time or between test facilities. Still, an understanding of the resistance mechanisms present is probably not necessary for the question at hand.

Overexpression of cytochrome P450s appears to be the mechanism most commonly implicated in metabolic resistance and cross-resistance [53,56]. So, upregulation of P450s would be a desirable minimum criterion in a resistant strain. This could be demonstrated by characterising expression levels of a panel of key enzymes in the resistant strain (as detailed in [19]), and the potential for cross-resistance with the second AI of interest could be predicted with the use of P450 screens [54]. To adequately describe a strain's metabolic resistance risk, the most relevant molecular markers would need to be identified, along with the P450s most important in conferring resistance to the first AI, and then acceptability criteria based on fold-increase in expression relative to a susceptible strain would have to be established. This is challenging, however, and a strain showing a broad overexpression profile, including at least some known key enzymes (with proven insecticide metabolic capacity), may be more realistic. The interaction of P450s with the second AI would ideally be characterised as well. These analytical methods are specialised and relatively expensive, but regional reference laboratories may support programmes in analysing mosquito strains for this purpose. Given restricted resources, a programme could set out to molecularly characterise the key strain, or strains, used for bioefficacy testing in durability monitoring at least at the start of the study. However, P450 expression levels are likely to change over time, particularly under the selective pressure usually applied to laboratory-maintained resistant strains, so repeated analysis, perhaps of a reduced set of key markers or enzymes identified during initial characterisation, is desirable.

Given the costs associated with a more sophisticated analysis of resistance mechanisms, a pragmatic alternative is to demonstrate the involvement of metabolic resistance (primarily attributable to P450 enzyme activity) in the selected strain using a PBO synergism assay. Demonstrating that mortality is increased by PBO pre-exposure followed by a pyrethroid exposure relative to a pyrethroid alone may be sufficient to demonstrate the presence of P450-mediated resistance. This could be done with a WHO tube assay [42] or exposure to a locally relevant pyrethroid-PBO ITN, which would give useful efficacy data relevant to the local setting. Moving away from the standard protocol would make comparing facilities more challenging, though the standard synergism assay may not always be very informative [40]. Testing with other synergists might be informative, as might testing the effect of PBO pre-exposure followed by exposure to the second AI. Still, standard methods have not yet been established [41].
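
The comparison underlying a PBO synergism assay can be sketched as follows. This Python example is a hedged illustration rather than the SOP: it applies Abbott's formula, a commonly used correction for control mortality, and then takes the difference between the two treatment arms. The function names and all mortality values are hypothetical, and no acceptance threshold for the size of the increase is implied.

```python
def abbott_corrected(test_pct, control_pct):
    """Abbott's formula: test mortality (%) corrected for control mortality (%)."""
    if not 0 <= control_pct < 100:
        raise ValueError("control mortality must be in [0, 100)")
    return 100.0 * (test_pct - control_pct) / (100.0 - control_pct)


def synergism_gain(pbo_then_pyrethroid_pct, pyrethroid_alone_pct):
    """Percentage points of mortality restored by PBO pre-exposure."""
    return pbo_then_pyrethroid_pct - pyrethroid_alone_pct


# Hypothetical tube-assay results (%) for one resistant strain.
control = 4.0
pyrethroid_alone = abbott_corrected(40.0, control)
pbo_then_pyrethroid = abbott_corrected(88.0, control)

gain = synergism_gain(pbo_then_pyrethroid, pyrethroid_alone)
```

In this made-up example the corrected mortality rises from 37.5% to 87.5% with PBO pre-exposure, the kind of increase that would be consistent with P450-mediated resistance.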

The most pragmatic way to determine that a strain is suitable for monitoring the durability of the second AI in a dual-AI ITN is to confirm its resistance to the first AI, and ensure that it meets the acceptable criteria of mortality in a standard bioassay. This can be done through WHO tube bioassays using 1×, 5× and 10× DCs and selecting strains that are, for example, at least moderately resistant (<90% mortality at 5× DC) according to WHO definitions [42]. The resistance level could be determined more precisely using dose-response experiments to calculate LC50 values and resistance ratios relative to a susceptible comparator strain. If a standard SOP was used, these results could be compared between test facilities. Criteria that a minimum fold-increase in resistance be met before a strain was used for durability monitoring could then be set, though this is a labour-intensive approach, particularly since the LC50 for a susceptible population would ideally be set using multiple susceptible strains in a multi-centre study, to overcome the noise that is inherent in this approach. There may not be a need for strict resistance criteria since durability monitoring simply needs to detect a change in bioefficacy over time. However, a sufficient proportion of the exposed mosquitoes must survive exposure to the first AI to allow detection of an effect of the second AI.
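
The two ways of expressing resistance described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and LC50 values are hypothetical, and the 90% threshold at 5× DC is simply the example criterion quoted in the text, exposed as a parameter so that a study could substitute its own cut-off.

```python
def resistance_ratio(lc50_resistant, lc50_susceptible):
    """Fold-increase in LC50 of the resistant strain relative to a
    susceptible comparator strain (same units for both values)."""
    if lc50_susceptible <= 0:
        raise ValueError("susceptible LC50 must be positive")
    return lc50_resistant / lc50_susceptible


def meets_intensity_screen(mortality_5x_pct, threshold_pct=90.0):
    """True if mortality at 5x DC falls below the chosen threshold."""
    return mortality_5x_pct < threshold_pct


# Hypothetical values for illustration only.
rr = resistance_ratio(lc50_resistant=12.0, lc50_susceptible=0.5)
suitable = meets_intensity_screen(mortality_5x_pct=62.0)
too_susceptible = meets_intensity_screen(mortality_5x_pct=96.0)
```

A resistance ratio is only as reliable as the susceptible LC50 it is divided by, which is why the text suggests establishing that baseline across multiple strains and centres.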

It is important to measure the susceptibility of the resistant strain to the second AI in the product under evaluation in the absence of the first AI as part of strain characterisation. Even where an insecticide has previously not been used for mosquito control, resistance may have emerged as a result of agricultural use [2]. There is also the potential of cross-resistance to an insecticide with a different mode of action in mosquitoes resistant to pyrethroids, possibly through more general mechanisms that increase metabolism or reduce penetration. For example, the same metabolic enzymes appear to target pyrethroids and pyriproxyfen [56,57,73]. Programmes measuring the efficacy of a new vector control product should monitor the target population for emerging resistance. Still, it is also desirable to show that the resistant strain used to test the durability of the second AI does not already have a level of cross-resistance to it and that such resistance does not develop during the study. For PBO products this can be established during characterisation of pyrethroid resistance, as described above, and where the WHO recommends a DC and suitable methodology, this can be built into the strain characterisation [42]. Where such a method is not available for the second AI, cross-resistance may be predicted through molecular analysis [54], but this would normally need to be the subject of substantial additional investigation.

It is possible that the methodology selected for the bioefficacy component of durability monitoring could affect the criteria for a suitable resistant strain. For example, the cone test and tunnel test involve very different modes of exposure, environments and exposure times in which mosquitoes encounter a net sample, and a strain that is not killed by the pyrethroid in an ITN in a cone test may be killed in a tunnel test. Where a non-standard methodology is used to measure bioefficacy, it is recommended that data from the baseline bioassays with a selected resistant strain be reviewed along with the data from the strain characterisation exercise to confirm that the strain and standard bioassays are suitable for that specific study.

#### *3.5. Cost Implications of Adding Strain Characterisation to a Study*

The development of this characterisation protocol seeks to outline an optimum method to characterise and standardise resistant strains for use in bioefficacy testing of dual-AI ITNs. These efforts are required to produce robust and reliable data, but additional funds will be needed to support the additional testing. Following the SOP detailed in Supplementary Information would add a workload consisting of six WHO tube assays with a pyrethroid, six WHO tube assays for the synergist experiment, six DC assays for the second AI and five cone tests with a brand-new pyrethroid-only ITN with the resistant strain, plus two WHO tube assays with a pyrethroid and five cone tests with a brand-new pyrethroid-only ITN with a susceptible reference strain, a total of 475 resistant and 75 susceptible mosquitoes. If multiple types of dual-AI ITNs were included in a study, six additional DC bioassays would need to be added for each non-pyrethroid AI in the study, plus additional new dual-AI ITN positive controls, if included. The same mosquitoes can be used for QC and samples stored for later analysis, but these steps will require time commitment and consumables. Wing length analysis requires access to a microscope and image software or graticule, and further molecular analysis of samples may be required.
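
The mosquito totals quoted above can be reproduced with a short tally. In the Python sketch below, the numbers of assays come from the text, but the per-assay mosquito counts (25 per tube or DC assay, 5 per cone test) are assumptions chosen because they reproduce the stated totals; the SOP in Supplementary Information remains the authoritative source. The dictionary keys and function name are hypothetical.

```python
MOSQUITOES_PER_TUBE = 25  # assumption, not stated in the text
MOSQUITOES_PER_CONE = 5   # assumption, not stated in the text

# Number of assays per strain, as listed in the text.
resistant_assays = {
    "tube_pyrethroid": 6,
    "tube_synergist": 6,
    "tube_second_ai_dc": 6,
    "cone_pyrethroid_only_itn": 5,
}
susceptible_assays = {
    "tube_pyrethroid": 2,
    "cone_pyrethroid_only_itn": 5,
}


def mosquitoes_needed(assays):
    """Total mosquitoes for a set of assays; keys starting with 'cone'
    are counted as cone tests, all others as tube assays."""
    total = 0
    for name, count in assays.items():
        per_assay = (MOSQUITOES_PER_CONE if name.startswith("cone")
                     else MOSQUITOES_PER_TUBE)
        total += count * per_assay
    return total


resistant_total = mosquitoes_needed(resistant_assays)
susceptible_total = mosquitoes_needed(susceptible_assays)
```

Under these assumptions, adding a further non-pyrethroid AI to the study would add six more DC assays, or 150 additional resistant mosquitoes.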

For a single experiment, strain characterisation would be a one-time cost. For a study lasting up to a month, strain characterisation should be completed before the study and repeated at the end of the study. For longer studies, characterisation should also be repeated within one month of finishing, to ensure that the resistant strain has not changed in resistance profile and is still suitable, and where possible repeated during the study, on every mosquito generation if possible, or as often as practical. If resources were available to include some elements of the characterisation alongside each bioassay session, this would both characterise the strain and provide an internal control for any differences between time points arising from changes in rearing, testing conditions, operator differences, etc.

This additional cost may be small and easily borne for small-scale research or development activities by academic institutes or developers or manufacturers of insecticide-based vector control tools. ITN evaluation and procurement is, however, a very price-sensitive market. Adding additional testing to the durability monitoring protocol will add cost to already expensive trials of new ITNs [74]. Durability monitoring is largely a donor-funded activity that is already growing in scale due to more complicated bioefficacy testing methodologies for dual-AI ITNs than were required for pyrethroid-only ITNs. The benefits of these additional characterisation steps will need to be accepted by funders, including the potential costs incurred should poor quality durability monitoring results lead to poor decisions on ITN choice. Decisions to procure more expensive ITNs can be made with greater confidence if the durability monitoring data are more robust in demonstrating their residual bioefficacy. Additionally, the scale of additional testing may be relatively insignificant compared to the bioefficacy testing already included in a study. For example, one reported durability study of ITNs in Madagascar required 50,000 mosquitoes to test 400 net samples [6]. The proposed protocol has been divided into minimum essential and additional desirable steps based on available resources. An exercise in calculating the cost of the characterisation and in scoping the willingness of funders to support it may help promote the adoption of this proposed protocol. It is also likely that test facilities will support the minimum essential strain characterisation from multiple funding sources as it is incorporated into their regular facility running costs.
