### *2.6. Data Synthesis*

Owing to the nature of the extracted data, the review team decided that a meta-analysis of any format (pairwise or network) was not appropriate for any data included in this review. In most cases, the studies contributing data towards a particular outcome were extremely heterogeneous. Clinical heterogeneity existed between studies in the timing of the intervention, the frequency of the blood sampling procedures, the gauge of the needle used, and the volume of blood taken per procedure. The characteristics of the mice used (strain, sex, age, etc.) also differed between studies. In the few circumstances in which studies were homogeneous enough to permit an appropriate meta-analysis, the primary authors rarely reported complete data, often providing only *p*-values, a statement of (non-)significance, or results presented as a figure or graph that the review team had to "digitize". The review team is cognizant that digitizing data from figures is a subjective, highly variable and imprecise method of collecting data, and was therefore hesitant to include data collected in this way in any formal meta-analysis.
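The imprecision of digitizing can be illustrated with a minimal sketch, which is not the procedure used by the review team. It assumes a linear axis calibrated from two reference ticks; the function name `calibrate_axis` and all pixel positions and axis values are hypothetical and chosen only for illustration.

```python
def calibrate_axis(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value.

    Assumes a linear axis defined by two reference ticks
    (pixel position, axis value). Hypothetical helper for illustration.
    """
    if pixel_a == pixel_b:
        raise ValueError("reference ticks must sit at different pixel positions")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        return value_a + (pixel - pixel_a) * scale

    return to_value


# Hypothetical y-axis: the 0 tick sits at pixel row 400, the 10 tick at row 100.
to_value = calibrate_axis(400, 0.0, 100, 10.0)

clicked_row = 250  # hypothetical row where a reader clicked the top of a bar
estimate = to_value(clicked_row)

# Change in the extracted value if the click is misplaced by a single pixel.
one_pixel_error = abs(to_value(clicked_row + 1) - estimate)

print(estimate, one_pixel_error)
```

Under these assumed values, every pixel of misplacement shifts the extracted value by a fixed fraction of the axis range. The same error enters when the reference ticks themselves are clicked, which is one reason two extractors can obtain different values from the same figure.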

Because of these limitations, and in a deviation from the methods specified in the protocol, data were synthesised for each outcome according to the Synthesis Without Meta-analysis (SWiM) reporting guideline [18]. The synthesis for each outcome is described in detail in the Results section.

### *2.7. Assessing Certainty in the Findings*

The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) approach was followed to grade the certainty of the evidence [19,20], and a Summary of Findings (SoF) table was created using the GRADEpro GDT software (McMaster University, ON, Canada) [21]. The SoF reports plasma glucose concentration (mmol/L), plasma corticosterone concentration (ng/mL), faecal corticosterone concentration (ng/0.05 g of faeces) and bodyweight (% change). The SoF is presented in Table 1.



have downgraded one level between the two

k Downgraded one level for inconsistency. Only one pairwise comparison was observed over multiple studies. However, the effect for that comparison was in the same direction across studies. Results from other pairwise comparisons varied widely across the included studies.

l Downgraded one level for imprecision: only 3 studies of limited size contributed data towards this outcome.

Primary outcome data considered using GRADE are bolded.

