*2.6. Sample Size*

Item response theory (IRT) methods were important for building the scoring algorithm [22], and to enable their use in the analysis we established that a sample of approximately 300 children was required. Considering the time needed to carry out the assessments and collect all the data, as well as the available funding and the feasibility of assessments in each country, we set the sample size at 96 children per country, 288 in total. This also allowed us to devise a quota sampling scheme with equal allocation of children to each cell (see File S1). For the IRT model used (two-parameter logistic, 2PL), 250 children are considered a sufficient sample size [23]. In addition, a sample of 288 children gives 80% power, at a two-sided alpha of 0.05, to detect a Pearson correlation of 0.16 or higher between the tool score and other contextual variables. We therefore considered the sample size sufficient for the purpose of tool validation.
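
As a rough check of the power statement for the correlation, the sketch below uses the Fisher z-transformation with a normal approximation. It is an illustration under that assumption and not necessarily the procedure used for the original calculation; the function names `min_detectable_r` and `approx_power` are hypothetical.

```python
import math
from statistics import NormalDist

# Assumption: Fisher z approximation, where atanh(r) is treated as
# approximately normal with standard error 1 / sqrt(n - 3).
_STD_NORMAL = NormalDist()


def min_detectable_r(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest correlation detectable with the given power (two-sided test)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = _STD_NORMAL.inv_cdf(power)           # quantile for target power
    return math.tanh((z_alpha + z_beta) / math.sqrt(n - 3))


def approx_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power to detect a true correlation r with n observations.

    The negligible contribution of the opposite rejection tail is ignored.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(math.atanh(r) * math.sqrt(n - 3) - z_alpha)


if __name__ == "__main__":
    n_total = 96 * 3  # 96 children per country, 288 in total
    print(f"Minimum detectable r at n={n_total}: {min_detectable_r(n_total):.3f}")
    print(f"Approximate power for r=0.16: {approx_power(0.16, n_total):.3f}")
```

Under this approximation, the minimum detectable correlation for a sample of 288 comes out at roughly 0.16, in line with the figure quoted above. Exact methods or dedicated power software may give slightly different values.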
