### *3.3. Reliability*

Inter-rater and test-retest reliability statistics were calculated for all 121 items tested. Results are shown for the 90 well-performing items that were eventually retained in the finalized tool (Table 3) and for the 33 removed items. We report statistics pooled across all countries, because the samples within individual countries were too small to support country-level estimates. The raw agreement proportion (RAP) is reported, together with the frequencies of items with Cohen's kappa and Gwet's AC1 [31] statistics less than 0.6 (poor/fair/moderate reliability), between 0.61 and 0.80 (good), and above 0.81 (very good). As can be seen from the table, the majority of items showed very good reliability, with the poorest outcomes in the socio-emotional domain and the highest reliability values in the motor domain. Inter-rater reliabilities by domain for the retained items ranged from 0.78 to 0.95. Intra-rater (test-retest) reliability was excellent, with mean reliability by domain ranging from 0.84 to 0.96.
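To make the three agreement statistics concrete, the following is a minimal sketch of how RAP, Cohen's kappa, and Gwet's AC1 can be computed for a single item rated by two raters. It is illustrative only and is not the analysis code used in the study. The function names (`agreement_stats`, `reliability_band`), the binary 0/1 coding of responses, and the example ratings are assumptions made for this sketch. The banding also assumes that values falling between the reported cut-offs are assigned to the lower band.

```python
from collections import Counter


def agreement_stats(rater_a, rater_b, categories=(0, 1)):
    """Return (RAP, Cohen's kappa, Gwet's AC1) for two raters on one item.

    Assumes categorical ratings drawn from `categories` (binary by default).
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters need the same non-zero number of ratings")
    n = len(rater_a)
    q = len(categories)

    # Raw agreement proportion: share of children on whom the raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    count_a, count_b = Counter(rater_a), Counter(rater_b)

    # Cohen's chance agreement: product of each rater's marginal proportions.
    p_e_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

    # Gwet's chance agreement: based on the average marginal proportion.
    pi = {c: (count_a[c] + count_b[c]) / (2 * n) for c in categories}
    p_e_ac1 = sum(pi[c] * (1 - pi[c]) for c in categories) / (q - 1)

    # Kappa is undefined when both raters use a single category throughout.
    kappa = (p_o - p_e_kappa) / (1 - p_e_kappa) if p_e_kappa < 1 else float("nan")
    ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)
    return p_o, kappa, ac1


def reliability_band(value):
    """Map a statistic to the bands used in the text (boundary handling assumed)."""
    if value >= 0.81:
        return "very good"
    if value >= 0.61:
        return "good"
    return "poor/fair/moderate"


# Hypothetical ratings for one item across ten children (1 = yes, 0 = no).
rater_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
rater_b = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

rap, kappa, ac1 = agreement_stats(rater_a, rater_b)
print(f"RAP={rap:.3f} kappa={kappa:.3f} ({reliability_band(kappa)}) "
      f"AC1={ac1:.3f} ({reliability_band(ac1)})")
# prints: RAP=0.800 kappa=0.375 (poor/fair/moderate) AC1=0.706 (good)
```

In this made-up example most children pass the item, so kappa is low despite 80% raw agreement, while AC1 is less affected by the skewed prevalence. This illustrates why the two chance-corrected statistics can place the same item in different bands.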


**Table 3.** Reliability frequencies by domain for the 90 retained and the 33 removed WHO IYCD items.

\*\* NOTE: Of the items removed, 9 had inter-rater reliability kappa statistics < 0.40, and 6 had intra-rater reliability kappa statistics < 0.40. These include the 10 behavior items showing no developmental trajectories that were later added to the final tool as important non-scoring items.

\*\*\* RAP: Raw Agreement Proportion; Kappa: kappa statistic of agreement; AC1: Gwet's AC1 agreement statistic.

### *3.4. Cognitive Interviews*

Twenty-seven caregivers (nine in each country) were interviewed, and they generally demonstrated a good understanding of the questions asked of them. Their feedback was used to revise the wording of a few items. The four items that showed the highest degree of misunderstanding were: "Does your child walk backwards, two or more steps without any support?" (GRO20), "When you say 'no', does your child stop what they are doing?" (REC7), "Can your child complete a five piece puzzle?" (EXP23) and "Can your child understand on first try what is being said to him/her?" (SE12). All four items were removed, as shown in Table 4 (item numbers as in Prototype 1 [14]).
