*3.2. Item Performance*

Figure 3 illustrates examples of different item response trajectories.

Plots for all items tested are available in the second supplementary file (File S2). Approximately 90% of items followed the patterns shown in Figure 3a,b, with well-marked developmental progression in item attainment among children, both within each country and across countries. Figure 3e,f illustrates two poorly performing items: one whose trajectory is very flat (Figure 3e) and one that shows too much variation in progression between countries (Figure 3f).

Eleven of the 121 items did not show a clear developmental trajectory. Nine of these eleven poorly performing items belonged to the socio-emotional domain (9/35, or 26%, of the items in that domain), and one each belonged to the gross motor and expressive language domains (1/24, or 4%, of the items in each domain). Items that showed considerable differences in attainment across countries, together with the poorly performing items, were subjected to expert review by the core and country teams. The review (reported below) assessed whether each item was likely to be exhibiting bias (for example, due to misunderstanding or poor translation) or true differences between countries, and therefore whether it should be retained in or deleted from the finalized tool.
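The domain-level percentages above follow directly from the reported counts. The short Python check below recomputes them. The dictionary layout and the `percent` helper are illustrative choices, not part of the study's analysis.

```python
# Poorly performing items per domain, as reported: (poor items, items in domain).
POOR_BY_DOMAIN = {
    "socio-emotional": (9, 35),
    "gross motor": (1, 24),
    "expressive language": (1, 24),
}
TOTAL_ITEMS = 121


def percent(part, whole):
    # Percentage rounded to the nearest whole number, as in the text.
    return round(100 * part / whole)


total_poor = sum(n for n, _ in POOR_BY_DOMAIN.values())
for domain, (n_poor, n_items) in POOR_BY_DOMAIN.items():
    print(f"{domain}: {n_poor}/{n_items} = {percent(n_poor, n_items)}%")
print(f"all domains: {total_poor}/{TOTAL_ITEMS} = {percent(total_poor, TOTAL_ITEMS)}%")
```

The three domain counts sum to the 11 poorly performing items, which make up about 9% of the 121 items tested.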

**Figure 3.** Examples of logistic regression curves for individual items. The first two plots (**a**,**b**) show clear developmental trajectories by age for each country, with agreement between countries. The next two plots (**c**,**d**) show items with good developmental trajectories but some differences between countries. Examples of two poorly performing items are shown in (**e**,**f**). Green: Pakistan; blue: Malawi; pink: Brazil.
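Each panel of Figure 3 plots, for each country, a logistic regression of item attainment on age. The sketch below shows one way such curves could be fitted and screened for the two failure patterns described above: a flat trajectory and large between-country differences. It is an illustration under stated assumptions, not the procedure used in this study, where flagged items went to expert review. The toy data, the age grid in months, the helper names (`fit_logistic`, `review_item`) and the cut-offs `min_slope` and `max_spread` are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(ages, passed, iters=50, tol=1e-10):
    """Maximum-likelihood fit of P(pass | age) = sigmoid(b0 + b1 * age) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # score: gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix: negative Hessian
        for x, y in zip(ages, passed):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break
        # Newton step: solve [[h00, h01], [h01, h11]] @ [d0, d1] = [g0, g1].
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


def review_item(data, min_slope=0.05, max_spread=6.0):
    """Hypothetical screening rule. data maps country -> (ages in months, 0/1 attainment).

    min_slope (log-odds per month) and max_spread (months) are illustrative cut-offs.
    """
    flags, age50 = [], {}
    for country, (ages, passed) in data.items():
        b0, b1 = fit_logistic(ages, passed)
        if abs(b1) < min_slope:
            flags.append(f"flat in {country}")      # pattern like Figure 3e
        else:
            age50[country] = -b0 / b1                # age at which P(pass) = 0.5
    if len(age50) >= 2 and max(age50.values()) - min(age50.values()) > max_spread:
        flags.append("between-country variation")   # pattern like Figure 3f
    return flags


# Hypothetical toy data: 10 children per country aged 6, 8, ..., 24 months.
AGES = list(range(6, 26, 2))
pk = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
clear = {
    "Pakistan": (AGES, pk),
    "Malawi": (AGES, [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]),
    "Brazil": (AGES, [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]),
}
flat_pattern = [0, 1, 1, 0, 0, 0, 0, 1, 1, 0]  # passers and non-passers share a mean age of 15
flat = {c: (AGES, flat_pattern) for c in clear}
varied = dict(clear, Brazil=([a + 12 for a in AGES], pk))  # same curve, 12 months later

print(review_item(clear))   # []
print(review_item(flat))    # ['flat in Pakistan', 'flat in Malawi', 'flat in Brazil']
print(review_item(varied))  # ['between-country variation']
```

Here the age at 50% attainment, -b0/b1, serves as a single summary for comparing countries. A real analysis might instead compare whole curves or test a country-by-age interaction term. Any automatic flag of this kind would only shortlist items for the expert review described above.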
