*2.4. Analysis Plan*

To support Aim 1, feasibility was assessed for measures of social cognition and social behavior administered to, or completed about, children and adolescents with DS. Feasibility was defined as the percentage of participants who provided responses at Time 1 and Time 2. Feasibility criteria were set a priori, with ≥80% selected as the threshold for acceptable feasibility of these measures in DS research. This threshold was informed by previous work on the psychometrics of cognitive measures in intellectual disability and DS [33,39]. Examiners recorded reasons for noncompletion, which consisted of not understanding the task, behavioral noncompliance, and verbal refusal. Noncompletion of the parent-report measure resulted from missing items (i.e., the respondent did not complete both sides of the paper form) or failure to return the questionnaire. The range of scores, skewness, and kurtosis were also examined to assess the normality of the score distributions and to evaluate whether there were floor effects for raw or standard scores. Acceptable values were between −1 and 1 for skewness and between −2 and 2 for kurtosis. Participants who completed the measure with the lowest possible score and those who were unable to complete the measure at Time 1 were both counted in the estimate of floor effects. Floor effects < 20% were considered acceptable for research.
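
The Aim 1 criteria can be illustrated with a minimal Python sketch. It is not the study's analysis code: the function names, the use of `None` to mark noncompletion, and the example scores are all hypothetical. The skewness and kurtosis formulas are simple moment-based estimators, and treating kurtosis as excess kurtosis is an assumption; statistical packages often apply small-sample corrections that give slightly different values.

```python
def completion_rate(scores):
    """Share of participants with a recorded score (None = not completed)."""
    return sum(s is not None for s in scores) / len(scores)


def floor_rate(time1_scores, lowest_possible):
    """Share of all participants at floor at Time 1: either the lowest
    possible score or unable to complete the measure (None)."""
    at_floor = sum(s is None or s == lowest_possible for s in time1_scores)
    return at_floor / len(time1_scores)


def skew_and_excess_kurtosis(scores):
    """Moment-based skewness and excess kurtosis of the completed scores.
    Assumption: no small-sample bias correction is applied."""
    x = [s for s in scores if s is not None]
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Hypothetical raw scores for ten participants; None marks noncompletion.
time1 = [12, 0, None, 7, 3, None, 9, 0, 5, 11]

feasible = completion_rate(time1) >= 0.80               # a priori threshold
floor_ok = floor_rate(time1, lowest_possible=0) < 0.20  # floor criterion
skew, kurt = skew_and_excess_kurtosis(time1)
shape_ok = -1 <= skew <= 1 and -2 <= kurt <= 2          # distribution criteria
```

In this made-up example the measure would meet the completion threshold but not the floor-effect criterion, because noncompleters and participants at the minimum score both count toward the floor estimate.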

To support Aim 2, further psychometric evaluation (test-retest reliability, practice effects, and validity) was completed over the two-week testing interval. Test-retest reliability was assessed using intraclass correlation coefficients (ICCs). ICCs were categorized as poor (<0.50), moderate (0.50−0.74), good (0.75−0.90), or excellent (>0.90) [33,40]; a priori, good or excellent classifications were deemed suitable. Paired samples *t*-tests were used to assess practice effects. Practice effects were presumed if the difference in scores between the two testing visits had a significance value less than 0.05 and a Cohen's *d* effect size greater than 0.20. Convergent validity across a selection of measures (NEPSY-II subtests and SRS-2 Social Awareness and Social Cognition) and associations among all social measures were determined using bivariate Pearson correlations. Correlation coefficients ≥0.50 were deemed acceptable for convergent validity. Associations with broader developmental domains (age, cognition, and language) were also evaluated, and significant correlations were expected.
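
The Aim 2 decision rules can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's procedure: all function names and scores are hypothetical, Cohen's *d* is computed from the paired differences (the text does not say which variant was used), and the effect size is judged by its magnitude. The *p*-value is taken as an input because the standard library has no *t* distribution; in practice it could come from a routine such as `scipy.stats.ttest_rel`.

```python
import math
import statistics


def icc_category(icc):
    """Map an ICC to the descriptive bands given in the text.
    Assumption: values between 0.74 and 0.75 are treated as moderate."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


def paired_t_and_d(time1, time2):
    """Paired-samples t statistic and Cohen's d based on the difference
    scores (assumed variant: mean difference / SD of differences)."""
    diffs = [b - a for a, b in zip(time1, time2)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t_stat, mean_diff / sd_diff


def practice_effect_presumed(p_value, cohens_d):
    """Both criteria must hold: p < 0.05 and |d| > 0.20 (magnitude assumed)."""
    return p_value < 0.05 and abs(cohens_d) > 0.20


# Hypothetical scores for five participants at the two visits.
visit1 = [10, 12, 9, 14, 11]
visit2 = [12, 13, 11, 15, 12]
t_stat, d = paired_t_and_d(visit1, visit2)
reliability_ok = icc_category(0.82) in ("good", "excellent")
```

Requiring both the significance and the effect-size criterion guards against flagging a trivially small but statistically significant change as a practice effect.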

To address Aim 3, measures with low feasibility were investigated using post hoc sensitivity and specificity analyses. Sensitivity probabilities estimate the likelihood that a participant with the specified characteristics will be able to complete the measure. Specificity probabilities estimate the likelihood that a participant without the specified characteristics will be unable to complete the measure. These analyses were completed for any measure that did not meet the study feasibility criteria, and recommendations regarding participant age and cognitive ability for future administration were established (as per [33]). Benchmarks for sensitivity and specificity probabilities were based on age (8 and 10 years) and cognitive ability (ABIQ deviation scores ≥20, ≥30, ≥40, and ≥50). Lower bounds of chronological age in previous clinical trials in DS informed benchmark selection [41].
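
The two probabilities, as they are worded in this section, can be computed with a short Python sketch. It is a hypothetical illustration: the function name, the record fields (`age`, `abiq`, `completed`), and the data are invented. It also assumes that the age benchmarks mean "at or above" the stated age, which the text does not spell out.

```python
def benchmark_probabilities(participants, meets_benchmark):
    """Return (sensitivity, specificity) for one benchmark.

    Sensitivity: share of participants meeting the benchmark who
    completed the measure. Specificity: share of participants not
    meeting it who did not complete the measure. A group with no
    members yields None.
    """
    met = [p for p in participants if meets_benchmark(p)]
    not_met = [p for p in participants if not meets_benchmark(p)]
    sensitivity = sum(p["completed"] for p in met) / len(met) if met else None
    specificity = (
        sum(not p["completed"] for p in not_met) / len(not_met)
        if not_met else None
    )
    return sensitivity, specificity


# Hypothetical participants: age in years, ABIQ deviation score,
# and whether the measure was completed.
sample = [
    {"age": 6, "abiq": 18, "completed": False},
    {"age": 7, "abiq": 35, "completed": False},
    {"age": 9, "abiq": 42, "completed": True},
    {"age": 11, "abiq": 28, "completed": False},
    {"age": 12, "abiq": 51, "completed": True},
    {"age": 15, "abiq": 44, "completed": True},
]

age_8 = benchmark_probabilities(sample, lambda p: p["age"] >= 8)
abiq_40 = benchmark_probabilities(sample, lambda p: p["abiq"] >= 40)
```

Running the same function over each age and ABIQ benchmark gives one pair of probabilities per cutoff, which can then be compared to choose a recommended lower bound for future administration.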
