Aim 1: Principal Components Analysis

For the first aim, principal components analysis (PCA) was used to identify the number of components assessed by all individual items from the performance and informant procedures. The R package 'tidymodels' was used for all steps of the PCA. The appropriateness of PCA was evaluated using inter-variable correlations, Bartlett's test of sphericity, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy, and the determinant of the correlation matrix. Variables should be only mildly intercorrelated, so correlations were examined against the thresholds suggested by Field et al. [37], under which absolute correlations should range from 0.3 to 0.9. Items with more than one correlation outside this range were excluded from the PCA. Four variables were excluded: the Vineland-II ABC, the Vineland-II social domain score, one item from the Vineland-II maladaptive behavior domain scale, and the SIB total score. The KMO measure of sampling adequacy was 0.95, above the 0.7 threshold. Bartlett's test of sphericity was significant (χ<sup>2</sup>(325) = 5207.62; *p* < 0.001). Finally, the determinant was below 0.00001. Together, these indicated that PCA was appropriate. Based on the scree plot of unrotated results, two components with eigenvalues > 1.0 were identified, accounting for 76% of the total variance in scores. Varimax rotation of the loadings was then applied to enhance interpretability of the identified components.
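The correlation screening rule can be illustrated with a minimal sketch. The analysis itself was run in R with 'tidymodels'; the Python below is not that code. It assumes that "more than one correlation outside the range" means more than one pairwise absolute Pearson correlation below 0.3 or above 0.9. The function names `pearson` and `screen_items`, the item names, and the toy scores are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_items(items, low=0.3, high=0.9, max_violations=1):
    """Return the names of items to exclude before PCA.

    An item is excluded when more than `max_violations` of its absolute
    correlations with the other items fall outside [low, high].
    (Assumed reading of the screening rule described in the text.)
    """
    excluded = []
    for name, scores in items.items():
        violations = 0
        for other, other_scores in items.items():
            if other == name:
                continue
            r = abs(pearson(scores, other_scores))
            if r < low or r > high:
                violations += 1
        if violations > max_violations:
            excluded.append(name)
    return excluded


# Hypothetical toy data: three related items and one unrelated item.
toy_items = {
    "item_a": [1, 2, 3, 4, 5, 6],
    "item_b": [2, 1, 4, 3, 6, 5],
    "item_c": [1, 3, 2, 5, 4, 6],
    "noise_item": [4, 3, 3, 4, 4, 3],
}

print(screen_items(toy_items))
```

In this toy example only `noise_item` is flagged, because it correlates weakly with every other item. Each of the remaining items has a single out-of-range correlation (with `noise_item`) and is therefore retained under the "more than one" rule.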

The dataset containing the two component scores and AD diagnostic status was then split into training and test datasets. The training dataset was used to fit a logistic regression model. The model was assessed for multicollinearity and for the assumption that the independent variables are linearly related to the log odds. The performance of the fitted model was assessed on the test dataset by evaluating the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.
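The three test-set metrics can be sketched as follows. This is an illustration in plain Python, not the R code used in the analysis. It assumes the fitted model supplies a predicted probability of AD for each test case and that cases are classified at a 0.5 cutoff, which the text does not specify. The function names, labels, and probabilities are hypothetical.

```python
def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Sensitivity and specificity at a probability cutoff (assumed 0.5)."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_prob):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct ranking; this equals the area under
    the ROC curve.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    correct = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return correct / (len(pos) * len(neg))


# Hypothetical test-set labels (1 = AD) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

sens, spec = sensitivity_specificity(y_true, y_prob)
print(sens, spec, auc(y_true, y_prob))
```

Sensitivity and specificity depend on the chosen cutoff, whereas AUC summarizes discrimination across all cutoffs, which is why the three are reported together.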
