2.3.2. Classification Models

Recently, researchers in several scientific fields, including the clinical and social sciences, have emphasized the utility of focusing on prediction, rather than explanation, during data analysis [69–72]. This increased attention to predictive models may be largely attributed to the spread of machine learning (ML), a branch of artificial intelligence that trains algorithms on data samples (i.e., training sets) in order to make predictions on completely new data (i.e., test sets) without being explicitly programmed to do so [73]. In psychology, ML techniques have been shown to be particularly useful for predicting human behavior, including high-risk behavior; thus, they may be applied to improve the effectiveness and targeting of preventive programs and interventions [74]. In brief, ML models can predict the behavior of individual subjects, allowing greater attention to be paid to those considered most at risk [69].

In the present study, ML algorithms were trained on psycho-social data to identify subjects who were more likely to present high levels of perceived stress during the COVID-19 emergency, and who were consequently at the greatest risk of developing psychological symptoms, including those of PTSD. For this purpose, participants were split into two classes: high perceived stress and low perceived stress. The high perceived stress class included participants whose PSS-10 score was more than 1.5 *SD* above the Italian population mean for their gender, with men and women compared against their respective normative values (*n* = 393). Conversely, the low perceived stress class included participants whose PSS-10 score did not exceed 1.5 *SD* above the corresponding Italian normative value (*n* = 1642). It should be noted that participants who reported their gender as "other" (*n* = 18) were excluded from this analysis, as the Italian normative values were available for males and females only [53].
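The class assignment described above can be sketched in a few lines of Python. This is an illustration only: the normative means and standard deviations in `NORMS` are hypothetical placeholders, not the values reported in [53], and the function name `stress_class` is introduced here for convenience.

```python
# Hypothetical normative values for illustration only.
# They are NOT the Italian normative values reported in [53].
NORMS = {
    "male": {"mean": 15.0, "sd": 6.0},
    "female": {"mean": 17.0, "sd": 7.0},
}


def stress_class(pss10_score, gender, norms=NORMS, n_sd=1.5):
    """Return 'high', 'low', or None when no norms exist for the gender."""
    ref = norms.get(gender)
    if ref is None:
        # No gender-specific norms (e.g., gender reported as "other"):
        # the participant cannot be classified and is excluded.
        return None
    cutoff = ref["mean"] + n_sd * ref["sd"]
    # "High" means strictly more than 1.5 SD above the normative mean.
    return "high" if pss10_score > cutoff else "low"
```

With these placeholder values the cut-off is 24 for men and 27.5 for women, so a score of 25 would be labeled "high" for a man and "low" for a woman.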

As ML models are fitted to particular data, it is important to test how well each model performs on new (i.e., unseen) data. For this reason, part of the data (the training set) is generally used to train and validate the model, while another part (the test set) is used to test the model's accuracy on new examples [73,75]. This procedure provides an estimate of model generalization and supports the replicability of the results [76,77]. In the present study, 20% of the participants were randomly chosen and retained as the test set [73,75]. Accordingly, the training set consisted of 1628 participants (314 with high perceived stress and 1314 with low perceived stress), and the test set consisted of 407 participants (79 with high perceived stress and 328 with low perceived stress).
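The reported class counts are consistent with a hold-out split that keeps the class proportions equal in both sets. The following Python sketch assumes such a stratified split; the text itself only states that participants were randomly chosen, and the function name, seed, and rounding rule are illustrative assumptions.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train and test sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumption: the test share of each class is rounded to the
        # nearest whole participant.
        n_test = round(test_fraction * len(indices))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Class sizes taken from the text: 393 high and 1642 low perceived stress.
labels = ["high"] * 393 + ["low"] * 1642
train_idx, test_idx = stratified_split(labels)
```

Under these assumptions the split reproduces the set sizes given above: 1628 training and 407 test participants, with 79 high-stress cases in the test set.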

In the first step, feature selection was performed to remove redundant and irrelevant features and to increase model generalization by reducing overfitting and noise in the data [78]. A common strategy for feature selection is to identify the subset of features that are highly correlated with the class to be predicted, but not correlated with each other [78]. This procedure was performed in the present study using the correlation-based feature selector (CFS) in the WEKA 3.9 software [79].
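The trade-off between relevance and redundancy can be made concrete with the merit heuristic commonly associated with CFS, in which a subset of k features scores k·r_cf / sqrt(k + k(k − 1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. The Python sketch below takes the correlations as given inputs. It is a simplified illustration: WEKA's implementation computes the correlations with its own measures and pairs the score with a subset search, and neither is reproduced here.

```python
from math import sqrt


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Merit of a feature subset under the CFS heuristic.

    feature_class_corr: one correlation per feature with the class.
    feature_feature_corr: one correlation per pair of features.
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    mean_rcf = sum(abs(r) for r in feature_class_corr) / k
    if k == 1:
        # A single feature has no redundancy term.
        return mean_rcf
    mean_rff = sum(abs(r) for r in feature_feature_corr) / len(feature_feature_corr)
    return k * mean_rcf / sqrt(k + k * (k - 1) * mean_rff)


# Two equally relevant features: uncorrelated vs. fully redundant.
merit_independent = cfs_merit([0.5, 0.5], [0.0])
merit_redundant = cfs_merit([0.5, 0.5], [1.0])
```

In this toy case the uncorrelated pair scores about 0.71, whereas the fully redundant pair scores 0.5, no better than either feature alone.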

The problem of class imbalance was addressed while running the classification algorithms. The ratio between participants with high perceived stress and those with low perceived stress was approximately 1:4 (393 vs. 1642), meaning that about one in five participants belonged to the high perceived stress class. As ML methods work best with balanced datasets, it is necessary to account for any class imbalance, especially when training examples are limited, a condition that is frequently met by datasets in health and clinical psychology [80]. At the same time, it is equally important for ML models to be built on samples that are representative of the population, reflecting the real class distribution [80].

One strategy to reconcile these two requirements (compensating for class imbalance while preserving the real class distribution) consists of altering the relative costs associated with misclassifying the minority and majority classes [81]. In the present study, ML algorithms were set in such a way that any error made in classifying the minority class (high perceived stress) was weighted four times as heavily as any error in classifying the majority class (low perceived stress). This cost-modifying strategy has been shown to provide better results than other methods in addressing the class imbalance problem [81]. Moreover, it should be noted that, for the goal of the present task, it was more beneficial to minimize false negatives than to minimize false positives (i.e., to have a model with high sensitivity rather than high specificity). In other words, it was more important to identify people who were truly at risk than to avoid misclassifying people who were not truly at risk.
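The effect of a 4:1 cost ratio on sensitivity and specificity can be illustrated with a minimum-expected-cost decision rule. The source does not state which cost-sensitive mechanism was used in WEKA, so the Python sketch below shows only one possible option; the function names and the toy probabilities are illustrative assumptions.

```python
def min_cost_label(p_high, cost_fn=4.0, cost_fp=1.0):
    """Pick the label with the lower expected misclassification cost."""
    # Predicting "low" is wrong with probability p_high (a false negative).
    expected_cost_if_low = p_high * cost_fn
    # Predicting "high" is wrong with probability 1 - p_high (a false positive).
    expected_cost_if_high = (1.0 - p_high) * cost_fp
    return "high" if expected_cost_if_high <= expected_cost_if_low else "low"


def sensitivity_specificity(y_true, y_pred, positive="high"):
    """Return (sensitivity, specificity); assumes both classes are present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data: true classes and hypothetical predicted probabilities of "high".
y_true = ["high", "high", "low", "low", "low"]
p_high = [0.9, 0.25, 0.3, 0.1, 0.05]

weighted = [min_cost_label(p) for p in p_high]  # false negatives cost 4x
unweighted = [min_cost_label(p, cost_fn=1.0) for p in p_high]  # equal costs

sens_w, spec_w = sensitivity_specificity(y_true, weighted)
sens_u, spec_u = sensitivity_specificity(y_true, unweighted)
```

With a 4:1 cost ratio the rule labels a participant as high stress once the estimated probability exceeds 0.2 rather than 0.5. In the toy data this raises sensitivity from 0.5 to 1.0 while lowering specificity from 1.0 to about 0.67, which is the trade-off the study deliberately accepts.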

ML models were trained and validated on the training sample (*n* = 1628) through a 10-fold cross-validation procedure using the WEKA 3.9 software [79]. The different algorithms (i.e., logistic regression [82], support vector machine (SVM) [83], Naïve Bayes [84], random forest [85]) were chosen as representatives of different classification strategies, to ensure that the results would be stable across classifiers and not dependent on specific model assumptions (details on the parameters of the ML classifiers are reported in the Supplementary Materials). K-fold cross-validation is a resampling procedure that seeks to reduce the variance in model performance relative to the performance that may be obtained from a single training set and a single test set. The procedure consists of partitioning the sample into k subsets (i.e., folds; in the present study, k = 10), and using k − 1 (i.e., 9) subsets to train the model and the remaining subset to validate the model's accuracy. This is repeated k (i.e., 10) times, so that each subset serves as the validation set exactly once [86]. The final model metrics are obtained by averaging the metrics obtained in all validation subsets. In the present study, the models developed from the 10-fold cross-validation procedure were tested on the test sample (*n* = 407).
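The k-fold procedure described above can be sketched in plain Python as follows. This is a generic, unstratified illustration and not WEKA's implementation; `fit_and_score` is a hypothetical callback that would train a classifier on the training indices and return a metric computed on the validation indices.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


def cross_validate(n_samples, fit_and_score, k=10, seed=0):
    """Average the metric returned by fit_and_score over the k folds."""
    scores = [
        fit_and_score(training, validation)
        for training, validation in k_fold_indices(n_samples, k, seed)
    ]
    return mean(scores)


# Fold sizes for the training sample used in the study (n = 1628).
folds = list(k_fold_indices(1628, k=10))
```

For n = 1628 and k = 10, each validation fold contains 162 or 163 participants, and every participant appears in exactly one validation fold.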
