*2.3. Data Analysis*

Statistical analysis was performed with Stata 16 for Windows (College Station, TX, USA). The association between the baseline measures and CBT efficiency (bad versus good outcome) was assessed with the chi-square test (χ2) for categorical measures and analysis of variance (ANOVA) for quantitative measures. The increase in Type-I error due to multiple significance tests was controlled with the Finner method [54], a family-wise procedure that has proved more powerful than the standard Bonferroni correction.
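As an illustration of how a Finner-type correction operates, the following minimal sketch computes adjusted p-values with the step-down formula commonly attributed to Finner, 1 − (1 − p(i))^(m/i), where p(i) is the i-th smallest of m p-values. It is not the Stata routine used in the study; the function name `finner_adjust` and the example p-values are hypothetical.

```python
def finner_adjust(p_values):
    """Return Finner step-down adjusted p-values, in the input order.

    Assumption: the adjusted value for the i-th smallest p-value is
    1 - (1 - p)**(m / i), made monotone with a running maximum.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order, start=1):
        adj = 1.0 - (1.0 - p_values[idx]) ** (m / rank)
        # Step-down procedures require non-decreasing adjusted p-values.
        running_max = max(running_max, adj)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p-values from three baseline comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = finner_adjust(raw)
```

A hypothesis is rejected at level α when its adjusted p-value is at most α.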

The learning curves in the IGT were compared with a 3 × 5 mixed ANOVA (adjusted for the participants' age and education level), with group (bad CBT outcome, good CBT outcome, and control condition) as the between-subjects factor and the score in each block as the within-subjects factor. Polynomial contrasts for the within-subjects factor assessed linear, quadratic, cubic, and quartic trends in the learning curves. The IGT-Learning score was also compared between the three groups with analysis of variance adjusted for age and education (ANCOVA).
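The polynomial contrasts can be illustrated with the standard orthogonal polynomial coefficients for five equally spaced levels. The sketch below applies them to one set of five block scores; it omits the covariates and the significance tests of the mixed ANOVA, and the function name `trend_scores` and the example scores are hypothetical.

```python
# Orthogonal polynomial contrast coefficients for 5 equally spaced levels.
CONTRASTS = {
    "linear": [-2, -1, 0, 1, 2],
    "quadratic": [2, -1, -2, -1, 2],
    "cubic": [-1, 2, 0, -2, 1],
    "quartic": [1, -4, 6, -4, 1],
}


def trend_scores(block_scores):
    """Return the contrast score of each trend for five block scores."""
    if len(block_scores) != 5:
        raise ValueError("expected one score per block (5 blocks)")
    return {
        name: sum(c * s for c, s in zip(coefs, block_scores))
        for name, coefs in CONTRASTS.items()
    }


# Hypothetical block scores that rise by a constant amount per block.
scores = trend_scores([1, 2, 3, 4, 5])
```

For a curve that rises by a constant amount per block, only the linear contrast is non-zero; curvature in the learning curve shows up in the higher-order contrasts.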

The capacity of the IGT-Learning score to discriminate between good and bad CBT outcomes was assessed with Receiver Operating Characteristic (ROC) analysis. This methodology is used in clinical areas to obtain the optimal cut-off of a measurement tool against an external reference criterion. In this work, ROC analysis was applied within the ED subsample to obtain the best cut-off in the IGT index to discriminate between patients with bad versus good CBT outcomes. Since selecting the optimal cut-off depends on the prevalence of the criterion and the costs/risks of false classifications [55], the analysis assumed a distribution of the CBT outcome equal to that observed in the sample and a cost for a false negative twice that of a false positive.
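One way to make the role of prevalence and misclassification costs concrete is to pick the cut-off that minimizes the expected misclassification cost. The sketch below does this under stated assumptions: a bad outcome is treated as the positive class, scores at or below the cut-off are classified as positive, prevalence is taken from the sample, and a false negative costs twice a false positive. The function name `best_cutoff` and the data are hypothetical, and the routine actually used in Stata may differ.

```python
def best_cutoff(scores, is_bad, cost_fn=2.0, cost_fp=1.0):
    """Return (cutoff, expected_cost, sensitivity, specificity).

    Assumption: a patient is classified as 'bad outcome' when the
    score is <= cutoff; prevalence is the sample proportion of bad outcomes.
    """
    n = len(scores)
    n_pos = sum(1 for y in is_bad if y)
    n_neg = n - n_pos
    prevalence = n_pos / n
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, is_bad) if y and s <= cut)
        tn = sum(1 for s, y in zip(scores, is_bad) if not y and s > cut)
        sens = tp / n_pos
        spec = tn / n_neg
        # Expected cost weights each error rate by prevalence and unit cost.
        cost = (prevalence * (1 - sens) * cost_fn
                + (1 - prevalence) * (1 - spec) * cost_fp)
        if best is None or cost < best[1]:
            best = (cut, cost, sens, spec)
    return best


# Hypothetical learning scores and outcomes (1 = bad, 0 = good).
example_scores = [-4, -2, 0, 1, 3, 5, 6, 8]
example_bad = [1, 1, 1, 0, 1, 0, 0, 0]
cutoff, cost, sens, spec = best_cutoff(example_scores, example_bad)
```

Doubling the false-negative cost pushes the selected cut-off toward higher sensitivity at the expense of specificity.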

Logistic regression evaluated the capacity of the optimal cut-off point in the IGT-Learning global measure to differentiate between bad and good outcomes. Goodness-of-fit was assessed with the Hosmer and Lemeshow test.

In this study, effect size was based on the eta-squared coefficient (η2) for quantitative measures (values of 0.06, 0.10, and 0.25 were interpreted as low–poor, moderate–medium, and large–high effect sizes) [56], and on Cramér's V coefficient for categorical measures (values of 0.10, 0.30, and 0.50 were interpreted as low–poor, moderate–medium, and large–high effect sizes) [57].
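For reference, both effect-size coefficients can be computed from their textbook definitions: η2 as the between-group sum of squares divided by the total sum of squares, and Cramér's V as the square root of χ2 / (n · (k − 1)), with k the smaller of the number of rows and columns. The sketch below assumes a one-way design without covariates and an uncorrected chi-square; the function names and example data are hypothetical.

```python
import math


def eta_squared(groups):
    """One-way eta-squared: SS_between / SS_total (no covariates)."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((x - grand_mean) ** 2 for x in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))


# Hypothetical data: two groups of scores and two 2 x 2 tables.
eta2 = eta_squared([[1, 2, 3], [4, 5, 6]])
v_perfect = cramers_v([[10, 0], [0, 10]])
v_none = cramers_v([[5, 5], [5, 5]])
```

The resulting values would then be read against the thresholds stated above.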
