*2.1. Participants*

The present study included all the patients enrolled in the systematic prospective follow-up of the Bipolar Disorders Unit of the Hospital Clinic, University of Barcelona, Catalunia, Spain, from October 1998 to November 2021. Barcelona's Bipolar and Depressive Disorders Unit provides both tertiary- and secondary-level care. The unit enrolls difficultto-treat BD patients BD derived from all over Spain, and patients from a catchment area of approximately 170,000 inhabitants in Barcelona in particular [26]. Trained psychiatrists regularly treat more than 700 patients according to a local protocol based on international clinical guidelines [27,28]. The inclusion criteria were as follows: (i) older than 18 years of age, and (ii) diagnosis of BD type I (BDI) or type II (BDII) according to the Diagnostic and Statistical Manual of Mental Disorders (DSM) IV [29], DSM-IV-TR [30], or DSM-5 criteria [31]. In addition, the exclusion criteria were the presence of severe organic diseases requiring urgen<sup>t</sup> treatment at baseline assessment or severe cognitive, motor, or visual impairment. All participants provided written informed consent for this ethical committeeapproved study (approval code: HCB/2017/0432).

#### *2.2. Clinical Variables Assessment*

Patients were assessed using the Structured Clinical Interview for DSM Disorders [32]. The main sociodemographic and clinical characteristics were collected through an ad hoc schedule. If specific information was not collected during the baseline assessment, the electronic clinical records of each patient were inquired. Collected variables included age, education, working and living status, duration of illness, current pharmacological treatment (if maintained for at least six months), the number of hospitalizations and affective episodes, and lifetime aggressive behavior, among other variables of interest (see Table 1). Predominant polarity was defined according to the standard definition created in our unit and repeatedly validated [33]. A family history of psychiatric disorder was defined as having a first-degree relative diagnosed with and/or treated for any mood disorder, including major depression, cyclothymia, and dysthymia. The term "suicide attempt" refers to intentional self-inflicted poisoning, injury, or self-harm with a deadly intent without death. The presence of lifetime SUD was assessed according to DSM-IV [29], DSM-IV-TR [30], or DSM-5 criteria [31], including drug-specific diagnoses for ten substances: alcohol, cannabis, cocaine, heroin, hallucinogens, inhalants, prescription opioids, sedatives/tranquilizers, stimulants, and other drugs (e.g., ecstasy or ketamine). Each DSM-5 SUD diagnosis required positive responses to 2 or more of the 11 criteria for each drug-specific SUD.

#### *2.3. Statistical Analyses*

All of the analyses were conducted with RStudio, R version 4.1.2 [34]. The Kolmogorov– Smirnov test was used to assess whether continuous variables displayed a normal distribution. The parametric comparative analyses for the demographic and clinical characteristics of the groups (SUD vs. non-SUD) were done with unpaired t-test with Bonferroni post hoc correction for continuous variables; for non-parametric distributions, a Mann–Whitney *U* Test or Kruskal–Wallis Test was used, where appropriate. Categorical data were analyzed by Chi-square analysis.

#### 2.3.1. Missing Data

We inspected missing data using the R package "skimr" [35], and these were assumed to be missing at random. Therefore, we included only the variables presenting at least 75% of the available data in the model. For those contributing less than 25% of missing data, missing data were imputed using the R package "missRanger" [36], which is based on the algorithm of "missForest" (Stekhoven and Bühlmann 2012) and uses a random forest (RF) approach. The parameter "num.trees" was set at 5000, and the out-of-bag (OOB) errors were calculated for each variable in order to measure the accuracy according to the method outlined in previous evidence [37]. OOB errors ranged from 0 (better performance) to 1 (worse performance).

#### 2.3.2. Random Forest

We performed RFs using the R packages "RandomForest" [38] and "caret" [39] to tune the algorithm. A series of RFs were primarily used to select the most important features, which predicted different outcomes. To select the most critical variables, the "mean decrease of Gini coefficient" was adopted to observe which variables substantially contributed to the homogeneity of the nodes and leaves in the resulting RF.

We conducted a first RF considering the presence of lifetime SUD as the predicted condition in the entire sample. Next, among people with SUD, as the first step, we selected only people with a lifetime diagnosis of AUD who consumed alcohol alone or combined with other substances (i.e., cocaine, cannabis, hallucinogens, or MDMA). Then, we conducted a second and third RF considering the presence of AUD mono-use or comorbid with another SUD as the predicted conditions, compared with non-abusing controls; finally, we performed a fourth RF to differentiate people with AUD mono-use from people with AUD+SUD. RF is a classification algorithm that combines multiple decision trees made

by randomly selected bootstrap samples, mainly affected by unbalanced data. We used the R package "ROSE" [40] to under sample the predicted class that presented the most observations in order to obtain a balanced dataset. As a secondary step, to exploratory assess the accuracy of the prediction model, a "train" and a "test" dataset were prepared, containing 80% and 20% of the original observations, respectively. We initially used a repeated cross-validated RF (10-folds, ten repeats) on the "train" dataset and then tuned some hyperparameters (i.e., number of trees, number of features randomly selected at each node, and size of the node) during the learning phase, to obtain the best accuracy value. Then, we applied the trained model to the "test" dataset, and calculated measures such as accuracy, sensitivity, specificity, the OOB estimate, and the F-score (F1), which alternatively are tests of the model's accuracy. To graphically present our results, we produced a confusion matrix.

**Table 1.** Socio-demographic and clinical characteristics of the sample classified according to the presence of substance use disorder(s).



**Table 1.** *Cont.*

BD = bipolar disorder; SUD = substance use disorder; *n* = number of cases; *p* = statistical significance; SD = standard deviation; χ2 = Chi-square test; t = Independent Samples *t*-test; Z = Mann–Whitney U test.

#### 2.3.3. Multiple Logistic Regression

The variables selected by the RFs that mostly decreased the homogeneity of the nodes and leaves of the model were considered as independent variables in multiple logistic regressions, adjusting for sex, duration of illness, and age at BD onset, as these are known factors associated with SUD in previous studies [6,20,41]. The dependent variables were the same ones considered in the corresponding RFs. Odds ratios (ORs) and their 95% confidence intervals (CIs) were estimated to assess the significance of each result. The variance explained was calculated as Nagelkerke's pseudo R<sup>2</sup> with the R package "fmsb" [42]. These regression models were useful to provide further information on the association between the relevant variables identified by the RFs and the various SUD comorbidities of interest.
