Article

Sociodemographic Variables in Offender and Non-Offender Patients Diagnosed with Schizophrenia Spectrum Disorders—An Explorative Analysis Using Machine Learning

Andreas B. Hofmann, Marc Dörner, Lena Machetanz and Johannes Kirchebner

1 Adult Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, Faculty of Medicine, University of Zurich, 8006 Zurich, Switzerland
2 Department of Consultation-Liaison-Psychiatry and Psychosomatic Medicine, University Hospital Zurich, University of Zurich, 8091 Zurich, Switzerland
3 German Center for Neurodegenerative Diseases (DZNE) within the Helmholtz Association, 39120 Magdeburg, Germany
4 Forensic Psychiatry and Psychotherapy, University Hospital of Psychiatry Zurich, Faculty of Medicine, University of Zurich, 8006 Zurich, Switzerland
* Author to whom correspondence should be addressed.
Healthcare 2024, 12(17), 1699; https://doi.org/10.3390/healthcare12171699
Submission received: 18 July 2024 / Revised: 22 August 2024 / Accepted: 23 August 2024 / Published: 26 August 2024

Abstract

With the growing availability of medical data and the enhanced performance of computers, new opportunities for data analysis in research are emerging. One of these modern approaches is machine learning (ML), an advanced form of statistics broadly defined as the application of complex algorithms. ML provides innovative methods for detecting patterns in complex datasets. This enables the identification of correlations or the prediction of specific events. These capabilities are especially valuable for multifactorial phenomena, such as those found in mental health and forensic psychiatry. ML also allows for the quantification of the quality of the emerging statistical model. The present study aims to examine various sociodemographic variables in order to detect differences in a sample of 370 offender patients and 370 non-offender patients, all with schizophrenia spectrum disorders, through discriminative model building using ML. In total, 48 variables were tested. Out of seven algorithms, gradient boosting emerged as the most suitable for the dataset. The discriminative model finally included three variables (regarding country of birth, residence status, and educational status) and yielded an area under the curve (AUC) of 0.65, meaning that the statistical discrimination of offender and non-offender patients based purely on the sociodemographic variables is rather poor.

1. Introduction

Due to the increasing amount of medical data and the advances in digitalization, there is a growing need for statistical approaches with the ability to comprehensively and efficiently analyze said data. One of these approaches is machine learning (ML). Machine learning is an evolving branch of complex algorithms designed to mimic human intelligence by learning from the environment. As an example of artificial intelligence, ML algorithms learn from and adapt their performance to the raw data fed to them [1,2]. The “learning” can be understood as the process of finding patterns in a dataset, helping to uncover new information without going through hypothesis testing [3]. Following technical progress in processing power, the algorithms can easily be applied to large data sets while keeping computing times at acceptable levels [4]. This allows the analysis of not only a large quantity of variables but also their interplay, which makes ML highly suitable for the evaluation of multifactorial outcomes [5,6].
In psychiatric research, statistics mostly rely on null hypothesis significance testing (NHST) or regression models, providing a simple representation of the relationship between an independent and a dependent variable [7]. However, these methods have certain shortcomings. (1) To avoid an accumulation of alpha error, only a limited number of variables can be analyzed, restricting the possibility of investigating multifactorial outcomes [8]. (2) NHST, for example, is not able to analyze the interplay of variables, impairing the possibility of investigating complex relationships and phenomena [4]. (3) As the statistical approach is selected in advance, it may not fit the data structure. (4) In accordance with the principle of falsification in NHST, null hypotheses can only either be falsified—meaning that the alternative hypothesis is assumed—or not; a failure to falsify, however, does not in turn verify the null hypothesis [9]. This requires the research question to be precisely and unambiguously defined in advance, which limits explorative searches for undetected patterns in data. Despite its legitimacy, NHST, as the “default statistical practice” [10], has repeatedly come under criticism because it does not accommodate all types of research undertakings [11,12,13,14].
Mental disorders are generally influenced by a variety of factors and their interplay, instead of being monocausal and linear. For instance, schizophrenia may develop in individuals with (a) a certain vulnerability and (b) a certain set of conditions [15,16]. The same is true for outcomes and events during the course of a mental disorder, e.g., aggression [17]. Oversimplification in the understanding of mental disorders has been identified as a conceptual crisis in psychiatric research [18]. Keeping the limitations of widely used statistical approaches outlined above in mind, it becomes clear that they may not be suitable when investigating multidimensional constructs as they are presented in psychiatry. Here, the qualities of ML offer new opportunities. Apart from their ability to detect patterns in large datasets and analyze complex, non-linear interrelations, ML algorithms can also help to evaluate the quality of a statistical model, e.g., with receiver operating characteristics (ROCs), area under the curve (AUC), sensitivity, or specificity [4]. This quantification of a predictive model allows a transparent evaluation of the soundness of the model [19].
So far, ML is rarely used for research purposes in forensic psychiatry in general, and when it is, mostly for the prediction of violence [20]. Existing work mainly focuses on risk assessment, while rather little fundamental research is conducted [21,22]. This poses a major issue since it is vital to understand the differences between patients diagnosed with schizophrenia spectrum disorders (SSDs) who commit crimes and those who do not. Recent research performed by our group investigated the similarities and differences of these patients in various aspects, e.g., aggression and suicidal behavior [23,24]. So far, there is no comprehensive ML analysis of solely sociodemographic factors. Filling this gap in knowledge might help to prevent patients from becoming perpetrators in the first place.
The role of sociodemographic factors, such as low economic status, low achievement in school, or social isolation, in the development of criminal behavior has been repeatedly discussed [25,26,27]. A potential association between sociodemographic variables and criminal behavior therefore warrants investigation. While individuals with SSDs have an elevated risk of expressing criminal behavior, the majority of affected patients do not come into conflict with the law but rather are at higher risk of victimization compared to the general public [28,29,30,31]. With the expression of criminal behavior in mental disorders being under-researched, it seems sensible to further evaluate possible contributors and protective factors to close this research gap. Regarding statistical procedures, ML outperforms common techniques for the reasons described above; we therefore opted for ML because of its greater flexibility and the lack of need for a priori assumptions [32].
The present study aims to outline the procedures, benefits, and limitations of ML in psychiatric research using a paradigmatic research question, namely discriminating between offenders and non-offenders with SSDs based solely on sociodemographic variables. To our knowledge, this study is the first to pursue this goal.

2. Materials and Methods

2.1. Study Population

The total sample comprised a study group of offender patients (OP, n = 370) and a comparison group of non-offender patients (NOP, n = 370), all of whom were diagnosed with SSDs according to ICD-9 or ICD-10 (chapters F20.0 to F25.9) [33,34]. Both groups were matched by gender. To evaluate the comparability of the two groups, basic sample characteristics such as country of birth, marital status, and diagnosis, as well as comorbidities and aggressive behavior, were assessed.

2.1.1. Forensic Psychiatric Subpopulation (OP)

The OP sample (n = 370) stemmed from patients admitted to the Centre for Inpatient Forensic Therapy at the University Hospital of Psychiatry Zurich between 1982 and 2016, with the majority being admitted after the year 2000. This institution, the largest forensic psychiatric inpatient treatment facility in German-speaking Switzerland, serves two purposes: patients are either admitted by court order to reduce their risk of reoffending through treatment of the underlying psychiatric illness, or they are referred from penitentiary settings for the treatment of acute psychiatric syndromes. OP patients had been convicted of violent offenses (e.g., homicide, physical or sexual assault, and arson) and non-violent offenses (e.g., threatening behavior, crimes against property, and violations of traffic, drug, and firearm regulations).

2.1.2. General Psychiatric Subpopulation (NOP)

The NOP (n = 370) sample stemmed from patients admitted for general psychiatric inpatient treatment at the Centre for Integrative Psychiatry of the University Hospital of Psychiatry Zurich. Amongst other specialized wards, the facility focuses on the subacute treatment of psychotic disorders, usually for 6 to 8 weeks or longer if needed.

2.2. Data Source and Extraction

Data were retrospectively assessed based on the patients’ medical files. The case files were rather comprehensive and included extensive information on the referenced hospitalization, including reports by various medical healthcare professionals and reports on previous in- and outpatient treatment. For the OP group, the files also included testimonies, police reports, court proceedings, and information regarding the course of previous imprisonments and detentions. Data assessment and extraction were performed through directed qualitative content analysis by two experienced psychiatrists according to a rating protocol based on a set of criteria originally described by Seifert and Nedopil and adapted under the supervision of experienced forensic and general psychiatric researchers and clinicians [35]. To evaluate inter-rater reliability, a random subsample of 10% of all cases was independently coded by another researcher. With a Cohen’s kappa of 0.78, inter-rater reliability was considered substantial [36].

2.3. Selection of Predictor Variables

Since the purpose of the research presented here was to determine whether—and if so, which—sociodemographic factors distinguish the NOP group from the OP group, 48 items from the following domains were selected as predictor variables: age, gender, country of birth, residence status, profession of faith, marital status and close family, living situation at the time of admission to the referenced hospitalization, highest school-leaving certificate, learned profession and employment status at the time of admission to the referenced hospitalization, type of legal guardian during childhood and adolescence, and membership in social associations as a measure of social integration. We consider this selection appropriate since these variables often play an important role in public discussions about criminal development and—despite the retrospective study design—can still be collected without compromising data quality. For a detailed list of all predictor variables and their precise definitions, please refer to Appendix A.

2.4. Data Analysis Using Machine Learning

We used supervised ML to uncover the most significant variables that distinguish the OP group from the NOP group among a large set of parameters, selecting the model with the highest predictive power. Unlike unsupervised ML, which is employed to uncover hidden patterns in datasets without a defined outcome variable, supervised ML trains algorithms on labeled datasets and uses these algorithms to predict specific outcomes—in this case, “OP: true” vs. “OP: false” [37].
Figure 1 provides an overview of the statistical steps, which are further detailed below. All steps were carried out using R version 3.6.3 (R Project, Vienna, Austria) and the mlr package v2.17.1 (Bischl, Munich, Germany) [38]. R and the mlr package are widely used, easy-to-handle software with reliable and replicable code. In particular, the mlr package supports many ML operations, including variable reduction and imputation. Calculations for the confidence intervals of the balanced accuracy were performed with MATLAB R2019a (The MathWorks, Inc., Natick, MA, USA; license obtained via the University of Zurich) using the add-on “computing the posterior balanced accuracy” v1.0.
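As an illustration of this setup, a minimal sketch of how such a classification task can be defined with the mlr package is given below; the data frame name (patients) and the outcome column (offender) are hypothetical placeholders rather than the study’s original code.

# Illustrative sketch, not the authors' original code: defining the binary
# classification task in the mlr package.
library(mlr)

task <- makeClassifTask(
  id       = "op_vs_nop",
  data     = patients,     # 740 cases: 48 sociodemographic predictors + outcome
  target   = "offender",   # factor with levels "true" and "false"
  positive = "false"       # "OP: false" is later defined as the positive class (see Section 2.4.1)
)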

2.4.1. Preprocessing

Categorical variables were transformed into binary dummy variables, whereas continuous and ordinal variables remained unadjusted. The outcome variable was categorized as either “OP: true” or “OP: false”, with the latter being defined as the positive class in further analyses (Figure 1, Step 1). In ML, the algorithm or model needs to be trained before it can be applied to new data [39]. To provide the algorithm with data from which it could learn to detect patterns, our dataset was split into a training set comprising 70% of all cases and a validation set with the remaining 30% of all cases, which was set aside and remained untouched during the following steps (Figure 1, Step 2).
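These two steps can be sketched as follows, continuing the illustrative code above (object names and the seed are arbitrary assumptions):

# Step 1: recode categorical predictors as binary dummy variables with
# mlr's createDummyFeatures(); continuous and ordinal variables stay as they are.
patients_bin <- createDummyFeatures(patients, target = "offender")

# Step 2: split into a 70% training set and a 30% validation set; the
# validation set is stored aside and not touched again until the final validation.
set.seed(123)  # arbitrary seed, for reproducibility only
train_idx <- sample(nrow(patients_bin), size = floor(0.7 * nrow(patients_bin)))
train_set <- patients_bin[train_idx, ]
valid_set <- patients_bin[-train_idx, ]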
To allow the inclusion of the total population and to avoid omissions resulting in an increased risk of bias, the imputation of missing values was carried out [40]. We employed mean imputation for numerical variables and mode imputation for categorical variables due to their simplicity and efficiency. Mean imputation helps preserve the central tendency of the data, reducing biases in the mean structure. Mode imputation maintains the distribution of the most frequent categories, ensuring the integrity of categorical data distributions. To apply the same coefficients in the imputation of missing values in the validation set, we created an “ImputationDesc” object with the coefficients used in the imputation on the training set (see Figure 1, Step 3a). The ImputationDesc object stores all relevant information about the imputation and can be used to impute the test data set the same way as the training data.
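The mlr package implements this pattern directly: impute() returns both the imputed data and a reusable imputation description. A sketch under the same assumptions as above:

# Step 3a: mean imputation for numeric variables and mode imputation for any
# remaining factor columns; the returned "ImputationDesc" object stores the
# learned imputation values for later reuse on the validation set.
imp       <- impute(train_set, target = "offender",
                    classes = list(numeric = imputeMean(),
                                   integer = imputeMean(),
                                   factor  = imputeMode()))
train_imp <- imp$data   # imputed training data
imp_desc  <- imp$desc   # imputation description ("ImputationDesc")

# Figure 2, Step 1: the validation set is later imputed with the same values.
valid_imp <- reimpute(valid_set, imp_desc)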
A primary goal of this study was to identify the key variables among the 48 possible ones. However, as data are mostly nonlinear and nonparametric, there was a high chance of overfitting, a common obstacle in ML. In practice, overfitting refers to the algorithm learning too well from the training data, with random fluctuations in the data being picked up by the model [41]. We conducted variable reduction using a random forest algorithm (Figure 1, Step 3b). Initially, all available variables were included. Iteratively, the least important variables, as determined by random forest importance scores, were removed. The reduction process continued until the addition of another variable did not improve the AUC by more than 5%. This approach ensured that only the most predictive variables were retained, optimizing the model’s performance and interpretability. This resulted in three predictor variables (see Section 3) and also helped to prevent excessive computing times. Thus, the preprocessing was concluded. A sketch of this reduction step is given below.
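The following illustrative reconstruction shows one way such an importance-based reduction can be implemented; the helper cv_auc(), the use of the randomForest package, and the reading of the stopping rule as a 5% relative AUC gain are assumptions, not the authors’ original code.

library(randomForest)

# Rank variables by random forest importance (mean decrease in node impurity).
rf_fit  <- randomForest(offender ~ ., data = train_imp, importance = TRUE)
ranking <- names(sort(importance(rf_fit)[, "MeanDecreaseGini"], decreasing = TRUE))

# Cross-validated AUC of a random forest restricted to a given variable set.
cv_auc <- function(vars) {
  sub_task <- makeClassifTask(data = train_imp[, c(vars, "offender")],
                              target = "offender", positive = "false")
  res <- resample(makeLearner("classif.randomForest", predict.type = "prob"),
                  sub_task, makeResampleDesc("CV", iters = 5),
                  measures = auc, show.info = FALSE)
  res$aggr[["auc.test.mean"]]
}

# Add variables from the top of the ranking until the next variable no
# longer improves the AUC by more than 5%.
selected <- ranking[1]
for (v in ranking[-1]) {
  if (cv_auc(c(selected, v)) > 1.05 * cv_auc(selected)) {
    selected <- c(selected, v)
  } else break
}
selected  # in the study, three variables remained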

2.4.2. Training of the Algorithm

Seven algorithms—logistic regression, decision trees, random forest, gradient boosting, k-nearest neighbor (KNN), support vector machines (SVMs), and naïve Bayes—were applied to the training set to build discriminative models from multiple perspectives and to ensure the robustness of our results. Logistic regression and naïve Bayes provide straightforward, interpretable models, while decision trees and random forest offer flexibility and resistance to overfitting. Gradient boosting and SVMs are powerful for handling complex, non-linear relationships, and KNN is suitable for capturing local data structures. By employing multiple algorithms, we aimed to facilitate the comprehensive validation of findings and to increase confidence in the results by cross-verifying patterns identified by different methods [42,43]. The algorithms were assessed according to the parameters listed in Table 1. The model with the best performance was then selected for model validation on the validation set (Figure 1, Step 4).
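In mlr, such a comparison can be run with benchmark(); a sketch continuing the code above is given here. The learner identifiers follow mlr’s "classif.*" naming, and task_reduced is a hypothetical task built on the retained variables.

# Classification task restricted to the variables kept after Step 3b.
task_reduced <- makeClassifTask(data = train_imp[, c(selected, "offender")],
                                target = "offender", positive = "false")

learners <- list(
  makeLearner("classif.logreg",       predict.type = "prob"),
  makeLearner("classif.rpart",        predict.type = "prob"),  # decision tree
  makeLearner("classif.randomForest", predict.type = "prob"),
  makeLearner("classif.gbm",          predict.type = "prob"),  # gradient boosting
  makeLearner("classif.kknn",         predict.type = "prob"),  # k-nearest neighbors
  makeLearner("classif.ksvm",         predict.type = "prob"),  # support vector machine
  makeLearner("classif.naiveBayes",   predict.type = "prob")
)

# 5-fold cross-validated comparison on the training data, using the
# measures reported in Table 3.
bmr <- benchmark(learners, task_reduced,
                 resamplings = makeResampleDesc("CV", iters = 5, stratify = TRUE),
                 measures    = list(auc, bac, tpr, tnr, ppv, npv))
getBMRAggrPerformances(bmr, as.df = TRUE)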
As discussed above, avoiding overfitting was crucial in the process. To reduce the risk beyond dimensionality reduction, we conducted cross-validation on the training set in the form of nested resampling [45]. The entire data processing and model training process was embedded in an inner cross-validation loop, and the models’ performance was tested in an outer loop, also embedded in cross-validation. This approach allowed us to artificially create different subsamples of the same dataset while keeping the validation subset untouched (Figure 1, Step 5).
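A sketch of nested resampling for the gradient boosting learner, continuing the code above; the tuning grid, tuning budget, and fold numbers are illustrative assumptions.

# Inner loop: 5-fold CV for hyperparameter tuning; outer loop: 5-fold CV
# for an unbiased estimate of the tuned model's performance.
ps <- makeParamSet(
  makeIntegerParam("n.trees",           lower = 50, upper = 500),
  makeIntegerParam("interaction.depth", lower = 1,  upper = 5)
)
inner <- makeResampleDesc("CV", iters = 5)
outer <- makeResampleDesc("CV", iters = 5)

tuned_gbm <- makeTuneWrapper(makeLearner("classif.gbm", predict.type = "prob"),
                             resampling = inner, par.set = ps,
                             control    = makeTuneControlRandom(maxit = 20),
                             measures   = auc)

nested <- resample(tuned_gbm, task_reduced, outer,
                   measures = list(auc, bac), show.info = FALSE)
nested$aggr  # outer-loop performance estimates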

2.4.3. Validation of the Algorithm

The following steps were all performed on the validation set (30% of the total population), which had remained strictly untouched by the procedures described in steps 3–5. Imputation of missing values was carried out in the same manner as on the training set, with the imputation weights previously saved (Figure 2, Step 1).
The most suitable model, which had been identified in Figure 1, Step 4, was applied and evaluated in terms of its performance parameters (Figure 2, Step 2).
In the last step, the identified predictor variables were ranked in accordance with their relative influence within the selected model (Figure 2, Step 3).
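Under the same assumptions as in the sketches above, the validation stage reduces to a few calls; valid_imp is the reimputed validation set from Section 2.4.1, and whether getFeatureImportance() is available depends on the chosen learner.

# Figure 2, Steps 2 and 3: train the selected gradient boosting model on the
# training data and evaluate it once on the held-out validation set.
final_model <- train(makeLearner("classif.gbm", predict.type = "prob"),
                     task_reduced)
pred <- predict(final_model, newdata = valid_imp[, c(selected, "offender")])

performance(pred, measures = list(auc, bac, tpr, tnr, ppv, npv))  # cf. Table 5
calculateConfusionMatrix(pred)      # counts shown in Figure 3b
getFeatureImportance(final_model)   # relative variable influence, cf. Figure 4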

3. Results

The basic characteristics of our sample, which were evaluated to check comparability between OP and NOP samples, showed a similar distribution of age and gender, as well as psychiatric main diagnosis, with the majority of patients suffering from paranoid schizophrenia. Regarding psychiatric comorbidities, OP patients showed a higher prevalence of personality disorders and substance use disorders (Table 2).
Out of the seven algorithms applied in the model-building process, gradient boosting showed the best performance parameters on the training set, yet it yielded an AUC of only 0.69 (Table 3). With a sensitivity of 77%, the algorithm correctly identified roughly three-quarters of all NOP cases. At the same time, the best-performing algorithm correctly identified only about half of all OP cases.
Out of all possible 48 predictors/discriminative variables (see Appendix A for the full list), the following remained after the reduction of dimensionality through random forest: Switzerland as country of birth, illegal residency in Switzerland, and failure to complete compulsory schooling (Table 4). Adding another item to the model did not significantly improve the AUC, meaning that the three variables were more indicative of discrimination between the groups than all other variables.
After applying the gradient boosting model to the validation set, both sensitivity and specificity yielded 63%. The AUC was 0.65 (95% confidence interval 0.58–0.72), which was slightly lower than on the training set (Table 5). Figure 3 displays the ROC curve (a) and the confusion matrix (b), showing the numbers of true positives/negatives and false positives/negatives of the final model.
The variables contributed differently to the model: country of birth (Switzerland) emerged as the most influential, closely followed by illegal residence in Switzerland and failure to complete compulsory schooling (see Figure 4).

4. Discussion

By using supervised ML, we were able to analyze whether sociodemographic characteristics are powerful discriminative variables between offenders and non-offenders with schizophrenia spectrum disorders. As outlined above, the application of ML is useful when analyzing complex phenomena with intertwining variables, such as criminal behavior, which is considered to be driven by multiple factors. A similar analysis with NHST, which is widely applied to psychiatric research regardless of whether it is actually suitable for the research question, would not have been feasible, as it does not allow the analysis of the interplay of multiple variables, and the number of variables in this analysis would have led to an accumulation of alpha error. Furthermore, supervised ML offers the possibility of exploring data without the need to define the hypothesis in advance, which allows the discovery of unknown patterns in data. This makes ML especially suitable for exploratory analyses in areas characterized by scarcity in the pre-existing literature, which can be used to form a sound and distinct hypothesis. The presented study aims to serve as an example of ML as a powerful data-driven tool to analyze a wealth of complex data, detecting new patterns and thus offering new possibilities for research off the beaten path of widely applied traditional statistical methods.
While the current study focused on the presentation of an exemplary application of a machine learning-based methodology in psychiatric research, we would like to discuss the results on a clinical level. Regarding the baseline parameters of both study groups, the OP group had a higher proportion of comorbid substance use disorders. This finding corresponds to previous research stating that patients suffering from SSDs are more likely to commit violent crimes in cases of comorbid substance abuse [28,29].
In our model, “country of birth: Switzerland” and “illegal residence in Switzerland” emerged as the most and second-most influential variables, respectively. OP patients were less frequently born in Switzerland and simultaneously resided more often without a legal basis. In other words, a history of migration turned out to be highly important when distinguishing between offending and non-offending SSD patients. While the personal experience of migration has been shown to be a risk factor for developing psychotic disorders [46,47,48], the differences regarding criminal behavior are not easy to explain. In a sample of forensic patients in Denmark, the proportion of migrants committing violent crimes exceeded that of individuals of Danish ethnicity. This finding was attributed to the elevated risk of psychosis in migration and the underlying linkage of schizophrenia and violent behavior [49]. A population study from Sweden covering a span of nearly 40 years found that both male and female schizophrenia patients not born in Sweden were at higher risk of committing violent acts, although the factor “born abroad” turned out to be only one among several others, such as substance abuse or past violence [50]. Several research works covering immigration and crime found no or only weak associations (either positive or negative), but these studies did not investigate violence in the context of psychiatric disorders in general and SSDs in particular. In addition, no subgroup analyses regarding the social status of migrants, or the country and culture of origin, were conducted [51,52,53]. Transferability of those findings to the results of this study is therefore limited.
Moreover, illegal residency might result in barriers regarding access to healthcare [54,55]. Although all individuals residing in Switzerland, undocumented immigrants included, can receive mental health care upon request [56], disadvantages in daily life cannot be ruled out as illegal residents might fear denunciation to authorities and thus decline medical consultation or may not be aware of available health care structures. In addition, treatment is often limited to essential procedures in cases of emergency; thus, access to all potentially available services is not granted [57]. Consequently, existing psychiatric or emerging disorders are possibly at higher risk of exacerbation.
In general, schizophrenia is associated with reduced academic achievement, and affected individuals are less likely to attain higher educational levels [58]. In our study, OP patients showed a higher prevalence of failure to complete compulsory schooling (26%; NOP: 5.6%). The Swedish population study mentioned previously found higher rates of violent crimes in male schizophrenia patients with a low intelligence quotient [50], which is a risk factor for dropping out of school [59]. In our study, IQ was not tested, and the higher rate of school failure in forensic patients is probably caused by multiple factors. In addition to the factors mentioned above, migration while school-aged impairs school attendance and performance [60], and, depending on the country of origin, individuals may have faced poorly developed educational systems. Lastly, the high rate of failure may also be attributed to a higher burden of disease.
The performance metrics of our gradient boosting model indicate only poor to moderate efficacy. The balanced accuracy of 63.1% suggests a fair balance in identifying both positive and negative cases, as sensitivity and specificity perform similarly. An AUC of 0.65 reflects poor discriminatory power [61]. The PPV of 66.4% and NPV of 59.6% indicate the model’s moderate predictive reliability [62]. Importantly, the NPV, with a confidence interval of 49.8% to 68.8%, is the least reliable estimate, since the lower bound of the CI is below 50%. This implies that there is substantial uncertainty around the NPV estimate, and in some cases, the NPV might be less than 50%, meaning that the model’s ability to correctly identify true negatives could be as poor as random guessing.
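For readers less familiar with these measures, the following worked example shows how they are derived from a 2 × 2 confusion matrix; the counts are hypothetical round numbers chosen for illustration, not the study’s actual validation results.

# Hypothetical confusion matrix (positive class = NOP, i.e., "OP: false").
tp <- 63; fn <- 37   # NOP cases classified correctly / incorrectly
tn <- 63; fp <- 37   # OP cases classified correctly / incorrectly

sensitivity <- tp / (tp + fn)                        # 0.63
specificity <- tn / (tn + fp)                        # 0.63
balanced_accuracy <- (sensitivity + specificity) / 2 # 0.63
ppv <- tp / (tp + fp)  # proportion of predicted NOP cases that truly are NOP
npv <- tn / (tn + fn)  # proportion of predicted OP cases that truly are OP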
The finding that the model’s performance was rather poor sparks hope for clinicians, as it shows that variables regarding the patient’s sociodemographic background, which mostly cannot be therapeutically targeted due to their biographical nature, are not, on their own, determinants of criminal development as a negative outcome during the course of SSDs.
That said, both groups are more similar than one could have expected. The present AUC and the balanced accuracy should therefore not be recognized as weak statistical performance parameters, but rather indicate that the OP and NOP groups have many sociodemographic features in common.
A crucial step is the separation of training and validation data, which allows the model to face an unbiased sample or, as Matthew Carbone called it, a “litmus test” for the performance of the model on new data [63]. However, this requires both sets to be of the same type, meaning that they need to follow the same distribution. In general, since ML is a data-driven approach, the algorithms’ performance is only as good as the data provided [64]. In our case, some limitations of our data need to be addressed. The first is retrospective data extraction bias: data quality is not comparable to that of a prospective, standardized study, resulting in decreased robustness [65]. Additionally, various unspecific variables that could influence the course of treatment cannot be measured in a scientifically sound way in retrospective studies; this applies, for example, to important states such as perceived loneliness. However, with lengths of stay of up to several years in court-mandated inpatient treatment, forensic psychiatric research faces the problem of collecting large enough samples for robust results, which is why we opted for the retrospective approach despite its limitations. This brings us to the second caveat regarding data quality: while the sample of 740 can be considered large from a forensic psychiatric point of view, it is rather small for ML purposes. The smaller the sample, the less training data are fed to the algorithm and the lower its statistical power [66]. This again stresses the fact that ML is ideal for analyzing big data but is not suitable for very small sample sizes, which also increase the risk of unequally distributed training and validation datasets. Therefore, the results obtained in this study should be reproduced in a multicenter project in order to collect a sufficient amount of data.
Regarding further methodological limitations, overfitting, a common issue in ML, has to be discussed. The term overfitting refers to a model incorporating noise in the training data to the extent that a high error rate occurs on new data, reducing its applicability and generalizability [67]. However, there are statistical steps to counteract the effect of overfitting, such as the cross-validation applied in this case [41]. By creating five random subsamples of our data and always using one as a validation set and the other four as training sets, error estimation is averaged over all five trials (folds), reducing variance and bias. While it is possible to perform cross-validation with even more folds, this also increases computing time and can be rather time-consuming depending on the technical prerequisites available. Data quality might also be impaired by missing values. In our study, all variables with more than 33% missing observations were omitted. The missing values of the remaining variables were then imputed, a potentially problematic but still recommended approach in order to maintain data quality [68,69]. Lastly, it is vital to consider various ethical challenges associated with the use of artificial intelligence in the field of forensic psychiatry, e.g., racial or gender bias [70].
Recently, a growing discussion about Explainable Artificial Intelligence (XAI) has emerged. XAI fosters transparency and understanding of machine learning models and advocates a human understanding of AI-generated models. Only then is it possible to determine when to trust the AI and when the AI should be distrusted, which is especially important in sensitive fields such as forensic psychiatry [71,72]. In our study, XAI might help clarify how and why the particular sociodemographic variables influence the model’s predictions. While our research focuses on identifying these variables to improve the care of non-European migrants in general and forensic settings, XAI helps to ensure that the model’s decisions are transparent and free from bias, thereby preventing further stigmatization of this vulnerable group. Additionally, XAI could provide insights into the moderate predictive power of our model. Future projects should therefore focus on incorporating XAI more thoroughly.

5. Conclusions

Sociodemographic attributes alone are not sufficient to explain the differences between general and forensic psychiatric patients suffering from schizophrenia spectrum disorders, as indicated by our model’s mediocre performance parameters. What appears to be a negative result is actually a glimmer of hope: in contrast to the variables described previously, a variety of other features can be influenced by clinicians, e.g., psychopathology, integration into social communities, and comorbid substance abuse.
Supervised machine learning offers many opportunities to explore undetected patterns within complex datasets. In particular, the under-researched field of forensic psychiatry can benefit from these modern and increasingly available methods.

Author Contributions

A.B.H.: Conceptualization, Data curation, Investigation, Methodology, Validation, Visualization, Writing—original draft, Writing—review & editing. M.D.: Methodology, Writing—review & editing. L.M.: Conceptualization, Formal analysis, Investigation, Resources, Supervision, Writing—review & editing. J.K.: Conceptualization, Data curation, Formal analysis, Investigation, Software, Project administration, Methodology, Supervision, Writing—review & editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This research project was reviewed and approved by the Ethics Committee Zurich under the reference number KEK-ZH-NR 2014–0480, approval date: 19 May 2015.

Informed Consent Statement

Patient consent was waived due to the retrospective design, for which formal consent is not necessarily required in the Canton of Zurich, Switzerland.

Data Availability Statement

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Variable Code | Variable Label
y (Outcome variable) | Is the patient forensic?
SD1 | Age at admission?
SD2 | Sex according to patient files
SD3a | Country of birth: Switzerland?
SD3b | Country of birth: Balkan region country?
SD3c | Country of birth: other European country?
SD3d | Country of birth: Middle east?
SD3e | Country of birth: Africa?
SD3f | Country of birth: other country?
SD3g | If 3b–3f do not apply, legal residence in Switzerland?
SD4a | Christian faith?
SD4b | Islamic faith?
SD5a | Marital Status (at time of the investigated offence)—married?
SD5b | Marital Status (at time of the investigated offence)—single?
SD6a | Living situation (at time of the investigated offence)—mental health care institution
SD6b | Living situation (at time of the investigated offence)—assisted living
SD6c | Living situation (at time of the investigated offence)—home alone
SD6d | Living situation (at time of the investigated offence)—home with others
SD6e | Living situation (at time of the investigated offence)—at parents’ home
SD6f | Living situation (at time of the investigated offence)—with relatives
SD6g | Living situation (at time of the investigated offence)—homeless
SD6h | Living situation (at time of the investigated offence)—prison
SD6i | Living situation (at time of the investigated offence)—other
SD7a | Highest graduation (at time of the investigated offence)—no compulsory schooling
SD7b | Highest graduation (at time of the investigated offence)—compulsory schooling
SD7c | Highest graduation (at time of the investigated offence)—graduation
SD7d | Highest graduation (at time of the investigated offence)—college/university
SD8a | Learned profession: no apprenticeship
SD8b | Learned profession: college/university degree
SD8c | Learned profession: official/civil servant
SD8d | Learned profession: mercantile job
SD8e | Learned profession: non-mercantile job
SD8f | Learned profession: crafting job
SD8g | Learned profession: other job
SD9 | Is the patient a nonworker (at time of the investigated offence)?
SD11 | Is the patient a nonworker (majority of occupational time)?
SD12a | Rank at job: basal (majority of occupational time)?
SD12b | Rank at job: complex (majority of occupational time)?
SD14 | Own children?
SD15 | Any siblings?
SD17 | Was the legal guardian married?
SD18a | Who was/is the legal guardian—birth parents?
SD18b | Who was/is the legal guardian—single parents?
SD18c | Who was/is the legal guardian—step-parents?
SD18d | Who was/is the legal guardian—one step-parent?
SD18f | Who was/is the legal guardian—grandparents?
SD18g | Who was/is the legal guardian—foster parents?
SD18h | Who was/is the legal guardian—child home?
SD19 | Member in a (leisure) club?

References

  1. El Naqa, I.; Murphy, M.J. What Is Machine Learning? In Machine Learning in Radiation Oncology: Theory and Applications; El Naqa, I., Li, R., Murphy, M.J., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 3–11. [Google Scholar]
  2. Rebala, G.; Ravi, A.; Churiwala, S. Machine Learning Definition and Basics. In An Introduction to Machine Learning; Rebala, G., Ravi, A., Churiwala, S., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 1–17. [Google Scholar]
  3. Nasteski, V. An overview of the supervised machine learning methods. Horizons B 2017, 4, 51–62. [Google Scholar] [CrossRef]
  4. Kirchebner, J. A collection of medical findings using machine learning and their relevance to psychiatry. Swiss Arch. Neurol. Psychiatry Psychother. 2022, 173, w03251. [Google Scholar] [CrossRef]
  5. Gutierrez, D.D. Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R; Technics Publications: Sedona, AZ, USA, 2015. [Google Scholar]
  6. Oquendo, M.A.; Baca-Garcia, E.; Artés-Rodríguez, A.; Perez-Cruz, F.; Galfalvy, H.C.; Blasco-Fontecilla, H.; Madigan, D.; Duan, N. Machine learning and data mining: Strategies for hypothesis generation. Mol. Psychiatry 2012, 17, 956–959. [Google Scholar] [CrossRef] [PubMed]
  7. Silva-Ayçaguer, L.C.; Suárez-Gil, P.; Fernández-Somoano, A. The null hypothesis significance test in health sciences research (1995–2006): Statistical analysis and interpretation. BMC Med. Res. Methodol. 2010, 10, 44. [Google Scholar] [CrossRef]
  8. Ryffel, F.A. Alpha and beta error (accumulation). In The International Encyclopedia of Communication Research Methods; John Wiley & Sons: Hoboken, NJ, USA, 2017; pp. 1–5. [Google Scholar]
  9. Popper, K.R. Logik der Forschung: Zur Erkenntnistheorie der Moderner Naturwissenschaft; Verlag von Julius Springer: Berlin/Heidelberg, Germany, 1935. [Google Scholar]
  10. Szucs, D.; Ioannidis, J.P.A. When Null Hypothesis Significance Testing Is Unsuitable for Research: A Reassessment. Front. Hum. Neurosci. 2017, 11, 390. [Google Scholar] [CrossRef]
  11. Krueger, J. Null hypothesis significance testing: On the survival of a flawed method. Am. Psychol. 2001, 56, 16. [Google Scholar] [CrossRef]
  12. Balluerka, N.; Gómez, J.; Hidalgo, D. The Controversy over Null Hypothesis Significance Testing Revisited. Methodology 2005, 1, 55–70. [Google Scholar] [CrossRef]
  13. Krantz, D.H. The Null Hypothesis Testing Controversy in Psychology. J. Am. Stat. Assoc. 1999, 94, 1372–1381. [Google Scholar] [CrossRef]
  14. Sedgwick, P.M.; Hammer, A.; Kesmodel, U.S.; Pedersen, L.H. Current controversies: Null hypothesis significance testing. Acta Obs. Gynecol Scand 2022, 101, 624–627. [Google Scholar] [CrossRef]
  15. Ingram, R.E.; Luxton, D.D. Vulnerability-stress models. Dev. Psychopathol. A Vulnerability-Stress Perspect. 2005, 46, 32–46. [Google Scholar]
  16. Cheng, S.C.; Walsh, E.; Schepp, K.G. Vulnerability, Stress, and Support in the Disease Trajectory from Prodrome to Diagnosed Schizophrenia: Diathesis–Stress–Support Model. Arch. Psychiatr. Nurs. 2016, 30, 810–817. [Google Scholar] [CrossRef] [PubMed]
  17. Darrell-Berry, H.; Berry, K.; Bucci, S. The relationship between paranoia and aggression in psychosis: A systematic review. Schizophr. Res. 2016, 172, 169–176. [Google Scholar] [CrossRef]
  18. Fava, G.A. The Intellectual Crisis of Psychiatric Research. Psychother. Psychosom. 2006, 75, 202–208. [Google Scholar] [CrossRef] [PubMed]
  19. Zhou, J.; Gandomi, A.H.; Chen, F.; Holzinger, A. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics 2021, 10, 593. [Google Scholar] [CrossRef]
  20. Parmigiani, G.; Barchielli, B.; Casale, S.; Mancini, T.; Ferracuti, S. The impact of machine learning in predicting risk of violence: A systematic review. Front. Psychiatry 2022, 13, 1015914. [Google Scholar] [CrossRef] [PubMed]
  21. Watts, D.; Moulden, H.; Mamak, M.; Upfold, C.; Chaimowitz, G.; Kapczinski, F. Predicting offenses among individuals with psychiatric disorders—A machine learning approach. J. Psychiatr. Res. 2021, 138, 146–154. [Google Scholar] [CrossRef]
  22. Trinhammer, M.L.; Merrild, A.C.H.; Lotz, J.F.; Makransky, G. Predicting crime during or after psychiatric care: Evaluating machine learning for risk assessment using the Danish patient registries. J. Psychiatr. Res. 2022, 152, 194–200. [Google Scholar] [CrossRef]
  23. Kirchebner, J.; Lau, S.; Machetanz, L. Offenders and non-offenders with schizophrenia spectrum disorders: Do they really differ in known risk factors for aggression? Front. Psychiatry 2023, 14, 1145644. [Google Scholar] [CrossRef]
  24. Machetanz, L.; Lau, S.; Habermeyer, E.; Kirchebner, J. Suicidal Offenders and Non-Offenders with Schizophrenia Spectrum Disorders: A Retrospective Evaluation of Distinguishing Factors Using Machine Learning. Brain Sci 2023, 13, 97. [Google Scholar] [CrossRef]
  25. Farrington, D.P.; Loeber, R.; Ttofi, M.M. Risk and protective factors for offending. In The Oxford Handbook of Crime Prevention; Oxford University Press: Oxford, UK, 2012; pp. 46–69. [Google Scholar]
  26. Jolliffe, D.; Farrington, D.P.; Piquero, A.R.; Loeber, R.; Hill, K.G. Systematic review of early risk factors for life-course-persistent, adolescence-limited, and late-onset offenders in prospective longitudinal studies. Aggress. Violent Behav. 2017, 33, 15–23. [Google Scholar] [CrossRef]
  27. Takahashi, A. The role of social isolation stress in escalated aggression in rodent models. Neurosci. Res. 2022. [Google Scholar] [CrossRef] [PubMed]
  28. Fazel, S.; Gulati, G.; Linsell, L.; Geddes, J.R.; Grann, M. Schizophrenia and violence: Systematic review and meta-analysis. PLoS Med 2009, 6, e1000120. [Google Scholar] [CrossRef] [PubMed]
  29. Fazel, S.; Långström, N.; Hjern, A.; Grann, M.; Lichtenstein, P. Schizophrenia, Substance Abuse, and Violent Crime. JAMA 2009, 301, 2016–2023. [Google Scholar] [CrossRef]
  30. Fazel, S.; Sariaslan, A. Victimization in people with severe mental health problems: The need to improve research quality, risk stratification and preventive measures. World Psychiatry 2021, 20, 437. [Google Scholar] [CrossRef] [PubMed]
  31. Latalova, K.; Kamaradova, D.; Prasko, J. Violent victimization of adult patients with severe mental illness: A systematic review. Neuropsychiatr. Dis. Treatment. 2014, 10, 1925–1939. [Google Scholar] [CrossRef]
  32. Rajula, H.S.R.; Verlato, G.; Manchia, M.; Antonucci, N.; Fanos, V. Comparison of Conventional Statistical Methods with Machine Learning in Medicine: Diagnosis, Drug Development, and Treatment. Medicina 2020, 56, 455. [Google Scholar] [CrossRef]
  33. World Health Organization. ICD-10: International Statistical Classification of Diseases and Related Health Problems: Tenth Revision; World Health Organization: Geneva, Switzerland, 2004. [Google Scholar]
  34. Slee, V.N. The International classification of diseases: Ninth revision (ICD-9). Ann. Intern. Med. 1978, 88, 424–426. [Google Scholar] [CrossRef]
  35. Seifert, D. Die Entwicklung des psychiatrischen Massregelvollzugs (§ 63 StGB) in Nordrhein-Westfalen. Psychiat. Prax. 1997, 24, 237–244. [Google Scholar]
  36. Brennan, P.F.; Hays, B.J. Focus on psychometrics the kappa statistic for establishing interrater reliability in the secondary analysis of qualitative clinical data. Res. Nurs. Health 1992, 15, 153–158. [Google Scholar] [CrossRef]
  37. Osisanwo, F.; Akinsola, J.; Awodele, O.; Hinmikaiye, J.; Olakanmi, O.; Akinjobi, J. Supervised machine learning algorithms: Classification and comparison. Int. J. Comput. Trends Technol. (IJCTT) 2017, 48, 128–138. [Google Scholar]
  38. Bischl, B.; Lang, M.; Kotthoff, L.; Schiffner, J.; Richter, J.; Studerus, E.; Casalicchio, G.; Jones, Z. mlr: Machine Learning in R. J. Mach. Learn. Res. 2016, 17, 5938–5942. [Google Scholar]
  39. Joseph, V.R. Optimal ratio for data splitting. Stat. Anal. Data Min. ASA Data Sci. J. 2022, 15, 531–538. [Google Scholar] [CrossRef]
  40. Thomas, T.; Rajabi, E. A systematic review of machine learning-based missing value imputation techniques. Data Technol. Appl. 2021, 55, 558–585. [Google Scholar] [CrossRef]
  41. Ying, X. An overview of overfitting and its solutions. J. Phys. Conf. Ser. 2019, 1168, 022022. [Google Scholar] [CrossRef]
  42. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer New York: New York, NY, USA, 2013. [Google Scholar]
  43. Hastie, T.; Tibshirani, R.; Friedman, J. Model Assessment and Selection. In The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer New York: New York, NY, 2009; pp. 219–259. [Google Scholar]
  44. Varoquaux, G.; Colliot, O. Evaluating Machine Learning Models and Their Diagnostic Value. In Machine Learning for Brain Disorders; Colliot, O., Ed.; Springer US: New York, NY, USA, 2023; pp. 601–630. [Google Scholar]
  45. King, R.D.; Orhobor, O.I.; Taylor, C.C. Cross-validation is safe to use. Nat. Mach. Intell. 2021, 3, 276. [Google Scholar] [CrossRef]
  46. Dykxhoorn, J.; Hollander, A.C.; Lewis, G.; Magnusson, C.; Dalman, C.; Kirkbride, J.B. Risk of schizophrenia, schizoaffective, and bipolar disorders by migrant status, region of origin, and age-at-migration: A national cohort study of 1.8 million people. Psychol. Med. 2019, 49, 2354–2363. [Google Scholar] [CrossRef]
  47. Henssler, J.; Brandt, L.; Muller, M.; Liu, S.; Montag, C.; Sterzer, P.; Heinz, A. Migration and schizophrenia: Meta-analysis and explanatory framework. Eur. Arch. Psychiatry Clin. Neurosci. 2020, 270, 325–335. [Google Scholar] [CrossRef]
  48. Selten, J.P.; van der Ven, E.; Termorshuizen, F. Migration and psychosis: A meta-analysis of incidence studies. Psychol. Med. 2020, 50, 303–313. [Google Scholar] [CrossRef]
  49. Gabrielsen, G.; Kramp, P. Forensic psychiatric patients among immigrants in Denmark–diagnoses and criminality. Nord. J. Psychiatry 2009, 63, 140–147. [Google Scholar] [CrossRef]
  50. Fazel, S.; Wolf, A.; Palm, C.; Lichtenstein, P. Violent crime, suicide, and premature mortality in patients with schizophrenia and related disorders: A 38-year total population study in Sweden. Lancet Psychiatry 2014, 1, 44–54. [Google Scholar] [CrossRef]
  51. Bianchi, M.; Buonanno, P.; Pinotti, P. Do Immigrants Cause Crime? J. Eur. Econ. Assoc. 2012, 10, 1318–1347. [Google Scholar] [CrossRef]
  52. Adelman, R.; Reid, L.W.; Markle, G.; Weiss, S.; Jaret, C. Urban crime rates and the changing face of immigration: Evidence across four decades. J. Ethn. Crim. Justice 2017, 15, 52–77. [Google Scholar] [CrossRef]
  53. Ousey, G.C.; Kubrin, C.E. Immigration and Crime: Assessing a Contentious Issue. Annu. Rev. Criminol. 2018, 1, 63–84. [Google Scholar] [CrossRef]
  54. Woodward, A.; Howard, N.; Wolffers, I. Health and access to care for undocumented migrants living in the European Union: A scoping review. Health Policy Plan. 2014, 29, 818–830. [Google Scholar] [CrossRef]
  55. Hacker, K.; Anies, M.; Folb, B.L.; Zallman, L. Barriers to health care for undocumented immigrants: A literature review. Risk Manag Health Policy 2015, 8, 175–183. [Google Scholar] [CrossRef]
  56. Bilger, V.; Hollomey, C.; Wyssmüller, C.; Efionayi-Mäder, D. Health Care for Undocumented Migrants in Switzerland; Federal Office of Public Health: Bern, Switzerland, 2011. [Google Scholar]
  57. Rüefli, C.; Huegli, E. Krankenversicherung und Gesundheitsversorgung von Sans Papiers, Bericht zur Beantwortung des Postulats Heim (09.3484). 2011. Available online: https://www.beobachtungsstelle.ch/fileadmin/user_upload/pdf_divers/Berichte/2012/Bericht_Postulat_Heim_komplett_d.pdf (accessed on 22 August 2024).
  58. Dickson, H.; Hedges, E.P.; Ma, S.Y.; Cullen, A.E.; MacCabe, J.H.; Kempton, M.J.; Downs, J.; Laurens, K.R. Academic achievement and schizophrenia: A systematic meta-analysis. Psychol. Med. 2020, 50, 1949–1965. [Google Scholar] [CrossRef]
  59. Gubbels, J.; van der Put, C.E.; Assink, M. Risk Factors for School Absenteeism and Dropout: A Meta-Analytic Review. J. Youth Adolesc. 2019, 48, 1637–1667. [Google Scholar] [CrossRef] [PubMed]
  60. Rosenthal, L.; Moro, M.R.; Benoit, L. Migrant Parents of Adolescents With School Refusal: A Qualitative Study of Parental Distress and Cultural Barriers in Access to Care. Front. Psychiatry 2019, 10, 942. [Google Scholar] [CrossRef] [PubMed]
  61. Nahm, F.S. Receiver operating characteristic curve: Overview and practical use for clinicians. Korean J. Anesth. 2022, 75, 25–36. [Google Scholar] [CrossRef]
  62. Trevethan, R. Sensitivity, Specificity, and Predictive Values: Foundations, Pliabilities, and Pitfalls in Research and Practice. Front. Public Health 2017, 5, 307. [Google Scholar] [CrossRef]
  63. Carbone, M.R. When not to use machine learning: A perspective on potential and limitations. MRS Bull. 2022, 47, 968–974. [Google Scholar] [CrossRef]
  64. Sambasivan, N.; Kapania, S.; Highfill, H.; Akrong, D.; Paritosh, P.; Aroyo, L.M. “Everyone wants to do the model work, not the data work”: Data Cascades in High-Stakes AI. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021; pp. 1–15. [Google Scholar]
  65. Talari, K.; Goyal, M. Retrospective Studies—Utility and Caveats. J. R. Coll. Physicians Edinb. 2020, 50, 398–402. [Google Scholar] [CrossRef] [PubMed]
  66. Vabalas, A.; Gowen, E.; Poliakoff, E.; Casson, A.J. Machine learning algorithm validation with a limited sample size. PLoS ONE 2019, 14, e0224365. [Google Scholar] [CrossRef]
  67. Dietterich, T. Overfitting and undercomputing in machine learning. ACM Comput. Surv. (CSUR) 1995, 27, 326–327. [Google Scholar] [CrossRef]
  68. Heymans, M.W.; Twisk, J.W.R. Handling missing data in clinical research. J. Clin. Epidemiol. 2022, 151, 185–188. [Google Scholar] [CrossRef]
  69. Schafer, J.L.; Graham, J.W. Missing data: Our view of the state of the art. Psychol. Methods 2002, 7, 147–177. [Google Scholar] [CrossRef]
  70. Tortora, L. Beyond Discrimination: Generative AI Applications and Ethical Challenges in Forensic Psychiatry. Front. Psychiatry 2024, 15, 1346059. [Google Scholar] [CrossRef]
  71. Joyce, D.W.; Kormilitzin, A.; Smith, K.A.; Cipriani, A. Explainable artificial intelligence for mental health through transparency and interpretability for understandability. NPJ Digit. Med. 2023, 6, 6. [Google Scholar] [CrossRef]
  72. Xu, F.; Uszkoreit, H.; Du, Y.; Fan, W.; Zhao, D.; Zhu, J. Explainable AI: A Brief Survey on History, Research Areas, Approaches and Challenges. In Natural Language Processing and Chinese Computing; Springer: Cham, Switzerland, 2019; pp. 563–574. [Google Scholar]
Figure 1. Data processing and training of the algorithm. Legend: OP = Offender patients; SVM = support vector machines; KNN = k-nearest neighbors.
Figure 2. Model building and testing the performance on the validation dataset. Legend: OP = Offender patients; AUC = area under the curve (level of discrimination); ROC = receiver operating characteristic curve.
Figure 3. The final model’s AUC curve (a) and confusion matrix (b).
Figure 4. Ranking of predictor variables in accordance with their importance in the model (through gradient boosting).
Table 1. Brief explanation of included statistical parameters used in machine learning [44].
Performance Measures | Explanation
Receiver operating characteristics, area under the curve (AUC) | Overall ability of a model to discriminate between two groups, as indicated by a graph plotting sensitivity against 1 − specificity. The higher the AUC, the better the model distinguishes between positive and negative classes.
Balanced Accuracy | The average of sensitivity and specificity, providing a single measure that allows for interpreting both false positives and false negatives.
Sensitivity | The ability of a model to correctly identify true positives. Sensitivity is also called recall or the true positive rate.
Specificity | The ability of a model to correctly identify true negatives, also called the true negative rate.
Positive predictive value (PPV) | The proportion of positive test results that are true positives. Used to interpret an individual’s actual probability of being a true positive in case of a positive test result.
Negative predictive value (NPV) | The proportion of negative test results that are true negatives. Used to interpret an individual’s actual probability of being a true negative in case of a negative test result.
Table 2. Basic sample characteristics.
Variable Description | OP, n/N (%) / Mean (SD) | NOP, n/N (%) / Mean (SD)
Age at admission | 34.2 (10.2) | 36.2 (12.2)
Sex *: male | 339/370 (91.6) | 339/370 (91.6)
Country of birth: Switzerland | 167/370 (45.1) | 245/367 (66.8)
Marital status: Single | 297/364 (81.6) | 282/364 (77.5)
Diagnosis: Schizophrenia | 294/370 (79.5) | 287/370 (77.6)
Co-Diagnosis: Addiction Disorder | 269/369 (72.9) | 183/327 (56)
Co-Diagnosis: Personality Disorder | 47/370 (12.7) | 26/370 (7)
Legend: SD = Standard deviation; OP = Offender patients; NOP = Non-offender patients; n = subgroup with characteristics; N = total study population; * according to patients’ case files.
Table 3. Applied Machine Learning models and their performance in nested cross-validation.
Statistical Procedure | Balanced Accuracy (%) | AUC | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%)
Logistic Regression | 62.2 | 0.68 | 74.9 | 49.5 | 58.2 | 69.6
Tree | 63.0 | 0.64 | 82.3 | 43.8 | 57.7 | 74.0
Random Forest | 62.4 | 0.68 | 78.3 | 46.5 | 57.9 | 73.4
Gradient Boosting | 63.3 | 0.69 | 76.7 | 49.9 | 59.3 | 72.1
KNN | 56.9 | 0.59 | 78.9 | 35.0 | 44.6 | 85.9
SVM | 61.9 | 0.68 | 73.8 | 49.9 | 58.1 | 68.9
Naive Bayes | 62.6 | 0.68 | 62.3 | 62.9 | 61.2 | 65.3
Legend: AUC = area under the curve (level of discrimination); PPV = positive predictive value; NPV = negative predictive value; KNN = k-nearest neighbors; SVM = support vector machines. Bold font highlights the algorithm with the best performance parameters.
Table 4. Absolute and relative distribution of relevant predictor variables.
Variable Description | OP, n/N (%) | NOP, n/N (%)
Country of birth: Switzerland | 167/370 (45.1) | 245/367 (66.8)
Illegal residence in Switzerland | 95/370 (25.7) | 34/367 (9.3)
Graduation: did not complete compulsory schooling | 89/342 (26) | 18/321 (5.6)
Legend: OP = Offender patients; NOP = Non-offender patients; n = subgroup with characteristics; N = total study population.
Table 5. Final performance measures of the gradient boosting model on validation data.
Performance Measures | % (95% CI)
AUC | 0.65 (0.58–0.72)
Balanced Accuracy | 63.1 (56.5–69.1)
Sensitivity | 63 (53.6–71.6)
Specificity | 63.1 (53–72.2)
PPV | 66.4 (56.8–74.8)
NPV | 59.6 (49.8–68.8)
Legend: AUC = area under the curve (level of discrimination); PPV = positive predictive value; NPV = negative predictive value; CI = Confidence interval.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
