Tutorial

CMHSU: An R Statistical Software Package to Detect Mental Health Status, Substance Use Status, and Their Concurrent Status in the North American Healthcare Administrative Databases

by Mohsen Soltanifar 1,2,* and Chel Hee Lee 3,4
1 DARE Department, BC PHSA Corporate, 1333 West Broadway, Vancouver, BC V6H 1G9, Canada
2 College of Professional Studies, Northeastern University, 410 W Georgia St, Vancouver, BC V6B 1Z3, Canada
3 CCM Department, AHS Corporate, 3260 Hospital Drive NW, Calgary, AB T2N 4Z6, Canada
4 Mathematics & Statistics Department, University of Calgary, 2500 University Drive NW, Calgary, AB T2N 1N4, Canada
* Author to whom correspondence should be addressed.
Psychiatry Int. 2025, 6(2), 50; https://doi.org/10.3390/psychiatryint6020050
Submission received: 25 February 2025 / Revised: 26 March 2025 / Accepted: 10 April 2025 / Published: 22 April 2025

Abstract

The concept of concurrent mental health and substance use (MHSU) status and its detection in patients has garnered growing interest among psychiatrists and healthcare policymakers over the past four decades. Researchers have proposed various diagnostic methods, including the Data-Driven Diagnostic Method (DDDM), for the identification of MHSU. However, the absence of a standalone statistical software package to facilitate DDDM for large healthcare administrative databases has remained a significant gap. This paper introduces the R statistical software package CMHSU (version 0.0.6.9), available on the Comprehensive R Archive Network (CRAN), for the diagnosis of mental health (MH) status, substance use (SU) status, and their concurrent (MHSU) status. The package implements DDDM using hospital and medical service physician visit counts along with maximum time span parameters for MH, SU, and MHSU diagnoses. A simulated real-world dataset incorporating fentanyl is presented to examine various analytical aspects, including three key dimensions of MHSU detection based on the DDDM framework, as well as temporal analysis to demonstrate the package’s application for healthcare policymakers. Additionally, the limitations of the CMHSU package and potential directions for its future extension are discussed.

1. Introduction

The introduction of this paper is organized into four main sections: Section 1.1 outlines definitions and diagnostic approaches for concurrent mental health and substance use status (MHSU), with a particular emphasis on the Data-Driven Diagnostic Method (DDDM). Section 1.2 provides a summary of the primary North American healthcare administrative databases. Section 1.3 explores the motivations for developing the “CMHSU” statistical software package. Lastly, Section 1.4 details the outline of the study.

1.1. Concurrent Mental Health Status and Substance Use Status: Definition and Diagnosis

The concept of concurrent mental health (MH) and substance use (SU) status, collectively referred to as MHSU status, has been explored for over four decades [1,2,3,4,5,6,7,8,9,10,11,12,13], with significant attention given to its methodological challenges [14]. Various approaches have been developed to diagnose MHSU status, each with distinct applications and strengths. The primary diagnostic methods are outlined below.

1.1.1. Comprehensive Clinical Assessment (CCA)

This method involves in-depth evaluations and interviews conducted by healthcare professionals to gain a thorough understanding of an individual’s mental health, substance use patterns, and psychosocial factors. CCA typically includes assessments of psychiatric and medical history, as well as risk evaluation [15,16]. For example, a psychiatrist may utilize the Psychiatric Research Interview for Substance and Mental Disorders (PRISM) to diagnose major depressive disorder and evaluate alcohol dependency patterns.

1.1.2. Standardized Screening Instruments (SSI)

SSIs are structured tools designed to assess the presence and severity of mental health and substance use disorders. Common instruments include the Generalized Anxiety Disorder-7 (GAD-7), the Patient Health Questionnaire-9 (PHQ-9), the Drug Abuse Screening Test (DAST), and the Alcohol Use Disorders Identification Test (AUDIT) [17,18]. For instance, a general practitioner might employ the AUDIT to detect harmful drinking behaviors in a patient suspected of having an anxiety disorder.

1.1.3. Integrated Treatment Approach (ITA)

ITA emphasizes the simultaneous treatment of mental health and substance use disorders within a unified, evidence-based framework. This approach prioritizes coordinated strategies to address the complex interactions between these conditions [19,20]. For example, a patient with bipolar disorder and opioid use disorder may receive medication-assisted treatment (MAT) alongside cognitive-behavioral therapy (CBT) as part of an integrated care plan.

1.1.4. Multidisciplinary Collaboration (MC)

This method involves a coordinated effort among a team of professionals, such as psychiatrists, addiction specialists, and social workers, to design and implement a comprehensive care plan tailored to individuals with co-occurring disorders [21,22]. For instance, a multidisciplinary team comprising a psychiatrist, a social worker, and an addiction counselor may convene weekly to plan and manage the care of a patient diagnosed with schizophrenia and alcohol dependence.

1.1.5. Data-Driven Diagnostic Method (DDDM)

The DDDM analyzes clinical and administrative data to identify co-occurring mental health and substance use status, drawing on healthcare databases and International Classification of Diseases (ICD) codes. This approach is based on patterns of healthcare usage, such as hospitalizations and physician visits [23,24,25,26]. For example, a healthcare researcher may analyze hospital records to identify individuals who, within the past year, had at least one hospitalization or two physician visits for mental health and substance use status, as indicated by ICD-10 codes [24,25,26].
DDDM approaches rely on frequency-based and time-based detection of plausible ICD codes in healthcare administrative databases [24,25,26]. We adopt the following general parametric definition [27,28,29] (Figure 1):
Definition 1. 
Data-Driven Diagnostic Method (DDDM)
Given time spans $t_{MH}$, $t_{SU}$, and $t_{MHSU}$ and frequencies $n_{MHH}$, $n_{MHP}$, $n_{SUH}$, $n_{SUP}$, a patient has a concurrent mental health and substance use (MHSU) diagnosis if and only if, in the past $t_{MHSU}$ time span, the patient met the following two conditions:
[a] 
at least one mental health (MH) diagnosis (defined by at least $n_{MHH}$ hospitalizations or $n_{MHP}$ primary care physician visits within the $t_{MH}$ time span),
[b] 
at least one substance use (SU) diagnosis (defined by at least $n_{SUH}$ hospitalizations or $n_{SUP}$ primary care physician visits within the $t_{SU}$ time span).
Remark 1. 
The diagnosis of MH status or SU status is based on detection of associated ICD-09, ICD-10, or ICD-11 codes from administrative databases. Examples are provided in Section 3 with a simulated database.
Remark 2. 
The precise values for the set of parameters $t_{MH}$, $t_{SU}$, $t_{MHSU}$, $n_{MHH}$, $n_{MHP}$, $n_{SUH}$, $n_{SUP}$ are decided by the study’s principal clinician, given various internal, external, and contextual factors.
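To make Definition 1 concrete, the following base-R sketch applies it to a toy visit table. It is an illustration only and not the CMHSU implementation: the helper names `has_n_within` and `condition_status`, the toy records, and the output column names are all hypothetical. It also assumes that the whole toy dataset lies within the $t_{MHSU}$ span, so the concurrent status reduces to "MH and SU".

```r
# Illustrative sketch of Definition 1 in base R (not the CMHSU source code).

# TRUE if at least `n` of the visit dates fall within a span of `t` days.
has_n_within <- function(dates, n, t) {
  stopifnot(n >= 1)
  dates <- sort(dates)
  if (length(dates) < n) return(FALSE)
  first <- seq_len(length(dates) - n + 1)
  spans <- as.numeric(difftime(dates[first + n - 1], dates[first], units = "days"))
  any(spans <= t)
}

# Status for one condition: enough hospital visits OR enough physician visits.
condition_status <- function(visits, codes, n_H, n_P, t) {
  hosp <- visits$VisitDate[visits$Diagnostic_H %in% codes]
  phys <- visits$VisitDate[visits$Diagnostic_P %in% codes]
  has_n_within(hosp, n_H, t) || has_n_within(phys, n_P, t)
}

# Toy records (hypothetical); "Z000" stands for a code outside both lists.
visits <- data.frame(
  ClientID     = c("A", "A", "A", "B", "B"),
  VisitDate    = as.Date(c("2024-01-05", "2024-01-20", "2024-03-01",
                           "2024-02-01", "2024-06-01")),
  Diagnostic_H = c("F060", NA, "F100", "F060", NA),
  Diagnostic_P = c(NA, "F063", NA, NA, "Z000"),
  stringsAsFactors = FALSE
)
icd_mh <- c("F060", "F063", "F064", "F067")
icd_su <- c("F100", "T4041", "F120", "F140")

# Evaluate conditions [a] and [b] per client, then combine them.
result <- do.call(rbind, lapply(split(visits, visits$ClientID), function(v) {
  mh <- condition_status(v, icd_mh, n_H = 1, n_P = 2, t = 60)
  su <- condition_status(v, icd_su, n_H = 1, n_P = 2, t = 60)
  data.frame(ClientID = v$ClientID[1], MH = mh, SU = su, MHSU = mh && su)
}))
print(result)
```

In this toy table, client A satisfies both conditions through hospital visits, whereas client B has an MH record but no SU record and is therefore not flagged as concurrent.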

1.2. North American Healthcare Administrative Databases

In both Canada and the United States, numerous healthcare administrative databases are available, varying by regional level (federal/state or provincial) and by the type of institution managing the data (hospitals or medical service physician providers). Table 1 outlines the most significant databases across North America. These databases commonly record several crucial variables, including Client ID, Visit Date, Discharge Date, and Disease Diagnostic Code.
The term Disease Diagnostic Code refers to codes from the World Health Organization’s International Classification of Diseases (ICD) [30]. The ICD codes utilized in American and Canadian healthcare systems are detailed in [31,32]. Within each country, state or provincial systems may adopt their versions of these federally recognized ICD codes [24,25,26,33]. In this study, we consider ICD codes broadly, irrespective of their version (ICD-09, ICD-10, or ICD-11), focusing on codes ranging from three to five characters in length, presented without a decimal point.
Remark 3. 
The ICD codes provided in the following examples for mental health (MH) status and substance use (SU) status are intended as illustrative examples rather than exhaustive lists. The complete set of ICD codes for these conditions may vary based on local and national guidelines, as well as the criteria established by study principal investigators.
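Because the codes are expected as three to five characters without a decimal point, source data that store codes in dotted form may need to be normalized first. The sketch below shows one possible way to do this in base R; the helper `normalize_icd` is hypothetical and is not part of the CMHSU package, and setting out-of-range codes to NA is a design choice made here for illustration.

```r
# Hypothetical helper: strip whitespace and decimal points, upper-case the
# code, and keep only codes that are 3 to 5 characters long afterwards.
normalize_icd <- function(codes) {
  out <- toupper(trimws(codes))
  out <- gsub(".", "", out, fixed = TRUE)
  ok <- !is.na(out) & nchar(out) >= 3 & nchar(out) <= 5
  out[!ok] <- NA_character_
  out
}

raw_codes <- c("F10.0", "t40.41", " F063 ", "F1", NA)
clean_codes <- normalize_icd(raw_codes)
print(clean_codes)
```

Codes that cannot be brought into the three-to-five-character form are returned as NA so that they can be reviewed rather than silently matched.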

1.3. Motivation

The DDDM has several advantages over other methods for the diagnosis of MHSU as follows:
(i)
Scalability and Large-Scale Analysis: The Data-Driven Diagnostic Method (DDDM) offers the advantage of scalability by utilizing healthcare databases and administrative records to analyze large populations. Unlike approaches that require direct patient interaction, DDDM enables the comprehensive assessment of co-occurring mental health and substance use status across diverse demographic groups and healthcare systems [23].
(ii)
Objective and Quantitative Approach: DDDM relies on the objective extraction and analysis of the data features such as ICD codes and patterns of healthcare utilization. This DDDM characteristic minimizes biases often associated with subjective methods, such as clinical interviews, thereby improving the reliability and validity of diagnostic outcomes [24].
(iii)
Cost-Effectiveness: By leveraging preexisting healthcare data, DDDM eliminates the need for resource-intensive procedures, such as in-person assessments or multidisciplinary team evaluations. This makes it a cost-efficient alternative for healthcare systems facing resource limitations [25].
(iv)
Timeliness and Accessibility: DDDM enables rapid identification of MHSU patterns through the querying of existing data sources, providing real-time or nearly real-time diagnostic capabilities. This characteristic is particularly valuable for tracking trends and addressing emerging public health issues [26].
(v)
Population-Level Insights for Policy and Planning: Through its capacity to analyze large-scale healthcare data, DDDM facilitates the identification of utilization trends, disparities, and service gaps. This characteristic allows policymakers to design targeted interventions and optimize resource allocation to address the needs of individuals with co-occurring status [24].
However, despite these advantages of DDDM over other methods for diagnosing MHSU status, no statistical software package has been available that implements the parameters of Definition 1 to help researchers detect MHSU diagnoses. The R statistical software package CMHSU enables researchers to detect MHSU status under a variety of scenarios in terms of time span, visit frequency, subtypes of mental health, and subtypes of substance use.

1.4. Study Outline

This paper is structured as follows: Initially, we provide essential theories and instructions for the installation of the CMHSU package. We introduce the four primary functions of the package, which are designed to detect mental health (MH) status, substance use (SU) status, and their concurrent (MHSU) status in both basic and comprehensive forms. Subsequently, we describe a simulation study using a simulated real-world healthcare administrative dataset to identify MH, SU, and MHSU statuses. This study comprises four sub-studies. In the first three, specific parameters from Definition 1 are held constant while others vary, enabling an analysis of trends in the detection of MH, SU, and MHSU statuses. The final sub-study focuses on temporal analysis, demonstrating an example of package application for the North American healthcare policymakers. We conclude with a discussion on the contributions of the CMHSU package to existing literature, its limitations, and future development plans.

2. The CMHSU Package

2.1. The Background and Installation

The CMHSU package utilizes the Data-Driven Diagnostic Method (DDDM) parametric Definition 1 detailed in Section 1.1 and illustrated in Figure 1. It requires four R packages: dplyr [34], tidyr [35], purrr [36], and magrittr [37], and was developed using R version 4.2.2 [38]. The package (version 0.0.6.9) is accessible on the Comprehensive R Archive Network (CRAN) at https://CRAN.R-project.org/package=CMHSU (accessed on 10 January 2025). Installation and loading of the package can be performed using R commands:
    install.packages("CMHSU", dependencies = TRUE)
    library(CMHSU)
The CMHSU package includes four primary multivariate functions:
  • (1) MH_status(),
  • (2) SU_status(),
  • (3) MHSU_status_basic(),
  • (4) MHSU_status_broad().
The first two functions focus specifically on the individual components of mental health (MH) status or substance use (SU) status as depicted in Figure 1. The latter two functions encompass the comprehensive framework shown in Figure 1. Each function is described in detail in the subsequent subsections.

2.2. Detection of Mental Health Status

The function MH_status() is designed to determine the mental health status of patients based on a predefined list of plausible mental health diagnostic codes provided by clinicians. This function requires five input parameters. The first parameter, inputdata, represents the patient data and is formatted as a dataframe with four essential columns:
  • ClientID, which uniquely identifies the patient,
  • VisitDate, indicating the date of the visit,
  • Diagnostic_H, representing diagnoses made during hospital visits,
  • Diagnostic_P, reflecting diagnoses made by medical service physicians.
The second, third, and fourth parameters, n_MHH, n_MHP, and t_MH, specify the minimum number of hospital visits required, the minimum number of medical service physician visits required, and the maximum allowable time lag (in days) between hospital or physician visits, respectively. Finally, the fifth parameter, ICD_MH, contains the list of plausible mental health diagnostic codes determined by the study clinicians. This function returns a data frame containing ClientID, the earliest date of mental health status, the latest date of mental health status, and the mental health status (Yes/No).
As an initial working example, we utilize a simulated real-world dataset, as described in Section 3.1, to identify all patients with a mental health status corresponding to one of the following statuses: psychotic, mood, anxiety, or neurocognitive disorders. The detection criteria include at least one hospital visit or at least one medical service physician visit, within a maximum time span of two months (60 days). The corresponding R code for implementing this detection is shown below; it generates the results depicted in Figure 2a:
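    # Use the first four columns of the bundled SampleRWD data as input
    myexample <- SampleRWD[, 1:4]
    # MH status: at least 1 hospital or 1 physician visit within 60 days
    SampleMH_1 <- MH_status(myexample, n_MHH = 1, n_MHP = 1, t_MH = 60,
                            ICD_MH = c("F060", "F063", "F064", "F067"))
    head(SampleMH_1)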

2.3. Detection of Substance Use Status

The function SU_status() is similar to MH_status(). It is designed to determine the substance use status of patients based on a predefined list of plausible substance use diagnostic codes provided by clinicians. This function also requires five input parameters. The first parameter, inputdata, is a dataframe with columns:
  • ClientID,
  • VisitDate,
  • Diagnostic_H,
  • Diagnostic_P.
The second, third, and fourth parameters, n_SUH, n_SUP, and t_SU, specify the minimum number of hospital visits, the minimum number of physician visits, and the maximum time lag (in days) between visits, respectively. The fifth parameter, ICD_SU, is the list of plausible substance use diagnostic codes. This function returns a dataframe including ClientID, earliest date of substance use, latest date of substance use, and the substance use status (Yes/No).
As the second working example, we identify all patients with a substance use status corresponding to one of the following statuses: alcohol use, fentanyl use, cannabis use, or cocaine use. The detection criteria require at least one hospital visit or one medical service physician visit, within a maximum allowable time span of two months (60 days). Below is the corresponding R code; it generates the results illustrated in Figure 2b:
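    # Use the first four columns of the bundled SampleRWD data as input
    myexample <- SampleRWD[, 1:4]
    # SU status: at least 1 hospital or 1 physician visit within 60 days
    SampleSU_1 <- SU_status(myexample, n_SUH = 1, n_SUP = 1, t_SU = 60,
                            ICD_SU = c("F100", "T4041", "F120", "F140"))
    head(SampleSU_1)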

2.4. Detection of Concurrent Mental Health and Substance Use (MHSU) Status—Part (I)

The function MHSU_status_basic() is designed to detect concurrent MH and SU status. It requires ten input parameters:
  • inputdata (dataframe with ClientID, VisitDate, Diagnostic_H, Diagnostic_P),
  • n_MHH,
  • n_MHP,
  • n_SUH,
  • n_SUP,
  • t_MH,
  • t_SU,
  • t_MHSU,
  • ICD_MH,
  • ICD_SU.
Here, n_MHH and n_MHP are the minimum numbers of hospital and physician visits for MH; n_SUH and n_SUP are the minimum numbers for SU; and t_MH and t_SU are maximum lags within MH and SU, respectively. The parameter t_MHSU is the maximum time span between MH and SU statuses. Both ICD_MH and ICD_SU are lists of relevant diagnostic codes. The function returns a dataframe with earliest/latest MH dates, MH status, earliest/latest SU dates, SU status, and the concurrent status (Yes/No). It assumes the input data time span is less than or equal to t_MHSU.
As the third working example, we build on the earlier two to detect the concurrent mental health and substance use (MHSU) status for clients diagnosed with psychotic, mood, anxiety, or neurocognitive disorders in combination with alcohol, fentanyl, cannabis, or cocaine consumption-related status. Consistent with Definition 1, we require a minimum of one hospital visit or one physician visit within two months (60 days) for each of MH and SU; the maximum time span between MH and SU statuses is set to one year (365 days). Note that the total dataset time span is 363 days, which is slightly shorter than the specified maximum span of 365 days, so the assumption of MHSU_status_basic() holds. The R code is below; results appear in Figure 2c:
    # Use the first four columns of the bundled SampleRWD data as input
    myexample <- SampleRWD[, 1:4]
    # Concurrent status: MH and SU each within 60 days, both within 365 days
    SampleMHSU_1 <- MHSU_status_basic(myexample, n_MHH = 1, n_MHP = 1, n_SUH = 1,
                                      n_SUP = 1, t_MH = 60, t_SU = 60, t_MHSU = 365,
                                      ICD_MH = c("F060", "F063", "F064", "F067"),
                                      ICD_SU = c("F100", "T4041", "F120", "F140"))
    head(SampleMHSU_1)

2.5. Detection of Concurrent Mental Health and Substance Use (MHSU) Status—Part (II)

A key assumption of MHSU_status_basic() is that the data time span is $\leq t_{MHSU}$. When this does not hold, MHSU_status_broad() can be used. It takes the same ten parameters but generates $k$ non-overlapping windows of concurrent MH and SU output datasets ($k = |\mathrm{TimeSpan}(\text{inputdata})| - t_{MHSU} + 1$). Each output includes a “Window” variable to denote the respective time window (see Figure 3). Thus, for a dataset of size $n$, the output can have size $k \times n$.
As the final working example, we modify the third example to detect MHSU status with a maximum allowable time span between MH and SU (t_MHSU) of 360 days, while the total dataset is 363 days. This implies $k = 363 - 360 + 1 = 4$ non-overlapping windows. The code below illustrates usage; it produces results in Figure 2d:
    # Use the first four columns of the bundled SampleRWD data as input
    myexample <- SampleRWD[, 1:4]
    # Windowed concurrent status with t_MHSU = 360 days
    SampleMHSU_2 <- MHSU_status_broad(myexample, n_MHH = 1, n_MHP = 1, n_SUH = 1,
                                      n_SUP = 1, t_MH = 60, t_SU = 60, t_MHSU = 360,
                                      ICD_MH = c("F060", "F063", "F064", "F067"),
                                      ICD_SU = c("F100", "T4041", "F120", "F140"))
    # Inspect rows 1, 201, 401, and 601 of the output
    head(SampleMHSU_2[c(1, 201, 401, 601), ])
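The window arithmetic in this example can be checked with a few lines of base R. The sketch below assumes the window count is the data time span minus t_MHSU plus one, as in the worked example (363, 360, and 4), and further assumes that the output stacks one block of rows per window; the helper `n_windows` is hypothetical and not a CMHSU function.

```r
# Hypothetical helper: number of windows for a given data span and t_MHSU,
# following the worked example in the text (363 - 360 + 1 = 4).
n_windows <- function(data_span_days, t_MHSU) {
  stopifnot(data_span_days >= t_MHSU)
  data_span_days - t_MHSU + 1
}

k <- n_windows(data_span_days = 363, t_MHSU = 360)
n_patients <- 200                      # patients in the simulated dataset

# Expected output size (k x n) and, assuming rows are stacked window by
# window, the first row index of each window block.
output_rows <- k * n_patients
first_rows  <- seq(1, by = n_patients, length.out = k)

print(k)
print(output_rows)
print(first_rows)
```

Under these assumptions, the first row of each block falls at rows 1, 201, 401, and 601, which are the rows inspected in the example above.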

3. A Simulation Study

This section introduces a simulation study, encompassing its data and analysis, and is organized into five parts. Section 3.1 outlines the characteristics of the simulated real-world dataset. Section 3.2 examines how varying the maximum time span within MH status and within SU status affects the frequency of MH, SU, and MHSU diagnoses, assuming a constant number of hospital and medical service physician visits. Section 3.3 investigates the influence of the number of required hospital and physician visits on MH, SU, and MHSU detection, assuming a fixed maximum time span for MH and SU. Section 3.4 explores the effect of varying the maximum time span within MH, within SU, and between MH and SU on the number of MHSU diagnoses, assuming a fixed number of hospital and medical service physician visits. Finally, Section 3.5 presents a temporal analysis of monthly trends in MH, SU, and MHSU diagnoses under fixed default parameters.

3.1. Simulated Real-World Data

We simulate a healthcare administrative dataset as follows:
  • The dataset consists of 200 patients categorized into seven diagnostic groups who visited hospitals or medical service physicians from 1 January 2024 to 31 December 2024.
  • The patient groups include 125 individuals with recorded mental health (MH) diagnoses, 125 with substance use (SU) diagnoses, 100 with concurrent MHSU diagnoses, and 50 with no MH, SU, or MHSU diagnoses.
Table 2 summarizes key features of the simulated dataset, and Table 3 provides example entries for two sample patients. This dataset is included as part of the CMHSU R package (see Supplementary Materials).

3.2. The Impact of Maximum Time Span Within Mental Health and Within Substance Use

We examine how the maximum allowable time interval within MH and SU diagnoses affects detection counts. Specifically, we require at least two hospital or two physician visits within $x$ days ($0 \leq x \leq 56$) for both MH and SU, while setting the maximum allowable interval between MH and SU to one year (365 days):
$$n_{MHH} = n_{MHP} = n_{SUH} = n_{SUP} = 2, \quad t_{MH} = t_{SU} = x \ (0 \leq x \leq 56), \quad t_{MHSU} = 365.$$
Figure 4 presents the results. As the maximum required time interval for diagnosis increases, the number of patients detected rises for MH, SU, and MHSU, eventually reaching an asymptote at 125 patients for MH, 115 patients for SU, and 90 patients for MHSU.
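The shape of these curves can be reproduced in miniature. The sketch below sweeps the within-condition time span over 0 to 56 days on a toy set of four patients with two relevant hospital visits each; the helper `has_n_within` and the toy visit gaps are hypothetical, and the detection rule is a simplified reading of Definition 1 rather than the package's code.

```r
# TRUE if at least `n` of the visit dates fall within a span of `t` days.
has_n_within <- function(dates, n, t) {
  dates <- sort(dates)
  if (length(dates) < n) return(FALSE)
  first <- seq_len(length(dates) - n + 1)
  spans <- as.numeric(difftime(dates[first + n - 1], dates[first], units = "days"))
  any(spans <= t)
}

# Toy data: each patient has two relevant visits, 3, 10, 40, or 70 days apart.
toy <- data.frame(
  ClientID  = rep(c("P1", "P2", "P3", "P4"), each = 2),
  VisitDate = as.Date("2024-01-01") + c(0, 3, 0, 10, 0, 40, 0, 70)
)
by_client <- split(toy$VisitDate, toy$ClientID)

# Count detected patients for each maximum time span x (two visits required).
x_grid <- 0:56
detected <- vapply(x_grid, function(x) {
  sum(vapply(by_client, has_n_within, logical(1), n = 2, t = x))
}, integer(1))

print(data.frame(x = x_grid, detected = detected)[c(1, 4, 11, 41, 57), ])
```

The count is non-decreasing in the time span and levels off once every patient whose visits can be captured within the range has been captured; the patient with visits 70 days apart is never detected for a span of at most 56 days.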

3.3. The Impact of Number of Hospital Visits and Medical Service Physician Visits

We now investigate how the required number of hospital visits impacts detection counts for MH, SU, and MHSU. We assume a hospital-to-physician visit ratio of 1:2, that is,
$$n_{MHH} = \tfrac{1}{2}\, n_{MHP}, \quad n_{SUH} = \tfrac{1}{2}\, n_{SUP}, \quad n_{MHH} = n_{SUH} = x, \quad x = 1, \ldots, 8,$$
and fix $t_{MH} = t_{SU} = 183$ (6 months) and $t_{MHSU} = 365$ (1 year). Figure 5 displays the results. As the required number of hospital visits increases, detection counts decrease until eventually reaching zero for all three diagnostic count curves (MH, SU, and MHSU). Additionally, for a large required number of hospital visits, the MH and MHSU count curves overlap.
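The parameter grid for this sub-study, with the 1:2 hospital-to-physician ratio, can be written down directly. The sketch below builds the grid and evaluates it on hypothetical per-patient visit counts; for simplicity it assumes all of a patient's relevant visits already fall within the fixed 183-day span, so only the counts matter.

```r
# Parameter grid: x required hospital visits, 2x required physician visits.
grid <- data.frame(x = 1:8)
grid$n_MHH <- grid$x
grid$n_MHP <- 2L * grid$x
grid$n_SUH <- grid$x
grid$n_SUP <- 2L * grid$x

# Hypothetical counts of relevant visits per patient (all assumed to lie
# within the fixed time span).
toy_counts <- data.frame(
  ClientID = c("P1", "P2", "P3", "P4"),
  n_hosp   = c(1, 2, 3, 5),
  n_phys   = c(2, 3, 8, 0)
)

# A patient is detected if either visit threshold is met.
grid$detected <- vapply(grid$x, function(x) {
  sum(toy_counts$n_hosp >= x | toy_counts$n_phys >= 2 * x)
}, integer(1))

print(grid[, c("x", "n_MHH", "n_MHP", "detected")])
```

As in Figure 5, the toy detection count falls as the thresholds rise and reaches zero once no patient has enough visits of either type.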

3.4. The Impact of Maximum Time Span for Concurrent Diagnosis

Finally, we study how varying t_MHSU affects the frequency of concurrent MHSU detection, assuming fixed maximum spans t_MH and t_SU. We require at least two hospital or two physician visits within $x$ days ($x \in \{14, 21, 28\}$) for MH and SU:
$$n_{MHH} = n_{MHP} = n_{SUH} = n_{SUP} = 2, \quad t_{MH} = t_{SU} = x, \quad x = 14, 21, 28,$$
and then let $t_{MHSU} = y$ vary in discrete steps of $31k$ ($k = 1, \ldots, 12$). Figure 6 shows the results. For each fixed maximum time span for MH and SU, the number of detected patients increases with increasing maximum time span for MHSU diagnosis, approaching an asymptote. Also, for a fixed maximum time span for MHSU diagnosis, increasing the maximum time span for MH and SU leads to capturing more patients.
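The two-dimensional design of this sub-study can be laid out as a grid of 3 within-condition spans by 12 concurrent spans. The sketch below builds that grid and, as a toy illustration, counts patients whose MH and SU statuses are at most y days apart. The gap values are hypothetical, and the toy count deliberately ignores the within-condition span, so it only illustrates the dependence on t_MHSU.

```r
# Grid of settings: t_MH = t_SU in {14, 21, 28}, t_MHSU = 31 * k for k = 1..12.
grid <- expand.grid(t_within = c(14, 21, 28), k = 1:12)
grid$t_MHSU <- 31 * grid$k

# Hypothetical gaps (in days) between each patient's MH and SU status.
toy_gap_days <- c(P1 = 20, P2 = 45, P3 = 100, P4 = 400)

# Toy concurrent count as a function of the maximum span y = t_MHSU only.
y_grid <- 31 * (1:12)
detected <- vapply(y_grid, function(y) sum(toy_gap_days <= y), integer(1))

print(nrow(grid))
print(data.frame(t_MHSU = y_grid, detected = detected))
```

The toy count rises with the concurrent span and then flattens, mirroring the asymptotic behavior described for Figure 6; the patient with a 400-day gap lies beyond the largest span of 372 days and is never captured.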

3.5. Temporal Analysis

In this section, unlike in Sections 3.2, 3.3, and 3.4, we assume that the study’s principal investigators have established agreed-upon default values for key parameters. These include a minimum of one hospital visit, a minimum of two medical service physician visits, a maximum time span of one month for mental health diagnoses, a maximum time span of one month for substance use diagnoses, and a maximum time span of one month for concurrent diagnoses. Here, policymakers are particularly interested in monitoring the monthly trends in the frequency and rate of MH, SU, and MHSU diagnoses throughout the 2024 calendar year. Figure 7 presents the results, obtained with programming similar to that presented in Appendix A.1 and Appendix A.2. As shown in the figure, both MH and SU exhibit an overall increasing trend over time, reaching their peak in November. In contrast, MHSU follows a fluctuating pattern initially, followed by a period of stabilization until November. However, all three categories experience a sharp decline in December.
Remark 4. 
The temporal analysis presented here is based on frequency statistics. A similar approach applies to rate statistics, depending on the objectives of the principal investigators. The key difference lies in using proportions instead of counts in the estimation process, as outlined in Appendix A.2.
Remark 5. 
Given the nature of the healthcare administrative database and the study objectives, principal investigators have the flexibility to consider various temporal components for [Unit, Span] in the temporal analysis. In this study, our temporal analysis was conducted to examine monthly variations over a year, represented as [Month, Year]. Other potential examples include [Day, Month], [Week, Year], [Quarter, Year], and [Year, Decade].
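A [Month, Year] analysis of this kind can be sketched in base R by splitting the visits by calendar month and applying the default thresholds within each month. The example below does this for MH status only, on hypothetical toy records; the rate denominator (all clients in the toy data) is an assumption made for illustration and may differ from the one used by the authors.

```r
# Toy visit records (hypothetical).
visits <- data.frame(
  ClientID     = c("A", "A", "B", "B", "B", "C"),
  VisitDate    = as.Date(c("2024-01-05", "2024-02-10", "2024-01-07",
                           "2024-01-20", "2024-03-02", "2024-03-15")),
  Diagnostic_H = c("F060", NA, NA, NA, NA, "F063"),
  Diagnostic_P = c(NA, "F060", "F063", "F064", "F060", NA),
  stringsAsFactors = FALSE
)
icd_mh <- c("F060", "F063", "F064", "F067")

visits$Month <- format(visits$VisitDate, "%Y-%m")
month_levels <- format(seq(as.Date("2024-01-01"), as.Date("2024-12-01"),
                           by = "month"), "%Y-%m")

# Per month: a client counts if they have >= 1 hospital visit or
# >= 2 physician visits with an MH code in that month.
monthly_count <- vapply(month_levels, function(m) {
  v <- visits[visits$Month == m, ]
  clients <- unique(v$ClientID)
  sum(vapply(clients, function(id) {
    vi <- v[v$ClientID == id, ]
    sum(vi$Diagnostic_H %in% icd_mh) >= 1 || sum(vi$Diagnostic_P %in% icd_mh) >= 2
  }, logical(1)))
}, integer(1))

# Rate version: proportion of all clients detected in each month.
monthly_rate <- monthly_count / length(unique(visits$ClientID))

print(rbind(count = monthly_count, rate = round(monthly_rate, 2)))
```

Switching to another [Unit, Span] pair only requires changing how the `Month` label and `month_levels` are built, for example by using week or quarter labels instead.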

4. Discussion

4.1. Summary and Contributions

We introduce the R package CMHSU, the first statistical software package to implement the Data-Driven Diagnostic Method (DDDM) for identifying mental health (MH) status, substance use (SU) status, and concurrent (MHSU) status within North American healthcare administrative databases. The package provides flexibility to accommodate various scenarios, including (i) the minimum required visits to hospitals or medical service physicians, (ii) the maximum time spans for MH and SU diagnoses, and (iii) the maximum time span between them.
The first key contribution of the CMHSU package is enabling clinicians to define concurrent MHSU status using the DDDM approach with standard agreed-upon threshold parameters. The examples presented in this paper highlight three critical dimensions and their associated challenges in achieving this definition. Once these dimensions are addressed simultaneously, defining concurrent MHSU status based on the DDDM becomes feasible. The details of these dimensions are as follows:
  • Dimension of Time Spans within MH Diagnosis and SU Diagnosis (Section 3.2): The time spans within MH and SU diagnoses play a critical role. For instance, if a maximum time span of 7 days is considered, only 48 out of 100 patients (48.0%) are captured, whereas extending the time span to 56 days captures 90 out of 100 patients (90.0%). This sensitivity raises the question: “What are the appropriate maximum time spans for MH and SU diagnoses?”
  • Dimension of Required Number of Visits (Section 3.3): The number of required hospital and medical service physician visits significantly influences the detection rates. For example, requiring two hospital visits and four physician visits captures 90 out of 100 patients (90.0%), while increasing the required threshold to three hospital visits and six physician visits reduces the capture rate to 70 out of 100 patients (70.0%). This delicate balance prompts the question: “What is the optimal minimum number (or ratio) of required visits?”
  • Dimension of Time Span for Concurrent (MHSU) Status (Section 3.4): The maximum time span between MH and SU diagnoses also plays a vital role in the detection. For instance, setting this span to one month captures 61 out of 100 patients (61.0%), while extending it to three months captures 84 out of 100 patients (84.0%). This scenario raises the question: “What is the ideal maximum time span between MH and SU diagnoses?”
The second key contribution of the CMHSU package is its ability to diagnose comorbidities of various diseases within healthcare administrative databases. While originally designed for the concurrent mental health and substance use (MHSU) diagnoses, the package does not impose limitations on its input ICD variables, making it applicable for any other comorbidity conditions.
The final key contribution of the CMHSU package is its utility for policymakers at the federal, state, and territorial levels. By facilitating the rapid compilation of databases tracking ongoing mental health status, substance use status, and concurrent status, the package enables timely monitoring of trends. This near-real-time capability supports the development and implementation of appropriate healthcare policies.

4.2. Strengths, Limitations, and Future Work

The R package CMHSU offers several unique features that make it a valuable and advantageous statistical software tool for detecting mental health (MH) status, substance use (SU) status, and concurrent mental health and substance use (MHSU) status within large administrative healthcare databases. These features include the following:
  • Flexibility: CMHSU incorporates four core functions and ten customizable parameters, allowing researchers to account for a wide range of predefined scenarios by investigators when identifying MH status, SU status, and MHSU status in the healthcare administrative databases.
  • Comprehensiveness: CMHSU enables the detection of nearly all mental health and substance use conditions, providing an advantage over other statistical tools, particularly given the recent comprehensive codifications in ICD-10 and ICD-11.
  • Efficiency: The CMHSU package is designed for ease of use, requiring only a basic statistical background. Unlike advanced statistical and machine learning-based detection methods for MH status [39], SU status [40], and MHSU status [41]—which typically necessitate expertise in topics such as K-Nearest Neighbors, Random Forests, Gradient Boosted Trees, Deep Neural Networks—CMHSU offers a user-friendly and time-efficient alternative statistical software tool. This advantage makes it a particularly accessible and practical tool for researchers and practitioners without extensive statistical or machine learning expertise.
  • Interpretability: CMHSU employs a trace-back methodology to identify MH status, SU status, and MHSU status based on their corresponding ICD codes. This approach enhances the clarity and transparency of the results, facilitating faster interpretation, improved visualization of detected cases, and analysis of trends over time compared to other statistical tools.
  • Seamless Integration: The package is free and easy to install, ensuring compatibility with existing analytical tools used for processing large healthcare administrative databases. This seamless integration enhances its accessibility and usability in the real-world research applications.
This work has several limitations, each of which presents opportunities for future extensions. Some of them are as follows:
  • Scalability: For large-scale databases, it may be more efficient to partition the input dataset into multiple disjoint subsets and apply the MHSU_status_broad() function to each subset separately to manage the extensive outputs. Introducing an additional parameter within the function to automate the partitioning of the input data (inputdata) would further optimize the computational process (See Appendix A.1).
  • Output Format: The package currently generates outputs as dataframes only, without providing summary frequency statistics for mental health status, substance use status, and their concurrent status. Calculating these statistics requires additional R programming (See Appendix A.2).
  • Customization: The window lag in the function MHSU_status_broad() is fixed at one day (as illustrated in Figure 3). Adding a fixed or adaptive parameter to specify the length of this window lag would enhance the flexibility of the detection process for researchers. The choice between a fixed and an adaptive lag depends on the specific patient data in the study.
  • Evaluation: The DDDM approach assumes that ICD coding in administrative databases is reliable. However, in practice, these records are often incomplete or subject to misclassification, which can bias the evaluation process and, consequently, affect the accuracy of the results. Furthermore, it is still unclear how to measure the package’s precision in detecting patients with MH status, SU status, and MHSU status, as well as how to compare the performance of DDDM with more advanced machine learning-based methods [39,40,41]. Such a comparison requires a mapping between the DDDM parameters and those of the machine learning-based methods, which calls for further methodological research.
  • Standardization: The use of ICD codes for diagnosis depends on the local and national jurisdictions (Table 1). Potential inconsistencies in coding across different jurisdictions and healthcare databases may affect the reliability of the results for comparisons across these jurisdictions.
  • Empirical Validation: The simulation study presented in this paper uses simulated data generated by the authors. Ideally, using a real-life empirical healthcare administrative database would significantly enhance the validity and applicability of the findings for healthcare policy implementations. However, three key obstacles prevented the use of such a database in the current analysis: (1) data privacy and security regulations, (2) access restrictions and bureaucratic hurdles, and (3) selection bias.
  • Geographical Adaptability: CMHSU is designed for detecting mental health (MH) status, substance use (SU) status, and their concurrent (MHSU) status within North American healthcare databases, because the original DDDM was developed by researchers in this region. It remains unclear whether these methods can be effectively adopted in non-North American healthcare systems, given potential differences in healthcare infrastructure, coding practices, and administrative data structures. Despite these issues, a preliminary analysis is still possible provided the essential data fields are available (see Section 4.3).
  • Temporal Analysis: The simulation study in this paper serves as a hypothetical demonstration of the package’s application. Its results can only be truly meaningful and applicable to real-world policymaking if the DDDM parameters are based on a universally agreed-upon set of predefined values related to time span and visit definitions. However, no such universally accepted standard parameters currently exist among psychiatrists. Addressing this issue requires further discussions, research, and a final systematic review and meta-analysis among various studies.
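As a rough illustration of the Evaluation item above: if a validated reference standard (for example, chart-review labels) were available for a subset of patients, the precision and sensitivity of a detected status could be computed along the following lines. This is only a sketch under that assumption. The helper name detection_metrics, the reference labels, and the toy vectors are hypothetical and are not part of CMHSU.

```r
# Hypothetical helper (not part of CMHSU): compare detected status labels
# against an assumed reference standard, both coded "YES"/"NO".
detection_metrics <- function(detected, reference) {
  stopifnot(length(detected) == length(reference))
  tp <- sum(detected == "YES" & reference == "YES")  # true positives
  fp <- sum(detected == "YES" & reference == "NO")   # false positives
  fn <- sum(detected == "NO"  & reference == "YES")  # false negatives
  c(precision   = tp / (tp + fp),
    sensitivity = tp / (tp + fn))
}

# Toy labels for eight patients (illustrative only, not study data)
detected  <- c("YES", "YES", "YES", "NO", "NO",  "YES", "NO", "NO")
reference <- c("YES", "YES", "NO",  "NO", "YES", "YES", "NO", "NO")
detection_metrics(detected, reference)
```

Whether such a reference standard can be obtained for administrative data is exactly the open question raised in the Evaluation item, so the sketch only shows the arithmetic, not a validation procedure.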

4.3. Summary of CMHSU Data Analysis Workflow

We conclude this section with a summary of the data analysis workflow, intended to give users a smooth experience when applying the CMHSU package in their projects. The workflow comprises the following steps:
(i) Compile CMHSU Data: Gather essential data fields, including ClientID, VisitDate, Diagnostic_H, and Diagnostic_P.
(ii) Assess Scalability:
    (a) Size: Partition the data by patient by applying the function splitfunction_id().
    (b) Time: Partition the data by time window by applying the function splitfunction_time().
(iii) Define Analysis Parameters: Specify required parameters, such as n_MHH, n_MHP, and others as necessary.
(iv) Examine Data Time Span:
    (a) For data spans less than or equal to t_MHSU, apply the function MHSU_status_basic().
    (b) For data spans exceeding t_MHSU, apply the function MHSU_status_broad().
(v) Report Results:
    (a) Frequency: Extract count data using the script SummarySampleMHSU_1 (Appendix A.2).
    (b) Proportion: Extract proportion data using the script SummarySampleMHSU_1 (Appendix A.2).
(vi) Temporal Interpretation: Interpret temporal patterns by applying the [Unit, Span] methodology illustrated in Section 3.5.
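Step (iv) of the workflow can be sketched in a few lines of base R. The helper choose_mhsu_function below is hypothetical (it is not part of CMHSU), and it assumes that the "data span" means the number of days between the first and last visit date in the dataset.

```r
# Hypothetical helper (not part of CMHSU): pick the detection function
# according to step (iv) of the workflow.
choose_mhsu_function <- function(visit_dates, t_MHSU) {
  visit_dates <- as.Date(visit_dates)
  # Assumed definition of the data span: last visit minus first visit, in days
  span_days <- as.numeric(max(visit_dates, na.rm = TRUE) -
                          min(visit_dates, na.rm = TRUE))
  if (span_days <= t_MHSU) "MHSU_status_basic" else "MHSU_status_broad"
}

# One calendar year of data (span of 365 days) with t_MHSU = 365
choose_mhsu_function(c("2024-01-01", "2024-12-31"), t_MHSU = 365)
# Two calendar years of data (span of 730 days) with t_MHSU = 365
choose_mhsu_function(c("2023-01-01", "2024-12-31"), t_MHSU = 365)
```

The first call returns "MHSU_status_basic" and the second returns "MHSU_status_broad", mirroring cases (a) and (b) of step (iv).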

5. Conclusions

The free R software package CMHSU enables researchers to detect mental health (MH) status, substance use (SU) status, and their concurrent (MHSU) status within healthcare administrative databases. The package offers a wide range of flexibility regarding visit count and maximum time span parameters, allowing for comprehensive and adaptable analyses. This functionality facilitates the compilation of these statuses at a large population level in a timely manner, supporting health policymakers in effectively monitoring trends and making informed healthcare decisions to optimize patient treatment.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/psychiatryint6020050/s1. S1. R Package Materials: the “CMHSU” R package CRAN documentation is available at https://cran.r-project.org/package=CMHSU (accessed on 11 April 2025). S2. Shiny Application: a Shiny application has been included in the Supplementary Materials to facilitate interactive exploration of the estimations. The application can be accessed at https://mohsensoltanifar.shinyapps.io/rpackageshinyappcmhsu/.

Author Contributions

Conceptualization, M.S.; methodology, M.S.; software, M.S. and C.H.L.; validation, M.S. and C.H.L.; formal analysis, M.S.; investigation, M.S.; resources, M.S. and C.H.L.; data curation, M.S.; writing—original draft, M.S. and C.H.L.; writing—review and editing, M.S. and C.H.L.; visualization, M.S.; supervision, M.S. and C.H.L.; project administration, M.S.; funding acquisition, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The APC was funded by the first author.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The simulated study data are available in the package documentation.

Acknowledgments

This paper is dedicated to the families and loved ones of those affected by the fentanyl and other illicit drug crisis in Canada and the United States. May this work serve as another stepping stone towards meaningful solutions that prevent further loss and offer healing to communities in both nations. The authors thank the journal reviewers for their helpful feedback on the first draft.

Conflicts of Interest

Author Mohsen Soltanifar is employed by the company BC PHSA Corporate. Author Chel Hee Lee is employed by the company AHS Corporate. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
APCD: All-Payer Claims Databases; AUDIT: Alcohol Use Disorders Identification Test; CBT: Cognitive-Behavioral Therapy; CCA: Comprehensive Clinical Assessment; CMDB: Canadian MIS Database; CRAN: Comprehensive R Archive Network; DAD: Discharge Abstract Database; DAST: Drug Abuse Screening Test; DDDM: Data-Driven Diagnostic Method; GAD-7: Generalized Anxiety Disorder-7; ICD: International Classification of Diseases; ITA: Integrated Treatment Approach; MAT: Medication-Assisted Treatment; MC: Multidisciplinary Collaboration; MEDPAR: Medicare Provider Analysis and Review; MH: Mental Health; MHSU: Concurrent Mental Health and Substance Use; MSP: Medical Services Plan; NACRS: National Ambulatory Care Reporting System; NAMCS: National Ambulatory Medical Care Survey; NIS: National Inpatient Sample; OHIP: Ontario Health Insurance Plan; OSHPD: California Office of Statewide Health Planning and Development; PHQ-9: Patient Health Questionnaire-9; RAMQ: Régie de l’assurance maladie du Québec; SID: State Inpatient Databases; SPARCS: New York Statewide Planning and Research Cooperative System; SSI: Standardized Screening Instruments; SU: Substance Use; VA: Veterans Affairs.

Appendix A

Appendix A.1. Scalability with Large Databases

The scalability of the large healthcare administrative dataset, inputdata, is achieved by considering two dimensions: unique IDs and time. First, for the unique ID dimension, the function splitfunction_id(), introduced below, takes the dataset inputdata (containing $m$ unique patients) and an integer $n \geq 1$ as input. It then partitions the dataset into $k$ subsets, where
$$k = \begin{cases} \lfloor m/n \rfloor + 1, & \text{if } n \nmid m, \\ m/n, & \text{if } n \mid m. \end{cases}$$
The first $k-1$ subsets each contain $n$ unique patients, while the $k$th subset contains at most $n$ unique patients. For example, we consider the case where $m = 200$, $n = 18$, and $k = 12$:
library(dplyr)
library(CMHSU)  # assumed to provide the simulated dataset SampleRWD

# Partition inputdata into subsets of at most n unique patients each
splitfunction_id <- function(inputdata, n) {
  inputdata <- inputdata %>% arrange(ClientID)
  unique_ids <- unique(inputdata$ClientID)
  # Assign consecutive IDs to groups of size n (the last group may be smaller)
  groups <- split(unique_ids, ceiling(seq_along(unique_ids) / n))
  split_datasets <- list()
  for (i in seq_along(groups)) {
    split_datasets[[paste0("inputdata_", i)]] <- inputdata %>%
      filter(ClientID %in% groups[[i]])
  }
  return(split_datasets)
}

inputdata_split_id <- splitfunction_id(SampleRWD, 18)
inputdata_split_id$inputdata_1
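Because an ID-based partition keeps all visits of a patient in the same subset, a per-patient detection step can be run subset by subset and the results stacked afterwards. The following base R sketch illustrates the pattern. The helpers apply_by_id_chunks and count_visits and the toy data are hypothetical. In practice, a wrapper around MHSU_status_broad() with fixed parameters could presumably be passed as detect_fun, but that is an assumption rather than something tested here.

```r
# Hypothetical helper (not part of CMHSU): split patients into chunks of at
# most n unique IDs, apply a per-chunk function, and stack the results.
apply_by_id_chunks <- function(inputdata, n, detect_fun) {
  ids    <- sort(unique(inputdata$ClientID))
  groups <- split(ids, ceiling(seq_along(ids) / n))  # ceiling(m / n) groups
  parts  <- lapply(groups, function(g) {
    detect_fun(inputdata[inputdata$ClientID %in% g, , drop = FALSE])
  })
  do.call(rbind, unname(parts))
}

# Toy stand-in for a detection function: number of visits per patient
count_visits <- function(d) {
  as.data.frame(table(ClientID = d$ClientID), responseName = "n_visits")
}

# Toy data: 200 patients with two visits each (illustrative only)
toy <- data.frame(ClientID = rep(1:200, each = 2))
res <- apply_by_id_chunks(toy, n = 18, detect_fun = count_visits)
nrow(res)          # one row per patient
ceiling(200 / 18)  # number of chunks for m = 200 and n = 18
```

With m = 200 and n = 18 the data are processed in 12 chunks, which agrees with the value of k in the example above.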
Second, for the time dimension, the function splitfunction_time(), introduced below, takes the dataset inputdata (with a time span of $T$ days) and an integer $t \geq 1$ as input. It then partitions the dataset into $l$ subsets, where
$$l = \begin{cases} \lfloor T/t \rfloor + 1, & \text{if } t \nmid T, \\ T/t, & \text{if } t \mid T. \end{cases}$$
The first $l-1$ subsets each span $t$ days, while the $l$th subset spans at most $t$ days. For example, we consider the case where $T = 363$, $t = 30.5$, and $l = 12$:
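library(dplyr)
library(CMHSU)  # assumed to provide the simulated dataset SampleRWD

# Partition inputdata into consecutive time windows of length t days
splitfunction_time <- function(inputdata, t) {
  inputdata <- inputdata %>%
    mutate(VisitDate = as.Date(VisitDate)) %>%
    arrange(VisitDate)
  VisitDate_0 <- min(inputdata$VisitDate, na.rm = TRUE)
  VisitDate_1 <- max(inputdata$VisitDate, na.rm = TRUE)
  split_datasets <- list()
  i <- 1
  repeat {
    # Window i runs from start_date to start_date + t (both inclusive)
    start_date <- VisitDate_0 + (i - 1) * (t + 1)
    # Stop once the window starts after the last visit, so that a window
    # without visits in the middle of the data does not end the loop early
    if (start_date > VisitDate_1) break
    end_date <- start_date + t
    subset_data <- inputdata %>%
      filter(VisitDate >= start_date & VisitDate <= end_date)
    if (nrow(subset_data) > 0) {
      split_datasets[[paste0("inputdata_", i)]] <- subset_data
    }
    i <- i + 1
  }
  return(split_datasets)
}

# Note: with a non-integer t (such as 30.5), a date falling strictly between
# one window's end and the next window's start is not assigned to any subset.
inputdata_split_time <- splitfunction_time(SampleRWD, 30.5)
inputdata_split_time$inputdata_1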
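To see which windows the loop in splitfunction_time() steps through, the window offsets can be tabulated directly. The sketch below uses base R only. The names T_span, t_len, and windows are illustrative, and the offsets are measured in days from the first visit date.

```r
# Window offsets (in days from the first visit) stepped through by the loop in
# splitfunction_time(): window i starts at (i - 1) * (t + 1) and ends t days later.
T_span <- 363    # total time span in days, as in the example above
t_len  <- 30.5   # window length in days, as in the example above

starts  <- seq(0, T_span, by = t_len + 1)
windows <- data.frame(window = seq_along(starts),
                      start  = starts,
                      end    = starts + t_len)
nrow(windows)     # number of windows needed to cover the span
tail(windows, 1)  # last window; its end may extend beyond T_span
```

For T = 363 and t = 30.5 this yields 12 windows, in line with l = 12 in the example above. The last window starts at day 346.5 and would end at day 377, so it is only partly filled.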

Appendix A.2. Summary Statistics Output

The following R program computes summary statistics for MH, SU, and MHSU frequencies from the CMHSU output shown in Figure 2c:
library(dplyr)
library(CMHSU)

# Keep the first four columns of the simulated dataset
myexample <- SampleRWD[, c(1:4)]

# Detect MH, SU, and concurrent MHSU status (basic version)
SampleMHSU_1 <- MHSU_status_basic(myexample, n_MHH = 1, n_MHP = 1, n_SUH = 1,
                                  n_SUP = 1, t_MH = 60, t_SU = 60, t_MHSU = 365,
                                  ICD_MH = c("F060", "F063", "F064", "F067"),
                                  ICD_SU = c("F100", "T4041", "F120", "F140"))

# Counts and proportions of patients with each status
SummarySampleMHSU_1 <- SampleMHSU_1 %>%
  summarise(
    MH_Count = sum(MH_status == "YES"),
    MH_Proportion = formatC(mean(MH_status == "YES"), format = "f", digits = 3),
    SU_Count = sum(SU_status == "YES"),
    SU_Proportion = formatC(mean(SU_status == "YES"), format = "f", digits = 3),
    MHSU_Count = sum(MHSU_status == "YES"),
    MHSU_Proportion = formatC(mean(MHSU_status == "YES"), format = "f", digits = 3)
  )
print(SummarySampleMHSU_1)
# A tibble: 1 × 6
#   MH_Count MH_Proportion SU_Count SU_Proportion MHSU_Count MHSU_Proportion
#      <int> <chr>            <int> <chr>              <int> <chr>
#        125 0.625              125 0.625                100 0.500
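The same counts can be cross-checked against the design of the simulated data without running the package. The sketch below builds a toy status table by hand from the group sizes in Table 2 (groups 1 to 4 have both MH and SU codes, group 5 has SU only, group 6 has MH only, and group 7 has neither) and summarises it in base R. The helper summarise_status is hypothetical, and the toy table only mimics the three status columns used above. It is not CMHSU output.

```r
# Toy status table built by hand from the group sizes in Table 2:
# 100 patients with MH and SU, 25 with SU only, 25 with MH only, 50 with neither.
status <- data.frame(
  MH_status   = rep(c("YES", "NO",  "YES", "NO"), times = c(100, 25, 25, 50)),
  SU_status   = rep(c("YES", "YES", "NO",  "NO"), times = c(100, 25, 25, 50)),
  MHSU_status = rep(c("YES", "NO",  "NO",  "NO"), times = c(100, 25, 25, 50))
)

# Hypothetical helper: counts and proportions of "YES" for each status column
summarise_status <- function(d) {
  yes <- sapply(d[c("MH_status", "SU_status", "MHSU_status")],
                function(x) sum(x == "YES"))
  data.frame(status     = names(yes),
             count      = unname(yes),
             proportion = round(unname(yes) / nrow(d), 3))
}

summarise_status(status)
```

The toy table gives counts of 125, 125, and 100 out of 200 patients, matching the printed summary above. Returning numeric proportions rather than formatted strings keeps them usable in later calculations.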

References

  1. Weiss, K.J.; Rosenberg, D.J. Prevalence of anxiety disorder among alcoholics. J. Clin. Psychiatry 1985, 46, 3–5.
  2. Jaffe, J.H.; Ciraulo, D.A. Alcoholism and depression. In Psychopathology and Addictive Disorders; Meyer, R.E., Ed.; Guilford Press: New York, NY, USA, 1986.
  3. Helzer, J.E.; Pryzbeck, T.R. The co-occurrence of alcoholism with other psychiatric disorders in the general population and its impact on treatment. J. Stud. Alcohol 1988, 49, 219–224.
  4. Drake, R.E.; Wallach, M.A. Substance abuse among the chronic mentally ill. Hosp. Community Psychiatry 1989, 40, 1041–1046.
  5. Regier, D.A.; Farmer, M.E.; Rae, D.S.; Locke, B.Z.; Keith, S.J.; Judd, L.L.; Goodwin, F.K. Comorbidity of mental disorders with alcohol and other drug abuse: Results from the Epidemiologic Catchment Area (ECA) Study. JAMA 1990, 264, 2511–2518.
  6. Drake, R.E.; Wallach, M.A. Moderate drinking among people with severe mental illness. Hosp. Community Psychiatry 1993, 44, 780–782.
  7. Kessler, R.C.; McGonagle, K.A.; Zhao, S.; Nelson, C.B.; Hughes, M.; Eshleman, S.; Wittchen, H.U.; Kendler, K.S. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: Results from the National Comorbidity Survey. Arch. Gen. Psychiatry 1994, 51, 8–19.
  8. Wright, S.; Gournay, K.; Glorney, E.; Thornicroft, G. Dual diagnosis in the suburbs: Prevalence, need, and in-patient service use. Soc. Psychiatry Psychiatr. Epidemiol. 2000, 35, 297–304.
  9. Evans, K.; Sullivan, M.J. Dual Diagnosis: Counseling the Mentally Ill Substance Abuser; Guilford Press: New York, NY, USA, 2001.
  10. Cantwell, R. Substance use and schizophrenia: Effects on symptoms, social functioning, and service use. Br. J. Psychiatry 2003, 182, 324–329.
  11. Blanco, C.; Alegría, A.A.; Liu, S.M.; Secades-Villa, R.; Sugaya, L.; Davies, C. Differences among major depressive disorder with and without co-occurring substance use disorders and substance-induced depressive disorder. J. Clin. Psychiatry 2012, 73, 865–873.
  12. Szerman, N.; Martinez-Raga, J.; Peris, L. Rethinking dual disorders. Addict. Disord. Their Treat. 2012, 11, 191–200.
  13. Atkins, C. Co-Occurring Disorders: Integrated Assessment and Treatment of Substance Use and Mental Disorders; PESI Publishing & Media: Eau Claire, WI, USA, 2014.
  14. Todd, J.; Green, G.; Harrison, M.; Ikuesan, B.A.; Self, C.; Baldacchino, A.; Sherwood, S. The challenges of dual diagnosis: Substance misuse and psychiatric disorders. J. Psychiatr. Ment. Health Nurs. 2004, 11, 48–54.
  15. Mueser, K.T.; Noordsy, D.L.; Drake, R.E.; Fox, L. Integrated Treatment for Dual Disorders: A Guide to Effective Practice; Guilford Press: New York, NY, USA, 2003.
  16. Sciacca, K. On co-occurring addictive and mental disorders: A brief history of the origins of dual diagnosis treatment and program development. Am. J. Orthopsychiatry 1996, 66, 407–413.
  17. Skinner, H.A. The Drug Abuse Screening Test. Addict. Behav. 1982, 7, 363–371.
  18. Babor, T.F.; Higgins-Biddle, J.C.; Saunders, J.B.; Monteiro, M.G. AUDIT: The Alcohol Use Disorders Identification Test: Guidelines for Use in Primary Care, 2nd ed.; World Health Organization: Geneva, Switzerland, 2001.
  19. Drake, R.E.; O’Neal, E.L.; Wallach, M.A. A systematic review of psychosocial research on psychosocial interventions for people with co-occurring severe mental and substance use disorders. J. Subst. Abus. Treat. 2008, 34, 123–138.
  20. Minkoff, K. Best practices: Developing standards of care for individuals with co-occurring psychiatric and substance use disorders. Psychiatr. Serv. 2001, 52, 597–599.
  21. Sterling, S.; Chi, F.; Hinman, A. Integrating care for people with co-occurring alcohol and other drug, medical, and mental health conditions. Alcohol Res. Health 2011, 33, 338–349.
  22. Mueser, K.T.; Fox, L. A family intervention program for dual disorders. Community Ment. Health J. 2002, 38, 253–270.
  23. Heslin, K.C.; Weiss, A.J. Hospitalizations Involving Mental and Substance Use Disorders Among Adults; Agency for Healthcare Research and Quality (AHRQ): Rockville, MD, USA, 2015. Available online: https://www.hcup-us.ahrq.gov (accessed on 3 January 2025).
  24. Keen, C.; Kinner, S.A.; Young, J.T.; Jang, K.; Gan, W.; Samji, H.; Zhao, B.; Krausz, M.; Slaunwhite, A. Prevalence of co-occurring mental illness and substance use disorder and association with overdose: A linked data cohort study among residents of British Columbia, Canada. Addiction 2022, 117, 129–140.
  25. Lavergne, M.R.; Shirmaleki, M.; Loyal, J.P.; Jones, W.; Nicholls, T.L.; Schütz, C.G.; Vaughan, A.; Samji, H.; Puyat, J.H.; Kaoser, R.; et al. Emergency department use for mental and substance use disorders: Descriptive analysis of population-based, linked administrative data in British Columbia, Canada. BMJ Open 2022, 12, e057072.
  26. Lavergne, M.R.; Loyal, J.P.; Shirmaleki, M.; Kaoser, R.; Nicholls, T.; Schütz, C.G.; Vaughan, A.; Samji, H.; Puyat, J.H.; Kaulius, M.; et al. The relationship between outpatient service use and emergency department visits among people treated for mental and substance use disorders: Analysis of population-based administrative data in British Columbia, Canada. BMC Health Serv. Res. 2022, 22, 477.
  27. Khan, S. Concurrent mental and substance use disorders in Canada. Health Rep. 2017, 28, 3–8. Available online: https://www.statcan.gc.ca (accessed on 3 January 2025).
  28. Health Canada. Best Practices: Concurrent Mental Health and Substance Use Disorders; Centre for Addiction and Mental Health: Ottawa, ON, Canada, 2002. Available online: https://publications.gc.ca/site/eng/247253/publication.html (accessed on 3 January 2025).
  29. Alberta Health Services. Enhancing Concurrent Capability Across Addiction and Mental Health Services: Foundational Concepts; Alberta Health Services: Edmonton, AB, Canada, 2011. Available online: https://www.albertahealthservices.ca (accessed on 3 January 2025).
  30. World Health Organization. International Classification of Diseases for Mortality and Morbidity Statistics (11th Revision); World Health Organization: Geneva, Switzerland, 2019. Available online: https://icd.who.int (accessed on 3 January 2025).
  31. Canadian Institute for Health Information. Canadian Coding Standards for Version 2022 ICD-10-CA and CCI; CIHI: Ottawa, ON, Canada, 2022. Available online: https://www.cihi.ca/en/ (accessed on 3 January 2025).
  32. Centers for Disease Control and Prevention. International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) 2024; National Center for Health Statistics: Atlanta, GA, USA, 2024. Available online: https://www.cdc.gov/nchs/icd/icd-10-cm/index.html (accessed on 3 January 2025).
  33. Casillas, S.M.; Scholl, L.; Mustaquim, D.; Vivolo-Kantor, A. Analysis of trends and usage of ICD-10-CM discharge diagnosis codes for poisonings by fentanyl, tramadol, and other synthetic narcotics in emergency department data. Addict. Behav. Rep. 2022, 16, 100464.
  34. Wickham, H.; François, R.; Henry, L.; Müller, K.; Vaughan, D. dplyr: A Grammar of Data Manipulation (Version 1.1.4) [R Package]. CRAN. 2023. Available online: https://CRAN.R-project.org/package=dplyr (accessed on 3 January 2025).
  35. Wickham, H.; Vaughan, D.; Girlich, M. tidyr: Tidy Messy Data (Version 1.3.0) [R Package]. CRAN. 2023. Available online: https://CRAN.R-project.org/package=tidyr (accessed on 3 January 2025).
  36. Wickham, H.; Henry, L. purrr: Functional Programming Tools (Version 1.0.2) [R Package]. CRAN. 2023. Available online: https://CRAN.R-project.org/package=purrr (accessed on 3 January 2025).
  37. Bache, S.; Wickham, H. magrittr: A Forward-Pipe Operator for R (Version 2.0.3) [R Package]. CRAN. 2022. Available online: https://CRAN.R-project.org/package=magrittr (accessed on 3 January 2025).
  38. R Core Team. R: A Language and Environment for Statistical Computing [Computer Software]; R Foundation for Statistical Computing: Vienna, Austria, 2022. Available online: https://www.R-project.org (accessed on 3 January 2025).
  39. Liu, D.; Choi, K.W.; Lizano, P.; Yuan, W.; Yu, K.-H.; Smoller, J.W.; Kohane, I. Construction of extra-large scale screening tools for risks of severe mental illnesses using real world healthcare data. arXiv 2023.
  40. De Mattos, B.P.; Mattjie, C.; Ravazio, R.; Barros, R.C.; Grassi-Oliveira, R. Craving for a Robust Methodology: A Systematic Review of Machine Learning Algorithms on Substance-Use Disorders Treatment Outcomes. Int. J. Ment. Health Addict. 2024, 22.
  41. Acharya, N.; Kar, P.; Ally, M.; Soar, J. Predicting Co-Occurring Mental Health and Substance Use Disorders in Women: An Automated Machine Learning Approach. Appl. Sci. 2024, 14, 1630.
Figure 1. Diagram of detection of mental health (MH) status, substance use (SU) status, and their concurrent (MHSU) status in the North American healthcare administrative databases in terms of a given number of hospital visits, medical service physician visits, and time intervals within and between them.
Figure 2. Sample simulated real world dataset: (a) Mental health status; (b) Substance use status; (c) Concurrent status (basic); (d) Concurrent status (broad).
Figure 3. The concept of window calculation in function MHSU_status_broad().
Figure 4. Frequency of patients’ diagnosis status vs. maximum time span for MH and SU (time = t_MH = t_SU, t_MHSU = 365).
Figure 5. Frequency of patients’ diagnosis status vs. number of hospital visits, with a hospital:physician count ratio = 1:2 and t_MHSU = 365.
Figure 6. Frequency of concurrent MHSU diagnosis vs. maximum time span between MH and SU (Time=t_MHSU), for t_MH = t_SU = {14, 21, 28} days.
Figure 7. Monthly frequency of MH, SU, and MHSU status in the calendar year 2024.
Table 1. The North American healthcare administrative databases in terms of country, level, hospital, and medical service physician categories.
Country | Level | Hospital Data | Physician Data
--- | --- | --- | ---
Canada | Federal | Discharge Abstract Database (DAD); National Ambulatory Care Reporting System (NACRS) | Canadian Institute of Health Information (CIHI) Physician Claims Data; Canadian Management Information Database (CMDB)
Canada | Province | MED-ÉCHO (Quebec) data; Ontario Health data; Alberta Health data | Medical Services Plan (MSP) Data (BC); Alberta Health Physician Claims; Régie de l’assurance maladie du Québec (RAMQ); Ontario Health Insurance Plan (OHIP)
Canada | Territory | Yukon Hospital Corporation Discharge Data; Northwest Territories Health Authority Records | Yukon Health Care Insurance Plan Claims; Northwest Territories Medical Billing Data
United States | Federal | National Inpatient Sample (NIS); Medicare Provider Analysis and Review (MEDPAR); Veterans Affairs (VA) Inpatient Data | Medicare Physician/Supplier Part B Claims Data; National Ambulatory Medical Care Survey (NAMCS); Veterans Affairs (VA) Outpatient Data
United States | State | State Inpatient Databases (SID); California Office of Statewide Health Planning and Development (OSHPD); New York Statewide Planning and Research Cooperative System (SPARCS) | State Medicaid Claims Data; All-Payer Claims Databases (APCDs: e.g., Massachusetts, Colorado)
United States | Territory | Guam Memorial Hospital Authority Records; American Samoa Dept. of Health Inpatient Data | Guam Medicaid and CHIP Claims Data; American Samoa Medicaid Claims ¹
¹ As of 3 January 2025.
Table 2. Key features of the simulated real-world healthcare administrative database.
Group | Size | Visit Date Span | Visit Length | #Hospital | #Physician | SU (Freq) | MH (Freq) | Other (Freq)
--- | --- | --- | --- | --- | --- | --- | --- | ---
1 | 10 | 1 January 2024–31 January 2024 | 1 month | 1 | 2 | F100 (1) | F060 (2) | NA
2 | 20 | 1 February 2024–31 March 2024 | 2 months | 2 | 4 | T4041 (2) | F063 (4) | J10 (4)
3 | 30 | 1 April 2024–30 June 2024 | 3 months | 3 | 6 | F120 (3) | F064 (6) | I10 (3)
4 | 40 | 1 July 2024–31 December 2024 | 6 months | 6 | 12 | F140 (6) | F067 (12) | I10 (6), J10 (12)
5 | 25 | 1 November 2024–31 December 2024 | 2 months | 3 | 6 | F100 (3) | NA | J10 (6)
6 | 25 | 1 November 2024–31 December 2024 | 2 months | 2 | 4 | NA | F060 (4) | I10 (2)
7 | 50 | 1 November 2024–31 December 2024 | 2 months | 1 | 2 | NA | NA | I10 (1), J10 (2)
Notes: F100 (Alcohol); F060 (Psychotic); T4041 (Fentanyl); F063 (Mood); F120 (Cannabis); F064 (Anxiety); F140 (Cocaine); F067 (Neurocognitive); I10 (Hypertension); J10 (Influenza); NA (Not Applicable).
Table 3. Two sample clients from the 200 patients in the simulated real-world healthcare administrative database.
ClientID | VisitDate | Diagnostic_H | Diagnostic_P | MHSU_H | Meaning_H | MHSU_P | Meaning_P
--- | --- | --- | --- | --- | --- | --- | ---
001 | 31 January 2024 | F100 | NA | SU | Alcohol | NA | NA
001 | 15 January 2024 | NA | F060 | NA | NA | MH | Psychotic
001 | 19 January 2024 | NA | F060 | NA | NA | MH | Psychotic
011 | 19 February 2024 | T4041 | NA | SU | Fentanyl | NA | NA
011 | 7 March 2024 | T4041 | NA | SU | Fentanyl | NA | NA
011 | 14 February 2024 | NA | F063, J10 | NA | NA | MH | Mood, Influenza
011 | 17 February 2024 | NA | F063, J10 | NA | NA | MH | Mood, Influenza
011 | 14 March 2024 | NA | F063, J10 | NA | NA | MH | Mood, Influenza
011 | 10 March 2024 | NA | F063, J10 | NA | NA | MH | Mood, Influenza
Notes: Diagnostic_H: ICD code for diagnosis at the hospital; Diagnostic_P: ICD code for diagnosis by service physician; MHSU_H: MH/SU status assigned at hospital; MHSU_P: MH/SU status assigned by physician; Meaning_H: hospital-assigned ICD meaning; Meaning_P: physician-assigned ICD meaning; ICD: International Classification of Diseases.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Soltanifar, M.; Lee, C.H. CMHSU: An R Statistical Software Package to Detect Mental Health Status, Substance Use Status, and Their Concurrent Status in the North American Healthcare Administrative Databases. Psychiatry Int. 2025, 6, 50. https://doi.org/10.3390/psychiatryint6020050
