1. Introduction
Process mining is a discipline that provides greater insight into real-life processes based on recorded system behaviour. Using process mining techniques, numerous case studies and companies have demonstrated improvements in quality, compliance, and process optimization.
In healthcare, recent review papers have provided an overview of process mining across clinical case studies. In 2016, Rojas et al. identified eleven common aspects across 74 clinical case studies [1]. These aspects include methodologies, techniques or algorithms, medical fields, and healthcare specialty. In 2018, Erdogan and Tarhan conducted a systematic mapping of 172 case studies using mostly the same metrics and aspects [2]. These papers are very specific as to how these case studies were conducted, which facilitates comparison between different process mining techniques in different settings. However, from a medical perspective, the terms and categories listed under medical fields and healthcare specialty are not structured in a uniform way and do not follow a standardized clinical coding scheme. Further, basic characteristics of the event log data (timeframe, number of cases or patients, healthcare facility/organization) are not always clearly reported.
The number of case studies on process mining in healthcare continues to increase steadily. As such, a standard approach to reporting event log data, clinical specialties, and medical diagnoses would provide greater clarity and enhance comparability between treatments of specific diseases across different healthcare settings.
In this article, further to the studies examined by Rojas et al., we conducted a forward search of process mining case studies in healthcare for the three-year period from January 2016 to December 2018. We identified case studies that described basic characteristics of the event log data, and where information on the patient encounter environment, clinical specialty, and medical diagnoses could be assigned under a standard clinical coding scheme.
Section 2 describes how the forward search was conducted and which criteria we applied to filter the results. It also describes the standard clinical coding systems and terminologies that were used. In Section 3, the results of our analysis are presented. Section 4 discusses the benefits and gives an outlook on the potential clinical insights gained by reporting and classifying clinical terms, clinical specialties, and medical diagnoses using a standard clinical coding scheme.
This article is an extension of the paper presented at the workshop on Process-Oriented Data Science for Healthcare 2019 (PODS4H19), held in conjunction with the BPM2019 conference in Vienna, Austria [3]. It presents further details on the results of our analysis and provides an outline for a reporting template in Appendix A. This template can be used as a checklist for the reporting of case studies on process mining in healthcare.
4. Discussion
Whether for process discovery, conformance checking, or enhancement, process mining case studies are influenced by the quality of the labeled data. The benefits of high-quality, labeled data include improved accuracy, efficiency, and predictability of processes, not only for the study itself but also for comparability across studies. Further, high-quality, labeled data can make other kinds of future analyses and even machine learning techniques (e.g., supervised learning, trend estimation, clustering) easier and more efficient to carry out. In process mining case studies in healthcare, labeled data often encompasses clinical aspects and terms. As such, our aim was to examine clinically relevant case studies published since Rojas et al. [4] and determine how to improve the clarity and comparability of the clinical aspects and terms described.
4.1. Reporting Basic Characteristics of the Event Log Data
For our analysis, we selected papers that described basic characteristics of the event log data. These characteristics included the origin or source of the data, the healthcare facility, the number of cases or patients, and the timeframe of the study. For example, in Rinner et al. [18], event logs were extracted for a total of 1023 patients starting melanoma surveillance between January 2010 and June 2017, from a local melanoma registry in a medical university and a Hospital Information System (HIS) in Austria. In papers where these characteristics were not clearly reported, the retrieval process was time-consuming. Several papers provided additional details (e.g., patient age, data from private insurance or public health records). Presumably for reasons of privacy and anonymity, specifics on the healthcare facility (e.g., hospital name) were not always provided; however, the country of origin was always reported. While variations exist in the style of reporting, we recommend that case studies include these basic characteristics when reporting the event log data.
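As an illustration of how these basic characteristics could be captured in a machine-checkable form, the following is a minimal sketch in Python. The class name `EventLogReport`, its field names, and the helper `missing_fields` are hypothetical and are not part of the reporting template in Appendix A; the example values are taken from the description of Rinner et al. above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EventLogReport:
    """Basic characteristics of an event log (hypothetical field names)."""
    data_source: Optional[str] = None          # origin or source of the data
    healthcare_facility: Optional[str] = None  # may be anonymized
    country: Optional[str] = None
    num_cases: Optional[int] = None            # number of cases or patients
    timeframe_start: Optional[str] = None      # e.g., "2010-01"
    timeframe_end: Optional[str] = None        # e.g., "2017-06"


def missing_fields(report: EventLogReport) -> List[str]:
    """Return the names of characteristics that were not reported."""
    return [
        f.name
        for f in fields(report)
        if getattr(report, f.name) in (None, "")
    ]


# Values as described for Rinner et al. in the text above.
rinner = EventLogReport(
    data_source="local melanoma registry and Hospital Information System (HIS)",
    healthcare_facility="medical university",
    country="Austria",
    num_cases=1023,
    timeframe_start="2010-01",
    timeframe_end="2017-06",
)

# A report that only states the country of origin.
incomplete = EventLogReport(country="Austria")

print(missing_fields(rinner))
print(missing_fields(incomplete))
```

A check of this kind would flag unreported characteristics before submission rather than leaving later reviewers to retrieve them manually.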
4.2. Adopting the Use of Standard Clinical Descriptors
4.2.1. Encounter Environment
A patient can have vastly different experiences within the healthcare system depending on the clinical setting or encounter environment. For example, a patient with heart failure who presents to the AED may require admission as a hospital inpatient, follow-up at their GP practice site or outpatient clinic, and prescription drugs at a pharmacy. As such, in our analysis of the selected papers, we focused on five patient encounter environments: Inpatient, Outpatient, AED, GP practice site, and Pharmacy. All five encounter types can be coded using SNOMED CT. While further details can be provided (e.g., Outpatient Clinic for Thyroid Disease [32]), we recommend that case studies report at least the patient encounter environment using standard clinical codes, e.g., SNOMED CT.
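The five encounter environments can be treated as a closed vocabulary when extracting them from papers. The sketch below is one possible way to do this in Python; the names `EncounterEnvironment` and `parse_encounter` are hypothetical, and the SNOMED CT concept identifiers are deliberately omitted because they would need to be looked up in an official SNOMED CT release or browser.

```python
from enum import Enum
from typing import List


class EncounterEnvironment(Enum):
    """The five patient encounter environments considered in this analysis."""
    INPATIENT = "Inpatient"
    OUTPATIENT = "Outpatient"
    AED = "AED"
    GP_PRACTICE_SITE = "GP practice site"
    PHARMACY = "Pharmacy"


def parse_encounter(label: str) -> EncounterEnvironment:
    """Map a free-text label to one of the five environments (case-insensitive)."""
    normalized = label.strip().casefold()
    for env in EncounterEnvironment:
        if env.value.casefold() == normalized:
            return env
    raise ValueError(f"Unknown encounter environment: {label!r}")


# The heart failure example from the text, expressed as a sequence of encounters.
pathway: List[EncounterEnvironment] = [
    parse_encounter(label)
    for label in ["AED", "inpatient", "GP practice site", "Outpatient", "pharmacy"]
]
print([env.value for env in pathway])
```

Rejecting labels outside the vocabulary, instead of silently accepting them, keeps the reported environments comparable across studies.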
4.2.2. Clinical Specialty
Different clinical specialties are often involved in the care of a patient. For example, for a patient diagnosed with cancer, a multidisciplinary care plan can encompass input from a medical specialty, a surgical specialty, and clinical oncology. As each specialty offers its own unique set of knowledge and expertise, it is important to identify which clinical specialty is involved.
For each of our selected papers, we identified at least one of the 18 high-level clinical specialties coded by SNOMED CT. For greater specificity, SNOMED CT offers further standard clinical codes for sub-specialties. In fact, Baek et al. list multiple sub-specialties along with their corresponding SNOMED CT codes in their study [12]. Also, instead of Clinical specialty, another category of clinical descriptors, such as the type of medical practitioner or occupation, could have been considered (e.g., mapping to surgeon instead of surgical specialty).
In any event, the task of identifying and assigning such standard clinical codes is time-consuming and beyond the scope of this paper. For future case studies, we recommend reporting the clinical specialty (or a similar clinical descriptor such as medical practitioner) by adopting standard clinical codes, e.g., SNOMED CT.
4.2.3. Medical Diagnosis
There are thousands of medical diagnoses, and each diagnosis comes with its own treatment and management plan. ICD-10 is a standard coding scheme in healthcare that provides specific clinical descriptors and codes for diseases and health conditions. In our analysis, we were able to identify at least one medical diagnosis or description of a medical diagnosis in each paper, which we could map to the corresponding ICD-10 code. Further, over 25% (10 out of 38) of our selected papers utilized either ICD-9 or ICD-10 codes in their study. For broader comparison across studies, we assigned the selected papers to one or more of the 22 ICD-10 chapters or block categories. In Table 6, we list only the ICD-10 chapters that were covered in the case studies.
It is important to distinguish between a medical diagnosis (i.e., the process of identifying the disease or medical condition that explains a patient’s signs and symptoms) and a patient’s signs (e.g., rash) or symptoms (e.g., cough). While the majority of ICD-10 chapters describe a group of medical diagnoses, some cover other clinical descriptors, such as signs and symptoms (R00-R99), external causes of morbidity and mortality (V01-Y98), and codes for special purposes (U00-U99). ICD-10 also allows for the coding of location, severity, cause, manifestation, and type of health problem [43].
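Assigning a code to an ICD-10 chapter, as done for the selected papers, amounts to a range lookup on the first three characters of the code. The following Python sketch shows the idea under stated assumptions: the table is intentionally partial (three of the 22 chapters), the chapter ranges for II and IX are the WHO ICD-10 ranges as we understand them, and the names `ICD10_CHAPTERS` and `icd10_chapter` are hypothetical. A real implementation would load the complete chapter list from an official ICD-10 release.

```python
import re
from typing import List, Optional, Tuple

# Partial, illustrative chapter table: (chapter, first code, last code, title).
ICD10_CHAPTERS: List[Tuple[str, str, str, str]] = [
    ("II", "C00", "D48", "Neoplasms"),
    ("IX", "I00", "I99", "Diseases of the circulatory system"),
    ("XVIII", "R00", "R99", "Symptoms, signs and abnormal findings"),
]


def _category_key(code: str) -> Tuple[str, int]:
    """Split a code such as 'C43.9' into a sortable (letter, number) pair."""
    match = re.match(r"^([A-Z])(\d{2})", code.strip().upper())
    if match is None:
        raise ValueError(f"Not a valid ICD-10 code: {code!r}")
    return match.group(1), int(match.group(2))


def icd10_chapter(code: str) -> Optional[str]:
    """Return the chapter numeral for a code, or None if it is not in the table."""
    key = _category_key(code)
    for chapter, first, last, _title in ICD10_CHAPTERS:
        if _category_key(first) <= key <= _category_key(last):
            return chapter
    return None


print(icd10_chapter("C43.9"))  # a malignant melanoma code, within C00-D48
print(icd10_chapter("I50"))    # heart failure, within I00-I99
print(icd10_chapter("R05"))    # cough, a symptom code within R00-R99
```

Because the comparison is done on (letter, number) pairs, ranges that span two letters, such as C00-D48, are handled without special cases.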
Taken together, we recommend adopting a standard coding scheme, e.g., ICD-10, for clinical terms and aspects relating to medical diagnosis in process mining case studies in healthcare. The recently developed ICD-11 has not yet been adopted but provides backward compatibility, i.e., ICD-10 coded case studies will be comparable to newer ICD-11 coded ones once the new coding scheme has been taken up by information system vendors.
4.3. Conclusions and Future Perspectives
In summary, we propose adopting a standard for describing event log data and reporting medical terminology using standard clinical descriptors and coding schemes. In doing so, the goal is to improve accuracy and comparability across future clinically relevant process mining case studies in healthcare. As such, we provide a sample checklist template of standard criteria for the reporting of such case studies in Appendix A.
In scientific research, the idea of having a set of guidelines, criteria, or standards for peer-reviewed publications is not novel. In fact, journals such as Nature are taking initiatives by creating mandatory reporting summary templates (https://www.nature.com/documents/nr-reporting-summary-flat.pdf) in order to improve comparability, transparency, and reproducibility of the work they publish [44]. Other journals and disciplines, including biomedical informatics, are following suit [45]. Thus, as data sets become more transparent and available, consistency in reporting characteristics of the event log data (e.g., origin of data, number of patients or cases, healthcare facility, timeframe of the study) will aid in improving comparability and reproducibility across future studies.
Further to the work by Rojas et al. [28], we identified and described the clinical terms and aspects in our selected papers with respect to three categories: the patient encounter environment, clinical specialty, and medical diagnosis. We then mapped the clinical terms and aspects to their respective standard clinical descriptors and codes found in SNOMED CT and ICD-10. For studies where a higher granularity for patient encounter environments is needed, SNOMED CT offers more codes, and its compositional grammar could be useful. Similarly, for Clinical specialty in SNOMED CT, reporting sub-specialties under, e.g., Medical speciality will provide increased specificity for clarity and comparison.
As mentioned above, several case studies have already adopted the use of a standard clinical coding scheme to describe medical diagnoses. However, our consideration of SNOMED CT and ICD-10 serves only as a starting point. In fact, SNOMED CT also provides standard codes for medical diagnoses, which can provide further specificity and clarity. For example, instead of ICD-10, the Systematized Nomenclature of Dentistry, or SNODENT CT (which is incorporated into SNOMED CT), could have been used to code the clinical descriptors of missing and filled teeth in one of our selected papers [7].
Finally, when adopting the use of standard clinical descriptors, we recognize that other fundamental clinical categories to consider are medical investigations and procedures. The use of standard clinical descriptors is becoming increasingly relevant, not only for clarity and comparability, but also for efficiency in outcome measurements such as length of stay (LOS) and financial cost. For example, Baek et al. utilized process mining techniques and statistical methods to identify the factors associated with LOS in a South Korean hospital [12]. This study is just one use case in which a more detailed description of the medical context in process mining case studies could allow for future meta-studies, e.g., benchmarking LOS in different hospitals or countries based on diagnoses, while also considering other important factors such as the patient encounter environment.