Article

Machine Learning Model in Predicting Sarcopenia in Crohn’s Disease Based on Simple Clinical and Anthropometric Measures

1
Department of Digestive Diseases, Huashan Hospital, Fudan University, Shanghai 200040, China
2
Department of Radiology, Huashan Hospital, Fudan University, Shanghai 200040, China
3
Department of Allergy and Immunology, Huashan Hospital, Fudan University, Shanghai 200040, China
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Int. J. Environ. Res. Public Health 2023, 20(1), 656; https://doi.org/10.3390/ijerph20010656
Submission received: 25 October 2022 / Revised: 22 December 2022 / Accepted: 26 December 2022 / Published: 30 December 2022

Abstract

Sarcopenia is associated with increased morbidity and mortality in Crohn’s disease. The present study aimed to compare the diagnostic performance of different machine learning models in identifying sarcopenia in Crohn’s disease. Patients diagnosed with Crohn’s disease at our center provided clinical, anthropometric, and radiological data. The cross-sectional CT slice at L3 was used for segmentation and the calculation of body composition. The prevalence of sarcopenia was calculated, and the clinical parameters were compared. A total of 167 patients were included in the present study, of which 127 (76.0%) were male and 40 (24.0%) were female, with an average age of 36.1 ± 14.3 years. Based on the previously defined cut-off value of sarcopenia, 118 (70.7%) patients had sarcopenia. Seven machine learning models were trained with the randomly allocated training cohort (80%) and then evaluated on the validation cohort (20%). A comprehensive comparison showed that LightGBM was the best-performing diagnostic model, with an AUC of 0.933, AUPRC of 0.970, sensitivity of 72.7%, and specificity of 87.0%. The LightGBM model may facilitate a population management strategy with early identification of sarcopenia in Crohn’s disease, while providing guidance for nutritional support and an alternative surveillance modality for long-term patient follow-up.

1. Introduction

Sarcopenia is defined as the progressive generalized loss of skeletal muscle mass and strength associated with increased morbidity and mortality. Although more commonly recognized as a degenerative process in the older population, sarcopenia has also been associated with a wide spectrum of diseases, including cancer, metabolic disorders, and inflammatory diseases [1,2]. Malabsorption is a major contributor to muscle loss and dysfunction in sarcopenia and is commonly observed in patients with inflammatory bowel disease (IBD) [3].
Crohn’s disease (CD), a subtype of inflammatory bowel disease, is characterized by transmural inflammation of the gastrointestinal tract and can often be complicated by strictures, fistulae, and abscess formation [4]. Depending on the severity of enteropathy, CD patients have varying degrees of malnutrition, resulting in loss of muscle mass and function. Previous studies have demonstrated that sarcopenia has a prognostic role in the management of inflammatory bowel disease. The estimated prevalence of sarcopenia in CD ranges from 20% to 70%, based on data collected in different patient subgroups, including patients with active or stable disease, pediatric patients, and patients of different ethnicities [5]. Sarcopenia is thought to result largely from chronic inflammation and malnutrition. Growing evidence has shown that it has a debilitating effect on patients’ physiological reserve, which hinders postoperative recovery and increases the likelihood of surgical complications. Alterations in body composition throughout the course of the disease may also provide therapeutic indications [6].
According to the European Working Group on Sarcopenia in Older People (EWGSOP) [1] and Asian Working Group for Sarcopenia (AWGS) [2], skeletal muscle mass can be quantified as appendicular skeletal muscle mass (ASM) or cross-sectional analysis of a specific muscle group based on magnetic resonance imaging (MRI) and computed tomography (CT), which is considered the gold standard for diagnosis. However, the requirement of specific image processing software to quantify CT images can be a labor-intensive and time-consuming process even for trained specialists.
Machine learning is a subdomain of artificial intelligence (AI), which incorporates the use of software algorithms to identify patterns in clinical datasets. Machine learning has rapidly driven the progress of AI in health care, demonstrating impressive results in patient monitoring, clinical decision support, improving diagnostics and prognostics, and even clinical research [7]. The incorporation of machine learning into clinical practice has become an area of interest for clinicians across all subspecialties.
The present study aims to explore the prevalence of sarcopenia in CD patients and to compare the performance of different machine learning models in identifying sarcopenia in Crohn’s disease based on easily accessible clinical data. Patients diagnosed with Crohn’s disease at a tertiary hospital in Shanghai, China, were recruited and evaluated for the presence of sarcopenia, which was determined based on skeletal muscle segmentation of the patient’s abdominal CT scan. Clinical and anthropometric data were collected and used as predictors of sarcopenia in different machine learning models, which were tested on randomly assigned training and validation sets. We aim to explore the clinical potential of an ideal diagnostic model that can be used not only to identify the presence of sarcopenia but also to provide a conventional surveillance modality throughout the course of the disease. All seven machine learning methods (Naive Bayes, Logistic Model, Classification Tree, Random Forest, adaBoost, XGBoost, and LightGBM) could identify the presence of sarcopenia based on baseline characteristics. Among these models, LightGBM yielded the best diagnostic performance (AUC of 0.933, AUPRC of 0.970, sensitivity of 72.7%, specificity of 87.0%), suggesting clinical applicability. Furthermore, we applied SHAP values to interpret the LightGBM model, highlighting the significance of BMI, gender, height, and CRP in the diagnosis of sarcopenia in Crohn’s disease.

2. Materials and Methods

2.1. Patients and Data Collection

The present study was conducted at the Department of Digestive Diseases of Huashan Hospital, Fudan University, a municipal tertiary medical center located in Shanghai, China. A retrospective review of our inpatient medical database was conducted. A total of 323 patients diagnosed and treated for Crohn’s disease from January 2016 through March 2022 were reviewed for eligibility. Patients were excluded if the abdominal or small intestinal CT was not performed at our center within one month of admission. Patients with missing or incomplete clinical data were also excluded. A total of 167 patients were ultimately included in the present study. Detailed patient history was reviewed and collected for demographic data, disease characteristics based on the Montreal classification, laboratory examinations, endoscopic results with SES-CD (simple endoscopic score for Crohn’s disease), and body composition via radiological findings, prior to the diagnosis of sarcopenia. Candidate predictor variables were identified, including gender; age; age at diagnosis (A1, A2, A3); disease location (L1, L2, L3, L4); disease behavior (B1, B2, B3); perianal disease; SES-CD; laboratory data including white blood cell counts (WBC), red blood cell counts (RBC), hemoglobin (Hb), platelet counts (PLT), albumin (ALB), prealbumin (PA), erythrocyte sedimentation rate (ESR), and C-reactive protein (CRP); height (cm); weight (kg); and body mass index (BMI, kg/m²).

2.2. Body Composition

Abdominal CT or small intestinal CT was routinely conducted as part of disease assessment for hospitalized patients. The original DICOM files were retrospectively reviewed by a blinded radiologist. A single cross-sectional CT slice at the level of the third lumbar vertebra (L3) was segmented with SliceOmatic software (5.0 Rev-9, Tomovision, Montreal, QC, Canada) to calculate body composition, including skeletal muscle mass (SMM), visceral adipose tissue (VAT), and subcutaneous adipose tissue (SAT). The thresholds for delineating tissue were −29 to 150 Hounsfield units (HU) for skeletal muscle, −150 to −50 HU for visceral adipose tissue, and −190 to −30 HU for subcutaneous adipose tissue. Manual segmentation was performed to calculate the surface area (cm²) of the region of interest, which included the psoas, quadratus lumborum, transverse abdominis, external and internal obliques, rectus abdominis, and erector spinae muscles. Skeletal muscle index (SMI), visceral adipose index (VAI), and subcutaneous adipose index (SAI) were subsequently calculated by dividing the corresponding surface area (cm²) by height squared (m²). A cutoff value of SMI <49.9 cm²/m² in male patients and <28.7 cm²/m² in female patients was used to define sarcopenia, based on a previous report of a Chinese IBD population cohort [8].
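The index calculation and the sex-specific cutoff described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the example patient are hypothetical, and the study itself derived the muscle area with SliceOmatic rather than with code like this.

```python
# Sex-specific SMI cutoffs (cm^2/m^2) quoted in the text above.
SMI_CUTOFFS = {"male": 49.9, "female": 28.7}


def skeletal_muscle_index(muscle_area_cm2: float, height_cm: float) -> float:
    """Return SMI in cm^2/m^2: L3 muscle area divided by height squared."""
    if height_cm <= 0:
        raise ValueError("height must be positive")
    height_m = height_cm / 100.0
    return muscle_area_cm2 / (height_m ** 2)


def is_sarcopenic(muscle_area_cm2: float, height_cm: float, sex: str) -> bool:
    """Sarcopenia is flagged when SMI falls below the sex-specific cutoff."""
    return skeletal_muscle_index(muscle_area_cm2, height_cm) < SMI_CUTOFFS[sex]


# Hypothetical patient (not from the study): 130 cm^2 of muscle at L3, 175 cm tall.
smi = skeletal_muscle_index(130.0, 175.0)
print(round(smi, 1))                          # 42.4
print(is_sarcopenic(130.0, 175.0, "male"))    # True  (42.4 < 49.9)
print(is_sarcopenic(130.0, 175.0, "female"))  # False (42.4 >= 28.7)
```

The same muscle area and height lead to different classifications for male and female patients, which is why sex must be supplied alongside the measurements.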

2.3. Machine Learning and Statistical Analyses

Patients were randomly divided into training (n = 133) and validation cohorts (n = 34) in a ratio of 80% to 20%. Naive Bayes from the e1071 package [9], Logistic Model [10], Classification Tree from the rpart package [11], Random Forest from the randomForest package [12], adaBoost from the JOUSBoost package [13], XGBoost from the xgboost package [14], and LightGBM from the lightgbm package [15] were applied for machine learning. For the XGBoost model, the optimal parameters, including eta value, maximal depth, minimal child weight, and subsample, were determined via 5-fold cross-validation and Bayesian optimization as performed in previous studies, and the model was trained for 10 iterations [16]. For the LightGBM model, the optimal parameters, including lambda1, lambda2, feature fraction, and MinHessianLeaf, were determined by grid search, and training was run for 100 rounds. For outcome output, the Naive Bayes, Classification Tree, Random Forest, and adaBoost models directly output binary variables (sarcopenic or nonsarcopenic). The Logistic Model, XGBoost, and LightGBM models output the probability that the patient had sarcopenia; a probability greater than 0.5 was considered predictive of sarcopenia. For model evaluation, a confusion matrix was constructed with the confusionMatrix function of the caret package. Furthermore, the Matthews correlation coefficient (MCC) and F1 score were calculated for each model. The areas under the receiver operating characteristic (ROC) curve and the precision-recall (PR) curve were used to guide model selection. To explain the variables in the XGBoost or LightGBM model, SHAP (Shapley additive explanations) values were calculated for each variable of each sample with the SHAPforxgboost package and visualized with ggplot2. Continuous variables were compared by Student’s t-test or Wilcoxon test, depending on the distribution and variance of the data. Categorical variables were tested using the chi-square test or Fisher’s exact test. All statistical analyses were performed with R (4.0.3).
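As a rough illustration of the random 80%/20% allocation and the 0.5 probability threshold described above, the following Python sketch uses only the standard library. It is not the authors' R code: the helper names, the seed, and the rule of rounding the training size down are assumptions made for this example.

```python
import random


def split_indices(n_patients: int, train_fraction: float = 0.8, seed: int = 0):
    """Shuffle patient indices and split them into training and validation sets."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    # Assumption: the training size is rounded down (167 * 0.8 = 133.6 -> 133).
    n_train = int(n_patients * train_fraction)
    return indices[:n_train], indices[n_train:]


def classify(probabilities, threshold: float = 0.5):
    """Label a patient sarcopenic (1) when the probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


train_idx, valid_idx = split_indices(167)
print(len(train_idx), len(valid_idx))  # 133 34
print(classify([0.12, 0.50, 0.81]))    # [0, 0, 1]
```

With 167 patients, this rounding rule gives the cohort sizes reported in the text (133 and 34). A probability of exactly 0.5 is labeled nonsarcopenic here because the text specifies "greater than 0.5".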

3. Results

3.1. Patient Characteristics

A total of 167 patients were included in the present study, of which 127 (76.0%) were male and 40 (24.0%) were female, with an average age of 36.1 ± 14.3 years old. Based on the previously defined cut-off value of sarcopenia, 118 (70.7%) patients had sarcopenia. Disease characteristics of Crohn’s disease were defined by the Montreal classification. The majority of patients (62.9%) experienced disease onset at the age of 17–40 years old (A2), followed by >40 years old (A3), and <17 years old (A1). Most patients had ileal or ileocolic disease (45.8% and 42.8%, respectively), while stricturing disease was the more common disease behavior (51.8%). Perianal disease was observed in the majority of patients (74.3%). All patients received endoscopic examination to assess disease severity via colonoscopy, retrograde double-balloon endoscopy, or both. SES-CD was used to determine the severity of mucosal defect. The average SES-CD score for all patients was 6.47 ± 5.9. Detailed patient characteristics are summarized in Table 1.
Baseline characteristics were compared between sarcopenic and nonsarcopenic patients. A significant difference was noted in gender distribution and age of disease onset. Sarcopenia had a male predominance (92.4%) and was associated with a younger age at diagnosis. A significant difference was also noted in WBC, RBC, and PLT. Anthropometric data showed that sarcopenic patients were taller, but their BMI was significantly lower than that of nonsarcopenic patients. Body composition data showed an average SMI of 39.4 ± 6.4 cm²/m² in sarcopenic patients, while the average SMI was 41.6 ± 9.8 cm²/m² in nonsarcopenic patients. Visceral adipose tissue and subcutaneous adipose tissue were also significantly lower in sarcopenic patients compared to nonsarcopenic patients, with an average VAI of 17.7 ± 14.7 cm²/m² vs. 33.67 ± 19.9 cm²/m² and SAI of 21.6 ± 13.8 cm²/m² vs. 43.44 ± 19.0 cm²/m², respectively (Figure S1).

3.2. Model Building and Evaluation

A total of seven machine learning models were implemented to determine an ideal diagnostic algorithm for sarcopenia, namely Naive Bayes, Logistic Model, Classification Tree, Random Forest, adaBoost, XGBoost, and LightGBM. All 18 variables (as listed in Section 2.1) derived from clinical and laboratory data were included for algorithm calculations. Each machine learning model was trained with the randomly allocated training cohort (80%) as described in Section 2.3. The performance of each model was evaluated on the validation cohort (20%), in which performance results including accuracy, sensitivity, specificity, precision, F1 score, MCC, AUC (area under curve), AUPRC (area under the precision-recall curve), TP (true positive), FP (false positive), FN (false negative), TN (true negative), PPV (positive predictive value), and NPV (negative predictive value) were calculated and summarized in Table 2. A comprehensive comparison showed that LightGBM was the best-performing diagnostic model, with an AUC of 0.933, AUPRC of 0.970, sensitivity of 72.7%, specificity of 87.0%, PPV of 0.727, NPV of 0.870, F1 of 0.727, and MCC of 0.597, while the adaBoost model performed worst (Figure 1).
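To make the relationship between the confusion-matrix counts and the derived metrics concrete, the sketch below computes them in Python from the standard definitions. The counts used (TP = 8, FP = 3, FN = 3, TN = 20) are hypothetical: they were chosen for a 34-patient cohort only because the arithmetic reproduces the sensitivity, specificity, F1, and MCC quoted above, and they are not taken from Table 2.

```python
import math


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive common diagnostic metrics from the four confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),   # recall of the positive class
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # precision
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


# Hypothetical counts for a 34-patient validation cohort (not from Table 2).
metrics = confusion_metrics(tp=8, fp=3, fn=3, tn=20)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.824, 'sensitivity': 0.727, 'specificity': 0.87,
#  'ppv': 0.727, 'npv': 0.87, 'f1': 0.727, 'mcc': 0.597}
```

Because the hypothetical FP and FN counts are equal, sensitivity coincides with PPV and specificity with NPV, the same pattern seen in the reported LightGBM figures.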
Consequently, SHAP analysis was used to further interpret the results of the LightGBM model for its optimal diagnostic performance by computing the contribution of each variable. The average Shapley scores plot ranked the variables from most important to least important in contribution to the patient’s sarcopenic status in Crohn’s disease. The four most important variables were BMI, followed by gender, height, and CRP, based on the LightGBM model (Figure 2A).
The SHAP summary plot shows the distribution of all 18 variables and their corresponding positive or negative contribution to the prediction of sarcopenia. Each dot represents one patient for one feature, colored according to the feature value, wherein yellow represents a lower value, while blue represents a higher value. BMI has the greatest variability, in which a lower BMI value was associated with a positive Shapley value, contributing to an increased likelihood of sarcopenia. Conversely, a higher prealbumin level predicted a lower chance of sarcopenia due to a negative marginal impact on the Shapley value (Figure 2B).
The SHAP dependence plot represents the importance of each variable with respect to baseline. A corresponding Shapley value that exceeds one contributes to the risk of sarcopenia. In general, lower BMI, taller stature, higher CRP, higher PLT, lower PA, lower WBC, younger age, and lower ALB contributed to an increased risk of sarcopenia (Figure S2).
The collective effect of the variables can be visualized at the local (patient) level using Shapley values. Each bar represents the patient’s total Shapley score based on the additive contribution of each variable in a landscape view (Figure S3).
Representative case studies show the performance of the diagnostic model at a local level. The Shapley values of the top four predictive variables indicate the risk of sarcopenia in individual patients. For instance, a taller male patient with a lower BMI and higher CRP is at risk for sarcopenia (red background), compared to a shorter female patient with higher BMI and lower CRP (blue background) (Figure 2C).
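The additive property behind these per-patient totals can be illustrated with a small exact Shapley computation in Python. This is a toy sketch under stated assumptions, not the procedure used in the study: the scoring function, its coefficients, the baseline, and the patient values are all hypothetical, and features outside a coalition are simply set to a fixed baseline rather than averaged over background data as SHAP implementations do.

```python
from itertools import permutations


def shapley_values(model, patient: dict, baseline: dict) -> dict:
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    features = list(patient)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)           # start with every feature at baseline
        previous = model(current)
        for name in order:
            current[name] = patient[name]  # reveal this feature's actual value
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk_score(x: dict) -> float:
    """Hypothetical score (not the fitted LightGBM model): low BMI and high CRP raise it."""
    score = 0.1 * (22.0 - x["bmi"]) + 0.02 * x["crp"]
    if x["bmi"] < 18.5 and x["crp"] > 10.0:
        score += 0.5  # interaction term so the attribution is not purely linear
    return score


baseline = {"bmi": 22.0, "crp": 5.0}   # hypothetical reference patient
patient = {"bmi": 17.0, "crp": 20.0}   # hypothetical patient of interest
phi = shapley_values(toy_risk_score, patient, baseline)
print({name: round(value, 2) for name, value in phi.items()})
# {'bmi': 0.75, 'crp': 0.55}
```

The two contributions sum to 1.30, which equals the difference between the patient's score (1.40) and the baseline score (0.10). This is the sense in which each patient's total Shapley score is the additive contribution of the individual variables.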

4. Discussion

The present study presents a diagnostic model for predicting sarcopenia in Crohn’s disease. The optimal algorithm was determined based on reruns of different machine learning models with easily accessible clinical and laboratory data. Multiple machine learning methods, including Naive Bayes, Logistic Model, Classification Tree, Random Forest, adaBoost, XGBoost, and LightGBM, could identify sarcopenia given baseline data. Among the models, LightGBM showed the best diagnostic performance on the validation cohort (ROC-AUC = 0.933, PR-AUC = 0.970, sensitivity = 0.727, specificity = 0.870), suggesting clinical applicability. Furthermore, we applied SHAP values to interpret the LightGBM model, indicating the significance of BMI, gender, height, and CRP in the diagnosis of sarcopenia among CD patients. The high-performance gradient boosting framework displayed promising value in the clinical diagnosis of sarcopenia in Crohn’s disease patients. To the best of our knowledge, this is the first study to predict the presence of sarcopenia in Crohn’s disease patients based on a screening of different machine learning algorithms. The variables applied in the present study included demographic and anthropometric data, and laboratory parameters that are routinely collected upon hospital admission or at outpatient clinics.
Machine learning has become a new tool in the field of medicine, applied to medical diagnosis and clinical decisions. However, the application of machine learning from theory to practice still has a long way to go [17]. Traditional machine learning methods, such as logistic regression and random forests, do not use boosting and may underfit complex data. In this study, gradient boosting algorithms including XGBoost and LightGBM were integrated into the machine learning framework to establish a powerful and clinically applicable model [18]. Physicians may be reluctant to employ machine learning in clinical decision making due to the lack of transparency in the derivation of a diagnosis or decision. To improve the transparency of the calculation process, the Shapley additive explanations (SHAP) methodology was used to provide a visual depiction of our predictions [19,20]. Highly relevant variables including BMI, height, gender, and CRP could help identify patients with sarcopenia based on Shapley values at a local (patient) level.
Machine learning has weighted the role of different variables in determining the presence of sarcopenia. The identification of risk factors may provide insight into disease pathogenesis. The diagnosis of sarcopenia has been rapidly gaining awareness in various disease entities, including cancer, metabolic syndromes, and autoimmune disease. As a subtype of IBD, Crohn’s disease can result in mucosal defects throughout the entire GI tract. Severe complications such as fistulas, strictures, and obstruction may also interfere with nutrient absorption. Active inflammation shortens the contact time between nutrients and the intestinal mucosal surface, which interferes with the absorption of amino acids and contributes to the exacerbation of malabsorption and the patient’s sarcopenic state [21]. Sarcopenia may be a direct result of malabsorption or an indirect consequence of the systemic inflammatory cascade. Chronic inflammation may also have a role in the development of sarcopenia. The systemic elevation of proinflammatory cytokines such as interferon-γ (IFN-γ), IL-1, IL-4, and tumor necrosis factor-α (TNF-α) is associated with protein catabolism and reduced muscle protein synthesis, potentially through inhibition of the anabolic mTORC1 pathway [22,23]. Consistent with this, in our results variables associated with active inflammation, such as CRP, PLT, and WBC, were associated with a higher likelihood of sarcopenia.
The present study also identified several anthropometric measures, such as BMI and height, as strong predictors for sarcopenia. Interestingly, a taller stature was associated with an increased risk of sarcopenia. Increased height is commonly assumed to be associated with increased muscle mass. However, in chronic inflammatory diseases such as IBD, muscle mass may not necessarily increase with height; instead, a negative correlation could be observed. Another possible explanation is ethnic disparity; since this study was conducted in Shanghai, China, only Chinese patients were included in the present cohort. Asians appear to have lower height-adjusted muscle mass values compared to Caucasians [24]. In addition, in our relatively small sample, the majority of patients diagnosed with sarcopenia were male (92.4%); therefore, the effect of height could also be influenced by the uneven gender distribution.
Although the etiology of sarcopenia remains elusive, its prognostic value has been confirmed by several studies. Multiple studies have demonstrated that sarcopenia in IBD is associated with poor outcomes, especially in patients undergoing surgery [25,26,27]. Studies have also shown that the presence of sarcopenia positively correlates with disease severity and prolonged hospital stay or rehospitalization in patients with IBD. Early identification of sarcopenia can sway treatment decisions toward early and active intervention for both disease activity and sarcopenia. Nutritional intervention has been shown to be effective in maintaining muscle mass and increasing muscle strength and function; sarcopenia can be counteracted with an adequate dietary protein intake of 1.2–1.5 g/kg/day, especially during active-phase IBD. The role of pharmacological intervention in improving sarcopenia in IBD patients has also received increasing attention. Infliximab has been reported to improve muscle mass and muscle strength in CD patients after 24 weeks, with a significant reduction in IL-6 [28]. The recent publication of Selecting Therapeutic Targets in Inflammatory Bowel Disease (STRIDE)-II advocates not only the importance of clinical remission and mucosal healing but also the reestablishment of physical function and improvement of quality of life (QOL) or activities of daily living (ADL) as long-term treatment goals [29]. These statements suggest the important role of sarcopenia in the management of IBD.
Sarcopenia can be assessed via various methods, including imaging techniques such as computed tomography (CT) and dual-energy X-ray absorptiometry (DXA). Comparatively, CT provides a direct estimate of muscle mass, while DXA provides an indirect estimate of lean mass. However, measurement of muscle mass is insufficient in the determination of muscle function. Grip strength and walking speed can collectively provide a more well-rounded assessment of the disease state [30]. Skeletal muscle signal intensity in MRI T1-weighted images has also been investigated to provide diagnostic reference [31]. To date, there is no consensus as to the gold standard diagnostic modality for sarcopenia [32]. Therefore, the reported incidence in IBD patients ranges from 20–70%, which may be due to differently defined cut-off values across different patient populations, patient ethnicities, and disease statuses, as well as the use of different diagnostic modalities.
The present study screened and identified an optimal machine learning algorithm, the LightGBM, with a high predictive performance for identifying sarcopenia in Crohn’s disease. The variables used are easily accessible in both inpatient and outpatient settings. Crohn’s disease patients require routine follow-up with periodic disease surveillance. However, abdominal CT is usually ordered once every 6–12 months due to radiation exposure. Apart from the inconvenience of performing an imaging study during every visit, a patient’s sarcopenic state can only be determined based on measurement of skeletal muscle mass performed by a trained radiologist or automated segmentation system, which may not be readily available. The proposed machine learning model can provide an alternative modality to assess sarcopenia in Crohn’s disease patients at shorter intervals. This information could also be additive to evaluating patient response to treatment and disease activity.

Study Limitations

Although the present study employed cross-sectional CT images at L3 of CD patients for accurate measurement of SMM and SMI, there are several limitations. First, due to the retrospective nature of the study, muscle strength and physical performance of the patient at the point of study inclusion could not be assessed, and radiological calculation of skeletal muscle mass alone remains a rather one-dimensional assessment of sarcopenia. Second, the chronic nature of CD makes it difficult to consider disease duration and previous treatment received as confounding factors when determining a patient’s sarcopenic state.

5. Conclusions

Accumulating evidence has demonstrated the diagnostic and prognostic significance of sarcopenia in IBD patients [33]. Early identification of and appropriate intervention for sarcopenia may accelerate clinical remission and mucosal healing. The present study proposes a noninvasive predictive model based on anthropometric data and clinical variables to predict the presence of sarcopenia in CD patients. The LightGBM model may facilitate a population management strategy with early identification of sarcopenia, while also providing guidance for nutritional support and an alternative surveillance modality for long-term patient follow-up.

Supplementary Materials

The following are available online at https://www.mdpi.com/article/10.3390/ijerph20010656/s1, Figure S1: L3 segmentation and calculation of body composition index in a sarcopenic and nonsarcopenic patient. Figure S2: SHAP dependence plot of the LightGBM model. Dependence plot depicts how each variable affects the prediction of sarcopenia. Figure S3: SHAP force plot of the LightGBM model. Each stacked bar represents one patient and their total Shapley score. The predicted sarcopenic state of each patient is determined by his/her total Shapley value.

Author Contributions

Y.T. and S.M. collected patient data, conducted all statistical analyses, and contributed to the conception and drafting of the manuscript. Y.Z. retrieved radiological data and calculated body composition. W.Z., B.Z., L.R., F.L. and H.S. edited the manuscript and provided critical revision. L.R., Z.L. and J.L. provided study conceptualization and contributed to the design and final approval of the submitted manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (No. 81870456), Huashan Hospital Fudan University Research Starting Grant (No. 2020QD008) and Huashan Hospital Fudan University Original Research Grant (No. IDF151039/006).

Institutional Review Board Statement

The study protocol was approved by the Huashan Hospital, Fudan University Institute Review Board (No. 2022-913). All eligible participants were asked to provide informed consent for the review of their clinical data.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Original data can be made available upon formal request to the corresponding author.

Acknowledgments

The authors would like to thank the nursing faculty of the Department of Digestive Diseases and the staff of the Information Center for data retrieval.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

IBD     Inflammatory bowel disease
CD      Crohn’s disease
EWGSOP  European Working Group on Sarcopenia in Older People
AWGS    Asian Working Group for Sarcopenia
ASM     Appendicular skeletal muscle mass
MRI     Magnetic resonance imaging
CT      Computed tomography
AI      Artificial intelligence
SES-CD  Simple endoscopic score for Crohn’s disease
WBC     White blood cell count
RBC     Red blood cell count
Hb      Hemoglobin
PLT     Platelet count
ALB     Albumin
PA      Prealbumin
ESR     Erythrocyte sedimentation rate
CRP     C-reactive protein
BMI     Body mass index
L3      Third lumbar vertebra
SMM     Skeletal muscle mass
VAT     Visceral adipose tissue
SAT     Subcutaneous adipose tissue
SMI     Skeletal muscle index
VAI     Visceral adipose index
SAI     Subcutaneous adipose index
MCC     Matthews correlation coefficient
ROC     Receiver operating characteristic
PR      Precision-recall
SHAP    Shapley additive explanations
AUC     Area under curve
AUPRC   Area under the precision-recall curve
TP      True positive
FP      False positive
FN      False negative
TN      True negative
PPV     Positive predictive value
NPV     Negative predictive value

References

  1. Cruz-Jentoft, A.J.; Bahat, G.; Bauer, J.; Boirie, Y.; Bruyère, O.; Cederholm, T.; Cooper, C.; Landi, F.; Rolland, Y.; Sayer, A.A.; et al. Sarcopenia: Revised European consensus on definition and diagnosis. Age Ageing 2019, 48, 16–31.
  2. Chen, L.-K.; Woo, J.; Assantachai, P.; Auyeung, T.-W.; Chou, M.-Y.; Iijima, K.; Jang, H.C.; Kang, L.; Kim, M.; Kim, S.; et al. Asian Working Group for Sarcopenia: 2019 Consensus Update on Sarcopenia Diagnosis and Treatment. J. Am. Med. Dir. Assoc. 2020, 21, 300–307.e302.
  3. Cederholm, T.; Barazzoni, R.; Austin, P.; Ballmer, P.; Biolo, G.; Bischoff, S.C.; Compher, C.; Correia, I.; Higashiguchi, T.; Holst, M.; et al. ESPEN guidelines on definitions and terminology of clinical nutrition. Clin. Nutr. 2017, 36, 49–64.
  4. Roda, G.; Ng, S.C.; Kotze, P.G.; Argollo, M.; Panaccione, R.; Spinelli, A.; Kaser, A.; Peyrin-Biroulet, L.; Danese, S. Crohn’s disease. Nat. Rev. Dis. Prim. 2020, 6, 22.
  5. Nishikawa, H.; Nakamura, S.; Miyazaki, T.; Kakimoto, K.; Fukunishi, S.; Asai, A.; Nishiguchi, S.; Higuchi, K. Inflammatory Bowel Disease and Sarcopenia: Its Mechanism and Clinical Importance. J. Clin. Med. 2021, 10, 4214.
  6. Ryan, M.E.; McNicholas, M.D.; Ben Creavin, M.; Kelly, M.M.E.; Walsh, F.T.; Beddy, F.D. Sarcopenia and Inflammatory Bowel Disease: A Systematic Review. Inflamm. Bowel Dis. 2019, 25, 67–73.
  7. May, M. Eight ways machine learning is assisting medicine. Nat. Med. 2021, 27, 2–3.
  8. Zhang, T.; Ding, C.; Xie, T.; Yang, J.; Dai, X.; Lv, T.; Li, Y.; Gu, L.; Wei, Y.; Gong, J.; et al. Skeletal muscle depletion correlates with disease activity in ulcerative colitis and is reversed after colectomy. Clin. Nutr. 2017, 36, 1586–1592.
  9. Cinelli, M.; Sun, Y.; Best, K.; Heather, J.M.; Reich-Zeliger, S.; Shifrut, E.; Friedman, N.; Shawe-Taylor, J.; Chain, B. Feature selection using a one dimensional naïve Bayes’ classifier increases the accuracy of support vector machine classification of CDR3 repertoires. Bioinformatics 2017, 33, 951–955.
  10. LaValley, M.P. Logistic Regression. Circulation 2008, 117, 2395–2399.
  11. Yang, P.; Bamlet, W.; Ebbert, J.; Taylor, W.; De Andrade, M. Glutathione pathway genes and lung cancer risk in young and old populations. Carcinogenesis 2004, 25, 1935–1944.
  12. Pavey, T.G.; Gilson, N.D.; Gomersall, S.R.; Clark, B.; Trost, S.G. Field evaluation of a random forest activity classifier for wrist-worn accelerometer data. J. Sci. Med. Sport 2017, 20, 75–80.
  13. Tang, J.; Henderson, A.; Gardner, P. Exploring AdaBoost and Random Forests machine learning approaches for infrared pathology on unbalanced data sets. Analyst 2021, 146, 5880–5891.
  14. Mustapha, I.B.; Saeed, F. Bioactive Molecule Prediction Using Extreme Gradient Boosting. Molecules 2016, 21, 983.
  15. Zhan, Z.-H.; You, Z.-H.; Li, L.-P.; Zhou, Y.; Yi, H.-C. Accurate Prediction of ncRNA-Protein Interactions From the Integration of Sequence and Evolutionary Information. Front. Genet. 2018, 9, 458.
  16. Smith, M.; Alvarez, F. Identifying mortality factors from Machine Learning using Shapley values—A case of COVID19. Expert Syst. Appl. 2021, 176, 114832.
  17. Uche-Anya, E.; Anyane-Yeboa, A.; Berzin, T.M.; Ghassemi, M.; May, F.P. Artificial intelligence in gastroenterology and hepatology: How to advance clinical practice while ensuring health equity. Gut 2022, 71, 1909–1915.
  18. Yan, J.; Xu, Y.; Cheng, Q.; Jiang, S.; Wang, Q.; Xiao, Y.; Ma, C.; Yan, J.; Wang, X. LightGBM: Accelerated genomically designed crop breeding through ensemble learning. Genome Biol. 2021, 22, 271.
  19. Yu, Z.; Ji, H.; Xiao, J.; Wei, P.; Song, L.; Tang, T.; Hao, X.; Zhang, J.; Qi, Q.; Zhou, Y.; et al. Predicting Adverse Drug Events in Chinese Pediatric Inpatients With the Associated Risk Factors: A Machine Learning Study. Front. Pharmacol. 2021, 12, 659099.
  20. Ning, Y.; Ong, M.E.H.; Chakraborty, B.; Goldstein, B.A.; Ting, D.S.W.; Vaughan, R.; Liu, N. Shapley variable importance cloud for interpretable machine learning. Gene Expr. Patterns 2022, 3, 100452. [Google Scholar] [CrossRef]
  21. Jeejeebhoy, K.N.; Duerksen, D.R. Malnutrition in Gastrointestinal Disorders: Detection and Nutritional Assessment. Gastroenterol. Clin. N. Am. 2018, 47, 1–22. [Google Scholar] [CrossRef] [PubMed]
  22. Wallace, K.L.; Zheng, L.B.; Kanazawa, Y.; Shih, D.Q. Immunopathology of inflammatory bowel disease. World J. Gastroenterol. 2014, 20, 6–21. [Google Scholar] [CrossRef] [PubMed]
  23. Adams, V.; Linke, A.; Wisloff, U.; Döring, C.; Erbs, S.; Kränkel, N.; Witt, C.C.; Labeit, S.; Müller-Werdan, U.; Schuler, G.; et al. Myocardial expression of Murf-1 and MAFbx after induction of chronic heart failure: Effect on myocardial contractility. Cardiovasc. Res. 2007, 73, 120–129. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  24. Woo, J.; Arai, H.; Ng, T.; Sayer, A.A.; Wong, M.; Syddall, H.; Yamada, M.; Zeng, P.; Wu, S.; Zhang, T. Ethnic and geographic variations in muscle mass, muscle strength and physical performance measures. Eur. Geriatr. Med. 2014, 5, 155–164. [Google Scholar] [CrossRef]
  25. Cravo, M.L.; Velho, S.; Torres, J.; Santos, M.P.C.; Palmela, C.; Cruz, R.; Strecht, J.; Maio, R.; Baracos, V. Lower skeletal muscle attenuation and high visceral fat index are associated with complicated disease in patients with Crohn’s disease: An exploratory study. Clin. Nutr. ESPEN 2017, 21, 79–85. [Google Scholar] [CrossRef]
  26. Cushing, K.C.; Kordbacheh, H.; Gee, M.S.; Kambadakone, A.; Ananthakrishnan, A.N. Sarcopenia is a Novel Predictor of the Need for Rescue Therapy in Hospitalized Ulcerative Colitis Patients. J. Crohn’s Colitis 2018, 12, 1036–1041. [Google Scholar] [CrossRef]
  27. Grillot, J.; D’Engremont, C.; Parmentier, A.-L.; Lakkis, Z.; Piton, G.; Cazaux, D.; Gay, C.; De Billy, M.; Koch, S.; Borot, S.; et al. Sarcopenia and visceral obesity assessed by computed tomography are associated with adverse outcomes in patients with Crohn’s disease. Clin. Nutr. 2020, 39, 3024–3030. [Google Scholar] [CrossRef]
  28. Subramaniam, K.; Fallon, K.; Ruut, T.; Lane, D.; McKay, R.; Shadbolt, B.; Ang, S.; Cook, M.; Platten, J.; Pavli, P.; et al. Infliximab reverses inflammatory muscle wasting (sarcopenia) in Crohn’s disease. Aliment. Pharmacol. Ther. 2015, 41, 419–428. [Google Scholar] [CrossRef]
  29. Turner, D.; Ricciuto, A.; Lewis, A.; D’Amico, F.; Dhaliwal, J.; Griffiths, A.M.; Bettenworth, D.; Sandborn, W.J.; Sands, B.E.; Reinisch, W.; et al. STRIDE-II: An Update on the Selecting Therapeutic Targets in Inflammatory Bowel Disease (STRIDE) Initiative of the International Organization for the Study of IBD (IOIBD): Determining Therapeutic Goals for Treat-to-Target strategies in IBD. Gastroenterology 2021, 160, 1570–1583. [Google Scholar] [CrossRef]
  30. Bryant, R.V.; Trott, M.J.; Bartholomeusz, F.D.; Andrews, J.M. Systematic review: Body composition in adults with inflammatory bowel disease. Aliment. Pharmacol. Ther. 2013, 38, 213–225. [Google Scholar] [CrossRef]
  31. Spooren, C.E.; Lodewick, T.M.; Beelen, E.M.; Van Dijk, D.P.; Bours, M.J.; Haans, J.J.; Masclee, A.A.; Pierik, M.J.; Bakers, F.C.; Jonkers, D.M. The reproducibility of skeletal muscle signal intensity on routine magnetic resonance imaging in Crohn’s disease. J. Gastroenterol. Hepatol. 2020, 35, 1902–1908. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  32. Ashton, J.J.; Peiris, D.; Green, Z.; Johnson, M.J.; Marino, L.V.; Griffiths, M.; Beattie, R. Routine abdominal magnetic resonance imaging can determine psoas muscle area in paediatric Crohn’s disease and correlates with bioelectrical impedance spectroscopy measures of lean mass. Clin. Nutr. ESPEN 2021, 42, 233–238. [Google Scholar] [CrossRef] [PubMed]
  33. Ding, N.S.; Tassone, D.; Al-Bakir, I.; Wu, K.; Thompson, A.J.; Connell, W.R.; Malietzis, G.; Lung, P.; Singh, S.; Choi, C.R.; et al. Systematic review: The impact and importance of body composition in Inflammatory Bowel Disease. J. Crohn’s Colitis 2022, 16, 1475–1492. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Receiver operating characteristic curves and precision-recall curves of the seven different machine learning models.
Figure 2. Ranking of the importance of the top 14 variables (A) with Shapley additive explanations (SHAP) summary plot (B) in the LightGBM model for predicting sarcopenia in Crohn’s disease. Shapley values of the top four variables in the LightGBM prediction of sarcopenia for representative patient samples (C). BMI: body mass index; CRP: C-reactive protein; WBC: white blood cell count; PA: prealbumin; PLT: platelet; RBC: red blood cell count; ALB: albumin; SES-CD: simple endoscopic score for Crohn’s disease; ESR: erythrocyte sedimentation rate; HB: hemoglobin; p: perianal disease; B: disease behavior; L: disease location; A: age at diagnosis.
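To make the per-patient Shapley values in Figure 2 more concrete, the sketch below computes exact Shapley values by brute force for a small toy scoring function. It is only an illustration of the attribution idea: `toy_score`, its coefficients, the `patient` record, and the `baseline` record are all hypothetical and are not the fitted LightGBM model or any patient from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to baseline (brute force)."""
    names = list(x)
    n = len(names)

    def value(coalition):
        # Features in the coalition take the patient's values; the rest stay at baseline.
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {name}) - value(members))
        phi[name] = total
    return phi


def toy_score(f):
    # Hypothetical additive risk score; the coefficients are invented for illustration.
    return 0.5 - 0.05 * (f["bmi"] - 20) + 0.25 * f["male"] + 0.002 * f["crp"]


# Hypothetical patient and reference point (not taken from the study data).
patient = {"bmi": 17.5, "male": 1, "crp": 30.0}
baseline = {"bmi": 19.9, "male": 0, "crp": 19.0}

phi = shapley_values(toy_score, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the patient's score and the baseline score, which is the property that lets a SHAP plot decompose a single prediction into per-variable pushes up or down. Brute-force enumeration is exponential in the number of features, so it is only practical for toy cases like this one.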
Table 1. Baseline clinical, anthropometric, and radiological characteristics of the included patients.
| Characteristic | Overall Population (n = 167) | Sarcopenia (n = 118) | No Sarcopenia (n = 49) | p-Value |
|---|---|---|---|---|
| Demographics | | | | |
| Gender | | | | <0.001 |
| Female | 40 (24.0%) | 9 (7.6%) | 31 (63.3%) | |
| Male | 127 (76.0%) | 109 (92.4%) | 18 (36.7%) | |
| Age | 36.1 (14.3) | 32.6 (12.4) | 44.6 (15.3) | <0.001 |
| Montreal Classification | | | | |
| Age at diagnosis | | | | <0.001 |
| A1 (<17 years old) | 6 (3.6%) | 6 (5.1%) | 0 (0.00%) | |
| A2 (17–40 years old) | 105 (62.9%) | 85 (72.0%) | 20 (40.8%) | |
| A3 (>40 years old) | 56 (33.5%) | 27 (22.9%) | 29 (59.2%) | |
| Location | | | | 0.186 |
| L1 (Ileal) | 76 (45.8%) | 48 (40.7%) | 28 (58.3%) | |
| L2 (Colonic) | 14 (8.4%) | 10 (8.5%) | 4 (8.3%) | |
| L3 (Ileocolonic) | 71 (42.8%) | 56 (47.5%) | 15 (31.2%) | |
| L4 (Upper Disease) | 5 (3.0%) | 4 (3.4%) | 1 (2.1%) | |
| Disease Behavior | | | | 0.446 |
| B1 (nonstricturing, nonpenetrating) | 57 (34.3%) | 44 (37.3%) | 13 (27.1%) | |
| B2 (stricturing) | 86 (51.8%) | 58 (49.2%) | 28 (58.3%) | |
| B3 (penetrating) | 23 (13.9%) | 16 (13.6%) | 7 (14.6%) | |
| Perianal Disease | | | | 0.11 |
| Yes | 43 (25.7%) | 35 (29.7%) | 8 (16.3%) | |
| No | 124 (74.3%) | 83 (70.3%) | 41 (83.7%) | |
| Endoscopic Scores | | | | |
| SES-CD | 6.47 (5.9) | 6.81 (6.4) | 5.60 (4.5) | 0.191 |
| Laboratory Parameters | | | | |
| White blood cell, WBC (×10⁹) | 6.41 (2.5) | 6.70 (2.7) | 5.72 (1.8) | 0.007 |
| Red blood cell, RBC (×10¹²) | 4.44 (0.7) | 4.54 (0.6) | 4.22 (0.7) | 0.005 |
| Hemoglobin, Hb (g/L) | 123 (27.7) | 123 (20.9) | 124 (39.8) | 0.823 |
| Platelet, PLT (×10⁹) | 274 (94.6) | 284 (91.9) | 250 (97.7) | 0.039 |
| Albumin, ALB (g/L) | 37.8 (5.8) | 37.9 (6.1) | 37.7 (5.2) | 0.855 |
| Prealbumin, PA (mg/L) | 183 (60.4) | 179 (57.8) | 193 (66.3) | 0.211 |
| Erythrocyte sedimentation rate, ESR (mm/h) | 22.3 (24.4) | 21.9 (23.5) | 23.3 (26.8) | 0.772 |
| C-reactive protein, CRP (mg/L) | 19.0 (25.0) | 20.8 (26.1) | 14.4 (21.4) | 0.122 |
| Anthropometrics | | | | |
| Height, H (cm) | 169 (8.3) | 172.3 (7.1) | 162.65 (7.1) | <0.001 |
| Weight, W (kg) | 57.3 (9.9) | 57.02 (9.3) | 57.9 (11.3) | 0.643 |
| Body mass index, BMI (kg/m²) | 19.9 (2.9) | 19.13 (2.4) | 21.7 (3.2) | <0.001 |
| Body Composition | | | | |
| Skeletal Muscle Mass, SMM (cm²) | 115.7 (25.7) | 117.4 (22.1) | 111.4 (32.7) | 0.171 |
| Skeletal Muscle Index, SMI (cm²/m²) | 40.0 (7.6) | 39.4 (6.4) | 41.6 (9.8) | 0.147 |
| Visceral Adipose Tissue, VAT (cm²) | 64.0 (50.8) | 53.3 (45.0) | 89.7 (55.2) | <0.001 |
| Visceral Adipose Index, VAI (cm²/m²) | 22.4 (17.9) | 17.7 (14.7) | 33.7 (19.9) | <0.001 |
| Subcutaneous Adipose Tissue, SAT (cm²) | 79.2 (50.2) | 64.5 (42.8) | 114.7 (49.4) | <0.001 |
| Subcutaneous Adipose Index, SAI (cm²/m²) | 28.0 (18.4) | 21.6 (13.8) | 43.44 (19.0) | <0.001 |
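The body-composition indices in Table 1 carry units of cm²/m², which is consistent with the usual convention of dividing a cross-sectional area by height squared. The sketch below illustrates that normalization under this assumption; the function name `height_normalized_index` is hypothetical, and the inputs are the cohort means from Table 1, used only as example numbers.

```python
def height_normalized_index(area_cm2: float, height_cm: float) -> float:
    """Divide a cross-sectional area (cm^2) by height squared (m^2)."""
    if height_cm <= 0:
        raise ValueError("height must be positive")
    height_m = height_cm / 100.0
    return area_cm2 / height_m ** 2


# Example inputs: overall mean SMM (115.7 cm^2) and mean height (169 cm) from Table 1.
smi_from_means = height_normalized_index(115.7, 169.0)
print(f"Index computed from cohort means: {smi_from_means:.1f} cm^2/m^2")
```

An index computed from cohort means will not exactly equal the mean of the per-patient indices reported in the table, because the ratio is nonlinear in height; the example only shows the arithmetic.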
Table 2. Performance of the prediction models generated by seven machine learning algorithms. F1: F1 score; MCC: Matthews correlation coefficient; AUC: area under curve; AUPRC: area under the precision-recall curve; TP: true positive; FP: false positive; FN: false negative; TN: true negative; PPV: positive predictive value; NPV: negative predictive value.
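| Diagnostic Model | Accuracy | Sensitivity | Specificity | Precision | F1 | MCC | AUC | AUPRC | TP | FN | FP | TN | PPV | NPV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Naive Bayes | 0.6765 | 0.6364 | 0.6957 | 0.5000 | 0.5600 | 0.3156 | 0.6660 | 0.7779 | 7 | 4 | 7 | 16 | 0.6364 | 0.6957 |
| Logistic Model | 0.8387 | 0.6250 | 0.9130 | 0.7143 | 0.6667 | 0.5631 | 0.7690 | 0.8687 | 5 | 3 | 2 | 21 | 0.6250 | 0.9130 |
| Classification Tree | 0.7647 | 0.4545 | 0.9130 | 0.7143 | 0.5556 | 0.4253 | 0.6838 | 0.7730 | 5 | 6 | 2 | 21 | 0.4545 | 0.9130 |
| Random Forest | 0.8387 | 0.7500 | 0.8696 | 0.6667 | 0.7059 | 0.5973 | 0.8098 | 0.8969 | 6 | 2 | 3 | 20 | 0.7500 | 0.8696 |
| AdaBoost | 0.6765 | 0.5455 | 0.7391 | 0.5000 | 0.5217 | 0.2786 | 0.6423 | 0.7584 | 6 | 5 | 6 | 17 | 0.5455 | 0.7391 |
| XGBoost | 0.7941 | 0.6364 | 0.8696 | 0.7000 | 0.6667 | 0.5195 | 0.9130 | 0.9650 | 7 | 4 | 3 | 20 | 0.6364 | 0.8696 |
| LightGBM | 0.8235 | 0.7273 | 0.8696 | 0.7273 | 0.7273 | 0.5968 | 0.9328 | 0.9701 | 8 | 3 | 3 | 20 | 0.7273 | 0.8696 |

The count columns are labelled in the order TP, FN, FP, TN because that is the assignment under which the reported sensitivity, specificity, precision, and accuracy are reproduced in every row.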
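The threshold-based metrics in Table 2 can all be derived from the four confusion-matrix counts. As a minimal sketch using the standard textbook definitions (not necessarily the authors' own code), the snippet below recomputes them for the LightGBM row, where TP = 8, FP = 3, FN = 3, and TN = 20. The function name `confusion_metrics` is hypothetical. AUC and AUPRC are not included because they require the full set of predicted scores rather than a single confusion matrix.

```python
import math


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute threshold-based metrics from one 2x2 confusion matrix."""
    sensitivity = _ratio(tp, tp + fn)   # recall, true positive rate
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)     # same quantity as PPV
    npv = _ratio(tn, tn + fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
        "ppv": precision,
        "npv": npv,
    }


# Counts from the LightGBM row of Table 2.
lightgbm = confusion_metrics(tp=8, fp=3, fn=3, tn=20)
for name, value in lightgbm.items():
    print(f"{name}: {value:.4f}")
```

For this row the recomputed accuracy (0.8235), sensitivity (0.7273), specificity (0.8696), precision (0.7273), F1 (0.7273), and MCC (0.5968) agree with the tabulated values. Under these definitions PPV is the same quantity as precision and NPV is TN / (TN + FN), so the same function can be used to cross-check the PPV and NPV columns of the other rows.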
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Tseng, Y.; Mo, S.; Zeng, Y.; Zheng, W.; Song, H.; Zhong, B.; Luo, F.; Rong, L.; Liu, J.; Luo, Z. Machine Learning Model in Predicting Sarcopenia in Crohn’s Disease Based on Simple Clinical and Anthropometric Measures. Int. J. Environ. Res. Public Health 2023, 20, 656. https://doi.org/10.3390/ijerph20010656
