Article

Development and Internal Validation of Risk Assessment Models for Chronic Obstructive Pulmonary Disease in Coal Workers

School of Public Health, North China University of Science and Technology, No. 21 Bohai Avenue, Caofeidian New Town, Tangshan 063210, China
*
Authors to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2023, 20(4), 3655; https://doi.org/10.3390/ijerph20043655
Submission received: 25 January 2023 / Revised: 6 February 2023 / Accepted: 16 February 2023 / Published: 18 February 2023

Abstract

Coal workers are at increased risk of chronic obstructive pulmonary disease (COPD) because of exposure to occupational hazards such as dust. This study develops risk assessment models for COPD in coal workers and constructs a risk scoring system from the optimal model to provide feasible suggestions for prevention. The study subjects were 3955 coal workers who participated in occupational health check-ups at Gequan mine and Dongpang mine of Hebei Jizhong Energy from July 2018 to August 2018. Random forest, logistic regression, and convolutional neural network (CNN) models were established, model performance was evaluated to select the optimal model, and a risk scoring system was then constructed from the optimal model to achieve model visualization. In the training set, the logistic, random forest, and CNN models have sensitivities of 78.55%, 86.89%, and 77.18%; specificities of 85.23%, 92.32%, and 87.61%; accuracies of 81.21%, 85.40%, and 83.02%; Brier scores of 0.14, 0.10, and 0.14; and AUCs of 0.76, 0.88, and 0.78, respectively. Similar results are obtained for the test set and validation set, with the random forest model outperforming the other two models. The risk scoring system constructed according to the importance ranking of the random forest predictor variables has an AUC of 0.842; when evaluated, its accuracy is 83.7% and its AUC is 0.827. The random forest model outperforms the CNN and logistic regression models, and the COPD risk scoring system constructed from it has good discriminatory power.

1. Introduction

Chronic obstructive pulmonary disease (COPD) is a common, preventable respiratory disease characterized by persistent airflow limitation, which is associated with an increased chronic inflammatory response of the airways and lungs to toxic particles or gases. COPD has a high prevalence and mortality and is the third leading cause of death worldwide; the global prevalence of COPD in 2019 was 13.1%, with prevalence rates ranging from 11.6% to 13.9% in different regions of the world [1]. COPD not only affects lung function but also has extrapulmonary effects on the whole body, with common comorbidities including cardiovascular disease, lung cancer, osteoporosis, anxiety, and depression [2]. COPD poses a serious health hazard to individuals, and at present there is no effective way to slow the progression of the disease. Once the condition of COPD patients deteriorates, not only does their lung function decline, but their mortality and disability rates also increase [3]. Smoking, air pollution, biomass fuels, and occupational dust exposure are considered important risk factors for COPD. Because of their working environment, coal workers are often exposed to dust, chemical substances, and other harmful occupational factors, which increase the risk of COPD [4]. At present, research on the pathogenesis and influencing factors of COPD is mainly based on the general population, and there are few studies on coal workers.
Risk assessment models based on machine learning algorithms have been widely used for related diseases in the medical field [5,6]. Commonly used machine learning algorithms include logistic regression, random forest, XGBoost, and convolutional neural networks (CNNs), and each algorithm has its own advantages and disadvantages. As an ensemble algorithm composed of multiple decision trees, random forest can be applied well to large datasets and has better prediction performance than a single estimator. The logistic regression model is simple and highly interpretable, but it cannot handle complex relationships between the independent variables and the dependent variable, so it is prone to underfitting and its accuracy is limited. Compared with general neural networks, CNNs effectively reduce model complexity through weight sharing and sparse connections, and they are widely used in medical image recognition [7,8].
At present, risk assessment models for COPD mainly assess the risk of hospitalization of COPD patients due to deterioration of the condition [9], and there are few models that assess the risk of COPD in occupational populations. Therefore, in order to protect the lung health of coal workers, we urgently need to establish a COPD risk assessment model suitable for coal workers, and establish a risk scoring system according to the optimal model.

2. Materials and Methods

2.1. Research Object

This study relies on China’s key Research and Development program “Cohort Study on Health Effects of Occupational Groups in Beijing-Tianjin-Hebei Region”. The study subjects were 3955 coal workers who participated in occupational health examinations at Gequan Mine and Dongpang Mine in Hebei province from July 2018 to August 2018.
Inclusion criteria: 18~60 years old, ≥1 year of service. Exclusion criteria: (1) those whose lung function could not be measured, i.e., those who had undergone chest, abdominal, or eye surgery in the past 3 months, those who were pregnant or breastfeeding, and those who had been hospitalized for heart disease in the past 1 month; (2) those with missing questionnaire information.
The study was conducted in accordance with the Declaration of Helsinki, verified and approved by the Ethics Committee of the North China University of Technology (15006), and all study subjects voluntarily participated in this investigation and signed an informed consent form.

2.2. Data Collection

Personal information was obtained through questionnaires, which were administered to workers one-to-one by professional staff. The questionnaire mainly included the following sections: (1) demographic information: age, gender, ethnicity, marital status, education level, economic income, etc.; (2) behavioral lifestyle: smoking, drinking status, dietary conditions, physical activity, sleep quality; (3) personal history of diseases: hypertension, diabetes, tumors; (4) work status: nature of employment, length of service, type of work, shift situation.

2.3. Physical Examination

(1) Height and weight: measurements were obtained with the Dekang DK-08-C height and weight meter. Subjects removed shoes, hats, watches, and other items that could affect the results and were measured in the correct position according to the instructions of the staff.
(2) Pulmonary function test: before the test, the subject sat quietly with the upper body straight and the head level, put on the nose clip, and placed the mouthpiece as instructed by the professional staff, ensuring that the tongue did not block the mouthpiece and that no air leaked.

2.4. Definition of Outcome

The pulmonary function test was performed by professionals using a portable spirometer (China CHEST), mainly measuring the forced expiratory volume in the first second (FEV1) and the forced vital capacity (FVC). According to the 2017 Global Initiative for Chronic Obstructive Lung Disease (GOLD) guidelines [9], FEV1/FVC < 70% was diagnosed as COPD.

2.5. Variable Definitions

2.5.1. Body Mass Index (BMI)

BMI = weight (kg)/height² (m²). BMI < 18.5 kg/m² is defined as underweight, 18.5 kg/m² ≤ BMI < 24 kg/m² as normal weight, 24 kg/m² ≤ BMI < 28 kg/m² as overweight, and BMI ≥ 28 kg/m² as obese.
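The cut-offs above can be expressed as a small helper. This is only an illustrative sketch; the function name `bmi_category` and the example values are assumptions and do not come from the study's supplementary code.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    """Classify BMI with the cut-offs given in Section 2.5.1 (18.5, 24, 28 kg/m^2)."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 24:
        return "normal weight"
    if bmi < 28:
        return "overweight"
    return "obese"


# Hypothetical worker: 70 kg and 1.75 m gives a BMI of about 22.9 kg/m^2.
example = bmi_category(70, 1.75)
```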

2.5.2. Smoking Index (SI)

Smoking index = number of cigarettes smoked per day × number of years of smoking, grouped as 0, 1~, 100~, and 200~.

2.5.3. Drinking Status

In this study, drinking status was categorized as never drinking, former drinking (having quit), and current drinking.

2.5.4. Physical Exercise

Physical exercise was defined as exercising more than 3 times a week for more than half an hour each time.

2.5.5. Physical Activity

In this study, the International Physical Activity Questionnaire (IPAQ) was used to investigate the physical activity of coal workers [10]. Physical activity was classified as “low”, “medium”, or “high” according to intensity, frequency, and overall weekly physical activity level. An overall weekly physical activity level < 600 MET-min/w is considered low, 600 to < 3000 MET-min/w is considered medium, and ≥ 3000 MET-min/w is considered high.

2.5.6. Sleep Quality

The Athens Insomnia Scale (AIS) was applied to assess the sleep quality of coal workers [11], with scores <4 indicating no sleep disturbance, scores of 4–6 indicating suspected insomnia, and scores >6 indicating insomnia.

2.5.7. Cumulative Dust Exposure (CDE)

The criteria for determining dust exposure in this study are based on “Determination of Dust in Workplace Air Part 1: Total Dust Concentration”. Cumulative individual dust exposure is calculated from the total dust concentration in the workplace measured by a qualified testing company and the actual results of daily testing [12]:
$$\mathrm{CDE} = C_1 T_1 + C_2 T_2 + C_3 T_3 + \cdots + C_n T_n$$
where $C_n$ is the annual geometric mean dust concentration (mg/m³) for the n-th job performed by a coal worker and $T_n$ is the duration of dust exposure (years) in that job. CDE is grouped as <50, 50~, and 100~.
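As an illustration of the CDE formula and its grouping, the following sketch sums concentration × duration over a worker's job history. The function names and the job history are hypothetical and are used only to show the arithmetic.

```python
def cumulative_dust_exposure(jobs):
    """Sum C_n * T_n over jobs; each job is (concentration in mg/m^3, duration in years)."""
    return sum(concentration * years for concentration, years in jobs)


def cde_group(cde):
    """Map a CDE value to the three groups used in the text: <50, 50~, 100~."""
    if cde < 50:
        return "<50"
    if cde < 100:
        return "50~"
    return "100~"


# Hypothetical history: 10 years at 4.0 mg/m^3, then 8 years at 2.5 mg/m^3.
history = [(4.0, 10), (2.5, 8)]
cde = cumulative_dust_exposure(history)  # 4.0*10 + 2.5*8
group = cde_group(cde)
```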

2.5.8. Shift Situation

Shift work refers to a system of working hours in which the production process requires 24 h of continuous work, covered by one or several teams working in shifts. This study classifies shift work into three situations: never shifted, ever shifted, and now shifted [13].

2.5.9. Ventilation and Dust Removal Measures

Ventilation and dust removal measures were evaluated by combining the evaluation results of the inspection company with the evaluation of how the facilities operated during coal workers’ daily work. The specific grouping is as follows: poor, ordinary, and good.

2.6. Statistical Methods

Count data were expressed as rates, and the chi-square test was used for comparison between groups; unconditional logistic regression was used for multivariate analysis. Candidate factors were identified through an extensive literature review and collection of relevant data and were first examined by univariate analysis; variables that were significant in the univariate analysis were then entered into the multivariate analysis to determine the influencing factors of COPD in coal workers. All statistical tests were two-sided, with a test level of α = 0.05. Analyses were carried out in SPSS 22.0 statistical software (IBM, Armonk, NY, USA).

2.7. Model Establishment

In this study, sklearn.model_selection.train_test_split was used to divide the dataset into a training set, a test set, and a validation set in a 7:2:1 ratio (Supplementary Material S1). Model predictors were screened through univariate analysis, multivariate analysis, and literature review to construct the risk assessment models.
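A 7:2:1 split can be obtained by calling train_test_split twice. The sketch below is one possible way to do this and is not taken from Supplementary Material S1; the synthetic data, the stratification, and the random seeds are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): 1000 subjects, 11 categorical predictors.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(1000, 11))
y = rng.integers(0, 2, size=1000)

# Step 1: keep 70% for training; 30% remains.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=42
)

# Step 2: split the remaining 30% in a 2:1 ratio into test and validation sets.
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, train_size=2 / 3, stratify=y_rest, random_state=42
)
```

Stratifying on the outcome keeps the COPD proportion similar across the three sets; whether the original analysis stratified is not stated in the text.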
The logistic regression model is a classification algorithm that uses a sigmoid function for classification and is implemented in this study using the LogisticRegression module in sklearn (Supplementary Material S2).
The convolutional neural network model consists of convolutional layers, pooling layers, activation layers, and finally a fully connected layer for the classification output. In this study, the CNN was constructed using keras; the activation function is ReLU, the loss function is binary_crossentropy, and the optimizer is rmsprop (Supplementary Material S3).
The random forest model is an ensemble learning method that is essentially a collection of multiple decision trees. It is built using the RandomForestClassifier module in sklearn, and the parameters are tuned using learning curves and the search method RandomizedSearchCV. The following parameters were adjusted: the number of trees n_estimators, the maximum depth of a tree max_depth, the number of randomly selected features max_features, and the minimum number of samples required to split a node min_samples_split, in order to ensure good learning and generalization ability and to avoid overfitting (Supplementary Material S4).
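The tuning step can be sketched as follows. This is a minimal example under stated assumptions, not the code in Supplementary Material S4: the synthetic data, the candidate parameter values, the AUC scoring, and the 3-fold cross-validation are all illustrative choices. Only the four parameter names come from the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (assumption): 300 subjects, 11 predictors.
X, y = make_classification(n_samples=300, n_features=11, random_state=0)

# Candidate values are hypothetical; only the parameter names are from the paper.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, 8, None],
    "max_features": ["sqrt", "log2", 0.5],
    "min_samples_split": [2, 5, 10],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,           # number of random parameter combinations tried
    scoring="roc_auc",  # select by cross-validated AUC
    cv=3,
    random_state=0,
)
search.fit(X, y)

best_model = search.best_estimator_
# Impurity-based importances, one value per predictor.
importances = best_model.feature_importances_
```

The `feature_importances_` attribute is the kind of ranking on which an importance-based scoring system can be built.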
All models are built in Python 3.10.

2.8. Model Evaluation

The performance of the model was evaluated in terms of both discrimination and calibration.
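Discrimination is a measure of a model’s ability to distinguish between patients and non-patients. Commonly used metrics include sensitivity, specificity, accuracy, and the ROC curve with its area under the curve (AUC).
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively.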
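The three formulas can be computed directly from observed and predicted labels. The helper below is an illustrative sketch; the function name and the toy labels are assumptions (1 = COPD, 0 = no COPD).

```python
def discrimination_metrics(y_true, y_pred):
    """Return sensitivity, specificity, and accuracy for binary labels (1 = COPD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Toy labels (hypothetical): TP = 3, FN = 1, TN = 4, FP = 2.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = discrimination_metrics(y_true, y_pred)
```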
Calibration is a measure of how accurately a model estimates the probability that an outcome event will occur for an individual; commonly used measures are the Brier score and the calibration curve. The calibration curve is an important method for evaluating calibration: it visually shows the agreement between the predicted probability and the observed probability, and the closer the curve is to the diagonal line, the better the calibration of the model.
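For a binary outcome, the Brier score is the mean squared difference between the predicted probability and the observed outcome (0 or 1), so lower values indicate better calibration. The sketch below illustrates the calculation with hypothetical predictions; a library routine such as sklearn.metrics.brier_score_loss computes the same quantity.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical predictions for four workers.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.4]
score = brier_score(y_true, y_prob)  # (0.01 + 0.04 + 0.16 + 0.16) / 4
```

A model that always predicts 0.5 scores 0.25, which is a useful reference point when reading Brier scores of 0.10 to 0.14.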

2.9. Establishment of a Risk Scoring System

The optimal model was derived from the development and evaluation of a COPD risk assessment model for coal workers, on the basis of which a risk scoring system was established.

2.9.1. Risk Scoring System

A risk scoring system was constructed using an assignment method based on the importance ranking of the predictor variables of the optimal model, with scores assigned in the following manner.
The hazard score $S_n$ corresponding to each independent variable is the relative importance $I_n$ of that variable divided by the smallest relative importance $I_m$, i.e.,
$$S_n = \frac{I_n}{I_m}$$
The total hazard score $S_c$ is the sum of the individual hazard scores, i.e.,
$$S_c = S_1 + S_2 + S_3 + \cdots + S_n$$
Combining the results of the univariate and multivariate analyses, risk factor levels are assigned the maximum value and protective factor levels are assigned the minimum value of 0. The risk scores for each factor are shown in the results section on the risk scoring system.
When the variable is dichotomous, its categories are assigned 0 and $S_n$.
When the variable is a three-category variable, its categories are assigned 0, $S_n/2$, and $S_n$.
When the variable is a four-category variable, its categories are assigned 0, $S_n/3$, $2S_n/3$, and $S_n$.
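The assignment rules above amount to scaling each variable's importance by the smallest importance and spreading the resulting score evenly across the variable's categories. The following sketch illustrates this; the importance values, variable names, and function names are hypothetical and are not the values in Figure 3 or Table 9.

```python
def variable_scores(importances):
    """S_n = I_n / I_m, where I_m is the smallest relative importance."""
    i_min = min(importances.values())
    return {name: imp / i_min for name, imp in importances.items()}


def level_score(s_n, level, n_levels):
    """Score for category `level` (0 = lowest risk) of a variable with n_levels categories.

    Gives 0, S_n for two levels; 0, S_n/2, S_n for three; 0, S_n/3, 2S_n/3, S_n for four.
    """
    return s_n * level / (n_levels - 1)


# Hypothetical relative importances for three predictors.
importances = {"chemical_exposure": 0.4, "dust_exposure": 0.2, "gender": 0.1}
scores = variable_scores(importances)

# Hypothetical worker: exposed to chemicals (level 1 of 2), middle dust group
# (level 1 of 3), and in the reference gender category (level 0 of 2).
total = (
    level_score(scores["chemical_exposure"], 1, 2)
    + level_score(scores["dust_exposure"], 1, 3)
    + level_score(scores["gender"], 0, 2)
)
```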

2.9.2. Mapping the ROC Curve of a COPD Risk Scoring System for Coal Workers

The risk scoring system was constructed using a randomly selected 70% of the participants, and an ROC curve was drawn according to their scores and whether they had COPD.

2.9.3. Setting up Hazard Stratification

According to the ROC curve of the COPD risk scoring system, the score M at which the Youden index reaches its maximum was found, and the study subjects were divided into two levels: low-risk population (Sc < M) and high-risk population (Sc ≥ M).
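The Youden index is J = sensitivity + specificity − 1, and the cut-off M is the score that maximizes J. A minimal sketch of this search is shown below, assuming that a score at or above the cut-off is classified as high risk; the function name and the toy scores are hypothetical.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A subject is classified as high risk when score >= cutoff.
    Ties are resolved in favour of the lowest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Toy risk scores and COPD status (hypothetical).
scores = [5, 10, 15, 20, 25, 30, 35, 40]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j_max = youden_cutoff(scores, labels)
```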

2.9.4. Performance Evaluation of Risk Scoring Systems

  • The remaining 30% of workers, classified according to the above classification criteria, were used to calculate the accuracy rate of the risk scoring system.
  • The area under the ROC curve was used to determine the diagnostic value of the risk scoring system.
An area under the ROC curve ≤ 0.5 indicates that the risk scoring system has no diagnostic value; 0.5~0.7 indicates some diagnostic value; 0.7~0.8 indicates good diagnostic value; and > 0.8 indicates high diagnostic value, with high sensitivity and specificity, so that the risk scoring system can better identify the disease.

2.10. Quality Control

Pre-survey training was provided to investigators, and questionnaire data were entered in pairs to ensure accuracy. During pulmonary function measurement, staff instructed participants to perform the standard maneuvers to ensure the quality of the pulmonary function test and to increase the accuracy and reliability of outcome diagnosis. Factor analysis and review of the literature ensured that factors associated with the outcome were included in the model and that appropriate statistical analysis methods were used.

3. Results

3.1. Analysis of General Demographic Characteristics

This study includes 3955 study participants, of whom 918 coal workers have COPD, a prevalence rate of 23.2%. A univariate analysis of the relationship between general demographic characteristics of coal workers and COPD shows that age, gender, education, household income, and BMI are all associated with COPD, with statistically significant differences (p < 0.05), as detailed in Table 1.

3.2. Analysis of the Health Status of Coal Workers

The univariate analysis of the relationship between the health status of coal workers and COPD shows that a personal history of respiratory diseases is associated with COPD, and the difference is statistically significant (p < 0.05), as detailed in Table 2.

3.3. Lifestyle Analysis of Coal Worker Behavior

Through the univariate analysis of the relationship between coal workers’ behavior and lifestyle and COPD, the result shows that smoking index, physical exercise, vegetable intake, and fruit intake are all related to COPD, and the differences are statistically significant (p < 0.05), as detailed in Table 3.

3.4. Analysis of Occupational Harmful Factors of Coal Workers

A univariate analysis of the relationship between occupational factors and COPD in coal workers shows that seniority, cumulative dust exposure, ventilation and dust removal measures, mask usage, and chemical poison exposure are all associated with COPD, with statistically significant differences (p < 0.05); see Table 4 for details.

3.5. Multivariate Analysis of Influencing Factors of COPD among Coal Workers

The factors that were significant in the univariate analysis were used as input variables for an unconditional logistic regression analysis of COPD in coal workers; the assignment method is shown in Table 5. Multicollinearity diagnosis of the independent variables included in the multivariate analysis (Table 6) shows that all variance inflation factors (VIF) are greater than 0 and less than 10 and that the tolerance is greater than 0.1 for all variables. The results (Table 7) show that age 30 and above, male gender, history of respiratory diseases, smoking index 1 and above, cumulative dust exposure 50 and above, working experience of 10 years and above, and exposure to chemical poisons are risk factors for COPD in coal workers (all p < 0.05), whereas a bachelor’s degree (junior college) or above, physical exercise, mask use from 3–4 days/week to daily, and generally good ventilation and dust removal measures are protective factors for COPD in coal workers (all p < 0.05).

3.6. Model Results

According to the result of the multi-factor analysis and literature review, a risk assessment model was constructed by including age, gender, education level, personal history of respiratory diseases, smoking index, physical exercise, seniority, mask usage, ventilation and dust removal measures, cumulative dust exposure, and chemical poison exposure.
In the training set (Table 8), the sensitivity, specificity, accuracy, and AUC of random forest are 86.89%, 92.32%, 85.40%, and 0.88, respectively, which are higher than those of the CNN and logistic models. The Brier score and Log loss of random forest are 0.10 and 0.35, respectively, which are lower than those of the CNN and logistic models, and the random forest model has the best performance.
In the test set (Table 8), the sensitivity, specificity, accuracy, and AUC of random forest are 81.86%, 87.06%, 85.10%, and 0.82, respectively, which are higher than those of the CNN and logistic models. The Brier score and Log loss of random forest are 0.13 and 0.41, respectively, which are lower than those of the CNN and logistic models, and the random forest model has the best performance.
In the validation set (Table 8), the sensitivity, specificity, accuracy, and AUC of random forest are 82.93%, 84.30%, 83.11%, and 0.78, respectively, which are higher than those of the CNN and logistic models. The Brier score and Log loss of random forest are 0.11 and 0.37, respectively, which are lower than those of the CNN and logistic models, and the random forest model has the best performance.
The calibration curve of the random forest (Figure 1a–c) is closer to the diagonal line, indicating that the model’s predicted value is closer to the true value. The ROC curve (Figure 2a–c) shows that the random forest model outperforms the other two models in all three sets.
In summary, the random forest model outperforms the CNN and logistic models in the risk assessment of COPD in coal workers.
The optimal model is the random forest model and the variables are ranked in importance according to the optimal model. The result is shown in Figure 3, where chemical poison exposure, cumulative dust exposure, mask usage, and smoking index are the important predictor variables for the random forest model.

3.7. Risk Scoring System

Based on the model evaluation, the optimal model is the random forest model, on which the risk scoring system is constructed. The risk scoring system was constructed using the assignment method according to the importance of the predictor variables (Figure 3), and the assignment method is shown in Table 9.
The risk scoring system was constructed using a random sample of 70% of the study subjects, and the ROC curve was plotted according to their scores and whether they had COPD; the result is shown in Figure 4, with an AUC of 0.842. For risk stratification, a risk score of 23.05 has the highest Youden index; therefore, a risk score < 23.05 is defined as low risk and a risk score ≥ 23.05 as high risk.
The remaining 30% of the study subjects were used to evaluate the performance of the risk scoring system. Each subject was assigned a risk score according to Table 9 and classified according to the above criteria. The results (Table 10) show that in the low-risk group 774 people are normal and 52 have COPD, and in the high-risk group 141 are normal and 220 have COPD. The accuracy of the risk scoring system is 83.7%, and the AUC of the ROC curve is 0.827 (Figure 5), indicating that the established risk scoring system has good discriminating ability.
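The reported accuracy can be checked from the Table 10 counts by treating the high-risk classification as a positive result, so that TP = 220, FN = 52, FP = 141, and TN = 774. The sensitivity and specificity shown here are derived from those counts for illustration and are not figures reported in the text.

```latex
\begin{aligned}
\mathrm{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN}
  = \frac{220 + 774}{220 + 141 + 774 + 52} = \frac{994}{1187} \approx 83.7\% \\
\mathrm{Sensitivity} &= \frac{TP}{TP + FN} = \frac{220}{272} \approx 80.9\% \\
\mathrm{Specificity} &= \frac{TN}{TN + FP} = \frac{774}{915} \approx 84.6\%
\end{aligned}
```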

4. Discussion

Coal meets 27% of the world’s energy needs, supplies 40% of the world’s electricity, and is an important pillar of China’s industry [14]. A large number of coal workers are exposed to dust, noise, vibration, and high heat, which can lead to occupational diseases such as pneumoconiosis, noise deafness, vibration sickness, and various chronic diseases [15,16]. Our study is dedicated to the physical health of coal workers and we have constructed a risk assessment model and a risk scoring system suitable for COPD in coal workers.
A total of 3955 coal workers were included in the study, with a COPD prevalence rate of 23.2%, which is higher than that of the general population [17]. Older age was a risk factor for COPD in this study, with an OR of 1.770 (1.063–2.948), which is consistent with the results of a related study [18]. This may be related to lung ageing, reduced lung function, and reduced immunity of the lungs to environmental injury [19]. The study found that being male is a risk factor for COPD, with an OR of 3.965 (2.172–7.247). The higher risk in males may be because male coal workers are more likely to smoke, although one study suggests that the risk of COPD in females is increasing [20], which may be related to women’s greater exposure to biomass fuels, higher sensitivity to cigarette smoke, and a faster decline in FEV1 in female smokers [21,22]. Because this study focuses on coal workers, among whom males far outnumber females, there may be some bias in the estimated effect of gender on COPD. Personal history of respiratory disease, which here mainly refers to a history of tuberculosis and asthma, is a risk factor for COPD in this study. Asthma is an important cause of accelerated FEV1 decline [23], and tuberculosis is an important cause of airflow obstruction and respiratory symptoms [24,25]. This study quantifies smoking in coal workers using a smoking index, and the results suggest that smoking is a risk factor for COPD, as many previous studies have found [26,27]. This may be because cigarette smoke stimulates the release of inflammatory cytokines from respiratory cells, leading to respiratory damage [28,29]. Dust is an important occupational factor for coal workers, and this study quantifies the dust exposure of coal workers by using cumulative dust exposure. 
The OR values for cumulative dust exposure of 50~ and 100~ mg/m³·year are 1.382 (1.039–1.837) and 2.228 (1.638–3.029), respectively, so increasing cumulative dust exposure leads to an increased risk of COPD among coal workers. A possible reason is that coal dust can inactivate α-1 antitrypsin and produce reactive oxygen species: α-1 antitrypsin inactivation increases the risk of COPD, and reactive oxygen species may lead to emphysema in miners [30]. Seniority refers to the number of years of exposure to dust, and in this study 10 years or more of service is associated with an increased risk of COPD among coal workers. Exposure to chemical poison, which mainly refers to inhalation of irritant gases and fumes, is also an occupational hazard for coal workers. Chemical poison exposure usually activates alveolar macrophages and leukocytes, leading to the release of reactive oxygen species, which causes inflammatory changes in the airways and increases the risk of COPD [31]. Masks and ventilation and dust removal measures are important dust prevention measures for coal workers, and in this study they are protective factors that can reduce the risk of COPD among workers [32]. These protective measures are important for achieving primary prevention of occupational diseases in a high-risk environment such as a coal mine. Physical exercise is a protective factor in this study: those who exercise have a lower risk of COPD, suggesting that increasing physical exercise among coal workers can slow the decline in FEV1 [33]. Previous studies have found that physical activity is the strongest predictor of all-cause mortality in COPD patients [34], and it is also an important component of pulmonary rehabilitation in COPD patients [35]. 
The level of education above a bachelor’s degree is a protective factor for COPD, which may be associated with good lifestyle habits and minimal dust exposure in those with high levels of education [36].
In this study, the dataset was divided into three sets (training set, test set, and validation set), and three models (logistic regression, random forest, and convolutional neural network) were established to evaluate the risk of COPD in coal workers. The performance of the models was evaluated in terms of discrimination and calibration. The results show that the random forest model has the best performance, with a sensitivity of 81.86% and a specificity of 87.06% in the test set, and is therefore more suitable for the risk assessment of COPD in coal workers. The random forest model is an improvement on the decision tree model, is widely used in the medical field, and outperforms other models in some studies [37,38]. In this study, the CNN is better than the logistic model but not as good as the random forest model. Bodduluri et al. used machine learning algorithms to distinguish between the structural phenotypes of COPD, and the AUCs of the CNN and random forest models were 0.80 and 0.78, respectively, with the CNN performing better [39]. The CNN performs differently in different studies, which may be related to the type of data: CNNs achieve better results in image recognition, and their performance in other areas varies with the data. The logistic model performs the worst in this study, indicating that its predicted values deviate more from the actual values and that it is not suitable for the risk assessment of COPD in coal workers. The importance ranking of the predictors of the random forest model indicates that chemical poison exposure, cumulative dust exposure, mask usage, smoking index, and ventilation and dust removal measures are important predictors, and it points to the measures through which coal workers can obtain the greatest health benefits. 
This study constructs a risk scoring system for COPD based on the importance ranking of the optimal model random forest predictor variables and evaluates the risk scoring system with an accuracy of 83.7% and an AUC of 0.827, indicating that the scoring system has good discriminatory ability. The establishment of the risk scoring system explores the application value of the model, which can calculate the individual risk score according to their health data, evaluate the risk of COPD of individual occurrences, and provide a reference basis for the health management of coal workers.
This study has some limitations. First, biomass fuels and air pollution are also important influencing factors of COPD [40]; however, because of the design of the questionnaire and the sample collection, we lack data on these two variables and could not include them in the study. Second, this is a cross-sectional study and is therefore inferior to prospective cohort studies in verifying causality. Third, because of data collection limitations, we did not include coal workers over 60 years of age, which may have led to selection bias; follow-up studies can survey retired workers to assess the effect of age on coal workers’ COPD. Fourth, considering the intended use of the model and the distribution of the pulmonary function test data, we did not stage COPD, and without staging it may be difficult to extrapolate the model because of differences in the distribution of data. The main contribution of our study is to provide a risk assessment model for COPD in coal workers and to construct a risk scoring system based on that model. As the rate of pulmonary function testing among coal workers in daily life is low, our risk scoring system can use health check-up data to assess the risk of COPD among coal workers without pulmonary function testing and to make targeted recommendations based on the individual’s circumstances, thereby protecting the health of coal workers. The innovation of this paper is threefold: our research explores the influencing factors of the disease using data obtained from field surveys; we used these data to build a risk assessment model suited to the study population; and we achieved model visualization by building a risk scoring system, which increases the applicability of the model.
According to the conclusions of our COPD study of coal workers, the following measures can effectively reduce the occupational hazards of coal dust for coal workers. Ventilation and dust removal measures are important protective measures, so water injection into coal seams, the adoption of new dust prevention technologies, and ensuring the good functioning of ventilation systems in the workplace can help reduce coal workers’ dust exposure. Carrying out health education for coal workers, strengthening workers’ awareness of dust prevention and the use of masks, and encouraging workers to develop healthy lifestyle habits, such as quitting smoking and exercising, are all important measures.

5. Conclusions

In this study, analysis of the coal workers’ data shows that an age of 30 years and above, male sex, a personal history of respiratory disease, a smoking index of 1 and above, cumulative dust exposure of 50 mg/m3·years and above, seniority ≥ 10 years, and exposure to chemical poisons are risk factors for COPD in coal workers (all p < 0.05). Education at college level (junior college) and above, physical exercise, mask use on at least 3–4 days/week, and good ventilation and dust removal measures are protective factors for COPD among coal workers.
The random forest model is better than the CNN and logistic regression models in assessing COPD risk in coal workers. The COPD risk scoring system constructed based on the random forest model has good discriminatory ability.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijerph20043655/s1.

Author Contributions

Data collection, H.W. (Hui Wang), R.M., X.W. and Z.S.; software, H.W. (Hui Wang), Z.Z. (Zekun Zhao) and H.L.; writing of the original draft, H.W. (Hui Wang); validation, H.W. (Huan Wang), J.H., Y.Z., J.C., Z.Z. (Ziwei Zheng), Y.C. and Y.Y.; writing review, H.W. (Hui Wang), L.X., X.L., J.S. and J.W. All authors responded to the modification of the study protocol and approved the final manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Youth Talent Promotion Program of School of Public Health, North China University of Science and Technology (Fund number: 2023002).

Institutional Review Board Statement

The study was reviewed and approved by the Ethics Committee of North China University of Technology (approval number: 15006).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Because the data are being researched and are private, the data have not been made public during the study, but can be obtained from the corresponding author upon reasonable request.

Acknowledgments

We would like to thank North China University of Science and Technology for providing the platform. We would also like to thank our teachers for their guidance in the research, and our classmates and friends for their help.

Conflicts of Interest

The authors declare that they have no competing interests.

References

  1. Murray, C.J.; Lopez, A.D. Alternative projections of mortality and disability by cause 1990–2020: Global Burden of Disease Study. Lancet 1997, 349, 1498–1504.
  2. Barnes, P.J.; Celli, B.R. Systemic manifestations and comorbidities of COPD. Eur. Respir. J. 2009, 33, 1165–1185.
  3. Rothnie, K.J.; Mullerova, H.; Smeeth, L.; Quint, J. Natural History of Chronic Obstructive Pulmonary Disease Exacerbations in a General Practice-based Population with Chronic Obstructive Pulmonary Disease. Am. J. Respir. Crit. Care Med. 2018, 198, 464–471.
  4. Go, L.; Krefft, S.; Cohen, R.; Rose, C. Lung disease and coal mining: What pulmonologists need to know. Curr. Opin. Pulm. Med. 2016, 22, 170–178.
  5. Zhang, X.; Tang, F.; Ji, J.; Han, W.; Lu, P. Risk Prediction of Dyslipidemia for Chinese Han Adults Using Random Forest Survival Model. Clin. Epidemiol. 2019, 11, 1047–1055.
  6. Deng, F.; Peng, M.; Li, J.; Chen, Y.; Zhang, B.; Zhao, S. Nomogram to predict the risk of septic acute kidney injury in the first 24 h of admission: An analysis of intensive care unit data. Ren. Fail. 2020, 42, 428–436.
  7. Parde, C.J.; Hu, Y.; Castillo, C.; Sankaranarayanan, S.; O’Toole, A.J. Social Trait Information in Deep Convolutional Neural Networks Trained for Face Identification. Cogn. Sci. 2019, 43, e12729.
  8. Vives-Boix, V.; Ruiz-Fernandez, D. Diabetic retinopathy detection through convolutional neural networks with synaptic metaplasticity. Comput. Methods Programs Biomed. 2021, 206, 106094.
  9. Vogelmeier, C.; Criner, G.; Martinez, F.; Anzueto, A.; Barnes, P.; Bourbeau, J.; Celli, B.; Chen, R.; Decramer, M.; Fabbri, L.; et al. Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Lung Disease 2017 Report. GOLD Executive Summary. Am. J. Respir. Crit. Care Med. 2017, 195, 557–582.
  10. Lou, X.; He, Q. Validity and Reliability of the International Physical Activity Questionnaire in Chinese Hemodialysis Patients: A Multicenter Study in China. Med. Sci. Monit. 2019, 25, 9402–9408.
  11. Lin, C.-Y.; Cheng, A.; Nejati, B.; Imani, V.; Ulander, M.; Browall, M.; Griffiths, M.; Broström, A.; Pakpour, A. A thorough psychometric comparison between Athens Insomnia Scale and Insomnia Severity Index among patients with advanced cancer. J. Sleep Res. 2020, 29, e12891.
  12. Qian, Q.-Z.; Cao, X.-K.; Qian, Q.-Q.; Shen, F.-H.; Wang, Q.; Liu, H.-Y.; Tong, J.-W. Relationship of cumulative dust exposure dose and cumulative abnormal rate of pulmonary function in coal mixture workers. Kaohsiung J. Med. Sci. 2016, 32, 44–49.
  13. Hansen, A.; Stayner, L.; Hansen, J.; Andersen, Z. Night shift work and incidence of diabetes in the Danish Nurse Cohort. Occup. Environ. Med. 2016, 73, 262–268.
  14. Santo, T.L. Emphysema and chronic obstructive pulmonary disease in coal miners. Curr. Opin. Pulm. Med. 2011, 17, 123–125.
  15. Shen, F.; Yuan, J.; Sun, Z.; Hua, Z.; Qin, T.; Yao, S.; Fan, X.; Chen, W.; Liu, H.; Chen, J. Risk identification and prediction of coal workers’ pneumoconiosis in Kailuan Colliery Group in China: A historical cohort study. PLoS ONE 2013, 8, e82181.
  16. Petsonk, E.L.; Rose, C.; Cohen, R. Coal mine dust lung disease. New lessons from old exposure. Am. J. Respir. Crit. Care Med. 2013, 187, 1178–1185.
  17. Zhu, B.; Wang, Y.; Ming, J.; Chen, W.; Zhang, L. Disease burden of COPD in China: A systematic review. Int. J. Chron. Obstruct. Pulmon. Dis. 2018, 13, 1353–1364.
  18. Wang, C.; Xu, J.; Yang, L.; Xu, Y.; Zhang, X.; Bai, C.; Kang, J.; Ran, P.; Shen, H.; Wen, F.; et al. Prevalence and risk factors of chronic obstructive pulmonary disease in China (the China Pulmonary Health [CPH] study): A national cross-sectional study. Lancet 2018, 391, 1706–1717.
  19. Mercado, N.; Ito, K.; Barnes, P.J. Accelerated ageing of the lung in COPD: New concepts. Thorax 2015, 70, 482–489.
  20. Gut-Gobert, C.; Cavaillès, A.; Dixmier, A.; Guillot, S.; Jouneau, S.; Leroyer, C.; Marchand-Adam, S.; Marquette, D.; Meurice, J.-C.; Desvigne, N.; et al. Women and COPD: Do we need more evidence? Eur. Respir. Rev. 2019, 28, 180055.
  21. Miller, M.R.; Jordan, R.E.; Adab, P. Gender differences in COPD: Are women more susceptible to smoking effects than men? Thorax 2011, 66, 921–922.
  22. Gan, W.Q.; Man, P.; Postma, D.; Camp, P.; Sin, D. Female smokers beyond the perimenopausal period are at increased risk of chronic obstructive pulmonary disease: A systematic review and meta-analysis. Respir. Res. 2006, 7, 52.
  23. Silva, G.; Sherrill, D.; Guerra, S.; Barbee, R. Asthma as a risk factor for COPD in a longitudinal study. Chest 2004, 126, 59–65.
  24. Xing, Z.; Sun, T.; Janssens, J.-P.; Chai, D.; Liu, W.; Tong, Y.; Wang, Y.; Ma, Y.; Pan, M.; Cui, J.; et al. Airflow obstruction and small airway dysfunction following pulmonary tuberculosis: A cross-sectional survey. Thorax 2022, 78, 274–280.
  25. Willcox, P.A.; Ferguson, A.D. Chronic obstructive airways disease following treated pulmonary tuberculosis. Respir. Med. 1989, 83, 195–198.
  26. Wheaton, A.; Liu, Y.; Croft, J.; VanFrank, B.; Croxton, T.; Punturieri, A.; Postow, L.; Greenlund, K. Chronic Obstructive Pulmonary Disease and Smoking Status—United States, 2017. MMWR Morb. Mortal. Wkly. Rep. 2019, 68, 533–538.
  27. Alam, D.; Chowdhury, M.A.; Siddiquee, A.; Ahmed, S.; Clemens, J. Prevalence and Determinants of Chronic Obstructive Pulmonary Disease (COPD) in Bangladesh. COPD 2015, 12, 658–667.
  28. Moon, H.-G.; Zheng, Y.; An, C.H.; Kim, Y.-K.; Jin, Y. CCN1 secretion induced by cigarette smoking extracts augments IL-8 release from bronchial epithelial cells. PLoS ONE 2013, 8, e68199.
  29. Rusznak, C.; Mills, P.R.; Devalia, J.L.; Sapsford, R.J.; Davies, R.J.; Lozewicz, S. Effect of cigarette smoke on the permeability and IL-1beta and sICAM-1 release from cultured human bronchial epithelial cells of never-smokers, smokers, and patients with chronic obstructive pulmonary disease. Am. J. Respir. Cell Mol. Biol. 2000, 23, 530–536.
  30. Huang, X.; Laurent, R.; Zalma, R.; Pezerat, H. Inactivation of alpha 1-antitrypsin by aqueous coal solutions: Possible relation to the emphysema of coal workers. Chem. Res. Toxicol. 1993, 6, 452–458.
  31. Weinmann, S.; Vollmer, W.; Breen, V.; Heumann, M.; Hnizdo, E.; Villnave, J.; Doney, B.; Graziani, M.; McBurnie, M.A.; Buist, S. COPD and occupational exposures: A case-control study. J. Occup. Environ. Med. 2008, 50, 561–569.
  32. Weeks, J.L. Occupational health and safety regulation in the coal mining industry: Public health at the workplace. Annu. Rev. Public Health 1991, 12, 195–207.
  33. Hartman, J.; Boezen, M.; de Greef, M.; Bossenbroek, L.; Hacken, N. Consequences of physical inactivity in chronic obstructive pulmonary disease. Expert Rev. Respir. Med. 2010, 4, 735–745.
  34. Waschki, B.; Kirsten, A.; Holz, O.; Müller, K.-C.; Meyer, T.; Watz, H.; Magnussen, H. Physical activity is the strongest predictor of all-cause mortality in patients with COPD: A prospective cohort study. Chest 2011, 140, 331–342.
  35. Spruit, M.; Pitta, F.; McAuley, E.; ZuWallack, R.; Nici, L. Pulmonary Rehabilitation and Physical Activity in Patients with Chronic Obstructive Pulmonary Disease. Am. J. Respir. Crit. Care Med. 2015, 192, 924–933.
  36. Zhang, D.-D.; Liu, J.-N.; Ye, Q.; Chen, Z.; Wu, L.; Peng, X.-Q.; Lu, G.; Zhou, J.-Y.; Tao, R.; Ding, Z.; et al. Association between socioeconomic status and chronic obstructive pulmonary disease in Jiangsu province, China: A population-based study. Chin. Med. J. 2021, 134, 1552–1560.
  37. Wang, J.; Li, C.; Li, J.; Qin, S.; Liu, C.; Wang, J.; Chen, Z.; Wu, J.; Wang, G. Development and internal validation of risk prediction model of metabolic syndrome in oil workers. BMC Public Health 2020, 20, 1828.
  38. Xing, F.; Luo, R.; Liu, M.; Zhou, Z.; Xiang, Z.; Duan, X. A New Random Forest Algorithm-Based Prediction Model of Post-operative Mortality in Geriatric Patients With Hip Fractures. Front. Med. 2022, 9, 829977.
  39. Bodduluri, S.; Nakhmani, A.; Reinhardt, J.; Wilson, C.; McDonald, M.-L.; Rudraraju, R.; Jaeger, B.; Bhakta, N.; Castaldi, P.; Sciurba, F.; et al. Deep neural network analyses of spirometry for structural phenotyping of chronic obstructive pulmonary disease. JCI Insight 2020, 5, e132781.
  40. Salvi, S. Tobacco smoking and environmental risk factors for chronic obstructive pulmonary disease. Clin. Chest Med. 2014, 35, 17–27.
Figure 1. (a) The calibration curve of the training set. (b) The calibration curve of the test set. (c) The calibration curve of the validation set.
Figure 2. (a) The ROC curve of the training set. (b) The ROC curve of the test set. (c) The ROC curve of the validation set.
Figure 3. Importance ranking of predictor variables for the random forest model.
Figure 4. ROC curve created by COPD risk scoring system.
Figure 5. ROC curve for COPD risk scoring system validation.
Table 1. Analysis of the relationship between coal workers’ general demographic characteristics and COPD.
| Variable | Category | Number | COPD number | COPD proportion (%) | χ2 | p |
| --- | --- | --- | --- | --- | --- | --- |
| Age | <30 | 341 | 30 | 8.8 | 93.746 | <0.001 |
| | 30~ | 1931 | 400 | 20.7 | | |
| | 40~ | 1079 | 280 | 25.9 | | |
| | 50~ | 604 | 208 | 34.4 | | |
| Gender | Female | 283 | 22 | 7.8 | 40.755 | 0.044 |
| | Male | 3672 | 896 | 24.4 | | |
| Marital status | Unmarried | 149 | 27 | 18.1 | 5.139 | 0.077 |
| | Married | 3748 | 872 | 23.3 | | |
| | Others | 58 | 19 | 32.8 | | |
| Education | Junior high school and below | 1751 | 495 | 28.3 | 66.413 | <0.001 |
| | High school/technical secondary school | 1222 | 280 | 22.9 | | |
| | College and above | 982 | 143 | 14.6 | | |
| Household income | <5000 | 760 | 195 | 25.7 | 6.767 | 0.034 |
| | 5000~ | 2600 | 606 | 23.3 | | |
| | 10,000~ | 595 | 117 | 19.7 | | |
| BMI (kg/m2) | <18.5 | 83 | 26 | 31.3 | 11.083 | 0.011 |
| | 18.5~ | 1234 | 307 | 25.7 | | |
| | 24~ | 1723 | 404 | 23.4 | | |
| | 28~ | 915 | 181 | 23.2 | | |
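The χ2 values in Table 1 can be checked from the counts alone. The following is a minimal sketch, assuming the reported statistic is an uncorrected Pearson chi-square on a category-by-COPD contingency table (this section does not state the exact test variant); the function name `pearson_chi_square` is illustrative. For the age rows it gives a value close to the reported 93.746.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Age rows of Table 1: [COPD cases, workers without COPD (Number - COPD cases)].
age_table = [
    [30, 341 - 30],     # <30
    [400, 1931 - 400],  # 30~
    [280, 1079 - 280],  # 40~
    [208, 604 - 208],   # 50~
]

chi2 = pearson_chi_square(age_table)
print(f"Pearson chi-square for age vs. COPD: {chi2:.3f}")
```

The same function can be applied to any other variable in Tables 1 to 4 by substituting its rows.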
Table 2. Analysis of the relationship between coal workers’ health status and COPD.
| Variable | Category | Number | COPD number | COPD proportion (%) | χ2 | p |
| --- | --- | --- | --- | --- | --- | --- |
| Diabetes | Yes | 364 | 97 | 26.6 | 2.657 | 0.103 |
| | No | 3591 | 821 | 22.9 | | |
| Hypertension | Yes | 339 | 70 | 20.6 | 1.366 | 0.243 |
| | No | 3616 | 848 | 23.5 | | |
| Personal history of respiratory disease | Yes | 1238 | 392 | 31.7 | 72.242 | <0.001 |
| | No | 2717 | 526 | 19.4 | | |
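Table 3. Analysis of the relationship between coal workers’ behavioral lifestyle and COPD.
| Variable | Category | Number | COPD number | COPD proportion (%) | χ2 | p |
| --- | --- | --- | --- | --- | --- | --- |
| Smoking index | 0 | 1505 | 222 | 14.8 | 112.051 | <0.001 |
| | 1~ | 890 | 219 | 24.6 | | |
| | 100~ | 722 | 206 | 28.5 | | |
| | 200~ | 838 | 271 | 32.3 | | |
| Drinking status | Never | 1218 | 262 | 21.5 | 3.645 | 0.162 |
| | Once | 82 | 23 | 28.0 | | |
| | Now | 2655 | 633 | 23.8 | | |
| Physical exercise | No | 2665 | 710 | 26.6 | 53.949 | <0.001 |
| | Yes | 1290 | 208 | 16.1 | | |
| Physical activity | Low | 489 | 130 | 26.6 | 3.852 | 0.146 |
| | Middle | 316 | 68 | 21.5 | | |
| | High | 3150 | 720 | 22.9 | | |
| Sleep quality | Accessibility | 3296 | 771 | 23.4 | 2.180 | 0.336 |
| | Suspicious insomnia | 469 | 98 | 20.9 | | |
| | Insomnia | 190 | 49 | 25.8 | | |
| Vegetable intake | Never | 103 | 31 | 30.1 | 12.771 | 0.005 |
| | Occasionally | 323 | 88 | 27.2 | | |
| | Often | 843 | 218 | 25.9 | | |
| | Every day | 2686 | 581 | 21.6 | | |
| Fruit intake | Never | 123 | 35 | 28.5 | 19.334 | <0.001 |
| | Occasionally | 849 | 234 | 27.6 | | |
| | Often | 1069 | 257 | 24.0 | | |
| | Every day | 1914 | 392 | 20.5 | | |
| Meat intake | Never | 479 | 122 | 25.5 | 3.829 | 0.281 |
| | Occasionally | 2245 | 511 | 22.8 | | |
| | Often | 914 | 202 | 22.1 | | |
| | Every day | 317 | 83 | 26.2 | | |
| Salt | Light | 736 | 161 | 21.9 | 0.912 | 0.634 |
| | Moderate | 1935 | 456 | 23.6 | | |
| | Salty | 1284 | 301 | 23.4 | | |
| Soy products | Never | 260 | 61 | 23.5 | 2.609 | 0.456 |
| | Often | 1539 | 377 | 24.5 | | |
| | Occasionally | 1073 | 236 | 22.0 | | |
| | Every day | 1083 | 244 | 22.5 | | |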
Table 4. Analysis of the relationship between coal workers’ occupational harmful factors and COPD.
| Variable | Category | Number | COPD number | COPD proportion (%) | χ2 | p |
| --- | --- | --- | --- | --- | --- | --- |
| Shift situation | Never | 1228 | 282 | 23.0 | 0.067 | 0.967 |
| | Once | 191 | 45 | 23.6 | | |
| | Now | 2536 | 591 | 23.3 | | |
| Seniority (years) | <10 | 1208 | 171 | 14.2 | 142.547 | <0.001 |
| | 10~ | 1748 | 407 | 23.3 | | |
| | 20~ | 580 | 167 | 28.8 | | |
| | 30~ | 419 | 173 | 41.3 | | |
| Cumulative dust exposure (mg/m3.years) | <50 | 546 | 79 | 14.5 | 232.826 | <0.001 |
| | 50~ | 2433 | 439 | 18.0 | | |
| | 100~ | 976 | 400 | 41.0 | | |
| Ventilation and dust removal measures | Difference | 214 | 101 | 47.2 | 119.244 | <0.001 |
| | Ordinary | 341 | 125 | 36.7 | | |
| | Good | 3400 | 692 | 20.4 | | |
| Mask usage | Never | 176 | 89 | 50.6 | 115.800 | <0.001 |
| | 1–2 days/weeks | 568 | 161 | 28.3 | | |
| | 3–4 days/weeks | 266 | 87 | 32.7 | | |
| | Every day | 2945 | 581 | 19.7 | | |
| Chemical poison exposure | No | 2311 | 323 | 14.0 | 265.997 | <0.001 |
| | Yes | 1644 | 595 | 23.2 | | |
Table 5. The variable assignment method for the influencing factor.
| Variable Name | Variable Meaning | Assignment Method |
| --- | --- | --- |
| Y | COPD | 0 = no, 1 = yes |
| X1 | Age | <30 = 1, 30~ = 2, 40~ = 3, 50~ = 4 |
| X2 | Gender | 1 = female, 2 = male |
| X3 | Education | 1 = junior high school and below, 2 = high school/technical secondary school, 3 = college and above |
| X4 | Household income | <5000 = 1, 5000~ = 2, 10,000~ = 3 |
| X5 | Personal history of respiratory disease | 0 = no, 1 = yes |
| X6 | Smoking index | 0 = 1, 1~ = 2, 100~ = 3, 200~ = 4 |
| X7 | Vegetable intake | Never = 0, occasionally = 1, often = 2, every day = 3 |
| X8 | Fruit intake | Never = 0, occasionally = 1, often = 2, every day = 3 |
| X9 | BMI | 1 = <18.5, 2 = 18.5~, 3 = 24~, 4 = 28~ |
| X10 | Physical exercise | 0 = no, 1 = yes |
| X11 | Ventilation and dust removal measures | Difference = 1, ordinary = 2, good = 3 |
| X12 | Mask usage | 0 = never, 1 = 1–2 days/weeks, 2 = 3–4 days/weeks, 3 = every day |
| X13 | Seniority (years) | <10 = 1, 10~ = 1, 20~ = 2, 30~ = 3 |
| X14 | Cumulative dust exposure | <50 = 1, 50~ = 1, 100~ = 3 |
| X15 | Chemical poison exposure | 0 = no, 1 = yes |
Table 6. Multicollinearity diagnosis of independent variables.
| Variable | Tolerance | VIF |
| --- | --- | --- |
| Age | 0.540 | 1.850 |
| Gender | 0.723 | 1.383 |
| Education | 0.874 | 1.144 |
| Personal history of respiratory disease | 0.965 | 1.037 |
| Smoking index | 0.906 | 1.104 |
| Physical exercise | 0.918 | 1.089 |
| Ventilation and dust removal measures | 0.678 | 1.475 |
| Mask usage | 0.710 | 1.409 |
| Seniority (years) | 0.554 | 1.804 |
| Cumulative dust exposure | 0.844 | 1.184 |
| Chemical poison exposure | 0.901 | 1.110 |
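Tolerance and VIF in Table 6 are reciprocals of each other (VIF = 1/tolerance), so either column can be recovered from the other. The short sketch below illustrates this with three rows of the table. The screening threshold of 10 is a commonly cited rule of thumb and is an assumption here, not a value taken from this section.

```python
# Tolerance values copied from Table 6 (three rows shown for brevity).
tolerance = {
    "Age": 0.540,
    "Gender": 0.723,
    "Seniority (years)": 0.554,
}

# Variance inflation factor is the reciprocal of tolerance.
vif = {name: 1.0 / tol for name, tol in tolerance.items()}

VIF_THRESHOLD = 10.0  # assumption: common rule-of-thumb cutoff, not from the paper

for name, value in vif.items():
    flag = "possible multicollinearity" if value >= VIF_THRESHOLD else "acceptable"
    print(f"{name}: VIF = {value:.3f} ({flag})")
```

Small differences from the tabulated VIF values (for example in the third decimal place) are expected, because the tolerances in the table are themselves rounded.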
Table 7. Results of multivariate unconditional logistic regression analysis of COPD among coal workers.
| Variable | Category | β | SEβ | Wald χ2 | p | OR (95% CI) |
| --- | --- | --- | --- | --- | --- | --- |
| Age | <30 | - | - | - | - | 1.00 |
| | 30~ | 0.595 | 0.218 | 7.410 | 0.006 | 1.812 (1.181–2.781) |
| | 40~ | 0.645 | 0.216 | 8.888 | 0.003 | 1.833 (1.146–2.932) |
| | 50~ | 0.501 | 0.235 | 4.553 | 0.033 | 1.770 (1.063–2.948) |
| Gender | Female | - | - | - | - | 1.00 |
| | Male | 1.378 | 0.307 | 20.108 | <0.001 | 3.965 (2.172–7.247) |
| Education | Junior high school and below | - | - | - | - | 1.00 |
| | High school/technical secondary school | −0.186 | 0.099 | 3.529 | 0.060 | 0.830 (0.684–1.008) |
| | College and above | −0.320 | 0.1123 | 6.756 | 0.009 | 0.726 (0.570–0.924) |
| Personal history of respiratory disease | No | - | - | - | - | 1.00 |
| | Yes | 0.919 | 0.092 | 99.049 | <0.001 | 2.506 (2.092–3.004) |
| Household income | <5000 | - | - | - | - | 1.00 |
| | 5000~ | 0.019 | 0.111 | 0.030 | 0.863 | 1.019 (0.820–1.266) |
| | 10,000~ | 0.055 | 0.156 | 0.125 | 0.724 | 1.057 (0.779–1.434) |
| Smoking index | 0 | - | - | - | - | 1.00 |
| | 1~ | 0.481 | 0.119 | 16.442 | <0.001 | 1.618 (1.282–2.041) |
| | 100~ | 0.542 | 0.124 | 19.265 | <0.001 | 1.720 (1.350–2.191) |
| | 200~ | 0.380 | 0.119 | 10.175 | 0.001 | 1.462 (1.158–1.847) |
| Vegetable intake | Never | - | - | - | - | 1.00 |
| | Occasionally | 0.303 | 0.379 | 0.640 | 0.424 | 1.354 (0.644–2.845) |
| | Often | 0.320 | 0.361 | 0.788 | 0.375 | 1.377 (0.679–2.793) |
| | Every day | 0.528 | 0.356 | 2.203 | 0.138 | 1.696 (0.844–3.408) |
| Fruit intake | Never | - | - | - | - | 1.00 |
| | Occasionally | −0.189 | 0.258 | 0.534 | 0.465 | 0.828 (0.499–1.373) |
| | Often | −0.222 | 0.257 | 0.743 | 0.389 | 0.801 (0.484–1.326) |
| | Every day | −0.312 | 0.252 | 1.535 | 0.215 | 0.732 (0.447–1.199) |
| BMI | <18.5 | - | - | - | - | 1.00 |
| | 18.5~ | 0.246 | 0.329 | 0.558 | 0.455 | 0.828 (0.499–1.373) |
| | 24~ | 0.141 | 0.326 | 0.186 | 0.666 | 0.801 (0.484–1.326) |
| | 28~ | −0.064 | 0.333 | 0.036 | 0.849 | 0.732 (0.447–1.199) |
| Physical exercise | No | - | - | - | - | 1.00 |
| | Yes | −0.321 | 0.098 | 10.748 | 0.001 | 0.726 (0.599–0.879) |
| Ventilation and dust removal measures | Difference | - | - | - | - | 1.00 |
| | Ordinary | −1.041 | 0.292 | 12.709 | <0.001 | 0.353 (0.199–0.626) |
| | Good | −1.692 | 0.277 | 37.190 | <0.001 | 0.184 (0.107–0.317) |
| Mask usage | Never | - | - | - | - | 1.00 |
| | 1–2 days/weeks | −0.061 | 0.236 | 0.068 | 0.795 | 0.940 (0.592–1.496) |
| | 3–4 days/weeks | −0.811 | 0.255 | 10.097 | 0.001 | 0.445 (0.270–0.733) |
| | Every day | −0.532 | 0.218 | 5.996 | 0.015 | 0.588 (0.384–0.900) |
| Seniority (years) | <10 | - | - | - | - | 1.00 |
| | 10~ | 0.362 | 0.121 | 8.933 | 0.003 | 1.437 (1.133–1.822) |
| | 20~ | 0.429 | 0.170 | 6.397 | 0.011 | 1.536 (1.101–2.143) |
| | 30~ | 0.597 | 0.195 | 9.408 | 0.002 | 1.817 (1.241–2.662) |
| Cumulative dust exposure (mg/m3.years) | <50 | - | - | - | - | 1.00 |
| | 50~ | 0.323 | 0.145 | 4.957 | 0.026 | 1.382 (1.039–1.837) |
| | 100~ | 0.801 | 0.157 | 26.102 | <0.001 | 2.228 (1.638–3.029) |
| Chemical poison exposure | No | - | - | - | - | 1.00 |
| | Yes | 1.091 | 0.092 | 140.818 | <0.001 | 2.976 (2.486–3.564) |
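The columns of Table 7 are linked by the standard logistic regression relationships: the odds ratio is exp(β), a Wald-type 95% confidence interval is exp(β ± 1.96·SEβ), and the Wald statistic is (β/SEβ)². The sketch below applies these textbook formulas to the "Male" row; it assumes the table was produced this way (the section does not say which software or interval method was used), and the helper name `odds_ratio_summary` is illustrative.

```python
import math


def odds_ratio_summary(beta, se, z=1.96):
    """Return (OR, CI lower, CI upper, Wald chi-square) for one coefficient."""
    odds_ratio = math.exp(beta)
    ci_lower = math.exp(beta - z * se)
    ci_upper = math.exp(beta + z * se)
    wald = (beta / se) ** 2
    return odds_ratio, ci_lower, ci_upper, wald


# "Male" row of Table 7: beta = 1.378, SE = 0.307.
or_male, lower, upper, wald = odds_ratio_summary(1.378, 0.307)
print(f"OR = {or_male:.3f} (95% CI {lower:.3f} to {upper:.3f}), Wald = {wald:.3f}")
```

The result agrees with the tabulated 3.965 (2.172–7.247) to within rounding of β and SEβ. Rows that do not reproduce under these formulas may reflect rounding or transcription differences in the table.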
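Table 8. Comparison of risk assessment performance of three models.
| Evaluation indicator | Training set: Logistic | Training set: Random forest | Training set: CNN | Test set: Logistic | Test set: Random forest | Test set: CNN | Validation set: Logistic | Validation set: Random forest | Validation set: CNN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sensitivity (%) | 78.55 | 86.89 | 77.18 | 66.94 | 81.86 | 75.26 | 62.90 | 82.93 | 74.53 |
| Specificity (%) | 85.23 | 92.32 | 87.61 | 79.32 | 87.06 | 83.21 | 81.46 | 84.30 | 82.19 |
| Accuracy (%) | 81.21 | 85.40 | 83.02 | 84.10 | 85.10 | 82.55 | 80.40 | 83.11 | 85.10 |
| Brier score | 0.14 | 0.10 | 0.14 | 0.15 | 0.13 | 0.15 | 0.13 | 0.11 | 0.13 |
| AUC | 0.76 | 0.88 | 0.78 | 0.74 | 0.82 | 0.76 | 0.75 | 0.78 | 0.77 |
| Log loss | 0.45 | 0.35 | 0.43 | 0.43 | 0.41 | 0.44 | 0.46 | 0.37 | 0.43 |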
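Table 8 reports the Brier score and log loss alongside sensitivity, specificity, accuracy, and AUC. As a reminder of what these two probability-based metrics measure, here is a minimal sketch using their standard definitions. The labels and predicted probabilities are made-up toy values, not study data, and the function names are illustrative.

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the outcomes under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Toy example (hypothetical): 1 = COPD, 0 = no COPD.
y_true = [1, 0, 0, 1]
y_prob = [0.8, 0.2, 0.4, 0.6]

print(f"Brier score: {brier_score(y_true, y_prob):.3f}")
print(f"Log loss:    {log_loss(y_true, y_prob):.3f}")
```

For both metrics lower is better, which is consistent with the random forest columns of Table 8 showing the lowest values in each data set.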
Table 9. Hazard score assignment scale.
| Variable | Hazard Score |
| --- | --- |
| Age | <30 = 0, 30~ = 1.1, 40~ = 2.2, 50~ = 3.5 |
| Gender | Female = 0, male = 1 |
| Education | Junior high school and below = 2.5, high school/technical secondary school = 1.25, college and above = 0 |
| Personal history of respiratory disease | No = 0, yes = 4 |
| Smoking index | 0 = 0, 1~ = 1.5, 100~ = 3, 200~ = 4.5 |
| Physical exercise | No = 2, yes = 0 |
| Ventilation and dust removal measures | Difference = 4.5, ordinary = 2.25, good = 0 |
| Mask usage | Never = 8, 1–2 days/weeks = 5.3, 3–4 days/weeks = 2.7, every day = 0 |
| Cumulative dust exposure | <50 = 0, 50~ = 4, 100~ = 8 |
| Seniority | <10 = 0, 10~ = 1.3, 20~ = 2.6, 30~ = 4 |
| Chemical poison exposure | No = 0, yes = 8 |
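To illustrate how the scale in Table 9 could be applied to one person, the sketch below sums the points for each variable and compares the total with the 23.05 stratification value shown in Table 10. It assumes the total score is a simple sum of the per-variable points and that a total of 23.05 or more is treated as high risk. The worker profile, the dictionary keys, and the function names are hypothetical.

```python
# Points per category, transcribed from Table 9.
HAZARD_SCORES = {
    "age": {"<30": 0, "30~": 1.1, "40~": 2.2, "50~": 3.5},
    "gender": {"female": 0, "male": 1},
    "education": {"junior high school and below": 2.5,
                  "high school/technical secondary school": 1.25,
                  "college and above": 0},
    "respiratory_history": {"no": 0, "yes": 4},
    "smoking_index": {"0": 0, "1~": 1.5, "100~": 3, "200~": 4.5},
    "physical_exercise": {"no": 2, "yes": 0},
    "ventilation": {"difference": 4.5, "ordinary": 2.25, "good": 0},
    "mask_usage": {"never": 8, "1-2 days/week": 5.3,
                   "3-4 days/week": 2.7, "every day": 0},
    "cumulative_dust": {"<50": 0, "50~": 4, "100~": 8},
    "seniority": {"<10": 0, "10~": 1.3, "20~": 2.6, "30~": 4},
    "chemical_exposure": {"no": 0, "yes": 8},
}

CUTOFF = 23.05  # stratification value from Table 10


def hazard_score(worker):
    """Sum the Table 9 points over all variables (assumed additive)."""
    return sum(HAZARD_SCORES[var][worker[var]] for var in HAZARD_SCORES)


def is_high_risk(worker):
    """Assumption: a total at or above the cutoff is classified as high risk."""
    return hazard_score(worker) >= CUTOFF


# Hypothetical worker profile, for illustration only.
worker = {
    "age": "40~", "gender": "male",
    "education": "junior high school and below",
    "respiratory_history": "yes", "smoking_index": "100~",
    "physical_exercise": "no", "ventilation": "good",
    "mask_usage": "every day", "cumulative_dust": "100~",
    "seniority": "20~", "chemical_exposure": "yes",
}

score = hazard_score(worker)
print(f"Hazard score: {score:.2f}, high risk: {is_high_risk(worker)}")
```

Under this additive reading, the highest attainable total is 50 points, so the 23.05 value sits a little below the midpoint of the scale.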
Table 10. Classification results of the COPD risk scoring system.
| Hazard stratification | COPD: yes [n (%)] | COPD: no [n (%)] | Total |
| --- | --- | --- | --- |
| <23.05 | 52 (19.12) | 774 (84.59) | 826 |
| ≥23.05 | 220 (80.88) | 141 (15.41) | 361 |
| Total | 272 | 915 | 1187 |
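The 83.7% accuracy quoted for the risk scoring system can be recovered from the counts in Table 10. The sketch below treats a score of 23.05 or more as a positive prediction (an assumption about how the stratification is used) and computes accuracy, sensitivity, and specificity; the variable names are illustrative. The AUC cannot be derived from this single cut-off table.

```python
# Counts from Table 10, reading ">= 23.05" as "predicted COPD".
true_positive = 220   # score >= 23.05, COPD yes
false_negative = 52   # score <  23.05, COPD yes
false_positive = 141  # score >= 23.05, COPD no
true_negative = 774   # score <  23.05, COPD no

total = true_positive + false_negative + false_positive + true_negative

accuracy = (true_positive + true_negative) / total
sensitivity = true_positive / (true_positive + false_negative)
specificity = true_negative / (true_negative + false_positive)

print(f"n = {total}")
print(f"accuracy    = {accuracy:.1%}")
print(f"sensitivity = {sensitivity:.2%}")
print(f"specificity = {specificity:.2%}")
```

The sensitivity and specificity computed this way correspond to the column percentages printed in the table (80.88% and 84.59%).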
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, H.; Meng, R.; Wang, X.; Si, Z.; Zhao, Z.; Lu, H.; Wang, H.; Hu, J.; Zheng, Y.; Chen, J.; et al. Development and Internal Validation of Risk Assessment Models for Chronic Obstructive Pulmonary Disease in Coal Workers. Int. J. Environ. Res. Public Health 2023, 20, 3655. https://doi.org/10.3390/ijerph20043655
