Article

Risk Prediction for the Development of Hyperuricemia: Model Development Using an Occupational Health Examination Dataset

1
Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
2
College of Science, North China University of Science and Technology, Tangshan 063210, China
3
School of Public Health, North China University of Science and Technology, Tangshan 063210, China
*
Authors to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2023, 20(4), 3411; https://doi.org/10.3390/ijerph20043411
Submission received: 27 December 2022 / Revised: 13 February 2023 / Accepted: 13 February 2023 / Published: 15 February 2023

Abstract

OBJECTIVE: Hyperuricemia (HUA) has become the second most common metabolic disease in China after diabetes, and its disease burden is a growing concern. METHODS: We conducted a retrospective cohort study, with a baseline survey completed from January to September 2017 and a follow-up survey completed from March to September 2019. A total of 2992 steelworkers formed the study population. Logistic regression, CNN, and XG Boost models were established to predict HUA incidence in steelworkers. The predictive performance of the three models was evaluated in terms of discrimination, calibration, and clinical applicability. RESULTS: In the training set, the accuracy of the Logistic regression, CNN, and XG Boost models was 84.4%, 86.8%, and 86.6%; sensitivity was 68.4%, 72.3%, and 81.5%; specificity was 82.0%, 85.7%, and 86.8%; the area under the ROC curve was 0.734, 0.724, and 0.806; and the Brier score was 0.121, 0.194, and 0.095, respectively. Overall, the XG Boost model performed better on the evaluation indices than the other two models, and similar results were obtained in the validation set. The XG Boost model also had higher clinical applicability than the Logistic regression and CNN models. CONCLUSION: The XG Boost model predicted better than the CNN and Logistic regression models and was suitable for predicting the risk of HUA onset in steelworkers.

1. Introduction

Hyperuricemia (HUA) is a metabolic disorder caused by abnormal purine metabolism, which results in elevated serum uric acid (SUA) concentrations [1]. A 2014 meta-analysis covering 16 provinces, municipalities, and autonomous regions in China showed that the prevalence of HUA in China was 13.3% (19.4% for men and 7.9% for women) [2]. Another meta-analysis in 2021, which included 2,277,712 subjects, showed that the prevalence of HUA had increased to 16.4% (20.4% for men and 9.8% for women) [3]. Previous studies have shown that the prevalence of HUA in China has doubled in the last 20 years, making it another public health problem of concern after diabetes [4]. Worldwide, the burden of gout has increased in 195 countries and regions, especially in developed countries and regions [5]. HUA is not only an early stage of gout but also an independent risk factor for coronary heart disease, hypertension, diabetes, and chronic kidney disease [6], and thus seriously endangers human health.
The steel industry is a pillar industry of the Chinese economy and directly employs as many as two million people. The health status of the workers is directly related to the development of the Chinese steel industry. Steelworkers are exposed for long periods to occupational hazards such as shift work, high temperature, and noise, and many also have unhealthy habits such as smoking, alcohol consumption, and a high-salt diet, so their risk factors for HUA may differ from those of the general population [7]. Therefore, there is an urgent need to develop new risk prediction models for HUA onset in steelworkers, which can be used to improve their quality of life and health status.
Logistic regression is a traditional prediction model commonly used in the medical field and is widely applied to disease prediction because its parameters have a clear meaning and its outcome metrics are easy to understand. The convolutional neural network (CNN) is a feedforward neural network with a deep structure that is good at mining local features of data, extracting global features, and performing classification, and it has some advantages that traditional techniques lack [8]. XG Boost (eXtreme gradient boosting) achieves classification by iteratively adding classifiers; its regularization term helps keep the model robust, and its built-in handling of missing data reduces the time needed to process features [9]. We established these three HUA risk prediction models based on the medical examination data of more than two thousand steelworkers and compared their predictive performance, aiming to select the optimal model and provide a theoretical basis for the health management of this special occupational group.
At present, public awareness of HUA in China is still insufficient, the outlook for prevention and treatment is not optimistic, and the awareness rate and cure rate among patients are low [10,11]. Therefore, screening the risk factors for HUA in order to establish prediction models and to identify, detect, and intervene early in HUA patients has great social value for preventing and controlling the development of HUA and reducing the burden of the disease.

2. Materials and Methods

2.1. Study Design and Participants

The present study was a retrospective cohort study based on the Chinese National Key Research and Development Program “Beijing-Tianjin-Hebei Regional Occupational Population Health Effects Cohort Study”, which completed the baseline survey from January to September 2017 and the follow-up survey from March to September 2019. A total of 2992 steelworkers were included in the study. The inclusion criteria were: formal employee of the unit; more than 1 year of service; free of HUA at the time of the baseline survey; and voluntary signing of the informed consent form. The exclusion criteria were age > 60 years and incomplete information. The study was reviewed and approved by the Ethics Committee of North China University of Technology (approval number: 16004).

2.2. Data Collection and Preprocessing

The subjects of this study were workers in the production department of Tangshan Iron and Steel Group who participated in the health examination, and all information was obtained from the baseline and follow-up surveys of the Beijing-Tianjin-Hebei cohort, including questionnaires, physical examinations, and laboratory examinations. The final data set was randomly divided into the training set (70% of observations) and the validation set (30%).
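The random 70/30 split can be sketched as follows. This is only an illustration using the Python standard library: the function name `split_cohort`, the seed, and the placeholder participant IDs are assumptions, and the source does not state how the authors performed the split.

```python
import random


def split_cohort(ids, train_fraction=0.7, seed=2023):
    """Randomly split participant IDs into a training and a validation set.

    The seed is a hypothetical choice so that the split is reproducible.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder IDs standing in for the 2992 workers in the cohort.
participant_ids = list(range(1, 2993))
train_ids, validation_ids = split_cohort(participant_ids)
print(len(train_ids), len(validation_ids))
```

A fixed seed makes the partition repeatable, which matters when several models are compared on the same training and validation sets.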
The questionnaire for the survey was developed by our team. One-on-one interviews with the workers of the steel enterprise were conducted by professionally trained PhD and MSc students from the School of Public Health of North China University of Science and Technology.
Physical examinations were conducted by trained professional medical examiners according to standard testing methods for height, weight, blood pressure, and other indicators for workers in this enterprise.
For laboratory testing, fasting blood and morning urine were collected by the medical examination hospital before 9:00 a.m. daily and sent to the laboratory department of the same hospital for uniform blood biochemical testing using a Mindray automatic biochemical analyzer (BS-800). The test indexes included fasting plasma glucose (FPG), uric acid (UA), total cholesterol (TC), triglyceride (TG), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), creatinine (Cr), urea nitrogen (BUN), etc.

2.3. Definition of HUA

According to the Practice Guidelines for the Diagnosis and Management of Hyperuricemia in Renal Diseases in China (2017 edition) [12], developed by the Nephrologist Branch of the Chinese Physicians Association, HUA was defined as blood uric acid ≥ 420 μmol/L in men and ≥ 360 μmol/L in women at the follow-up survey.
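The sex-specific cut-offs above translate directly into a small classification rule. The sketch below is illustrative only; the function and variable names are assumptions and do not come from the study's code.

```python
# Serum uric acid cut-offs (μmol/L) quoted in the definition above.
HUA_THRESHOLD_UMOL_L = {"male": 420.0, "female": 360.0}


def has_hua(sex, serum_uric_acid_umol_l):
    """Return True when serum uric acid meets the sex-specific HUA cut-off."""
    return serum_uric_acid_umol_l >= HUA_THRESHOLD_UMOL_L[sex]


print(has_hua("male", 431.0), has_hua("female", 352.0))
```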

2.4. Definition of Variables

  • Hypertension: According to the classification criteria of the Chinese Guidelines for the Prevention and Treatment of Hypertension 2018 Revision [13], systolic blood pressure ≥ 140 mmHg and/or diastolic blood pressure ≥ 90 mmHg, or a previous history of hypertension and current use of antihypertensive drugs, were defined as hypertension;
  • Diabetes: According to the classification criteria for glucose metabolic status in the Chinese guidelines for the prevention and treatment of type 2 diabetes mellitus (2020 edition) [14], fasting blood glucose ≥ 7.0 mmol/L, or a previous history of diabetes with current diabetes treatment, was defined as diabetes mellitus;
  • Smoking index: SI = number of cigarettes smoked per day × number of years of smoking [15]. The current study divided the smoking index into 3 groups according to the median: group 0 (0), group 1 (1 to < 300), and group 2 (300 and above);
  • Drinking index: DI = years of drinking × (amount of liquor/month + 0.11 × amount of beer/month) [15]. The current study divided the drinking index into 3 groups according to the median: group 0 (0), group 1 (1 to < 1028.57), and group 2 (1028.57 and above);
  • Cumulative noise exposure, cumulative dust exposure, cumulative heat exposure, and cumulative days of night shift were defined as detailed in a previously published article by our research group [16];
  • Physical exercise: more than three times a week, no less than 30 min each time;
  • Body mass index: BMI = weight (kg)/height² (m²). The normal range of body weight was BMI < 24.0 kg/m², the overweight range was 24.0 kg/m² ≤ BMI < 28.0 kg/m², and the obese range was BMI ≥ 28.0 kg/m²;
  • Dyslipidemia: according to the Chinese guidelines for the prevention and treatment of dyslipidemia in adults (revised version 2016) [17], serum total cholesterol ≥ 6.2 mmol/L, and/or triglycerides ≥ 2.3 mmol/L, and/or LDL cholesterol ≥ 4.1 mmol/L, and/or HDL cholesterol < 1.0 mmol/L, a previous history of hyperlipidemia, or the current use of lipid-lowering drugs was defined as dyslipidemia;
  • Physical activity: The physical activity of workers was investigated using the long form of the International Physical Activity Questionnaire (IPAQ) [18]. An overall weekly physical activity level < 600 MET-min/w was defined as low intensity, a level ≥ 600 and < 3000 MET-min/w as medium intensity, and a level ≥ 3000 MET-min/w as high intensity;
  • Occupational stress: The revised Chinese Job Content Questionnaire (JCQ) [19] was used to assess occupational stress, using the ratio of job demands to job control (D/C ratio), with a D/C ratio > 1 indicating occupational stress and a D/C ratio ≤ 1 indicating no occupational stress;
  • Sleep quality: The sleep quality of steelworkers was assessed with the internationally accepted Athens Insomnia Scale (AIS) [20] and classified by total score as no sleep disorder (total score ≤ 6) or insomnia (total score > 6);
  • DASH score: In the DASH dietary model (Dietary Approaches to Stop Hypertension), the five major food groups whose intake is encouraged (fruits, vegetables, nuts and legumes, low-fat milk, and whole grains) were positively scored: the higher the frequency of intake, the higher the score. The three major food groups restricted by the DASH model (sodium-containing foods, red and processed meats, and sweetened beverages) were negatively scored: the more frequently they were consumed, the lower the score.
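Several of the definitions above are simple threshold rules, which can be sketched in code as below. All function names are hypothetical, and the medium activity band is assumed to run from 600 up to (but not including) 3000 MET-min/w; this is not the study's own preprocessing code.

```python
def bmi_category(weight_kg, height_m):
    """Classify body weight with the BMI cut-offs given above (kg/m^2)."""
    bmi = weight_kg / height_m ** 2
    if bmi < 24.0:
        return "normal"
    if bmi < 28.0:
        return "overweight"
    return "obese"


def has_hypertension(sbp_mmhg, dbp_mmhg, treated_history=False):
    """SBP >= 140 and/or DBP >= 90, or a history of hypertension under drug treatment."""
    return sbp_mmhg >= 140 or dbp_mmhg >= 90 or treated_history


def smoking_index(cigarettes_per_day, years_smoked):
    """Smoking index = cigarettes per day x years of smoking."""
    return cigarettes_per_day * years_smoked


def activity_level(met_min_per_week):
    """Map weekly IPAQ activity (MET-min/w) to low, medium, or high intensity."""
    if met_min_per_week < 600:
        return "low"
    if met_min_per_week < 3000:
        return "medium"
    return "high"


def has_insomnia(ais_total_score):
    """Athens Insomnia Scale: a total score above 6 indicates insomnia."""
    return ais_total_score > 6


print(bmi_category(80, 1.75), has_hypertension(138, 92), activity_level(1200))
```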

2.5. Sample Size Calculation

The sample size calculation method for developing a clinical prediction model proposed by Richard et al. was used [21].
To ensure that the model could accurately estimate the overall mean outcome risk, the prevalence of hyperuricemia θ was taken from the literature as approximately 12% [22] and the margin of error δ was set at 0.05, which gave a requirement of at least 144 study subjects.
$$n = \left(\frac{1.96}{\delta}\right)^{2}\theta\,(1-\theta) \tag{1}$$
To control the mean error across all individual predicted values, the mean absolute prediction error (MAPE) was set to 0.05, the anticipated Cox–Snell $R^2$ ($R^2_{CS}$) was set to 0.1, and the number of predictor variables $P$ was about 15, which gave a requirement of at least 433 study subjects.
$$n = \exp\left(\frac{-0.508 + 0.259\,\ln(\theta) + 0.504\,\ln(P) - \ln(\mathrm{MAPE})}{0.544}\right) \tag{2}$$
To ensure an expected shrinkage of 10% and reduce model overfitting, the shrinkage factor $S$ was set to 0.9 and the number of predictor variables $P$ was about 15, which gave a requirement of at least 1274 study subjects.
$$n = \frac{P}{(S-1)\,\ln\left(1-\dfrac{R^2_{CS}}{S}\right)} \tag{3}$$
To ensure that the difference between the apparent and the optimism-adjusted $R^2_{CS}$ of the developed model was small, $R^2_{CS}$ in Equation (4) was 0.1, $\max R^2_{CS}$ was 0.48, and $S$ was calculated to be 0.81, which gave a requirement of at least 600 study subjects.
$$S = \frac{R^2_{CS}}{R^2_{CS} + \delta\,\max R^2_{CS}} \tag{4}$$
$$n = \frac{P}{(S-1)\,\ln\left(1-\dfrac{R^2_{CS}}{S}\right)} \tag{5}$$
It was calculated that at least 1274 cases were needed to establish the model sample, and a total of 2992 cases were included in this study. The sample size met the needs of the study.
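The four sample size criteria can be checked numerically with a short script. This is a sketch that assumes the standard form of each formula (a squared margin-of-error term, an exponentiated MAPE expression divided by 0.544, and the shrinkage expression with $S - 1$ in the denominator); the function names are illustrative and not taken from the study.

```python
import math


def n_for_overall_risk(theta, delta):
    """Criterion 1: precise estimate of the overall outcome proportion."""
    return (1.96 / delta) ** 2 * theta * (1 - theta)


def n_for_mape(theta, n_predictors, mape):
    """Criterion 2: small mean absolute prediction error."""
    numerator = (-0.508 + 0.259 * math.log(theta)
                 + 0.504 * math.log(n_predictors) - math.log(mape))
    return math.exp(numerator / 0.544)


def n_for_shrinkage(n_predictors, r2_cs, shrinkage):
    """Criteria 3 and 4: sample size for a target shrinkage factor S."""
    return n_predictors / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))


def shrinkage_for_optimism(r2_cs, max_r2_cs, delta):
    """Shrinkage factor S implied by a small optimism in R2_CS."""
    return r2_cs / (r2_cs + delta * max_r2_cs)


# Inputs quoted in the text: theta = 0.12, P = 15, MAPE = 0.05, R2_CS = 0.1.
print(n_for_mape(0.12, 15, 0.05))
print(math.ceil(n_for_shrinkage(15, 0.1, 0.9)))
s = round(shrinkage_for_optimism(0.1, 0.48, 0.05), 2)
print(s, math.ceil(n_for_shrinkage(15, 0.1, s)))
```

With the inputs quoted in the text, criteria 2 to 4 come out close to the reported 433, 1274, and 600. The first criterion is sensitive to the exact prevalence assumed, so it may not reproduce the quoted figure when θ is rounded to 0.12.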

2.6. Model Building

The current study consisted of two main phases: (1) variable screening and (2) model development. We used LASSO regression for variable selection: by shrinking the coefficients, LASSO screened the important variables from among 54 clinical characteristics. The code for the LASSO regression implementation is shown in Supplementary Material S1. Logistic regression, CNN, and XG Boost models were then developed based on the selected variables and the literature review.
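For readers unfamiliar with the method, LASSO selection for a binary outcome is usually written as an L1-penalized logistic regression. The formulation below is the standard textbook form and is given only as a sketch; the authors' exact implementation is the one in Supplementary Material S1, and the symbols $\beta_0$, $\beta$, $\eta_i$, and $\lambda$ are notation introduced here.

$$\hat{\beta} = \arg\min_{\beta_0,\,\beta}\left\{-\frac{1}{n}\sum_{i=1}^{n}\left[y_i\,\eta_i - \ln\left(1+e^{\eta_i}\right)\right] + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert\right\}, \qquad \eta_i = \beta_0 + x_i^{\top}\beta$$

As the penalty weight $\lambda$ increases, more coefficients $\beta_j$ are driven exactly to zero, and the variables whose coefficients remain non-zero at the cross-validated $\lambda$ are the ones retained.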

2.6.1. Logistic Regression Model

The logistic regression model was built using the scikit-learn (sklearn) package in Python 3.6. The code for the logistic model implementation is shown in Supplementary Material S2.

2.6.2. CNN Model

The CNN model was built using the NumPy package, and the sigmoid function was used as the activation function. The code for the CNN model implementation is shown in Supplementary Material S3.

2.6.3. XG Boost Model

The XG Boost model was built using the scikit-learn (sklearn) package, with the sigmoid function as the activation function and binary cross entropy (BCE) as the loss function. The code for the XG Boost model implementation is shown in Supplementary Material S2.
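The roles of the sigmoid function and the binary cross entropy loss can be made explicit with the usual gradient-boosting objective. The expressions below follow the general XGBoost formulation and are offered as a sketch, not as the authors' exact configuration; $f_k$ denotes the $k$-th tree, $T$ its number of leaves, $w$ its leaf weights, and $\gamma$ and $\lambda$ are regularization parameters.

$$\hat{p}_i = \sigma\left(\sum_{k=1}^{K} f_k(x_i)\right), \qquad \sigma(z) = \frac{1}{1+e^{-z}}$$

$$\mathrm{Obj} = -\sum_{i=1}^{n}\left[y_i\ln\hat{p}_i + (1-y_i)\ln(1-\hat{p}_i)\right] + \sum_{k=1}^{K}\left(\gamma T_k + \tfrac{1}{2}\lambda\lVert w_k\rVert^{2}\right)$$

The first term is the BCE loss on the predicted probabilities, and the second is the regularization term that penalizes complex trees.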

2.7. Model Evaluation

The predictive performance of the models was evaluated in terms of discrimination, calibration, and clinical applicability. The discrimination indices included sensitivity, specificity, Youden index, ROC curve, and area under the curve (AUC). The calibration indices included the Brier score, Log loss, and calibration curve. Clinical applicability was evaluated by decision curve analysis (DCA).
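The discrimination and calibration indices listed above can be computed from observed outcomes and predicted probabilities as in the following sketch. It uses only the Python standard library, a classification threshold of 0.5 chosen for illustration, and made-up toy data; none of it is taken from the study's code or results.

```python
import math


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN after dichotomising probabilities at `threshold`."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def discrimination(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, and Youden index at a given threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {"sensitivity": sensitivity,
            "specificity": specificity,
            "youden": sensitivity + specificity - 1}


def auc(y_true, y_prob):
    """AUC as the chance that a random case scores above a random non-case."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0
               for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


# Toy labels and predicted probabilities, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.1]
metrics = discrimination(y_true, y_prob)
print(metrics, auc(y_true, y_prob), brier_score(y_true, y_prob), log_loss(y_true, y_prob))
```

Lower Brier score and Log loss indicate better calibration, whereas higher sensitivity, specificity, Youden index, and AUC indicate better discrimination.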

2.8. Statistical Analysis

An Excel 2010 database was established based on the questionnaire and physical examination data to screen the risk factors for HUA in steelworkers, and a prediction model was established based on the screened variables. Count data were described as rates or composition ratios, and the χ² test was used for comparison between groups; ordinal data were described as rates or composition ratios, and the Kruskal–Wallis test was used for comparison between groups. SPSS 26.0 and Python 3.9 were used for the statistical analysis. The test level α was set at 0.05, and all tests were two-sided.

2.9. Quality Control

Design phase: the literature was reviewed and experts were consulted to modify and improve the study protocol. Data collection phase: investigators were uniformly trained; data were entered twice and cross-checked, and manual and computerized checks of data entry and logical errors were performed to ensure the authenticity of the data. Data analysis phase: the training set and test set were randomly selected.

3. Results

3.1. Study Population

A total of 4518 steelworkers participated in the occupational health screening. After excluding 989 workers with HUA at the baseline survey, 385 lost to follow-up, and 152 with incomplete information, the final cohort comprised 2992 workers. The study population was randomly divided into a training cohort (2094) and a validation cohort (898) in a ratio of 7:3, as shown in Figure 1.

3.2. Analysis of Study Population Characteristics

The cohort was followed up from March to September 2019, with a median follow-up time of 26 months. There were 465 new HUA cases, giving a crude incidence rate of 15.5% (16.31% in men and 7.58% in women). A comparative analysis of the basic characteristics of the workers in the training and validation cohorts revealed no statistically significant differences in the indicators, as detailed in Table 1.

3.3. Variable Screening

Predictor variables were screened by LASSO regression, and 6 of the 54 variables were finally retained: total cholesterol, BMI, blood pressure, waist circumference, creatinine, and DASH score, as shown in Figure 2. The left panel is the LASSO coefficient path diagram, where each curve represents the trajectory of the coefficient of one variable; the variables whose coefficients shrank to 0 first were excluded. The right panel is the cross-validation curve. The error was smallest at the parameter value marked by the dashed line, and the intersection of the dashed line with the abscissa corresponds to the Lambda in the left panel. In this way, six indicators with a large impact on the study outcome were selected. From the literature review, we found that smoking, alcohol consumption, and physical activity were also important influencing factors of HUA [23], so they were added to the subsequent model development.

3.4. Multicollinearity Test

The predictor variables were tested for multicollinearity. The variance inflation factors of all variables were greater than 0 and less than 1.4, and the tolerances were between 0 and 1, indicating no multicollinearity problem, as shown in Table 2.
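For reference, the two multicollinearity statistics are related as follows. These are the standard definitions, given here as a reminder rather than as a description of the authors' computation; $R_j^2$ denotes the coefficient of determination obtained when predictor $j$ is regressed on all the other predictors.

$$\mathrm{VIF}_j = \frac{1}{1 - R_j^{2}}, \qquad \mathrm{Tolerance}_j = 1 - R_j^{2} = \frac{1}{\mathrm{VIF}_j}$$

Under these definitions, a variance inflation factor below 1.4 corresponds to $R_j^2 < 1 - 1/1.4 \approx 0.29$, that is, less than about 29% of the variance of any predictor is explained by the others.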

3.5. Evaluation of Model Effectiveness

In the training set of 2094 cases (70%), the XG Boost model outperformed the other two models in terms of sensitivity, specificity, Youden index, F1 score, AUC (95% CI), Brier score, and Log loss. The CNN model had the highest classification accuracy, at 86.8%. The indicators of the Logistic regression model were slightly worse, as shown in Table 3; the ROC curves are shown in Figure 3a.
In the validation set of 898 cases (30%), the XG Boost model outperformed the other two models in terms of classification accuracy, sensitivity, specificity, Youden index, F1 score, AUC (95% CI), Brier score, and Log loss, as shown in Table 3; the ROC curves are shown in Figure 3b.
The XG Boost model outperformed both the CNN and Logistic regression models in terms of Brier score and Log loss, and its calibration curves for both the training and validation sets were close to the diagonal, with no serious deviation. The XG Boost model performed best in terms of calibration accuracy, with the Logistic regression model coming second and the CNN model deviating more from the diagonal, as shown in Figure 4.
The clinical decision curves for the three models are shown in Figure 5. The XG Boost model had the highest clinical applicability, and the Logistic regression and CNN models had slightly worse clinical applicability. The nomogram of HUA risk in steelworkers is shown in Figure 6.
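Decision curves such as those in Figure 5 plot the net benefit of a model across threshold probabilities. The sketch below shows how net benefit is conventionally computed and compared with a "treat all" strategy; the function names and toy data are assumptions made for illustration and are unrelated to the study's results.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions at or above a threshold probability."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening on everyone, regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy labels and predicted probabilities, made up for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the "treat all" curve and zero (the "treat none" strategy).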

4. Discussion

In this study, we used LASSO regression to screen predictor variables and eventually selected 6 influencing factors from among 54 candidate variables. LASSO regression is an advanced variable selection algorithm for high-dimensional data that simplifies the model by constructing a penalty function to screen predictor variables [24]. Compared with the traditional stepwise regression method, LASSO regression processes all independent variables simultaneously, which not only effectively controls model overfitting but also makes the model much more stable. In addition, we added three influencing factors of HUA among steelworkers identified through the literature review, namely smoking, alcohol consumption, and physical activity, to improve the efficiency of the study. By comparing the predictive performance of the three models, we found that the XG Boost model was the optimal model in this study, achieving better results in three areas: discrimination (AUROC 0.806), calibration (Brier score 0.095), and clinical applicability. Our study highlighted the value of occupational health screening data in predicting HUA, and the screening of predictor variables may provide a scientific basis for the prevention and treatment of HUA in steelworkers.
The current study showed that overweight and obesity were important risk factors for the development of HUA in steelworkers, which is similar to the findings of several previous studies [25,26,27]. Some studies have shown that obesity and the development of HUA are causally related to each other and are closely associated with unhealthy dietary habits, alcohol intake, and a sedentary lifestyle [11]. On the one hand, obese people tend to eat more meat, which increases exogenous purine intake and causes HUA. On the other hand, obese people ingest more energy than they consume, resulting in hyper-synthesis of purines in the body and increased endogenous uric acid production [28]. An analysis of the US population found that BMI was the most important modifiable risk factor for HUA, with 44% of HUA cases attributable to overweight or obesity [29]. Both previous and current studies suggest that controlling overweight and obesity is beneficial in reducing the incidence of HUA. Dietary factors are another important influence on the occurrence of HUA. The DASH dietary pattern used in the current study was originally developed to control hypertension; it focuses on plant-based foods and high-quality protein, significantly reduces blood pressure, and has also been used for cardiovascular disease prevention. Regarding the effect of the DASH diet on the risk of gout, Sharan conducted a cohort study that included more than 40,000 subjects with up to 26 years of follow-up, and the results showed that the DASH diet score was negatively associated with the risk of gout [30]. A possible mechanism is that the DASH diet is lower in purines, which reduces the purine load in the body. In addition, the DASH diet may reduce SUA levels by improving insulin resistance [31].
The above study supported the view of the current study that the DASH dietary pattern was a protective factor for the occurrence of HUA in steelworkers. Eating more fruits and vegetables and controlling sugar intake can contribute to the primary prevention of HUA in steelworkers.
Some studies have shown that reducing smoking and alcohol consumption and adopting a less sedentary lifestyle can contribute to the prevention of HUA [23]. Smoking or second-hand smoke can increase the risk of HUA and gout. A possible reason is that smoking can excite the autonomic nervous system and affect the metabolism of purines in the body, with the potential effect of elevating SUA. In addition, the harmful substances in tobacco can adversely affect the respiratory and circulatory systems, leading to slower blood circulation and impaired uric acid excretion [32]. Alcohol consumption is another important influencing factor in the development of HUA. Firstly, the metabolism of ethanol in the body consumes a large amount of water, which raises the SUA value. Secondly, the metabolism of ethanol readily produces lactic acid, which is excreted through the kidneys and hinders the normal excretion of uric acid [33]. A sedentary lifestyle can lead to increased uric acid due to slower blood circulation, whereas moderate exercise accelerates metabolism and facilitates the excretion of uric acid. Long-term moderate-intensity aerobic exercise, and aerobic exercise combined with strength training, can reduce SUA concentrations in HUA patients [34], and there is a positive correlation between the amount of exercise and the decrease in SUA when exercise is performed at aerobic intensity. A possible mechanism is that long-term moderate-intensity aerobic exercise protects renal function by alleviating the inflammatory response and ameliorates renal injury by promoting the expression of proteins involved in uric acid excretion. Exercise thus plays a direct or indirect role in reducing SUA [35].
In this study on the prediction of HUA onset in steelworkers, the XG Boost model achieved better results and was more suitable for predicting the risk of HUA onset in steelworkers. XG Boost is a supervised classification model based on multiple trees, which essentially takes the sum of the predicted values of all trees as the final predicted value. XG Boost has excellent computational efficiency, generalization ability, and overfitting control, which has made it a long-standing dominant solution in data science competitions. Rajdeep used six different machine learning algorithms to predict obesity risk and achieved a classification accuracy of up to 97.87% with the XG Boost model [36]. Savitesh predicted the risk of pre-diabetes in children and adolescents, found that XG Boost was the best classification model with a 10-fold cross-validation score of up to 90.13%, and integrated the XG Boost algorithm into a screening tool for the automatic prediction of pre-diabetes [37]. Shoukun identified miner fatigue based on physiological indicators from ECG and EMG and found that the XG Boost model had the best accuracy and robustness, with a recognition accuracy of 89.47% and an AUC of 0.90, indicating that recognition of miner fatigue based on the XG Boost model is feasible [38]. The unique ability of logistic regression to adjust for different prevalence rates has made it widely used in medical research, but in this study it showed poor classification ability and low sensitivity. Although the classification ability of the CNN model was relatively strong, it performed poorly in calibration, possibly because it is better suited to image problems.
Our study has several advantages. Firstly, in evaluating the sample size, we did not choose the empirical rule of 10 events per variable (EPV) but used the calculation method proposed by Richard, which guarantees the expected shrinkage and controls the error of individual predicted values [21]; this made the sample size calculation for the HUA risk prediction model for steelworkers more rigorous. Secondly, instead of the traditional stepwise regression method, we chose LASSO regression, which allowed for extensive variable screening in the selection of predictor variables. LASSO regression compensated for the shortcomings of stepwise regression in terms of local optimal estimation and effectively helped us select predictor variables. In addition, based on the literature review, we added three recognized influencing factors, namely smoking, alcohol consumption, and physical activity, which gives the predictive model for HUA in steelworkers public health significance. Thirdly, we evaluated the models comprehensively in terms of discrimination, calibration, and clinical applicability. Fourthly, we developed a nomogram to predict the risk of HUA in steelworkers. The nomogram is clear and intuitive. Steelworkers can use it to estimate their own risk of developing HUA in the future, and clinicians can use it to quickly and accurately identify workers at high risk of HUA for targeted health education. By understanding their own risk of HUA and becoming more aware of risk factors, steelworkers can change their unhealthy lifestyles accordingly and reduce the risk of illness.
Our study has certain limitations. Firstly, because such data are not easily available, we did not find a suitable dataset with which to externally validate the newly developed HUA risk prediction model for steelworkers. Secondly, we only used traditional machine learning algorithms and did not improve on them. Therefore, in the future, we will further investigate new algorithms to improve the predictive performance of the model.

5. Conclusions

The prediction effect of the XG Boost model was better than the CNN and Logistic regression models and was suitable for the prediction of HUA onset risk in steelworkers.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijerph20043411/s1, Table S1: Variable filter forecast table; Material S1: Lasso regression code; Material S2: Log and XG code; Material S3: CNN code.

Author Contributions

Methodology, Z.Z. (Ziwei Zheng); investigation, Z.Z. (Ziwei Zheng), Z.S., X.W., R.M. and H.W. (Hui Wang); data curation, Z.Z. (Zekun Zhao), H.L., H.W. (Huan Wang) and Y.Z.; writing—original draft preparation, Z.Z. (Ziwei Zheng); writing—review and editing, Y.C. and Y.Y.; visualization, J.H. and R.H.; supervision, X.L., L.X., J.S. and J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of North China University of Technology on 12 May 2016 (approval number: 16040).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Acknowledgments

The authors thank all researchers involved in the design and conduct of this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, D.; Zhang, Y.; Tian, Z. Evaluation of methodological quality and reporting quality of domestic clinical guidelines for hyperuricemia. China J. Chin. Mater. Med. 2022, 47, 547–556.
  2. Rui, L.; Cheng, H.; Di, W. Prevalence of Hyperuricemia and Gout in China’s mainland from 2000 to 2014: A Systematic Review and Meta-Analysis. BioMed Res. Int. 2015, 2015, 762820.
  3. Li, Y.; Shen, Z.; Zhu, B. Demographic, regional and temporal trends of hyperuricemia epidemics in mainland China from 2000 to 2019: A systematic review and meta-analysis. Glob. Health Action 2021, 14, 1874652.
  4. Yuhan, G.; Shichong, J.; Dihua, L. Prediction model of random forest for the risk of hyperuricemia in a Chinese basic health checkup test. Biosci. Rep. 2021, 41, 4.
  5. Safiri, S.; Kolahi, A.A.; Cross, M. Prevalence, Incidence, and Years Lived with Disability Due to Gout and Its Attributable Risk Factors for 195 Countries and Territories 1990–2017: A Systematic Analysis of the Global Burden of Disease Study 2017. Arthritis Rheumatol. 2020, 72, 1916–1927.
  6. Wang, L.M.; Deng, Q.; Wang, L.H. The prevalence and risk factors of acute cardiovascular events in China: Findings from China Chronic Disease Risk Factor Surveillance 2010. Heart 2013, 99, 3.
  7. Zhang, S.; Wang, Z.; Yang, L. Dose-response relationship between shift work and hyperuricemia. Chin. J. Dis. Control. Prev. 2018, 22, 1123–1127.
  8. Lee, J.H.; Kim, D.H.; Jeong, S.N. Detection and diagnosis of dental caries using a deep learning-based convolutional neural network algorithm. J. Dent. 2018, 77, 106–111.
  9. Qureshi, Z.; Maqbool, A.; Mirza, A. Efficient Prediction of Missed Clinical Appointment Using Machine Learning. Comput. Math. Methods Med. 2021, 2021, 2376391.
  10. Liu, X.; Liu, L.; Zhang, X. Analysis of influencing factors and disease awareness of asymptomatic hyperuricemia in young and middle-aged physical examination population in Daqing City. Chin. Evid.-Based Nurs. 2021, 7, 894–901.
  11. Mats, D.; Lennart, J.; Edward, R. Global epidemiology of gout: Prevalence, incidence, treatment patterns and risk factors. Nat. Rev. Rheumatol. 2020, 16, 380–390.
  12. Nephrologist Branch of Chinese Medical Doctor Association: Practice Guidelines for the Diagnosis and Treatment of Hyperuricemia in Renal Disease in China (2017 Edition). Natl. Med. J. China 2017, 97, 1927–1936.
  13. 2018 revised edition of the Chinese Guidelines for the Prevention and Treatment of Hypertension. Prev. Treat. Cardiovasc. Cerebrovasc. Dis. 2019, 19, 1–44.
  14. Diabetes Branch of Chinese Medical Association: Guidelines for the prevention and treatment of type 2 diabetes in China (2020 edition). Int. J. Endocrinol. Metab. 2021, 41, 482–548.
  15. Zhang, S. Association of shift and rhythm-related gene polymorphisms in steelworkers with HUA. Postgraduate Thesis, North China University of Science and Technology, Tangshan, Hebei, 2019.
  16. Chen, Y.; Yang, Y.; Zheng, Z. Influence of occupational exposure on hyperuricemia in steelworkers: A nested case-control study. BMC Public Health 2022, 22, 1508.
  17. Zhu, J.; Gao, R.; Zhao, S. Guidelines for the prevention and treatment of dyslipidemia in adults in China (2016 revised edition). Chinese J. Health Manag. 2017, 11, 7–28.
  18. Lou, X.; He, Q. Validity and Reliability of the International Physical Activity Questionnaire in Chinese Hemodialysis Patients: A Multicenter Study in China. Med. Sci. Monit. 2019, 25, 9402–9408.
  19. Sun, R.; Lan, Y. A study on the association between nursing staff’s job adaptation and occupational stress. Chin. J. Prev. Med. 2020, 54, 1197–1201.
  20. Lin, C.Y.; Cheng, A.S.K.; Nejati, B. A thorough psychometric comparison between Athens Insomnia Scale and Insomnia Severity Index among patients with advanced cancer. J. Sleep Res. 2020, 29, e12891.
  21. Riley, R.D.; Ensor, J.; Snell, K.I.E. Calculating the sample size required for developing a clinical prediction model. BMJ 2020, 368, m441.
  22. Yang, X.; Tao, Q.; Zhan, S. A 5-year risk prediction model for hyperuricemia in 35~74-year-old health examination population in Taiwan. China Prev. Med. 2013, 14, 655–659.
  23. He, H.J.; Guo, P.; He, J. Prevalence of hyperuricemia and the population attributable fraction of modifiable risk factors: Evidence from a general population cohort in China. Front. Public Health 2022, 10, 936717.
  24. Zhuo, J.; Yang, L.; Zhu, J. Establishment and validation of clinical prediction model for recurrence after unilateral chronic subdural hematoma drilling and drainage. J. Clin. Neurosurg. 2021, 18, 58–63.
  25. Lu, J.; Bai, Z.; Chen, Y. Effects of bariatric surgery on serum uric acid in people with obesity with or without hyperuricaemia and gout: A retrospective analysis. Rheumatology 2021, 60, 3628–3634.
  26. Yeo, C.; Kaushal, S.; Lim, B. Impact of bariatric surgery on serum uric acid levels and the incidence of gout-A meta-analysis. Obes. Rev. 2019, 20, 1759–1770.
  27. Qu, X.; Zheng, L.; Zu, B. Prevalence and Clinical Predictors of Hyperuricemia in Chinese Bariatric Surgery Patients. Obes. Surg. 2022, 32, 1508–1515.
  28. Yokose, C.; McCormick, N.; Choi, H.K. The role of diet in hyperuricemia and gout. Curr. Opin. Rheumatol. 2021, 33, 135–144.
  29. Choi, H.K.; McCormick, N.; Lu, N. Population Impact Attributable to Modifiable Risk Factors for Hyperuricemia. Arthritis Rheumatol. 2020, 72, 157–165.
  30. Rai, S.K.; Fung, T.T.; Lu, N. The Dietary Approaches to Stop Hypertension (DASH) diet, Western diet, and risk of gout in men: Prospective cohort study. BMJ 2017, 357, j1794.
  31. Gao, Y.; Cui, L.F.; Sun, Y.Y. Adherence to the Dietary Approaches to Stop Hypertension Diet and Hyperuricemia: A Cross-Sectional Study. Arthritis Care Res. 2021, 73, 603–611.
  32. Xiao, T.; Zhen, J.; Wang, C. Effects of smoking and aerobic exercise on markers related to metabolic syndrome in male college students. Chin. J. Sch. Health 2020, 41, 845–848.
  33. Nakamura, K.; Sakurai, M.; Miura, K. Alcohol intake and the risk of hyperuricaemia: A 6-year prospective study in Japanese men. Nutr. Metab. Cardiovasc. Dis. 2012, 22, 989–996.
  34. Nishida, Y.; Iyadomi, M.; Higaki, Y. Influence of physical activity intensity and aerobic fitness on the anthropometric index and serum uric acid concentration in people with obesity. Intern. Med. 2011, 50, 2121–2128.
  35. Jiang, Z.; Cao, J.; Cao, H. Research status and prospect of exercise in the prevention and treatment of hyperuricemia. China Prev. Med. 2021, 22, 390–396.
  36. Kaur, R.; Kumar, R.; Gupta, M. Predicting risk of obesity and meal planning to reduce the obese in adulthood using artificial intelligence. Endocrine 2022, 78, 458–469.
  37. Kushwaha, S.; Srivastava, R.; Jain, R. Harnessing machine learning models for non-invasive pre-diabetes screening in children and adolescents. Comput. Methods Programs Biomed. 2022, 226, 107180.
  38. Chen, S.; Xu, K.; Yao, X. Information fusion and multi-classifier system for miner fatigue recognition in plateau environments based on electrocardiography and electromyography signals. Comput. Methods Programs Biomed. 2021, 211, 106451.
Figure 1. Overview of the study.
Figure 2. Variable selection based on LASSO regression. (a) LASSO coefficient path map; (b) Cross validation curve.
Figure 3. ROC curves of three models. (a) Training set; (b) validation set.
Figure 4. Calibration curves of three models: (a) Logistic; (b) CNN; (c) XG Boost.
Figure 5. DCA curves of three models: (a) Logistic; (b) CNN; (c) XG Boost.
Figure 6. Nomogram of HUA prediction.
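Table 1. Characteristics analysis of steelworkers.

| Variable | Training Set (2094): Total | Training Set (2094): Patients (%) | Validation Set (898): Total | Validation Set (898): Patients (%) | χ²/H(K) | p |
|---|---|---|---|---|---|---|
| Age (Year) | | | | | 0.072 * | 0.965 * |
| <40 | 403 | 82 (20.60) | 190 | 39 (20.53) | | |
| 40~ | 856 | 112 (13.08) | 369 | 52 (14.09) | | |
| ≥50 | 838 | 124 (14.80) | 339 | 56 (16.52) | | |
| Gender | | | | | 1.676 | 0.196 |
| male | 1930 | 310 (16.06) | 798 | 135 (16.92) | | |
| female | 164 | 8 (4.88) | 100 | 12 (12) | | |
| Marital status | | | | | 0.625 | 0.732 |
| unmarried | 43 | 6 (13.95) | 20 | 4 (20) | | |
| married | 1942 | 275 (14.16) | 816 | 121 (14.83) | | |
| other | 109 | 37 (33.94) | 62 | 22 (35.48) | | |
| Education level | | | | | 4.507 * | 0.105 * |
| junior secondary school or lower | 15 | 2 (13.33) | 6 | 2 (33.33) | | |
| high school and secondary school | 1771 | 252 (14.23) | 775 | 118 (15.23) | | |
| college and above | 308 | 64 (20.78) | 117 | 27 (23.08) | | |
| Monthly income per capita of the household (Yuan) | | | | | 0.042 * | 0.979 * |
| <1500 | 596 | 64 (10.74) | 278 | 31 (11.15) | | |
| 1500~ | 844 | 141 (16.71) | 359 | 68 (18.94) | | |
| ≥2500 | 654 | 113 (17.28) | 261 | 48 (18.39) | | |
| DASH | | | | | 0.077 | 0.782 |
| <25 | 1024 | 197 (19.24) | 434 | 94 (21.66) | | |
| ≥25 | 1070 | 121 (11.31) | 464 | 53 (11.42) | | |
| AIS | | | | | 0.026 | 0.873 |
| ≤6 | 393 | 50 (12.72) | 151 | 22 (14.57) | | |
| >6 | 1701 | 268 (15.76) | 747 | 125 (16.73) | | |
| Smoking index | | | | | 0.088 * | 0.957 * |
| 0 | 1024 | 116 (11.33) | 432 | 49 (11.34) | | |
| <300 | 500 | 94 (18.80) | 222 | 45 (20.27) | | |
| ≥300 | 570 | 108 (18.95) | 244 | 53 (21.72) | | |
| Drinking index | | | | | 0.153 * | 0.926 * |
| 0 | 1390 | 156 (11.22) | 607 | 69 (11.37) | | |
| <1028.57 | 336 | 81 (24.11) | 144 | 42 (29.17) | | |
| ≥1028.57 | 368 | 81 (22.01) | 147 | 36 (24.49) | | |
| IPAQ | | | | | 0.400 * | 0.819 * |
| low | 230 | 82 (35.65) | 117 | 46 (39.32) | | |
| medium | 70 | 19 (27.14) | 37 | 13 (35.14) | | |
| high | 1794 | 217 (12.10) | 744 | 88 (11.83) | | |
| BMI (kg/m²) | | | | | 0.371 * | 0.831 * |
| <24 | 786 | 63 (8.02) | 356 | 37 (10.39) | | |
| 24~ | 915 | 153 (16.72) | 389 | 74 (19.02) | | |
| ≥28 | 393 | 102 (25.95) | 153 | 36 (23.53) | | |
| Systolic pressure (mmHg) | | | | | 1.153 * | 0.562 * |
| <120 | 1002 | 104 (10.38) | 445 | 60 (13.48) | | |
| 120~ | 839 | 147 (17.52) | 363 | 69 (19.01) | | |
| ≥140 | 253 | 67 (26.48) | 89 | 18 (20.22) | | |
| Diastolic pressure (mmHg) | | | | | 0.905 * | 0.636 * |
| <80 | 481 | 52 (10.81) | 234 | 26 (11.11) | | |
| 80~ | 1265 | 195 (15.42) | 534 | 100 (18.73) | | |
| ≥90 | 348 | 71 (20.40) | 130 | 21 (16.15) | | |
| Waistline (cm) | | | | | 0.363 | 0.547 |
| normal | 1204 | 186 (15.45) | 522 | 74 (14.18) | | |
| abnormal | 890 | 132 (14.83) | 376 | 73 (19.41) | | |
| TG (mmol/L) | | | | | 0.410 | 0.522 |
| normal | 1829 | 266 (14.54) | 809 | 123 (15.20) | | |
| abnormal | 265 | 52 (19.62) | 89 | 24 (26.97) | | |
| TC (mmol/L) | | | | | 0.383 | 0.536 |
| normal | 1866 | 271 (14.52) | 808 | 122 (15.10) | | |
| abnormal | 228 | 47 (20.61) | 90 | 25 (27.78) | | |
| FPG (mmol/L) | | | | | 0.489 * | 0.783 * |
| <6.1 | 1389 | 201 (14.47) | 617 | 91 (14.75) | | |
| 6.1~ | 520 | 90 (17.31) | 208 | 40 (19.23) | | |
| ≥7.0 | 185 | 27 (14.59) | 73 | 16 (21.92) | | |
| HDL-C (mmol/L) | | | | | 0.071 | 0.790 |
| ≥1.0 | 1821 | 267 (14.66) | 796 | 124 (15.58) | | |
| <1.0 | 273 | 51 (18.68) | 102 | 23 (22.55) | | |
| LDL-C (mmol/L) | | | | | 0.081 | 0.777 |
| <4.1 | 1826 | 270 (14.79) | 794 | 124 (15.62) | | |
| ≥4.1 | 268 | 48 (17.91) | 104 | 23 (22.12) | | |
| Cr (U/L) | | | | | 0.038 | 0.845 |
| low | 1001 | 119 (11.89) | 441 | 62 (14.06) | | |
| high | 1093 | 199 (18.21) | 457 | 85 (18.60) | | |
| BUN (mmol/L) | | | | | 0.468 | 0.494 |
| <7.1 | 1974 | 301 (15.25) | 843 | 141 (16.73) | | |
| ≥7.1 | 120 | 17 (14.17) | 55 | 6 (10.91) | | |
| Seniority (Year) | | | | | 0.019 * | 0.990 * |
| <15 | 327 | 31 (9.48) | 162 | 16 (9.88) | | |
| 15~ | 1332 | 208 (15.62) | 574 | 101 (17.60) | | |
| ≥30 | 435 | 79 (18.16) | 162 | 30 (18.52) | | |
| JCQ | | | | | 0.055 | 0.814 |
| no | 750 | 117 (15.60) | 341 | 61 (17.89) | | |
| yes | 1334 | 201 (15.07) | 557 | 86 (15.44) | | |
| Cumulative noise exposure (dB (A)·Year) | | | | | 0.146 * | 0.930 * |
| 0 | 921 | 136 (14.77) | 393 | 68 (17.30) | | |
| <21.34 | 594 | 101 (17.00) | 244 | 40 (16.39) | | |
| ≥21.34 | 579 | 81 (13.99) | 261 | 39 (14.94) | | |
| Dust cumulative exposure (mg/m³·Year) | | | | | 0.128 * | 0.938 * |
| 0 | 1250 | 193 (15.44) | 535 | 84 (15.70) | | |
| <1374.3 | 429 | 52 (12.12) | 172 | 26 (15.12) | | |
| ≥1374.3 | 415 | 73 (17.59) | 191 | 37 (19.37) | | |
| Cumulative exposure to high temperatures (°C·Year) | | | | | 0.321 * | 0.852 * |
| 0 | 1211 | 134 (11.07) | 462 | 63 (13.64) | | |
| <568.5 | 500 | 98 (19.60) | 203 | 39 (19.21) | | |
| ≥568.5 | 473 | 86 (18.18) | 233 | 45 (19.31) | | |
| Accrued days for night shifts (Day) | | | | | 1.160 * | 0.560 * |
| 0 | 278 | 31 (11.15) | 118 | 21 (17.80) | | |
| <1976.4 | 915 | 156 (17.05) | 387 | 60 (15.50) | | |
| ≥1976.4 | 901 | 131 (14.54) | 397 | 66 (16.62) | | |

* Ordinal data were compared between groups using the Kruskal-Wallis test.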
Table 2. Multicollinearity test of predictor variables.

| Variable | Tolerance | VIF |
|---|---|---|
| TC | 0.986 | 1.014 |
| BMI | 0.760 | 1.316 |
| Hypertension | 0.825 | 1.212 |
| Waistline | 0.779 | 1.284 |
| Cr | 0.996 | 1.004 |
| DASH | 0.901 | 1.110 |
| Smoking index | 0.897 | 1.114 |
| Drinking index | 0.898 | 1.114 |
| IPAQ | 0.894 | 1.119 |
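The two columns of Table 2 are tied by the standard definition VIF = 1/tolerance. As a rough consistency check, the Python sketch below recomputes VIF from the reported tolerance values. The dictionary layout and function name are illustrative assumptions, not part of the authors' analysis code.

```python
# Tolerance and VIF as reported in Table 2: name -> (tolerance, reported VIF).
table2 = {
    "TC": (0.986, 1.014),
    "BMI": (0.760, 1.316),
    "Hypertension": (0.825, 1.212),
    "Waistline": (0.779, 1.284),
    "Cr": (0.996, 1.004),
    "DASH": (0.901, 1.110),
    "Smoking index": (0.897, 1.114),
    "Drinking index": (0.898, 1.114),
    "IPAQ": (0.894, 1.119),
}


def vif_from_tolerance(tolerance: float) -> float:
    """Variance inflation factor as the reciprocal of tolerance."""
    return 1.0 / tolerance


# Recompute VIF to three decimals, matching the precision of the table.
recomputed = {
    name: round(vif_from_tolerance(tol), 3) for name, (tol, _) in table2.items()
}

# Largest gap between the recomputed and the reported VIF.
max_abs_diff = max(abs(recomputed[name] - table2[name][1]) for name in table2)

for name, (tol, vif) in table2.items():
    print(f"{name:15s} tolerance={tol:.3f} reported={vif:.3f} recomputed={recomputed[name]:.3f}")
```

Any gap in the third decimal is what one would expect if tolerance was rounded before being reported. All recomputed values lie far below the rule-of-thumb cutoffs of 5 or 10 that are commonly used to flag multicollinearity.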
Table 3. Performance of the three models.

| Evaluation Indicator | Logistic: Training Set | Logistic: Validation Set | CNN: Training Set | CNN: Validation Set | XG Boost: Training Set | XG Boost: Validation Set |
|---|---|---|---|---|---|---|
| Accuracy (%) | 84.4 | 83.9 | 86.8 | 85.0 | 86.6 | 88.1 |
| Sensitivity (%) | 68.4 | 66.0 | 72.3 | 70.5 | 81.5 | 78.1 |
| Specificity (%) | 82.0 | 79.7 | 85.7 | 80.6 | 86.8 | 84.6 |
| Youden index | 0.504 | 0.457 | 0.580 | 0.511 | 0.683 | 0.627 |
| F1 Score | 0.239 | 0.208 | 0.207 | 0.140 | 0.343 | 0.278 |
| AUC (95% CI) | 0.734 (0.685, 0.782) | 0.731 (0.676, 0.785) | 0.724 (0.669, 0.779) | 0.713 (0.667, 0.759) | 0.806 (0.748, 0.863) | 0.733 (0.689, 0.777) |
| Brier Score | 0.121 | 0.126 | 0.194 | 0.122 | 0.095 | 0.107 |
| Log loss | 0.398 | 0.413 | 0.442 | 0.409 | 0.328 | 0.361 |
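Several indicators in Table 3 follow from simple formulas. The Python sketch below recomputes the Youden index (sensitivity + specificity − 1) from the reported sensitivity and specificity, and gives textbook definitions of the Brier score and log loss. The function names are illustrative, and the four labels and probabilities at the end are a made-up toy input, not study data; the authors' own implementation is not described in the source.

```python
import math

# (sensitivity %, specificity %, reported Youden index) from Table 3.
table3 = {
    ("Logistic", "training"): (68.4, 82.0, 0.504),
    ("Logistic", "validation"): (66.0, 79.7, 0.457),
    ("CNN", "training"): (72.3, 85.7, 0.580),
    ("CNN", "validation"): (70.5, 80.6, 0.511),
    ("XG Boost", "training"): (81.5, 86.8, 0.683),
    ("XG Boost", "validation"): (78.1, 84.6, 0.627),
}


def youden_index(sensitivity_pct: float, specificity_pct: float) -> float:
    """Youden's J = sensitivity + specificity - 1, inputs given in percent."""
    return (sensitivity_pct + specificity_pct) / 100.0 - 1.0


def brier_score(y_true, y_prob) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the 0/1 outcomes (probabilities clipped)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Recompute Youden's J for every model and data set in Table 3.
recomputed_j = {key: youden_index(se, sp) for key, (se, sp, _) in table3.items()}

# Toy input (hypothetical, for illustration only).
toy_labels = [0, 0, 1, 1]
toy_probs = [0.1, 0.4, 0.35, 0.8]
toy_brier = brier_score(toy_labels, toy_probs)
toy_log_loss = log_loss(toy_labels, toy_probs)
```

For both the Brier score and log loss, lower values indicate better probabilistic predictions. This matches the table's reading that XG Boost, with the lowest values on both, performs best.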
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zheng, Z.; Si, Z.; Wang, X.; Meng, R.; Wang, H.; Zhao, Z.; Lu, H.; Wang, H.; Zheng, Y.; Hu, J.; et al. Risk Prediction for the Development of Hyperuricemia: Model Development Using an Occupational Health Examination Dataset. Int. J. Environ. Res. Public Health 2023, 20, 3411. https://doi.org/10.3390/ijerph20043411
