Article

Sleep Quality, Nutrient Intake, and Social Development Index Predict Metabolic Syndrome in the Tlalpan 2020 Cohort: A Machine Learning and Synthetic Data Study

by Guadalupe Gutiérrez-Esparza 1,2,*,†, Mireya Martinez-Garcia 3,†, Tania Ramírez-delReal 4, Lucero Elizabeth Groves-Miralrio 3, Manlio F. Marquez 5, Tomás Pulido 6, Luis M. Amezcua-Guerra 3 and Enrique Hernández-Lemus 7,8,*
1 Researcher for Mexico CONAHCYT, National Council of Humanities, Sciences and Technologies, Mexico City 08400, Mexico
2 Clinical Research, National Institute of Cardiology ‘Ignacio Chávez’, Mexico City 14080, Mexico
3 Department of Immunology, National Institute of Cardiology ‘Ignacio Chávez’, Mexico City 14080, Mexico
4 Center for Research in Geospatial Information Sciences, Aguascalientes 20313, Mexico
5 Department of Electrocardiology, National Institute of Cardiology ‘Ignacio Chávez’, Mexico City 14080, Mexico
6 Cardiopulmonary Department, National Institute of Cardiology ‘Ignacio Chávez’, Mexico City 14080, Mexico
7 Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
8 Center for Complexity Sciences, Universidad Nacional Autónoma de Mexico, Mexico City 04510, Mexico
* Authors to whom correspondence should be addressed.
† Co-first authors.
Nutrients 2024, 16(5), 612; https://doi.org/10.3390/nu16050612
Submission received: 10 January 2024 / Revised: 16 February 2024 / Accepted: 19 February 2024 / Published: 23 February 2024
(This article belongs to the Special Issue Diet- and Sleep-Based Approach for Cardiovascular Risk/Diseases)

Abstract

This study investigated the relationship between Metabolic Syndrome (MetS), sleep disorders, the consumption of some nutrients, and social development factors, focusing on gender differences in an unbalanced dataset from a Mexico City cohort. We applied data balancing techniques such as SMOTE and ADASYN together with machine learning models such as random forest and RPART to predict MetS. Random forest performed best, achieving a balanced accuracy of approximately 87%, which indicates its robustness in predicting MetS. Key predictors for men included body mass index and family history of gout, while waist circumference and glucose levels were most significant for women. In relation to diet, sleep quality, and social development, metabolic syndrome in men was associated with high lactose and carbohydrate intake, educational lag, living with a partner without marrying, and lack of durable goods, whereas in women, the best predictors in these dimensions included protein, fructose, and cholesterol intake, copper metabolites, snoring, sobbing, drowsiness, sanitary adequacy, and anxiety. These findings underscore the need for personalized approaches in managing MetS and point to a promising direction for future research into the interplay between social factors, sleep disorders, and metabolic health, which mainly depend on nutrient consumption by region.

1. Introduction

Metabolic Syndrome (MetS) is a condition that increases the risk of developing or worsening several serious health conditions such as diabetes, heart disease, and stroke, as well as cognitive decline and dementia [1]. Sleep disturbances such as insomnia, apnea, and snoring, linked to MetS, can exacerbate these health risks [2,3]. In 2017, the National Health and Nutrition Survey of Mexico [4] estimated the prevalence of sleep disorders in Mexicans using a sample of 8649 people older than 18 years old. The results showed a prevalence of snoring while sleeping of 48.5%, difficulty sleeping of 36.9%, and tiredness or fatigue during the day of 32.4%; likewise, insomnia was 18.8% more prevalent in women. Regarding apnea, the results indicated that 23.7% had a higher risk of presenting apnea, especially the populations of those who were overweight and obese, hypertensive, and those over 40 years of age. In another study [5], the prevalence of insomnia was 36.7%, being more common among women (with a prevalence of 41.9%) than men (with a prevalence of 36.7%). Effective treatment for sleep disorders hinges on identifying their specific type and underlying causes, highlighting the ongoing need for improved diagnosis and treatment strategies.
The prevalence data on sleep disorders underscore the importance of understanding their impact on conditions like MetS. This, in turn, calls for tools such as the Medical Outcomes Study Sleep Scale (MOS) [6] to assess sleep quality and its influence on health. Because the MOS measures multiple sleep-related aspects, its widespread use in diverse research studies [7,8,9] has deepened the understanding of how sleep disorders affect various health conditions and populations.
Similarly, nutrition and specific nutrients play crucial roles in developing and managing MetS [10]. MetS is a cluster of conditions that includes abdominal obesity, insulin resistance, dyslipidemia, and hypertension. Poor dietary choices and other lifestyle factors can contribute to developing and exacerbating these risk factors [11,12]. Excessive caloric intake, especially from high-fat and high-sugar diets, contributes to obesity, which in turn can contribute to insulin resistance, a key feature of metabolic syndrome. Low consumption of dietary fiber, commonly found in fruits, vegetables, and whole grains, is associated with insulin resistance. Diets high in saturated and trans fats can lead to dyslipidemia, which is characterized by elevated levels of triglycerides and low-density lipoprotein cholesterol and decreased high-density lipoprotein cholesterol. This lipid profile is a risk factor for cardiovascular diseases associated with metabolic syndrome. In contrast, omega-3 fatty acids, found in fatty fish, flax seeds, and walnuts, have been associated with favorable lipid profiles and may have a protective effect against metabolic syndrome [13,14,15,16]. As expected, nutrition and dietary habits are associated with MetS; several studies have applied diverse statistical models to quantify how specific nutrients increase or decrease this risk [17,18,19].
In the same way, another factor significantly associated with MetS is the social development index (SDI) [20], which is a composite measure of social and economic development. The SDI serves as a metric to evaluate the well-being and social progress in Mexico. Originating in the early 2000s and modeled after the Human Development Index (HDI), the SDI categorizes the level of social development in territorial units. These units correspond, for instance, to the subdivision of municipal geostatistical areas in Mexico City. The SDI employs a methodology established by the National Council for the Evaluation of Social Development Policy (Consejo Nacional de Evaluación de la Política de Desarrollo Social, CONEVAL) for its calculation (refer to Methods for further details on the SDI) [21].
Countries with higher SDI scores tend to have better health outcomes, including lower rates of MetS [22], and an additional study connects the risk of MetS with economic and social vulnerability as well as inappropriate nutrition profiles [23]. The evidence suggests a close association between the SDI and sleep disturbances, which is a relationship influenced by socioeconomic factors such as income level and education. These factors directly affect access to health services and lifestyle habits, such as diet and physical activity, which are essential for maintaining optimal sleep quality. Analyzing how the SDI and sleep disturbances interact with MetS is crucial to unravel the social and economic determinants that shape these complex interconnections. Understanding these dynamics will not only facilitate the identification of the types of sleep disorders that increase the prevalence of MetS but will also contribute to developing more effective strategies for its prevention and treatment, thus improving overall health and well-being. For this reason, developing automated methods for diagnosing sleep disorders, identifying the determinants of the SDI, and predicting MetS have become fields of significant research interest.
In the case of sleep disruption, machine learning has shown promise in improving the accuracy and efficiency of the diagnosis process. The work of Mencar et al. [24] presents the application of five machine learning models to predict the severity of obstructive sleep apnea syndrome (OSAS) using polysomnography data, where the random forest model obtained the highest accuracy (90.91%) and relevant features such as respiratory rate and oxygen saturation were extracted. Another study [25] applies a machine learning model to predict the presence of OSAS using clinical and demographic data. The random forest model performed best, achieving an accuracy of 87.1%. The most important predictors were body mass index (BMI), age, and gender, as well as additional predictors such as neck circumference and smoking.
In another study by Eyvazlou et al. [26], an ANN model was developed to predict MetS based on sleep quality and work-related risk factors. The results showed that the ANN model could identify individuals at risk of MetS with a sensitivity of 74.1% and a specificity of 76.2%. Moreover, other studies [27,28] have also applied machine learning to understand the social determinants that affect and influence the health of individuals.
However, despite the excellent results described in previous studies, one of the most common challenges in medical diagnoses is the issue of class imbalance. This problem significantly impacts the performance of classifiers, as they tend to exhibit a bias towards the majority class, resulting in skewed outcomes. In this context, authors such as Kim et al. [29] propose a prediction model that utilizes balancing techniques to identify middle-aged Korean individuals at a high risk of MetS. The dataset used in their study comprises age, gender, anthropometric data, sleep quality, and blood indicators of 1991 individuals. The results showed that XGBoost (using Scikit-learn library in Python ver. 3.8.5), employing SMOTE, achieved an AUC of 85.1%.
The present study examines the connection between the SDI, sleep disturbances, types of nutrients consumed, and MetS within a cohort from Mexico City. We aim to identify critical factors that may be key to reducing MetS incidence or severity by applying machine learning algorithms. Additionally, we will use data balancing techniques to improve the predictive performance of our models and enhance feature selection. By incorporating these methods, we aim to uncover valuable insights and contribute to developing more accurate and practical approaches for addressing MetS.

2. Materials and Methods

2.1. Data

Data for this study were derived from the baseline assessment of a cohort called Tlalpan 2020 from the National Institute of Cardiology Ignacio Chávez in Mexico City [30]. This project was authorized by the Institutional Ethics Committee of the National Institute of Cardiology Ignacio Chávez under code 13-802. The dataset used in this investigation includes data from 3156 volunteers (all of whom were informed of the research purposes and signed a letter of informed consent) about their anthropometric measurements, consumption of alcohol and tobacco, level of physical activity, level of economic income, level of education, anxiety, family health history, biomedical evaluation, quality of sleep, and the amount of nutrients consumed.

2.1.1. Quality of Sleep

The sleep quality was measured using MOS [6], a self-report for assessing sleep quality and quantity. This questionnaire includes 12 items about sleep disruption, snoring, sleep shortness of breath or headache, sleep adequacy, and sleep somnolence; it additionally measures the number of hours of sleep per day over the previous four weeks. The MOS has been used in several studies, such as discriminating the quality of sleep among a Spanish postmenopausal population [9], diagnosing cases of apnea [7,8], and identifying sleep disturbance in patients with rheumatoid arthritis [31], among others.

2.1.2. Clinical and Anthropometric Parameters

Clinical and anthropometric data such as systolic blood pressure (SBP) and diastolic blood pressure (DBP) (measured according to standard procedure [32]) were collected, as well as waist circumference (WC), height, and weight (measured according to ISAK [33]). BMI and the waist-to-height ratio (WHtR) were calculated from these primary measurements.

2.1.3. Biochemical Evaluation

The following laboratory test measurements corresponding to blood samples were included: glucose (GLU), triglycerides (TRIG), HDL cholesterol (HDL), LDL cholesterol (LDL), uric acid (URIC), atherogenic index (IAT), and sodium (NA).

2.1.4. Social Development Index

Comprising key dimensions associated with education, health, and housing, the SDI incorporates specific indicators for the evaluation of each dimension. The weight assigned to each indicator varies based on its significance in the overall assessment of social development. The resulting scores are aggregated to yield a score for each dimension. The SDI value facilitates the ordering of territorial units based on their achieved levels of development, classified as Very Low, Low, Medium, and High [34,35].
The SDI indicators (as reported in reference [21]) are briefly described below:
  • Quality and available space in the home (QUA_HOUS): The quality of housing is measured by the type of flooring, and the amount of living space is indicated by the number of people per bedroom, with two being the standard.
  • Educational access (EDULAG): This indicator measures the proportion of people aged 18–59 who have completed secondary school or have received 13 years of schooling, which is considered a minimum standard for well-being.
  • Access to social security and/or Medical Service (HEALTHAC): This indicator measures the coverage of any of the Mexican health systems.
  • Durable goods (DURAB): This indicator measures possession of material goods whose value is equal to or greater than USD 17.81, or possession of at least three items such as a television, gas stove, computer, refrigerator, or washing machine.
  • Sanitary adequacy (SANITRY): This indicator measures the availability of a water supply, toilet facilities, and access to a drainage system.
  • Electricity access (ENER_AD): This indicator measures whether or not there is adequate access to electricity.

2.1.5. Habits and Factors Associated with Lifestyle

Furthermore, habit data were also collected, such as habitual smoking, alcohol consumption, and physical activity (calculated based on the International Physical Activity Questionnaire, IPAQ, Ref. [36] by metabolic equivalent minutes/week, which are classified in the following categories: low, moderate, and high).
Education level was collected and classified into three categories: primary school, high school, and university studies, as well as postgraduate school. Similarly, we collected the level of economic income, which was classified into three categories based on the Mexican peso income paid monthly: low (MXN 1.00 to MXN 6600.00), medium (MXN 6601.00 to MXN 11,000.00), and high (more than MXN 11,000.00).

2.1.6. Psychological Stress Level

We used the State-Trait Anxiety Inventory (STAI) to collect data about psychological stress levels, which were categorized into five categories: high (>65), moderate (56–65), medium (46–55), minor (36–45), and low (<35) [37,38].

2.1.7. Dietary Information

To gather information about the frequency of food consumption and other dietary products, we utilized a software tool called the “Evaluation of Nutritional Habits and Nutrient Consumption System” from the National Institute of Public Health [39]. This system examines the meals individuals have consumed over a day within the previous year and computes the quantity of nutrients ingested.
All data mentioned in this section are presented in Table 1.

2.2. Methods

2.2.1. Feature Selection

Feature selection is essential to identify and establish the most critical variables.
In this study, we employed logistic regression to measure the relationship between variables and class alongside machine learning algorithms to discern the most significant features. The algorithms used were RF and RPART (see Machine Learning Models below), applying the mean decrease impurity (MDI) for calculating variable importance, which can be expressed as follows:
$$ MDI_i = \sum_{\text{all nodes}} \left( \frac{Imp(node) - Weight.Imp(node)}{NS.N} \right) $$
where $MDI_i$ is the mean decrease impurity of the $i$th variable; $Imp(node)$ is the impurity of the node before the split; $Weight.Imp(node)$ is the weighted impurity of the child nodes resulting from the split; and $NS.N$ is the number of samples in the node before the split.
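As a rough illustration of the quantities in this formula, the following Python sketch computes the impurity decrease of one split and sums it over nodes as written above. It is not the authors' R code: the Gini index as the impurity measure, the function names, and the toy labels are all assumptions made for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def impurity_decrease(parent, left, right):
    """Imp(node) minus the sample-weighted impurity of the two child nodes."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children

def mdi(splits):
    """Sum over nodes of (impurity decrease / node size), following the formula in the text."""
    return sum(impurity_decrease(p, l, r) / len(p) for p, l, r in splits)

# Toy node (hypothetical data): 4 MetS-negative (0) and 4 MetS-positive (1) participants.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left = [0, 0, 0, 1]    # samples sent left by a hypothetical split on one variable
right = [0, 1, 1, 1]   # samples sent right

decrease = impurity_decrease(parent, left, right)   # 0.5 - 0.375 = 0.125
score = mdi([(parent, left, right)])                # 0.125 / 8
```

A split that separates the classes perfectly would give the largest possible decrease for this node (0.5 under the Gini assumption), which is why variables used in such splits rank highest.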

2.2.2. Balancing Methods

Balancing methods such as SMOTE and ADASYN have helped address the class imbalance issue within our dataset.
ADASYN (Adaptive Synthetic Sampling), which is part of the UBL R package, takes a unique approach by generating synthetic samples based on the local density of minority class instances, with a focus on instances that are more challenging to learn. In this method, the β parameter controls the desired balance rate between the minority and majority classes during the generation of synthetic samples. When β is set to a value greater than 1, a proportionally larger number of synthetic samples will be generated relative to the instances of the minority class. This further increases the ratio between the minority and majority classes.
The second method, SMOTE (Synthetic Minority Oversampling Technique) of the performanceEstimation R package (Version: 1.1.0), generates synthetic samples for the minority class. In SMOTE, the k parameter determines the number of nearest neighbors used to generate synthetic samples. A small value of k can lead to an excessive generation of synthetic samples that may be too close together, resulting in model overfitting. Moreover, if k is too large, synthetic samples may be less representative of the minority class and fail to capture data variability adequately.
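The interpolation idea behind SMOTE, and the role of k, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the performanceEstimation implementation used in the study: the function name `smote`, the Euclidean distance, and the toy points are hypothetical, and the sketch expects at least two minority samples.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of `base` (excluding itself), Euclidean distance
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbor)])
    return synthetic

# Hypothetical minority-class samples with two normalized features
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, a small k keeps new points close to their base sample, while a large k allows interpolation toward more distant neighbors.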

2.2.3. Machine Learning Models

To build the models, we applied two machine learning algorithms, RF [41,42] and RPART [43,44], as well as PCA [45,46]. RF, introduced by Breiman [47], is a machine learning algorithm that combines multiple decision trees to improve predictive accuracy. RPART (Recursive Partitioning and Regression Trees), by Breiman [48], works by recursively partitioning the input data based on predictor variables to create a tree-like structure. This algorithm aims to find the optimal splits in the data that maximize the homogeneity or purity of the resulting subgroups. Principal component analysis (PCA) is a data analysis technique used to simplify the complexity of data by reducing their dimensionality, facilitating visualization and analysis.

2.3. Performance Measures

We used sensitivity, specificity, and balanced accuracy (B.ACC) to evaluate model performance. These metrics provide a fair assessment of the model’s performance across all classes, considering the issue of class imbalance.
$$ SENS = \frac{TP}{TP + FN} $$
$$ SPC = \frac{TN}{FP + TN} $$
$$ B.ACC = \frac{1}{2} \left( \frac{TP}{P} + \frac{TN}{N} \right) $$
where P = Positive, N = Negative, TP = True Positive, FN = False Negative, TN = True Negative, and FP = False Positive, respectively.
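The three measures can be computed directly from confusion-matrix counts, as in the minimal Python sketch below. The counts are hypothetical and chosen only to show how balanced accuracy differs from plain accuracy when the classes are imbalanced; the function names are assumptions for the example.

```python
def sensitivity(tp, fn):
    """SENS = TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """SPC = TN / (FP + TN)."""
    return tn / (fp + tn)

def balanced_accuracy(tp, fn, tn, fp):
    """B.ACC = 1/2 * (TP / P + TN / N), with P = TP + FN and N = TN + FP."""
    return 0.5 * (sensitivity(tp, fn) + specificity(tn, fp))

# Hypothetical counts: 50 positives (MetS) and 200 negatives
tp, fn, tn, fp = 40, 10, 180, 20

sens = sensitivity(tp, fn)                   # 40 / 50 = 0.8
spec = specificity(tn, fp)                   # 180 / 200 = 0.9
bacc = balanced_accuracy(tp, fn, tn, fp)     # approximately 0.85
plain_acc = (tp + tn) / (tp + fn + tn + fp)  # 220 / 250 = 0.88
```

In this toy case plain accuracy (0.88) is pulled toward the majority-class specificity, whereas balanced accuracy (about 0.85) weights both classes equally.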

3. Statistical Analysis and Development of Prediction Models

All experiments were performed using the R programming language (3.6.1) [49]. Min-max was used to normalize continuous variables, and dichotomous variables were represented as numbers. Figure 1 provides a general overview of the experimental process described in this section. To develop predictive models, it was necessary to process the data and implement a balancing technique. The minority class was oversampled, taking into account the majority class. As a first step, SMOTE was applied, and it was necessary to determine the best value of k (number of nearest neighbors), so experiments were conducted by varying k (here, we present k = 1, k = 5, and k = 9). In this process, the dataset was randomly divided into 70% for training and 30% for testing. To accomplish this task, we applied two machine learning algorithms, RF and RPART. In the case of RF, we varied the mtry parameter from 1 to 10 and considered ntree values of 100, 300, 500, and 1000 for each model.
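The two preprocessing steps mentioned here, min-max normalization and a random 70/30 split, can be sketched as follows. This is an illustrative Python example, not the authors' R pipeline: the helper names, the seed, and the BMI values are hypothetical, and the order of scaling and splitting is an assumption the text does not specify.

```python
import random

def min_max(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical BMI values for ten participants
bmi = [22.0, 31.0, 27.5, 40.0, 25.0, 29.0, 35.0, 24.0, 33.0, 28.0]
scaled = min_max(bmi)
train, test = train_test_split(scaled, train_fraction=0.7, seed=1)
```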
Additionally, a subset of features was extracted in each created model using the variable importance (VarImp) of RF, and a 10-fold cross-validation was performed. Similarly, in the case of RPART, parameter tuning was conducted by considering cp = 0, cp = 0.05, and cp = 0.005, using a 10-fold cross-validation. Likewise, a subset of features was extracted in each created model.
Once the feature subsets were obtained, along with the optimal value for each corresponding parameter of each algorithm and data balancing technique, we tested the generated feature subsets using RF and RPART. In all experiments, a minimum of 30 independent runs with 30 different seeds were conducted for each algorithm to assess the performance of each model. The mean and standard deviation of the performance measures were calculated across these runs.
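The aggregation over seeded runs can be illustrated with a short Python sketch. The function `run_once` is a hypothetical stand-in for one complete train-and-evaluate cycle and simply returns a simulated balanced accuracy; the use of the sample standard deviation is also an assumption, since the text does not say which estimator was used.

```python
import random
import statistics

def run_once(seed):
    """Hypothetical stand-in for one train/evaluate run.
    Returns a simulated balanced accuracy that depends only on the seed."""
    rng = random.Random(seed)
    return 0.85 + rng.uniform(-0.01, 0.01)

# 30 independent runs, one per seed
scores = [run_once(seed) for seed in range(30)]

mean_bacc = statistics.mean(scores)
sd_bacc = statistics.stdev(scores)  # sample standard deviation (assumed)
```

Reporting the mean together with the standard deviation shows how sensitive a model is to the random split and initialization, which is the purpose of repeating the runs.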

4. Results

Understanding how MetS, nutrition, sleep disturbances, and SDI relate in men and women can have important clinical and public health implications. In this study, we used logistic regression before dataset balancing to pinpoint the critical variables associated with MetS in both sexes. Table 2 presents the results of the features and their corresponding values obtained.
Analyzing the data, in men, the top 10 variables most related to MetS are GLU, TRIG, WC, IAT, SBP, vitamin B12 (B12), BMI, lactose (LACT), carbohydrates (CARBO), and high glucose levels based on the dietary survey (GLU_1). Conversely, in women, the ten most relevant variables include GLU, TRIG, WC, BMI, SBP, total proteins (PROTEI), fructose (FRUCT), high total cholesterol based on the dietary survey (CHOL_SN), URIC, and copper (CU). Figure 2 visualizes these prominent features from the logistic regression for both men and women. Red square symbols represent the most substantial variables for women, while blue triangles represent those for men. A cautionary note is warranted regarding the seemingly outlier behavior of blood glucose and triglycerides, which show very high coefficients: these features are closely related to the very definition of MetS. Such variables were included in our models only for the sake of database completeness and comprehensiveness. Detailed results for women can be found in Supplementary Table S1, and those for men are available in Supplementary Table S2.
Subsequently, we employed SMOTE and ADASYN with RF and RPART to reassess the most influential features associated with MetS prediction within a now balanced dataset. With the data balancing techniques applied and their parameters fine-tuned, we extracted feature subsets using RPART and RF for both women and men. Extracting features related to MetS in a balanced dataset improves model generalization (training proceeds more evenly and accurately), optimizes performance, and reduces overfitting. Considering the challenges associated with including all variables in a model, such as noise, redundancy, and overfitting, we extracted the 17 variables with the highest values obtained in each model of RF and RPART after applying SMOTE and ADASYN.
The extracted feature subsets, along with their respective values, are presented in Table 3, Table 4, Table 5 and Table 6. These tables also detail the balancing technique employed for each set of variables and its corresponding parameter (B for ADASYN and k for SMOTE), for which values of 1 and 5 were considered.
Similarly, Table 7 showcases the performance achieved by the RF algorithm, while Table 8 presents the performance of the RPART algorithm. In both tables, the Value column provides information regarding the relative importance of each feature.

4.1. Best Features for Men Using RF and ADASYN/SMOTE

Specifically, Table 3 exhibits four feature subsets obtained from male data using random forest with ADASYN and SMOTE. According to Table 7, the most effective subset was obtained by applying ADASYN with B = 1, with a balanced accuracy of 86.22% and a standard deviation of 0.26%.
The most influential factor within this subset was BMI, which had an importance value of 92.9499. This was followed by WEIGHT and electricity access (ENER_AD), with importance values of 49.4782 and 48.8887, respectively. Other factors such as educational lag (EDULAG), common-law marriage (LIV_TOG), durable goods (DURAB), and maternal gout history (MOTHERGT) also contributed to the model, albeit to a lesser extent.

4.2. Best Features for Men Using RPART and ADASYN/SMOTE

In the case of features obtained by RPART (see Table 4), using both SMOTE and ADASYN, the results were slightly worse than those obtained with RF (Table 3). In this scenario, the best subset was obtained using ADASYN with B = 5, which achieved a balanced accuracy of 82.32% with a standard deviation of 0.99% (see Table 8).
In the subset obtained by RPART with ADASYN using a B value of 5, BMI had the highest importance value (683.74), indicating its leading role in predicting MetS. It was followed closely by ENER_AD and EDULAG, with values of 619.99 and 565.33, respectively, both making substantial contributions to predictive capability. ALCOHOL and WEIGHT also showed noteworthy importance, with values of 355.97 and 295.25. Features like divorce (DIVORC), no academic degree (NONE), and MOTHERGT had a comparatively lower influence but still contributed to the model’s predictive capacity, as indicated by their respective values.

4.3. Best Features for Women Using RF and ADASYN/SMOTE

The random forest model using SMOTE with k = 5 achieved the best performance for women, reaching a balanced accuracy of 88.50% with a standard deviation of 0.40% (see Table 7). In this case, Table 5 reveals that BMI was identified as the primary predictor, with a value of 484.31, highlighting the critical importance of BMI in predicting MetS in this particular context. Additionally, IAT (481.48) and WEIGHT (339.17) also showed significant associations, further emphasizing the relevance of weight-related measurements.
Including sleep disturbances (SLPSNR1, SLPSOB1, BREATH, DROWSY, and SLPNOTQ) and even cholesterol levels (CHOL_ANT) among the influential variables underscores their pivotal contributions to MetS prediction in women. The importance of AGE and SDI parameters like sanitary adequacy (SANITRY) is also noteworthy. It is essential to highlight that psychological factors such as trait anxiety (TRAIT_ANX) were included, accounting for the potential influence of mental health aspects in MetS prediction.

4.4. Best Features for Women Using RPART and ADASYN/SMOTE

In this instance, SMOTE with k = 5, combined with RPART, achieved the best performance, attaining a balanced accuracy of 84.49% with a standard deviation of 1.43% (see Table 8). The results of the corresponding subset (RPART applied to women’s data using SMOTE with a parameter value of k = 5) shown in Table 6 reveal that the most influential feature was IAT, with a value of 483.23, followed closely by BMI and WEIGHT, which have values of 410.37 and 409.78, respectively. Features like URIC, snores during sleep (SLPSNR1), somnolence (SLPS3), SODIUM, vitamin E consumption (VITE), and habitual smoking (SMOKING) also exhibit noticeable influence, indicating their relevance in understanding the targeted phenomenon. Conversely, some nutrients like sucrose (SUCR), maltose (MALT), and FRUCT have relatively lower values; however, they can provide valuable information about dietary habits, nutritional deficiencies, or behaviors related to MetS.
This study’s results, employing random forest and RPART algorithms and SMOTE and ADASYN techniques for both genders, offer valuable insights. These results underscore the importance of health and lifestyle elements in MetS prediction, encompassing sleep disturbances, cholesterol levels, age, psychological factors, and SDI parameters.

4.5. Analyzing the Best Features Using PCA

Based on the results of the features obtained in the best models, we used PCA to visually and graphically analyze the top features for men and women to explore potential correlations and latent patterns among these influential factors and reduce dimensionality to the greatest possible extent.
In the case of men, we considered feature subsets obtained from the random forest model using ADASYN with B = 1 and RPART with ADASYN and B = 5. The subsequent features were integrated: BMI, WEIGHT, ENER_AD, EDULAG, LIV_TOG, DURAB, MOTHERGT, IAT, HEALTHAC, DIVORC, QUA_HOUS, STRATUM, FATHERGT, NONE, MARRIED, VALUE, URIC, SANITRY, SINGLE, and ALCOHOL.
For women, we considered feature subsets obtained from the random forest model with SMOTE and k = 5 and the RPART model with SMOTE and k = 5. These models were selected because they achieved the highest performance (see Table 7 and Table 8, where extremely small percentage uncertainty values in Table 8 are shown rounded down to 0.00 for clearer presentation). The following features were included: BMI, IAT, WEIGHT, URIC, SLPSNR1, CHOL_ANT, AGE, SLPSOB1, BREATH, TRAIT_ANX, SMO_PASS, SANITRY, MOTHERDL, DROWSY, SMOKING, SINGLE, EXSMOKER, SEC_SCHOOL, SLPNOTQ, SLPS3, SODIUM, ALCOHOL, SATFAT, MONFAT, NA, VITE, FATHERDB, SUCR, MARRIED, FRUCT, ZN, and MALT.
The PCA analysis, as shown in Figure 3, revealed the relative importance of features concerning MetS in men. The first principal component (PC1) was more influenced by features such as WEIGHT, BMI, and SDI by value (VALUE), suggesting that these variables significantly contributed to the observed variability in the data. On the other hand, the second principal component (PC2) was more affected by features like EDULAG and socioeconomic stratum (STRATUM). These findings indicated that weight and BMI were prominent factors in the context of MetS, as well as education and socioeconomic stratum. In this case, PC1 was considered the most significant component, as it had a magnitude of 0.508501, capturing most of the variability, while PC2 had a magnitude of 0.499809.
On the other hand, in the case of women (see Figure 4), features associated with the variability of MetS along PC1 were sodium levels based on the dietary survey (SODIUM), saturated fat (SATFAT), and monounsaturated fat (MONFAT), which exhibit significant magnitudes in PC1. Furthermore, BMI significantly influences PC1, indicating its association with this variability. Conversely, variables like short sleep duration (SLPSOB1) and waking up with shortness of breath (BREATH) demonstrate significant magnitudes in PC2. Similarly, TRAIT_ANX and feeling drowsy or sleepy (DROWSY) are associated with variability in PC2. Therefore, considering the magnitudes in the principal components, the features in women associated with the risk of MetS include SODIUM, SATFAT, and MONFAT from PC1, as well as SLPNOTQ and SLPSOB1 from PC2.
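The loading magnitudes discussed in this subsection are the entries of the principal component vectors. The Python sketch below shows, under simplifying assumptions, how the first component can be obtained from a covariance matrix by power iteration. It is not the procedure used in the study: the toy data, the column meanings, the absence of standardization, and the starting vector are all assumptions, and power iteration fails if the starting vector happens to be orthogonal to the leading component or if the covariance matrix is all zeros.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Leading eigenvector of `cov` by power iteration, scaled to unit length."""
    d = len(cov)
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-uniform start (assumed)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical data: columns stand for two strongly related measurements
# (the second is exactly twice the first) and one binary indicator.
rows = [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 0.0], [4.0, 8.0, 1.0]]
pc1 = first_component(covariance_matrix(rows))
```

In this toy example the second column carries the largest loading on the first component because it has the largest variance, which mirrors how high-variance features dominate PC1 when variables are not standardized.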

5. Discussion

MetS is a severe and potentially life-threatening condition that significantly increases the risk of developing cardiovascular diseases and also increases the severity of diabetes. Over the years, several risk factors have been consistently associated with MetS. This study first analyzed participant data from a cohort in their original, imbalanced form to identify the primary risk factors in both men and women. Subsequently, data balancing techniques were applied to ascertain whether significant differences exist and to support the selection of risk factors for MetS prediction. Using data balancing techniques is crucial in this context, as it helps ensure a more accurate and unbiased identification of relevant risk factors, especially when working with unevenly distributed data. In the imbalanced setting, we applied logistic regression to identify the risk factors in men and women that predict the occurrence of MetS.
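The reason class imbalance matters can be illustrated with a short sketch. The counts below are hypothetical (90 participants without MetS and 10 with MetS) and the helper names are assumptions introduced only for this example; the point is that a model that never predicts MetS still reaches high accuracy while detecting no cases.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def summary(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and balanced accuracy."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Hypothetical imbalanced sample: 0 = no MetS, 1 = MetS.
y_true = [0] * 90 + [1] * 10
# A degenerate classifier that always predicts the majority class.
always_negative = [0] * len(y_true)

result = summary(y_true, always_negative)
for metric, value in result.items():
    print(f"{metric}: {value:.2f}")
```

Metrics such as sensitivity and balanced accuracy expose this failure, which is why evaluation on imbalanced data should not rely on accuracy alone.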

5.1. Logistic Regression

The logistic regression analysis in women demonstrates, as expected, the strong connection between MetS and elevated glucose levels, which is in line with prior research [50,51] and emphasizes the crucial role of glucose in MetS. Uric acid is also identified as a significant risk factor in women [52,53,54]. Subsequent findings revealed other risk factors, including waist circumference (WC), BMI, and systolic blood pressure, which are all essential components of MetS. WC is an indicator of abdominal obesity closely linked to insulin resistance, while BMI reflects the relationship between weight and height, which is a significant obesity-related risk factor for MetS.
Furthermore, Figure 2 highlights additional significant factors derived from dietary data, including the intake of protein and fructose [55,56,57]. When these two nutrients are combined, they have been linked to an elevated risk of MetS [58]. Likewise, copper intake emerged as a relevant factor; copper can affect glucose regulation [3] and liver function, both of which are crucial in MetS [59]. These factors underscore the importance of moderate consumption of these nutrients in preventing MetS.
In the case of men, glucose was identified as the primary factor associated with MetS, followed by triglycerides, waist circumference, atherogenic index, and systolic blood pressure. Among the nutrients, the consumption of lactose [60] and carbohydrates [61] was also noted. Elevated glucose, triglycerides, and waist circumference are critical markers of MetS, while the atherogenic index assesses cardiovascular risk. High systolic blood pressure is another significant component of this syndrome. Regarding lactose, it is worth noting that certain dairy products may include added sugars, which can increase the overall calorie intake [62]. This potentially contributes to obesity and insulin resistance, which are two critical factors in the onset of MetS. Moreover, high lactose consumption has been associated with a higher risk of developing diabetes, cardiovascular diseases, and increased cholesterol levels [63,64].
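The way a logistic regression coefficient translates into a statement about risk can be sketched as follows. This is a simplified, single-predictor illustration fitted by plain gradient ascent, not the model specification used in this study; the standardized glucose values, the outcome labels, and the function names are assumptions made up for the example.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += lr * sum(errors) / n
        b1 += lr * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

# Hypothetical standardized glucose values and MetS status (1 = MetS).
# The classes overlap on purpose so that the estimate stays finite.
glucose_z = [-1.5, -1.0, -0.5, -0.2, 0.0, 0.3, 0.6, 1.0, 1.4, 1.8]
mets = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(glucose_z, mets)
# exp(b1) is the odds ratio per one standard deviation increase in glucose.
odds_ratio = math.exp(b1)

print(f"intercept = {b0:.3f}, coefficient = {b1:.3f}")
print(f"odds ratio per 1 SD of glucose = {odds_ratio:.2f}")
```

An odds ratio above 1 indicates that higher values of the predictor go together with higher odds of MetS in the fitted model; with several predictors, each coefficient is interpreted while holding the others fixed.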
When working with unbalanced datasets, machine learning models such as logistic regression can be biased towards the majority class. For this reason, data balancing techniques such as SMOTE and ADASYN were used to enable a more equitable training of the models and to identify more precise relationships between the variables and MetS.
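The core idea behind SMOTE, creating synthetic minority-class records by interpolating between a minority sample and one of its k nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration and not the implementation used in this study: the function name `smote`, the two-column toy values, and the choice to pick base samples at random (rather than cycling through every minority sample) are assumptions made for brevity.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between randomly
    chosen minority samples and one of their k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class records (two numeric features per participant).
minority = [[6.1, 31.0], [6.4, 33.5], [7.0, 30.2], [6.8, 35.1]]
n_majority = 12  # hypothetical size of the majority class

# Create enough synthetic records to match the majority class size.
synthetic = smote(minority, n_new=n_majority - len(minority), k=3, seed=42)
balanced_minority = minority + synthetic
print(len(balanced_minority), "minority records after oversampling")
```

Because each synthetic record lies on a line segment between two real minority records, it stays within the range of the observed minority data. ADASYN follows the same interpolation principle but generates more synthetic records around minority samples that are surrounded by majority-class neighbours.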

5.2. Use of Machine Learning with Synthetic Data

The most effective machine learning models for women revealed associations with attributes related to sleep quality, such as snoring during sleep [65], short sleep duration (SLPSOB1) [66], waking up with shortness of breath (BREATH) [67], restless sleep (SLPNOTQ) [68], and somnolence (SLPS3). Multiple studies have shown that poor sleep quality is closely linked to cardiovascular disease [69,70], diabetes [71], and MetS [72], as well as other adverse health outcomes. Women, especially those in the postmenopausal stage, have been observed to face an increased likelihood of significant risks related to cardiovascular diseases and sleep problems, which, in turn, can contribute to the development of risks associated with MetS [73]. Additionally, the models highlighted factors related to anxiety (TRAIT_ANX). Although the association between MetS and anxiety remains a subject of debate due to various issues [74], this study, like some others [75,76,77,78], identified anxiety as one of the critical factors predisposing women to MetS.
In the same way, ex-smoker and current smoker status (EXSMOKER, SMOKING) were found to be relevant features, indicating that both smokers and former smokers are predisposed to MetS. This finding is supported by various studies suggesting that smoking can have an adverse impact on blood lipid levels and lead to metabolic disturbances [79,80,81].
In women, nutritional components also appeared as relevant features, such as SATFAT, MONFAT, SUCR, FRUCT, and MALT. Consistent with this, a previous study found that fructose, sucrose, and maltose are critical components of the leading nutrient pattern associated with a higher risk of MetS [58].
In the case of men, the most effective machine learning models displayed more pronounced associations with features linked to the SDI, encompassing ENER_AD, EDULAG, durable goods (DURAB, HEALTHHAC), quality and living space (QUA_HOUS), socioeconomic stratum (STRATUM), social development index by value (VALUE), and sanitary adequacy (SANITRY). Several studies [22,82,83,84] have reported a significant association between a low socioeconomic level and the prevalence of metabolic syndrome. Furthermore, these models underscored variables related to parental gout (MOTHERGT, FATHERGT). This is in line with research exploring the genetic predisposition to gout, which suggests that a family history of this disease may increase the risk of other family members developing it [85]. This condition may also be related to metabolic syndrome due to poor dietary habits that could lead to obesity and insulin resistance [86,87].

5.3. Principal Component Analysis

Based on the features obtained for men and women via the machine learning models, we applied principal component analysis to identify trends and potential correlations. The PCA conducted using the features obtained for men (Figure 3) showed that PC1 (the most significant component) was strongly associated with body-related factors, specifically WEIGHT and BMI, whereas PC2 showed a strong correlation among variables related to the SDI. This suggests that the SDI plays a significant role in MetS and should be considered alongside interventions focused on weight and obesity management.
Figure 3 depicts the distribution of participants in clusters, where Cluster 1, highlighted in green, turned out to be the cluster most predisposed to developing MetS. The arrows emphasize the contribution of individual features to the principal components.
In the context of MetS in women, the most influential factors in PC1 were dietary components such as sodium levels based on the dietary survey (SODIUM), saturated fat (SATFAT), monounsaturated fat (MONFAT), sucrose (SUCR), and FRUCT, among others. PC2 exhibits a trend towards variables related to poor sleep quality and anxiety, as SLPSOB1, TRAIT_ANX, SLPNOTQ, and SLPS3 have significant values in this component. Other variables related to smoking and education (SEC_SCHOOL) also notably influence this component. This suggests that dietary control, together with addressing poor sleep quality and anxiety, is crucial in preventing MetS among women. Hence, PCA highlights relevant differences in the presentation and risk factors of MetS between men and women [88,89], which is an issue that is progressively gaining relevance in the biomedical literature [90].
The PCA results for women illustrated in Figure 4 show the distribution of participants in clusters. Similarly to the men’s analysis, the cluster most predisposed to developing MetS was Cluster 1, which is depicted by yellow dots.

5.4. Implications for Metabolic Syndrome Surveillance, Risk Factors, and Public Health Policy

The results of this project suggest several key findings related to the diagnosis of metabolic syndrome:
  • Identification of Known Risk Factors: For both men and women, specific variables were identified as strongly related to MetS. These included glucose (GLU), triglycerides (TRIG), waist circumference (WC), body mass index (BMI), and systolic blood pressure (SBP), among others. Notably, these variables are consistent with established criteria for diagnosing MetS, reflecting their importance in understanding the condition.
  • Gender-Specific Influential Factors: This study highlights that certain factors vary in importance between men and women in predicting MetS. For instance, vitamin B12, lactose, and carbohydrates were influential in men, while total proteins, fructose, and copper were significant for women. These gender-specific variations underscore the complexity of MetS and the need for tailored diagnostic approaches. As a cautionary note, the potential outliers, specifically blood glucose and triglycerides, are closely tied to the definition of MetS itself.
  • Influence of Sleep and Dietary Habits: The inclusion of sleep-related variables (sleep disturbances, breathing issues) and dietary elements (cholesterol levels, nutrients) underscores their relevance in predicting MetS. These findings suggest that lifestyle factors and dietary habits are integral components in the diagnostic considerations for MetS.
  • Potential Role of Psychological Factors: Psychological factors such as trait anxiety were included in the analysis, emphasizing the potential influence of mental health aspects in predicting MetS for both men and women.
  • Gender-Specific Dietary Influences: For women, the analysis identified specific dietary factors like sodium levels, saturated fat, and monounsaturated fat as influential. This emphasizes the importance of considering gender-specific dietary influences in MetS diagnosis.
Understanding the gender-specific variations and influential factors highlighted in this study can inform targeted interventions that address the unique needs of both men and women. Public health policies can be crafted to recognize and address the gender-specific variations in the risk factors for MetS. By tailoring interventions to the specific needs of each gender, policymakers can enhance the effectiveness of preventive measures. Moreover, this study underscores the importance of lifestyle factors, including sleep patterns and dietary habits, in predicting MetS. Public health initiatives can thus prioritize educational campaigns and interventions promoting healthier sleep practices and balanced diets. Encouraging regular physical activity, reducing sedentary behaviors, and emphasizing the significance of maintaining a healthy weight can be integral components of public health programs aimed at preventing MetS.
Given the inclusion of psychological factors such as trait anxiety in the analysis, public health policies can integrate mental health considerations into MetS prevention strategies. Mental health awareness campaigns, stress management programs, and access to mental health resources can contribute to holistic approaches addressing the interconnectedness of mental and physical well-being. Public health campaigns can leverage the study’s findings to engage communities and raise awareness about the risk factors associated with MetS. Community-based initiatives can offer educational resources, workshops, and screenings to empower individuals to make informed lifestyle choices. By fostering a culture of health consciousness and providing accessible information, public health policies can contribute to the early detection and prevention of MetS. In view of the evolving nature of health trends and behaviors, public health policies should include mechanisms for continuous monitoring and adaptation. Regular assessments of the population’s health status, behavior patterns, and response to interventions can inform policy adjustments. This dynamic approach ensures that public health strategies remain effective and responsive to changing circumstances.

5.5. The Role of Social Development Dimensions in Metabolic Syndrome

It is worthwhile to recall that after applying balancing techniques, relevant associations arise between metabolic syndrome and some SDI dimensions (see Figure 5). These effects are moderate in size yet statistically significant. Since some of these aspects may be modifiable by public policy, it is relevant to consider them. Metabolic syndrome has previously been reported to be related to social dimensions and inequality, as well as to dietary patterns [91,92]. Interestingly, Soofu and coworkers [91] also report the association that we found between MetS and housing conditions and ownership of durable assets. Inadequate housing conditions, in particular, have been discussed as contributing to an increased risk of cardiovascular disease [93]. In fact, local residential environments may constitute significant risk factors for MetS, which needs to be considered in order to develop environmental interventions to improve population health [94].
Restricted access to education (referred to as EDULAG in Figure 5) has also been considered a relevant feature related to MetS [95,96]. Indeed, education levels have been found to be among the best predictors of metabolic conditions in another Mexico City cohort [97]. A similar association has been reported with regard to housing (QUA_HOUS in Figure 5) [98,99]. A study in an urban Korean population found that non-apartment residents were more likely to have MetS and related phenotypes than apartment residents in a model adjusted for confounding variables such as sociodemographic characteristics, residence area, health behavior, and nutritional information awareness [93]. Sanitary conditions are known to modify both environmental conditions and intrinsic factors such as the gut microbiota, affecting the development of MetS [100,101,102]. All of these dimensions of social development are related in a non-trivial fashion to the development of the complex pathophenotypes making up metabolic syndrome, as is further evidenced by our study. However, the actual relationships between these and other risk factors remain open questions that must be studied in order to design targeted public health interventions.

6. Conclusions

In this study, logistic regression was initially utilized to identify pivotal factors linked to MetS across genders, followed by dataset balancing techniques. Our findings indicated significant variables for men, including high glucose levels, triglycerides, waist circumference, systolic blood pressure, vitamin B12, body mass index, high intake of carbohydrates, and lactose. For women, critical factors were glucose levels, triglycerides, waist circumference, body mass index, systolic blood pressure, total protein intake, fructose, cholesterol, uric acid, and copper levels. Further analysis employing SMOTE and ADASYN with RF and RPART methods re-evaluated critical features for MetS prediction in a balanced dataset. This improved model generalization by ensuring more consistent and precise training, enhancing performance, and minimizing overfitting risks. Notably, the analysis also highlighted the relevance of family history of gout as a significant factor, particularly among men. This finding underscores the potential genetic predisposition to gout, suggesting that a familial history of the condition might increase the likelihood of MetS in relatives, possibly due to shared dietary habits contributing to obesity and insulin resistance. These insights emphasize the need for gender-specific public health strategies and medical interventions, considering both the common risk factors and those unique to each gender, such as the family history of gout, to effectively manage and prevent MetS.

Limitations

The current study has some limitations. This research was based on data from a cohort of relatively healthy adult residents of Mexico City. The regional emphasis of the study might affect generalizability; therefore, it is advisable to exercise caution when extrapolating the findings to wider populations. All data on socioeconomic status, lifestyle habits, family medical history, and macro- and micronutrient intake were self-reported. Although we trust the veracity of the information, some details may have been omitted or not remembered by the participants. Likewise, the instruments applied to evaluate physical activity, state of anxiety, and sleep quality are practical and easy to apply, but their effectiveness also depends on the truthfulness of the informants. Another limitation is our reliance on SDI data published by the Government of Mexico City, requiring trust in the data quality from this secondary source. Also, it is crucial to note that the cross-sectional design hinders causal inference, underscoring the need for future longitudinal investigations. Nevertheless, we were able to provide a comprehensive overview of the associations between metabolic syndrome, sleep disorders, the consumption of some nutrients, and contextual social development data such as quality and available space in the home, educational access, access to social security and/or medical services, durable goods access, sanitary adequacy, and electricity access. Moreover, as data balancing techniques continue to evolve, a variety of methods are emerging. However, in this study, we addressed only two of the most frequently used methods, ADASYN and SMOTE. It is important to highlight that we conducted only internal validation for our methods, emphasizing the necessity for external validation in larger populations in future studies.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/nu16050612/s1.

Author Contributions

G.G.-E.: Conceptualization, Data Curation, Investigation, Software, Validation, Writing—original draft, and Writing—review and editing; M.M.-G.: Investigation, Supervision, Data Curation, and Writing—review and editing; T.R.-d.: Investigation, Software, Data Curation, and Writing—review and editing; L.E.G.-M.: Data Curation and Writing—review and editing; M.F.M.: Investigation, Project Administration, and Writing—review and editing; T.P.: Investigation, Project Administration, and Writing—review and editing; L.M.A.-G.: Investigation, Project Administration, Supervision, and Writing—review and editing; E.H.-L.: Conceptualization, Formal Analysis, Methodology, Investigation, Funding acquisition, and Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Council of Humanities, Sciences, and Technologies (CONAHCYT, México), Cátedras CONAHCYT 1591, and by Intramural funds from the National Institute of Genomic Medicine (INMEGEN, México).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Ethics Committee of the National Institute of Cardiology Ignacio Chavez under code 13-802 (approval date 19 February 2013) for studies involving humans.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

All relevant data are contained within the article. The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding authors.

Acknowledgments

The authors gratefully acknowledge Maite Vallejo and Tlalpan 2020 project advisory group for their logistic support in this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Raffaitin, C.; Feart, C.; Le Goff, M.; Amieva, H.; Helmer, C.; Akbaraly, T.; Tzourio, C.; Gin, H.; Barberger-Gateau, P. Metabolic syndrome and cognitive decline in French elders: The Three-City Study. Neurology 2011, 76, 518–525.
  2. Lin, S.C.; Sun, C.A.; You, S.L.; Hwang, L.C.; Liang, C.Y.; Yang, T.; Bai, C.H.; Chen, C.H.; Wei, C.Y.; Chou, Y.C. The link of self-reported insomnia symptoms and sleep duration with metabolic syndrome: A Chinese population-based study. Sleep 2016, 39, 1261–1266.
  3. Zhang, Y.; Jiang, X.; Liu, J.; Lang, Y.; Liu, Y. The association between insomnia and the risk of metabolic syndrome: A systematic review and meta-analysis. J. Clin. Neurosci. 2021, 89, 430–436.
  4. Romero-Martínez, M.; Shamah-Levy, T.; Cuevas-Nasu, L.; Gómez-Humarán, I.M.; Gaona-Pineda, E.B.; Gómez-Acosta, L.M.; Rivera-Dommarco, J.Á.; Hernández-Ávila, M. Diseño metodológico de la encuesta nacional de salud y nutrición de medio camino 2016. Salud Pública México 2017, 59, 299–305.
  5. Jiménez-Genchi, A.; Caraveo-Anduaga, J. Crude and adjusted prevalence of sleep complaints in Mexico City. Sleep Sci. 2017, 10, 113.
  6. Stewart, A.L.; Ware, J.E. Measuring Functioning and Well-Being: The Medical Outcomes Study Approach; Duke University Press: Durham, NC, USA, 1992.
  7. Kim, M.K.; You, J.A.; Lee, J.H.; Lee, S.A. The reliability and validity of the Korean version of the Medical Outcomes Study-Sleep Scale in patients with obstructive sleep apnea. Sleep Med. Res. 2011, 2, 89–95.
  8. Akçay, B.D.; Akcay, D.; Yetkin, S. Turkish reliability and validity study of the medical outcomes study (MOS) sleep scale in patients with obstructive sleep apnea. Turk. J. Med. Sci. 2021, 51, 268–279.
  9. Zagalaz-Anula, N.; Hita-Contreras, F.; Martínez-Amat, A.; Cruz-Díaz, D.; Lomas-Vega, R. Psychometric properties of the medical outcomes study sleep scale in Spanish postmenopausal women. Menopause 2017, 24, 824–831.
  10. Kern, H.J.; Mitmesser, S.H. Role of nutrients in metabolic syndrome: A 2017 update. Nutr. Diet. Suppl. 2018, 2018, 13–26.
  11. Symonds, M.E.; Sebert, S.P.; Hyatt, M.A.; Budge, H. Nutritional programming of the metabolic syndrome. Nat. Rev. Endocrinol. 2009, 5, 604–610.
  12. Feldeisen, S.E.; Tucker, K.L. Nutritional strategies in the prevention and treatment of metabolic syndrome. Appl. Physiol. Nutr. Metab. 2007, 32, 46–60.
  13. García-García, F.J.; Monistrol-Mula, A.; Cardellach, F.; Garrabou, G. Nutrition, bioenergetics, and metabolic syndrome. Nutrients 2020, 12, 2785.
  14. Castro-Barquero, S.; Ruiz-León, A.M.; Sierra-Pérez, M.; Estruch, R.; Casas, R. Dietary strategies for metabolic syndrome: A comprehensive review. Nutrients 2020, 12, 2983.
  15. Tamura, Y.; Omura, T.; Toyoshima, K.; Araki, A. Nutrition management in older adults with diabetes: A review on the importance of shifting prevention strategies from metabolic syndrome to frailty. Nutrients 2020, 12, 3367.
  16. Tørris, C.; Småstuen, M.C.; Molin, M. Nutrients in fish and possible associations with cardiovascular disease risk factors in metabolic syndrome. Nutrients 2018, 10, 952.
  17. Jung, H.; Dan, H.; Pang, Y.; Kim, B.; Jeong, H.; Lee, J.E.; Kim, O. Association between dietary habits, shift work, and the metabolic syndrome: The Korea nurses’ health study. Int. J. Environ. Res. Public Health 2020, 17, 7697.
  18. Al-Daghri, N.M.; Khan, N.; Alkharfy, K.M.; Al-Attas, O.S.; Alokail, M.S.; Alfawaz, H.A.; Alothman, A.; Vanhoutte, P.M. Selected dietary nutrients and the prevalence of metabolic syndrome in adult males and females in Saudi Arabia: A pilot study. Nutrients 2013, 5, 4587–4604.
  19. Bian, S.; Gao, Y.; Zhang, M.; Wang, X.; Liu, W.; Zhang, D.; Huang, G. Dietary nutrient intake and metabolic syndrome risk in Chinese adults: A case–control study. Nutr. J. 2013, 12, 106.
  20. Wang, J.; Li, C.; Li, J.; Qin, S.; Liu, C.; Wang, J.; Chen, Z.; Wu, J.; Wang, G. Development and internal validation of risk prediction model of metabolic syndrome in oil workers. BMC Public Health 2020, 20, 1828.
  21. EVALUA-DF. Metodología para la construcción del Índice de Desarrollo Social de Unidades territoriales del Distrito Federal. In Índice del Desarrollo Social de las Unidades Territoriales del Distrito Federal, Delegación, Colonia y Manzana 2011, 1st ed.; Consejo de Evaluación del Desarrollo Social del Distrito Federal. Gobierno del Distrito Federal: Ciudad de México, Mexico, 2011; Volume 1, pp. 14–43.
  22. Abbate, M.; Pericas, J.; Yañez, A.M.; López-González, A.A.; De Pedro-Gómez, J.; Aguilo, A.; Morales-Asencio, J.M.; Bennasar-Veny, M. Socioeconomic inequalities in metabolic syndrome by age and gender in a Spanish working population. Int. J. Environ. Res. Public Health 2021, 18, 10333.
  23. Blanquet, M.; Legrand, A.; Pélissier, A.; Mourgues, C. Socio-economics status and metabolic syndrome: A meta-analysis. Diabetes Metab. Syndr. Clin. Res. Rev. 2019, 13, 1805–1812.
  24. Mencar, C.; Gallo, C.; Mantero, M.; Tarsia, P.; Carpagnano, G.E.; Foschino Barbaro, M.P.; Lacedonia, D. Application of machine learning to predict obstructive sleep apnea syndrome severity. Health Inform. J. 2020, 26, 298–317.
  25. Keshavarz, Z.; Rezaee, R.; Nasiri, M.; Pournik, O. Obstructive sleep apnea: A prediction model using supervised machine learning method. Importance Health Inform. Public Health Pandemic 2020, 272, 387–390.
  26. Eyvazlou, M.; Hosseinpouri, M.; Mokarami, H.; Gharibi, V.; Jahangiri, M.; Cousins, R.; Nikbakht, H.A.; Barkhordari, A. Prediction of metabolic syndrome based on sleep and work-related risk factors using an artificial neural network. BMC Endocr. Disord. 2020, 20, 169.
  27. Seligman, B.; Tuljapurkar, S.; Rehkopf, D. Machine learning approaches to the social determinants of health in the health and retirement study. SSM-Popul. Health 2018, 4, 95–99.
  28. Baqui, P.; Marra, V.; Alaa, A.M.; Bica, I.; Ercole, A.; van der Schaar, M. Comparing COVID-19 risk factors in Brazil using machine learning: The importance of socioeconomic, demographic and structural factors. Sci. Rep. 2021, 11, 15591.
  29. Kim, J.; Mun, S.; Lee, S.; Jeong, K.; Baek, Y. Prediction of metabolic and pre-metabolic syndromes using machine learning models with anthropometric, lifestyle, and biochemical factors from a middle-aged population in Korea. BMC Public Health 2022, 22, 664.
  30. Colín-Ramírez, E.; Rivera-Mancía, S.; Infante-Vázquez, O.; Cartas-Rosado, R.; Vargas-Barrón, J.; Madero, M.; Vallejo, M. Protocol for a prospective longitudinal study of risk factors for hypertension incidence in a Mexico City population: The Tlalpan 2020 cohort. BMJ Open 2017, 7, e016773.
  31. Wolfe, F.; Michaud, K.; Li, T. Sleep disturbance in patients with rheumatoid arthritis: Evaluation by medical outcomes study and visual analog sleep scales. J. Rheumatol. 2006, 33, 1942–1951.
  32. Chobanian, A.V.; Bakris, G.L.; Black, H.R.; Cushman, W.C.; Green, L.A.; Izzo, J.L., Jr.; Jones, D.W.; Materson, B.J.; Oparil, S.; Wright, J.T., Jr.; et al. Seventh report of the joint national committee on prevention, detection, evaluation, and treatment of high blood pressure. Hypertension 2003, 42, 1206–1252.
  33. Marfell-Jones, M.J.; Stewart, A.; De Ridder, J. International Standards for Anthropometric Assessment; International Society for the Advancement of Kinanthropometry: Wellington, New Zealand, 2012.
  34. Cerón Vargas, J.A.; Raccanello, K. Índice de desarrollo social de la Ciudad de México como herramienta de focalización de la política social. Retos de la Dirección 2018, 12, 64–86.
  35. Martínez-García, M.; Rodríguez-Hernández, A.P.; Gutiérrez-Esparza, G.O.; Castrejón-Pérez, R.C.; Hernández-Lemus, E.; Borges-Yáñez, S.A. Relationship between the Social Development Index and Self-Reported Periodontal Conditions. Healthcare 2023, 11, 1548.
  36. Craig, C.L.; Marshall, A.L.; Sjöström, M.; Bauman, A.E.; Booth, M.L.; Ainsworth, B.E.; Pratt, M.; Ekelund, U.; Yngve, A.; Sallis, J.F.; et al. International physical activity questionnaire: 12-country reliability and validity. Med. Sci. Sport. Exerc. 2003, 35, 1381–1395.
  37. Spielberger, C.D.; Smith, L.H. Anxiety (drive), stress, and serial-position effects in serial-verbal learning. J. Exp. Psychol. 1966, 72, 589.
  38. Horváth, A.; Montana, X.; Lanquart, J.P.; Hubain, P.; Szűcs, A.; Linkowski, P.; Loas, G. Effects of state and trait anxiety on sleep structure: A polysomnographic study in 1083 subjects. Psychiatry Res. 2016, 244, 279–283.
  39. Hernández-Avila, J.; González-Avilés, L.; Rosales-Mendoza, E. Manual de Usuario. SNUT Sistema de Evaluación de Hábitos Nutricionales y Consumo de Nutrimentos; Instituto Nacional de Salud Pública: Ciudad de Mexico, Mexico, 2003.
  40. Spritzer, K.; Hays, R. MOS Sleep Scale: A Manual for Use and Scoring, Version 1.0; RAND: Los Angeles, CA, USA, 2003; pp. 1–8.
  41. Avcı, C.; Akbaş, A. Sleep apnea classification based on respiration signals by using ensemble methods. Bio-Med. Mater. Eng. 2015, 26, S1703–S1710.
  42. Shi, Y.; Ma, L.; Chen, X.; Li, W.; Feng, Y.; Zhang, Y.; Cao, Z.; Yuan, Y.; Xie, Y.; Liu, H.; et al. Prediction model of obstructive sleep apnea–related hypertension: Machine learning–based development and interpretation study. Front. Cardiovasc. Med. 2022, 9, 1042996.
  43. Xia, S.J.; Gao, B.Z.; Wang, S.H.; Guttery, D.S.; Li, C.D.; Zhang, Y.D. Modeling of diagnosis for metabolic syndrome by integrating symptoms into physiochemical indexes. Biomed. Pharmacother. 2021, 137, 111367.
  44. Al-Jedaani, A.W.; Aziz, W.; Alshdadi, A.A.; Alqarni, M.; Nadeem, M.S.A.; Wailoo, M.P.; Schlindwein, F.S. An Intelligent System Based on Heart Rate Variability Measures and Machine Learning Techniques for Classification of Normal and Growth Restricted Children. In Proceedings of the WITS 2020: Proceedings of the 6th International Conference on Wireless Technologies, Embedded, and Intelligent Systems; Springer: Singapore, 2022; pp. 101–111.
  45. Worachartcheewan, A.; Schaduangrat, N.; Prachayasittikul, V.; Nantasenamat, C. Data mining for the identification of metabolic syndrome status. EXCLI J. 2018, 17, 72.
  46. Li, J.; Zhang, Y.; Lu, T.; Liang, R.; Wu, Z.; Liu, M.; Qin, L.; Chen, H.; Yan, X.; Deng, S.; et al. Identification of diagnostic genes for both Alzheimer’s disease and Metabolic syndrome by the machine learning algorithm. Front. Immunol. 2022, 13, 1037318.
  47. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  48. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Wadsworth and Brooks: Monterey, CA, USA, 1984.
  49. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2013.
  50. Bentley-Lewis, R.; Koruda, K.; Seely, E.W. The metabolic syndrome in women. Nat. Clin. Pract. Endocrinol. Metab. 2007, 3, 696–704.
  51. Hakkarainen, H.; Huopio, H.; Cederberg, H.; Pääkkönen, M.; Voutilainen, R.; Heinonen, S. The risk of metabolic syndrome in women with previous GDM in a long-term follow-up. Gynecol. Endocrinol. 2016, 32, 920–925.
  52. Kim, I.Y.; Han, K.D.; Kim, D.H.; Eun, Y.; Cha, H.S.; Koh, E.M.; Lee, J.; Kim, H. Women with metabolic syndrome and general obesity are at a higher risk for significant hyperuricemia compared to men. J. Clin. Med. 2019, 8, 837.
  53. King, C.; Lanaspa, M.A.; Jensen, T.; Tolan, D.R.; Sánchez-Lozada, L.G.; Johnson, R.J. Uric acid as a cause of the metabolic syndrome. Uric Acid Chronic Kidney Dis. 2018, 192, 88–102.
  54. Copur, S.; Demiray, A.; Kanbay, M. Uric acid in metabolic syndrome: Does uric acid have a definitive role? Eur. J. Intern. Med. 2022, 103, 4–12.
  55. Taskinen, M.R.; Packard, C.J.; Borén, J. Dietary fructose and the metabolic syndrome. Nutrients 2019, 11, 1987.
  56. Mortera, R.R.; Bains, Y.; Gugliucci, A. Fructose at the crossroads of the metabolic syndrome and obesity epidemics. Front. Biosci. Landmark 2019, 24, 186–211.
  57. Badely, M.; Sepandi, M.; Samadi, M.; Parastouei, K.; Taghdir, M. The effect of whey protein on the components of metabolic syndrome in overweight and obese individuals; a systematic review and meta-analysis. Diabetes Metab. Syndr. Clin. Res. Rev. 2019, 13, 3121–3131.
  58. Khayyatzadeh, S.S.; Moohebati, M.; Mazidi, M.; Avan, A.; Tayefi, M.; Parizadeh, S.M.R.; Ebrahimi, M.; Heidari-Bakavoli, A.; Azarpazhooh, M.R.; Esmaily, H.; et al. Nutrient patterns and their relationship to metabolic syndrome in Iranian adults. Eur. J. Clin. Investig. 2016, 46, 840–852.
  59. Lu, C.W.; Lee, Y.C.; Kuo, C.S.; Chiang, C.H.; Chang, H.H.; Huang, K.C. Association of serum levels of zinc, copper, and iron with risk of metabolic syndrome. Nutrients 2021, 13, 548. [Google Scholar] [CrossRef]
  60. Mirenayat, F.S.; Hajhashemy, Z.; Siavash, M.; Saneei, P. Effects of sumac supplementation on metabolic markers in adults with metabolic syndrome: A triple-blinded randomized placebo-controlled cross-over clinical trial. Nutr. J. 2023, 22, 25. [Google Scholar] [CrossRef]
  61. Liu, Y.S.; Wu, Q.J.; Xia, Y.; Zhang, J.Y.; Jiang, Y.T.; Chang, Q.; Zhao, Y.H. Carbohydrate intake and risk of metabolic syndrome: A dose–response meta-analysis of observational studies. Nutr. Metab. Cardiovasc. Dis. 2019, 29, 1288–1298. [Google Scholar] [CrossRef] [PubMed]
  62. Song, W.O.; Wang, Y.; Chung, C.E.; Song, B.; Lee, W.; Chun, O.K. Is obesity development associated with dietary sugar intake in the US? Nutrition 2012, 28, 1137–1141. [Google Scholar] [CrossRef] [PubMed]
  63. Michaëlsson, K.; Wolk, A.; Langenskiöld, S.; Basu, S.; Lemming, E.W.; Melhus, H.; Byberg, L. Milk intake and risk of mortality and fractures in women and men: Cohort studies. BMJ 2014, 349, g6015. [Google Scholar] [CrossRef] [PubMed]
  64. Lanou, A.J. Should dairy be recommended as part of a healthy vegetarian diet? Counterpoint. Am. J. Clin. Nutr. 2009, 89, 1638S–1642S. [Google Scholar] [CrossRef] [PubMed]
  65. Ma, J.; Zhang, H.; Wang, H.; Gao, Q.; Sun, H.; He, S.; Meng, L.; Wang, T. Association between self-reported snoring and metabolic syndrome: A systematic review and meta-analysis. Front. Neurol. 2020, 11, 517120. [Google Scholar] [CrossRef]
  66. Xie, J.; Li, Y.; Zhang, Y.; Vgontzas, A.N.; Basta, M.; Chen, B.; Xu, C.; Tang, X. Sleep duration and metabolic syndrome: An updated systematic review and meta-analysis. Sleep Med. Rev. 2021, 59, 101451. [Google Scholar] [CrossRef] [PubMed]
  67. Chasens, E.R.; Imes, C.C.; Kariuki, J.K.; Luyster, F.S.; Morris, J.L.; DiNardo, M.M.; Godzik, C.M.; Jeon, B.; Yang, K. Sleep and metabolic syndrome. Nurs. Clin. 2021, 56, 203–217. [Google Scholar] [CrossRef]
  68. Lian, Y.; Yuan, Q.; Wang, G.; Tang, F. Association between sleep quality and metabolic syndrome: A systematic review and meta-analysis. Psychiatry Res. 2019, 274, 66–74. [Google Scholar] [CrossRef]
  69. Aziz, M.; Ali, S.S.; Das, S.; Younus, A.; Malik, R.; Latif, M.A.; Humayun, C.; Anugula, D.; Abbas, G.; Salami, J.; et al. Association of subjective and objective sleep duration as well as sleep quality with non-invasive markers of sub-clinical cardiovascular disease (CVD): A systematic review. J. Atheroscler. Thromb. 2017, 24, 208–226. [Google Scholar] [CrossRef]
  70. Wipper, B.; Winkelman, J.W. The long-term psychiatric and cardiovascular morbidity and mortality of restless legs syndrome and periodic limb movements of sleep. Sleep Med. Clin. 2021, 16, 279–288. [Google Scholar] [CrossRef]
  71. Chair, S.Y.; Wang, Q.; Cheng, H.Y.; Lo, S.W.S.; Li, X.M.; Wong, E.M.L.; Sit, J.W.H. Relationship between sleep quality and cardiovascular disease risk in Chinese post-menopausal women. BMC Women’s Health 2017, 17, 79. [Google Scholar] [CrossRef]
  72. Kang, K.W.; Kim, M.K.; Nam, T.S.; Kang, K.H.; Park, W.J.; Moon, H.S.; Oh, H.G.; Rhee, E.J.; Joo, E.Y. Association between Sleep and the Metabolic Syndrome Differs Depending on Age. J. Sleep Med. 2023, 20, 19–27. [Google Scholar] [CrossRef]
  73. Hery, C.M.B.; Hale, L.; Naughton, M.J. Contributions of the Women’s Health Initiative to understanding associations between sleep duration, insomnia symptoms, and sleep-disordered breathing across a range of health outcomes in postmenopausal women. Sleep Health 2020, 6, 48–59. [Google Scholar] [CrossRef] [PubMed]
  74. Ji, S.; Chen, Y.; Zhou, Y.; Cao, Y.; Li, X.; Ding, G.; Tang, F. Association between anxiety and metabolic syndrome: An updated systematic review and meta-analysis. Front. Psychiatry 2023, 14, 1118836. [Google Scholar] [CrossRef] [PubMed]
  75. Li, R.c.; Zhang, L.; Luo, H.; Lei, Y.; Zeng, L.; Zhu, J.; Tang, H. Subclinical hypothyroidism and anxiety may contribute to metabolic syndrome in Sichuan of China: A hospital-based population study. Sci. Rep. 2020, 10, 2261. [Google Scholar] [CrossRef] [PubMed]
  76. Peltzer, K.; Pengpid, S. Relationship between depression, generalized anxiety, and metabolic syndrome among Bhuddist temples population in Nakhon Pathom-Thailand. Iran J. Psychiatry Behav. Sci. 2018; in press. [Google Scholar] [CrossRef]
  77. Rioli, G.; Tassi, S.; Mattei, G.; Ferrari, S.; Galeazzi, G.M.; Mancini, S.; Alboni, S.; Roncucci, L. The association between symptoms of anxiety, depression, and cardiovascular risk factors: Results from an Italian cross-sectional study. J. Nerv. Ment. Dis. 2019, 207, 340–347. [Google Scholar] [CrossRef] [PubMed]
  78. Berto, L.F.; Suemoto, C.K.; Moreno, A.B.; Maria de Jesus, M.F.; Nunes, M.A.A.; Maria del Carmen, B.M.; Barreto, S.M.; Diniz, M.d.F.H.S.; Lotufo, P.A.; Benseñor, I.M.; et al. Increased prevalence of depression and anxiety among subjects with metabolic syndrome in the Brazilian longitudinal study of adult health (ELSA-Brasil). J. Acad.-Consult.-Liaison Psychiatry 2022, 63, 529–538. [Google Scholar] [CrossRef] [PubMed]
  79. Kim, S.W.; Kim, H.J.; Min, K.; Lee, H.; Lee, S.H.; Kim, S.; Kim, J.S.; Oh, B. The relationship between smoking cigarettes and metabolic syndrome: A cross-sectional study with non-single residents of Seoul under 40 years old. PLoS ONE 2021, 16, e0256257. [Google Scholar] [CrossRef]
  80. Behl, T.A.; Stamford, B.A.; Moffatt, R.J. The Effects of Smoking on the Diagnostic Characteristics of Metabolic Syndrome: A Review. Am. J. Lifestyle Med. 2023, 17, 397–412. [Google Scholar] [CrossRef]
  81. Youn, J.A.; Lee, Y.H.; Noh, M.S. Relationship between smoking duration and metabolic syndrome in Korean Former Smokers. J. Korean Soc. Res. Nicotine Tob. 2018, 9, 18–25. [Google Scholar] [CrossRef]
  82. Khambaty, T.; Schneiderman, N.; Llabre, M.M.; Elfassy, T.; Moncrieft, A.E.; Daviglus, M.; Talavera, G.A.; Isasi, C.R.; Gallo, L.C.; Reina, S.A.; et al. Elucidating the multidimensionality of socioeconomic status in relation to metabolic syndrome in the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). Int. J. Behav. Med. 2020, 27, 188–199. [Google Scholar] [CrossRef] [PubMed]
  83. Iguacel, I.; Börnhorst, C.; Michels, N.; Breidenassel, C.; Dallongeville, J.; González-Gross, M.; Gottrand, F.; Kafatos, A.; Karaglani, E.; Kersting, M.; et al. Socioeconomically disadvantaged groups and metabolic syndrome in European adolescents: The HELENA study. J. Adolesc. Health 2021, 68, 146–154. [Google Scholar] [CrossRef] [PubMed]
  84. Atad, O.I.; Toker, S. Subjective workload and the metabolic syndrome: An exploration of the mediating role of burnout and the moderating effect of physical activity. Int. J. Stress Manag. 2023, 30, 95–107. [Google Scholar] [CrossRef]
  85. Dalbeth, N.; Stamp, L.K.; Merriman, T.R. The genetics of gout: Towards personalised medicine? BMC Med. 2017, 15, 108. [Google Scholar] [CrossRef]
  86. Wang, L.; Zhang, T.; Liu, Y.; Tang, F.; Xue, F. Association of serum uric acid with metabolic syndrome and its components: A mendelian randomization analysis. BioMed Res. Int. 2020, 2020, 6238693. [Google Scholar] [CrossRef]
  87. Kim, S.K. Interrelationship of uric acid, gout, and metabolic syndrome: Focus on hypertension, cardiovascular disease, and insulin resistance. J. Rheum. Dis. 2018, 25, 19–27. [Google Scholar] [CrossRef]
  88. Gerdts, E.; Regitz-Zagrosek, V. Sex differences in cardiometabolic disorders. Nat. Med. 2019, 25, 1657–1666. [Google Scholar] [CrossRef]
  89. Tramunt, B.; Smati, S.; Grandgeorge, N.; Lenfant, F.; Arnal, J.F.; Montagner, A.; Gourdy, P. Sex differences in metabolic regulation and diabetes susceptibility. Diabetologia 2020, 63, 453–461. [Google Scholar] [CrossRef] [PubMed]
  90. Faulkner, J.L.; Belin de Chantemèle, E.J. Sex hormones, aging and cardiometabolic syndrome. Biol. Sex Differ. 2019, 10, 1–9. [Google Scholar] [CrossRef]
  91. Soofi, M.; Najafi, F.; Soltani, S.; Karamimatin, B. Measurement and Decomposition of Socioeconomic Inequality in Metabolic Syndrome: A Cross-sectional Analysis of the RaNCD Cohort Study in the West of Iran. J. Prev. Med. Public Health 2023, 56, 50. [Google Scholar] [CrossRef] [PubMed]
  92. Gupta, R.; Sharma, K.K.; Gupta, B.K.; Gupta, A.; Saboo, B.; Maheshwari, A.; Mahanta, T.; Deedwania, P.C. Geographic epidemiology of cardiometabolic risk factors in middle class urban residents in India: Cross–sectional study. J. Glob. Health 2015, 5, 010411. [Google Scholar] [CrossRef] [PubMed]
  93. Lee, K. The relationship between housing types and metabolic and weight phenotypes: A nationwide survey. Metab. Syndr. Relat. Disord. 2019, 17, 129–136. [Google Scholar] [CrossRef]
  94. Baldock, K.; Paquet, C.; Howard, N.; Coffee, N.; Hugo, G.; Taylor, A.; Adams, R.; Daniel, M. Associations between resident perceptions of the local residential environment and metabolic syndrome. J. Environ. Public Health 2012, 2012, 589409. [Google Scholar] [CrossRef]
  95. Wamala, S.P.; Lynch, J.; Horsten, M.; Mittleman, M.A.; Schenck-Gustafsson, K.; Orth-Gomer, K. Education and the metabolic syndrome in women. Diabetes Care 1999, 22, 1999–2003. [Google Scholar] [CrossRef]
  96. Hoveling, L.A.; Lepe, A.; Boissonneault, M.; de Beer, J.A.; Smidt, N.; de Kroon, M.L.; Liefbroer, A.C. Educational inequalities in metabolic syndrome prevalence, timing, and duration amongst adults over the life course: A microsimulation analysis based on the lifelines cohort study. Int. J. Behav. Nutr. Phys. Act. 2023, 20, 104. [Google Scholar] [CrossRef] [PubMed]
  97. Stephens, C.R.; Easton, J.F.; Robles-Cabrera, A.; Fossion, R.; De la Cruz, L.; Martínez-Tapia, R.; Barajas-Martínez, A.; Hernández-Chávez, A.; López-Rivera, J.A.; Rivera, A.L. The impact of education and age on metabolic disorders. Front. Public Health 2020, 8, 180. [Google Scholar] [CrossRef]
  98. Braziene, A.; Tamsiunas, A.; Luksiene, D.; Radisauskas, R.; Andrusaityte, S.; Dedele, A.; Vencloviene, J. Association between the living environment and the risk of arterial hypertension and other components of metabolic syndrome. J. Public Health 2020, 42, e142–e149. [Google Scholar] [CrossRef] [PubMed]
  99. Tamashiro, K.L. Metabolic syndrome: Links to social stress and socioeconomic status. Ann. N. Y. Acad. Sci. 2011, 1231, 46–55. [Google Scholar] [CrossRef] [PubMed]
  100. Rekliti, M.; Sapountzi-Krepia, D. The epidemic of metabolic syndrome: Health promotion strategies. Int. J. Caring Sci. 2009, 2, 1–10. [Google Scholar]
  101. Santos, A.C.; Ebrahim, S.; Barros, H. Gender, socio-economic status and metabolic syndrome in middle-aged and old adults. BMC Public Health 2008, 8, 62. [Google Scholar] [CrossRef] [PubMed]
  102. Brizita, D.I.; Nevena, I.D. Precise Nutrition and Metabolic Syndrome, Remodeling the Microbiome with Polyphenols, Probiotics, and Postbiotics. In Advances in Precision Nutrition, Personalization and Healthy Aging; Springer: Cham, Switzerland, 2022; pp. 145–178. [Google Scholar]
Figure 1. Experimental process.
Figure 2. The most important variables obtained through logistic regression for men and women before data balancing.
Figure 3. PCA of features of men for metabolic syndrome with clusters.
Figure 4. PCA of features of women for metabolic syndrome with clusters.
Figure 5. Top features for men and women considering the results of RF and RPART applying balancing techniques.
Table 1. Dataset variables.
| Variable Name | Description | Type |
|---|---|---|
| AGE | age | Continuous |
| WEIGHT | weight | Continuous |
| HEIGHT | height | Continuous |
| BMI | body mass index | Continuous |
| WC | waist circumference | Continuous |
| SBP | systolic blood pressure | Continuous |
| DBP | diastolic blood pressure | Continuous |
| LIV_TOG | common-law marriage | Dichotomous |
| MARRIED | married | Dichotomous |
| SINGLE | single | Dichotomous |
| DIVORC | divorced | Dichotomous |
| VALUE | social development index by value | Continuous |
| STRATUM | socioeconomic stratum | Continuous |
| QUA_HOUS | quality and living space | Continuous |
| HEALTHAC | access to healthcare and social security | Continuous |
| EDULAG | educational lag | Continuous |
| DURAB | durable goods | Continuous |
| SANITRY | sanitary adequacy | Continuous |
| ENER_AD | energy efficiency | Continuous |
| ED_LEVEL | educational level in the neighborhood | Continuous |
| SEC_SCHOOL | secondary school | Dichotomous |
| DOCTORATE | doctorate | Dichotomous |
| MASTER | master | Dichotomous |
| SCHOOL | school | Dichotomous |
| BACHELORS | bachelor’s degree | Dichotomous |
| HIGH_SCHOOL | high school | Dichotomous |
| TECH_SCHOOL | technical school | Dichotomous |
| NONE | no academic degree | Dichotomous |
| TOTMET | metabolic equivalent of task | Continuous |
| STAT_ANX | state anxiety | Dichotomous |
| TRAIT_ANX | trait anxiety | Dichotomous |
| SLPNOTQ | sleep was not quiet | Continuous |
| BREATH | waking up with shortness of breath | Continuous |
| DROWSY | feeling drowsy or sleepy | Continuous |
| TROBLS | trouble falling asleep | Continuous |
| AWAKEN | awakens during your sleep time | Continuous |
| STYAWKE | trouble staying awake | Continuous |
| TAKENAP | takes naps of 5 min or longer | Continuous |
| SLPD4 | sleep disturbances | Continuous |
| SLPSNR1 | snores during sleep | Continuous |
| SLPSOB1 | sleep short (headache) | Continuous |
| SLPA2 | sleep adequacy | Continuous |
| SLPS3 | somnolence | Continuous |
| SLPS6 | sleep problems (Index I [40]) | Continuous |
| SLPS9 | sleep problems (Index II [40]) | Continuous |
| SLPQRAW | sleep quantity | Continuous |
| SLPOP1 | sleep quality | Dichotomous |
| SMOKING | smoking practice | Dichotomous |
| CURRENT | current smoker | Dichotomous |
| EXSMOKER | ex-smoker | Dichotomous |
| SMO_PASS | passive smoker | Dichotomous |
| ALCOHOL | alcohol consumption | Dichotomous |
| ENERGYDRK | energy drinks | Dichotomous |
| MOTHEROB | maternal obesity history | Dichotomous |
| FATHEROB | paternal obesity history | Dichotomous |
| MOTHERDB | maternal diabetic history | Dichotomous |
| FATHERDB | paternal diabetic history | Dichotomous |
| MOTHERHT | maternal hypertension history | Dichotomous |
| FATHERHT | paternal hypertension history | Dichotomous |
| MOTHERDL | maternal dyslipidemia history | Dichotomous |
| FATHERDL | paternal dyslipidemia history | Dichotomous |
| MOTHERGT | maternal gout history | Dichotomous |
| FATHERGT | paternal gout history | Dichotomous |
| URIC | uric acid | Continuous |
| CREA | creatinine | Continuous |
| HDLCO | high-density lipoprotein | Continuous |
| LDLCO | low-density lipoprotein | Continuous |
| GLU | blood glucose | Continuous |
| IAT | atherogenic index | Continuous |
| CHOL_ANT | cholesterol | Continuous |
| TRIG | triglycerides | Continuous |
| NA | sodium | Continuous |
| CALOR | energy | Continuous |
| PROTEI | total proteins | Continuous |
| APROT | proteins of animal origin | Continuous |
| CARBO | carbohydrates | Continuous |
| SUCR | sucrose | Continuous |
| FRUCT | fructose | Continuous |
| LACT | lactose | Continuous |
| ST | starch | Continuous |
| MALT | maltose | Continuous |
| GLU_1 | glucose levels based on the dietary survey | Continuous |
| CRUDE | crude fiber | Continuous |
| SOLFB | soluble dietary fiber | Continuous |
| INSFB | insoluble dietary fiber | Continuous |
| HEMCL | hemicellulose | Continuous |
| CALC | calcium | Continuous |
| IRON | total iron | Continuous |
| MAGN | magnesium | Continuous |
| PH | phosphorus | Continuous |
| K | potassium | Continuous |
| SODIUM | sodium levels based on the dietary survey | Continuous |
| ZN | zinc | Continuous |
| CU | copper | Continuous |
| MN | manganese | Continuous |
| SE | iodine | Continuous |
| VITC | vitamin C | Continuous |
| B1 | thiamine | Continuous |
| B2 | riboflavin | Continuous |
| B6 | vitamin B6 | Continuous |
| B12 | vitamin B12 | Continuous |
| VITK | vitamin K | Continuous |
| RETINOL | retinol | Continuous |
| VITD | vitamin D | Continuous |
| VITE | vitamin E | Continuous |
| CHOL_SN | cholesterol levels based on the dietary survey | Continuous |
| ALCO | alcohol levels based on the dietary survey | Continuous |
| CAFF | caffeine | Continuous |
| AFAT | animal fat | Continuous |
| VFAT | vegetable fat | Continuous |
| TFATAV | total fat: animal + vegetable | Continuous |
| SATFAT | saturated fat | Continuous |
| MONFAT | monounsaturated fat | Continuous |
| POLY | polyunsaturated fat | Continuous |
| MS | MetS | Dichotomous |
Table 2. Features and values obtained through logistic regression for men and women.
| Variable (Women) | Coefficient | p-Value | Variable (Men) | Coefficient | p-Value |
|---|---|---|---|---|---|
| GLU | 4.61438598 | 6.24 × 10^−59 | GLU | 3.94711748 | 2.45 × 10^−39 |
| TRIG | 3.63418178 | 1.18 × 10^−37 | TRIG | 2.98165065 | 3.25 × 10^−24 |
| WC | 1.75532078 | 2.86 × 10^−9 | WC | 2.53131848 | 1.02 × 10^−9 |
| BMI | 1.60919304 | 1.05 × 10^−6 | IAT | 2.06238741 | 5.13 × 10^−11 |
| SBP | 1.40299133 | 1.15 × 10^−12 | SBP | 1.53063308 | 1.31 × 10^−11 |
| PROTEI | 0.90748897 | 0.08529715 | B12 | 1.41903991 | 0.00880359 |
| FRUCT | 0.73077934 | 0.23874313 | BMI | 1.40229014 | 0.00087404 |
| CHOL_SN | 0.72037259 | 0.06868106 | LACT | 1.29691863 | 0.00581383 |
| URIC | 0.65547784 | 0.01333401 | CARBO | 1.18935354 | 0.0886463 |
| CU | 0.64813271 | 0.17111299 | GLU_1 | 1.1674073 | 0.10024746 |
Table 3. Features of men obtained using RF with ADASYN and SMOTE applied.
| Feature (ADASYN, B = 1) | Value | Feature (ADASYN, B = 5) | Value | Feature (SMOTE, K = 1) | Value | Feature (SMOTE, K = 5) | Value |
|---|---|---|---|---|---|---|---|
| BMI | 92.9499 | ENER_AD | 130.906694 | MOTHERDL | 204.657628 | BMI | 289.868211 |
| WEIGHT | 49.4782 | BMI | 104.213511 | ALCOHOL | 199.602686 | MOTHERDL | 172.071267 |
| ENER_AD | 48.8887 | WEIGHT | 81.5087781 | BMI | 198.579371 | WEIGHT | 169.929592 |
| EDULAG | 45.2797 | EDULAG | 67.7406035 | SLPSOB1 | 111.323472 | ALCOHOL | 131.283664 |
| LIV_TOG | 33.3601 | ALCOHOL | 62.4379604 | CURRENT | 95.3509822 | IAT | 93.2909179 |
| DURAB | 31.5583 | STRATUM | 57.134903 | BREATH | 80.8262246 | CHOL_ANT | 63.4703128 |
| MOTHERGT | 27.5583 | ED_LEVEL | 55.578244 | SLPD4 | 70.1756789 | NA | 49.2933568 |
| IAT | 25.7470 | NONE | 38.1101529 | CAFF | 68.9892898 | CREA | 45.8846962 |
| HEALTHAC | 23.4522 | DURAB | 36.4129389 | SLP6 | 60.2949079 | SINGLE | 44.6897663 |
| DIVORC | 20.1163 | VALUE | 36.0130176 | WEIGHT | 56.9297661 | SLPSNR1 | 35.672622 |
| QUA_HOUS | 17.4925 | DIVORC | 35.8243538 | TOTMET | 52.4806201 | MOTHERDB | 35.21356 |
| STRATUM | 16.1269 | FATHERGT | 33.7033121 | ALCO | 45.7609412 | ENERGYDRK | 34.0359073 |
| FATHERGT | 14.5872 | MASTER | 29.8751736 | AWAKEN | 39.0795326 | URIC | 31.8268793 |
| NONE | 14.0213 | PRIMARIA | 28.3852397 | IAT | 38.042823 | AGE | 27.9839119 |
| MARRIED | 13.9584 | SLPSNR1 | 27.9671847 | TROBLS | 36.7528999 | MARRIED | 27.8864259 |
| VALUE | 13.8059 | AGE | 24.3706018 | STYAWKE | 36.2387269 | DOCTORATE | 24.4733499 |
| URIC | 13.7930 | IAT | 22.0506592 | MALT | 34.3472852 | DIVORC | 24.142464 |
| SANITRY | 13.5609 | SANITRY | 21.924077 | BACHELORS | 33.7934562 | SLPOP1 | 23.8868609 |
| SINGLE | 13.4148 | SINGLE | 21.7818986 | MARRIED | 32.6228111 | SEC_SCHOOL | 22.755325 |
| ALCOHOL | 12.9798 | DOCTORATE | 19.8069099 | SLP9 | 31.0845509 | SLPQRAW | 20.666244 |
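Tables 3–6 compare feature rankings obtained after ADASYN and SMOTE oversampling. For readers unfamiliar with SMOTE, the following is a minimal pure-Python sketch of its core idea: each synthetic sample is an interpolation between a minority-class sample and one of its K nearest minority-class neighbours. It is an illustration only and not the authors' pipeline; the function name `smote_sketch`, its parameters, and the toy data are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; not taken from the study's code.
    """
    rng = random.Random(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # a sample cannot have more neighbours than n - 1

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(n)
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values).
X_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_sketch(X_minority, n_new=6, k=1)
print(len(synthetic), "synthetic samples generated")
```

With K = 1 every synthetic point lies on the segment joining a sample to its single nearest neighbour, whereas a larger K lets the points spread across a wider neighbourhood. That difference is one plausible reason the rankings in the SMOTE, K = 1 and SMOTE, K = 5 columns differ.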
Table 4. Features of men obtained using RPART with ADASYN and SMOTE applied.
ADASYN, B = 1 ADASYN, B = 5 SMOTE, K = 1 SMOTE, K = 5
Features Value Features Value Features Value Features Value
LIV_TOG 447.069761 BMI 683.735277 BMI 185.940586 BMI 164.086828
BMI 402.975487 ENER_AD 619.998675 WEIGHT 131.361866 WEIGHT 132.276557
ENER_AD 338.664389 EDULAG 565.325738 FATHERGT 115.496204 IAT 131.937059
EDULAG 325.498647 ALCOHOL 355.970533 MOTHERDL 96.1708037 SINGLE 83.6531675
DURAB 285.861702 WEIGHT 295.254303 IAT 67.2839991 MOTHERDL 71.6947353
SLP6 64.2112969 DIVORC 214.489844 AGE 40.9532174 APROT 47.2274885
WEIGHT 33.1175418 NONE 200.599299 LACT 28.7681412 TFATAV 22.4867652
IAT 27.5407406 MOTHERGT 178.450647 MOTHERHT 25.3414479 ST 20.7519258
FATHEROB 14.5734264 PROTEI 14.5865884 HEALTHAC 19.7752349
SLPSNR1 13.7361635 CAFF 14.1658755 SATFAT 17.5962564
ZN 12.4515539 HEIGHT 16.3718359
MN 12.20696 CHOL_ANT 15.4222905
IRON 10.5317678 MONFAT 13.9908309
VALUE 10.2017285 CREA 13.6358167
STYAWKE 10.1887194 URIC 11.0085972
MONFAT 10.0410598 AGE 10.5421496
CHOL_ANT 9.78675973 CALC 10.0034374
ST 9.41791645 SMOKING 9.53883547
SINGLE 9.40405705 LACT 9.34161011
SOLFB 7.74765092 TOTMET 9.09355989
Table 5. Features of women obtained using RF with ADASYN and SMOTE applied.
ADASYN, B = 1 ADASYN, B = 5 SMOTE, K = 1 SMOTE, K = 5
Features Value Features Value Features Value Features Value
BMI 208.269603 ENER_AD 344.249674 WEIGHT 321.316267 BMI 484.307061
IAT 151.849516 BMI 210.90055 IAT 294.958989 IAT 481.475021
WEIGHT 98.3094923 IAT 173.895403 BMI 253.281611 WEIGHT 339.174822
EDULAG 98.0933243 ALCOHOL 146.230976 EXSMOKER 246.78181 URIC 142.754087
LIV_TOG 82.4204188 DURAB 142.91494 MASTER 241.332636 SLPSNR1 92.0496746
ENER_AD 80.7154997 EDULAG 142.817907 FATHERDL 211.443455 CHOL_ANT 74.3706077
URIC 60.4722703 WEIGHT 128.038926 CREA 170.195583 AGE 72.769531
VALUE 53.5122927 VALUE 80.989846 MOTHERHT 125.867318 SLPSOB1 70.1959444
DURAB 48.2486067 NONE 76.4699068 SLPSOB1 125.384246 BREATH 60.3028803
QUA_HOUS 37.8080123 QUA_HOUS 62.8303545 SMO_PASS 86.2763209 TRAIT_ANX 56.4099594
SLPSNR1 31.399627 BACHELORS 56.0706757 BREATH 83.1176663 SMO_PASS 50.8288614
HEALTHAC 30.6724986 SANITRY 52.5802813 CHOL_ANT 78.8668934 SANITRY 50.3648334
SANITRY 24.2597947 HEALTHAC 45.9188536 SMOKING 57.7946015 MOTHERDL 50.0567677
ALCOHOL 24.2064626 URIC 43.8531276 TRAIT_ANX 57.3909833 DROWSY 44.564559
AGE 21.594859 SINGLE 39.5694722 SLPSNR1 51.1574483 SMOKING 44.5264858
SINGLE 18.0193809 DIVORC 37.3860944 NA 50.3156936 SINGLE 41.993735
HIGH_SCHOOL 17.1684616 AGE 33.8392029 MARRIED 48.4664641 EXSMOKER 38.9120379
SLP6 16.0530682 TECH_SCHOOL 32.4092154 SLPOP1 48.3006717 SEC_SCHOOL 38.4719692
SOLFB 14.4271683 SCHOOL 28.2955057 SLPNOTQ 35.6761924
FATHERGT 13.8839264 MARRIED 27.6425229
Table 6. Features of women obtained using RPART with ADASYN and SMOTE applied.
| Feature (ADASYN, B = 1) | Value | Feature (ADASYN, B = 5) | Value | Feature (SMOTE, K = 1) | Value | Feature (SMOTE, K = 5) | Value |
|---|---|---|---|---|---|---|---|
| BMI | 664.323812 | BMI | 1164.1686 | BMI | 427.45413 | IAT | 483.233069 |
| LIV_TOG | 535.392713 | DURAB | 1117.88127 | IAT | 363.893488 | BMI | 410.367827 |
| ENER_AD | 507.53479 | ENER_AD | 1090.27197 | SLPSNR1 | 259.475806 | WEIGHT | 409.777127 |
| EDULAG | 505.45874 | EDULAG | 772.049538 | SLPS3 | 259.475806 | URIC | 278.65513 |
| IAT | 468.310602 | ALCOHOL | 655.016952 | EXSMOKER | 217.54026 | SLPSNR1 | 86.0218576 |
| | | NONE | 533.217568 | | | SMOKING | 31.3976405 |
| | | IAT | 380.443927 | | | SLPS3 | 30.5201011 |
| | | WEIGHT | 366.577281 | | | SODIUM | 15.7251124 |
| | | VALUE | 104.231729 | | | ALCOHOL | 12.4735987 |
| | | TECH_SCHOOL | 92.1094015 | | | SATFAT | 12.1523683 |
| | | | | | | MONFAT | 12.1446951 |
| | | | | | | NA | 11.2712045 |
| | | | | | | VITE | 10.3455105 |
| | | | | | | CHOL_ANT | 9.04441276 |
| | | | | | | FATHERDB | 8.09870623 |
| | | | | | | SUCR | 7.16739885 |
| | | | | | | MARRIED | 6.39473684 |
| | | | | | | FRUCT | 4.94398493 |
| | | | | | | MALT | 4.8372105 |
Table 7. Results of the random forest models applying ADASYN and SMOTE in men and women.
| Sex | Subset | Parameters | B.ACC (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|
| Men | ADASYN, B = 1 | Mtry = 9, Ntree = 200 | 86.22 ± 0.26 | 90.93 ± 0.60 | 81.50 ± 0.41 |
| Men | ADASYN, B = 5 | Mtry = 8, Ntree = 200 | 85.56 ± 0.34 | 87.85 ± 0.49 | 83.26 ± 0.55 |
| Men | SMOTE, K = 1 | Mtry = 10, Ntree = 200 | 82.86 ± 1.66 | 91.51 ± 0.68 | 74.21 ± 3.45 |
| Men | SMOTE, K = 5 | Mtry = 10, Ntree = 100 | 75.43 ± 1.29 | 90.48 ± 0.95 | 60.39 ± 2.50 |
| Women | ADASYN, B = 1 | Mtry = 10, Ntree = 200 | 87.12 ± 0.25 | 91.10 ± 0.40 | 83.15 ± 0.29 |
| Women | ADASYN, B = 5 | Mtry = 10, Ntree = 300 | 86.73 ± 0.20 | 88.62 ± 0.24 | 84.84 ± 0.36 |
| Women | SMOTE, K = 1 | Mtry = 10, Ntree = 300 | 82.55 ± 0.71 | 90.48 ± 0.39 | 74.62 ± 1.46 |
| Women | SMOTE, K = 5 | Mtry = 10, Ntree = 300 | 88.50 ± 0.40 | 91.91 ± 0.42 | 85.10 ± 0.75 |
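The balanced accuracy (B.ACC) values in Table 7 agree, to rounding, with the mean of the reported sensitivity and specificity, which is the usual definition of balanced accuracy for a binary classifier. A minimal Python sketch of that check follows, assuming this definition is the one used here; the function name is illustrative and the numbers are copied from the Men rows of Table 7.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Return the arithmetic mean of sensitivity and specificity (both in %)."""
    return (sensitivity + specificity) / 2.0


# (subset, reported B.ACC, sensitivity, specificity) for men, from Table 7
men_rf = [
    ("ADASYN, B = 1", 86.22, 90.93, 81.50),
    ("ADASYN, B = 5", 85.56, 87.85, 83.26),
    ("SMOTE, K = 1", 82.86, 91.51, 74.21),
    ("SMOTE, K = 5", 75.43, 90.48, 60.39),
]

for subset, reported, sens, spec in men_rf:
    computed = balanced_accuracy(sens, spec)
    # The published values are rounded to two decimals, so allow a small gap.
    assert abs(computed - reported) < 0.01, subset
    print(f"{subset}: computed {computed:.3f}, reported {reported:.2f}")
```

Read this way, the low B.ACC for SMOTE, K = 5 in men reflects its low specificity (60.39%) rather than a loss of sensitivity, which stays above 90%.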
Table 8. Results of the RPART models applying ADASYN and SMOTE in men and women.
| Sex | Subset | Parameters | B.ACC (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|
| Men | ADASYN, B = 1 | cp = 0.05 | 82.14 ± 1.75 | 81.57 ± 3.38 | 82.71 ± 2.07 |
| Men | ADASYN, B = 5 | cp = 0.05 | 82.32 ± 0.99 | 82.87 ± 4.67 | 81.77 ± 5.02 |
| Men | SMOTE, K = 1 | cp = 0.001 | 75.41 ± 2.78 | 73.09 ± 4.07 | 77.73 ± 5.36 |
| Men | SMOTE, K = 5 | cp = 0.002 | 74.67 ± 2.78 | 71.96 ± 4.07 | 77.38 ± 5.36 |
| Women | ADASYN, B = 1 | cp = 0.05 | 78.90 ± 0.31 | 69.96 ± 0.00 | 87.84 ± 0.62 |
| Women | ADASYN, B = 5 | cp = 0.05 | 78.90 ± 0.31 | 69.96 ± 0.00 | 87.84 ± 0.62 |
| Women | SMOTE, K = 1 | cp = 0.001 | 80.86 ± 1.91 | 79.85 ± 3.79 | 81.87 ± 3.57 |
| Women | SMOTE, K = 5 | cp = 0.005 | 84.49 ± 1.43 | 84.20 ± 3.01 | 84.79 ± 2.51 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
