Article

Exploring Predictors of Type 2 Diabetes Within Animal-Sourced and Plant-Based Dietary Patterns with the XGBoost Machine Learning Classifier: NHANES 2013–2016

by Adam C. Eckart * and Pragya Sharma Ghimire
Department of Health and Human Performance, Kean University, Union, NJ 07083, USA
* Author to whom correspondence should be addressed.
J. Clin. Med. 2025, 14(2), 458; https://doi.org/10.3390/jcm14020458
Submission received: 9 December 2024 / Revised: 7 January 2025 / Accepted: 9 January 2025 / Published: 13 January 2025
(This article belongs to the Special Issue Type 2 Diabetes and Complications: From Diagnosis to Treatment)

Abstract

Background/Objectives: Understanding the relationship between dietary patterns, nutrient intake, and chronic disease risk is critical for public health strategies. However, confounding from lifestyle and individual factors complicates the assessment of diet–disease associations. Emerging machine learning (ML) techniques offer novel approaches to clarifying the importance of multifactorial predictors. This study used the XGBoost algorithm to investigate the associations between animal-sourced and plant-based dietary patterns and Type 2 diabetes (T2D) history, accounting for diet–lifestyle patterns. Methods: Using data from the National Health and Nutrition Examination Survey (NHANES) from 2013 to 2016, individuals consuming animal-sourced foods (ASF) and plant-based foods (PBF) were propensity score-matched on key confounders, including age, gender, body mass index, energy intake, and physical activity levels. Predictors of T2D history were analyzed using the XGBoost classifier, with feature importance derived from Shapley Additive Explanation (SHAP) plots. Lifestyle and dietary patterns derived from principal component analysis (PCA) were incorporated as predictors, and multicollinearity among predictors was examined. Results: A total of 2746 respondents were included in the analysis. Among the top predictors of T2D were age, BMI, unhealthy lifestyle, and the ω6: ω3 fatty acid ratio. Higher intakes of protein from ASFs and fats from PBFs were associated with lower T2D risk. The XGBoost model achieved an accuracy of 83.4% and an AUROC of 68%. Conclusions: This study underscores the complex interactions between diet, lifestyle, and body composition in T2D risk. Machine learning techniques like XGBoost provide valuable insights into these multifactorial relationships by mitigating confounding and identifying key predictors. 
Future research should focus on prospective studies incorporating detailed nutrient analyses and ML approaches to refine prevention strategies and dietary recommendations for T2D.

1. Introduction

The relationship between diet and disease has long been of significant interest in public health and nutrition. Dietary patterns and nutrient intake have wide-ranging implications for general health, and understanding the influence of specific nutrients on disease risk is vital for effectively formulating dietary guidelines. The existing literature that compares dietary patterns emphasizes the differences between those focused on plant-based foods (PBFs) and animal-sourced foods (ASFs) in relation to the prevention and management of cardiometabolic diseases, including cardiovascular disease (CVD) and type 2 diabetes (T2D). Many cohort studies report differing cardiometabolic disease and mortality outcomes associated with the intake of plant-based or animal-sourced fat and protein [1,2,3,4,5]. One popular method used in observational studies to demonstrate these effects involves a statistical procedure in which risk ratios are calculated for isocaloric substitutions of one nutrient type for the other [2,3,6]. For example, in one study, replacing 5% of energy from animal fat with 5% from plant fat was associated with reduced overall mortality and CVD mortality [2]. However, this procedure can introduce bias by decoupling subject characteristics from nutrient intakes [7]. In other words, statistical methods that adjust risk ratios based on increases in specific nutrients while holding total calorie intake constant do not isolate the effect of nutrient proportions alone. They also alter the composition of the sample used in each analysis. Thus, the new risk ratios reflect a different sample of individuals whose disease risk is affected by many factors besides dietary composition, such as age, obesity status, socioeconomic status (SES), race/ethnicity, or access to healthcare.
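A short sketch makes the substitution logic concrete. It assumes one common formulation: each macronutrient enters a log-linear risk model (e.g., Cox regression) as a percentage of energy, alongside total energy. Under that formulation, the hazard ratio for replacing a given share of energy from one nutrient with another is the exponentiated difference in their coefficients. The function name and the coefficients below are hypothetical.

```python
import math


def substitution_hazard_ratio(beta_added: float, beta_removed: float, pct_energy: float = 5.0) -> float:
    """Hazard ratio for isocalorically replacing `pct_energy` % of energy from one
    nutrient with another.

    Assumes a log-linear (e.g., Cox) model with total energy held constant and both
    nutrients entered as % of energy, so each beta is a log-HR per 1% of energy.
    """
    return math.exp(pct_energy * (beta_added - beta_removed))


# Hypothetical coefficients (log-HR per 1% of energy), for illustration only.
beta_plant_fat = -0.02
beta_animal_fat = 0.01

hr = substitution_hazard_ratio(beta_plant_fat, beta_animal_fat, pct_energy=5.0)
print(f"HR for replacing 5% of energy from animal fat with plant fat: {hr:.3f}")
```

The result is a contrast between model coefficients, not an observed change within the same individuals. It therefore carries over whatever confounding remains in the fitted model, which is the concern raised above.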
Since dietary patterns are often linked to many health factors, this risk analysis method is problematic, as it may not fully account for healthy user bias, even after adjustments for common confounders [7]. Likewise, individuals with unhealthy lifestyles typically engage in other risk-increasing behaviors or present with multiple comorbidities that may not be measured or adjusted for during disease risk analysis, leading to residual confounding.
Lifestyle behaviors and individual characteristics, including physical activity, total energy intake, BMI, sedentary behavior, and age, have an outsized effect on cardiometabolic disease outcomes and are highly interrelated [8]. In many studies, the association between increased risk for cardiometabolic diseases and higher total and animal protein intake compared to plant protein intake applies only to those with unhealthy lifestyle factors [3,6,9,10,11,12]. Adjustments for variables such as BMI, alcohol consumption, physical activity, and smoking often result in attenuation of associations between animal-sourced food intake and disease or mortality outcomes. For example, animal protein was associated with higher incident T2D in the Melbourne Collaborative Cohort Study, while plant protein was inversely associated [6]. However, higher animal protein intake was also positively associated with BMI and fat intake and inversely associated with SES, physical activity, fiber, and vitamin intake. In contrast, plant protein intake was linked to higher physical activity, fiber, and vitamin intake and was inversely associated with BMI and smoking. The inverse association between plant protein intake and T2D attenuated after adjustment for lifestyle factors, fiber, magnesium, vitamin intakes, sodium, and saturated fatty acids [6].
Similarly, a recent study used data from the National Health and Nutrition Examination Survey (NHANES) to examine an unhealthy lifestyle characterized by high BMI, higher alcohol consumption, higher total energy intake, lower overall protein intake, higher sugar intake, low physical activity, higher likelihood of smoking, and low participation in other health behaviors (e.g., dental visits and fiber intake). The effect of this lifestyle on CVD history was over 2.5 times higher than the effect of red meat intake alone [11]. Moreover, red meat eaters had unfavorable lipid profiles and less physical activity and were more likely to smoke compared to white meat eaters. Evidence from these studies suggests an inextricable link between lifestyle, dietary patterns, and disease outcomes. Consequently, high intake of specific nutrients within hypercaloric diets could be linked to cardiometabolic disease status, irrespective of purported nutrient-disease interactions [13].
Due to the difficulty in isolating the effects of nutrient intakes, links between dietary patterns and chronic diseases would seem paradoxical if other lifestyle factors were not emphasized. For example, in the Shanghai Women’s Health Study, diets higher in dairy were associated with lower T2D risk and a lower likelihood of smoking and alcohol consumption. In contrast, diets higher in plant-based foods were associated with lower SES and higher T2D risk [12]. A prospective study of over 43,000 people found that diets higher in meat consumption compared to those higher in fruits and vegetables were associated with a higher risk of T2D in non-smokers but not in those with a history of smoking [14]. Another prospective study of over half a million people reported that white meat intake was associated with higher CVD risk in former smokers. In contrast, processed meat intake was linked to lower risk among those who never smoked [15].
Confounding issues in nutrient-disease research warrant changes in methodological approach to account for bias and to identify relevant modifiable risk factors. Traditional statistical approaches are poorly suited to the high-dimensional data, non-linear relationships, and interactions prevalent in nutritional epidemiology. Developments in machine learning (ML) offer new approaches to reducing model overfitting and improving predictive power. They also provide greater flexibility in handling high-dimensional and deeply interrelated predictors, which are very common in nutritional epidemiology. These methods enable the examination of large and complex datasets while managing multicollinearity, providing robust estimates, and selecting the most relevant predictors. Utilizing these advanced methods can improve the identification of risk factors and enhance dietary recommendations through a better understanding of diet–disease relationships.
T2D is a global public health issue directly linked to dietary and lifestyle factors, making it an ideal health outcome for analyzing multifactorial predictors using ML techniques. Many studies have compared the performance of various ML models in predicting T2D. Gradient-boosted decision trees are among the most used and top-performing, as determined by accuracy and area under the receiver operating characteristic curve (AUROC) metrics [16]. These algorithms address common problems in predictive modeling, such as overfitting and bias, by combining many weak learners sequentially. Each new tree is fitted to correct the errors of the current ensemble and update its predictions. Gradient boosting minimizes a loss function (log loss for classification models), and penalizing tree size keeps individual trees small, improving interpretability [17]. XGBoost (extreme gradient boosting) is a gradient-boosted decision tree algorithm that uses lasso (L1) and ridge (L2) regularization to prevent overfitting and provides feature importance outputs. Regularization adds penalty terms to the loss function (in XGBoost, α for L1 and λ for L2). These penalties shrink the weights assigned to tree leaves, the tree analogue of regression coefficients, limiting the influence of less relevant predictors [17]. This is especially useful when high multicollinearity exists, which may obscure the true relationships between predictors and the outcome. L2 shrinks weights toward zero, whereas L1 can set small weights exactly to zero, retaining only the most relevant contributions. The degree of L1 and L2 regularization can be tuned as a hyperparameter during model optimization to improve accuracy.
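The L1 and L2 penalties act on a single tree through the closed-form leaf weight and split gain given in the original XGBoost paper. The sketch below is a simplified, hypothetical re-implementation for illustration, not XGBoost's own code. It makes three assumptions:

- λ (`reg_lambda`) is added to the Hessian sum.
- The L1 term α (`reg_alpha`) acts as a soft threshold on the gradient sum.
- γ (`gamma`) is subtracted from each split's gain.

The defaults (λ = 1, α = 0, γ = 0) follow XGBoost's documented defaults.

```python
def soft_threshold(g, alpha):
    """L1 step: shrink a gradient sum toward zero by alpha, clipping at zero."""
    if g > alpha:
        return g - alpha
    if g < -alpha:
        return g + alpha
    return 0.0


def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0, reg_alpha=0.0):
    """Optimal leaf weight: -threshold(G, alpha) / (H + lambda)."""
    g = soft_threshold(grad_sum, reg_alpha)
    return 0.0 if g == 0.0 else -g / (hess_sum + reg_lambda)


def leaf_score(grad_sum, hess_sum, reg_lambda, reg_alpha):
    """Score term for one leaf: threshold(G, alpha)^2 / (H + lambda)."""
    g = soft_threshold(grad_sum, reg_alpha)
    return g * g / (hess_sum + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, reg_alpha=0.0, gamma=0.0):
    """Gain = 1/2 [score(L) + score(R) - score(L + R)] - gamma; a split is kept only if positive."""
    parent = leaf_score(g_left + g_right, h_left + h_right, reg_lambda, reg_alpha)
    children = (leaf_score(g_left, h_left, reg_lambda, reg_alpha)
                + leaf_score(g_right, h_right, reg_lambda, reg_alpha))
    return 0.5 * (children - parent) - gamma


G, H = -4.0, 10.0  # hypothetical gradient and Hessian sums for one leaf
print(leaf_weight(G, H, reg_lambda=0.0))                 # 0.4, no penalty
print(leaf_weight(G, H, reg_lambda=10.0))                # 0.2, L2 shrinks toward zero
print(leaf_weight(G, H, reg_lambda=0.0, reg_alpha=5.0))  # 0.0, L1 zeroes the weight
print(split_gain(-6.0, 5.0, 4.0, 5.0, gamma=0.05) > 0)   # True: split kept
print(split_gain(-6.0, 5.0, 4.0, 5.0, gamma=5.0) > 0)    # False: split pruned
```

With these numbers, L2 halves the leaf weight, while an L1 penalty larger than the absolute gradient sum removes the leaf's contribution entirely. Likewise, a split whose gain does not exceed γ would be pruned.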
ML predictive modeling for T2D is still nascent. The feature sets used in these models are heterogeneous, making it challenging to compare feature selection across studies. However, in a meta-analysis of 90 studies on the performance of 18 different models, models that included lifestyle, socioeconomic, and diagnostic data were more accurate overall [16]. In another study, lasso regression was used to select relevant predictors of T2D from a population-based cohort that included 2012 adult men and women. Smoking and waist circumference were among the most important lifestyle factors, increasing the odds of T2D incidence by over 65% and 5% per cm, respectively [18]. In a different study, predictors of fasting blood glucose in 650 participants, 270 of whom had T2D, were compared using basic linear, ridge, and lasso regression [19]. Total cholesterol and LDL cholesterol had high variance inflation factors (VIFs, a measure of multicollinearity), and triglycerides had a moderate VIF. However, the models agreed on age, BMI, and gender as significant predictors of T2D. Another study used NHANES data to evaluate the performance of five different ML models in predicting T2D history. The top three predictors in order of importance were sleep, energy intake, and age, with an AUROC of 83% and a sensitivity of 82% [20]. Similarly, a study compared the importance of T2D predictors across logistic regression, artificial neural network, and decision tree models [21]. All three models achieved moderately high sensitivity and ranked lifestyle factors, SES, and health-related behaviors among the most important predictors.
Given the limited research using gradient-boosted decision tree models to investigate the relationships between animal-sourced and plant-based diets, lifestyle factors, and T2D, we aim to address this gap. Our analysis will focus on common predictors of T2D history typically found in nutrient-disease studies. To add another layer of control, we will propensity score-match ASF and PBF dietary patterns on key confounders. Additionally, we will assess the predictive power of the XGBoost algorithm and investigate how effectively this method can minimize noise from complicated, interrelated data.

2. Materials and Methods

2.1. Data and Sample Extraction

NHANES is a nationally representative survey of US civilians. Estimates were calculated using combined-cycle case weights following NHANES guidance to address variations in selection probabilities, non-response, missing data, and sub-sample datasets [22]. Data were collected and combined from 2013 to 2016, including age and gender, body composition, physical activity, smoking status, individual food intake, T2D history, T2D medication use, macro-nutrient intake, serum lipids, and metabolic markers (Table S1).
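The combined-cycle weighting step can be sketched as follows. NHANES analytic guidance builds a multi-cycle weight by dividing each 2-year weight by the number of cycles combined (here, two). The sketch makes several assumptions:

- It uses the dietary day-one weight `WTDRD1`, because the text does not state which weight variable was used.
- The records are made up.
- It computes point estimates only. Design-based standard errors would also require the strata and PSU variables (`SDMVSTRA`, `SDMVPSU`).

```python
def combine_cycle_weights(cycles, weight_key="WTDRD1", out_key="WT_COMBINED"):
    """Pool 2-year NHANES cycles, dividing each 2-year weight by the number of cycles.

    Rows with a missing weight are dropped, mirroring the exclusion of respondents
    with missing case weights.
    """
    n_cycles = len(cycles)
    pooled = []
    for cycle in cycles:
        for row in cycle:
            weight = row.get(weight_key)
            if weight is None:
                continue
            pooled.append({**row, out_key: weight / n_cycles})
    return pooled


def weighted_proportion(rows, flag_key, weight_key="WT_COMBINED"):
    """Survey-weighted share of rows where flag_key is true (point estimate only)."""
    total = sum(r[weight_key] for r in rows)
    return sum(r[weight_key] for r in rows if r[flag_key]) / total


# Made-up records for the 2013-2014 and 2015-2016 cycles.
cycle_2013_2014 = [
    {"SEQN": 1, "WTDRD1": 3000.0, "t2d": True},
    {"SEQN": 2, "WTDRD1": 1000.0, "t2d": False},
]
cycle_2015_2016 = [
    {"SEQN": 3, "WTDRD1": 2000.0, "t2d": False},
    {"SEQN": 4, "WTDRD1": None, "t2d": True},  # missing weight: excluded
]
pooled = combine_cycle_weights([cycle_2013_2014, cycle_2015_2016])
print(len(pooled), weighted_proportion(pooled, "t2d"))
```

Rescaling all weights by the same constant leaves weighted proportions unchanged. It matters for estimated population totals, which would otherwise be doubled.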

Inclusion and Exclusion Criteria

Respondents with missing case weights, those following a special diet, individuals with serious disabilities, those taking medication for type 1 diabetes, those on glucagon-like peptide-1 receptor agonists (GLP-1RAs), participants with a history of bariatric surgery, or those currently pregnant or lactating were excluded from the sample (Figure 1). To enhance generalizability, address the heterogeneity in the relationship between dietary variables and T2D, improve diet–lifestyle pattern analysis, and strengthen propensity score-matching between diet groups, the sample included adolescents and adults aged 16 to 80 years.

2.2. Dietary Groups

The NHANES dietary survey records food intakes using USDA food codes. USDA food codes recorded during the dietary interview were classified into broader food categories using the What We Eat in America (WWEIA) food groups (Figure 1). Respondents were selected for the ASF dietary pattern if the percentage of total calories from ASFs was greater than zero. Respondents recording food intake only from plant-based food groups were selected for the PBF dietary pattern.
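The dietary-pattern assignment rule can be expressed as a small function. The WWEIA group names and the animal/plant mapping below are a simplified, hypothetical subset; the study's full mapping is shown only in Figure 1. As a further assumption, recalls containing groups outside the mapping are left unclassified.

```python
# Hypothetical, simplified mapping of WWEIA food groups; not the study's full mapping.
ANIMAL_GROUPS = {"Meats", "Poultry", "Seafood", "Eggs", "Milk", "Cheese"}
PLANT_GROUPS = {"Fruits", "Vegetables", "Grains", "Plant-based Protein Foods"}


def classify_diet_pattern(items):
    """Assign one respondent to the ASF or PBF pattern.

    items: list of (wweia_group, kcal) pairs from the dietary recall.
    Returns (pattern, percent_energy_from_asf); pattern is None when the
    recall cannot be classified under the simplified mapping.
    """
    total_kcal = sum(kcal for _, kcal in items)
    if total_kcal <= 0:
        return None, 0.0
    asf_kcal = sum(kcal for group, kcal in items if group in ANIMAL_GROUPS)
    pct_asf = 100.0 * asf_kcal / total_kcal
    if pct_asf > 0:                                        # any energy from ASFs -> ASF pattern
        return "ASF", pct_asf
    if all(group in PLANT_GROUPS for group, _ in items):   # plant-based groups only -> PBF pattern
        return "PBF", pct_asf
    return None, pct_asf


print(classify_diet_pattern([("Vegetables", 200.0), ("Poultry", 300.0)]))
print(classify_diet_pattern([("Fruits", 100.0), ("Grains", 300.0)]))
```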

Propensity Score-Matching

After dietary pattern stratification, respondents were propensity score-matched on age, gender, BMI, total caloric intake, and the ratio of physical activity to sedentary time (PAX), without replacement and with a match tolerance of 0.5. All variables included in the analyses were examined for normality via the Kolmogorov–Smirnov test, with p ≤ 0.05 indicating a non-normal distribution.
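A minimal sketch of the matching step follows. It assumes logistic-regression propensity scores and greedy 1:1 nearest-neighbor matching without replacement, and it does not reproduce the SPSS procedure. We assume the match tolerance is an absolute caliper on the propensity score. The example uses simulated covariates and a narrower tolerance of 0.05, only to show the caliper at work. Function names are hypothetical.

```python
import numpy as np


def fit_propensity_scores(X, treated, n_iter=25):
    """Logistic-regression propensity scores fitted by Newton-Raphson.

    X: (n, k) covariates (e.g., age, gender, BMI, energy intake, PA:sedentary ratio).
    treated: (n,) array of 0/1 group labels (e.g., 1 = ASF, 0 = PBF).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize for numerical stability
    Z = np.column_stack([np.ones(len(Z)), Z])  # add intercept
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (treated - p)
        hess = Z.T @ (Z * (p * (1 - p))[:, None]) + 1e-8 * np.eye(Z.shape[1])
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Z @ beta))


def greedy_match(scores, treated, tolerance):
    """1:1 nearest-neighbor matching without replacement within an absolute tolerance."""
    controls = list(np.flatnonzero(treated == 0))
    pairs = []
    for t in np.flatnonzero(treated == 1):
        if not controls:
            break
        distances = np.abs(scores[controls] - scores[t])
        j = int(np.argmin(distances))
        if distances[j] <= tolerance:
            pairs.append((int(t), int(controls.pop(j))))  # pop so the control cannot be reused
    return pairs


# Simulated data: 5 covariates; group membership depends on the first one.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
treated = (rng.random(400) < 1.0 / (1.0 + np.exp(-X[:, 0]))).astype(int)
scores = fit_propensity_scores(X, treated)
pairs = greedy_match(scores, treated, tolerance=0.05)
print(len(pairs), "matched pairs")
```

Greedy matching depends on the order in which treated cases are processed. Common alternatives are randomizing that order or using optimal matching.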

2.3. T2D Cases

Respondents were categorized as having T2D if they answered “Yes” to the question, “Doctor told you have diabetes” and met at least one of the following criteria: glycohemoglobin of 6.5% or higher, current use of T2D medication (insulin or pills) at the time of the survey, or a plasma fasting glucose of ≥126 mg/dL. Gestational or borderline T2D cases were not included in the analysis.
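The case definition above requires a self-reported diagnosis plus at least one clinical criterion, as sketched below. Argument names are hypothetical. In NHANES, the inputs would come from the diabetes questionnaire, the medication items, and the glycohemoglobin and fasting plasma glucose laboratory files.

```python
def is_t2d_case(told_diabetes, hba1c_pct=None, on_t2d_medication=False, fasting_glucose_mg_dl=None):
    """Apply the T2D case definition to one respondent.

    told_diabetes must be True only for a "Yes" answer; "Borderline" answers
    should be passed as False. Missing lab values are passed as None.
    """
    if not told_diabetes:
        return False
    high_hba1c = hba1c_pct is not None and hba1c_pct >= 6.5
    high_glucose = fasting_glucose_mg_dl is not None and fasting_glucose_mg_dl >= 126
    return high_hba1c or bool(on_t2d_medication) or high_glucose


print(is_t2d_case(True, hba1c_pct=6.8))                            # meets the HbA1c criterion
print(is_t2d_case(True, hba1c_pct=6.0, fasting_glucose_mg_dl=110)) # self-report without a criterion
```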

2.4. Serum Metabolic Markers and Derived Variables

To address diet quality related to dietary fat, we derived variables, including the ratio of dietary omega-6 (ω6) fatty acids (FAs) to omega-3 (ω3) FAs, the serum ω6FAs: ω3FAs ratio, and the ratio of total unsaturated fatty acids (UFAs) to saturated fatty acids (SFAs) [(PUFAs + MUFAs)/SFAs]. The UFAs: SFAs ratio variable is based on the National Cancer Institute Healthy Eating Index (HEI) scoring standard of ≥2.5 [23].
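The derived fatty acid variables reduce to simple ratios. The sketch below computes them from hypothetical gram intakes and flags whether the UFA:SFA ratio meets the HEI threshold of ≥2.5. The same ω6:ω3 computation would apply to serum concentrations; the function and key names are illustrative.

```python
def fat_quality_ratios(pufa_g, mufa_g, sfa_g, n6_g, n3_g, hei_threshold=2.5):
    """Derived dietary fat-quality variables.

    ufa_sfa = (PUFAs + MUFAs) / SFAs, compared with the HEI threshold of >= 2.5;
    n6_n3 = omega-6 / omega-3 (the same ratio applies to serum concentrations).
    """
    ufa_sfa = (pufa_g + mufa_g) / sfa_g
    return {
        "ufa_sfa": ufa_sfa,
        "n6_n3": n6_g / n3_g,
        "meets_hei_fa_standard": ufa_sfa >= hei_threshold,
    }


# Hypothetical daily intakes in grams.
print(fat_quality_ratios(pufa_g=18.0, mufa_g=27.0, sfa_g=20.0, n6_g=20.0, n3_g=2.0))
```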
Evidence suggests that the ratios of dietary and serum ω6FAs: ω3FAs are significant markers of dietary fat quality, which impacts overall inflammation and metabolic health [24,25]. ω3FAs have been shown to exert anti-inflammatory effects, whereas a disproportionately high intake of ω6FAs relative to ω3FAs may exacerbate inflammation. An imbalance in this ratio has been associated with an increased risk of T2D [24,25].
High-sensitivity C-reactive protein (hs-CRP) is a biomarker of systemic inflammation, which plays an important role in the pathogenesis of T2D and CVD. Elevated hs-CRP levels have been associated with insulin resistance and an increased risk of developing T2D [26]. By including hs-CRP in the analysis, this study can evaluate the inflammatory responses associated with different dietary patterns and those characterized by varying ω6FA: ω3FA ratios.
Insulin levels directly reflect insulin resistance and pancreatic function. High insulin levels indicate reduced insulin sensitivity, which can lead to T2D. Evaluating insulin levels alongside serum fatty acids, dietary patterns, and hs-CRP provides comprehensive insights into the metabolic associations of T2D.
To address the associations of recent lifestyle changes, we calculated the change in BMI from one year before the survey. To analyze the effect of an unhealthy lifestyle, we categorized respondents based on the presence of at least one unhealthy lifestyle factor, including obesity (≥30 kg/m2), smoking history, 1-SD increase in BMI within the past year, or less than 30 min of daily recreational physical activity at any intensity.
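The unhealthy-lifestyle indicator is an "any of" rule over four conditions. The sketch below uses assumed field names. It interprets the "1-SD increase" as a one-year BMI gain at least as large as one sample standard deviation of BMI change across respondents. The text does not specify the reference distribution, so this is an assumption.

```python
import statistics


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def unhealthy_lifestyle_flags(rows):
    """Flag respondents with at least one unhealthy lifestyle factor.

    Assumption: a '1-SD increase' means a one-year BMI gain at least as large as one
    sample standard deviation of BMI change across respondents.
    """
    changes = [r["bmi"] - r["bmi_1y_ago"] for r in rows]
    sd_change = statistics.stdev(changes)
    flags = []
    for r, delta in zip(rows, changes):
        flags.append(
            r["bmi"] >= 30                            # obesity
            or r["ever_smoked"]                       # smoking history
            or (sd_change > 0 and delta >= sd_change) # marked BMI increase in the past year
            or r["rec_pa_min_per_day"] < 30           # < 30 min/day recreational activity
        )
    return flags


# Hypothetical respondents.
rows = [
    {"bmi": 24.0, "bmi_1y_ago": 24.0, "ever_smoked": False, "rec_pa_min_per_day": 45},
    {"bmi": 31.0, "bmi_1y_ago": 31.0, "ever_smoked": False, "rec_pa_min_per_day": 45},
    {"bmi": 26.0, "bmi_1y_ago": 22.0, "ever_smoked": False, "rec_pa_min_per_day": 45},
    {"bmi": 23.0, "bmi_1y_ago": 23.0, "ever_smoked": True, "rec_pa_min_per_day": 45},
    {"bmi": 22.0, "bmi_1y_ago": 22.0, "ever_smoked": False, "rec_pa_min_per_day": 10},
]
print(unhealthy_lifestyle_flags(rows))
```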

2.5. Statistical Analyses

To examine multicollinearity, linear regression was used to derive each predictor’s variance inflation factor (VIF). We used principal component analysis (PCA) to analyze diet–lifestyle patterns for those with confirmed T2D. Principal components were rotated via direct oblimin (δ = 0) to maximize interpretability. Regression factor scores from each component were added as new features. Variable patterns loading on each component were characterized according to the magnitude of each predictor’s loading in the pattern matrix.
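The VIF and PCA steps can be sketched with NumPy. Each VIF is 1/(1 − R²) from regressing one predictor on all the others, and the PCA extracts components from the correlation matrix. The direct oblimin rotation (δ = 0) used in the study is not implemented; third-party packages such as `factor_analyzer` offer oblique rotations. The scores here are therefore unrotated component scores, not the regression factor scores described above. The data are simulated.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R^2_j), where R^2_j is from regressing column j on the other columns."""
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


def pca_on_correlations(X):
    """Unrotated PCA of standardized data: eigenvalues, explained-variance ratios, component scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratios = eigvals / eigvals.sum()
    scores = Z @ eigvecs  # no oblimin rotation applied
    return eigvals, ratios, scores


# Simulated predictors: x2 is nearly collinear with x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X)
eigvals, ratios, scores = pca_on_correlations(X)
print("VIFs:", vifs.round(1))
print("Explained variance ratios:", ratios.round(2))
```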
The XGBoost classifier was tuned using a parameter grid for the following hyperparameters:

- learning rate;
- n-estimators (the number of trees in the ensemble);
- max depth (the maximum depth of each tree, i.e., the number of successive splits from root to leaf);
- subsample (the fraction of training rows randomly sampled to grow each tree);
- gamma (the minimum loss reduction required to make a further split, which limits the number of tree splits).

The model was trained on 70% of the data and tested on the remaining 30%.
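The hyperparameter grid and the 70/30 split can be sketched without fitting a model. The keys below match `xgboost.XGBClassifier` keyword arguments. Apart from the best values reported in the Results, the candidate values are hypothetical. The split helper is a simple seeded shuffle, not the study's code.

```python
import itertools
import math
import random

# Keys match xgboost.XGBClassifier keyword arguments; candidate values are hypothetical
# except the best combination reported in the Results.
param_grid = {
    "learning_rate": [0.05, 0.1, 0.2],
    "n_estimators": [50, 100, 150],
    "max_depth": [3, 5, 7],
    "subsample": [0.5, 0.8, 1.0],
    "gamma": [0.0, 0.05, 0.1],
}


def expand_grid(grid):
    """All hyperparameter combinations as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in itertools.product(*grid.values())]


def train_test_split_indices(n, test_size=0.3, seed=42):
    """Seeded shuffle, then hold out ceil(test_size * n) rows for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(test_size * n)
    return idx[n_test:], idx[:n_test]


candidates = expand_grid(param_grid)
train_idx, test_idx = train_test_split_indices(2746)
print(len(candidates), "combinations;", len(train_idx), "train /", len(test_idx), "test")
```

A grid of this form could be passed to scikit-learn's `GridSearchCV` with an `XGBClassifier` estimator and `cv=10`. Only about one in ten respondents had T2D, so a stratified split (for example, `train_test_split(..., stratify=y)` in scikit-learn) would keep that prevalence similar in both partitions.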
The classes in the training dataset were balanced using the Synthetic Minority Oversampling Technique (SMOTE) with a random state of 42. SMOTE generates synthetic T2D-positive samples by interpolating between observed cases, giving both classes equal representation in the resampled training set and addressing class imbalance. The resampled data were then used for model training and optimization to improve predictive performance and make it more even across both classes. Sensitivity and accuracy were evaluated using receiver operating characteristic and precision-recall curves, summarized as the AUROC and the area under the precision-recall curve (AUPRC). Shapley Additive Explanation (SHAP) plots were generated to explain the contribution of each predictor to T2D status in order of importance. Data aggregation, transformation, cleaning, and propensity score matching were performed in IBM SPSS (version 29.0); PCA and XGBoost analyses were performed in Python (version 3.9.6).
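SMOTE's interpolation step can be illustrated in a few lines of NumPy. This is a simplified re-implementation for illustration, not the `imbalanced-learn` code. Each synthetic case lies on the segment between a randomly chosen minority case and one of its k nearest minority neighbors; k = 5 is assumed. The data are simulated, and oversampling is applied to the training set only.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=42):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances among minority samples; exclude each point from its own neighbors.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]              # k nearest minority neighbors per sample
    base = rng.integers(0, len(X_min), size=n_new)             # random minority "seed" cases
    partner = neighbors[base, rng.integers(0, k, size=n_new)]  # one random neighbor for each seed
    gap = rng.random((n_new, 1))                               # position along the segment, in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


def balance_with_smote(X_train, y_train, k=5, seed=42):
    """Oversample class 1 until both classes have equal counts (training data only)."""
    X_min = X_train[y_train == 1]
    n_new = int((y_train == 0).sum() - (y_train == 1).sum())
    X_syn = smote(X_min, n_new, k=k, seed=seed)
    X_res = np.vstack([X_train, X_syn])
    y_res = np.concatenate([y_train, np.ones(n_new, dtype=y_train.dtype)])
    return X_res, y_res


# Simulated training set: 20 positive and 180 negative cases with 3 features.
rng = np.random.default_rng(7)
X_train = rng.normal(size=(200, 3))
y_train = np.zeros(200, dtype=int)
y_train[:20] = 1
X_res, y_res = balance_with_smote(X_train, y_train)
print(X_res.shape, np.bincount(y_res))
```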

3. Results

The diet-matched sample included 1373 respondents in each dietary group. The proportion of males in the ASF and PBF groups was 53.2% and 44.7%, respectively. The median age was 48 years in the PBF pattern and 47 years in the ASF pattern (Table 1). The percentage of respondents with T2D was 11.3% and 9.5% in the ASF and PBF groups, respectively. The median total energy intake in the ASF pattern was 7% higher than in the PBF pattern. Those in the ASF pattern obtained a median of 14.71% of their total calories from ASFs. In contrast, those in the PBF pattern obtained 9% of their total calories from PBFs.
Median dietary intakes of cholesterol, total ω3FAs, total ω6FAs, MUFAs from PBFs, PUFAs from PBFs, total plant protein, total plant fat, overall protein, overall fat, total MUFAs, total PUFAs, and saturated fat were higher in the ASF pattern. Conversely, carbohydrates, the dietary ω6FAs: ω3FAs ratio, and fiber were higher in the PBF pattern. The ASF group had a higher intake of MUFAs and PUFAs from animal sources than from plant sources. The UFA: SFA ratio was similar between groups.
Serum values of HDL-C, LDL-C, glycohemoglobin, total cholesterol, triglycerides, total serum ω3FAs, total serum ω6FAs, and plasma fasting glucose were similar. However, hs-CRP and fasting insulin were higher in the ASF group. Serum arachidonic acid (AA) was also higher in the ASF group, whereas alpha-linolenic acid (ALA), linoleic acid (LA), eicosapentaenoic acid (EPA), and docosahexaenoic acid (DHA) were lower than in the PBF group.
A total of 87.3% of the PBF group had an unhealthy lifestyle compared to 91.9% in the ASF group. Sedentary time was higher in the ASF group. However, BMI, BMI change, and body fat percentage were similar between groups. Also, 42.8% of those in the ASF group had a history of smoking compared to 39.4% in the PBF group. The proportion of respondents taking prescription pills for T2D was 31.6% and 40.2% in the ASF and PBF groups, respectively.
Table 2 shows the descriptive estimates stratified by T2D history. There were 286 cases of T2D, and of those, 52.1% were male. At the median, the T2D group differed from those without T2D as follows:

- they were 17 years older, consumed less overall energy, had a poorer body composition, and had higher sedentary time;
- they had higher serum triglycerides, lower HDL-C, higher glycohemoglobin, higher hs-CRP, higher insulin levels, and higher fasting glucose;
- they had lower fiber intake, higher plant protein intake, lower plant fat intake, higher ASF fat intake, lower ASF protein intake, lower cholesterol intake, and lower PUFA and MUFA intakes.

However, those with T2D had a 14% lower intake of carbohydrates, a 12% lower saturated fat intake, lower LDL-C, lower EPA concentration, a lower dietary ω6FAs: ω3FAs ratio, and higher levels of AA, ALA, and DHA than those without T2D. The T2D group experienced a reduction in BMI in the past year, while non-diabetics saw a slight increase. Approximately 41.3% of those with T2D reported a smoking history, compared to 41.0% in non-diabetics. Of those with T2D, 15.6% were current smokers and 25.7% were former smokers.
Linear regression showed high VIFs for ω6FAs and PUFAs (Figure 2). Moderate VIFs were found for total energy, MUFAs, and serum ω3FAs.
The PCA extracted 10 components, which together explained over 75% of the variance in the diet–lifestyle variables (Table 3). Component 1 explained nearly 20% of the variance and was influenced primarily by high unsaturated fatty acid and total energy intakes. Component 2 explained nearly 9% of the variance, with a high loading from female gender. Component 3, with a strong loading from the PBF dietary pattern, explained 8.4%. Component 4 explained 7.1% of the variance, with a strong negative loading from the serum ω6FAs: ω3FAs ratio and a strong positive loading from serum ω3FAs. Component 5 was influenced by higher intakes of plant-based fats and proteins and lower dietary ω6FAs: ω3FAs ratios, accounting for 6.3% of the variance. Component 6 had strong loadings of age, smoking history, and an unhealthy lifestyle, and a negative loading of BMI change in the past year. Component 7 had high loadings of animal-sourced fat and protein. Component 8 was influenced by poor body composition, a recent BMI increase, and an unhealthy lifestyle. The UFAs: SFAs ratio loaded strongly on Component 9. Lastly, Component 10 was primarily influenced by lower ratios of physical activity to sedentary time.
The best hyperparameters for the XGBoost classifier included a gamma value of 0.05, a learning rate of 0.2, a max depth of 7, n-estimators of 150, and a subsample of 0.5. The mean accuracy across the 10-fold cross-validation was 94.3%. The AUROC was 68%, with an overall accuracy of 83.4% and an F1 score of 22% (Figure 3). The AUPRC for the T2D-positive class was 18%.
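The gap between overall accuracy and the F1 score or AUPRC follows from class imbalance. With T2D in about 10% of respondents (286 of 2746), predicting "no T2D" for everyone already yields roughly 90% accuracy. A classifier that ranks cases at random has an expected AUPRC close to that prevalence. The sketch below computes these metrics from scratch on toy data. The helper names are hypothetical, and `average_precision` assumes no tied scores.

```python
import numpy as np


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for binary labels; F1 is 0.0 when there are no true positives."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    accuracy = float(np.mean(y_true == y_pred))
    f1 = 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)
    return accuracy, f1


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def average_precision(y_true, scores):
    """AUPRC summarized as mean precision at the rank of each positive (no ties assumed)."""
    y_sorted = np.asarray(y_true)[np.argsort(scores)[::-1]]
    hits = np.cumsum(y_sorted)
    ranks = np.arange(1, len(y_sorted) + 1)
    return float(np.sum((hits / ranks)[y_sorted == 1]) / hits[-1])


y_true = [1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in y_score]
print(accuracy_and_f1(y_true, y_pred), roc_auc(y_true, y_score), average_precision(y_true, y_score))

# All-negative predictions at 10% prevalence: high accuracy, zero F1.
y_imbalanced = [1] * 10 + [0] * 90
print(accuracy_and_f1(y_imbalanced, [0] * 100))
```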
Figure 4 shows the top individual and lifestyle predictors by feature importance. Positively associated predictors of T2D were age, BMI, the dietary ω6FAs: ω3FAs ratio, ω3FAs, female gender, ASF fat intake, fiber intake, and serum ω6FAs. Individual predictors inversely associated with T2D were smoking history, BMI change, PUFAs, PBF fat intake, and ASF protein intake. Diet–lifestyle features positively associated with T2D prediction were ‘Age, Smoking, Unhealthy Lifestyle, and Recent BMI Decrease’ and ‘Higher UFA: SFA Ratio’. In contrast, the lifestyle feature ‘Poor Body Composition, Recent BMI Increase’ and the PBF dietary pattern were negatively associated. The associations of body fat percentage and the ‘Low Physical Activity, Unhealthy Lifestyle’ feature with T2D were inconclusive. For some features, including the PBF dietary pattern and body fat percentage, SHAP values overlapped substantially between positive and negative impacts on prediction.
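SHAP values are Shapley values. Each feature's value is its average marginal contribution to the prediction over all orderings of features, and the values sum to the difference between the prediction and a baseline. The brute-force sketch below computes exact Shapley values for a hypothetical toy model. It replaces "absent" features with a fixed baseline, which is an interventional assumption. Its cost grows exponentially with the number of features. Tree-specific algorithms such as TreeSHAP, used by the `shap` library for gradient-boosted trees, compute related attributions efficiently.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, treating 'absent' features as set to baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_model(z):
    """Hypothetical model: an additive effect of z[0] plus an interaction between z[1] and z[2]."""
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print(phi)  # approximately [2.0, 3.0, 3.0]; the values sum to toy_model(x) - toy_model(baseline) = 8.0
```

In the toy model, each interacting feature's share (here z1·z2/2) depends on the other feature's value. The same feature value can therefore push predictions up or down across respondents, which is one plausible source of overlapping positive and negative SHAP values.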

4. Discussion

This cross-sectional investigation observed complex dietary, lifestyle, and body composition patterns associated with T2D. High multicollinearity among UFA variables obscured the relationship between dietary patterns and T2D history. Supporting established links, age, BMI, and an unhealthy lifestyle were among the top predictors of T2D history, emphasizing the role of age and lifestyle in disease progression [27,28]. However, the effect of smoking on T2D was unclear. An unhealthy lifestyle, defined by obesity, a marked recent increase in BMI, low physical activity, or smoking history, was associated with T2D. Yet smoking history as an individual predictor was inversely associated. Evidence suggests a complex relationship between smoking and T2D risk. Smoking contributes to a relatively higher risk of prediabetes, and smoking cessation is followed by an increased risk of T2D that gradually declines [29]. The increased risk after smoking cessation may be due to cumulative exposure to smoking or to weight gain following cessation. In the current study, smoking history included current and former smokers, so it is likely that the increased risk in those with and without a smoking history was exacerbated by unhealthy levels of body fat. Moreover, the T2D group showed higher BMI and body fat percentage despite recent BMI decreases, as well as a higher proportion of former smokers. Both observations support the theory of weight gain following smoking cessation.
There was no clear link between either the ASF or PBF dietary pattern and T2D history. Despite the higher prevalence of T2D in the ASF group, higher levels of animal protein were negatively associated with T2D. This pattern likely reflects the strong link between ASF patterns and unhealthy lifestyle factors rather than the inclusion of ASFs. Similarly, diets higher in plant fats were inversely associated with T2D, consistent with the studies cited previously. Branched-chain amino acids (BCAAs), found in animal protein sources, can reach high circulating levels. These high levels have been linked to hyperinsulinemia and impaired glucose uptake via activation of the mammalian target of rapamycin (mTOR) pathway, which promotes protein synthesis and lipogenesis [30]. However, the effects of amino acid types on insulin action and glucose tolerance remain unclear. Recent clinical evidence suggests that the mTOR pathway is not activated with isocaloric, non-energy-restrictive substitutions of fat for animal or plant protein [31]. This may explain why, in several studies, associations of total and animal-sourced protein with increased risk of T2D or insulin resistance were attenuated after adjustment for body composition or total energy [32,33,34]. In contrast, one study found an association between animal-sourced and total protein intakes in obese women compared to men [35]. However, adjustment for previous chronic disease conditions weakened this relationship.
Evidence suggests that age, muscle mass, and adiposity have independent effects on insulin resistance [36,37,38]. In men, age appears to be an independent predictor of insulin resistance, while in women, increases in age-related adiposity drive insulin resistance [37]. In one study of over 100 post-menopausal women, lean body mass and visceral fat independently predicted insulin levels and hs-CRP, an inflammatory marker associated with increased CVD risk [36]. We found a similar result in the PCA, with female gender and body fat percentage loading strongly onto Component 2, which accounted for 8.9% of the variance in the diet–lifestyle variables.
The difference in BCAA content between animal-sourced and plant-based protein sources may explain the association between protein type and insulin resistance, especially in older, overweight individuals. Animal protein consumption has been linked to a higher muscle mass index (muscle mass [kg]/height [m]²) through greater muscle protein synthesis compared to plant protein [39]. However, animal protein intake and higher muscle mass may promote insulin resistance as age- and adiposity-related inflammation increase. In a cross-sectional study of older adults, insulin resistance measured by HOMA-IR was associated with animal protein intake, but only in those with a higher muscle mass index and higher body fat. In contrast, plant protein was inversely associated with muscle mass index and insulin resistance [38]. Importantly, however, a higher muscle mass index was inversely associated with body fat and chronic disease. The loss of type I muscle fibers, which are dense in mitochondria and have a high oxidative capacity, may also play a role in the development of T2D via increased fat deposition [40]. Our results support these effects. The ASF group had higher insulin levels and BMI but slightly lower body fat, suggesting relatively higher muscle mass. These findings suggest a complex relationship between age, diet, body composition, and insulin dysregulation.
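The paragraph above names HOMA-IR without defining it; it is conventionally computed from fasting glucose and fasting insulin. The sketch assumes glucose in mg/dL and insulin in µU/mL, with the conventional constant 405 (22.5 when glucose is in mmol/L). The cited study's exact computation is not reported here. The muscle mass index follows the definition given in the text, and the input values are hypothetical.

```python
def homa_ir(fasting_glucose_mg_dl, fasting_insulin_uU_ml):
    """HOMA-IR = glucose (mg/dL) x insulin (uU/mL) / 405; equals glucose (mmol/L) x insulin / 22.5."""
    return fasting_glucose_mg_dl * fasting_insulin_uU_ml / 405.0


def muscle_mass_index(muscle_mass_kg, height_m):
    """Muscle mass index = muscle mass (kg) / height (m)^2."""
    return muscle_mass_kg / height_m ** 2


# Hypothetical respondent.
print(round(homa_ir(100.0, 10.0), 2))
print(round(muscle_mass_index(30.0, 1.70), 2))
```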
A higher ratio of dietary ω6FAs: ω3FAs and higher serum ω6FAs predicted T2D. Evidence on the effects of UFAs on T2D or metabolic markers is unclear [4,41,42,43]. A meta-analysis of 10 cohort studies showed that ω3FA consumption was positively associated with T2D, exhibiting an inverted U-shape relationship [37]. In another meta-analysis of 83 randomized controlled trials, increasing ω3FAs, ω6FAs, or total PUFA had little or no effect on preventing or treating newly diagnosed T2D [44]. In the Prospective Metabolism and Islet Cell Evaluation (PROMISE) longitudinal study that included 477 participants with 6 years of follow-up, total non-esterified fatty acids (NEFAs) independently predicted decreased beta cell function [45]. No individual NEFAs had a positive influence on insulin sensitivity except for EPA. Similarly, the ASF group and T2D group had a higher overall intake of fats, including plant sources. This may be evidence of an adverse effect of overconsumption of UFAs despite a similar ratio of UFAs: SFAs to that of the PBF group.
Other research suggests that the relative dose of ω3FAs and long-term consumption may mediate this relationship [37,41]. Using NHANES data from 2005 to 2020, Jiang et al. observed that higher amounts of specific subtypes of MUFAs (16:1, 18:1, and 20:1) and PUFAs (18:2 and 18:3) were related to reduced T2D risk [46]. Furthermore, only the highest intakes of PUFA subtypes 20:5 (EPA) and 22:5 (DPA) were associated with lower risk. In the current study, as in Jiang et al., the T2D group had lower intakes of EPA, and the ASF group had lower intakes of EPA and DHA [46]. Conversely, the SHAP plot revealed an association between very high levels of ω3FAs and T2D, which may be due to secondary prevention efforts by diabetics to increase ω3FA levels.
In the current study, the ratio of serum ω6FAs:ω3FAs exceeded 13 in both groups, whereas ratios of 4–5:1 have been associated with improved metabolic health [24,25,47]. Although there is no consensus on the optimal ratio, some evidence suggests that the serum and dietary ω6FAs:ω3FAs ratios may predict T2D [24,25]. Modern diets have increased the availability of ω6FAs relative to ω3FAs, which has implications for metabolic health. ω6FAs are converted to arachidonic acid (AA), which may have inflammatory and thrombotic effects, whereas ω3FAs are converted to EPA and DHA, which exert anti-inflammatory effects [48]. ω6FAs may also contribute to obesity by increasing triglyceride concentrations via increased cell membrane permeability, whereas ω3FAs have the opposite effect [49]. This is supported by several investigations showing associations between relatively higher amounts of ω3FAs and improved insulin sensitivity, fasting glucose, insulin, hs-CRP levels, and fitness measures [24,25,26].
The use of the ω6FAs:ω3FAs ratio has recently been questioned, however, because the specific subtypes included in the ratio are often not reported [50]. Another criticism is that a ratio does not reflect the absolute amount of each UFA, despite evidence of a dose–response effect of specific UFAs in reducing T2D risk [26,46,50]. Furthermore, some evidence suggests that a lack of ω3FAs, rather than a relative increase in ω6FAs, is proinflammatory [50].
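Both points, the distance of the observed serum ratio from the 4–5:1 range and the fact that a ratio hides absolute amounts, can be illustrated with simple arithmetic. The sketch below (helper names are hypothetical) uses the Table 1 PBF medians for serum ω6 (4453.10 μmol/L) and ω3 (341.65 μmol/L). The ratio of these two medians (about 13.03) is not the same as the median of individual ratios reported in Table 1 (13.21), so the numbers are illustrative only. The last two intakes are made-up values chosen to show that a ratio cannot distinguish low from high absolute intake.

```python
def omega_ratio(omega6: float, omega3: float) -> float:
    """Omega-6:omega-3 ratio; both amounts must be in the same unit."""
    if omega3 <= 0:
        raise ValueError("omega-3 amount must be positive")
    return omega6 / omega3


def omega3_for_target(omega6: float, target_ratio: float) -> float:
    """Omega-3 amount needed to reach target_ratio with omega-6 held fixed."""
    return omega6 / target_ratio


# Table 1 medians for the PBF group (serum, umol/L).
serum_o6, serum_o3 = 4453.10, 341.65
print(f"ratio of medians: {omega_ratio(serum_o6, serum_o3):.2f}")
for target in (4.0, 5.0):
    needed = omega3_for_target(serum_o6, target)
    print(f"{target:.0f}:1 needs ~{needed:.1f} umol/L omega-3 "
          f"({needed / serum_o3:.2f}x the median)")

# Hypothetical daily intakes (g): identical ratios, fivefold different doses.
low_intake = (5.0, 0.5)
high_intake = (25.0, 2.5)
print(omega_ratio(*low_intake) == omega_ratio(*high_intake))  # the ratio alone cannot tell these apart
```

With these inputs, reaching 4:1 at the same ω6 level would require roughly 3.3 times the median serum ω3, and 5:1 roughly 2.6 times, while the two hypothetical intakes share a ratio of 10:1 despite a fivefold difference in dose.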
Another study of children and older adults, which combined multiple cross-sectional and longitudinal studies, found that a lower ω6FAs:ω3FAs ratio was associated with higher HEI scores, indicating higher diet quality [47]. Still, participants across age groups did not meet the recommended intakes of EPA and DHA. Our study is consistent with this, as a higher ω6FAs:ω3FAs ratio was predictive of T2D. Interestingly, higher UFAs:SFAs ratios were also predictive of T2D, even though the HEI treats higher ratios as healthier and sets ≥2.5 as the standard for optimal FA consumption. Again, this association may reflect attempts by people with diabetes to improve the quality of their FA intake. Further investigation is needed to clarify the causal relationships between fatty acid subtypes and T2D.
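The HEI fatty acid component scores the ratio (MUFAs + PUFAs)/SFAs on a linear scale. The sketch below assumes the HEI-2015 scoring standards (0 points at a ratio of ≤1.2 and the maximum of 10 points at ≥2.5; only the ≥2.5 threshold is stated in the text) and applies them to the Table 1 median intakes of total MUFAs, PUFAs, and SFAs. As with the serum ratio, a ratio built from group medians (about 1.78 for PBF and 1.77 for ASF) differs from the median individual ratio in Table 1 (1.83), so this is illustrative only; the function names are hypothetical.

```python
def ufa_sfa_ratio(mufa_g: float, pufa_g: float, sfa_g: float) -> float:
    """(MUFAs + PUFAs) / SFAs, all in grams."""
    if sfa_g <= 0:
        raise ValueError("SFA intake must be positive")
    return (mufa_g + pufa_g) / sfa_g


def hei_fatty_acid_score(ratio: float, zero_at: float = 1.2,
                         max_at: float = 2.5, max_points: float = 10.0) -> float:
    """Linear HEI-style fatty acid score; thresholds assumed from HEI-2015."""
    if ratio >= max_at:
        return max_points
    if ratio <= zero_at:
        return 0.0
    return max_points * (ratio - zero_at) / (max_at - zero_at)


# Table 1 medians (g/day): total MUFAs, PUFAs, SFAs.
groups = {"PBF": (26.61, 16.91, 24.43), "ASF": (28.64, 18.78, 26.78)}
for name, (mufa, pufa, sfa) in groups.items():
    r = ufa_sfa_ratio(mufa, pufa, sfa)
    print(f"{name}: ratio {r:.2f}, score {hei_fatty_acid_score(r):.1f}/10")
```

Under these assumptions both groups score roughly 4.4 to 4.5 of 10 points, well short of the ≥2.5 ratio that earns full credit.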
Pancreatic beta cell function and survival are related to the carbon chain length and degree of saturation of FAs, with long-chain SFAs (>12 carbons) inducing cytotoxicity [51]. However, in vitro studies have shown improvements in beta cell function after exposure to UFAs [52]. Some epidemiological evidence suggests that food preparation practices confound the relationship between UFAs and T2D. For example, Qian et al. observed a 50% increase in T2D risk with higher consumption of fried plant-based MUFAs than of animal-sourced MUFAs [53]. Deep frying typically uses plant oils rich in long-chain (18-carbon) fatty acids. However, deep frying with low-smoke-point oils, especially with repeated use, increases oxidation and the production of SFAs and trans fats, which impair beta cell function [50,51].
Several observations from this study highlight the importance of lifestyle modification in preventing and treating T2D. First is the impact of BMI and unhealthy lifestyles on T2D outcomes. A recent investigation showed that just 14 days of reduced physical activity increased insulin resistance and total body fat and decreased limb lean mass [54]. Numerous clinical trials of the standard care model, which includes increased physical activity and caloric restriction, have demonstrated its effectiveness in reducing T2D risk and improving T2D and cardiometabolic markers [55]. Second, the ASF group consumed nearly 135 more kcal at the median (2161 vs. 2028 kcal) despite being matched on total energy, BMI, and physical activity index. The link between T2D and energy surplus is well established, and T2D has recently been framed as a disease of energy surplus [56]. A lack of dietary planning could reduce diet quality and lead to overconsumption of calorically dense foods, including animal-sourced foods or processed foods from any source, increasing the risk of T2D. Caloric restriction, however, is a powerful tool in managing T2D. In a network meta-analysis of 18 studies on caloric restriction methods, intermittent fasting and continuous energy restriction improved HbA1c, body weight, and BMI compared with traditional diets [57].
Although dietary interventions for T2D management have been studied extensively, the best dietary pattern has yet to be identified. A recent meta-analysis of 56 trials comparing nine different dietary patterns showed that all dietary patterns were effective in reducing HbA1c and fasting glucose compared to the control, with a low-carbohydrate diet, Mediterranean diet, and Paleolithic diet among the most effective [58]. These dietary patterns share common characteristics favorable to metabolic profiles such as energy balance, higher UFA content, higher protein, higher intake of fruits and vegetables, minimally processed foods, and higher intakes of fiber.

4.1. Strengths

To our knowledge, this is the first study to use XGBoost to examine the relationship between specific dietary patterns and T2D risk. We found that lifestyle variables confounded the relationship between nutrients from animal and plant sources and T2D history. By matching dietary patterns on common confounders, our study provides insights into how lifestyle and nutrient intake contexts influence T2D risk, offering a novel perspective on the interplay between diet and metabolic health. Although counter-matching has traditionally been employed to improve case–control efficiency, our study differs in that it uses propensity score matching to isolate dietary exposure groups for the analysis of T2D predictors. The results of this study align with those of other studies employing ML methods to predict T2D, which found a high impact of age, body composition, and lifestyle variables on T2D [18,19,20,21]. However, the predictor sets differed across studies, which is likely to alter feature importance; researchers comparing models should therefore consider the aims, sample characteristics, and predictor sets of each study. This is also, to our knowledge, the first application of XGBoost to an NHANES sample that adds PCA-engineered variables depicting diet–lifestyle patterns as features. Including PCA-engineered features allowed the importance of individual predictors to be distinguished from that of interrelated predictor groups.
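Propensity score matching pairs respondents whose modeled probability of belonging to the ASF group, given the matching covariates, is similar. The matching algorithm, matching ratio, and caliper are not specified in this section; the equal group sizes in Table 1 (n = 1373 each) are consistent with 1:1 matching. The sketch below assumes greedy 1:1 nearest-neighbour matching without replacement and a caliper of 0.05 on the probability scale. In practice the scores would come from a model such as a logistic regression on the matching covariates; the function name, IDs, and scores here are hypothetical.

```python
from typing import Dict, List, Tuple


def greedy_match(treated: Dict[str, float],
                 controls: Dict[str, float],
                 caliper: float = 0.05) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbour matching on propensity scores, without replacement.

    Treated units are processed from highest to lowest score (one common heuristic);
    a pair is kept only if the score distance is within the caliper.
    """
    available = dict(controls)  # controls not yet used in a pair
    pairs: List[Tuple[str, str]] = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement: each control is used once
    return pairs


# Hypothetical propensity scores (probability of ASF-pattern membership).
treated = {"asf1": 0.62, "asf2": 0.40, "asf3": 0.90}
controls = {"pbf1": 0.60, "pbf2": 0.43, "pbf3": 0.20, "pbf4": 0.65}
pairs = greedy_match(treated, controls, caliper=0.05)
print(pairs)
```

Here asf3 has no control within the caliper and is left unmatched, which is how matching trades sample size for comparability. Greedy matching is order-dependent; optimal matching, and checking covariate balance after matching (for example with standardized mean differences), are common refinements.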

4.2. Limitations

The use of 24 h dietary recalls and food-frequency questionnaires as a proxy for habitual intake is a weakness of this and many other dietary studies. We did not include nutrients from beverages other than milk-based drinks and fruit juices, so not all energy sources were captured, which limits an in-depth examination of diet quality. However, the differences between the ASF and PBF groups were similar to those reported in other studies. Moreover, we did not account for other factors such as comorbidities, alcohol consumption, family history, or socioeconomic status (SES). Because these factors are interrelated with the lifestyle factors already included, we did not consider it necessary to include every known or available background factor to illustrate the problem of confounding. Prescription drug use could have biased serum marker estimates or skewed the associations between serum markers and T2D. Prescription drug use was higher overall in the PBF group, suggesting higher rates of medical treatment and supporting the link between dietary patterns, healthy lifestyle bias, and disease outcomes. The cross-sectional design raises the possibility of reverse causation, although many of our results are supported by prospective studies. Although we excluded individuals who reported being on a diet at the time of the survey, this questionnaire item may not capture all individuals who had recently changed their lifestyle or dietary habits.
Furthermore, the results of this study may differ from those of future cross-sectional studies that include different age ranges. The XGBoost model achieved relatively high accuracy but low sensitivity, which may be due to the inclusion of interrelated predictors or the exclusion of other relevant predictors. However, our goal was not necessarily to identify the best T2D predictors but to explore the importance of lifestyle and dietary factors in T2D history. Future studies should prospectively analyze the interactions among pre-existing metabolic conditions, nutrient intake thresholds, and T2D incidence to further clarify these relationships and inform new guidelines for management and prevention.
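Why accuracy can look reasonable while sensitivity is low follows from the class balance. In Table 2, 286 of 2746 respondents (about 10.4%) had T2D, so a trivial classifier that labels everyone non-diabetic would be correct about 89.6% of the time while detecting no diabetics. The sketch below computes accuracy, sensitivity, and specificity from confusion-matrix counts. It uses the full-sample class sizes from Table 2 as an assumption, not the model's actual test-set counts, which are shown in Figure 3 and are not reproduced here; the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # T2D correctly predicted as T2D
    fn: int  # T2D predicted as non-T2D
    tn: int  # non-T2D correctly predicted as non-T2D
    fp: int  # non-T2D predicted as T2D

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.tn + self.fp

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def sensitivity(self) -> float:
        """Recall for the T2D class: share of diabetics the model flags."""
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    def specificity(self) -> float:
        """Share of non-diabetics correctly classified."""
        negatives = self.tn + self.fp
        return self.tn / negatives if negatives else 0.0


# Class sizes from Table 2; this baseline "model" predicts non-T2D for everyone.
always_negative = ConfusionMatrix(tp=0, fn=286, tn=2460, fp=0)
print(f"accuracy={always_negative.accuracy():.3f}, "
      f"sensitivity={always_negative.sensitivity():.3f}, "
      f"specificity={always_negative.specificity():.3f}")
```

This is why sensitivity and threshold-independent summaries such as AUROC and AUPRC are more informative than accuracy at this prevalence.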

4.3. Practical Implications

Although randomized controlled interventions substantially reduce bias and help isolate causal predictors, they are limited by funding, the extent to which nutrient intake and other behaviors that influence physiology can be controlled, lack of long-term follow-up, and diet adherence issues, to name a few. Cross-sectional studies often have high statistical power but are limited by their retrospective design, lack of control variables and groups, inability to support causal inference, and confounding by unknown or uncollected variables. Despite these shortcomings, cohort-based datasets are widely available (often publicly), have large sample sizes, and can provide insights into otherwise complex subject matter. Combined with control variables, robust ML techniques may improve the signal-to-noise ratio compared with traditional predictive methods.

5. Conclusions

This study highlights the complex interplay between dietary and lifestyle patterns and their association with T2D history, emphasizing the importance of age, body composition, and dietary fatty acid composition. Although T2D was more prevalent in the animal-sourced food pattern, this difference was largely explained by unhealthy lifestyles, and animal-sourced protein intake was inversely associated with T2D. The XGBoost algorithm clarified the importance of interrelated, multifactorial predictors, underscoring the limitations of traditional approaches in addressing confounding. Future research in this area should establish robust ML techniques and use longitudinal designs that can isolate the effects of dietary and lifestyle factors on chronic disease outcomes. This approach could help refine dietary guidelines and preventive strategies for T2D.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jcm14020458/s1, Table S1. Aggregated and Derived Variables.

Author Contributions

A.C.E.: Conceptualization, Methodology, Validation, Formal analysis, Investigation, Data Curation, Writing—Original Draft, and Visualization. P.S.G.: Conceptualization, Methodology, Validation, and Writing—Review and Editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Institutional Review Board Statement

This research is considered exempt under 45 CFR 46.104. NHANES undergoes periodic human subjects research ethical review by the NCHS Ethics Review Board. IRB/ERB protocol numbers and descriptions are available at https://www.cdc.gov/nchs/nhanes/about/erb.html?CDC_AAref_Val=https://www.cdc.gov/nchs/nhanes/irba98.htm (accessed on 18 December 2024).

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used throughout this study were derived from the following resources available in the public domain: https://www.cdc.gov/nchs/nhanes/?CDC_AAref_Val=https://www.cdc.gov/nchs/nhanes/index.htm (accessed on 18 December 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

ASFs: animal-sourced foods
AUPRC: area under the precision–recall curve
AUROC: area under the receiver operating characteristic curve
BMI: body mass index
FAs: fatty acids
HEI: Healthy Eating Index
hs-CRP: high-sensitivity C-reactive protein
ML: machine learning
MUFAs: monounsaturated fatty acids
PA: physical activity
PAX: ratio of total physical activity to sedentary time
PBFs: plant-based foods
PUFAs: polyunsaturated fatty acids
RCT: randomized controlled trial
SFAs: saturated fatty acids
T2D: Type 2 diabetes
ω3FAs: dietary omega-3 fatty acids
ω6FAs: dietary omega-6 fatty acids
ω6FAs:ω3FAs: ratio of omega-6 to omega-3 fatty acids
UFAs: unsaturated fatty acids

References

1. Petersen, K.S.; Flock, M.R.; Richter, C.K.; Mukherjea, R.; Slavin, J.L.; Kris-Etherton, P.M. Healthy dietary patterns for preventing cardiometabolic disease: The role of plant-based foods and animal products. Curr. Dev. Nutr. 2017, 1, cdn-117.
2. Zhao, B.; Gan, L.; Graubard, B.I.; Männistö, S.; Fang, F.; Weinstein, S.J.; Liao, L.M.; Sinha, R.; Chen, X.; Albanes, D.; et al. Plant and Animal Fat Intake and Overall and Cardiovascular Disease Mortality. JAMA Intern. Med. 2024, 184, 1234–1245.
3. Song, M.; Fung, T.T.; Hu, F.B.; Willett, W.C.; Longo, V.D.; Chan, A.T.; Giovannucci, E.L. Association of animal and plant protein intake with all-cause and cause-specific mortality. JAMA Intern. Med. 2016, 176, 1453–1463.
4. Appel, L.J.; Sacks, F.M.; Carey, V.J.; Obarzanek, E.; Swain, J.F.; Miller, E.R.; Conlin, P.R.; Erlinger, T.P.; Rosner, B.A.; Laranjo, N.M.; et al. Effects of protein, monounsaturated fat, and carbohydrate intake on blood pressure and serum lipids: Results of the OmniHeart randomized trial. JAMA 2005, 294, 2455–2464.
5. Kelemen, L.E.; Kushi, L.H.; Jacobs, D.R., Jr.; Cerhan, J.R. Associations of dietary protein with disease and mortality in a prospective study of postmenopausal women. Am. J. Epidemiol. 2005, 161, 239–249.
6. Shang, X.; Scott, D.; Hodge, A.M.; English, D.R.; Giles, G.G.; Ebeling, P.R.; Sanders, K.M. Dietary protein intake and risk of type 2 diabetes: Results from the Melbourne Collaborative Cohort Study and a meta-analysis of prospective studies. Am. J. Clin. Nutr. 2016, 104, 1352–1365.
7. Liu, T.; Nie, X.; Wu, Z.; Zhang, Y.; Feng, G.; Cai, S.; Lv, Y.; Peng, X. Can statistic adjustment of OR minimize the potential confounding bias for meta-analysis of case-control study? A secondary data analysis. BMC Med. Res. Methodol. 2017, 17, 179.
8. Zhang, Y.; Liu, X. Effects of physical activity and sedentary behaviors on cardiovascular disease and the risk of all-cause mortality in overweight or obese middle-aged and older adults. Front. Public Health 2024, 12, 1302783.
9. Tong, T.Y.; Appleby, P.N.; Bradbury, K.E.; Perez-Cornago, A.; Travis, R.C.; Clarke, R.; Key, T.J. Risks of ischaemic heart disease and stroke in meat eaters, fish eaters, and vegetarians over 18 years of follow-up: Results from the prospective EPIC-Oxford study. BMJ 2019, 366, l4897.
10. Satija, A.; Bhupathiraju, S.N.; Rimm, E.B.; Spiegelman, D.; Chiuve, S.E.; Borgi, L.; Willett, W.C.; Manson, J.E.; Sun, Q.; Hu, F.B. Plant-based dietary patterns and incidence of type 2 diabetes in US men and women: Results from three prospective cohort studies. PLoS Med. 2016, 13, e1002039.
11. Eckart, A.C.; Stavitz, J.A.; Bhochhibhoya, A.; Sharma Ghimire, P. Associations of animal source foods, cardiovascular disease history, and health behaviors from the national health and nutrition examination survey: 2013–2016. Glob. Epidemiol. 2023, 5, 100112.
12. Villegas, R.; Yang, G.; Gao, Y.T.; Cai, H.; Li, H.; Zheng, W.; Shu, X.O. Dietary patterns are associated with lower incidence of type 2 diabetes in middle-aged women: The Shanghai Women’s Health Study. Int. J. Epidemiol. 2010, 39, 889–899.
13. Stanhope, K.L.; Goran, M.I.; Bosy-Westphal, A.; King, J.C.; Schmidt, L.A.; Schwarz, J.M.; Stice, E.; Sylvetsky, A.C.; Turnbaugh, P.J.; Bray, G.A.; et al. Pathways and mechanisms linking dietary components to cardiometabolic disease: Thinking beyond calories. Obes. Rev. 2018, 19, 1205–1235.
14. Odegaard, A.O.; Koh, W.P.; Butler, L.M.; Duval, S.; Gross, M.D.; Yu, M.C.; Yuan, J.M.; Pereira, M.A. Dietary patterns and incident type 2 diabetes in Chinese men and women: The Singapore Chinese Health Study. Diabetes Care 2011, 34, 880–885.
15. Sinha, R.; Cross, A.J.; Graubard, B.I.; Leitzmann, M.F.; Schatzkin, A. Meat intake and mortality: A prospective study of over half a million people. Arch. Intern. Med. 2009, 169, 562–571.
16. Fregoso-Aparicio, L.; Noguez, J.; Montesinos, L.; García-García, J.A. Machine learning and deep learning predictive models for type 2 diabetes: A systematic review. Diabetol. Metab. Syndr. 2021, 13, 148.
17. Gunasekara, N.; Pfahringer, B.; Gomes, H.; Bifet, A. Gradient boosted trees for evolving data streams. Mach. Learn. 2024, 113, 3325–3352.
18. Tan, C.; Li, B.; Xiao, L.; Zhang, Y.; Su, Y.; Ding, N. A prediction model of the incidence of type 2 diabetes in individuals with abdominal obesity: Insights from the general population. Diabetes Metab. Syndr. Obes. Targets Ther. 2022, 15, 3555–3564.
19. Farvahari, A.; Gozashti, M.H.; Dehesh, T. The usage of lasso, ridge, and linear regression to explore the most influential metabolic variables that affect fasting blood sugar in type 2 Diabetes patients. Rom. J. Diabetes Nutr. Metab. Dis. 2019, 26, 371–379.
20. Qin, Y.; Wu, J.; Xiao, W.; Wang, K.; Huang, A.; Liu, B.; Yu, J.; Li, C.; Yu, F.; Ren, Z. Machine learning models for data-driven prediction of diabetes by lifestyle type. Int. J. Environ. Res. Public Health 2022, 19, 15027.
21. Meng, X.H.; Huang, Y.X.; Rao, D.P.; Zhang, Q.; Liu, Q. Comparison of three data mining models for predicting diabetes or prediabetes by risk factors. Kaohsiung J. Med. Sci. 2013, 29, 93–99.
22. National Center for Health Statistics. NHANES Survey Methods and Analytic Guidelines. Available online: https://wwwn.cdc.gov/nchs/nhanes/analyticguidelines.aspx (accessed on 1 October 2024).
23. National Cancer Institute, Division of Cancer Control and Population Sciences. Developing the Healthy Eating Index. Available online: https://epi.grants.cancer.gov/hei/developing.html#history (accessed on 1 October 2024).
24. Serra, M.C.; Ryan, A.S.; Hafer-Macko, C.E.; Yepes, M.; Nahab, F.B.; Ziegler, T.R. Dietary and serum Omega-6/Omega-3 fatty acids are Associated with Physical and metabolic function in stroke survivors. Nutrients 2020, 12, 701.
25. Shetty, S.S.; Shetty, P.K. ω-6/ω-3 fatty acid ratio as an essential predictive biomarker in the management of type 2 diabetes mellitus. Nutrition 2020, 79, 110968.
26. Albert, B.B.; Derraik, J.G.; Brennan, C.M.; Biggs, J.B.; Smith, G.C.; Garg, M.L.; Cameron-Smith, D.; Hofman, P.L.; Cutfield, W.S. Higher omega-3 index is associated with increased insulin sensitivity and more favourable metabolic profile in middle-aged overweight men. Sci. Rep. 2014, 4, 6697.
27. Nanayakkara, N.; Curtis, A.J.; Heritier, S.; Gadowski, A.M.; Pavkov, M.E.; Kenealy, T.; Owens, D.R.; Thomas, R.L.; Song, S.; Wong, J.; et al. Impact of age at type 2 diabetes mellitus diagnosis on mortality and vascular complications: Systematic review and meta-analyses. Diabetologia 2021, 64, 275–287.
28. Janssen, J.A. Hyperinsulinemia and its pivotal role in aging, obesity, type 2 diabetes, cardiovascular disease and cancer. Int. J. Mol. Sci. 2021, 22, 7797.
29. Campagna, D.; Alamo, A.; Di Pino, A.; Russo, C.; Calogero, A.E.; Purrello, F.; Polosa, R. Smoking and diabetes: Dangerous liaisons and confusing relationships. Diabetol. Metab. Syndr. 2019, 11, 85.
30. Levine, M.E.; Suarez, J.A.; Brandhorst, S.; Balasubramanian, P.; Cheng, C.W.; Madia, F.; Fontana, L.; Mirisola, M.G.; Guevara-Aguirre, J.; Wan, J.; et al. Low protein intake is associated with a major reduction in IGF-1, cancer, and overall mortality in the 65 and younger but not older population. Cell Metab. 2014, 19, 407–417.
31. Markova, M.; Pivovarova, O.; Hornemann, S.; Sucher, S.; Frahnow, T.; Wegner, K.; Machann, J.; Petzke, K.J.; Hierholzer, J.; Lichtinghagen, R.; et al. Isocaloric diets high in animal or plant protein reduce liver fat and inflammation in individuals with type 2 diabetes. Gastroenterology 2017, 152, 571–585.
32. Malik, V.S.; Li, Y.; Tobias, D.K.; Pan, A.; Hu, F.B. Dietary protein intake and risk of type 2 diabetes in US men and women. Am. J. Epidemiol. 2016, 183, 715–728.
33. Sluijs, I.; Beulens, J.W.; Van Der, A.D.L.; Spijkerman, A.M.; Grobbee, D.E.; Van Der Schouw, Y.T. Dietary intake of total, animal, and vegetable protein and risk of type 2 diabetes in the European Prospective Investigation into Cancer and Nutrition (EPIC)-NL study. Diabetes Care 2010, 33, 43–48.
34. Azemati, B.; Rajaram, S.; Jaceldo-Siegl, K.; Sabate, J.; Shavlik, D.; Fraser, G.E.; Haddad, E.H. Animal-protein intake is associated with insulin resistance in Adventist Health Study 2 (AHS-2) calibration substudy participants: A cross-sectional analysis. Curr. Dev. Nutr. 2017, 1, e000299.
35. Van Nielen, M.; Feskens, E.J.; Mensink, M.; Sluijs, I.; Molina, E.; Amiano, P.; Ardanaz, E.; Balkau, B.; Beulens, J.W.; Boeing, H.; et al. Dietary protein intake and incidence of type 2 diabetes in Europe: The EPIC-InterAct Case-Cohort Study. Diabetes Care 2014, 37, 1854–1862.
36. Brochu, M.; Mathieu, M.E.; Karelis, A.D.; Doucet, É.; Lavoie, M.E.; Garrel, D.; Rabasa-Lhoret, R. Contribution of the lean body mass to insulin resistance in postmenopausal women with visceral obesity: A Monet study. Obesity 2008, 16, 1085–1093.
37. Ehrhardt, N.; Cui, J.; Dagdeviren, S.; Saengnipanthkul, S.; Goodridge, H.S.; Kim, J.K.; Lantier, L.; Guo, X.; Chen, Y.D.; Raffel, L.J.; et al. Adiposity-independent effects of aging on insulin sensitivity and clearance in mice and humans. Obesity 2019, 27, 434–443.
38. Matta, J.; Mayo, N.; Dionne, I.J.; Gaudreau, P.; Fulop, T.; Tessier, D.; Gray-Donald, K.; Shatenstein, B.; Morais, J.A. Muscle mass index and animal source of dietary protein are positively associated with insulin resistance in participants of the NuAge study. J. Nutr. Health Aging 2016, 20, 90–97.
39. Aubertin-Leheudre, M.; Adlercreutz, H. Relationship between animal protein intake and muscle mass index in healthy women. Br. J. Nutr. 2009, 102, 1803–1810.
40. Liu, Z.; Guo, Y.; Zheng, C. Type 2 diabetes mellitus related sarcopenia: A type of muscle loss distinct from sarcopenia and disuse muscle atrophy. Front. Endocrinol. 2024, 15, 1375610.
41. Bowen, K.J.; Harris, W.S.; Kris-Etherton, P.M. Omega-3 fatty acids and cardiovascular disease: Are there benefits? Curr. Treat. Options Cardiovasc. Med. 2016, 18, 69.
42. Aung, T.; Halsey, J.; Kromhout, D.; Gerstein, H.C.; Marchioli, R.; Tavazzi, L.; Geleijnse, J.M.; Rauch, B.; Ness, A.; Galan, P.; et al. Associations of omega-3 fatty acid supplement use with cardiovascular disease risks: Meta-analysis of 10 trials involving 77,917 individuals. JAMA Cardiol. 2018, 3, 225–233.
43. Watanabe, Y.; Tatsuno, I. Omega-3 polyunsaturated fatty acids for cardiovascular diseases: Present, past and future. Expert Rev. Clin. Pharmacol. 2017, 10, 865–873.
44. Brown, T.J.; Brainard, J.; Song, F.; Wang, X.; Abdelhamid, A.; Hooper, L. Omega-3, omega-6, and total dietary polyunsaturated fat for prevention and treatment of type 2 diabetes mellitus: Systematic review and meta-analysis of randomised controlled trials. BMJ 2019, 366, l4697.
45. Johnston, L.W.; Harris, S.B.; Retnakaran, R.; Giacca, A.; Liu, Z.; Bazinet, R.P.; Hanley, A.J. Association of NEFA composition with insulin sensitivity and beta cell function in the Prospective Metabolism and Islet Cell Evaluation (PROMISE) cohort. Diabetologia 2018, 61, 821–830.
46. Jiang, S.; Yang, W.; Li, Y.; Feng, J.; Miao, J.; Shi, H.; Xue, H. Monounsaturated and polyunsaturated fatty acids concerning prediabetes and type 2 diabetes mellitus risk among participants in the National Health and Nutrition Examination Surveys (NHANES) from 2005 to March 2020. Front. Nutr. 2023, 10, 1284800.
47. Sheppard, K.W.; Cheatham, C.L. Omega-6/omega-3 fatty acid intake of children and older adults in the US: Dietary intake in comparison to current dietary recommendations and the Healthy Eating Index. Lipids Health Dis. 2018, 17, 43.
48. Egalini, F.; Guardamagna, O.; Gaggero, G.; Varaldo, E.; Giannone, B.; Beccuti, G.; Benso, A.; Broglio, F. The effects of omega 3 and omega 6 fatty acids on glucose metabolism: An updated review. Nutrients 2023, 15, 2672.
49. Simopoulos, A.P. An increase in the omega-6/omega-3 fatty acid ratio increases the risk for obesity. Nutrients 2016, 8, 128.
50. Petersen, K.S.; Maki, K.C.; Calder, P.C.; Belury, M.A.; Messina, M.; Kirkpatrick, C.F.; Harris, W.S. Perspective on the health effects of unsaturated fatty acids and commonly consumed plant oils high in unsaturated fat. Br. J. Nutr. 2024, 132, 1039–1050.
51. Oh, Y.S.; Bae, G.D.; Baek, D.J.; Park, E.Y.; Jun, H.S. Fatty acid-induced lipotoxicity in pancreatic beta-cells during development of type 2 diabetes. Front. Endocrinol. 2018, 9, 384.
52. Keane, D.C.; Takahashi, H.K.; Dhayal, S.; Morgan, N.G.; Curi, R.; Newsholme, P. Arachidonic acid actions on functional integrity and attenuation of the negative effects of palmitic acid in a clonal pancreatic β-cell line. Clin. Sci. 2011, 120, 195–206.
53. Qian, F.; Korat, A.A.; Malik, V.; Hu, F.B. Metabolic effects of monounsaturated fatty acid–enriched diets compared with carbohydrate or polyunsaturated fatty acid–enriched diets in patients with type 2 diabetes: A systematic review and meta-analysis of randomized controlled trials. Diabetes Care 2016, 39, 1448–1457.
54. Bowden Davies, K.A.; Sprung, V.S.; Norman, J.A.; Thompson, A.; Mitchell, K.L.; Halford, J.C.; Harrold, J.A.; Wilding, J.P.; Kemp, G.J.; Cuthbertson, D.J. Short-term decreased physical activity with increased sedentary behaviour causes metabolic derangements and altered body composition: Effects in individuals with and without a first-degree relative with type 2 diabetes. Diabetologia 2018, 61, 1282–1294.
55. Carbone, S.; Del Buono, M.G.; Ozemek, C.; Lavie, C.J. Obesity, risk of diabetes and role of physical activity, exercise training and cardiorespiratory fitness. Prog. Cardiovasc. Dis. 2019, 62, 327–333.
56. Ye, J.; Yin, J. Type 2 diabetes: A sacrifice program handling energy surplus. Life Metab. 2024, 3, loae033.
57. Zeng, X.; Ji, Q.P.; Jiang, Z.Z.; Xu, Y. The effect of different dietary restriction on weight management and metabolic parameters in people with type 2 diabetes mellitus: A network meta-analysis of randomized controlled trials. Diabetol. Metab. Syndr. 2024, 16, 254.
58. Schwingshackl, L.; Chaimani, A.; Hoffmann, G.; Schwedhelm, C.; Boeing, H. A network meta-analysis on the comparative efficacy of different dietary approaches on glycaemic control in patients with type 2 diabetes mellitus. Eur. J. Epidemiol. 2018, 33, 157–170.
Figure 1. Data processing and analysis overview. Subpopulation percentages are weighted.
Figure 2. Variance inflation factors for T2D predictors. Predictors were standardized (Z) before the analysis.
Figure 3. Confusion matrix (top), ROC curve (middle), and precision–recall curve (PRC; bottom) for the XGBoost classifier. Red line: reference line.
Figure 4. SHAP beeswarm plot ranking the top 20 features by their importance in predicting T2D history. Features on the y-axis increase in magnitude from left to right. Pink data points have a positive impact on prediction; blue data points have a negative impact on prediction. * PCA-engineered feature; data points represent regression factor scores. Higher scores indicate a stronger association with the feature.
Table 1. Descriptive estimates by dietary pattern.
Columns: PBF pattern (n = 1373; T2D 9.5%), median and IQR; ASF pattern (n = 1373; T2D 11.3%), median and IQR; ratio of medians [ASF/PBF].
Variable | PBF median | PBF IQR | ASF median | ASF IQR | Median ratio [ASF/PBF]
Age (years)48.0031.0047.0030.000.98
Alpha-Linolenic acid (18:3n–3) (μmol/L)73.3054.5069.8046.600.95
Arachidonic acid (20:4n–6) (μmol/L)784.00352.00829.00345.001.06
ASF MUFAs (gm) 5.227.43
ASF PUFAs (gm) 2.604.12
Total ASF protein (gm) 23.8826.19
Total ASF fat (gm) 14.6419.62
BMI Change (past year)0.192.230.232.601.25
Body Mass Index (kg/m2)27.108.7027.808.301.03
Carbohydrate (gm)243.59132.84238.31151.000.98
Cholesterol (mg)220.00256.00300.00301.001.36
Total dietary omega–3 (gm)0.090.240.240.452.74
Total dietary omega–6 (gm)15.0812.7716.7411.951.11
Dietary O6:O3 ratio131.24431.0661.46145.560.47
Dietary fiber (gm)16.9012.7015.2012.600.90
Direct HDL-Cholesterol (mg/dL)53.0022.0052.0022.000.98
Docosahexaenoic acid (22:6n–3) (μmol/L)152.0088.00140.0081.000.92
Eicosapentaenoic acid (20:5n–3) (μmol/L)53.4042.7051.8051.400.97
Energy (kcal)2028.001046.002161.001191.001.07
Fasting glucose (mg/dL)100.0014.00100.0015.001.00
Glycohemoglobin (%)5.400.505.400.601.00
hs-CRP (mg/L)1.503.501.803.41.20
Insulin (μU/mL)8.658.679.039.231.04
LDL-cholesterol (mg/dL)108.0046.00110.0050.001.02
Linoleic acid (18:2n–6) (μmol/L)3410.001220.003220.001060.000.94
Minutes sedentary activity420.00300.00360.00300.000.86
PAX0.000.000.000.00
PBF MUFAs (gm)1.684.393.126.621.86
PBF PUFAs (gm)1.233.081.915.371.55
Total plant protein (gm)4.197.376.2512.241.49
Total plant fat (gm)5.2212.089.5418.821.83
Total kcals from PBFs (%)9.013.00.0010.00.00
Total kcals from ASFs (%) 14.7117.4
Protein (gm)73.6846.1386.4454.701.17
Serum omega–6 (μmol/L)4453.101409.804416.901449.300.99
Serum omega–3 (μmol/L)341.65205.00344.17180.421.01
Serum omega–6: omega–3 ratio13.215.3213.575.341.03
Total cholesterol (mg/dL)186.0054.00186.0057.001.00
Total fat (gm)75.7950.0782.8355.851.09
Total monounsaturated fatty acids (gm)26.6118.8928.6420.211.08
Total body fat (%) [DEXA]32.0012.2031.7013.400.99
Total polyunsaturated fatty acids (gm)16.9114.0718.7813.591.11
Total saturated fatty acids (gm)24.4318.5426.7821.311.10
Total daily recreational physical activity (min)0.000.000.000.00
Triglyceride (mg/dL)94.0078.0090.0072.000.96
UFAs:SFAs ratio1.830.991.830.931.00
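The last column of Table 1 compares the two group medians as a ratio. For age, for example, 47.00/48.00 ≈ 0.98. The sketch below computes a group median, its IQR and a ratio of medians with the Python standard library. It is a minimal illustration, not the authors' code. The quartile convention (`statistics.quantiles` with `method="inclusive"`) is an assumption, because the IQR estimator is not stated in this section. The helper names and the five-value sample are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def median_and_iqr(values: Sequence[float]) -> Tuple[float, float]:
    """Return (median, interquartile range) of a sample.

    Assumes the "inclusive" quartile method; other quartile conventions
    can give slightly different IQRs for the same data.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q3 - q1


def median_ratio(group_median: float, reference_median: float) -> float:
    """Ratio of two group medians, e.g. ASF/PBF as in Table 1."""
    if reference_median == 0:
        raise ZeroDivisionError("reference median is zero; ratio undefined")
    return group_median / reference_median


if __name__ == "__main__":
    # Reported medians for age: ASF 47.00 years, PBF 48.00 years.
    print(round(median_ratio(47.00, 48.00), 2))
    # Hypothetical sample, used only to illustrate the median/IQR helper.
    print(median_and_iqr([20, 30, 40, 50, 60]))
```

A ratio of medians is only informative when the reference median is clearly non-zero. Rows whose medians are 0.00 in both groups leave the ratio undefined.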
Table 2. Descriptive estimates by T2D status.
Non-Diabetics: n = 2460; Diabetics: n = 286.

| Variable | Non-Diabetics Median | Non-Diabetics IQR | Diabetics Median | Diabetics IQR | Median Ratio [T2D/Non-T2D] |
|---|---|---|---|---|---|
| Age (years) | 46.00 | 30.00 | 63.00 | 16.00 | 1.37 |
| Alpha-Linolenic acid (18:3n–3) (μmol/L) | 72.00 | 49.30 | 75.50 | 59.00 | 1.05 |
| Arachidonic acid (20:4n–6) (μmol/L) | 797.00 | 352.00 | 886.00 | 307.00 | 1.11 |
| ASF MUFAs (gm) | 5.20 | 7.42 | 5.69 | 8.26 | 1.09 |
| ASF PUFAs (gm) | 2.57 | 4.11 | 3.34 | 4.30 | 1.30 |
| Total ASF protein (gm) | 24.04 | 26.50 | 22.51 | 21.72 | 0.94 |
| Total ASF fat (gm) | 14.63 | 19.62 | 15.56 | 20.13 | 1.06 |
| Total kcals from ASFs (%) | 15.00 | 17.00 | 16.00 | 19.00 | 1.07 |
| BMI Change (past year) | 0.26 | 2.34 | −0.60 | 2.98 | −2.33 |
| Body Mass Index (kg/m²) | 27.20 | 8.40 | 31.10 | 8.00 | 1.14 |
| Carbohydrate (gm) | 244.18 | 143.49 | 209.18 | 119.75 | 0.86 |
| Cholesterol (mg) | 249.00 | 284.00 | 232.00 | 289.00 | 0.93 |
| Total dietary omega–3 (gm) | 0.16 | 0.34 | 0.17 | 0.38 | 1.02 |
| Total dietary omega–6 (gm) | 16.15 | 12.40 | 13.55 | 11.34 | 0.84 |
| Dietary O6:O3 ratio | 90.32 | 276.07 | 68.67 | 261.12 | 0.76 |
| Dietary fiber (gm) | 16.20 | 12.70 | 14.90 | 10.60 | 0.92 |
| Direct HDL-Cholesterol (mg/dL) | 52.00 | 22.00 | 47.00 | 19.00 | 0.90 |
| Docosahexaenoic acid (22:6n–3) (μmol/L) | 143.00 | 79.00 | 152.00 | 78.00 | 1.06 |
| Eicosapentaenoic acid (20:5n–3) (μmol/L) | 53.40 | 47.40 | 51.90 | 53.30 | 0.97 |
| Energy (kcal) | 2101.00 | 1141.00 | 1781.00 | 1036.00 | 0.85 |
| Fasting glucose (mg/dL) | 99.00 | 13.00 | 134.00 | 62.00 | 1.35 |
| Glycohemoglobin (%) | 5.40 | 0.50 | 6.80 | 1.80 | 1.26 |
| hs-CRP (mg/L) | 1.60 | 3.40 | 2.40 | 4.90 | 1.5 |
| Insulin (μU/mL) | 8.65 | 8.76 | 11.47 | 11.19 | 1.33 |
| LDL-cholesterol (mg/dL) | 110.00 | 48.00 | 100.00 | 54.00 | 0.91 |
| Linoleic acid (18:2n–6) (μmol/L) | 3320.00 | 1100.00 | 3410.00 | 1390.00 | 1.03 |
| Minutes sedentary activity | 360.00 | 300.00 | 420.00 | 240.00 | 1.17 |
| PAX | 0.00 | 0.00 | 0.00 | 0.00 | – |
| PBF MUFAs (gm) | 1.94 | 5.22 | 1.36 | 4.16 | 0.70 |
| PBF PUFAs (gm) | 1.38 | 3.50 | 1.24 | 3.10 | 0.90 |
| Total plant protein (gm) | 4.62 | 8.69 | 4.83 | 7.50 | 1.05 |
| Total plant fat (gm) | 6.32 | 13.97 | 4.00 | 12.19 | 0.63 |
| Total kcals from PBFs (%) | 0.05 | 0.14 | 0.05 | 0.14 | 1.02 |
| Protein (gm) | 81.27 | 51.96 | 68.14 | 40.53 | 0.84 |
| Serum omega–6 (μmol/L) | 4424.00 | 1426.30 | 4545.20 | 1706.10 | 1.02 |
| Serum omega–3 (μmol/L) | 344.17 | 185.80 | 336.54 | 175.19 | 0.98 |
| Serum omega–6:omega–3 ratio | 13.36 | 5.24 | 13.29 | 6.52 | 0.99 |
| Total cholesterol (mg/dL) | 186.00 | 55.00 | 171.00 | 63.00 | 0.92 |
| Total fat (gm) | 79.70 | 53.87 | 69.41 | 45.09 | 0.87 |
| Total monounsaturated fatty acids (gm) | 27.64 | 19.84 | 23.94 | 19.01 | 0.87 |
| Total body fat (%) [DEXA] | 31.50 | 12.70 | 39.10 | 9.50 | 1.24 |
| Total polyunsaturated fatty acids (gm) | 18.21 | 13.64 | 15.52 | 12.43 | 0.85 |
| Total saturated fatty acids (gm) | 25.51 | 20.17 | 22.51 | 17.03 | 0.88 |
| Total daily recreational physical activity (min) | 0.00 | 0.00 | 0.00 | 0.00 | – |
| Triglyceride (mg/dL) | 91.00 | 68.00 | 114.00 | 102.00 | 1.25 |
| UFAs:SFAs ratio | 1.82 | 0.98 | 1.89 | 0.86 | 1.03 |
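Table 1's T2D prevalences by dietary pattern can be cross-checked against Table 2's group sizes with simple arithmetic. The worked example below uses only the reported figures. Because the percentages are rounded to one decimal place, the recovered counts are approximate. One whole-number split consistent with both rounded percentages and the total of 286 is 131 and 155, but the exact split is not reported here.

```latex
\begin{aligned}
n_{\text{T2D}}^{\text{PBF}} &\approx 0.095 \times 1373 \approx 130.4, \\
n_{\text{T2D}}^{\text{ASF}} &\approx 0.113 \times 1373 \approx 155.1, \\
n_{\text{T2D}}^{\text{PBF}} + n_{\text{T2D}}^{\text{ASF}} &\approx 285.6 \approx 286, \\
\frac{286}{2460 + 286} &= \frac{286}{2746} \approx 0.104 .
\end{aligned}
```

The overall T2D prevalence in the matched sample is therefore about 10.4%. This sits between the two pattern-specific values.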
Table 3. PCA pattern matrix for diet and lifestyle features.
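Component labels: (1) Higher Total Energy and UFAs; (2) Female Gender; (3) PBF Dietary Pattern; (4) Lower Serum O6:O3 Ratio, Higher sO3; (5) Higher PBF Fat and Protein, Lower Dietary O6:O3; (6) Age, Smoking, Unhealthy Lifestyle, Recent BMI Decrease; (7) ASF Fat and Protein; (8) Poor Body Composition, Recent BMI Increase; (9) Higher UFA:SFA Ratio; (10) Low Physical Activity, Unhealthy Lifestyle.

| Feature | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Variance Explained (%) | 19.8 | 8.9 | 8.4 | 7.1 | 6.3 | 6.0 | 5.6 | 4.7 | 4.2 | 4.0 |
| Total Omega-6 (gm) | 0.89 | 0.03 | −0.02 | −0.05 | 0.05 | 0.01 | 0.03 | 0.02 | 0.18 | 0.01 |
| PUFAs (gm) | 0.89 | 0.03 | −0.02 | −0.05 | 0.05 | 0.01 | 0.04 | 0.01 | 0.18 | 0.01 |
| MUFAs (gm) | 0.89 | −0.04 | −0.03 | −0.01 | 0.02 | 0.00 | 0.05 | 0.00 | −0.10 | −0.02 |
| Total Energy (kcals) | 0.85 | −0.11 | −0.01 | −0.01 | 0.07 | −0.03 | 0.07 | −0.01 | −0.23 | −0.03 |
| Total Fiber (gm) | 0.63 | −0.07 | 0.14 | 0.11 | 0.10 | −0.03 | −0.12 | −0.03 | 0.04 | −0.16 |
| Male Gender | 0.01 | −0.98 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.02 | 0.02 | −0.01 |
| Female Gender | −0.01 | 0.98 | 0.00 | 0.00 | 0.00 | −0.01 | 0.00 | −0.02 | −0.02 | 0.01 |
| PBF Dietary Pattern | −0.01 | 0.00 | 1.00 | 0.00 | 0.01 | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 |
| ASF Dietary Pattern | 0.01 | 0.00 | −1.00 | 0.00 | −0.01 | 0.00 | −0.01 | 0.00 | 0.00 | −0.01 |
| Total Serum Omega-3 (μmol/L) | −0.01 | 0.00 | 0.00 | 0.96 | −0.01 | 0.00 | 0.00 | −0.01 | 0.00 | 0.01 |
| Total Serum Omega-6 (μmol/L) | 0.29 | 0.10 | −0.05 | 0.64 | −0.08 | −0.02 | −0.06 | −0.05 | −0.10 | 0.24 |
| Serum O6:O3 Ratio | 0.22 | 0.06 | −0.04 | −0.75 | −0.04 | −0.03 | −0.08 | −0.03 | −0.07 | 0.17 |
| Total Plant Protein (gm) | 0.04 | −0.01 | 0.07 | 0.02 | 0.88 | −0.02 | −0.08 | −0.02 | −0.02 | −0.01 |
| Total Plant Fat (gm) | 0.12 | 0.03 | 0.01 | −0.03 | 0.86 | 0.02 | −0.03 | 0.00 | −0.03 | 0.02 |
| Total Omega-3 (gm) | −0.05 | 0.01 | −0.16 | −0.03 | 0.62 | −0.01 | 0.27 | 0.02 | 0.20 | 0.01 |
| Age (years) | −0.06 | −0.02 | −0.03 | 0.13 | −0.04 | 0.67 | −0.11 | 0.12 | 0.18 | 0.00 |
| Smoker | 0.06 | −0.21 | 0.01 | −0.04 | 0.00 | 0.55 | 0.06 | 0.09 | −0.17 | 0.22 |
| Unhealthy Lifestyle | −0.01 | 0.01 | 0.00 | −0.03 | 0.03 | 0.45 | 0.02 | 0.30 | −0.14 | 0.45 |
| BMI Change (past year) | 0.01 | −0.05 | 0.00 | 0.06 | 0.00 | −0.63 | −0.04 | 0.42 | 0.01 | 0.31 |
| Total ASF Protein (gm) | −0.02 | −0.01 | 0.01 | 0.04 | −0.01 | −0.01 | 0.90 | 0.00 | 0.01 | −0.01 |
| Total ASF Fat (gm) | 0.08 | 0.02 | 0.02 | −0.01 | −0.03 | 0.00 | 0.89 | 0.01 | −0.02 | 0.02 |
| BMI (kg/m²) | 0.04 | −0.06 | −0.02 | −0.03 | −0.02 | −0.01 | 0.03 | 0.89 | 0.01 | −0.08 |
| Body Fat (%) | −0.09 | 0.48 | 0.03 | 0.04 | 0.03 | 0.09 | −0.03 | 0.58 | 0.00 | −0.06 |
| UFAs:SFAs | 0.07 | −0.04 | 0.00 | 0.00 | 0.01 | 0.01 | −0.01 | 0.01 | 0.92 | 0.05 |
| PAX | 0.09 | −0.01 | −0.03 | 0.01 | −0.02 | 0.04 | −0.01 | 0.13 | −0.09 | −0.84 |
| Dietary O6:O3 Ratio | 0.38 | 0.09 | 0.16 | −0.03 | −0.41 | 0.04 | −0.03 | −0.02 | 0.18 | −0.02 |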
Pattern matrix for ten principal components, which together explain over 75% of the variance in the diet and lifestyle features. Table values are regression coefficients that reflect the unique contribution of each variable to each component. Components were rotated via direct oblimin (Δ = 0). Darker colors indicate a stronger coefficient between the predictor and the component. Blue = positive association with the component; red = negative association with the component.
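Two quantities in Table 3 can be read off mechanically. The first is the cumulative variance of the ten retained components; the listed, rounded percentages sum to 75.0%. The second is the component on which each feature loads most strongly, which lines up with the component labels. The sketch below illustrates both using a subset of the reported rows. It does not perform PCA or the direct oblimin rotation itself. Labelling each feature by its largest absolute loading is an interpretive heuristic assumed here, not necessarily the authors' procedure. The helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Variance explained (%) by components 1-10, as listed in Table 3.
VARIANCE_EXPLAINED = [19.8, 8.9, 8.4, 7.1, 6.3, 6.0, 5.6, 4.7, 4.2, 4.0]

# A subset of pattern-matrix rows copied from Table 3 (components 1-10).
PATTERN_ROWS: Dict[str, List[float]] = {
    "Total Omega-6 (gm)": [0.89, 0.03, -0.02, -0.05, 0.05, 0.01, 0.03, 0.02, 0.18, 0.01],
    "Total Plant Protein (gm)": [0.04, -0.01, 0.07, 0.02, 0.88, -0.02, -0.08, -0.02, -0.02, -0.01],
    "BMI Change (past year)": [0.01, -0.05, 0.00, 0.06, 0.00, -0.63, -0.04, 0.42, 0.01, 0.31],
    "PAX": [0.09, -0.01, -0.03, 0.01, -0.02, 0.04, -0.01, 0.13, -0.09, -0.84],
}


def dominant_component(loadings: Sequence[float]) -> Tuple[int, float]:
    """Return (1-based component number, loading) for the largest |loading|."""
    idx = max(range(len(loadings)), key=lambda i: abs(loadings[i]))
    return idx + 1, loadings[idx]


def cumulative_variance(percentages: Sequence[float]) -> float:
    """Total variance (%) captured by the retained components, to one decimal."""
    return round(sum(percentages), 1)


if __name__ == "__main__":
    print(f"Cumulative variance: {cumulative_variance(VARIANCE_EXPLAINED)}%")
    for feature, loadings in PATTERN_ROWS.items():
        comp, value = dominant_component(loadings)
        # The sign shows whether the feature loads positively or negatively.
        print(f"{feature}: component {comp} ({value:+.2f})")
```

Keeping the sign matters when reading the labels. For example, PAX loads at −0.84 on component 10, which is why that component is labelled "Low Physical Activity".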
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Eckart, A.C.; Sharma Ghimire, P. Exploring Predictors of Type 2 Diabetes Within Animal-Sourced and Plant-Based Dietary Patterns with the XGBoost Machine Learning Classifier: NHANES 2013–2016. J. Clin. Med. 2025, 14, 458. https://doi.org/10.3390/jcm14020458

AMA Style

Eckart AC, Sharma Ghimire P. Exploring Predictors of Type 2 Diabetes Within Animal-Sourced and Plant-Based Dietary Patterns with the XGBoost Machine Learning Classifier: NHANES 2013–2016. Journal of Clinical Medicine. 2025; 14(2):458. https://doi.org/10.3390/jcm14020458

Chicago/Turabian Style

Eckart, Adam C., and Pragya Sharma Ghimire. 2025. "Exploring Predictors of Type 2 Diabetes Within Animal-Sourced and Plant-Based Dietary Patterns with the XGBoost Machine Learning Classifier: NHANES 2013–2016" Journal of Clinical Medicine 14, no. 2: 458. https://doi.org/10.3390/jcm14020458

APA Style

Eckart, A. C., & Sharma Ghimire, P. (2025). Exploring Predictors of Type 2 Diabetes Within Animal-Sourced and Plant-Based Dietary Patterns with the XGBoost Machine Learning Classifier: NHANES 2013–2016. Journal of Clinical Medicine, 14(2), 458. https://doi.org/10.3390/jcm14020458

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
