Can the Plantar Pressure and Temperature Data Trend Show the Presence of Diabetes? A Comparative Study of a Variety of Machine Learning Techniques

Gerlein, Eduardo A.; Calderón, Francisco; Zequera-Díaz, Martha; Naemi, Roozbeh

doi:10.3390/a17110519

¹ Department of Electronics, Pontificia Universidad Javeriana, Bogotá 110231, Colombia

² School of Health and Society, University of Salford, Manchester M6 6PU, UK

³ School of Health Science and Wellbeing, Staffordshire University, Stoke on Trent ST4 2DF, UK

* Authors to whom correspondence should be addressed.

Algorithms 2024, 17(11), 519; https://doi.org/10.3390/a17110519

Submission received: 9 September 2024 / Revised: 20 October 2024 / Accepted: 25 October 2024 / Published: 12 November 2024

(This article belongs to the Special Issue Machine Learning in Medical Signal and Image Processing (2nd Edition))

Abstract

This study aimed to explore the potential of predicting diabetes by analyzing trends in plantar thermal and plantar pressure data, either individually or in combination, using various machine learning techniques. A total of twenty-six participants, comprising thirteen individuals diagnosed with diabetes and thirteen healthy individuals, walked along a 20 m path. In-shoe plantar pressure data were collected and the plantar temperature was measured both immediately before and after the walk. Each participant completed the trial three times, and the data were averaged across the trials. The research was divided into three experiments: the first evaluated the correlations between the plantar pressure and temperature data; the second focused on predicting diabetes using each data type independently; and the third combined both data types and assessed whether doing so enhanced the predictive accuracy. For the experiments, 20 regression models and 16 classification algorithms were employed, and the performance was evaluated using a five-fold cross-validation strategy. The outcomes of the initial set of experiments indicated that the machine learning models did not establish significant correlations between the thermal data and pressure estimates. This was consistent with the findings from the prior correlation analysis, which showed weak relationships between these two data modalities. However, a shift in focus towards predicting diabetes by aggregating the temperature and pressure data led to encouraging results, demonstrating the effectiveness of this approach in accurately predicting the presence of diabetes. The analysis revealed that, while several classifiers demonstrated reasonable metrics when using standalone variables, the integration of thermal and pressure data significantly improved the predictive accuracy. Specifically, when only plantar pressure data were used, the Logistic Regression model achieved the highest accuracy at 68.75%. For predictions based solely on temperature data, the Naive Bayes model led with an accuracy of 87.5%. Notably, the highest accuracy of 93.75% was observed when both the temperature and pressure data were combined, with the Extra Trees Classifier performing the best. These results suggest that combining temperature and pressure data enhances the model’s predictive accuracy and indicate the importance of multimodal data integration and its potential in diabetes prediction.

Keywords:

diabetes prediction; thermal analysis; plantar pressure; machine learning

1. Introduction

Diabetes mellitus, a common chronic metabolic disorder characterized by elevated blood glucose levels, presents significant health challenges globally [1]. Its considerable impact on healthcare systems highlights the urgent need for early diagnosis and effective management to prevent complications and improve patient outcomes [2]. Among the complications associated with diabetes, diabetic foot problems are particularly severe, with the potential to lead to amputation in the most extreme cases [3]. Traditionally, diabetes diagnosis has relied on clinical assessments, blood tests, and self-reported symptoms. However, recent advances in technology have paved the way for innovative approaches to predicting and managing diabetes more efficiently [4]. Further complications, such as neuropathy, which causes nerve damage and loss of sensation, can lead to a diabetic foot, where minor injuries can go unnoticed and progress to severe infections or ulcers due to poor blood circulation and reduced healing capacity.

Accurate and timely prediction is essential for early intervention and effective disease management [5]. In recent years, wearable sensor technologies have gained popularity as tools for monitoring physiological data, showing promise in identifying new indicators to predict diabetes [6,7]. Notably, the use of thermal and pressure data has shown potential in improving the accuracy of diabetes diagnosis and treatment. For instance, measuring the plantar temperature can reveal differences in the soles of diabetic feet, with the ability to detect ulcers and necrosis with accuracies of 90% and 88%, respectively [8]. Infrared thermography has also proven to be effective in detecting temperature variability in the feet of diabetic patients, aiding in the early diagnosis and prevention of lesions in affected areas [9]. Furthermore, regression analysis has shown the potential to accurately predict the maximum plantar pressure in patients with type 2 diabetes mellitus, which is crucial for the prevention and early detection of diabetic foot complications [10]. However, studies that combine temperature and pressure data are limited. For example, Yavuz et al. [11] found no significant correlation between plantar temperatures and triaxial plantar stresses in individuals with diabetes. This lack of correlation suggests that these variables can be effectively combined as independent attributes in machine learning models for the prediction of diabetes, potentially leading to more robust predictive models, as is also discussed in this paper.

Wearable sensor technologies, such as those integrated into shoes or insoles, have emerged as promising tools for the continuous monitoring of physiological parameters relevant to the management of diabetes [12,13,14,15]. These non-invasive technologies enable continuous data collection, facilitating the early detection of potential problems, including those related to diabetic foot complications [16]. Of particular interest for the prediction of diabetes are the thermal patterns and pressure distribution measured by these devices [17].

Thermal imaging techniques are used to visualize the distribution of skin temperature, which can indicate underlying metabolic processes and pathological conditions. This non-invasive diagnostic method uses the principles of heat transfer and physiological responses of the body to detect temperature variations that can signal health problems [18]. Meanwhile, several studies have indicated that insole systems that measure plantar pressure can be beneficial in managing diabetic foot health by reducing ulcer recurrence, lowering plantar stress, helping to detect early complications, and improving gait and weight distribution [19,20,21].

The relationship between plantar pressure, temperature, and diabetic foot complications is an emerging area of research. Diabetic neuropathy often leads to foot ulceration due to a combination of elevated temperatures, loss of sensation, and abnormal pressure distribution. Understanding these factors is essential for the prevention and management of diabetic foot complications. Diabetic neuropathy is associated with higher plantar foot temperatures, which can be measured non-invasively using infrared thermal imaging, indicating its potential as a tool for evaluating high-risk diabetic feet [22]. Furthermore, plantar pressure measurements are increasingly integrated into clinical practice, with evidence supporting their role in ulcer prevention and the importance of long-term monitoring to provide feedback on pressure levels of concern [23]. Sawacha et al. [24] emphasized that the simultaneous assessment of kinematics, kinetics, and plantar pressure can more accurately characterize the biomechanics of the diabetic foot, potentially helping to prevent foot ulcerations. The classification of plantar pressure distributions has proven useful in identifying diabetic patients at risk of foot ulceration and guiding the provision of preventive interventions, such as therapeutic footwear [25]. Changes in these parameters have been linked to diabetes-related foot complications, such as neuropathy and an increased risk of ulceration, ultimately contributing to the development of a diabetic foot [23]. However, there is a scarcity of studies that have investigated the potential of these measures to diagnose diabetes.

This study introduces a novel approach to predicting diabetes by integrating temperature and plantar pressure data, a combination not extensively explored in the previous research. While the existing studies typically focus on either temperature or pressure independently, our work leverages both modalities to enhance the predictive accuracy. This multimodal approach provides deeper insights into foot health and potential complications. This work contributes to the field by not only demonstrating the limitations of single-modality analysis but also by showing how integrating multiple data sources can yield more robust machine learning models for clinical prediction tasks.

Recently, the application of machine learning techniques in the prediction of diabetes has gained significant traction [26]. Machine learning uses computational algorithms to analyze large datasets, identify complex patterns, and make accurate predictions [27,28].

The machine learning approaches to diabetes prediction are inherently data-driven, relying on diverse datasets that include a wide range of patient information such as clinical data, genetic markers, lifestyle factors, and physiological measurements. The integration of wearable sensor technologies has further enriched these datasets by providing real-time, continuous monitoring of the parameters relevant to diabetes [29,30]. In the context of diabetes, the features of interest include blood glucose levels, insulin sensitivity, physical activity, dietary habits, and, as explored in this paper, thermal and pressure data from the feet. The machine learning models for diabetes prediction encompass a wide range of algorithms, such as Decision Trees, Support Vector Machines, Random Forests, and neural networks, among others. Each algorithm offers unique advantages and may be suited to different aspects of diabetes prediction. For example, deep learning models can effectively capture complex patterns in large datasets [31], while Decision Trees can provide more interpretable insights into risk factors [32,33]. Few studies have examined the application of several of the available algorithms in tandem to such predictions [34,35]. Evaluating the performance of machine learning models is a crucial step in the process. Metrics such as accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC-ROC) are commonly used to assess the predictive capacity of these models. Despite this, there is a scarcity of previous studies that have investigated the potential of these measures in tandem to diagnose diabetes.

The primary objective of this study is to investigate the feasibility of using thermal data, pressure data, or a combination of both variables to predict the presence of diabetes. By analyzing the time series data of thermal and pressure measurements from a diverse group of individuals—including both diabetic and non-diabetic subjects—we aim to explore the potential predictive capabilities of these variables. The study employs machine learning algorithms to assess whether plantar pressure, plantar temperature, or a combination of both can effectively predict diabetes. For our analysis, we used a consolidated thermal time series that includes data from five anatomical points at the feet, combined with pressure data.

In our study, the initial experiments aimed to investigate whether significant correlations could be established between thermal data and plantar pressure estimates using a variety of machine learning models for regression. Despite the use of sophisticated regression techniques, these models demonstrated sub-optimal performance in predicting pressure values from thermal data alone, as indicated by the low accuracy metrics and high error rates. The lack of a correlation suggests that the physiological processes reflected by the thermal measurements may not directly translate to the biomechanical indicators captured by the pressure data.

Recognizing the limitations of this approach, we shifted our focus towards aggregating both data types—temperature and pressure—for the prediction of diabetes. This multimodal approach yielded significantly more encouraging results, with a notable improvement in predictive accuracy. The combination of these independent datasets allowed machine learning models to leverage the complementary nature of thermal and pressure data, offering a more comprehensive understanding of the physiological changes associated with diabetes. By integrating these variables, we were able to achieve high levels of prediction accuracy, underscoring the importance of multimodal data fusion in medical diagnostics.

We hypothesize that integrating these two physiological datasets will result in more accurate predictions. For example, when used independently, classifiers such as the Logistic Regression model have shown moderate accuracy with pressure data (around 68.75%), while temperature data can yield higher accuracy, with the Naive Bayes model reaching up to 87.5%. However, it is expected that the combination of pressure and temperature data will lead to significantly improved performance, with classifiers like the Extra Trees Classifier demonstrating stronger metrics, including precision, recall, and F1 scores. This study will also explore the importance of algorithm selection in optimizing prediction accuracy and the need to tailor models to different participant groups.

By examining the performance of multiple machine learning models with both individual and combined data, this research seeks to highlight the potential of multimodal data integration in enhancing the accuracy of diabetes prediction.

The rest of the paper is organized as follows: Section 2 outlines the methods used in our study, detailing the composition of the dataset, the experimental setup and protocol, and the data acquisition procedures. In Section 3, we present the results of the predictive models using machine learning techniques. Section 4 delves into the discussion and conclusions, summarizing the insights gained and their implications for diabetes prediction and management.

2. Methods

This section describes the methodologies used in our investigation of the feasibility of predicting diabetes using thermal and pressure data. The following subsections provide a detailed discussion of the composition and characteristics of the dataset, the specific experimental procedures and protocols used to gather the data, and the techniques and instruments utilized for collecting plantar pressure and thermal measurements from the participants.

2.1. Dataset Description

The study involved a total of 26 participants, including 13 individuals at various stages of diabetes and 13 healthy controls. The group consisted of 18 women and 8 men, with ages ranging from 40 to 73 years. The weights of the participants ranged from 42 to 110 kilograms (kg), and their heights ranged between 1.42 and 1.90 meters (m).

2.2. Experimental Setup and Protocol

To ensure consistency and minimize confounding factors, an experimental area suitable for walking was carefully selected. The designated area was chosen to avoid “quick” twisting movements that could potentially increase friction inside the shoe, thereby impacting the measurements. A 25 m ∞-shaped walkway was established for data collection, as illustrated in Figure 1.

Simultaneously, a thermal camera and a chronometer were arranged to record the necessary temperature measurements before and after the walks. The thermal camera was placed at a distance of 1 to 1.5 m from the feet of the participants to capture accurate thermal images, as shown in Figure 2.

2.3. Data Acquisition Procedure

The participant was first informed about the study. Initially, plantar pressure measurements were taken in the shoe while walking. Participants were asked to walk on an ∞-shaped path in two familiarization trials, followed by three trials for pressure data collection. Twelve steps were analyzed to calculate the average peak plantar pressure during the stance phase of walking for each foot. The shoe pressure sensor (Parologg Pressure Measurement System, Paromed, Neu-Beuern, Germany) provided data that were processed to obtain the maximum plantar pressure, the average maximum plantar pressure, and the average plantar pressure at each anatomical point.
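The three pressure summaries named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing performed by the Paromed software: the function name `pressure_metrics`, the per-step trace layout, and the reading of "average plantar pressure" as the mean of per-step means are all hypothetical, and the sample values are invented.

```python
from statistics import mean


def pressure_metrics(steps):
    """Summarize one anatomical point.

    steps: list of per-step pressure traces, each a list of samples
    recorded during the stance phase (assumed layout).
    """
    step_peaks = [max(trace) for trace in steps]
    return {
        # Largest peak seen across all analyzed steps.
        "max_pressure": max(step_peaks),
        # Mean of the per-step peaks.
        "avg_max_pressure": mean(step_peaks),
        # Mean of the per-step mean pressures (assumed definition).
        "avg_pressure": mean(mean(trace) for trace in steps),
    }


# Invented traces for two steps; the study analyzed twelve steps per foot.
steps = [[10.0, 50.0, 30.0], [20.0, 70.0, 30.0]]
metrics = pressure_metrics(steps)
```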

After completion of plantar pressure measurements, participants followed a procedure to measure foot temperature using a Flir One Pro thermal imaging camera. The temperature was recorded immediately before and immediately after walking with the insoles, following these steps:

The participant was asked to be barefoot and lie supine on a flat couch, with a cushion placed under the head for comfort. Specific regions of interest on the feet were marked for thermal measurements. These regions included the hallux, first metatarsus, third metatarsus, fifth metatarsus, midfoot (proximal fifth metatarsal head), medial arch (proximal first metatarsal head), and heel, as shown in Figure 3.
After a 15 min acclimatization period on the couch, baseline foot temperature was recorded using the thermal imaging camera.
The participant then wore shoes and walked at a natural pace along the designated pathway.
Immediately after completing the walk, the shoes were removed and the participant was asked to lie on the couch.
Temperature measurements were taken at specific intervals: 30 s, 90 s, 120 s, and 180 s after walking. For each measurement, the participant returned to the couch, removed their footwear, and thermal images were captured.
These temperature readings, along with the baseline values recorded after acclimatization, were used to calculate temperature changes.
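The temperature-change calculation in the last step can be sketched as a subtraction of the baseline from each post-walk reading. The function name, the dictionary layout, and the readings below are hypothetical; only the four post-walk time points come from the protocol.

```python
# Post-walk measurement times taken from the protocol, in seconds.
POST_WALK_TIMES_S = (30, 90, 120, 180)


def temperature_changes(baseline_c, post_walk_c):
    """Return post-walk minus baseline temperature (degrees C) per time point.

    baseline_c: temperature recorded after acclimatization.
    post_walk_c: dict mapping seconds after walking -> temperature.
    """
    return {t: post_walk_c[t] - baseline_c for t in POST_WALK_TIMES_S}


# Invented readings for one region of interest on one foot.
readings = {30: 30.5, 90: 30.1, 120: 29.8, 180: 29.6}
changes = temperature_changes(29.0, readings)
```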

This systematic approach ensures that each participant is evaluated under consistent starting conditions, thereby improving the reliability and validity of the study results. The data collection process resulted in a comprehensive dataset that includes thermal and pressure measurements from each participant. Specifically, each dataset comprises baseline foot temperature, thermal images taken at various time points after walking, and corresponding average pressure metrics at specific anatomical points.

2.4. Feature Description and Data Structure

For the first set of experiments, which focused on predicting plantar pressure metrics using temperature data, each data point consisted of temperature measurements recorded at five specific time points: immediately after walking (0 s) and at 30 s, 90 s, 120 s, and 180 s. The data were collected from eight anatomical points on each foot: the hallux, first metatarsus, third metatarsus, fifth metatarsus, heel, lateral midfoot (LatMF), and medial midfoot (MedMF). Each anatomical point contributed temperature features based on three summary statistics (mean, maximum, and minimum) in addition to the five time-specific measurements. This resulted in 15 temperature features per anatomical point (5 time-based measurements + 3 summary statistics). Given that temperature data were collected from 8 anatomical points, each foot provided a total of 120 temperature features (15 features × 8 anatomical points).

These 120 temperature features were used as input to the machine learning regression models, with the target being one of the three plantar pressure metrics: peak pressure (PPP), average peak pressure (APPP), or average pressure (APP) for each anatomical point. The models tested included Extra Trees Regressor, K Neighbors Regressor, Dummy Regressor, Light Gradient Boosting Machine, Bayesian Ridge, Random Forest Regressor, Gradient Boosting Regressor, AdaBoost Regressor, Extreme Gradient Boosting, Orthogonal Matching Pursuit, Elastic Net, Lasso Least Angle Regression, Lasso Regression, Ridge Regression, Decision Tree Regressor, Huber Regressor, Linear Regression, Passive Aggressive Regressor, and Least Angle Regression. The goal of these regression models was to assess whether the temperature data alone could be used to predict the corresponding plantar pressure values. However, as detailed in the results, the models faced difficulties due to the low or insignificant correlations between temperature and pressure variables.

In the subsequent experiments, a multimodal approach was employed for the classification task of predicting diabetic status. In this setup, both temperature and pressure data were combined into the feature set. Pressure data were summarized into 3 metrics for each of the seven anatomical points (first metatarsus, third metatarsus, fifth metatarsus, hallux, heel, lateral midfoot, and medial midfoot), resulting in 21 pressure-related features (3 metrics × 7 points). When combined with the 120 temperature features, the complete feature vector used for classification consisted of 141 attributes per foot (120 temperature features + 21 pressure features). These feature vectors were used in machine learning classification models aimed at distinguishing between diabetic and non-diabetic subjects. The models tested in this task included Extra Trees Classifier, Random Forest Classifier, Extreme Gradient Boosting, Ada Boost Classifier, Gradient Boosting Classifier, Naive Bayes, Logistic Regression, Decision Tree Classifier, Linear Discriminant Analysis, Ridge Classifier, Quadratic Discriminant Analysis, K Neighbors Classifier, Support Vector Machine (SVM) with a linear kernel, Light Gradient Boosting Machine, and Dummy Classifier.
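The feature counts stated above can be checked with a short sketch that builds placeholder column names. The naming scheme is purely illustrative and the per-point temperature layout is taken as given from the stated totals (15 features at each of 8 points); the actual column names used in the study are not reported.

```python
from itertools import product

# Pressure metrics and anatomical points as named in the text.
PRESSURE_METRICS = ("PPP", "APPP", "APP")
PRESSURE_POINTS = (
    "first_metatarsus", "third_metatarsus", "fifth_metatarsus",
    "hallux", "heel", "lateral_midfoot", "medial_midfoot",
)

# Temperature totals as stated in the text (layout within a point assumed).
TEMP_FEATURES_PER_POINT = 15
TEMP_POINTS = 8

# Hypothetical column names, used only to count attributes.
pressure_names = [f"{m}_{p}" for m, p in product(PRESSURE_METRICS, PRESSURE_POINTS)]
temperature_names = [
    f"temp_point{p}_feature{f}"
    for p in range(TEMP_POINTS)
    for f in range(TEMP_FEATURES_PER_POINT)
]

# One classification instance (one foot) concatenates both blocks.
feature_names = temperature_names + pressure_names
```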

The models were implemented and evaluated using the PyCaret library [36], which automates the training and evaluation of machine learning algorithms for both regression and classification tasks. A 5-fold cross-validation strategy was employed to evaluate model performance. The dataset comprised 26 participants, with each foot treated as an independent instance, resulting in 52 instances in total. The cross-validation framework ensured that each participant’s data were used both in training and validation across different folds, mitigating overfitting and providing a robust estimate of model performance.

Analysis of Correlation Between Individuals’ Feet

To explore the validity of treating each foot as an independent instance in the dataset, a correlation analysis was performed across both feet of each individual and between feet of different individuals. The correlation index was calculated for each pair of feet, and the results are displayed in Figure 4, which shows the correlation matrix of pressure and temperature data.

The correlation matrix reveals that, on average, the correlation between feet from different individuals is relatively high, with a mean value of approximately 0.86. This suggests that, while there are similarities between individuals, there is still sufficient variability across the dataset to justify treating each foot as an independent instance. Such variability is important for machine learning models to generalize effectively, even when the left and right feet of the same individual are included in both the training and test sets across different folds.
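A between-individual mean correlation of this kind can be sketched as follows, assuming the "correlation index" is the Pearson coefficient computed between the feature vectors of two feet; the text does not name the coefficient, so this is an assumption. The function names and the toy vectors are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def mean_between_subject_correlation(feet):
    """Average correlation over pairs of feet from different participants.

    feet: dict mapping (participant_id, side) -> feature vector.
    """
    values = [
        pearson(feet[a], feet[b])
        for a, b in combinations(feet, 2)
        if a[0] != b[0]  # skip left/right pairs of the same participant
    ]
    return sum(values) / len(values)


# Toy vectors standing in for the combined pressure/temperature features.
feet = {
    ("P1", "L"): [1.0, 2.0, 3.0, 4.0],
    ("P1", "R"): [1.0, 2.0, 3.0, 5.0],
    ("P2", "L"): [2.0, 4.0, 6.0, 8.0],
}
between = mean_between_subject_correlation(feet)
```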

Although the data demonstrate similarities between feet, particularly in healthy individuals, the subtle differences between them are likely to capture clinically meaningful variations, especially in diabetic patients. In this population, asymmetry between feet may reflect complications such as neuropathy, making it important for the model to learn from these discrepancies. This approach enables the model to better generalize across a wider range of physiological conditions, improving its ability to detect early signs of complications.

By treating each foot as independent, we effectively increase the dataset size, which is particularly important in studies with small datasets. Given the high correlation values, the model benefits from additional data points while still capturing enough variation to avoid overfitting. Moreover, the use of 5-fold cross-validation ensures that the model is evaluated on a wide range of data splits, further reducing the risk of overfitting and ensuring robustness.

While potential bias due to symmetry in healthy individuals is acknowledged, the correlation analysis suggests that this bias is minimal. From a machine learning perspective, the variability present in the dataset justifies the approach, particularly given the asymmetry often observed in diabetic patients. As a result, treating each foot as an independent instance not only increases the dataset’s robustness but also enhances the generalization of the machine learning models by exposing them to a broader range of conditions.

Rationale for Independent Foot Analysis: While treating each foot as an independent data point increases the dataset size and helps to capture key asymmetries in diabetic individuals, we acknowledge that this approach may introduce bias, particularly in healthy individuals where greater symmetry between left and right feet is typically observed. Inclusion of both feet in both the training and testing sets could lead to inflated model performance as the symmetry between feet in healthy individuals might not reflect true independent variability. Results should be interpreted with caution, especially in cases where foot symmetry is expected.
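For comparison, the concern raised here can be illustrated with a fold assignment that keeps both feet of a participant in the same fold, so that one foot never validates a model trained on the other. This is a sketch of an alternative splitting scheme, not the procedure used in this study; the function name and the seed are hypothetical.

```python
import random


def grouped_folds(participant_ids, n_folds=5, seed=0):
    """Return a fold index per instance, assigning whole participants to folds."""
    unique_ids = sorted(set(participant_ids))
    random.Random(seed).shuffle(unique_ids)
    # Deal shuffled participants round-robin into the folds.
    fold_of = {pid: i % n_folds for i, pid in enumerate(unique_ids)}
    return [fold_of[pid] for pid in participant_ids]


# 26 participants x 2 feet = 52 instances, ordered left then right per participant.
instances = [(pid, side) for pid in range(26) for side in ("L", "R")]
folds = grouped_folds([pid for pid, _ in instances])
```

With this assignment, every validation fold contains only participants that are absent from the corresponding training folds, at the cost of slightly uneven fold sizes.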

The essential variations in patients likely include differences in temperature distribution, pressure patterns, and structural abnormalities between the left and right feet. These variations are particularly important in diabetic patients, where asymmetries can indicate complications like ulcers, neuropathy, or other foot pathologies. From a machine learning standpoint, capturing these discrepancies improves the model’s ability to detect early signs of these complications and enhances the generalization of predictions by training on a wider range of physiological conditions. The model, thus, becomes better at identifying both subtle and more pronounced differences in foot health.

Furthermore, cross-validation was applied during model evaluation to mitigate overfitting and ensure that the models were tested across various data splits, reducing the risk of performance overestimation [37,38]. This method is commonly used to assess machine learning models’ effectiveness in scenarios where data symmetry, such as in gait analysis, is a concern [37]. Moreover, cross-validation is particularly effective in controlling overfitting when dealing with uncorrelated errors, as observed in machine learning models used for prediction tasks. Although k-fold cross-validation is not entirely immune to bias in small sample sizes, it offered a rigorous evaluation of the model’s performance across different subsets of the data [38].

During each iteration or “fold” within the cross-validation protocol, the training subset was split into subgroups. One subgroup served as the training data, facilitating the model’s learning process, while the other subgroup acted as the validation data, against which the model’s performance was assessed. This partitioning adhered to the established 5-fold methodology, ensuring a comprehensive and evenly distributed assessment across the entire dataset. Following this procedure, each instance within the dataset participated in both the training and validation phases across five distinct folds. This approach mitigates any bias that could arise from a single partitioning of the data, leading to a thorough evaluation of the model’s effectiveness in handling diverse scenarios and potential variations within the dataset.
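The fold rotation described above can be sketched with a plain, unstratified 5-fold split over instance indices. This is a generic illustration rather than the splitter PyCaret applies internally; the function name and the seed are hypothetical.

```python
import random


def k_fold_indices(n_instances, n_folds=5, seed=0):
    """Yield (training, validation) index lists, one pair per fold."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Slice the shuffled indices into n_folds nearly equal parts.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# 52 instances (26 participants x 2 feet), as in the dataset described above.
splits = list(k_fold_indices(52))
```

Each instance appears in exactly one validation fold and in the training set of the other four folds.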

Upon completion of the cross-validation process, performance metrics such as accuracy, recall, precision, F1 score, and kappa were systematically compiled for each fold and averaged to provide an estimate of the model’s overall performance.
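These metrics follow directly from the binary confusion matrix. The sketch below uses the standard definitions, with Cohen's kappa computed from observed and chance agreement, and then averages per-fold values; the function names and the label vectors are hypothetical (1 = diabetic, 0 = non-diabetic).

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and Cohen's kappa for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    n = len(pairs)
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Agreement expected by chance from the marginal label frequencies.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


def average_over_folds(fold_metrics):
    """Average each metric across a list of per-fold metric dicts."""
    return {k: sum(m[k] for m in fold_metrics) / len(fold_metrics)
            for k in fold_metrics[0]}


# Invented labels for a single validation fold.
fold = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
summary = average_over_folds([fold, fold])
```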

2.5. Machine Learning Algorithms

2.5.1. Regression Models

For the regression tasks aimed at predicting plantar pressure metrics from temperature data, the following algorithms were utilized:

Extra Trees Regressor: An ensemble learning method that aggregates results from multiple randomized Decision Trees to improve prediction accuracy.
K Neighbors Regressor: A non-parametric method that predicts the output based on the average value of the k-nearest neighbors in the feature space.
Dummy Regressor: A simple baseline model that makes predictions using basic strategies such as the mean or median of the target values.
Light Gradient Boosting Machine (LightGBM): A gradient boosting framework that uses tree-based learning algorithms, optimized for efficiency and performance.
Bayesian Ridge: A linear regression model that uses Bayesian inference to estimate the regression coefficients.
Random Forest Regressor: A tree-based ensemble model that builds multiple Decision Trees and averages their outputs to enhance predictive accuracy.
Gradient Boosting Regressor: An ensemble technique that builds models sequentially, optimizing the prediction by minimizing the error of previous models.
AdaBoost Regressor: A boosting method that combines weak regressors to produce a strong predictive model by focusing on the most difficult-to-predict instances.
Extreme Gradient Boosting (XGBoost): A highly efficient and flexible boosting algorithm that improves performance by reducing overfitting and increasing accuracy.
Orthogonal Matching Pursuit: A greedy algorithm for sparse linear regression that, in each iteration, selects the feature most correlated with the current residual.
Elastic Net: A regularized regression model that linearly combines L1 and L2 penalties of the lasso and ridge methods to improve prediction and feature selection.
Lasso Least Angle Regression (LassoLARS): A Lasso (L1-regularized) linear model fitted with the Least Angle Regression algorithm, which selects the most relevant features by shrinking the coefficients of the less important ones to zero.
Lasso Regression: A regression method that performs both variable selection and regularization to enhance prediction accuracy.
Ridge Regression: A technique used when multicollinearity exists, adding a degree of bias to the regression estimates.
Decision Tree Regressor: A non-linear regression model that splits the dataset into subsets based on the feature values to make predictions.
Huber Regressor: A robust regression technique that is less sensitive to outliers in the data than least squares regression.
Linear Regression: A basic regression model that assumes a linear relationship between the input features and the target values.
Passive Aggressive Regressor: An online learning algorithm that updates the model in response to each individual sample.
Least Angle Regression (LARS): A regression algorithm particularly suited for high-dimensional data, similar to forward stepwise regression.
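Two of the entries above are simple enough to sketch from scratch: the Dummy Regressor (mean strategy) as a baseline and the K Neighbors Regressor. The code below is an illustrative pure-Python version with invented data, not the library implementations evaluated in this study; the function names are hypothetical.

```python
from math import dist
from statistics import mean


def dummy_predict(y_train, n_queries):
    """Baseline: predict the training-set mean for every query."""
    return [mean(y_train)] * n_queries


def knn_predict(X_train, y_train, X_query, k=3):
    """Predict the mean target of the k nearest training points (Euclidean)."""
    predictions = []
    for query in X_query:
        nearest = sorted(range(len(X_train)),
                         key=lambda i: dist(X_train[i], query))[:k]
        predictions.append(mean(y_train[i] for i in nearest))
    return predictions


def mean_absolute_error(y_true, y_pred):
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Invented one-feature data: temperature-like input, pressure-like target.
X_train = [[0.0], [1.0], [2.0], [10.0]]
y_train = [0.0, 1.0, 2.0, 10.0]
X_query, y_query = [[1.0]], [1.0]

knn_out = knn_predict(X_train, y_train, X_query, k=3)
dummy_out = dummy_predict(y_train, len(X_query))
```

Comparing each model's error against the dummy baseline indicates whether the input features carry any usable information about the target.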

2.5.2. Classification Models

For the classification task of predicting diabetes status based on combined pressure and temperature features, the following models were employed:

Extra Trees Classifier: An ensemble learning method that aggregates the results of multiple randomized Decision Trees to make predictions.
Random Forest Classifier: A tree-based ensemble method that creates multiple Decision Trees for classification and averages their outputs.
Extreme Gradient Boosting (XGBoost): A highly efficient boosting algorithm used for classification tasks, known for its high performance in structured data.
AdaBoost Classifier: A boosting algorithm that improves classification by combining weak classifiers to form a stronger overall classifier.
Gradient Boosting Classifier: An iterative boosting method that combines weak classifiers to produce a strong predictive model by sequentially reducing the classification error.
Naive Bayes: A probabilistic classifier based on Bayes’ theorem, assuming independence between the features.
Logistic Regression: A simple linear classifier used to predict the probability of a binary outcome (diabetes or non-diabetes).
Decision Tree Classifier: A non-linear model that classifies instances by recursively partitioning the feature space based on feature values.
Linear Discriminant Analysis (LDA): A classification algorithm that models the differences between multiple classes by assuming normally distributed features.
Ridge Classifier: A linear classifier that encodes the class labels as ±1 and fits an L2-regularized (ridge) regression to them, using regularization to handle collinearity and improve classification.
Quadratic Discriminant Analysis (QDA): A classifier that assumes each class is normally distributed but with different covariance matrices.
K Neighbors Classifier: A non-parametric method that classifies instances based on the majority class of the k-nearest neighbors in the feature space.
Support Vector Machine (SVM) with a linear kernel: A classification algorithm that creates a linear boundary between classes to maximize the margin between them.
Light Gradient Boosting Machine (LightGBM): A highly efficient gradient boosting method optimized for classification tasks on large datasets.
Dummy Classifier: A simple baseline model that makes predictions using basic strategies such as stratified or most frequent class predictions.

2.5.3. Handling of Correlated Features in Machine Learning Models

The machine learning algorithms used in this study, including Extra Trees, Random Forest, Gradient Boosting, and Support Vector Machines (SVMs), are well-suited to handle potential correlations between features, such as the left and right foot data [39]. These algorithms are based on ensemble techniques, Decision Trees, or linear models, which inherently manage redundancy and correlation in input features [40]. For instance, Random Forest is effective in handling feature selection, even with a high number of variables, which helps in improving model accuracy and performance by eliminating unimportant variables [41]. Correlations between input features do not necessarily degrade the performance of these algorithms because they assess the contribution of each feature in relation to the target outcome, even when multiple features carry similar information (as could be the case with left and right foot data).

Moreover, the use of regularization techniques in models like SVMs [42] or Logistic Regression helps to control for overfitting, which can occur in the presence of correlated data [43].
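The stabilizing effect of an L2 penalty on correlated inputs can be illustrated with a minimal sketch. It fits closed-form ridge regression to synthetic data in which one feature is an exact copy of another, standing in for highly similar left and right foot measurements. The data, the penalty strength `lam`, and the helper `ridge_fit` are illustrative assumptions and are not part of the study's pipeline.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^-1 X^T y (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
left = rng.normal(size=50)      # synthetic stand-in for a left-foot feature
right = left.copy()             # perfectly correlated "right-foot" feature
X = np.column_stack([left, right])
y = 3.0 * left + rng.normal(scale=0.1, size=50)

# With duplicated columns, X^T X has rank 1, so unpenalized least squares
# has no unique solution.
rank = np.linalg.matrix_rank(X.T @ X)

# The L2 penalty makes the system invertible and shares the weight
# equally between the two redundant features.
w = ridge_fit(X, y, lam=1.0)
print("rank of X^T X:", rank)
print("ridge weights:", w)
```

In this toy setting the two weights are equal and their sum stays close to the true coefficient of 3, which is the sense in which a regularized linear model tolerates redundant inputs.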

Given the relatively high correlations observed in the dataset, these algorithms are capable of identifying meaningful patterns without being adversely affected by the similarity between left and right foot data. This approach ensures that the models generalize well even in the presence of correlated measurements.

2.5.4. Hyperparameter Tuning

In the process of model selection and training, default hyperparameters were used for each of the machine learning models as hyperparameter tuning was not performed in the initial comparison phase. PyCaret automatically trains a range of models using their standard settings, enabling quick evaluation of model performance. Once the best-performing model is identified, further optimization can be performed through hyperparameter tuning to improve the results. Table 1 is a summary of the machine learning models used in our study, separated into classification and regression models, together with their respective default parameter values.

3. Results

3.1. Correlation Between Plantar Pressure and Temperature Data

This section analyzes the relationship between plantar pressure and temperature data by calculating the correlation coefficients for various anatomical points. The aim of this analysis is to explore how changes in plantar pressure at specific regions of the foot correlate with variations in temperature. Understanding these correlations can provide insights into the biomechanical and thermodynamic responses of the foot under varying load conditions and diabetes conditions.

3.1.1. Calculation of Correlation Coefficients

The relationship between pressure and temperature was assessed by computing both Pearson and Spearman correlation coefficients for each combination of temperature and plantar pressure features. The Pearson correlation coefficient measures the linear relationship between two continuous variables, while the Spearman correlation coefficient captures potential non-linear relationships by assessing the monotonic association between ranked data.

The formulas used for these calculations are as follows:

Pearson correlation:

$r = \frac{\sum (X_{i} - \bar{X}) (Y_{i} - \bar{Y})}{\sqrt{\sum {(X_{i} - \bar{X})}^{2} \sum {(Y_{i} - \bar{Y})}^{2}}}$

where $X_{i}$ and $Y_{i}$ represent the data points for temperature and pressure, respectively, and $\bar{X}$ and $\bar{Y}$ are their respective means.

Spearman correlation:

$\rho = 1 - \frac{6 \sum d_{i}^{2}}{n (n^{2} - 1)}$

where $d_{i}$ is the difference between the ranks of the corresponding values and n is the number of observations.
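As a minimal sketch of how the two formulas above translate into code, the following functions compute both coefficients directly from their definitions. The sample readings are hypothetical values chosen for illustration, and the rank-difference form of the Spearman coefficient is exact only when there are no tied values, which the `ranks` helper assumes.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r computed from the centered sums in the formula."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

def ranks(values):
    """Ranks 1..n in ascending order; assumes no tied values."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(1, len(values) + 1)
    return r

def spearman_rho(x, y):
    """Spearman rho via 1 - 6 * sum(d_i^2) / (n (n^2 - 1)); valid without ties."""
    d = ranks(x) - ranks(y)
    n = len(d)
    return float(1.0 - 6.0 * np.sum(d**2) / (n * (n**2 - 1)))

# Hypothetical temperature (X) and pressure (Y) readings for one anatomical point.
temperature = [29.1, 30.4, 28.7, 31.2, 30.0, 29.5]
pressure = [182.0, 150.0, 175.0, 140.0, 168.0, 190.0]

r = pearson_r(temperature, pressure)
rho = spearman_rho(temperature, pressure)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

When ties are present, a common alternative is to assign average ranks and apply the Pearson formula to the ranks, which reduces to the expression above in the tie-free case.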

Both correlation coefficients were computed for each anatomical point, such as the first metatarsal (1stM), fifth metatarsal (5thM), hallux, heel, and the lateral and medial midfoot (LatM and MedMF), in combination with various temperature measurements.

3.1.2. Anatomical Points and Correlation with Temperature

Because of the large number of data points across the different anatomical regions, the correlation results cannot all be presented in a single table. The key findings for each anatomical point are therefore summarized below, and the comprehensive results are available in the Supplementary Materials at the end of the paper.

First Metatarsal (1stM)

The correlation between temperature and pressure at the first metatarsal was generally weak. Several negative Pearson correlations were observed, particularly in relation to the MedMF temperature features. For example, the MedMF temperature features often showed weak inverse relationships with the 1stM pressure data, indicating that temperature increases in the midfoot are weakly associated with pressure decreases at the first metatarsal.

Third Metatarsal (3rdM)

The third metatarsal exhibited moderate negative correlations with temperature, especially in relation to the medial midfoot region. This suggests that, as the temperature increases in the midfoot, the pressure in the third metatarsal tends to decrease. For instance, MedMF_Ref_Min and 3rdM_APP showed a Pearson correlation as low as $r = -0.317$, indicating an inverse relationship.

Fifth Metatarsal (5thM)

The fifth metatarsal displayed more variable correlations with temperature. In some cases, such as with MedMF_Min and MedMF_Max, positive correlations were observed, suggesting that the temperature and pressure in this region increase together. This suggests that the lateral side of the foot experiences a localized mechanical and thermal response, where higher pressure is associated with increased temperature.

Hallux

The hallux consistently showed positive correlations with temperature, particularly in relation to the MedMF temperature features. The Pearson correlations between the MedMF temperature features and hallux pressure ranged from $r = 0.27$ to $r = 0.38$, indicating a moderate positive relationship. This suggests that, as the plantar pressure at the hallux increases, the temperature in the midfoot rises, reflecting the transfer of the mechanical load and thermal response in this region.

Heel

The correlations between the temperature and pressure in the heel were relatively weak compared to those in the hallux and metatarsals. Moderate positive correlations were observed in some cases, particularly in relation to MedMF_Min and MedMF_Max, suggesting that temperature increases in the heel are weakly associated with pressure increases.

Lateral and Medial Midfoot (LatM and MedMF)

The strongest correlations were observed in the midfoot regions, particularly the medial midfoot (MedMF). Both the Pearson and Spearman correlation coefficients showed moderate to strong positive correlations between LatM and MedMF pressure and temperature. For example, MedMF_Ref_Mean and MedMF_Ref_Max showed Pearson correlations above $r = 0.4$ with pressure at the LatM, indicating that increased pressure in the lateral midfoot is strongly associated with increased temperature in the medial midfoot.

This analysis reveals several important patterns:

Moderate to Strong Positive Correlations: The strongest correlations between pressure and temperature were observed in the midfoot regions, particularly in the MedMF and LatM regions, indicating an interdependence between the mechanical load and thermal responses in these areas.
Weaker or Negative Correlations: In regions such as the 1stM and 3rdM, weaker or inverse correlations were found, suggesting that temperature variations have a limited effect on pressure in these areas.

3.1.3. Possible Impact of Correlation on Machine Learning Performance

The results of the correlation analysis between the temperature and plantar pressure data highlight the potential limitations of using temperature as a standalone feature for pressure prediction. The generally weak and inconsistent correlations across most anatomical points suggest that the underlying relationship between these variables is not strong enough to support accurate pressure predictions using only temperature data. This lack of correlation may serve as an early indicator of poor performance in machine learning models designed to predict plantar pressure from temperature alone. Given the weak association, it is hypothesized that these models will struggle to capture the necessary patterns for reliable predictions.

However, this condition also presents an opportunity for exploring multimodal approaches, where temperature data are combined with other biomechanical features such as the pressure readings to improve the prediction accuracy. In the following sections, a series of machine learning experiments are conducted to test the predictive capability of temperature alone, and then in combination with other relevant features.

3.2. Machine Learning Analysis for Pressure Estimation

In this section, the machine learning analysis conducted to estimate pressure values based on thermal time series data is presented. The experiments were designed to explore the correlation between the thermal data and three pressure metrics: peak pressure, average peak pressure, and average pressure at various anatomical points on the feet.

3.2.1. Pressure Estimation at Individual Anatomical Points

As part of the first set of experiments, a dataset was constructed where the target regression values were each of the three pressure metrics. The features were derived from the thermal time series data. The temperature measurements, representing a time series, were recorded at specific intervals—30 s, 90 s, 120 s, and 180 s—at various anatomical points on the feet. The pressure values were consolidated into three metrics: peak pressure, average peak pressure, and average pressure for each anatomical point.

The analysis of the machine learning models revealed a lack of a significant correlation between the thermal time series data and the predicted pressure values. This outcome was consistent across all the anatomical points and pressure metrics, as indicated by the negative R² values. The performance metrics for the regression models, including MAE, MSE, RMSE, RMSLE, and MAPE, exhibited high error rates and negative R² values, suggesting that the models struggled to accurately estimate the pressure values based on thermal data alone. These findings indicate that thermal data, when used in isolation, may not provide a reliable basis for pressure estimation.
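For reference, the error metrics named above can be written out explicitly. The sketch below uses their standard textbook definitions; it is assumed, not confirmed by the text, that the values reported in the tables follow the same conventions (for example, MAPE expressed as a fraction). The pressure values are hypothetical.

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Standard error metrics for a regression model (textbook definitions)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err**2))
    return {
        "MAE": float(np.mean(np.abs(err))),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        # RMSLE uses log(1 + value), so both arrays must be non-negative.
        "RMSLE": float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))),
        # MAPE as a fraction (0.10 means 10%); undefined if a true value is zero.
        "MAPE": float(np.mean(np.abs(err / y_true))),
    }

# Hypothetical true and predicted pressure values.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 440.0]
metrics = regression_errors(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAE, MSE, and RMSE are expressed in the units of the target (or its square), so their magnitudes are only meaningful relative to the typical pressure values at each anatomical point, whereas RMSLE and MAPE are relative measures.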

Due to the extensive volume of results, only the outcomes for three anatomical points are presented as examples. Table 2 provides the metrics for a regressor estimating the average peak pressure at the hallux, Table 3 presents the metrics for the average peak plantar pressure estimation at the third metatarsus, and Table 4 displays the metrics for peak plantar pressure estimation at the heel. As shown in these tables, high error rates and negative R² values were obtained across most of the models, indicating significant challenges in predicting pressure values solely from thermal time series data. These results raise questions about the suitability of using thermal data alone for pressure estimation. Further investigation is necessary to identify additional features or data sources that may enhance the accuracy of pressure prediction models.

For the hallux, as shown in Table 2, the Extra Trees Regressor had the best performance among the models, but even this model yielded a negative R² value (−0.4872) and substantial errors (MAE = 23.7315; RMSE = 27.1971), indicating poor predictive accuracy. Other models, such as the K Neighbors Regressor and Random Forest Regressor, demonstrated even higher errors and more negative R² values, further underscoring the difficulty of estimating pressure from thermal data in this region.

In the third metatarsus (Table 3), the models performed similarly poorly. The Light Gradient Boosting Machine and Dummy Regressor produced the same results, with a negative R² value of −0.7524 and considerable errors (MAE = 59.1336; RMSE = 69.5438). The Least Angle Regression model performed particularly poorly, with extreme errors (MAE = 1358.865; RMSE = 1583.866) and a highly negative R² value of −1376.8, suggesting that this model is entirely unsuitable for this task.

For the heel (Table 4), the Extra Trees Regressor again performed the best among the models but with a negative R² value (−0.8203) and significant errors (MAE = 42.6667; RMSE = 48.1738). The Linear Regression and Least Angle Regression models were particularly ineffective, with extremely high error metrics and highly negative R² values, indicating a complete failure to predict the pressure values accurately in this region.

Overall, the results across all the anatomical points suggest that the regression models struggle to accurately predict plantar pressure based solely on thermal data. The consistently high errors and negative R² values across the models raise questions about the feasibility of using thermal time series data as a standalone predictor for plantar pressure. Further research is needed to explore alternative features or combinations of data that may improve the predictive accuracy of these models.

3.2.2. Analysis of Correlation Between Consolidated Temperature and Pressure Prediction

An additional analysis was conducted to explore the potential correlation between consolidated temperature data from the entire foot, measured at the five anatomical points, and the pressure metrics at a single point. To test this hypothesis, the dataset was augmented to include the consolidated thermal time series data from these five anatomical points. These combined temperature data were then used to predict the average pressure at each of the anatomical points, focusing on the three pressure metrics. The objective of this analysis was to determine whether temperature information from multiple locations on the foot could enhance the prediction of pressure at a specific site.

Given the extensive results collected from five anatomical points and three pressure metrics, Table 5, Table 6 and Table 7 provide a representative sample of the findings. These tables focus on the first metatarsus targeting average peak pressure, the fifth metatarsus targeting average peak plantar pressure, and the lateral midfoot targeting peak plantar pressure. As observed, the results indicate that the correlation between the consolidated temperature and the pressure metrics did not produce promising outcomes. The analysis shows negative R² values and high errors across various evaluation metrics, suggesting that the consolidated temperature data are not sufficient for accurately predicting the pressure values at these specific anatomical points.

The regression models employed in this analysis were consistent with those used in the previous sections, and similar metrics were utilized to evaluate the performance of these models. The objective was to determine whether consolidated temperature data from multiple anatomical points could improve the prediction of the pressure metrics at specific locations on the foot. However, the analysis reveals a weak correlation between the consolidated temperature data and the pressure metrics, as indicated by consistently negative R² values and high error metrics across all the models.

In Table 5, which presents the results for the first metatarsus targeting average peak pressure, all the regression models demonstrate poor performance. The Light Gradient Boosting Machine and Dummy Regressor, which produced identical results, recorded a mean absolute error (MAE) of 24.3803 and a root mean square error (RMSE) of 29.9025, with a negative R² value of −0.6947. This suggests that these models, like the others, failed to capture any meaningful relationship between the consolidated temperature data and the pressure values at the first metatarsus. The errors were substantial across the board, with the Least Angle Regression model showing the worst performance, recording an MAE of 5.46 × 10³⁵ and infinite values for both MSE and RMSE, reflecting the model’s complete inability to make accurate predictions.

Similarly, in Table 6, which focuses on the fifth metatarsus targeting average peak plantar pressure, the results were equally discouraging. The Light Gradient Boosting Machine and Dummy Regressor again demonstrated a poor performance, with an MAE of 37.8672 and an RMSE of 46.2579, coupled with a negative R² value of −0.7488. Even the more advanced models, such as Extreme Gradient Boosting and Elastic Net, yielded high error metrics (e.g., RMSE values of 57.8543 and 55.4159, respectively) and negative R² values (−1.9084 and −1.7926), further underscoring the inadequacy of using consolidated temperature data to predict the pressure metrics at the fifth metatarsus. The Least Angle Regression model once again produced extreme results, with errors of the same magnitude as those observed in the previous table, highlighting its unsuitability for this task.

Overall, these results suggest that the consolidated temperature data from multiple anatomical points are not sufficient to predict the pressure metrics accurately at specific locations on the foot. The negative R² values across all the models indicate that the regression models failed to capture any meaningful relationship between the temperature data and the pressure metrics. The consistently high errors in metrics such as MAE, MSE, RMSE, RMSLE, and MAPE further reinforce this conclusion, suggesting that alternative data sources or additional features may be required to enhance the accuracy of the pressure prediction models.

The results of this analysis suggest that consolidating thermal data from multiple anatomical points did not improve the accuracy of the pressure predictions at individual anatomical points, as demonstrated by the regression metrics for the lateral midfoot targeting peak plantar pressure (Table 7). The weak correlation between the consolidated temperature data and the pressure metrics is evident from the consistently negative R² values and the high errors observed across the various models.

For example, the Extra Trees Regressor, which is generally a strong performer in regression tasks, yielded an MAE of 21.6342 and an RMSE of 24.3457, accompanied by a significantly negative R² value of −4.1466. Similarly, the Elastic Net and Extreme Gradient Boosting models also exhibited poor performances, with negative R² values of −4.509 and −4.8219, respectively, and relatively high error metrics (e.g., RMSE values of 26.9775 and 26.2291, respectively). These results indicate that the models were unable to effectively capture the relationship between the consolidated temperature data and the pressure at the lateral midfoot, leading to inaccurate predictions.

Additionally, the Passive Aggressive Regressor, which had one of the lowest RMSE values at 20.8999, still exhibited a negative R² value of −5.027, further confirming the lack of a meaningful relationship between the temperature data and the pressure metrics. The consistently high MAE, MSE, RMSE, and RMSLE values across all the models, coupled with the negative R² values, suggest that the temperature data from different anatomical points may not be directly related to the pressure at a specific location, at least with the models and features used in this study. The results for other models, such as the Decision Tree Regressor and Linear Regression, are even more striking, with highly negative R² values (−8.7018 and −16.3879, respectively) and large errors (e.g., RMSE values of 31.3636 and 33.3176, respectively). These metrics highlight the complexity of accurately predicting the pressure distribution in the feet based on thermal data alone.

In all the regression models tested, negative $R^{2}$ values were consistently observed, as shown in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7. A negative $R^{2}$ value indicates that the model’s predictions are worse than simply predicting the mean of the data. This suggests that the temperature data alone do not contribute effectively to predicting the plantar pressure at the anatomical points analyzed. These results corroborate the findings of the correlation analysis, where weak or no significant correlations were detected between the temperature and pressure data. Therefore, the poor predictive accuracy of the models, as indicated by the negative $R^{2}$ values, is consistent with the inherent lack of a strong relationship between the two modalities.
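The meaning of a negative coefficient of determination can be made concrete with a short sketch. It uses the standard definition $R^{2} = 1 - SS_{res}/SS_{tot}$ and hypothetical pressure values; the two sets of predictions are invented purely to contrast a mean-only baseline with a model that fits worse than that baseline.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # spread around the mean
    return float(1.0 - ss_res / ss_tot)

# Hypothetical observed pressures.
pressure = np.array([100.0, 120.0, 140.0, 160.0])

# Baseline: always predict the mean of the observed values.
mean_baseline = np.full_like(pressure, pressure.mean())

# A model whose errors exceed the natural spread of the data.
poor_model = np.array([150.0, 90.0, 170.0, 110.0])

r2_mean = r_squared(pressure, mean_baseline)
r2_poor = r_squared(pressure, poor_model)
print(f"mean baseline R2 = {r2_mean:.2f}, poor model R2 = {r2_poor:.2f}")
```

Under cross-validation, a mean-only baseline is fitted on the training folds and scored on the held-out fold, so its $R^{2}$ on held-out data can itself fall below zero when the fold means differ; this is one plausible reading of negative values for a dummy regressor on a small dataset.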

3.2.3. Implications of Correlations for Machine Learning Model Performance

The observed correlations between plantar pressure and temperature, described in Section 3.1, provide insights that may explain the challenges faced in the first set of experiments, where temperature data were used to predict the plantar pressure. The weak and inconsistent correlations, especially in regions like the 1stM and 3rdM, suggest that temperature alone may not be a strong predictor of pressure in certain anatomical points. The lack of substantial linear or monotonic relationships indicates that the temperature data do not capture the complexities and variations in the pressure distribution across the foot. These findings imply that machine learning models trained solely on temperature features are likely to perform poorly when predicting plantar pressure as the underlying relationship between these variables is weak. The moderate to strong correlations in the midfoot regions (LatM and MedMF) may offer some predictive potential, but, overall, the weak correlations in other areas suggest that additional features or more complex data representations will be required to enhance the performance of the predictive models. This emphasizes the importance of incorporating more diverse and relevant biomechanical features when developing machine learning algorithms for plantar pressure prediction.

3.3. Diabetes Prediction from Temperature and Pressure Data

The lack of a significant correlation between the consolidated thermal data and pressure metrics, as observed in the previous analysis, suggests that these two variables may not be directly related or reflect a causal relationship. However, this apparent disconnect between the thermal data and pressure values does not diminish their individual predictive potential. On the contrary, it opens the door to treating temperature and pressure as independent features in the context of diabetes prediction.

By considering thermal and pressure data as separate, uncorrelated inputs, it becomes possible to harness the unique predictive capabilities of each variable. Temperature data may capture specific physiological responses, such as inflammation or altered blood flow, that are indicative of diabetic conditions, while pressure data could reflect biomechanical abnormalities associated with diabetes, such as altered gait or foot structure. Together, these independent features have the potential to provide a more comprehensive and accurate prediction model for diabetes.

In this section, we explore the application of these variables in diabetes prediction. By leveraging their individual strengths as separate features within machine learning models, we aim to enhance the accuracy and reliability of diabetes diagnosis, as demonstrated in the first study. The primary objective of this study is to assess the combined predictive potential of temperature and pressure data to diagnose diabetes. To achieve this, a set of machine learning models, facilitated by the PyCaret library, was employed to predict the diabetes status for each instance in the dataset.

Table 8 provides an overview of the results obtained from the diabetes prediction task across the tested machine learning models. The table is organized to reflect a performance hierarchy, listing the models in descending order according to their predictive effectiveness. This arrangement begins with the highest-performing model and progresses to those with comparatively lower predictive capabilities.

Upon reviewing the performance metrics across the various machine learning models, several key insights emerge. The Extra Trees Classifier stands out as the top performer, demonstrating the highest accuracy, AUC, recall, precision, F1 score, and kappa values, which collectively indicate its strong and consistent performance across multiple metrics. In contrast, models such as the Random Forest Classifier, Extreme Gradient Boosting, and Naive Bayes show variability in their performance across the different metrics, suggesting differences in how these models address various aspects of the prediction task.

Certain models, such as Quadratic Discriminant Analysis and K Neighbors Classifier, exhibit lower accuracy, recall, precision, and F1 scores, which may reflect limitations in their predictive capabilities for this particular dataset. Additionally, models like the SVM with a Linear Kernel, Light Gradient Boosting Machine, and Dummy Classifier display zero performance across all the metrics, indicating that they may not be suitable for this specific prediction task. In general, the data highlight a diverse range of model performances, shedding light on potential candidates that excel in predicting diabetes based on the combined temperature and pressure data.

Table 9 and Table 10 summarize the results of two distinct experiments, each focusing on predicting diabetes using temperature or pressure as independent variables. Although several classifiers achieve respectable metrics when applied to individual variables, their performance does not match the superior precision observed with the combined data prediction presented in Table 8. In the experiment utilizing temperature data, the classifiers exhibit varying degrees of accuracy, with the Naive Bayes model achieving the highest accuracy at 0.875. Similarly, in the pressure-only experiment, the Logistic Regression model attains an accuracy of 0.6875, marking the best performance among the classifiers for this experiment.

Furthermore, it is important to observe that the F1 score, recall, and precision metrics are closely aligned in the table for diabetes prediction using both the combined pressure and temperature features, as well as in the individual predictions. This consistency across the metrics indicates that the model provides a balanced response between both classes. In particular, the similarity of these metrics suggests that the model is not biased toward one class over the other, effectively managing the trade-off between false positives and false negatives. This balance is particularly important in clinical prediction tasks, where misclassifications can have significant consequences. The fact that these performance metrics remain comparable across different feature sets (pressure, temperature, and combined) further supports the robustness of the model and highlights that both modalities contribute meaningful information for predicting diabetic conditions.
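To make the notion of balanced metrics concrete, the sketch below computes accuracy, precision, recall, F1 score, and Cohen's kappa from a binary confusion matrix using their standard definitions. The labels are hypothetical (1 = diabetic, 0 = non-diabetic), and the function `binary_metrics` is an illustrative helper rather than the routine used to produce the tables.

```python
def binary_metrics(y_true, y_pred):
    """Standard binary classification metrics from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: agreement beyond what chance alone would produce.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}

# Hypothetical held-out fold with one false positive and one false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = binary_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this example the false positives and false negatives are equal in number, so precision, recall, and F1 coincide; a large gap between precision and recall would instead point to a bias toward one class.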

The findings from this experiment highlight the challenges associated with predicting diabetes based solely on temperature or pressure data. While each of these modalities provides valuable insights into the state of the foot, neither offers sufficient discriminatory power on its own to reliably identify diabetic conditions. The lower predictive accuracy observed in both the temperature-only and pressure-only models, relative to the combined model, underscores the limitations of single-modality approaches. These results strongly suggest that the integration of multiple data sources, such as combining temperature and pressure data, is necessary to improve the predictive performance. Additionally, incorporating other physiological or biomechanical features could further enhance the accuracy of machine learning models in clinical diagnostics.

It is noteworthy that the Extra Trees Classifier, which performs exceptionally well with the combined data (Table 8), only achieves a modest accuracy of 0.375 in the pressure-only experiment. The key insight arises when the single-modality results are compared with the combined experiment, in which the models utilized both temperature and pressure data in tandem for diabetes prediction. Although some classifiers display competitive metrics in the individual-variable experiments, their predictive power falls short of the combined data prediction. The Extra Trees Classifier stands out as the most effective model in the combined experiment, achieving an accuracy of 0.9375 and a perfect AUC of 1.0.

Figure 5 provides a comparative analysis of the Extra Trees Classifier and Random Forest Classifier, focusing on their performance in predicting diabetes using both thermal and pressure data. Figure 5a shows the feature importance for the Extra Trees Classifier, where certain features, primarily related to thermal data such as “Hallux_Ref_Max” and “Hallux_Ref_Min”, are highlighted as the most significant contributors to the model’s predictions.

Figure 5c illustrates the feature importance for the Random Forest Classifier. Unlike the Extra Trees Classifier, the Random Forest Classifier shows a more balanced distribution of importance across both the temperature and pressure features. This indicates that the Random Forest model considers a mix of both types of data—temperature (e.g., “Hallux_Min_30s”) and pressure (e.g., “1stM_APP”)—to be equally important in making accurate predictions. This balanced approach suggests that integrating both temperature and pressure data enhances the model’s ability to predict diabetes effectively.

Figure 5b,d present the decision boundaries for the Extra Trees model and Random Forest Classifier, respectively. These boundaries visually demonstrate how each model differentiates between diabetic (1) and non-diabetic (0) cases based on the input features. The Random Forest Classifier’s reliance on a combination of temperature and pressure data is reflected in the complexity and distribution of its decision boundary, showing a nuanced understanding of the data compared to the Extra Trees model.

4. Conclusions

This study presents a comprehensive investigation into the potential use of thermal and pressure data for the prediction of diabetes, spanning two sets of experiments that explored the relationships between these variables and their combined predictive power. The findings present promising avenues for the development of innovative strategies to enable early intervention and ultimately improve patient outcomes.

The first set of experiments revealed the challenges in directly correlating thermal data with plantar pressure metrics. The weak correlations observed in the regression models suggest that thermal data, when used in isolation, may not be sufficient for accurately predicting the pressure values at specific anatomical points. However, this finding presented new opportunities, allowing us to treat temperature and pressure as independent features in the subsequent diabetes prediction models. By doing so, we leveraged the unique strengths of each variable, leading to more robust and accurate predictive models.
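The regression tables (Tables 2 to 7) report negative R² values even for the Dummy Regressor. The following minimal sketch illustrates how this can arise under cross-validation: a baseline that predicts the training-fold mean scores below zero whenever that mean differs from the mean of the held-out fold. The pressure values are invented for illustration and are not taken from the study; the helper `r2_score` is a hypothetical stand-in written here, not the study's code.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical peak-pressure values (arbitrary units), invented for illustration.
train_fold = [60.0, 75.0, 90.0, 110.0, 130.0]
test_fold = [40.0, 55.0, 65.0, 80.0]

# A "dummy" baseline predicts the training-fold mean for every held-out sample.
baseline = mean(train_fold)
dummy_pred = [baseline] * len(test_fold)

# R^2 is measured against the held-out fold's own mean, so a constant
# prediction taken from a different fold cannot score above zero.
dummy_r2 = r2_score(test_fold, dummy_pred)
print(f"baseline = {baseline:.1f}, held-out R^2 = {dummy_r2:.3f}")
```

Read this way, a negative R² indicates that a model explains less variance on the held-out fold than that fold's own mean would, which is consistent with the weak temperature-pressure relationship described above.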

In the second set of experiments, the integration of thermal and pressure data into machine learning models significantly enhanced the predictive capabilities for diabetes. The combined analysis demonstrated that using both temperature and pressure variables together provides a more comprehensive understanding of the physiological responses associated with diabetes. This approach could pave the way for more accurate risk assessments and personalized diabetes management strategies. The integration of temperature and pressure data is expected to improve diabetes prediction because these two modalities provide complementary physiological insights. Temperature data can reveal early signs of inflammation, vascular issues, or tissue damage, which are common precursors to complications in diabetic patients, such as ulcers. On the other hand, plantar pressure measurements offer information about structural changes, foot deformities, and abnormal pressure distribution, which are also characteristic aspects of diabetic foot conditions. Previous approaches have primarily focused on one modality, limiting their ability to capture the complex interactions between these physiological factors. By combining both temperature and pressure data, the model can identify a broader range of diabetes-related abnormalities, improving the overall predictive accuracy and offering a more comprehensive assessment of foot health in diabetic patients.

The study results indicate that, while the classifiers performed respectably with the standalone variables, significantly better results were achieved when the temperature and pressure data were combined. Specifically, when using only plantar pressure data, the Logistic Regression model achieved the best performance among the classifiers, with an accuracy of 68.75%. In contrast, when using only temperature data, the classifiers exhibited varying degrees of accuracy, with the Naive Bayes model achieving the highest at 87.5%. Furthermore, when the models used both temperature and pressure data in tandem for the prediction of diabetes, the Extra Trees Classifier emerged as the most effective, achieving an accuracy of 93.75% and a perfect AUC score of 1. This model demonstrated strong performance on multiple metrics, including recall (0.875), precision (1), F1 score (0.9333), and kappa (0.875). Other models, such as the Random Forest Classifier and Extreme Gradient Boosting, showed varied performance across these metrics, highlighting differences in how they handled the prediction task. These results highlight the importance of integrating temperature with plantar pressure measurements when monitoring activities of daily living in people with diabetes. The findings also emphasize the need to evaluate multiple classification algorithms to determine which is the most accurate for predicting diabetes in different participant clusters.
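As a worked check of how the reported metrics relate to one another, the sketch below recomputes accuracy, recall, precision, F1, and Cohen's kappa from a confusion matrix. The counts are an assumption: the paper does not report confusion matrices, and the values used here (16 held-out samples, 8 per class) are simply one set that reproduces the Extra Trees row of Table 8. The function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Derive common binary-classification metrics from confusion-matrix counts."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


# ASSUMED counts (not reported in the paper): 7 true positives, 1 false
# negative, 0 false positives, 8 true negatives on 16 held-out samples.
extra_trees = classification_metrics(tp=7, fn=1, fp=0, tn=8)
for name, value in extra_trees.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, a single missed diabetic case accounts for the gap between the perfect precision and the 0.875 recall, which shows how sensitive each metric is to individual samples at this test-set size.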

While this study treats each foot’s data as a separate input to the model, we acknowledge that this approach could introduce potential bias, especially in healthy individuals, where symmetry between the left and right feet is generally observed. However, asymmetries between the feet of individuals with diabetes do exist and can provide valuable diagnostic information. To further investigate this, we calculated the correlation between feet across different individuals, revealing that, while there is a high correlation between the feet of the same person, subtle variations exist across individuals. These findings support the use of cross-validation with independent foot data, as the dataset still exhibits enough variability to provide meaningful insights. However, treating both feet as independent samples may reduce the effective variability among healthy individuals and could potentially overestimate the model’s performance in specific cases.

Furthermore, the study underscored the complexity of pressure distribution in feet and highlighted the importance of comprehensive data integration and advanced feature engineering. The inability of regression models to accurately predict pressure from consolidated temperature data emphasized the need for a more nuanced approach that considers the independent contributions of each variable to diabetes prediction. The introduction of additional modalities has the potential to overcome the limitations observed in this correlation analysis, paving the way for more robust and accurate pressure prediction models.

Limitations and Future Work: One limitation of this study is the potential bias introduced by treating each foot from the same individual as a separate input to the model. In healthy individuals, where foot symmetry is generally present, this may lead to an overestimation of the model performance. Future studies should examine whether treating both feet as correlated data is a more appropriate method when symmetry is expected. Furthermore, exploring whether foot symmetry or asymmetry persists across different demographic factors such as gender, ethnicity, and foot dominance would add depth to the analysis. Additionally, future research should contrast this with treating both feet from the same individual as independent instances while ensuring that data from the same individual are never included in both the training and testing datasets. Such a split would enhance the robustness and generalizability of the model by avoiding the potential bias introduced by symmetry or shared characteristics between the feet.
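The participant-level split suggested above can be sketched as follows. This is a minimal, standard-library illustration of group-aware k-fold splitting, not the procedure used in the study; the function `group_k_fold` and the participant IDs are hypothetical, and the sketch does not stratify folds by diabetes status.

```python
import random


def group_k_fold(groups, n_splits=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides."""
    unique = sorted(set(groups))
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of groups")
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Deal participants round-robin into folds so fold sizes stay balanced.
    fold_of = {g: i % n_splits for i, g in enumerate(unique)}
    for fold in range(n_splits):
        test_idx = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train_idx, test_idx


# 26 participants x 2 feet = 52 samples; the integer IDs are hypothetical.
participant_ids = [pid for pid in range(26) for _foot in ("left", "right")]

splits = list(group_k_fold(participant_ids, n_splits=5))
leaks = 0
for train_idx, test_idx in splits:
    train_people = {participant_ids[i] for i in train_idx}
    test_people = {participant_ids[i] for i in test_idx}
    leaks += len(train_people & test_people)

print(f"folds: {len(splits)}, participants shared across train/test: {leaks}")
```

For real experiments, scikit-learn's `GroupKFold` offers the same guarantee through `split(X, y, groups)`, and `StratifiedGroupKFold` additionally tries to balance the class proportions across folds.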

Looking forward, future research should also focus on refining the feature engineering techniques and optimizing the model selection to fully harness the predictive potential of these combined variables. Additionally, further studies are essential to explore how insoles and other biomechanical factors impact prediction models, which could lead to new therapeutic interventions and enhance the accuracy of diabetes management tools. By continuing to build on the insights gained from this study, there is considerable potential to advance the field of diabetes prediction and improve the quality of life for individuals affected by this condition.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/a17110519/s1. Table S1: Anatomical Points and Correlation with Temperature.

Author Contributions

Conceptualization, E.A.G., F.C. and R.N.; Data curation, M.Z.-D.; Methodology and protocol implementation, M.Z.-D. and R.N.; Project administration, R.N. and M.Z.-D.; Software, E.A.G.; Validation, E.A.G.; Writing—original draft, E.A.G.; Writing—review and editing, E.A.G., F.C., M.Z.-D. and R.N. All authors have read and agreed to the published version of the manuscript.

Funding

The study was conducted as part of STANDUP—Smartphone Thermal Analysis for Diabetic Foot Ulcer Prevention and Treatment Project. The project was funded by the European Commission under the Marie Skłodowska-Curie Research and Innovation Staff Exchange (Horizon 2020-MSCARISE-2017, Grant Agreement Number 777661) January 2018–October 2023.

Data Availability Statement

Data are available upon reasonable request.

Acknowledgments

The authors would like to express their sincere gratitude to the group of volunteers who participated in this study. Special thanks are extended to the Bioengineering Laboratory, Section Footlab, for their invaluable support and collaboration in facilitating the data collection process.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AUC-ROC	Area Under the Receiver Operating Characteristic Curve
ML	Machine Learning
SVM	Support Vector Machine
MAE	Mean Absolute Error
MSE	Mean Squared Error
RMSE	Root Mean Squared Error
R²	Coefficient of Determination
RMSLE	Root Mean Squared Logarithmic Error
MAPE	Mean Absolute Percentage Error

Figure 1. A 25 m walkway; the designated area avoided “quick” twisting movements.

Figure 2. Thermal image capture setup.

Figure 3. Regions of interest on the feet were marked for thermal and pressure measurements: hallux, 1st metatarsus, 3rd metatarsus, 5th metatarsus, midfoot (proximal to 5th metatarsus apophysis), medial arch on proximal 1st metatarsus, and heel.

Figure 4. Correlation index matrix representing the relationship between temperature and pressure features across different individuals. The matrix shows correlation coefficients for pressure and temperature data between the left and right feet of each individual, as well as across different individuals.

Figure 5. Performance comparison between the Extra Trees Classifier and Random Forest Classifier. (a,c) display the feature importance plots, with (a) highlighting the Extra Trees Classifier’s focus on thermal data and (c) illustrating the Random Forest Classifier’s balanced consideration of both temperature and pressure features. (b,d) depict the decision boundaries for the Extra Trees Classifier and Random Forest Classifier, respectively, showing how the models classify diabetic (1) and non-diabetic (0) cases based on these features. Random Forest’s mixed use of temperature and pressure data underscores its more comprehensive approach to predicting diabetes.

Table 1. Default hyperparameters for classification and regression models.

Model	Type	Default Hyperparameters
Logistic Regression	Classification	C = 1.0, penalty = l2, solver = lbfgs, max_iter = 100
Random Forest Classifier	Classification	n_estimators = 100, criterion = gini, max_depth = None, min_samples_split = 2, min_samples_leaf = 1
K-Nearest Neighbors	Classification	n_neighbors = 5, weights = uniform, algorithm = auto
Support Vector Machine (SVC)	Classification	C = 1.0, kernel = rbf, gamma = scale
Gradient Boosting Classifier	Classification	n_estimators = 100, learning_rate = 0.1, max_depth = 3, min_samples_split = 2, min_samples_leaf = 1
Extra Trees Classifier	Classification	n_estimators = 100, criterion = gini, max_depth = None, min_samples_split = 2, min_samples_leaf = 1
Naive Bayes	Classification	No tunable parameters
XGBoost Classifier	Classification	n_estimators = 100, learning_rate = 0.1, max_depth = 6, min_child_weight = 1, subsample = 1.0, colsample_bytree = 1.0
LightGBM Classifier	Classification	n_estimators = 100, learning_rate = 0.1, max_depth = −1, num_leaves = 31
Decision Tree Classifier	Classification	criterion = gini, max_depth = None, min_samples_split = 2, min_samples_leaf = 1
AdaBoost Classifier	Classification	n_estimators = 50, learning_rate = 1.0
Ridge Classifier	Classification	alpha = 1.0
Random Forest Regressor	Regression	n_estimators = 100, criterion = mse, max_depth = None, min_samples_split = 2, min_samples_leaf = 1
Gradient Boosting Regressor	Regression	n_estimators = 100, learning_rate = 0.1, max_depth = 3, min_samples_split = 2, min_samples_leaf = 1
Extra Trees Regressor	Regression	n_estimators = 100, criterion = mse, max_depth = None, min_samples_split = 2, min_samples_leaf = 1
XGBoost Regressor	Regression	n_estimators = 100, learning_rate = 0.1, max_depth = 6, min_child_weight = 1, subsample = 1.0, colsample_bytree = 1.0
LightGBM Regressor	Regression	n_estimators = 100, learning_rate = 0.1, max_depth = −1, num_leaves = 31
Decision Tree Regressor	Regression	criterion = mse, max_depth = None, min_samples_split = 2, min_samples_leaf = 1

Table 2. Regression model metrics: hallux average peak pressure.

Model	MAE	MSE	RMSE	R²	RMSLE	MAPE
Extra Trees Regressor	23.7315	839.6758	27.1971	−0.4872	0.3569	0.3657
K Neighbors Regressor	26.994	1024.495	30.2197	−0.9288	0.3863	0.4422
Dummy Regressor	27.1484	1107.591	31.1707	−0.9942	0.3928	0.4359
Light Gradient Boost. Mach.	27.1484	1107.591	31.1707	−0.9942	0.3928	0.4359
Bayesian Ridge	27.1514	1107.961	31.1748	−0.9948	0.3929	0.436
Random Forest Regressor	27.8012	1089.168	31.2258	−1.0301	0.3966	0.4397
Gradient Boosting Regressor	27.2654	1064.454	30.9733	−1.0617	0.4079	0.4161
AdaBoost Regressor	27.9178	1150.729	32.0767	−1.0725	0.4172	0.4512
Extreme Gradient Boosting	27.5578	1153.395	31.5512	−1.094	0.4305	0.4192
Orthogonal Matching Pursuit	27.5907	1232.316	32.613	−1.216	0.4108	0.4602
Elastic Net	30.3813	1469.462	34.6957	−1.3458	0.4321	0.4989
Lasso Least Angle Regr.	31.8224	1645.612	36.0818	−1.5213	0.4541	0.5232
Lasso Regression	31.8109	1646.714	36.0897	−1.5227	0.454	0.5232
Ridge Regression	32.7574	1845.964	37.4148	−1.7363	0.4679	0.5401
Decision Tree Regressor	30.4242	1434.073	36.7814	−2.6933	0.4578	0.4362
Huber Regressor	38.303	2300.472	43.3332	−2.93	0.5605	0.5833
Linear Regression	46.8023	3601.207	57.1221	−7.4216	0.985	0.7242
Passive Aggressive Regr.	48.2434	3274.756	51.2894	−9.6462	0.5529	0.6799
Least Angle Regression	43,123.6	3.15 × 10¹⁰	56,615.54	−2.6 × 10⁸	2.0646	421.0547

Table 3. Regression model metrics: 3rd metatarsal average peak plantar pressure.

Model	MAE	MSE	RMSE	R²	RMSLE	MAPE
Light Gradient Boost. Mach.	59.1336	5961.295	69.5438	−0.7524	0.3315	0.307
Dummy Regressor	59.1336	5961.295	69.5438	−0.7524	0.3315	0.307
Bayesian Ridge	59.1476	5962.777	69.5575	−0.754	0.3316	0.3071
Elastic Net	65.731	7520.082	77.2534	−1.6133	0.3521	0.327
Orthogonal Matching Purs.	64.355	6624.744	74.2898	−1.8195	0.349	0.3254
Extra Trees Regressor	59.0005	5953.229	68.2479	−2.8548	0.3307	0.3114
AdaBoost Regressor	61.2344	6188.385	72.3118	−3.0504	0.3522	0.3163
Random Forest Regressor	65.8635	7280.352	78.2418	−3.33	0.3709	0.3446
Ridge Regression	70.6273	9422.834	84.8931	−3.4673	0.3803	0.3492
Lasso Least Angle Regr.	70.6124	9466.471	84.8651	−3.5245	0.3794	0.3512
Lasso Regression	70.8137	9498.548	85.0506	−3.542	0.3805	0.3518
Gradient Boosting Regr.	68.614	8445.285	84.826	−4.0885	0.4049	0.367
K Neighbors Regressor	74.8539	8246.182	87.825	−7.5013	0.4041	0.3986
Huber Regressor	86.2494	12,962.42	104.4957	−8.203	0.4775	0.4194
Passive Aggressive Regr.	97.7131	14,739.44	107.2971	−10.8507	0.4947	0.4661
Extreme Gradient Boosting	73.9059	9137.847	87.0243	−12.7287	0.3982	0.3953
Decision Tree Regr.	91.1842	14,836.03	109.1365	−14.7182	0.509	0.4827
Linear Regression	123.951	31,724.05	157.4115	−16.7257	1.1418	0.5985
Least Angle Regr.	1358.865	4,952,249	1583.866	−1376.8	1.5223	6.971

Table 4. Regression model metrics: heel peak plantar pressure.

Model	MAE	MSE	RMSE	R²	RMSLE	MAPE
Extra Trees Regr.	42.6667	2665.684	48.1738	−0.8203	0.2074	0.1917
Random Forest Regr.	45.6522	2938.924	50.5072	−1.0423	0.218	0.2057
AdaBoost Regr.	45.5731	2933.01	51.091	−1.1171	0.2212	0.2052
Extreme Gradient Boost.	50.0482	3324.043	54.9778	−1.3777	0.2399	0.2268
Bayesian Ridge	45.1373	2821.918	50.9436	−1.5651	0.2218	0.2073
Dummy Regressor	45.1358	2821.794	50.9427	−1.5652	0.2218	0.2073
Light Gradient Boost. Mach.	45.1358	2821.794	50.9427	−1.5652	0.2218	0.2073
Orthogonal Matching Purs.	46.3616	2878.473	51.2707	−1.6113	0.2229	0.2115
Gradient Boosting Regr.	46.6507	3403.991	54.3723	−1.6691	0.2319	0.2048
K Neighbors Regr.	49.9279	3561.278	56.7779	−1.8124	0.2451	0.231
Elastic Net	46.6024	3122.97	53.1349	−1.8358	0.2296	0.2069
Lasso Least Angle Regr.	51.2852	4159.062	59.577	−2.2904	0.2611	0.2229
Lasso Regression	51.3096	4161.013	59.5913	−2.2912	0.2612	0.223
Ridge Regression	55.8088	5147.188	65.1659	−2.866	0.3025	0.2412
Huber Regressor	62.6196	8839.295	77.4237	−5.3749	0.2903	0.2608
Decision Tree Regr.	67.4958	6356.389	77.734	−5.5824	0.3223	0.2948
Passive Aggressive Regr.	75.5966	8443.785	86.2818	−7.3284	0.3946	0.3159
Linear Regression	100.326	20,864.63	119.8718	−22.0593	0.435	0.4494
Least Angle Regr.	1385.571	7,262,251	1595.164	−5175.72	1.3363	6.3554

Table 5. Regression model metrics: 1st metatarsal average peak pressure and temperature consolidated.

Model	MAE	MSE	RMSE	R²	RMSLE	MAPE
Light Gradient Boost. Mach.	24.3803	1179.303	29.9025	−0.6947	0.3968	0.41
Dummy Regr.	24.3803	1179.302	29.9026	−0.6947	0.3968	0.41
Bayesian Ridge	24.9298	1202.427	30.5468	−0.8289	0.4033	0.416
Random Forest Regr.	25.7203	1176.716	31.5095	−1.4364	0.4089	0.41
AdaBoost Regr.	25.1165	1214.541	31.2653	−1.4662	0.4057	0.3773
Extra Trees Regr.	26.0886	1220.183	32.2855	−2.5147	0.4293	0.4204
K Neighbors Regr.	29.2116	1556.675	36.5677	−3.0332	0.4713	0.4735
Gradient Boosting Regr.	26.5199	1303.739	33.9157	−3.5019	0.4332	0.3903
Passive Aggressive Regr.	28.393	1426.168	33.6168	−4.4534	0.4424	0.4123
Extreme Gradient Boost.	31.1872	1505.87	36.6103	−4.625	0.4932	0.4867
Elastic Net	30.5991	1644.279	37.4245	−5.7394	0.4984	0.4812
Lasso Least Angle Regr.	33.2624	1656.707	39.0917	−12.0911	0.6536	0.5131
Decision Tree Regressor	32.6131	1951.408	41.2692	−12.3658	0.5654	0.5034
Lasso Regression	33.5323	1677.464	39.3684	−12.689	0.6416	0.5169
Huber Regressor	36.7847	2045.709	42.9871	−14.2362	0.5908	0.5723
Orthogonal Matching Purs.	45.0718	3058.593	51.3642	−30.2947	0.8606	0.687
Ridge Regression	36.4573	2127.21	43.6394	−32.8895	0.623	0.5588
Linear Regression	38.0126	2422.576	45.5997	−45.557	0.6342	0.5902
Least Angle Regr.	5.46 × 10³⁵	inf	inf	−inf	31.1333	6.62 × 10³³

Table 6. Regression model metrics: 5th metatarsal average peak plantar pressure and temperature consolidated.

Model	MAE	MSE	RMSE	R²	RMSLE	MAPE
Light Gradient Boost. Mach.	37.8672	2304.44	46.2579	−0.7488	0.6112	1.3117
Dummy Regressor	37.8672	2304.44	46.2579	−0.7488	0.6112	1.3117
Bayesian Ridge	39.0378	2378.554	47.1246	−0.818	0.6213	1.3153
Passive Aggressive Regr.	41.3224	2650.283	49.1166	−0.8453	0.6669	1.369
AdaBoost Regressor	40.0842	2665.721	50.02	−1.1158	0.6383	1.3454
Random Forest Regr.	40.9052	2627.747	49.4219	−1.1796	0.6472	1.301
K Neighbors Regr.	43.6027	3003.919	52.6696	−1.3401	0.6822	1.4252
Gradient Boosting Regr.	46.0786	3136.678	53.7797	−1.5054	0.6705	1.2397
Extra Trees Regr.	42.7361	2879.329	52.3997	−1.6931	0.6889	1.3874
Elastic Net	46.6453	3320.467	55.4159	−1.7926	0.7383	1.5432
Extreme Gradient Boost.	50.896	3660.532	57.8543	−1.9084	0.7317	1.4089
Huber Regressor	51.3562	3689.396	59.3965	−2.5813	0.8133	1.5922
Decision Tree Regr.	60.6246	5272.278	69.8224	−3.4066	0.8352	1.7657
Lasso Regr.	51.0932	4061.626	60.3932	−3.4134	0.9889	1.5364
Lasso Least Angle Regres.	51.3896	4068.553	60.5001	−3.6598	0.9164	1.5353
Orthogonal Matching Purs.	49.5623	4642.656	62.5531	−3.8977	0.8912	1.5586
Ridge Regression	50.7713	4263.169	62.1715	−4.1066	1.01	1.6337
Linear Regression	53.8113	4777.258	65.7032	−4.9478	1.0016	1.6863
Least Angle Regr.	6.31 × 10³⁵	inf	inf	−inf	23.5834	6.13 × 10³³

Table 7. Regression model metrics: lateral midfoot peak plantar pressure and temperature consolidated.

Model	MAE	MSE	RMSE	R²	RMSLE	MAPE
Extra Trees Regr.	21.6342	781.5449	24.3457	−4.1466	0.3499	0.3726
Elastic Net	23.205	975.5359	26.9775	−4.509	0.3894	0.3974
Extreme Gradient Boost.	22.6037	879.9069	26.2291	−4.8219	0.3769	0.3893
Passive Aggressive Regr.	17.9418	512.5657	20.8999	−5.027	0.3086	0.2925
Bayesian Ridge	19.1977	610.182	22.0609	−5.2521	0.326	0.3344
AdaBoost Regressor	21.5522	826.1589	24.9973	−5.3733	0.358	0.3591
Huber Regressor	25.1347	1149.741	29.2435	−6.1187	0.4382	0.4092
Random Forest Regr.	21.5135	782.1238	24.7137	−6.4551	0.3547	0.3785
Gradient Boosting Regr.	23.9051	1007.286	27.9547	−6.6773	0.3961	0.393
Decision Tree Regr.	26.5163	1247.107	31.3636	−8.7018	0.455	0.4326
Light Gradient Boost. Mach.	19.7522	641.3545	23.1027	−9.1444	0.3378	0.3345
Dummy Regressor	19.7522	641.3545	23.1027	−9.1444	0.3378	0.3345
Lasso Least Angle Regr.	28.9892	1527.706	33.0902	−11.1096	0.4667	0.4874
Lasso Regression	29.0708	1528.006	33.1449	−11.2433	0.4685	0.4889
K Neighbors Regr.	23.5397	834.7291	27.003	−11.4923	0.3894	0.4338
Ridge Regression	26.8591	1437.279	32.0659	−12.3371	0.5478	0.4719
Linear Regression	28.1566	1517.806	33.3176	−16.3879	0.6098	0.4869
Orthogonal Matching Purs.	29.692	1476.125	33.879	−16.5291	0.6524	0.5192
Least Angle Regr.	1.36 × 10³⁶	inf	inf	−inf	62.5706	1.69 × 10³⁴

Table 8. Comparative performance of diabetes prediction models employing combined thermal time series and plantar pressure data.

Model	Accuracy	AUC	Recall	Prec.	F1	Kappa
Extra Trees Classifier	0.9375	1	0.875	1	0.9333	0.875
Random Forest Classifier	0.75	0.875	0.625	0.8333	0.7143	0.5
Extreme Gradient Boosting	0.75	0.8438	0.5	1	0.6667	0.5
Ada Boost Classifier	0.625	0.7812	0.375	0.75	0.5	0.25
Gradient Boosting Classifier	0.625	0.625	0.375	0.75	0.5	0.25
Naive Bayes	0.6875	0.7734	0.625	0.7143	0.6667	0.375
Logistic Regression	0.5625	0.6406	0.375	0.6	0.4615	0.125
Decision Tree Classifier	0.625	0.625	0.375	0.75	0.5	0.25
Linear Discriminant Analysis	0.6875	0.875	0.5	0.8	0.6154	0.375
Ridge Classifier	0.6875	0.6875	0.5	0.8	0.6154	0.375
Quadratic Discriminant Analysis	0.125	0.125	0.25	0.2	0.2222	−0.75
K Neighbors Classifier	0.4375	0.5391	0.625	0.4545	0.5263	−0.125
SVM—Linear Kernel	0.5	0.5	0	0	0	0
Light Gradient Boosting Machine	0.5	0.5	0	0	0	0
Dummy Classifier	0.5	0.5	0	0	0	0

Table 9. Performance of diabetes prediction models employing thermal time series data.

Model	Accuracy	AUC	Recall	Prec.	F1	Kappa
Extra Trees Classifier	0.8125	0.9375	0.875	0.7778	0.8235	0.625
Random Forest Classifier	0.75	0.9375	0.75	0.75	0.75	0.5
Extreme Gradient Boosting	0.75	0.9062	0.75	0.75	0.75	0.5
Ada Boost Classifier	0.8125	0.9297	0.875	0.7778	0.8235	0.625
Gradient Boosting Classifier	0.8125	0.7812	0.875	0.7778	0.8235	0.625
Naive Bayes	0.875	0.875	0.75	1	0.8571	0.75
Logistic Regression	0.8125	0.9375	0.875	0.7778	0.8235	0.625
Decision Tree Classifier	0.8125	0.8125	0.875	0.7778	0.8235	0.625
Linear Discriminant Analysis	0.8125	0.8984	1	0.7273	0.8421	0.625
Ridge Classifier	0.9375	0.9375	1	0.8889	0.9412	0.875
Quadratic Discriminant Analysis	0.4375	0.4375	0.625	0.4545	0.5263	−0.125
K Neighbors Classifier	0.8125	0.875	0.75	0.8571	0.8	0.625
SVM—Linear Kernel	0.5	0.5	1	0.5	0.6667	0
Light Gradient Boosting Machine	0.5	0.5	0	0	0	0
Dummy Classifier	0.5	0.5	0	0	0	0

Table 10. Performance of diabetes prediction models employing plantar pressure data.

Model	Accuracy	AUC	Recall	Prec.	F1	Kappa
Extra Trees Classifier	0.375	0.2969	0.5	0.4	0.4444	−0.25
Random Forest Classifier	0.3125	0.3281	0.375	0.3333	0.3529	−0.375
Extreme Gradient Boosting	0.4375	0.3594	0.375	0.4286	0.4	−0.125
Ada Boost Classifier	0.4375	0.5	0.25	0.4	0.3077	−0.125
Gradient Boosting Classifier	0.5	0.3594	0.5	0.5	0.5	0
Naive Bayes	0.5	0.5938	0.625	0.5	0.5556	0
Logistic Regression	0.6875	0.6719	0.75	0.6667	0.7059	0.375
Decision Tree Classifier	0.375	0.375	0.375	0.375	0.375	−0.25
Linear Discriminant Analysis	0.5	0.5	0.5	0.5	0.5	0
Ridge Classifier	0.4375	0.4375	0.5	0.4444	0.4706	−0.125
Quadratic Discriminant Analysis	0.625	0.625	0.25	1	0.4	0.25
K Neighbors Classifier	0.25	0.3359	0.375	0.3	0.3333	−0.5
SVM—Linear Kernel	0.5	0.5	0	0	0	0
Light Gradient Boosting Machine	0.5	0.5	0	0	0	0
Dummy Classifier	0.5	0.5	0	0	0	0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Gerlein, E.A.; Calderón, F.; Zequera-Díaz, M.; Naemi, R. Can the Plantar Pressure and Temperature Data Trend Show the Presence of Diabetes? A Comparative Study of a Variety of Machine Learning Techniques. Algorithms 2024, 17, 519. https://doi.org/10.3390/a17110519

