1. Introduction
Osteoporosis is a widespread disease that weakens bones and raises the risk of fractures. In 2020, its global prevalence was 12% among men and 23% among women, with Africa having the highest rate [1,2,3]. It is a substantial public health concern that demands comprehensive prevention, treatment, and ongoing monitoring; yet early diagnosis, a crucial factor in averting fractures and alleviating the overall burden of this debilitating condition, often remains overlooked. Regrettably, a substantial majority of cases go undetected until a bone fracture occurs [4,5]. This delay not only impedes timely intervention but also underscores the urgent need for proactive screening and early detection strategies. By addressing this gap in the current healthcare landscape, we can significantly enhance our capacity to prevent fractures, mitigate the impact of osteoporosis, and improve the overall quality of care for individuals at risk. Therapy, medication, and fall prevention measures, along with lifestyle changes, can improve the daily living of a person with osteoporosis, reducing the occurrence of fractures by between 21% and 66% [6]. Hence, earlier detection of osteoporosis enables earlier intervention to prevent fractures.
Typically, osteoporosis is diagnosed when one of three things occurs [7]: (1) fragility fractures at the vertebral column, hip joint, wrists, or shoulders, which can lead to further fractures, reduced quality of life, disability, and even mortality [8]; (2) a dual-energy X-ray absorptiometry (DEXA) scan showing a bone mineral density (BMD) T-score of less than or equal to −2.5 in the lower spine and distal radius; or (3) the Fracture Risk Assessment Tool (FRAX), an online tool, showing a T-score of −1.0 to −2.5 together with an increased fracture risk (according to country-specific thresholds). These diagnostic options, however, are not very useful in the early stages and are even less useful in trabecular bone osteoporosis. For example, DEXA scans are considered the gold standard for diagnosing osteoporosis and estimating fracture risk, yet they may not produce accurate results if the person has osteoarthritis or spinal fractures, or if low bone mineral density is caused by vitamin D deficiency [9]. DEXA scanners produce 2D images of complex 3D structures, calculating bone density as the ratio of mineral content to bone area. This method has a drawback: larger bones might appear stronger despite having the same density as smaller bones. Machine learning, a field concerned with training algorithms on pre-collected data, is gaining traction in osteoporosis diagnosis. By analyzing extensive patient data, machine learning can identify patterns and predict the development of osteoporosis earlier, enabling more effective prevention and treatment strategies.
A review of machine learning-based approaches to diagnosing osteoporosis reports vast diversity in the success rates of the chosen machine learning algorithms [10]. Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) are among the most popular machine learning algorithms [8,11,12,13,14,15,16,17]. Typically, studies involving classification tasks evaluate model performance through metrics such as Pearson’s correlation coefficient, accuracy, and AUC [18,19,20,21]. Yet accuracy, being influenced by the majority class, tends to overstate the true performance of machine learning models [22].
In this regard, a study was conducted on the use of machine learning to create personalized gait anomaly detectors for individuals who have multiple sclerosis (MS) and need walking aids [23]. This technique used sensorized tips mounted on canes as data collectors to produce individual models that could identify alterations in the walking pattern reflecting changes in functional state. Testing on healthy individuals and MS patients showed average accuracies of 87.5% and 82.5%, respectively, in identifying gait abnormalities, despite varying rates of disease progression among patients. A few misclassifications were caused by the small sample size and slight variations in gait patterns, although the overall system performance was satisfactory.
In another study [24], a novel automatic gait anomaly detection method called AGD-SSC was introduced, which works online to differentiate between normal and abnormal gaits. Gait features were extracted by calculating F-GEI (cumulative energy normalized across walking cycles), and the BC-COP-K-means clustering algorithm with a boundary clamping mechanism was employed for classification. The method was evaluated on a human gait dataset, comparing the results to IF, LOF, and direct COP-K-means. The outcome indicated usage possibilities in situations where silhouette distortion caused by apparel or shoe type is minimal, including abnormal gait in large grazing animals. Nazmul Islam Shuzan and Chowdhury [25] sought to determine whether gait analysis through ground reaction forces (GRFs) could be useful to medical practice and sports. The GaitRec and Gutenberg databases were analyzed using machine learning (ML) techniques to investigate gait patterns in patients with different disorders as compared to healthy controls. GRF signals were pre-processed, several feature extraction methods were applied, and feature selection was undertaken. The K-Nearest Neighbors (KNN) algorithm delivered the best performance. Several classification schemes were used in the study, which found that three-dimensional GRF data improved overall performance; time-domain and wavelet features were also identified as valuable for distinguishing between different gait patterns. The results represent a potential step towards automating the diagnosis of locomotor dysfunction.
The investigation into human gait activity recognition, as documented in the National Institutes of Health’s PMC database, involved the comparison of various algorithms for the classification of gait patterns. Notably, a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) achieved exceptional accuracy levels, reaching approximately 99%. Surprisingly, simpler models such as the Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) algorithms also demonstrated noteworthy performance, surpassing 94% accuracy. This underscores the pivotal trade-off between model complexity and performance, emphasizing that while complex models may not always be requisite, the selection of an appropriate simpler model remains paramount.
The research on gait pattern recognition for fall prediction delved into the utilization of machine learning techniques for predicting falls through gait analysis [26]. The findings revealed that the Support Vector Machine (SVM) algorithm exhibited the highest accuracy, approximately 95%, in distinguishing various gait patterns, outperforming the Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) models. However, a notable limitation was the relatively small dataset of 750 gait samples; larger and more diverse datasets are needed to enhance the robustness of gait detection and fall prediction. Another study explored the application of machine learning algorithms for the early detection of Parkinson’s disease via gait analysis [27]. Although the results were promising, the authors acknowledged the need for further investigation into factors such as clothing variations and walking surfaces, which could influence the accuracy of gait analysis. This underscores the ongoing need for research to refine and optimize machine learning-based approaches to gait analysis for Parkinson’s disease detection.
Traditional gait analysis methods encounter several limitations, including sensor dependence, privacy concerns, and data bias. Camera systems are susceptible to lighting conditions and require unobstructed visibility, while wearable sensors may be uncomfortable or impractical for certain users. Utilizing assistive devices such as crutches, canes, and walkers provides a practical approach to acquiring training datasets for machine learning algorithms in osteoporosis management [28]. These devices, designed to support and restore normal locomotion in individuals affected by injuries or pathologies, are easily equipped with sensors to capture gait and posture data [29,30]. A smart walker, for instance, functions as a minimalist tool in gait classification system development, simultaneously supporting users while acquiring kinetic data [30]. Moreover, while video-based gait analysis raises privacy issues, smart walkers collect data solely from the user’s interaction with the device, thus alleviating privacy concerns. Additionally, conventional systems may suffer from data bias due to the limited representation of diverse populations in training data. In contrast, gait data collected through smart walkers, which are used by a broad range of individuals, can contribute to the development of more inclusive and accurate detection models.
Smart walkers present numerous advantages, including ease of use, a controlled environment for data collection, and standardized data output. This technology not only monitors osteoporosis progression and evaluates treatment effectiveness but also reduces fall risks and enhances overall mobility in individuals with osteoporosis [28].
Many users already rely on these devices as common assistance tools, especially those who can benefit from gait analysis and require data collection methods demanding little user effort. The walker provides a consistent platform for gathering data, with minimal interference from external elements such as uneven terrain or garment variations that may affect the sensory information captured. Furthermore, because the data are collected directly from the walker, they can be pre-processed and standardized, which simplifies analysis and has the potential to enhance the accuracy of machine learning models.
This study aimed to develop a machine learning-based model for the detection of osteoporosis using the upper limb force profile of the user obtained with the use of smart walkers. Multiple models analyzed gait data collected by a smart walker to identify patterns that are indicative of osteoporosis.
2. Materials and Methods
2.1. Research and Study Setting
The data were collected from the Biomechanics & Rehabilitation Engineering Lab, Ziauddin University, Faculty of Engineering, Science, Technology, & Management, the Rehabilitation Lab at the Faculty of Physical Therapy and Rehabilitation Sciences, Anmol Zindagi, Gill Shelter Home, and Gosha-e-Afiyat old age home.
2.2. Participants
In the recruitment phase, a total of forty participants were enrolled to form the training dataset. Within this cohort, 20 individuals represented the healthy group (11 males and 9 females, mean age: 69.85 ± 10.17 years), with no history of osteoporosis or any other degenerative disease. The remaining 20 participants were individuals with osteoporosis (7 males and 13 females, mean age: 70.85 ± 10.18 years) who had no other degenerative disease. Each volunteer underwent a pre-screening process based on the inclusion criteria, which required the ability to walk with a walker and an age between 50 and 90 years. The Ziauddin University ethical approval policies for research conduct were followed, and written informed consent was acquired from all participants.
2.3. Equipment Setting
A total of four force sensors were affixed to the legs of the prototype smart walker: a force sensor on the rear right side (FsRRs), the rear left side (FsRLs), the front left side (FsFLs), and the front right side (FsFRs). The prototype smart walker, with labeled sensor placements and electronic components, served as the data collection device in this investigation. The walker, composed of an aluminum frame and handlebars with soft rubber pads, was employed to help individuals with osteoporosis maintain gait balance, stability, and support during rehabilitation. We installed the four sensors on the front and rear legs of the commercially available walker to measure the forces exerted by the upper limbs while walking. For detailed equipment and sensor settings, refer to our previously published work [31].
2.4. Data Collection
After an explanation of the data collection protocol, the participants were asked to walk with a standard smart walker for a distance of 10 m (see Figure 1). For this 10 m walk test, we used the protocol defined by de Baptista et al. [32]. In this protocol, each participant first walked a distance of 2 m as a preliminary (acceleration) walk. As soon as the 2 m distance was covered, the investigator started the stopwatch and recorded the time until the participant covered the 10 m distance. The participant then walked another 2 m as a wash-out (deceleration) trial. The handles of the smart walker were fitted with force sensors, recording the force values exerted by the upper limbs over time during the walk.
2.5. Data Processing
Figure 2 summarizes the workflow. The data from the force sensors were exported as .csv files, one per participant, yielding forty files in total. We merged these files into a single dataset so that the data could be classified into healthy vs. osteoporosis groups. Irrelevant information such as date, time, and patient ID was removed from this file. We then checked for missing values and found none. The participants, especially the osteoporosis group, were well trained and acquainted with the walker, so the data contained no significant errors. A few negative values appeared in the dataset due to hand movements during walking; these were removed during data cleaning. The sizes of the datasets for the two groups were not equal.
Addressing class imbalance: Upon initial examination, an imbalance in the class distribution of the dataset was discovered. The osteoporosis group showed a clear and specific pattern over time: they took longer to cover the 10 m distance, resulting in a greater number of data points than the healthy group. This disparity could bias the model toward the majority class during training. To address this issue, the Synthetic Minority Oversampling Technique (SMOTE) [33] was utilized. SMOTE mitigates class imbalance by creating synthetic data instances within the minority class, achieving a balanced dataset and promoting a fairer representation of both classes.
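SMOTE’s core idea, interpolating new minority-class points between a sample and one of its nearest minority-class neighbours, can be sketched as follows. This is a minimal illustration with made-up force values, not the implementation used in the study (in practice a library such as imbalanced-learn provides a full `SMOTE` class):

```python
import numpy as np

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating randomly chosen
    minority samples toward one of their k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        # distances from sample i to every minority sample
        d = np.linalg.norm(minority - minority[i], axis=1)
        neighbours = np.argsort(d)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbours)
        gap = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)

# hypothetical 2-D force features (left, right handlebar) for the minority class
minority = [[40.0, 42.0], [45.0, 47.0], [50.0, 49.0], [55.0, 56.0]]
new_points = smote_sketch(minority, n_new=3)
```

Because each synthetic point lies on the line segment between two existing minority samples, it stays inside the region already occupied by that class.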
Data normalization: Following SMOTE, data normalization was performed due to the inherent variation of the force values over the range 0 N to 100 N, measured at a frequency of 20 Hz (every 0.05 s). Min–Max scaling [34] was employed to normalize the data to the range 0 to 1. This transformation facilitates improved model convergence and learning by scaling all features to a common range. The formula for this normalization technique is x_norm = (x − x_min) / (x_max − x_min), where x_min and x_max are the minimum and maximum values of the feature.
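As a concrete check of Min–Max scaling, each force value x is mapped to (x − x_min)/(x_max − x_min), so the smallest value becomes 0 and the largest becomes 1 (the force values below are illustrative only):

```python
import numpy as np

def min_max_scale(x):
    """Rescale a 1-D array linearly onto the [0, 1] interval."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

forces = np.array([0.0, 25.0, 50.0, 100.0])  # hypothetical forces in newtons
scaled = min_max_scale(forces)
# scaled is [0.0, 0.25, 0.5, 1.0]
```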
Windowing technique for data augmentation: In order to increase the size of the dataset, a windowing approach was employed. Windowing partitions the existing data into smaller subsets, hence augmenting the overall quantity of data points. Windowing in this case involved dividing the force data into segments, resulting in a significant rise in the number of data points that may be effectively used for resampling approaches.
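The windowing step can be illustrated with a fixed-length sliding window over a force time series; the window length and step size below are arbitrary choices for the example, not the study’s settings:

```python
import numpy as np

def sliding_windows(signal, width, step):
    """Split a 1-D signal into overlapping windows of `width` samples,
    advancing `step` samples each time."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - width + 1, step)
    return np.array([signal[s:s + width] for s in starts])

# 10 hypothetical force samples (at 20 Hz this is 0.5 s of data)
force = np.arange(10.0)
windows = sliding_windows(force, width=4, step=2)
# windows has shape (4, 4): windows start at samples 0, 2, 4, and 6
```

With an overlap of width − step samples between consecutive windows, a single walk yields many training examples instead of one.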
k-fold cross-validation: The machine learning model was made more robust by implementing k-fold cross-validation, a well-established resampling technique [35]. This method divides the data into k folds of equal size. The model is then trained on k − 1 folds and tested on the remaining fold, one fold at a time. This study utilized 10-fold cross-validation (k = 10). Cross-validation serves as a protective measure against overfitting and offers valuable insight into the model’s potential to perform well on new, unseen data, which is particularly important for the model’s practical use in real-world scenarios. In each iteration, nine folds (90% of the data) formed the training set, and the remaining fold (10%) was set aside for testing, ensuring a thorough assessment of the model’s performance.
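With k = 10, each iteration trains on nine folds and tests on the held-out tenth; scikit-learn’s `KFold` makes the split sizes explicit (a generic sketch on dummy data, not the study’s exact pipeline):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(100).reshape(100, 1)  # 100 hypothetical samples
kf = KFold(n_splits=10, shuffle=True, random_state=42)

fold_sizes = []
for train_idx, test_idx in kf.split(X):
    # each iteration: 90 training samples, 10 test samples
    fold_sizes.append((len(train_idx), len(test_idx)))
```

Every sample appears in exactly one test fold, so the ten test sets together cover the whole dataset.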
Choosing the appropriate machine learning model: To attain the best possible classification results, a thorough assessment of different machine learning methods was conducted. The classifiers used were the XGBoost (XGB) classifier, Random Forest classifier, SVM classifier, Gradient Boosting (GB) classifier, Stochastic Gradient Descent (SGD) classifier, Multinomial Naive Bayes classifier, Decision Tree classifier, Logistic Regression, KNN, and AdaBoost classifier. After a thorough evaluation, the Random Forest classifier was identified as the highest-performing model, achieving an accuracy of 95.40%. This result confirms the effectiveness of the suggested data preparation methods when used with the selected machine learning model for identifying osteoporosis based on gait.
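A comparison loop of this kind can be sketched with scikit-learn; the synthetic data and the subset of three classifiers below are assumptions for illustration, and the resulting accuracies will differ from those reported in the study:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# synthetic stand-in for the force-profile dataset
X, y = make_classification(n_samples=400, n_features=4, random_state=0)

classifiers = {
    "RandomForest": RandomForestClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
}

# mean 10-fold cross-validated accuracy per classifier
scores = {name: cross_val_score(clf, X, y, cv=10).mean()
          for name, clf in classifiers.items()}
```

Ranking the entries of `scores` then selects the best-performing model, as was done for the ten classifiers in the study.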
Deployment and testing of the model: Following the completion of training, the model was equipped to handle new, unseen datasets in CSV format. These datasets were subjected to the same pre-processing and normalization procedures as the training data. The pre-processed data were then fed into the classifier, allowing force data from unidentified individuals to be examined and categorized as either healthy or osteoporosis.
2.6. Data Analysis
For each of the ten classifiers, we calculated the confusion matrix, precision, recall, and F1 score, as well as the overall classification accuracy, to evaluate performance. Precision measures how many true positives (tp) were identified among everything predicted as positive, i.e., precision = tp / (tp + fp), where fp is the number of false positives. Recall, on the other hand, measures how many positives were recognized among everything that should have been predicted as positive, including false negatives (fn), i.e., recall = tp / (tp + fn). In other words, precision measures the accuracy of positive predictions, while recall measures their completeness. Support is the number of occurrences of a class (healthy or osteoporosis) within the dataset. These metrics are often used together to evaluate the performance of classification models.
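These definitions can be checked with a small worked example; the confusion-matrix counts below are hypothetical, not the study’s values:

```python
# Worked example of the four evaluation metrics from hypothetical
# confusion-matrix counts (tp, fp, fn, tn are illustrative only).
tp, fp, fn, tn = 45, 5, 3, 47

precision = tp / (tp + fp)                    # 45/50  = 0.9
recall = tp / (tp + fn)                       # 45/48  = 0.9375
accuracy = (tp + tn) / (tp + fp + fn + tn)    # 92/100 = 0.92
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
```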
The accuracy of the model was calculated from the confusion matrices. A confusion matrix tabulates predicted labels against actually observed labels, with the diagonal cells counting the correct predictions (tp and true negatives (tn)). The number of correct predictions divided by the total number of predictions gives the overall accuracy, i.e., accuracy = (tp + tn) / (tp + tn + fp + fn). The F1 score is the harmonic mean of precision and recall, calculated as F1 = 2 × (precision × recall) / (precision + recall). To obtain better accuracy, we performed hyperparameter tuning on the best-performing classifier, the Random Forest classifier. A Random Forest classifier is a meta-estimator that trains numerous Decision Tree classifiers on different subsets of the dataset and averages their results to improve predictive accuracy and reduce overfitting. The number of trees in the forest is set by the n_estimators argument; in scikit-learn version 0.22, its default value was changed from 10 to 100. The criterion parameter determines the approach for evaluating the quality of a split, with choices such as “gini”, “entropy”, and “log_loss”. Additional parameters, such as max_depth, min_samples_split, min_samples_leaf, and max_features, provide the ability to finely adjust the growth and splitting criteria of the trees. Whether bootstrapping is employed during tree construction is determined by the bootstrap parameter. Furthermore, the oob_score parameter enables the estimation of generalization performance using out-of-bag samples. Additional options govern parallelization (n_jobs), randomness (random_state), and verbosity (verbose). The Random Forest implementation also incorporates features such as class weighting, complexity pruning, and monotonicity constraints, and the fitted model exposes attributes including the fitted estimators, class labels, feature importances, and out-of-bag estimates.
A random search is a more efficient method for hyperparameter tuning than a grid search, as it explores a wider range of hyperparameter combinations for the same budget. Additionally, the Gini index is preferred over other impurity measures, such as entropy, in certain scenarios due to its simplicity and computational efficiency. For a C-class case, the Gini index is calculated as

Gini = 1 − Σ_{i=1}^{C} p_i²,

where p_i is the probability of a data point belonging to class i. Hence, for two classes, the Gini index can be expressed as

Gini = 1 − (p_1² + p_2²).

Since we have only two classes and a balanced dataset, both p_1 and p_2 are equal to 0.5, giving a Gini index of 1 − (0.25 + 0.25) = 0.5.
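The two-class value can be verified directly with a few lines of arithmetic:

```python
def gini(probabilities):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probabilities)

balanced = gini([0.5, 0.5])  # maximum impurity for two classes: 0.5
pure = gini([1.0, 0.0])      # a pure node has zero impurity
```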
The hyperparameters of a Random Forest model are tuned to enhance the predictive capability of the model or to optimize its computational efficiency. The n_estimators hyperparameter represents the number of trees that the algorithm constructs before determining the final prediction by majority voting or averaging. Typically, a greater number of trees enhances performance and stabilizes predictions, at the cost of computational speed. Another crucial hyperparameter is max_features, the upper limit on the number of features that the Random Forest algorithm considers when deciding how to split a node. The final crucial hyperparameter is min_samples_leaf, the minimum number of samples that must remain at each leaf node for a split of an internal node to be allowed. The n_jobs hyperparameter specifies the maximum number of processors that the engine can utilize; a value of 1 allows only a single processor, while −1 indicates no limit. The random_state hyperparameter ensures that the model’s output can be reproduced: the model will consistently yield the same outcomes given the same random_state, hyperparameters, and training data. Finally, there is the oob_score (out-of-bag sampling), a cross-validation method used in random forests. For each tree, roughly one-third of the data are left out of the bootstrap sample and used for evaluation rather than training; these are the “out-of-bag samples”. It bears a strong resemblance to leave-one-out cross-validation while incurring minimal additional computational effort.
For the hyperparameter tuning of our Random Forest classifier, we used the following settings: n_estimators = [100, 200, 500, 1000], criterion = 'gini', max_depth = [3, 5, 7, 9, 11, None], min_samples_split = [2, 5, 10], min_samples_leaf = [1, 2, 4], max_features = ['sqrt', 'log2', None]. A randomized search was used.
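Under these settings, the randomized search might be set up as follows with scikit-learn. This is a sketch: the synthetic data, the `n_iter` budget, and the reduced 3-fold CV (the study used 10-fold) are assumptions, not the study’s configuration; only the parameter lists mirror the stated settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# synthetic stand-in for the force-profile dataset
X, y = make_classification(n_samples=120, n_features=4, random_state=0)

param_distributions = {
    "n_estimators": [100, 200, 500, 1000],
    "max_depth": [3, 5, 7, 9, 11, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2", None],
}

search = RandomizedSearchCV(
    RandomForestClassifier(criterion="gini", random_state=0),
    param_distributions,
    n_iter=5,        # assumed sampling budget; the paper does not report one
    cv=3,            # reduced from 10-fold here to keep the sketch fast
    random_state=0,
)
search.fit(X, y)
best = search.best_params_  # the sampled combination with the best CV score
```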
3. Results and Discussion
The training dataset consisted of force values from left and right force sensors on the smart walker’s handlebar.
Figure 3 shows scatter plots comparing the forces exerted by osteoporosis patients and healthy participants on the right- and left-side handlebars, respectively.
Figure 4 shows the distribution of these forces categorized into forces applied by healthy participants on the left side (HealthyFLS), forces applied by healthy participants on the right side (HealthyFRS), forces applied by osteoporosis patients on the left side (OsteoporosisFLS), and forces applied by osteoporosis patients on the right side (OsteoporosisFRS). The average force exerted by the osteoporosis patients (95.08 N) on the right handlebar was 69.80% greater than that of healthy participants, and the average force exerted by the osteoporosis patients was 68.77% greater for the left handlebar (97.09 N).
The smart walker prototype was used to measure participants’ weight distribution while walking, with no significant errors or missing values in the data obtained. Participants began walking by stomping the right heel on the floor, and force variations were used to compute spatiotemporal gait characteristics such as time, cadence, step length, step time, and velocity. When a user began a heel strike, the force on the handlebar rose on one side and decreased on the other, making it easy to calculate the number of steps taken by counting inflection points. For easier data display, the forces from the two left-side sensors (FsRLs + FsFLs) were combined to create a single force peak signal for the left heel strike, as shown in Figure 5. The forces from the right-side sensors (FsRRs + FsFRs) were likewise combined into a single force peak signal. Gait metrics were then derived, including the variations in forces recorded from the left and right sides, the occurrence of peaks at particular times, and the distance traveled by the user.
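Step counting from the combined force signals can be sketched as simple local-maximum detection. The toy waveform and threshold below are assumptions for illustration; in the study, the signals were the summed readings from the left-side (FsRLs + FsFLs) or right-side (FsRRs + FsFRs) sensors:

```python
import numpy as np

def count_force_peaks(signal, threshold):
    """Count samples that are strict local maxima above `threshold`,
    a rough proxy for heel strikes in a combined force signal."""
    signal = np.asarray(signal, dtype=float)
    peaks = [i for i in range(1, len(signal) - 1)
             if signal[i] > threshold
             and signal[i] > signal[i - 1]
             and signal[i] > signal[i + 1]]
    return len(peaks), peaks

# synthetic combined left-side force sampled at 20 Hz: three bursts,
# one per simulated heel strike, on a 40 N baseline
t = np.arange(0, 3, 0.05)
force = 40 + 30 * np.maximum(0, np.sin(2 * np.pi * t))
n_steps, peak_idx = count_force_peaks(force, threshold=50)
# n_steps is 3, one count per burst
```

Real signals are noisier, so a production version would add smoothing and a minimum spacing between peaks, but the inflection-point idea is the same.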
The observed profiles show significant differences between osteoporosis patients and healthy participants in the 10 m walk test. Notably, osteoporosis patients demonstrated a prolonged duration to cover the 10 m distance, coupled with a heightened application of force on the handlebar. On average, the osteoporosis group took 78.63% more time to complete the 10 m walk test than their healthy counterparts. Similar results were obtained in our previous study, in which we compared data from people with and without osteoporosis using an instrumented non-wheeled walker [31]. These findings align with our initial hypothesis that individuals with osteoporosis rely more heavily on the walker for gait support. This reliance may stem from inherent weakness, stiff and brittle bones, and a heightened fear of falling. Consequently, osteoporosis patients tend to shift their body weight onto the walker, leading to increased force on the handlebars and, in turn, a longer time to cover the 10 m distance. Conversely, the healthy participants exhibited a contrasting profile indicative of independent gait: they bore their own body weight while walking, exerting minimal force on the handlebars during the walking phase. This dichotomy further underscores the distinct impact of osteoporosis on gait patterns and the consequential reliance on assistive devices.
The training loss represents the discrepancy between the model’s predicted outputs and the actual values during the training phase, while the testing loss measures the same discrepancy on the held-out test data. Both the training and testing losses are visualized in Figure 6; the graph’s arrangement results from the k-fold cross-validation, displaying the loss for each fold.
As we performed 10-fold cross-validation, i.e., the dataset was trained and tested ten times, we obtained ten confusion matrices. Their sum is presented in Figure 7, and the classification report for our model with the Random Forest classifier is given in Table 1. The confusion matrix shows that the model correctly predicted an osteoporosis gait pattern as an osteoporosis case 437 times and a healthy case as healthy 432 times. The model showed a slightly higher accuracy in detecting osteoporosis cases (96.63%) than in identifying healthy cases (94.17%), indicating that it is more likely to correctly identify a person with osteoporosis than a healthy individual.
The overall accuracy of the model using the Random Forest classifier after hyperparameter tuning was found to be 95.40%. The other classifiers performed in the range of 91% to 94% under the same training conditions. A similar study on gait classification methods for normal and abnormal gait also reported that the Random Forest technique outperformed Decision Tree and KNN techniques in classifying gait parameters obtained through wearable technology [36]. The Random Forest-based model benefits from ensemble methods: variance decreases because the results are obtained by averaging the predictions of multiple decision trees. Overall, the empirical findings support the proposition that the suggested model is capable of categorizing the force profiles of the participants despite the small size of the training dataset.
Shi et al. [24] introduced a gait identification system that utilizes an inertial measurement unit (IMU) to recognize five different walking patterns. Their model was constructed using the Random Forest algorithm, and they demonstrated its superiority compared to other methods such as Support Vector Machines (SVMs) and K-Nearest Neighbors (KNNs). Shuzan et al. [25] employed the Random Forest algorithm to identify Parkinson’s disease through the examination of gait patterns. Luo et al. [26] utilized a Random Forest classification technique that demonstrated superior performance compared to SVM and Bayes approaches; they employed the Random Forest algorithm to assess the significance of several gait variables in the classification of hemiplegia. Gupta et al. [13] employed Random Forest, Decision Tree, and KNN algorithms to conduct gait analysis, and their findings demonstrated that the Random Forest algorithm outperformed the alternative classifiers.