1. Introduction
A chronic disease, also known as a non-communicable disease, is a health condition that is not contagious and can persist for a long time. According to a recent World Health Organization (WHO) report [1], chronic diseases claim the lives of 41 million people each year, accounting for 71% of all deaths worldwide. Low- and middle-income countries account for 77% of chronic disease mortality. Cardiovascular disease is responsible for the largest share of chronic disease fatalities, at 17.9 million deaths per year. Hypertension is a crucial factor in the development of cardiovascular disease. Hypertension, often known as high blood pressure, is defined as a systolic blood pressure reading ≥140 mmHg and/or a diastolic blood pressure reading ≥90 mmHg. Systolic blood pressure is the pressure in the blood vessels when the heart beats or contracts, whereas diastolic blood pressure is the pressure in the blood vessels when the heart rests between beats. A recent study [2] reported that every 20 mmHg systolic and 10 mmHg diastolic pressure increase above a baseline blood pressure of 115/75 mmHg doubles the risk of cardiovascular death.
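As a small worked illustration of the two definitions above, the following Python sketch classifies a reading against the 140/90 mmHg thresholds and applies the doubling rule to a few readings. The function names are hypothetical, and the way the doubling rule is discretized (only whole steps count, and a step requires both a 20 mmHg systolic and a 10 mmHg diastolic increase) is an assumption made for illustration, not something stated in the cited study.

```python
SYSTOLIC_THRESHOLD = 140  # mmHg, from the definition above
DIASTOLIC_THRESHOLD = 90  # mmHg, from the definition above
BASELINE = (115, 75)      # mmHg, baseline used by the doubling rule


def is_hypertensive(systolic, diastolic):
    """Return True if either reading meets its threshold (the 'and/or' rule)."""
    return systolic >= SYSTOLIC_THRESHOLD or diastolic >= DIASTOLIC_THRESHOLD


def relative_cv_risk(systolic, diastolic):
    """Relative risk of cardiovascular death versus the 115/75 baseline.

    Assumption (not from the cited study): only whole steps count, and one
    step requires both +20 mmHg systolic and +10 mmHg diastolic.
    """
    systolic_steps = (systolic - BASELINE[0]) // 20
    diastolic_steps = (diastolic - BASELINE[1]) // 10
    steps = max(0, min(systolic_steps, diastolic_steps))
    return 2 ** steps


if __name__ == "__main__":
    for reading in [(115, 75), (135, 85), (155, 95)]:
        print(reading, is_hypertensive(*reading), relative_cv_risk(*reading))
```

Under these assumptions, a reading of 155/95 mmHg is two steps above the baseline and therefore carries four times the baseline risk.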
Hypertension is no longer only an “adult disease”; a growing number of teenagers and younger children are developing it [3,4] as a result of the physically inactive lifestyle of today’s youth. There is growing evidence that childhood hypertension is a precursor to adult hypertension [5]. Unfortunately, hypertension in children is often not diagnosed until it has progressed to the point of being life-threatening or until they reach adulthood [6]. Given the long-term health consequences of uncontrolled hypertension, as well as the fact that pediatric (age 2 to 18) hypertension is a diagnostic signal for numerous important underlying medical illnesses, the need for early and correct diagnosis cannot be overstated. Furthermore, childhood and adolescence are the critical stages for effective treatment and prevention of hypertension-related cardiovascular problems, before any further clinical indications arise. Malaysia, a middle-income country, has a hypertension prevalence of 24.5% in adolescents [7].
Anthropometric measures are non-invasive quantitative measurements of the body that include height, weight, head circumference, body mass index (BMI), body circumferences (waist, hip, and limbs) to determine adiposity, and skinfold thickness [8]. There is a growing body of evidence on the use of anthropometric measures to predict hypertension in children and adolescents, some of which may be found in [4,9,10,11,12]. The commonly used anthropometric measurements for hypertension prediction include BMI, waist circumference (WC), waist-to-hip ratio (WHR), and waist-to-height ratio (WHtR). Nonetheless, data suggest that the predictive abilities of anthropometric measurements for hypertension vary by country and ethnicity [13].
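The derived indices named above follow directly from the raw measurements. The short Python sketch below shows one way to compute them, taking WHR and WHtR as waist-to-hip and waist-to-height ratios (their standard definitions). The function names and the example values are hypothetical and are not taken from the study dataset.

```python
def body_mass_index(weight_kg, height_m):
    """BMI: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def waist_to_hip_ratio(waist_cm, hip_cm):
    """WHR: waist circumference divided by hip circumference (same units)."""
    return waist_cm / hip_cm


def waist_to_height_ratio(waist_cm, height_cm):
    """WHtR: waist circumference divided by height (same units)."""
    return waist_cm / height_cm


if __name__ == "__main__":
    # Hypothetical adolescent: 60 kg, 160 cm tall, 72 cm waist, 90 cm hip.
    print(round(body_mass_index(60, 1.60), 2))
    print(round(waist_to_hip_ratio(72, 90), 2))
    print(round(waist_to_height_ratio(72, 160), 2))
```

Because all three indices are simple ratios of measurements taken with a scale and a tape measure, they can be collected without specialized instruments.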
As machine learning (ML) has gained traction in the medical field, new algorithms for predicting hypertension have emerged. ML technologies might be used as a supplementary tool or a second opinion to assist medical doctors in making timely decisions about hypertension. The use of anthropometric measurements in ML models has yielded results that vary from model to model. The following review focuses on some of the latest research that employed multiple ML models to predict hypertension using anthropometric measures as the input features. Zhao et al. [14] utilized a dataset of 29,700 participants aged 18 to 70 years old to deploy four ML models, namely Random Forest (RF), CatBoost, Multi-layer Perceptron (MLP) neural network, and Logistic Regression (LR), for hypertension risk prediction. Along with anthropometric measures, their work utilized demographic and lifestyle data as inputs to the machine learning algorithms. The ten selected input features were age, gender, BMI, WC, family history, occupation, smoking, drinking, healthy diet, and physical activity. The data were randomly divided into training and validation sets in a ratio of 4:1. During the training stage, the training set was further divided in a ratio of 9:1 into training and verification sets. On the test set, the models’ performance was measured by the Area Under Curve (AUC), accuracy, sensitivity, and specificity. They concluded that RF performed the best, with AUC = 0.92, accuracy = 0.82, sensitivity = 0.83, and specificity = 0.81.
In another study by Boutilier et al. [15], ML models were used to develop risk stratification algorithms for diabetes and hypertension. Five ML models, namely Decision Tree, regularized Logistic Regression, k-Nearest Neighbor, RF, and AdaBoost, were developed and tested in their study. The input of the models included data from the questionnaire: weight, height, waist circumference, blood pressure, heart rate, and blood glucose, and the output for hypertension classification was based on the assessment of the medical doctor. Using AUC to measure the performance of the models, they found that RF (0.792) performed slightly better than Logistic Regression (0.776), followed by AdaBoost (0.770), k-Nearest Neighbor (0.705), and Decision Tree (0.610). The sample size employed in the study was 2278, with an average age of 50.6 years.
In [16], an Artificial Neural Network (ANN) model with three hidden layers was developed as a classification model for hypertension patients using gender, race, BMI, age, smoking, kidney disease, and diabetes. Using an imbalanced dataset of 24,434 patients, 69.71% of whom were non-hypertensive and 30.29% hypertensive, the model was compared with decision forest, Logistic Regression, Support Vector Machine, boosted Decision Tree, and Bayes point machine. The ANN model achieved a sensitivity of 40%, a specificity of 87%, a precision of 57.8%, and an AUC of 0.77. The authors concluded that the accuracy of the ANN was similar to that of the other five ML models, but that its AUC and F1-score were somewhat higher and more competitive.
As evidenced by the review, various ML models have been developed and employed for hypertension prediction. Different performance metrics were used to choose the best model, which makes it difficult for researchers in the field to select the most appropriate candidate for hypertension prediction. Furthermore, the researchers compared three to six ML models without covering the three categories of supervised algorithms, namely neural network, ensemble model, and classical model. In terms of input features, anthropometric measurements were clearly not the only data used in the models produced. Aside from demographic data, lifestyle data, such as smoking, a healthy diet, and physical activity, as well as physiological data, such as blood pressure, heart rate, and blood glucose, were used. As self-reported lifestyle parameters are subjective [17], and specialized instruments are needed to acquire physiological data, our study investigates the use of anthropometric measures along with easily collected demographic data for hypertension prediction using ML models. Furthermore, studies on the associations between anthropometric measures and hypertension in adolescents are relatively limited when compared to adults [4,18]. Therefore, our study intends to fill this research gap. In our previous work [19], we used anthropometric measurements and simple demographic data to develop a Multilayer Perceptron (MLP) neural network with one hidden layer of 50 neurons to predict hypertension in adolescents, yielding a sensitivity of 0.41, specificity of 0.91, precision of 0.65, F1-score of 0.50, accuracy of 0.76, and AUC of 0.75. In this study, we extend our previous work by investigating the efficacy of thirteen different ML models: Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, Naïve Bayes, k-Nearest Neighbor, Multilayer Perceptron, Gradient Boosting, XGBoost, LightGBM, CatBoost, AdaBoost, and LogitBoost. These models span the three supervised ML categories of neural network, ensemble model, and classical model, and are applied to hypertension prediction in adolescents using anthropometric measurements and simple demographic data. To tackle the imbalanced data problem, we implement and evaluate the Synthetic Minority Over-sampling Technique (SMOTE) and the combination of SMOTE with undersampling. To the best of our knowledge, this study is the first to examine the efficacy of ML models for hypertension prediction in adolescents using thirteen distinct algorithms from the three supervised ML categories. We investigate whether simple anthropometric measurements are viable for hypertension prediction using machine learning models, and how the choice of model affects the prediction results. The objectives of the study are two-fold: (a) investigate the feasibility of anthropometric measurements and simple demographic data for hypertension prediction, and (b) implement, evaluate, and analyze the performance of the thirteen different ML models for hypertension prediction in adolescents using easy-to-collect data.
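To make the resampling idea concrete, the following Python sketch shows the core of SMOTE (a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours) followed by random undersampling of the majority class. It is a simplified, standard-library illustration only: the function names, the value of k, and the toy data are assumptions, and it is not the implementation used in this study. In practice a maintained library such as imbalanced-learn would normally be used.

```python
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen sample
        # (squared Euclidean distance, excluding the sample itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(
            others,
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of n_keep majority samples."""
    rng = rng or random.Random(0)
    return rng.sample(majority, n_keep)


if __name__ == "__main__":
    # Toy two-feature data (hypothetical, not from the study dataset).
    hypertensive = [(24.0, 0.50), (26.5, 0.55), (28.0, 0.58), (25.0, 0.52)]
    normotensive = [(18.0 + 0.5 * i, 0.40 + 0.005 * i) for i in range(12)]
    new_minority = hypertensive + smote(hypertensive, n_new=4, k=2)
    new_majority = random_undersample(normotensive, n_keep=8)
    print(len(new_minority), len(new_majority))
```

Combining the two steps narrows the class gap from both sides, so fewer synthetic samples are needed than with oversampling alone.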
4. Discussion
Machine learning techniques are increasingly being used to predict hypertension. However, the models’ comparability and effectiveness in real-world applications have been hampered by the incorporation of diverse features and learning techniques. In this work, we examined the possibility of employing simple anthropometric measures to predict hypertension. Additionally, we examined the effects of various machine learning approaches on this prediction using only basic anthropometric data. The use of simple anthropometric measures promises a straightforward, affordable, and practical means of predicting hypertension, particularly when a blood pressure monitor is not accessible. We defer the use of additional physiological data, such as blood pressure and heart rate, as well as self-reported lifestyle parameters, in order to keep the input features simple and objective.
In terms of machine learning models, none of the 13 ML models scored well across all seven performance metrics: accuracy, precision, sensitivity, specificity, F1-score, misclassification rate, and AUC. LightGBM, Random Forest, CatBoost, and XGBoost were the four leading models on six of the seven metrics, the exception being sensitivity. The rankings of these four models, however, were inconsistent across these six metrics. The Decision Tree, by contrast, had the lowest performance across all performance measures. In this study, we investigated three different types of supervised learning algorithms: neural networks, ensemble models, and classical models. The models in each of these categories are listed in Table 9. The kNN model beat the other classical models in terms of AUC and F1-score but lagged behind the Naïve Bayes model in terms of accuracy, precision, specificity, and misclassification rate. The Logistic Regression model performed the best in terms of sensitivity. While ensemble models are known to produce more accurate results than classical models, some of the ensemble models did not perform as well as the classical models. AdaBoost, for example, fell short of kNN in terms of AUC. The MLP model, meanwhile, performed modestly across the board, although it is of interest for its sensitivity.
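For reference, the seven performance metrics discussed above can be computed as in the following Python sketch. Six of them follow from the confusion-matrix counts, while AUC is computed here from predicted scores using the rank-based (pairwise comparison) formulation. The function names and the example counts are hypothetical and do not reproduce any result from this study; the sketch also assumes that each class and each predicted label occurs at least once, so no division by zero arises.

```python
def classification_metrics(tp, fp, tn, fn):
    """Six threshold-based metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1_score": f1,
        "misclassification_rate": 1 - accuracy,
    }


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical counts: 100 hypertensive and 200 normotensive adolescents.
    for name, value in classification_metrics(tp=40, fp=20, tn=180, fn=60).items():
        print(f"{name}: {value:.3f}")
    print("auc:", auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The example counts illustrate why the metrics can disagree: a model can combine high specificity with low sensitivity on imbalanced data, so a ranking based on accuracy need not match a ranking based on sensitivity or F1-score.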
The results revealed that different models led on different performance metrics, with no single model outperforming the others on all of them. Because of this, selecting the most appropriate model for practical application might be challenging. To select a realistic model, we propose that Bayes’ Theorem be used to verify a model’s applicability before it is chosen. Using Bayes’ Theorem, we can determine how well a model performs in a particular population when the prevalence of a specific condition is taken into consideration [19]. Bayes’ Theorem calculates the posterior probability using the formula shown in (7):

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} \qquad (7)$$

where
P(A|B): probability of event A given event B occurring, i.e., the probability that an adolescent is hypertensive given that the model returns positive
P(A): prevalence of adolescent hypertension in Sarawak population = 0.301
P(B): probability of the model returning positive
P(B|A): probability of event B given event A occurring
Table 10 presents a summary of the findings acquired through the use of Bayes’ Theorem. According to this table, the top three performing ML models are LightGBM (0.5799), Random Forest (0.5542), and XGBoost (0.5397), whereas the bottom three are Decision Tree (0.3788), SVM (0.4471), and LogitBoost (0.4649). For the highest-performing model, LightGBM, an adolescent from the Sarawak adolescent population of 200,130, which has a hypertension prevalence of 30.1%, has a 57.99% chance of being hypertensive if he or she is predicted as hypertensive by the model. If Decision Tree were chosen as the prediction model, an adolescent who is predicted to be hypertensive would have a 37.88% chance of being hypertensive.
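The posterior probability described above can be sketched in Python as follows. The sketch takes P(B|A) to be the model's sensitivity and expands P(B) with the law of total probability, using 1 − specificity as the false-positive rate. That expansion is an assumption, since the text does not state how P(B) was obtained. The function name is hypothetical, and the example uses the sensitivity and specificity reported for the earlier MLP model in [19] rather than any value from Table 10.

```python
def posterior_hypertensive(sensitivity, specificity, prevalence):
    """P(A|B): probability of hypertension given a positive prediction.

    Assumes P(B|A) = sensitivity and
    P(B) = sensitivity * prevalence + (1 - specificity) * (1 - prevalence).
    """
    p_b_given_a = sensitivity
    p_b_given_not_a = 1 - specificity
    p_b = p_b_given_a * prevalence + p_b_given_not_a * (1 - prevalence)
    return p_b_given_a * prevalence / p_b


if __name__ == "__main__":
    # Sensitivity 0.41 and specificity 0.91 are the values reported for the
    # earlier MLP model; 0.301 is the Sarawak adolescent prevalence.
    print(round(posterior_hypertensive(0.41, 0.91, 0.301), 4))
```

Because the prevalence enters both the numerator and the denominator, the same model would yield a different posterior in a population with a different hypertension prevalence.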
Applying Bayes’ Theorem showed that hypertension prediction in adolescents using simple anthropometric data achieves only a moderate level of reliability, even for the best-performing machine learning model. However, this value is only applicable to the Sarawak adolescent population, which has a hypertension prevalence of 30.1%. Moreover, the predictive ability of anthropometric measurements for hypertension varies by nation and ethnicity [13]; hence, concluding that basic anthropometric data are generally unsuitable for ML hypertension prediction would be biased. We would also like to emphasize that, in contrast to the studies in [14,15,16], our study utilized only simple anthropometric measures. A noteworthy finding from this study is that the performance differences among the models employed are discernible. As a result, it is critical to evaluate candidate models carefully when selecting a prediction model.
5. Conclusions
In this study, we used simple anthropometric measurements for hypertension prediction in adolescents of the Sarawak population using 13 machine learning models. We developed machine learning models from three different supervised machine learning categories, namely neural network, ensemble models, and classical models. Feature dependency was evaluated using the correlation coefficient to eliminate redundant features. The original imbalanced dataset was resampled using SMOTE with random undersampling. While developing the ML models, grid search was used to find the optimal hyperparameters. The models were trained with 10-fold cross-validation on the resampled training dataset, and the trained models were tested on the testing dataset. Seven performance metrics were used to evaluate the trained models.
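Two of the steps summarized above, correlation-based elimination of redundant features and 10-fold cross-validation, can be sketched in standard-library Python as follows. The function names, the 0.9 correlation threshold, the fold assignment scheme, and the toy data are all assumptions for illustration; the study does not state these details here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_redundant(features, threshold=0.9):
    """Keep a feature only if it is not strongly correlated with any
    feature already kept. The threshold value is an assumption."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def k_fold_indices(n_samples, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


if __name__ == "__main__":
    # Toy data (hypothetical): waist tracks weight closely, age does not.
    toy = {
        "weight": [40, 50, 60, 70, 80],
        "waist": [60, 66, 71, 77, 82],
        "age": [15, 13, 17, 14, 16],
    }
    print(drop_redundant(toy))
    for train, val in k_fold_indices(25, k=10):
        print(len(train), val)
```

In a full pipeline, each hyperparameter combination from the grid would be scored by averaging a chosen metric over the validation folds, and the held-out testing dataset would be used only once, after the best combination has been selected.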
According to the results of the study, the best-performing model was LightGBM, while the lowest-performing model was Decision Tree. Although the majority of the ensemble models outperformed the classical models, several ensemble models underperformed them. Using Bayes’ Theorem, we determined that basic anthropometric measures have only limited usefulness for predicting hypertension in adolescents in the Sarawak community. In other words, the model could not be utilized as a clinical decision-making tool to diagnose hypertension in adolescents in this population. The model might, however, serve as an early warning system for individuals who may be hypertensive, particularly when a blood pressure monitor is not available. We also showed that there is a considerable difference between the results obtained from the different prediction models used. Our study is valuable as it paves the way for future researchers to develop a simple, inexpensive, and reliable way to predict hypertension based on anthropometric measurements.