1. Introduction
Maternal health remains a critical global health priority, serving as a key indicator of healthcare quality, equity, access, and population health. Despite progress in some regions, maternal mortality, defined as the death of a woman during pregnancy, childbirth, or within 42 days of termination of pregnancy, remains a significant challenge, with approximately 287,000 women dying during pregnancy or childbirth in 2020 [1]. The global maternal mortality rate (MMR) in 2021 was 158.8 deaths per 100,000 live births, higher than the 2020 rate of 157.1 deaths per 100,000 live births [2]. Sustainable development goal (SDG) 3.1 underscores the urgency of this issue, aiming to reduce the global MMR to fewer than 70 deaths per 100,000 live births by 2030 [1,3]. However, trends over the past decade reveal stark disparities: while many nations have made substantial progress through investments in healthcare infrastructure and maternal health education, regions like sub-Saharan Africa and South Asia continue to experience MMRs far above the SDG target, at 536 and 134, respectively [1,4]. This is largely due to inadequate access to skilled birth attendants and emergency obstetric care and, consequently, high rates of complications such as postpartum hemorrhage and eclampsia. According to the World Health Organization (WHO), maternal deaths are overwhelmingly concentrated in low- and middle-income countries (LMICs), with sub-Saharan Africa accounting for nearly 70% of maternal deaths and Southern Asia accounting for around 16% [1,4].
Specifically, the maternal mortality rate in LMICs in 2020 was 430 per 100,000 live births, versus 13 per 100,000 live births in high-income countries [5]. LMICs bear the highest burden of maternal mortality, primarily driven by systemic barriers that limit access to essential healthcare services. Healthcare gaps in these regions include severe shortages of skilled birth attendants, inadequate infrastructure, and limited availability of prenatal monitoring, particularly in rural and underserved areas [6]. Many regions rely solely on basic metrics such as vital signs because they lack the advanced diagnostic tools that could enable early detection and management of pregnancy-related complications. Socio-economic disparities further exacerbate the problem: poverty restricts access to care, and inadequate funding for healthcare systems perpetuates poor maternal health outcomes. Additionally, cultural practices, such as home births without skilled care and delayed healthcare-seeking behavior, contribute significantly to maternal mortality. Finally, geographic barriers, including remote locations and a lack of transportation, make it difficult for women in regions like sub-Saharan Africa and rural South Asia to access emergency obstetric care [7]. Relevant to the population considered in this study, in rural Bangladesh, for instance, maternal health challenges are compounded by a combination of limited healthcare facilities, cultural norms prioritizing home births attended by traditional birth attendants (TBAs), and significant transportation hurdles [8,9]. These systemic barriers highlight the need for targeted interventions that strengthen healthcare systems in LMICs, expand access to skilled care, develop convenient and accessible technology-based solutions such as remote health monitoring, AI-driven tools, and other digital health innovations, and address the underlying socio-economic and cultural challenges that hinder equitable healthcare delivery [10,11].
Among the various technology-based solutions, such as remote health monitoring and AI-driven tools, the potential of minimal data, such as vital signs, for predicting maternal health risks is increasingly recognized as a cost-effective and scalable approach for improving outcomes in resource-constrained and rural settings [12]. Simple metrics like blood pressure and heart rate, which are universally collected during prenatal visits, require minimal medical training to measure and can provide meaningful insights into maternal health risks [13,14]. For example, elevated blood pressure is a well-established marker for preeclampsia, a leading cause of maternal mortality, while abnormal heart rates can signal underlying cardiovascular issues [15]. Prior studies have demonstrated the efficacy of vital signs as predictors of health risks in other domains, such as cardiovascular health, where they have been used to predict hypertension and heart disease with high reliability [16,17]. Additionally, in acute care and hospital settings, vital signs can serve as early indicators of imminent patient deterioration [18,19]. Drawing from these parallels, leveraging vital signs as potential predictors in maternal health contexts offers an opportunity to develop predictive models that are both effective and accessible. Such models could have a significant impact in LMICs, where advanced diagnostic tools are often unavailable and healthcare providers must rely on easily measurable and interpretable data to guide interventions. By focusing on the scalability and simplicity of vital sign-based risk prediction tools, health systems can empower providers to detect and address maternal health complications early and remotely, ultimately reducing maternal mortality in resource-constrained settings [20,21].
The last several years have seen a significant increase in translational research in which researchers and practitioners have developed and deployed machine learning (ML) algorithms to predict outcomes and manage diseases such as diabetes, cardiovascular conditions, and sepsis, where these algorithms have demonstrated superior accuracy in identifying high-risk patients compared to traditional methods [22,23,24]. Unlike traditional statistical models, machine learning algorithms can handle complex, nonlinear relationships and high-dimensional data, allowing them to uncover patterns and interactions that might otherwise go unnoticed [25]. These advantages make ML particularly promising for risk assessment and decision-making in clinical settings. While ML is increasingly utilized in healthcare, its application to maternal health remains limited, particularly in LMICs [26,27]. However, there is significant potential for ML models to address maternal health challenges by enabling early detection of risks such as preeclampsia and postpartum hemorrhage, even in resource-constrained environments. Integrating ML into low-resource settings is becoming feasible through mobile or cloud-based applications, which can bring advanced analytics to frontline healthcare providers and improve patient outcomes and health equity [28,29,30].
As discussed above, despite the significant maternal mortality burden in LMICs, there is a notable lack of research focused on leveraging minimal data, such as vital signs, for maternal health risk prediction in these areas. Moreover, current research studies often overlook how machine learning algorithms could help address these gaps, especially in resource-constrained settings with limited access to advanced diagnostics. One primary reason could be the scarcity of m-health solutions deployed in these areas, which limits the data available to researchers for developing models. Additionally, most machine learning models require large datasets and may be unable to train effectively on sparse data and features. Furthermore, even in developed countries, the viability and comparative performance of different ML algorithms for predicting maternal health risks remain underexplored.
To address these research gaps identified through the literature review, we formulate two objectives for this study. First, we evaluate the predictive capability of sparse data, including age, blood pressure, temperature, heart rate, and blood glucose, for identifying maternal health risks using publicly available vital signs data from rural Bangladesh. Second, we compare the performance of various machine learning models and sampling methods using statistical tests to determine the best-performing model for predicting maternal health risk. We hypothesize that machine learning algorithms utilizing sparse data will effectively predict maternal health risk and, furthermore, that an ensemble model using stratified sampling will outperform the other models and sampling approaches. The overarching aim of this study is to contribute to the growing literature on data-driven healthcare tools that can improve maternal outcomes and health equity and reduce global health disparities. The findings could pave the way for scalable applications in LMICs, offering healthcare providers robust, predictive tools.
The paper is structured as follows: we first describe the data used in the study, the preprocessing steps and features, the machine learning model development and hyperparameters, and the performance evaluation metrics and statistical analysis used for model comparison. Next, we present findings detailing the performance of individual models under different sampling techniques, the effectiveness of the ensemble approach, and insights from the comparative analysis. Finally, we provide recommendations and future research directions, focusing on deploying machine learning models and on the practical challenges of scaling the models developed in this research for predicting maternal health risk in resource-constrained settings.
2. Materials and Methods
2.1. Data
In this study, we used the publicly available maternal health risk dataset, which includes 1014 records of maternal health data collected from pregnant women in rural areas of Bangladesh through an IoT-based risk monitoring system [31]. It includes features such as age, systolic and diastolic blood pressure, heart rate, and blood glucose levels, along with a target variable that classifies maternal health risk into three categories: low, medium, and high.
Table 1 presents descriptive characteristics of predictor variables.
The data did not include missing values, and no imputation methods were required. The dataset is multi-class (low risk, medium risk, high risk) but exhibits class imbalance: low-risk cases dominate younger age groups, while high-risk cases are more prevalent among older maternal ages and elevated blood sugar or blood pressure levels. Each record corresponds to an individual patient at a single point in time; the dataset does not include time series or longitudinal tracking. The publicly available data is anonymized to ensure privacy, aligning with ethical standards for using protected health information (PHI), and is available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license for reuse. Throughout the study, all features were retained, and we tested the models with and without feature scaling but did not observe any significant improvements. Moreover, Random Forest, XGBoost, and CatBoost are tree-based algorithms that are unaffected by the scale of the input features, as they operate on feature order and ranking. Finally, for all models evaluated in this study, we used an 80:20 train-test split, applying stratified sampling in addition to random sampling to ensure proportional representation of all risk levels (low/medium/high) in both subsets, mitigating class imbalance effects during evaluation while preserving the dataset’s inherent distribution. Given the dataset’s limited scope of six features, we did not eliminate any features, retaining all potentially meaningful variables. However, we evaluated the relevance of each feature through two complementary techniques: an F-value comparison to assess linear relationships and a Mutual Information Score comparison to capture potential non-linear dependencies.
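The stratified 80:20 split described above can be sketched in a few lines. This is an illustrative pure-Python version (in practice, a library routine such as scikit-learn's `train_test_split` with its `stratify` argument does the same job), and the labels below are hypothetical stand-ins for the three risk levels:

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=42):
    """Split record indices 80:20 while preserving each risk level's proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)                      # randomize within each class
        n_test = round(len(idxs) * test_frac)  # 20% of this class goes to test
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical imbalanced labels mimicking the low/mid/high risk classes
labels = ["low"] * 50 + ["mid"] * 30 + ["high"] * 20
train_idx, test_idx = stratified_split(labels)
```

Each class contributes exactly 20% of its records to the test set, so the 50/30/20 imbalance above is reproduced in both subsets rather than left to chance, which is the property random sampling cannot guarantee.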
Figure 1 below shows that Blood Sugar (BS) consistently ranked highest across both methods, indicating a strong relationship with the target variable. Systolic BP and Age also showed notable relevance, particularly in the Mutual Information analysis, which suggests non-linear contributions. Future research should expand the feature set and incorporate more rigorous importance assessments to evaluate each variable’s contribution and clinical implications in a broader context.
A prior study using this dataset utilized a logistic model tree (LMT) for maternal health risk prediction [31]. While their predictions were highly accurate, our research builds on this work by exploring alternative approaches, such as ensembling and different sampling methods, and by using multiple performance metrics for model evaluation.
2.2. Machine Learning Models
This study uses supervised machine learning classification algorithms to predict a categorical outcome. The methodologies investigated include multinomial logistic regression, support vector machines (SVM), decision trees, random forests, and gradient boosting methods: gradient boosting machines (GBM), XGBoost, and CatBoost.
Multinomial logistic regression (MLR) and decision trees (DT) represent two distinct approaches to classification. Multinomial logistic regression extends binary logistic regression to multi-class problems by modeling the probability of each categorical outcome using a linear combination of input features. It calculates the log odds of each class relative to a reference category and transforms these into probabilities using the softmax function, enabling simultaneous prediction across multiple classes. This approach allows coefficients to be interpreted as the effect of predictors on the log odds of an outcome relative to the reference category, making it valuable for understanding relationships between features and outcomes. Decision trees predict outcomes by recursively partitioning the data space based on feature splits. These splits are chosen to maximize class separation using impurity measures such as Gini impurity or entropy. Gini impurity measures the likelihood of misclassification within a node, while entropy quantifies uncertainty or disorder in class distributions. Both metrics aim to maximize information gain at each split, ensuring nodes become increasingly “pure” as the tree grows. However, decision trees are prone to overfitting due to their hierarchical structure and lack of regularization mechanisms. A random forest algorithm can be used to address these decision tree issues. Furthermore, gradient boosting, XGBoost, and CatBoost are boosting methods that build an ensemble model by combining several weak decision trees sequentially and correcting errors made by previous models.
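The impurity measures described above are simple functions of the class proportions at a node; a minimal sketch:

```python
from math import log2

def gini(counts):
    """Gini impurity: chance of misclassifying a randomly drawn sample at this node."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Shannon entropy (bits): uncertainty of the node's class distribution."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

pure_node = [10, 0, 0]   # all samples belong to one risk class
mixed_node = [5, 5, 5]   # uniform across low/mid/high risk
```

A pure node scores 0 under both measures, while the uniform three-class node reaches the maxima (2/3 for Gini, log2(3) ≈ 1.585 bits for entropy); each split is chosen to drive these values down in the child nodes, which is what "maximizing information gain" means in practice.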
Random forests aggregate predictions from multiple decision trees trained on bootstrapped samples of data, reducing overfitting through ensemble averaging and feature subsampling. The inherent regularization provided by random forests avoids explicit penalty terms, making them robust for classification tasks. Gradient boosting methods (GBM, XGBoost, and CatBoost) iteratively build weak decision trees that correct errors made by previous models. Both CatBoost and XGBoost incorporate explicit regularization during training—CatBoost uses l2_leaf_reg for L2 regularization on leaf values, while XGBoost employs lambda for L2 penalties on weights—to control model complexity and prevent overfitting.
Hyperparameter tuning was performed to enhance the performance of the machine learning models and mitigate challenges such as overfitting, underfitting, and variability in predictions. While default hyperparameter settings offer a baseline, they often fail to fully capture the complexity of real-world datasets or leverage the full potential of the algorithms. For example, gradient boosting methods like CatBoost and XGBoost rely on parameters such as learning rate and tree depth to balance model complexity and stability during training. Similarly, ensemble methods like random forests benefit from tuning parameters such as the number of trees (n_estimators) and feature subset size (max_features) to improve generalization and reduce overfitting. To identify optimal configurations, we systematically explored the hyperparameter space using RandomizedSearchCV, aiming to maximize accuracy while minimizing variance across cross-validation folds. This process was conducted using 10-fold cross-validation to ensure that the tuning results were both reliable and generalizable.
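The search strategy can be illustrated with a stripped-down random search. This sketch uses a toy scoring function in place of the mean 10-fold cross-validation accuracy that `RandomizedSearchCV` actually optimizes, and the search space is illustrative rather than the exact grid used in the study:

```python
import random

def random_search(param_space, score_fn, n_iter=20, seed=0):
    """Sample n_iter random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Illustrative search space loosely mirroring the tuned random forest parameters
space = {
    "n_estimators": list(range(50, 300)),
    "max_features": ["sqrt", "log2", None],
}

# Stand-in scorer: in the study this would be the mean 10-fold CV accuracy
def toy_score(params):
    return -abs(params["n_estimators"] - 179) / 300.0

best, score = random_search(space, toy_score, n_iter=200)
```

Unlike a grid search, the cost is fixed by `n_iter` rather than by the size of the space, which is why randomized search scales to the continuous learning-rate and depth ranges tuned for CatBoost and XGBoost.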
Figure 2 below shows that hyperparameter tuning improved accuracy and reduced variability across models: the boxplots display higher median accuracy and narrower interquartile ranges in the cross-validation results for CatBoost, XGBoost, and random forests.
For the random forest, hyperparameter tuning revealed optimal performance with n_estimators = 179 (number of trees) and max_features = ‘log2’ (feature subset size), balancing model complexity and generalization. CatBoost achieved optimal performance with learning_rate = 0.0367 and max_depth = 14, capturing complex patterns while maintaining stability during training. XGBoost performed best with a max_depth of 10, balancing predictive power and overfitting prevention.
Our ensemble approach stemmed from the critical need for robust, generalizable predictions in healthcare applications, where accuracy directly impacts outcomes. While base models like random forest, CatBoost, and XGBoost each demonstrated strengths—such as reducing overfitting, handling categorical data, and optimizing gradient boosting with regularization—relying on a single model risked overlooking nuanced patterns or inheriting algorithm-specific biases.
A weighted average ensemble was designed to address these limitations, dynamically integrating predictions from all three models. The weight assigned to each model was determined by its classification accuracy, ensuring that more reliable models contributed more to the final prediction. Specifically, random forest’s weight reflected its stability across training folds, CatBoost’s weight was adjusted for its categorical feature handling, and XGBoost’s weight accounted for its efficiency in gradient boosting and regularization. This weighted integration allowed the ensemble to combine model-specific strengths while mitigating errors from less reliable predictions, achieving the reliability required for high-stakes medical predictions.
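The accuracy-weighted averaging step can be sketched directly. The per-model class probabilities and accuracy values below are illustrative, not values from the study:

```python
def weighted_ensemble(probas, accuracies):
    """Combine per-model class probabilities, weighting each model by its accuracy.

    probas: dict of model name -> class-probability list (same class order)
    accuracies: dict of model name -> validation accuracy used as the weight
    """
    total = sum(accuracies.values())
    weights = {m: acc / total for m, acc in accuracies.items()}  # normalize to 1
    n_classes = len(next(iter(probas.values())))
    combined = [
        sum(weights[m] * probas[m][k] for m in probas) for k in range(n_classes)
    ]
    return combined.index(max(combined)), combined

# Illustrative per-model probabilities for (low, mid, high) on one patient
probas = {
    "random_forest": [0.20, 0.30, 0.50],
    "xgboost":       [0.10, 0.25, 0.65],
    "catboost":      [0.15, 0.25, 0.60],
}
accuracies = {"random_forest": 0.857, "xgboost": 0.867, "catboost": 0.847}

pred_class, combined = weighted_ensemble(probas, accuracies)  # → high risk (index 2)
```

Because the weights are normalized and each model's probabilities sum to one, the combined vector remains a valid probability distribution; a model with higher held-out accuracy simply pulls the final distribution further toward its own prediction.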
Figure 3 below illustrates the architecture of the ensemble model, highlighting the integration of multiple base models and the weighted averaging mechanism employed to enhance predictive performance.
2.3. Performance Metrics
To evaluate the performance of machine learning models in predicting maternal health risks, we utilize four key metrics: accuracy, precision, recall, and F1 score. Accuracy measures the proportion of correctly classified cases among all cases, providing an overall assessment of model performance. Precision evaluates the proportion of correctly identified positive cases among all predicted positives, reflecting the model’s ability to minimize false positives. Recall, also known as sensitivity, assesses the proportion of actual positive cases correctly identified by the model, emphasizing the detection of true positives—a crucial metric in maternal health, where identifying high-risk cases is critical. The F1 score, the harmonic mean of precision and recall, balances these two metrics in a single measure that accounts for both false positives and false negatives. Together, these metrics offer a comprehensive evaluation framework, enabling the comparison of models based on their ability to accurately, reliably, and effectively identify maternal health risks in resource-limited settings.
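These definitions translate directly into code; a minimal sketch with toy predictions over the three risk levels:

```python
def per_class_metrics(y_true, y_pred, classes):
    """Compute precision, recall, and F1 for each class, plus overall accuracy."""
    metrics = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return metrics, accuracy

# Toy labels: one low-risk and one high-risk case are misclassified
y_true = ["low", "low", "mid", "mid", "high", "high"]
y_pred = ["low", "mid", "mid", "mid", "high", "low"]
m, acc = per_class_metrics(y_true, y_pred, ["low", "mid", "high"])
```

Reporting the per-class values, rather than accuracy alone, is what exposes failure modes such as the low mid-risk recall discussed in the Results section.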
After computing performance metrics, we employed stratified 10-fold cross-validation to compare the models. This method preserves class distributions in each fold, which is critical given the class imbalance in the dataset. To assess whether differences in model performance were statistically significant, we performed an ANOVA test; where findings were significant, we performed post hoc tests to identify specific differences between the machine learning models.
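Concretely, the ANOVA compares the per-fold accuracy scores of the different models. The F statistic can be computed directly; the fold scores below are illustrative, not the study's actual results:

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA across groups of per-fold accuracy scores."""
    k = len(groups)                              # number of models compared
    n = sum(len(g) for g in groups)              # total number of fold scores
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group mean square: how far each model's mean sits from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)
    # Within-group mean square: fold-to-fold variability inside each model
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Illustrative 10-fold accuracy scores for three models (not the study's results)
rf  = [0.80, 0.82, 0.79, 0.81, 0.80, 0.78, 0.82, 0.80, 0.79, 0.81]
xgb = [0.86, 0.87, 0.85, 0.88, 0.86, 0.87, 0.85, 0.86, 0.88, 0.87]
ens = [0.87, 0.88, 0.87, 0.86, 0.88, 0.87, 0.86, 0.88, 0.87, 0.88]

f_stat = one_way_anova_f([rf, xgb, ens])  # a large F suggests the means differ
```

In practice, `scipy.stats.f_oneway` returns the same statistic along with its p-value, and a post hoc procedure such as Tukey's HSD then localizes which pairs of models actually differ.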
3. Results
We start by presenting the performance metrics for each individual machine learning model. The analysis initially employs random sampling to generate training and test subsets, followed by stratified sampling to ensure consistent class distribution across these subsets. For random sampling, data points are randomly selected to form training and test subsets, giving each data point an equal probability of inclusion. In stratified sampling, subsets are created by maintaining equal population proportions based on maternal risk levels, ensuring consistent class distribution across the splits. Finally, we present the findings from the ensemble model, which was developed by combining the best-performing individual models identified during the analysis. This approach leverages the complementary strengths of these models to enhance accuracy and robustness.
Table 2 below presents the performance of different machine learning models while using random sampling for predicting maternal health risks across high-, low-, and mid-risk categories. Logistic regression–Gaussian naive Bayes achieved an accuracy of 57.6%, with reasonable precision for high-risk cases (0.79) but notably poor recall for mid-risk cases (0.18), indicating its limitations in detecting this category. Support vector machines (SVM) improved overall accuracy to 59.6% and showed a slightly better recall for high-risk cases (0.60), though performance for mid-risk cases remained low (precision: 0.61, recall: 0.25). Gradient boosting significantly increased accuracy to 73.9%, improving precision and recall across all risk levels, particularly high-risk cases (F1 score: 0.81). Decision trees and random forests further improved accuracy to 80.3% and 81.3%, respectively, delivering robust performances across all risk levels, with F1 scores for high-risk cases reaching 0.86 and mid-risk cases reaching 0.78 and 0.79, respectively. XGBoost showed a notable improvement with an accuracy of 84.2%, achieving high precision (0.87) and recall (0.82) for low-risk cases and an F1 score of 0.83 for mid-risk cases. Finally, CatBoost emerged as the best-performing model with the highest accuracy of 84.7%, excelling in high-risk predictions with an F1 score of 0.91 and delivering robust performance for low-risk and mid-risk cases with F1 scores of 0.85 and 0.80, respectively. While using random sampling, CatBoost and XGBoost performed the best, with CatBoost demonstrating the most robust and reliable performance across all risk categories.
Next, we present Table 3, which outlines the performance of various machine learning models using stratified sampling for predicting maternal health risks across high-, low-, and mid-risk categories. A comparison between the results from stratified sampling and random sampling reveals a consistent improvement in the predictive performance of all machine learning models.
Logistic regression–Gaussian naive Bayes shows a significant increase in accuracy from 57.6% to 65.5%, with substantial gains in precision, recall, and F1 scores across all risk categories, particularly mid-risk cases (F1 score improvement from 0.28 to 0.82). Support vector machines (SVM) exhibit a modest accuracy increase from 59.6% to 61.5%. However, the model’s efficacy in predicting mid-risk cases does not improve, as reflected by the low F1 score (0.36 under random sampling versus 0.35 under stratified sampling). The performance of the gradient boosting model also increased significantly, from 73.9% to 81.3%, with improved precision and recall for high-risk cases (F1 score improving from 0.81 to 0.93) and low-risk cases (F1 score improving from 0.73 to 0.82). Decision trees also show a substantial performance improvement, with accuracy increasing from 80.3% to 85.7% and robust precision, recall, and F1 scores across all risk levels, particularly mid-risk cases, where the F1 score rose from 0.78 to 0.83. Similarly, the performance of the random forest model improved from 81.3% to 85.7%, with a gain in low-risk precision (0.86 to 0.89), although mid-risk recall declined slightly (0.84 to 0.79).
As observed with random sampling, XGBoost and CatBoost remain the best-performing models. However, unlike the results from random sampling, XGBoost achieves the highest accuracy among all models, increasing from 84.2% to 86.7%, while CatBoost maintained its accuracy at 84.7%. XGBoost showed a significant increase in the F1 score for high-risk cases (from 0.86 to 0.93) and consistently robust performance across all risk categories. Overall, except for CatBoost, all the other models demonstrated significant performance improvements using stratified sampling. These improvements highlight the efficacy of refined sampling methods and parameter tuning in enhancing model reliability for maternal health risk prediction.
Based on the performance of machine learning models under both random and stratified sampling, we developed an ensemble machine learning model combining the three best-performing models: random forest, XGBoost, and CatBoost. We selected these three models because, beyond their high overall accuracy in classifying the maternal risk, we observed that random forest exhibited strong recall and performance in predicting low-risk categories. At the same time, XGBoost achieved the highest accuracy (86.7%) and F1 scores, performing best in high-risk cases (F1: 0.93). Meanwhile, CatBoost maintained a consistent performance across both sampling methods and achieved the highest precision for high-risk categories (0.94). Overall, our ensemble modeling approach aimed to leverage the individual strengths of each machine learning model to develop a better-performing model with higher classifying capability.
Table 4 below presents the performance of the ensemble model.
The ensemble model outperformed the individual machine learning models, achieving the highest overall accuracy of 87.2%. It also improved recall and F1 scores for the low-risk category (recall: 0.91, F1: 0.89) compared to all three individual models. Additionally, the ensemble achieved robust prediction performance across all risk levels, underscoring its ability to combine the capabilities of individual models for better overall performance.
Figure 4 below presents the accuracy of different models using stratified 10-fold cross-validation, demonstrating that the ensemble model achieved the highest prediction accuracy, outperforming both the CatBoost and XGBoost models. Notably, the ensemble model exhibited a tighter spread of scores, as evidenced by its compact interquartile range, indicating more consistent performance across data partitions than the other high-performing models. This stability further suggests that the ensemble approach effectively leverages the strengths of multiple algorithms while mitigating their weaknesses. The random forest model significantly underperformed in comparison to the other models.
ANOVA tests confirmed statistically significant differences between the model performances (p < 0.001). Subsequent post hoc tests revealed that while the difference between CatBoost and XGBoost was not statistically significant at the 0.05 level, both significantly outperformed the random forest model (p < 0.001 for all pairwise comparisons with random forest). Furthermore, the ensemble model significantly outperformed all three constituent models (p < 0.05 for all pairwise comparisons).
4. Discussion
This study investigated the potential of machine learning models to predict maternal health risks using sparse data and features, such as vital signs, in resource-constrained settings such as LMICs. Using minimal data collected from pregnant women in rural areas of Bangladesh through an IoT-based risk monitoring system, we demonstrated the potential of machine learning algorithms trained on sparse data to predict maternal health risks. Among the individual models, XGBoost and CatBoost consistently achieved the highest performance, with XGBoost showing the highest accuracy (86.7%) and F1 score (0.93) for high-risk cases under stratified sampling. Across sampling techniques, stratified sampling consistently outperformed random sampling. Furthermore, the ensemble model developed from random forest, XGBoost, and CatBoost further improved overall performance, achieving an accuracy of 87.2% and significant improvements in precision, recall, and F1 scores across all risk categories, supporting our hypothesis. These observations highlight the value of combining machine learning models to leverage their specific strengths, and of choosing sampling methods that improve accuracy and robustness.
Our findings align with prior studies that have developed and deployed machine learning algorithms for risk prediction in healthcare, such as for cardiovascular disease and diabetes. By using sparse data, however, our work addresses a significant gap: applying these techniques to maternal health risk prediction in LMICs, which is often overlooked. This is a critical gap because most prior studies rely extensively on large datasets, which often include subjective and objective data, including medical history, to achieve high prediction accuracy. Maternal health data often lacks comprehensive longitudinal records and does not capture highly variable but relevant socio-economic and cultural factors. Findings from this research contribute to the growing body of maternal health research by demonstrating that minimal data, such as vital signs, integrated with machine learning algorithms, can classify maternal health risks with high accuracy. Furthermore, observations from this study highlight that vital sign disturbances can indicate imminent decline from a range of conditions in pregnant mothers, including cardiomyopathy, thrombotic embolism, hypovolemic shock secondary to hemorrhage, underlying cardiac and coronary conditions, infections, and hypertensive disorders of pregnancy [32,33,34,35]. These conditions remain the leading causes of preventable maternal deaths and hospitalizations globally [36].
Developing an ensemble model that combines the best-performing individual models that leverages a stratified sampling approach and uses sparse data is novel in maternal health research. As seen from the results, this was the best-performing model because using stratified sampling ensured consistent class representation, reduced biases, and improved model performance compared to random sampling. Along with the sampling method, the ensemble model that combined random forest, XGBoost, and CatBoost addressed the individual limitations of the model, leading to an improvement in the overall predictive performance across all risk levels. Specifically, we observed that this model was more accurate in classifying high-risk cases while maintaining robust performance for low- and mid-risk categories. This ability of the ensemble model to accurately clarify high-risk cases allows for quick medical intervention, which is crucial in resource-constrained settings, where access to healthcare and transportation are often limited. Moreover, its ability to accurately classify maternal risk using sparse data makes the developed model deployable, cost-effective, and practical for implementation in LMICs. Beyond implementation in LMICs, the proposed system’s seamless integration with existing community health networks—leveraging patient-specific electronic health records (EHR) makes it a valuable tool for improving healthcare access in rural and underserved populations across the United States [
37].
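To make the ensemble approach concrete, the following is a minimal sketch using synthetic data in place of the study's dataset. For self-containment, scikit-learn's GradientBoostingClassifier and ExtraTreesClassifier stand in for XGBoost and CatBoost, both of which expose the same scikit-learn fit/predict API and could be dropped in directly:

```python
# Sketch of the stratified-sampling + soft-voting ensemble described above.
# Synthetic data stands in for the study's ~1000-patient vital-sign dataset;
# GradientBoosting/ExtraTrees stand in for XGBoost/CatBoost.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier,
                              ExtraTreesClassifier,
                              VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Three classes standing in for low-, mid-, and high-risk labels
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# stratify=y preserves the class proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("gb", GradientBoostingClassifier(random_state=0)),
                ("et", ExtraTreesClassifier(random_state=0))],
    voting="soft")  # average predicted class probabilities across models
ensemble.fit(X_tr, y_tr)
acc = accuracy_score(y_te, ensemble.predict(X_te))
```

The `stratify=y` argument implements the stratified sampling described above: it preserves the low/mid/high-risk class proportions in both the training and test splits, which is what reduced bias relative to random sampling.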
Although our study contributes significantly to maternal health research by developing machine learning models that use sparse data and features to predict maternal health risks, and provides insights that can support the adoption of IoT devices and m-health applications in LMICs and resource-constrained settings, it has limitations. First, the dataset used in this study may not fully represent the diversity of maternal health cases observed across various LMICs, potentially limiting generalizability. Specifically, this study uses a relatively small dataset (~1000 data points) from one South Asian country and does not cover sub-Saharan Africa, which contains the most LMICs. Furthermore, from a modeling standpoint, while the ensemble model improved classification accuracy compared to the individual models, it performed less well on mid-risk cases, where it reported lower recall. We hypothesize two potential reasons for this: the limited dataset of roughly 1000 patients and, more critically, the sparse features available in the data. The latter is likely the more significant factor: the features used in this study may not delineate mid-risk cases clearly enough from low- and high-risk cases, causing an overlap that leads to incorrect predictions. From a clinical standpoint, this could hinder large-scale implementation. Finally, although relying on limited data to classify maternal health risks eases scalability in resource-constrained settings, this approach could overlook other critical data points, such as socio-economic, behavioral, and medical history data, that could potentially improve classification accuracy.
Future research should prioritize testing and validating the ensemble model developed in this research on larger datasets with more features from diverse LMICs, particularly from sub-Saharan Africa, where maternal health disparities are most pronounced and often overlooked. Furthermore, investigating additional features not included in these data could shed more light on predictors of maternal health risk, especially for mid-risk cases. Features worth exploring include social determinants of health, prior medical history, and comorbidities. Beyond larger datasets, it is critical to explore diverse data from various LMICs and from resource-constrained areas within developed countries, where features such as accessibility to medical care and community of residence can potentially act as strong predictors. Furthermore, piloting the system in different healthcare settings, including larger hospital systems and private practices, is crucial for assessing the algorithm's potential to reduce maternal morbidity and mortality. Research should also evaluate its integration with existing community health frameworks and explore its scalability to other health conditions.
While future researchers should consider these factors, implementation may face challenges, including limited internet connectivity, digital literacy barriers, adoption by healthcare providers and communities, and data security concerns. Understanding local capabilities, such as digital literacy levels and the willingness of traditional birth attendants (TBAs) and patients to adopt intelligent technology, is essential for successful integration into prenatal care. Finally, researchers should develop deployment frameworks using secure mobile or cloud-based platforms to improve accessibility and scalability in LMICs.
5. Conclusions and Future Directions for Deploying IoT
Integrating IoT devices and m-Health applications for maternal health monitoring and prenatal care offers the possibility of early risk detection and medical intervention, particularly in resource-constrained communities, including LMICs, that may have limited access to medical care. IoT-enabled wearable and portable biosensors, such as smart blood pressure monitors, pulse oximeters, and non-invasive glucose monitors, can continuously collect real-time physiological data from pregnant women. The data collected via these devices' built-in sensors are transmitted using communication protocols such as Wi-Fi, Bluetooth, cellular (4G/5G), or LoRaWAN. Given the sensitive nature of the data and the need to comply with HIPAA, transmissions are secured with encryption protocols such as SSL/TLS on their way to the respective cloud storage. The patient's medical care provider can then access the data from the cloud through the secure EHR as necessary for proactive monitoring.
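As an illustration of this device-to-cloud pipeline, the sketch below packages a single sensor reading for upload. The field names and device identifier are assumptions for illustration only; a real deployment would send the payload over an authenticated TLS (HTTPS) connection:

```python
# Hedged sketch of packaging one biosensor reading for cloud upload.
# Field names ("device_id", "vitals", etc.) are illustrative, not from the study.
import json
import time

def build_payload(device_id: str, vitals: dict) -> bytes:
    """Package one sensor reading as UTF-8 JSON, ready for TLS upload."""
    message = {
        "device_id": device_id,
        "timestamp": int(time.time()),  # epoch seconds at collection
        "vitals": vitals,               # e.g., blood pressure, SpO2, heart rate
        "schema_version": 1,            # lets the cloud side evolve the format
    }
    return json.dumps(message).encode("utf-8")

payload = build_payload("bp-monitor-07", {"systolic_bp": 118, "diastolic_bp": 76})
```

In practice the payload would be encrypted before transmission (see the security recommendations later in this section) rather than sent as plain JSON.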
Specifically, combining machine learning models like those developed in this study with data collected from IoT devices allows for real-time monitoring and for alerting medical providers and patients to early warning signs of conditions such as preeclampsia, gestational diabetes, or hypertensive disorders. By leveraging IoT technology, maternal health risk prediction can shift from periodic clinical assessments to a continuous, real-time system that identifies adverse trends before they escalate into life-threatening conditions [
38,
39]. Additionally, integrating the framework discussed above can empower community health workers (CHWs) and traditional birth attendants (TBAs) with actionable insights, enabling them to make data-driven decisions and intervene early in high-risk cases to provide appropriate medical care. Furthermore, given the critical role CHWs and TBAs play in LMICs and resource-constrained settings, equipping them with IoT-enabled devices linked to AI risk prediction systems can significantly improve maternal health outcomes and reduce the burden on overcrowded hospitals.
To implement the machine learning models developed in this research in LMICs and other resource-constrained settings, it is critical to have a robust IoT architecture that accounts for the challenges faced in these settings, such as limited internet connectivity, power supply constraints, and affordability. Additionally, a scalable and cost-effective IoT framework should integrate multiple components, including real-time data collection, edge computing, cloud analytics, and decision support systems. Wearable or portable IoT devices should capture maternal health indicators such as blood pressure, heart rate, oxygen saturation, and temperature, and should function offline to accommodate areas with poor network coverage [
40,
41]. As discussed above, while it is preferable to deploy machine learning models in the cloud for anomaly detection and maternal health risk assessment based on data collected from IoT devices, edge computing devices, such as mobile phones or low-power processing hubs, can enable preliminary data analysis at the point of collection, reducing reliance on cloud connectivity while still ensuring real-time anomaly detection [
42]. While local edge computing devices enable real-time anomaly detection from biosensors while maintaining privacy and offline functionality, their computational, storage, and battery limitations make scaling difficult for anomaly detection and long-term monitoring. For continuous patient monitoring and early maternal risk detection at scale, therefore, future researchers should prefer cloud-based computing.
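A minimal sketch of such point-of-collection screening on an edge device might look as follows, assuming illustrative clinical cutoffs rather than validated thresholds from this study:

```python
# Hedged sketch of on-device vital-sign screening before any cloud round-trip.
# Cutoffs are illustrative stand-ins, not clinically validated thresholds.

def screen_vitals(reading: dict) -> list:
    """Return a list of flags for readings that warrant escalation."""
    flags = []
    # Illustrative hypertension cutoff (140/90 mmHg)
    if reading.get("systolic_bp", 0) >= 140 or reading.get("diastolic_bp", 0) >= 90:
        flags.append("possible hypertensive disorder")
    # Illustrative oxygen-saturation floor
    if reading.get("spo2", 100) < 95:
        flags.append("low oxygen saturation")
    # Illustrative resting heart-rate ceiling
    if reading.get("heart_rate", 0) > 120:
        flags.append("tachycardia")
    return flags
```

Because the function needs no network access and almost no compute, it can run on a phone or low-power hub even while offline, deferring the full model-based classification to the cloud once connectivity returns.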
For data transmission to cloud servers, we recommend a hybrid communication system using Bluetooth, GSM, and LoRaWAN to enhance connectivity in remote areas and LMICs where traditional internet access could be unreliable [
43,
44]. Once transmitted, the data can be securely stored and analyzed within a cloud-based system, where the machine learning algorithms developed here generate predictive insights and automated alerts for healthcare providers and community health workers. To maintain HIPAA compliance and protect patient privacy, we recommend encrypting data with the Advanced Encryption Standard using a 256-bit key (AES-256) before transmission, with secure communication protocols such as TLS 1.2/1.3 protecting data in transit. As an enhanced security measure, anonymizing patient identifiers and other Protected Health Information (PHI) through de-identification and tokenization can minimize privacy risks. We recommend deploying machine learning models in containerized environments (e.g., Docker, Kubernetes) on HIPAA-compliant cloud platforms such as AWS HealthLake, Google Cloud Healthcare API, or Microsoft Azure for Healthcare. For model inference, secure enclaves (e.g., AWS Nitro, Intel SGX) can help prevent unauthorized access, while role-based access controls (RBAC) and multi-factor authentication (MFA) restrict access to authorized healthcare providers. Finally, depending on applicable laws and compliance requirements, we also recommend adding a further layer of security in which PHI is encrypted at rest using FIPS 140-2 compliant encryption, with regular data integrity checks and audit logs monitoring access and modifications. Upon detecting a high-risk case, the system triggers an alert to medical providers. Furthermore, an SMS-based alert system can notify local clinics or CHWs about high-risk patients, facilitating timely medical care [
45,
46]. These security measures allow real-time maternal health monitoring while maintaining HIPAA compliance, data confidentiality, and integrity, facilitating timely and secure healthcare interventions in underserved regions.
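The tokenization step mentioned above can be sketched as a keyed HMAC over the patient identifier, which yields a deterministic but non-reversible token so records can be linked across uploads without exposing PHI. The key shown is a placeholder; in practice it would live in a secrets manager, never in code:

```python
# Hedged sketch of PHI tokenization via keyed HMAC-SHA256.
# The key and record fields are illustrative placeholders.
import hashlib
import hmac

SECRET_KEY = b"replace-with-key-from-secrets-manager"  # placeholder only

def tokenize_identifier(patient_id: str) -> str:
    """Deterministic, non-reversible token for a patient identifier."""
    return hmac.new(SECRET_KEY, patient_id.encode(), hashlib.sha256).hexdigest()

record = {"patient_id": "MRN-001234", "systolic_bp": 152, "spo2": 93}
safe_record = {**record, "patient_id": tokenize_identifier(record["patient_id"])}
```

Because the same identifier always maps to the same token under a given key, longitudinal records remain linkable in the cloud, while reversing a token without the key is computationally infeasible.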
While AI-driven adaptive models capable of functioning with limited or intermittent data can enhance the quality and speed of care, making them a viable solution for improving maternal health outcomes globally [
47,
48], there are several barriers to implementation, including affordability, digital literacy, and data security. In resource-constrained settings such as rural communities and LMICs, device cost is a significant factor, and the cost of IoT devices must be minimized through strategies such as public-private partnerships and subsidized programs to ensure accessibility [
47,
49]. Training community health workers and traditional birth attendants to use new health tools is essential for their successful implementation, as is building trust with patients about how their personal information is kept private and secure. Future research should focus on leveraging larger and more diverse data from LMICs, particularly in sub-Saharan Africa, where maternal health challenges are most severe and often overlooked. Furthermore, researchers should make a conscious effort to consider additional factors, such as social determinants, medical history, and other health issues, which could provide deeper insights into the risks faced by mothers, especially those at moderate risk. It is also important to study how factors like access to healthcare and living conditions influence maternal health in different regions, including resource-limited areas in developed countries. Testing these approaches in various healthcare settings, such as large hospitals and private clinics, can help determine their effectiveness in reducing complications and deaths during pregnancy. Additionally, research should examine how these strategies can be integrated into existing community health systems and whether they can be adapted to address other health concerns.