Article

A Wireless Noninvasive Blood Pressure Measurement System Using MAX30102 and Random Forest Regressor for Photoplethysmography Signals

by Michelle Annice Tjitra 1,2, Nagisa Eremia Anju 1, Dodi Sudiana 1,2 and Mia Rizkinia 1,2,*
1 Department of Electrical Engineering, Faculty of Engineering, Universitas Indonesia, Depok 16424, Indonesia
2 Artificial Intelligence and Data Engineering (AIDE) Research Center, Faculty of Engineering, Universitas Indonesia, Depok 16424, Indonesia
* Author to whom correspondence should be addressed.
Computers 2024, 13(5), 125; https://doi.org/10.3390/computers13050125
Submission received: 19 March 2024 / Revised: 2 May 2024 / Accepted: 10 May 2024 / Published: 17 May 2024
(This article belongs to the Special Issue Machine and Deep Learning in the Health Domain 2024)

Abstract

Hypertension, often termed “the silent killer”, is associated with cardiovascular risk and requires regular blood pressure (BP) monitoring. However, existing methods are cumbersome and require medical expertise, and their need for physical contact is a further drawback in situations such as the coronavirus pandemic that started in 2019 (COVID-19). This study aimed to develop a cuffless, continuous, and accurate BP measurement system based on photoplethysmography (PPG) signals. The system uses a MAX30102 sensor and an ESP-WROOM-32 microcontroller to capture PPG signals, which undergo noise reduction during preprocessing. Peak detection and feature extraction algorithms were introduced, and their output data were used to train a machine learning model for BP prediction. After tuning on a dataset of 114 records from six subjects, the best-performing model, a random forest, achieved coefficients of determination of 0.37/0.46 and mean absolute errors of 4.38/4.49 for systolic/diastolic BP. The model was integrated into a web-based graphical user interface for deployment. One probable limitation arises from the small sample size (six participants) of healthy young individuals under seated conditions, which potentially hinders the proposed model’s ability to learn and generalize patterns effectively. Increasing the number of participants with diverse ages and medical histories can enhance the accuracy of the proposed model. Nevertheless, this device addresses the need for convenient, remote BP monitoring, particularly during situations like the COVID-19 pandemic, making it a promising tool for cardiovascular health management.

1. Introduction

Cardiovascular disease is one of the leading causes of death worldwide, accounting for 17.9 million deaths per year due to undetected risk factors such as hypertension [1]. Hypertension is a lethal risk factor for cardiovascular disease known as “the silent killer” since it exhibits no symptoms or complaints. However, hypertension may cause additional illnesses or problems such as organ damage [2]. Measuring physiological factors such as blood pressure (BP) is critical for detecting and analyzing cardiovascular disorders. When BP measurements are acquired frequently and evaluated together with other physiological indicators, they can be used to identify and diagnose cardiac abnormalities and cardiovascular risk factors, such as hypertension, and even to predict cardiovascular events [3].
Traditionally, various invasive and noninvasive methods, such as catheterization, auscultation, and oscillometry, have been used to monitor BP. However, the current measurement techniques require expensive instruments with high precision and sensitivity, necessitating the expertise of specialized healthcare professionals. Furthermore, these methods involve physical contact, which increases the risk of COVID-19 exposure. In addition, the use of a cuff in a sphygmomanometer for continuous BP monitoring can be uncomfortable [4]. Therefore, a convenient, remote, and accurate BP monitoring system is required.
COVID-19, caused by the SARS-CoV-2 virus [5], has posed challenges to BP measurement as it is an infectious illness affecting the respiratory system. COVID-19 is highly contagious, spreading through droplets from the nose or mouth of an infected person while coughing, sneezing, talking, singing, or breathing. Since assessing physiological measures requires physical contact, the danger of COVID-19 exposure increases for healthcare personnel and patients, thereby aggravating health conditions and leading to difficulties [5]. Therefore, reducing COVID-19 transmission by minimizing the physical contact between healthcare professionals and patients is crucial, thus prompting the need for remote measurements or monitoring solutions.
In recent years, there has been an ongoing evolution in BP measurement, especially in the realm of wireless or cuffless methods; such techniques address the limitations of other approaches, such as inconvenience and discomfort. Recently, researchers have used photoplethysmography (PPG) signals for wireless BP measurements. Multiple studies have focused on BP measurements by analyzing the critical features extracted from PPG signals. Subsequently, they have employed machine learning or deep learning to process the information from PPG signals, thereby aiming to generate accurate and reliable estimates.
Gaurav et al. measured BP from PPG signals recorded with a Samsung Galaxy Note5, employing min-max scaling for preprocessing, and achieved mean absolute errors (MAEs) of 6.9 mmHg and 5 mmHg for the predictions of systolic BP (SBP) and diastolic BP (DBP), respectively. Their artificial intelligence (AI) model used a set of unique PPG features, specifically interpolated and normalized pulses together with pulse lengths, and employed an artificial neural network to predict BP [6].
Xie et al. proposed a real-time BP measurement method using PPG signals and feature extraction. They employed two PPG probes to extract the essential features for BP estimation, achieving a mean absolute difference (MAD) of 4.21 ± 7.59 mmHg for SBP and 3.24 ± 5.39 mmHg for DBP [7]. Radha et al. introduced a wrist-worn PPG sensor-based method, and they complemented this approach with long short-term memory (LSTM) networks for BP trend tracking and SBP dip estimation [8].
Furthermore, Slapničar et al. [9], Attivissimo et al. [10], Kachuee et al. [11], Omer et al. [12], Kachuee et al. [13], and Liu et al. [14] presented BP estimation systems utilizing PPG databases such as MIMIC II and MIMIC III. They employed various preprocessing and feature extraction techniques, including the use of first and second derivatives, maximal overlap discrete wavelet transform (MODWT), pulse transit time, and wavelet scattering transform (WST). Machine learning techniques such as artificial neural network, regression forest, spectro-temporal ResNet, XGBoost, support vector machine, LSTM, AdaBoost, and support vector regression have also been utilized.
Building on the advancements highlighted above, this paper describes a novel, cost-effective, and cuffless BP measuring system that capitalizes on precise BP estimation. Using a simple optical method, PPG can effectively capture changes in arterial volume [4]. Our method for translating PPG signals into BP readings involves extracting key features from PPG signal components [4]. We introduced a unique feature extraction algorithm to identify the amplitudes of the foot, notch, systolic, and diastolic components. By emphasizing the affordable and innovative feature extraction techniques, our work lays the groundwork for advancing cuffless, wireless, noninvasive, and real-time BP monitoring. Additionally, this system enables the remote monitoring of patients’ BP, thus facilitating measurements from a distance.

2. Materials and Methods

BP measurements provide SBPs and DBPs [15]. SBP is the arterial pressure while the heart beats, whereas the BP in the arteries when the heart is at rest between beats is referred to as the DBP [15]. PPG utilizes the variations in light absorption caused by changes in tissue characteristics during the cardiac cycle. This cycle comprises the systole and diastole parts, and the resulting changes in light absorption generate a PPG waveform that is synchronized with each heartbeat. Light is directed at the tissue to capture a PPG signal, with a portion absorbed and the remainder reflected or transmitted. The modulated light intensity is measured using a photodetector. The intensities of the reflected and scattered light reaching the photodetector are measured, and variations in the photodetector current are associated with the changes in blood volume [16]. A schematic of the PPG waveform is shown in Figure 1.
As illustrated in Figure 1, an inverse relationship exists between the recorded PPG intensity (I) and the light absorbance (A). The PPG intensity is divided into two components according to whether the light is absorbed in pulsating or non-pulsating tissue segments. The first, nonpulsatile component is a relatively steady direct current (DC) component generated by light absorption in nonpulsating tissues such as muscle and bone. The second, pulsatile component is an alternating current (AC) component generated by light absorption in the pulsating arterial blood, with variations corresponding to the pulse. The AC PPG is segmented into two phases: the rising edge of the pulse, known as the anacrotic phase, which is linked to the systole; and the falling edge of the pulse, termed the catacrotic phase, which is associated with the diastole. Furthermore, the dicrotic notch indicates the end of the aortic systole and the beginning of the diastole [4].
A linear solid viscoelastic model establishes a direct relationship between the AC components of PPG and BP waveforms in the frequency domain (∆V(ω) and ∆P(ω)) according to the following equation:
$$\Delta V(\omega) = \frac{1}{E_1 + E_2} \cdot \frac{j\omega + \dfrac{E_2}{\eta}}{j\omega + \dfrac{E_1 E_2}{(E_1 + E_2)\,\eta}} \, \Delta P(\omega).$$
Here, $E_1$ represents the low-frequency elastic modulus (the inverse of compliance), $E_1 + E_2$ denotes the high-frequency elastic modulus, and $\eta$ stands for the viscosity of the arterial wall. This model shows that the PPG waveform acts as a low-pass filtered version of the BP waveform at the same arterial site, indicating that PPG waveform characteristics may contain information regarding BP [4].
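The low-pass behaviour of this model can be checked numerically. The sketch below evaluates the gain |∆V(ω)/∆P(ω)| of the linear solid viscoelastic model at a few frequencies. The function name `ppg_gain` and the parameter values are illustrative assumptions in arbitrary units, not values taken from this study.

```python
import math

def ppg_gain(freq_hz, e1, e2, eta):
    """Gain |dV/dP| of the linear solid viscoelastic model at one frequency."""
    jw = 1j * 2.0 * math.pi * freq_hz
    numerator = jw + e2 / eta
    denominator = jw + (e1 * e2) / ((e1 + e2) * eta)
    return abs(numerator / denominator) / (e1 + e2)

# Assumed, illustrative parameters (arbitrary units, not from the study)
E1, E2, ETA = 1.0, 3.0, 0.5

for f in (0.0, 0.5, 1.0, 5.0, 50.0):
    print(f"{f:5.1f} Hz  gain = {ppg_gain(f, E1, E2, ETA):.4f}")
```

With these assumed values, the gain equals 1/E1 at zero frequency and falls toward 1/(E1 + E2) as the frequency increases, which matches the description of E1 and E1 + E2 as the low- and high-frequency elastic moduli.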
PPG can be used to measure BP, as shown in the equation above. Designing a PPG-based BP measurement system involves acquiring the relevant resources, such as datasets and devices, and executing experiments using appropriate models. The objective is to acquire PPG signals from the subjects using the sensor and the microcontroller and to transmit them wirelessly, thus enabling a more accessible and convenient approach for monitoring BP.

2.1. PPG BP Measurement Systematic Flow

BP measurement begins by uploading a program from the Arduino IDE to the ESP-WROOM-32 microcontroller. The program contains the IP address of the computer, the Wi-Fi name, the Wi-Fi password, and the code responsible for measuring the PPG signal from the finger, and it connects the ESP-WROOM-32 to the available Wi-Fi network. Once the ESP-WROOM-32 and MAX30102 sensor are connected to the network, the index finger is placed on the PPG measurement device, which consists of the ESP-WROOM-32 and the MAX30102 sensor. The infrared amplitude obtained from the sensor forms the PPG signal, which is sent to the cloud database using the Wi-Fi protocol available on the ESP-WROOM-32.
In this study, three platforms were initially tried for storing the data in a cloud database: ThingSpeak, whose library is accessible in the Arduino IDE; Antares, which is integrated with the ESP-WROOM-32; and a MySQL database. Thirteen data points per second were sent to the MySQL database from the MAX30102 sensor. In comparison, Antares can receive only one data point every 2 s and ThingSpeak one every 18 s. Therefore, the MySQL database (MySQL 8.0.28, Oracle, TX, USA), which outperformed the other two platforms, was used as the storage medium in this study.
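To make the difference between the three storage paths concrete, the short calculation below converts the reported rates into data points per one-minute recording. The helper name `samples_per_minute` is a hypothetical name introduced only for this illustration.

```python
def samples_per_minute(samples, seconds):
    """Number of data points stored in 60 s at the given rate."""
    return 60.0 * samples / seconds

rates = {
    "MySQL": samples_per_minute(13, 1),       # 13 points every second
    "Antares": samples_per_minute(1, 2),      # 1 point every 2 s
    "ThingSpeak": samples_per_minute(1, 18),  # 1 point every 18 s
}

for name, count in rates.items():
    print(f"{name:10s} {count:7.1f} points per minute")
```

A one-minute recording therefore holds 780 points via MySQL, 30 via Antares, and about 3 via ThingSpeak. Since a resting pulse repeats roughly once per second, only the first rate samples each pulse several times.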
PPG signals stored in the database were visualized on a graph displayed on the web using a local server approach. The graphical user interface (GUI) display in this study comprised several components, including the title of the webpage, a description of the webpage contents, and a graph illustrating the PPG measurements. The x-axis represented the time of the measurement, and the y-axis represented the amplitude of the infrared light intensity absorbed as the light passed through the finger. A prediction button was placed at the bottom of each page. When pressed, this button processed the PPG signal graph by preprocessing the signal and making estimations using the AI model, which was trained using the collected dataset. The processed PPG signal was recorded for the minute preceding the button press. The BP estimation results from the PPG signal appeared in the text box below the prediction button as SBPs and DBPs. The overall systematic flow of the PPG BP measurements is shown in Figure 2.

2.2. Components

The proposed BP measurement system was designed using two electronic components: a MAX30102 PPG sensor (Maxim Integrated, CA, USA) and an ESP-WROOM-32 microcontroller (Espressif Systems, Shanghai, China). The MAX30102 is a module with integrated pulse oximetry and heart rate monitoring capabilities [17]. The ESP-WROOM-32 is another module with integrated Wi-Fi, Bluetooth, and Bluetooth LE capabilities, and it is suitable for various applications ranging from low-power sensors to heavy-duty tasks such as voice encoding, music streaming, and MP3 decoding [18].
The MAX30102 and ESP-WROOM-32 devices were linked using a female-to-female jumper cable to connect the pins on the sensor and microcontroller. The connections were as follows: Vin to Vin, which function as power pins for the boards; SDA to SDA, serving as serial data pins used to send and receive data for I2C communication; SCL to SCL, which function as serial clock pins for clock signals; and GND to GND, which serve as ground pins. The device assembly is shown in Figure 3.
The MAX30102 light-emitting diode emitted an infrared light that was absorbed, reflected, and dispersed by the subject’s tissue and blood. The photodetector measured the modulated light intensity and generated a PPG signal. The sensor retrieved data from the measurement program code in the Arduino IDE, which was uploaded to the microcontroller linked to the PPG sensor.

2.3. AI Model: Random Forest Algorithm

The BP measurement AI model started by collecting data from a dataset. Subsequently, the collected data were preprocessed to eliminate undesired noise and artifacts. Once the preprocessing was complete, the proposed model proceeded with feature extraction, and the resulting data were used as the training dataset for the AI model. A comprehensive flow of the AI model is shown in Figure 4.
Machine learning enables computers to learn from data without being explicitly programmed. There are three categories of machine learning methods: supervised, unsupervised, and reinforcement learning [19]. A supervised learning method was used to develop the BP measurement system. A regression approach was chosen because the output variable, BP, is a continuous real value, whereas classification assigns variables to discrete categories [19]. The supervised regression algorithm used in designing this system was random forest.
Random forest is a popular machine learning algorithm, trademarked by Leo Breiman and Adele Cutler, that aggregates the predictions of numerous decision trees to produce a single outcome. Each tree splits the dataset into groups based on specific criteria until a predefined stopping condition is reached; the terminal nodes of a decision tree are known as leaves. Its widespread acceptance is attributed to its simplicity and adaptability, which make it suitable for both classification and regression tasks [20,21].

2.3.1. Data Acquisition

The data were acquired by holding a finger on the sensor for 1 min, which ensured that a usable PPG waveform was recorded. The subjects were six healthy individuals aged 21–24 years whose BP measurements fell into the normal and elevated categories. The distributions of the SBP and DBP in the raw dataset before cleaning are shown in Figure 5. Data were collected from all participants while they were seated. Every subject was assessed using the PPG BP measuring system and an OMRON HEM-8712 BP measurement device, which provided the reference measurements. The PPG signal data were retrieved by the sensor and sent to the server using the Wi-Fi protocol. Subsequently, the received data were stored in a MySQL database, from which they were evaluated and displayed.
The data collection process resulted in a unified dataset from six subjects, totaling 114 BP measurements, each with its own pair of systolic and diastolic values, and comprising 2856 rows of extracted features. Each row corresponded to a signal and contained the foot, notch, diastolic, and systolic values. Note that the number of rows of extracted features may differ between measurements. This dataset provides a detailed perspective on these physiological parameters, which require further analysis.

2.3.2. Preprocessing

Preprocessing is an important step that must be completed before data processing. This step aims to eliminate noise or artifacts from the data. Motion artifacts are the most common issues observed when assessing physiological data using PPG sensors. Baseline wander correction was employed in this phase to address the baseline drift that may have been caused by motion artifacts [22]. The result of applying the baseline wander correction to our signal can be observed in Figure 6.
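The idea behind baseline wander correction can be illustrated as follows. This is a minimal sketch that estimates the slow trend with a centered moving average and subtracts it. The moving-average filter, the window length, and the synthetic signal are assumptions for illustration; the specific correction method used in this study is not reproduced here.

```python
import math

def moving_average(signal, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def remove_baseline(signal, window):
    """Subtract the slow trend estimated by the moving average."""
    trend = moving_average(signal, window)
    return [s - t for s, t in zip(signal, trend)]

FS = 13  # samples per second, the rate reported for the MySQL path
t = [i / FS for i in range(FS * 20)]
pulse = [math.sin(2 * math.pi * 1.2 * ti) for ti in t]  # synthetic 1.2 Hz pulse
drift = [0.5 * ti for ti in t]                          # synthetic linear drift
raw = [p + d for p, d in zip(pulse, drift)]

clean = remove_baseline(raw, window=2 * FS + 1)  # assumed window of about 2 s
print(f"raw max = {max(raw):.2f}, corrected max = {max(clean[FS:-FS]):.2f}")
```

In this synthetic case the drift dominates the raw signal, while the corrected signal oscillates around zero away from the edges. A window that is too short would also remove part of the pulse itself, so the window length is a trade-off.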

2.3.3. Feature Extraction

The preprocessed data underwent a two-step process: peak detection and feature extraction. Feature extraction is a recent approach for converting PPG into BP. It estimates BP from the morphological features and characteristics of the PPG signal [4]. BP can be estimated from the PPG signal by extracting four points, as shown in Figure 7: the foot, which indicates the beginning of the systolic phase; the systolic peak, which is the maximum of the systolic phase; the dicrotic notch, which marks the end of the systole and the beginning of the diastole; and the diastolic peak, which is the maximum of the diastolic phase.
This study utilized peak detection to identify the foot, notch, diastolic, and systolic peaks in the PPG signal, and this was conducted because these peaks contain information that is crucial for BP measurement. These peaks were categorized into two types: maximum peaks and minimum peaks. The maximum peak contains the systolic and diastolic peak values, whereas the minimum peak encompasses the foot and dicrotic notch values of the PPG signal. Hence, the peak detection process generates two lists: the maximum and minimum peaks.
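The two lists described above can be produced with a simple neighbour comparison. The sketch below is an assumed, minimal peak detector, not the detector used in this study, and the sample values of the pulse are hypothetical.

```python
def find_extrema(signal):
    """Return indices of strict local maxima and strict local minima."""
    maxima, minima = [], []
    for i in range(1, len(signal) - 1):
        prev, cur, nxt = signal[i - 1], signal[i], signal[i + 1]
        if cur > prev and cur > nxt:
            maxima.append(i)
        elif cur < prev and cur < nxt:
            minima.append(i)
    return maxima, minima

# Hypothetical pulse: foot, systolic peak, notch, diastolic peak, next foot
pulse = [0.0, 0.4, 0.9, 1.0, 0.8, 0.55, 0.5, 0.58, 0.6, 0.45, 0.2, 0.05, 0.1]

max_idx, min_idx = find_extrema(pulse)
print("maximum peaks at", max_idx)  # systolic and diastolic peaks
print("minimum peaks at", min_idx)  # dicrotic notch and the following foot
```

On real recordings, a comparison with immediate neighbours is sensitive to noise, so a minimum distance or prominence criterion would normally be added.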
Our proposed feature extraction method separates the foot and notch peaks from the maximum peak values, shown as purple circles in Figure 8, and the systolic and diastolic peaks from the minimum peak values, shown as red crosses in Figure 8. Because the features were combined in both lists, they had to be separated. Feature extraction was therefore performed in four steps. First, the amplitude values of the foot and notch peaks were placed into separate lists: the foot list for the amplitude of the foot peak and the notch list for the amplitude of the notch peak. Second, the index of each notch peak was stored in a new list by determining its index from the baseline variable (a variable containing the y-axis values, or amplitudes, of the PPG signal). Third, this notch index list was used to determine the diastolic index from the minimum peak list. Finally, the systolic index was extracted from the minimum peaks and the foot peak index from the peaks, while equalizing the number of data points in each feature.
The results of the feature extraction comprised four columns: foot, notch, diastolic, and systolic. These columns were in addition to two other columns representing the actual BP measurements, namely SBP and DBP. The extraction results were saved in a comma-separated values (.csv) file format. Each measurement may have a different number of lines of extracted features depending on the number of PPG signals present in each measurement.
Furthermore, because the number of extracted feature rows differed between BP measurements, the number of rows was reduced to 20 per measurement to balance the data, as the highest average number of extracted features was found at 20 rows. Thus, a total of 1540 feature extraction rows were obtained, representing 77 BP measurements, each with its own pair of systolic and diastolic values. The dataset was then divided into 90% training data, which consisted of 1400 rows representing 70 BP measurements, and 10% testing data, which consisted of 140 rows representing 7 BP measurements. The training and testing datasets were divided by stratified sampling. This method involves a probability sampling of the elements from the target population, where the elements are grouped into distinct strata. Within each stratum, the elements share characteristics that are important for the survey [23]. The results of the feature extraction process are presented in Figure 8.
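The row balancing and the split can be sketched as follows. This is an illustration under stated assumptions: measurements with fewer than 20 rows are assumed to be discarded, the split is a plain shuffle at the measurement level rather than the stratified sampling used in this study, and the function names and data are hypothetical.

```python
import random

ROWS_PER_MEASUREMENT = 20  # value reported in the text

def balance_rows(rows_by_measurement, n_rows=ROWS_PER_MEASUREMENT):
    """Drop measurements with fewer than n_rows rows and truncate the rest
    (assumed rule; the exact reduction rule is not given in the text)."""
    return {
        key: rows[:n_rows]
        for key, rows in rows_by_measurement.items()
        if len(rows) >= n_rows
    }

def split_by_measurement(balanced, n_test, seed=0):
    """Shuffle measurement keys and hold out n_test whole measurements."""
    keys = sorted(balanced)
    random.Random(seed).shuffle(keys)
    test_keys = set(keys[:n_test])
    train = [row for k in keys if k not in test_keys for row in balanced[k]]
    test = [row for k in keys if k in test_keys for row in balanced[k]]
    return train, test

# Hypothetical data: 12 measurements with 15 to 26 feature rows each;
# each row holds foot, notch, diastolic, systolic, SBP, DBP
raw = {m: [[0.0, 0.0, 0.0, 0.0, 120.0, 80.0]] * (15 + m) for m in range(12)}

balanced = balance_rows(raw)
train_rows, test_rows = split_by_measurement(balanced, n_test=1)
print(len(balanced), len(train_rows), len(test_rows))  # prints: 7 120 20

# The row counts reported in the text follow from 20 rows per measurement
assert 77 * 20 == 1540 and 70 * 20 == 1400 and 7 * 20 == 140
```

Holding out whole measurements keeps all rows of one BP reading on the same side of the split, which is consistent with the reported 70 and 7 measurements.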

2.4. Model Evaluation

The training dataset was then used to train the AI model, and the resulting model was saved as a pickle (.pkl) file. The candidate models were subsequently assessed to determine which provided the best predictions. The models were evaluated using several metrics, including the MAE, which measures the average absolute difference between the predicted and actual values. A low MAE value indicates high accuracy. In the formula below, $n$ represents the number of samples, $y_i$ is the actual sphygmomanometer value for sample $i$, and $\hat{y}_i$ is the predicted value for sample $i$. The MAE was calculated using the following formula:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|.$$
The coefficient of determination (R2) measures how well the proposed model predicts the measured values. Its maximum value is 1, and the closer the R2 is to 1, the more accurate the model; the R2 is 0 for a model that always predicts the mean of the actual values, and it becomes negative for a model that performs worse than that. The evaluation results of the tested models were compared, and the model with the highest accuracy was selected. In this context, $y_i$ represents the actual sphygmomanometer value for sample $i$, $\hat{y}_i$ represents the predicted value for sample $i$, and $\bar{y}$ represents the average of all response values. This equation takes the following form:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}.$$
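Both metrics are short enough to compute by hand. The sketch below implements the two formulas directly and applies them to a hypothetical set of SBP readings that span a narrow range. The numbers are invented for illustration and are not data from this study.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical SBP readings (mmHg) covering a narrow range
actual = [110.0, 112.0, 114.0, 116.0, 118.0]
predicted = [115.0, 108.0, 118.0, 112.0, 122.0]

print(f"MAE = {mae(actual, predicted):.3f}")  # MAE = 4.200
print(f"R2  = {r2(actual, predicted):.3f}")   # R2  = -1.225
```

The example shows that a small MAE and a negative R2 can occur together: when the actual values vary little, an error of about 4 mmHg is already larger than the spread around their mean.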

2.5. Model Deployment

The model with the highest accuracy can reliably predict the physiological parameter readings. Subsequently, the best model was deployed in the GUI development stage. This stage was conducted so that users can utilize the product and find it easier to apply the research results. The GUI displayed the PPG signal graphs and conducted peak identification and feature extraction. The prediction model to estimate the BP values was published locally on the web, as shown in Figure 9.

3. Results

This section reports the performance of the random forest algorithm. Its effectiveness is assessed using the MAE and R2, and the model is additionally evaluated on a testing dataset to assess the robustness and generalization of the proposed model.

3.1. Results of Random Forest Algorithm

3.1.1. Hyperparameters

The random forest algorithm was trained on the entire dataset. The hyperparameters used in this model were n_estimators, max_depth, min_samples_split, min_samples_leaf, and max_features. The n_estimators parameter represents the number of trees in the forest, with a default value of 100; max_depth represents the maximum depth of the tree; min_samples_split is the minimum number of samples required to split an internal node, with a default value of two; min_samples_leaf is the minimum number of samples required to be at a leaf node, with a default value of one, and it may have the effect of smoothing the model (especially in regression); and max_features is the number of features to consider when looking for the best split [24].
The values used for these parameters were selected using a grid search, which is a method for determining the best hyperparameter values for a model and ensuring optimal parameter selection [25]. The ranges of the values and best results for each hyperparameter are listed in Table 1.

3.1.2. Model Training Results

The model was trained on the training dataset using the random forest algorithm, with a training-to-validation ratio of 90:10. This ratio yielded better results than the other ratios tested, namely 70:30 and 80:20. A multi-output regressor function was used because the desired outcome, BP, consists of two values: SBP and DBP. Table 2 presents the evaluation results, which consist of the MAE and R2 values of the training model. A line plot comparing the validation BP values with the BP values predicted by the training model is shown in Figure 10.
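The training procedure can be sketched with scikit-learn, whose RandomForestRegressor has the hyperparameters named above. The sketch assumes that the multi-output regressor function corresponds to scikit-learn's MultiOutputRegressor. The data are synthetic, and the grid is hypothetical and does not reproduce the ranges in Table 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the four features: foot, notch, diastolic, systolic
X = rng.normal(size=(200, 4))
# Synthetic targets; the two columns play the role of SBP and DBP
y = np.column_stack([
    115 + 5 * X[:, 3] - 2 * X[:, 0],
    75 + 3 * X[:, 2] + X[:, 1],
]) + rng.normal(scale=2.0, size=(200, 2))

# 90:10 split, as in the text
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.10, random_state=0
)

# Hypothetical grid; "estimator__" addresses the wrapped random forest
param_grid = {
    "estimator__n_estimators": [25, 50],
    "estimator__max_depth": [3, None],
    "estimator__min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    MultiOutputRegressor(RandomForestRegressor(random_state=0)),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X_train, y_train)
pred = search.predict(X_val)

# One score per output column (SBP, DBP)
mae_sbp, mae_dbp = mean_absolute_error(y_val, pred, multioutput="raw_values")
r2_sbp, r2_dbp = r2_score(y_val, pred, multioutput="raw_values")
print(search.best_params_)
print(f"MAE SBP/DBP: {mae_sbp:.2f}/{mae_dbp:.2f}")
print(f"R2  SBP/DBP: {r2_sbp:.2f}/{r2_dbp:.2f}")
```

MultiOutputRegressor fits one forest per target. RandomForestRegressor can also fit a two-column target directly, in which case the `estimator__` prefix in the grid is not needed.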

3.1.3. Model Testing Results

After the proposed model was trained using the random forest algorithm, it was tested using the previously created testing dataset. The results of the model testing using the testing dataset are listed in Table 3.
The results reported in Table 2 and Table 3 demonstrate the similarities in the MAE for the SBP when using the training and testing datasets, with values of 4.38 and 4.43, respectively. For the DBP, the MAE was 4.49 for the training dataset and 3.71 for the testing dataset.
However, the R2 values for the SBP for the training and testing datasets were 0.37 and −1.08, respectively. Similarly, for the DBP, the R2 was 0.46 for the training dataset and −2.13 for the testing dataset.
In the provided results, the MAE values for both SBP and DBP are relatively low, suggesting that the model's predictions are, on average, close to the measured blood pressure values. However, the R2 values for both SBP and DBP are low, especially for the testing dataset, indicating that the model does not explain a large proportion of the variance in the blood pressure values. The negative R2 values for the testing dataset show that the model performs worse than a simple horizontal line at the mean of the measured values, which indicates poor predictive accuracy. Therefore, while the MAE values suggest high accuracy, the R2 values indicate that the model may not adequately capture the variability in the blood pressure measurements, particularly in the testing dataset.

4. Discussion

We present a cuffless BP measurement system that estimates SBP and DBP values based on PPG signal features, including the amplitudes of the foot, notch, and the systolic and diastolic components. Based on the model results from the previous section, the random forest model achieved an MAE of 4.38 for the SBP and 4.49 for the DBP. However, notable discrepancies existed in the R2 values between the training and testing datasets. This discrepancy may be attributed to the very small amount of measurement data, which could limit the proposed model’s ability to learn and generalize patterns effectively, causing it to struggle to capture variability in the data.
Considering the results of the random forest model, further analysis was conducted and is described below. This includes a comparison with other algorithms and studies to assess the performance of the cuffless BP measurement system comprehensively.

4.1. Comparison with Another Algorithm

In addition, we employed the XGBoost algorithm to develop a BP prediction model based on the PPG signal extraction findings. XGBoost is an AI algorithm derived from ensemble learning techniques. Ensemble learning, also known as committee-based learning or the learning of multiple classifier systems, trains multiple learners to solve the same problem [26]. The working principle of ensemble learning is similar to that of decision making in the real world, wherein it seeks different opinions from different experts to help determine the best choice, thus increasing confidence in the decision made [27]. The most commonly used algorithms for ensemble learning are bagging, stacking, and boosting. The boosting method, which is capable of converting weak learners into strong ones, was employed in the design of this system [27]. The key to boosting the ensemble is to rectify errors in the predictions. Models are added to the ensemble sequentially, with the second model attempting to correct the predictions of the first model, and the third model improving upon the second [28]. In this method, the training dataset remains unchanged; only the learning algorithm is modified to focus on the details of the data rows based on what has been correctly or incorrectly predicted by the previous ensemble members.
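The sequential error correction described above can be shown with a small example. The sketch below is plain gradient boosting with a squared-error loss and one-split trees (stumps), where each new stump is fitted to the residuals of the current ensemble. It is an assumed, simplified illustration and not XGBoost itself; the function names and data are hypothetical.

```python
def fit_stump(x, residuals):
    """Best single-threshold split of 1-D inputs under squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_rounds, learning_rate):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(y) / len(y)
    stumps, history = [], []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))
    model = lambda xi: base + learning_rate * sum(s(xi) for s in stumps)
    return model, history

x = [i / 10 for i in range(20)]
y = [xi ** 2 for xi in x]  # synthetic target
model, history = boost(x, y, n_rounds=50, learning_rate=0.3)
print(f"training MSE after 1 round: {history[0]:.4f}, after 50: {history[-1]:.4f}")
```

The training error shrinks with every added stump, and a smaller learning rate makes each correction more cautious at the cost of needing more rounds.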
The hyperparameters used in this algorithm were learning_rate, gamma, max_depth, n_estimators, alpha, and lambda. The learning_rate parameter was used to prevent overfitting: after every boosting step, the weights of the new features are obtained directly, and learning_rate shrinks these feature weights to make the boosting process more conservative. The default learning_rate is 0.3. The gamma parameter is the minimum loss reduction required to make a further partition on a leaf node of the tree. The max_depth parameter is the maximum depth of a tree, and its default value is six. The n_estimators parameter represents the number of trees in the XGBoost model, with a default value of 100. The alpha and lambda parameters are hyperparameters for regularization.
Regularization is employed to reduce errors in learning algorithms by modifying their behaviors. L1 regularization (lasso) and L2 regularization (ridge) are common techniques used for this purpose. Lasso regression methods are often utilized to prioritize efficiency in fields that deal with large datasets. However, they encounter challenges with highly correlated predictors, often arbitrarily selecting one while disregarding the others. Moreover, the lasso may encounter issues when the predictors are identical. The penalty associated with the lasso is that numerous coefficients are expected to be close to zero, with only a small subset being larger and non-zero. Ridge regression is ideal for datasets with many predictors, all of which have non-zero coefficients drawn from a normal distribution. This performs well when predictors have small effects, and it prevents high variance in models with correlated variables by equally shrinking their coefficients toward zero [29]. The L1 parameter in the XGBoost method is alpha, and the L2 parameter is lambda [30,31]. A grid search was used to identify hyperparameter values. The hyperparameter values are listed in Table 4.
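For reference, the way these regularization parameters enter the training objective can be written compactly. The form below follows the regularized objective described in the XGBoost documentation, where $T$ is the number of leaves of a tree and $w_j$ is the weight (output value) of leaf $j$. The symbols $\ell$, $f_k$, and $K$ (loss function, $k$-th tree, and number of trees) are notation introduced here for illustration.

```latex
\mathcal{L} = \sum_{i=1}^{n} \ell\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{\lambda}{2} \sum_{j=1}^{T} w_j^2 + \alpha \sum_{j=1}^{T} \left| w_j \right|
```

The term with $\alpha$ is the L1 (lasso) penalty and the term with $\lambda$ is the L2 (ridge) penalty on the leaf weights, while $\gamma$ charges a fixed cost for every additional leaf. Setting both $\alpha$ and $\lambda$ to non-zero values combines the two penalties.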
The resulting hyperparameter values indicate a conservative configuration intended to prevent overfitting: the learning_rate of 0.1 is lower than the default of 0.3, and max_depth remained at its default value of six. In addition, the grid search selected non-zero values for both alpha and lambda, which means that L1 and L2 penalties were applied together, as in elastic net (ENET) regularization. The ENET serves as an extension of the lasso that is robust to extreme correlations among predictors. It addresses the instability encountered in lasso solutions when the predictors are highly correlated by combining the L1 (lasso) and L2 (ridge) penalties. Proposed for analyzing high-dimensional data, ENET provides a solution to the challenges posed by highly correlated predictors [29]. The training evaluation results for XGBoost and random forest are compared in Table 5, with the superior value between XGBoost and random forest highlighted in bold.
The training evaluation results indicate that the random forest approach achieved a lower MAE and a higher R² than XGBoost for both systolic and diastolic BP. This may be attributed to random forest being easier to tune than XGBoost, as it has only two main parameters: the number of features and the number of decision trees. Random forest is also generally less prone to overfitting than XGBoost, which is more sensitive to overfitting and more difficult to tune.

4.2. Comparison with Other Studies

We also compared our study with related studies on BP estimation using PPG, as shown in Table 6. However, such a comparison is challenging owing to differences in data acquisition methods, evaluation metrics, and datasets. The studies were selected for comparison based on their data acquisition methods, the features employed, and the algorithms used. Specifically, the criterion for inclusion was that the machine learning algorithms were trained using features extracted from the PPG signal, as in our methodology.
A direct comparison between our method and the approaches of Xie et al. [7] and Radha et al. [8] is not feasible, as shown in Table 6, because those studies report different evaluation metrics, specifically the MAD with standard deviation and the root mean square error (RMSE), rather than the MAE. More generally, Table 6 illustrates the difficulty of comparison, as the studies utilized different datasets and various machine learning algorithms. For example, Attivissimo et al. [10] achieved a smaller MAE for both SBP and DBP. This disparity could be attributed to differences in the datasets and the inclusion of demographic features, and potentially to the sensors used to measure the PPG signals. In our study, optimal results were obtained using random forest instead of XGBoost. In terms of SBP MAE, this approach compares favorably with most of the other algorithms listed in Table 6.
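To illustrate why results reported with different metrics cannot be compared directly, the sketch below computes several error summaries for the same seven SBP test predictions listed in Table 3. The formulas are the usual textbook definitions; whether the cited studies used exactly these definitions (for example, population rather than sample standard deviation) is an assumption.

```python
import math

# SBP test records from Table 3 (mmHg).
actual = [95, 104, 96, 95, 103, 97, 104]
predicted = [103, 104, 100, 105, 104, 104, 103]

errors = [p - a for p, a in zip(predicted, actual)]
n = len(errors)

mae = sum(abs(e) for e in errors) / n                   # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / n)        # root mean square error
mean_error = sum(errors) / n                            # signed bias
std_error = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / n)

print(f"MAE  = {mae:.2f} mmHg")
print(f"RMSE = {rmse:.2f} mmHg")
print(f"ME ± STD = {mean_error:.2f} ± {std_error:.2f} mmHg")
```

Because the RMSE weights large errors more heavily, it is never smaller than the MAE for the same predictions, so an RMSE from one study and an MAE from another cannot be ranked against each other without access to the underlying errors.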

4.3. Sources of Error and Limitations, Clinical Significance, and Future Advancements

The findings of this study offer promising indications for estimating BP using PPG signals. However, several limitations should be addressed in future studies to enhance the effectiveness of this model. One probable limitation arises from the small sample size (six participants) with a specific waveform shape, all of whom were healthy young individuals measured under seated conditions. This limitation may hinder the ability of the proposed model to learn and generalize patterns effectively, thereby impeding its capacity to capture data variability. Additionally, the proposed model might only be able to predict values within the range of BPs present in the training data, and it was trained under only one condition, namely seated. The failure to capture variability in the BP data can have far-reaching consequences in clinical practice, affecting diagnostic accuracy, treatment decision making, risk stratification, patient monitoring, and overall trust in predictive models. A possible solution to this limitation is to increase the number of participants, covering a broader age range and different medical histories, to enhance the proposed model's accuracy.
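The remark that the model may only predict within the trained BP range follows from how tree-based regressors work: each leaf returns the mean of the training targets that fall in it, so no prediction can lie outside the minimum and maximum of those targets. The sketch below demonstrates this with a tiny hand-written regression tree on hypothetical data; it is not the authors' random forest, and `build_tree` and `predict` are illustrative names.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(points, min_leaf=2):
    """Build a 1-D regression tree from (x, y) pairs sorted by x."""
    ys = [y for _, y in points]
    best = None
    for i in range(min_leaf, len(points) - min_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical feature values
        cost = sse(ys[:i]) + sse(ys[i:])
        if best is None or cost < best[0]:
            best = (cost, i)
    if best is None:
        return ("leaf", mean(ys))  # leaf value = mean of training targets
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return ("node", threshold,
            build_tree(points[:i], min_leaf),
            build_tree(points[i:], min_leaf))


def predict(tree, x):
    while tree[0] == "node":
        tree = tree[2] if x <= tree[1] else tree[3]
    return tree[1]


# Hypothetical feature values and SBP-like targets (mmHg).
features = [1, 2, 3, 4, 5, 6, 7, 8]
targets = [95, 96, 97, 99, 101, 103, 104, 105]
tree = build_tree(list(zip(features, targets)))

low = predict(tree, -1000)   # far below any training feature value
high = predict(tree, 1000)   # far above any training feature value
print(low, high, min(targets), max(targets))
```

Averaging many such trees, as a random forest does, keeps the same bound, which is one reason why a wider range of BP values in the training data would be needed before the model could be expected to track hypertensive or hypotensive readings.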
Despite these limitations, this cost-effective, cuffless, wireless, and noninvasive system is capable of real-time monitoring and can be configured to track long-term BP changes by measuring the patient's BP at 1 min intervals. Because the system is wireless, it can also monitor a patient's BP remotely. Moreover, implementing this system does not require calibration.

5. Conclusions

A cuffless, noninvasive, wireless, and real-time BP measurement system was successfully developed. The system comprises a MAX30102 PPG sensor and an ESP-WROOM-32 microcontroller integrated with a GUI. The random forest algorithm, trained with a 90:10 training:validation split on a preprocessed dataset that underwent baseline wander correction and data filtering, was selected as the optimal model. The performance evaluation yielded an MAE of 4.38 for the SBP and 4.49 for the DBP, with an R² of 0.37 for the SBP and 0.46 for the DBP. One limitation was the small sample size (six participants) with a specific waveform shape, all of whom were healthy young people measured in the seated position, which may impair the model's capacity to learn and generalize patterns successfully. This limitation may affect diagnostic accuracy, treatment decision making, risk stratification, patient monitoring, and overall trust in predictive models. Considering these results, our future work will focus on incorporating additional data by increasing the number of participants of various ages and medical histories, which could improve the proposed model's accuracy and enable the measurement device to capture all possible combinations of systolic and diastolic BP values.

Author Contributions

Conceptualization, M.A.T. and M.R.; methodology, M.A.T. and N.E.A.; software, M.A.T. and N.E.A.; validation, M.A.T., D.S. and M.R.; formal analysis, M.A.T.; investigation, D.S.; resources, M.R.; data curation, M.A.T. and N.E.A.; writing—original draft preparation, M.A.T.; writing—review and editing, D.S. and M.R.; visualization, M.A.T.; supervision, D.S. and M.R.; project administration, M.R.; funding acquisition, M.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hibah Publikasi Terindeks Internasional (PUTI) Pascasarjana 2023–2024 from Universitas Indonesia, grant number NKB-265/UN2.RST/HKP.05.00/2023.

Institutional Review Board Statement

This project was ethically approved by the Faculty of Nursing Ethics Committee of Universitas Indonesia with the Institutional Review Board number KET-090/UN2.F12.D1.2.1/PPM.00.02/2023. The data were collected from each participant with informed consent. Data were anonymized, with no inclusion of personal details such as phone numbers or email addresses. All participants were given the option to discontinue their involvement at any point during this study.

Data Availability Statement

The datasets presented in this article are not readily available because the data are part of an ongoing study. Requests to access the datasets should be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cardiovascular Diseases. Available online: https://www.who.int/health-topics/cardiovascular-diseases#tab=tab_2 (accessed on 18 October 2022).
  2. Fakta dan Angka Hipertensi. Direktorat P2PTM. Available online: http://p2ptm.kemkes.go.id/kegiatan-p2ptm/subdit-penyakit-jantung-dan-pembuluh-darah/fakta-dan-angka-hipertensi (accessed on 26 September 2022).
  3. Liu, Z.; Zhou, C.; Wang, H.; He, Y. Blood Pressure Monitoring Techniques in the Natural State of Multi-Scenes: A Review. Front. Med. 2022, 9, 851172.
  4. Mukkamala, R.; Hahn, J.-O.; Chandrasekhar, A. Photoplethysmography in Noninvasive Blood Pressure Monitoring; Elsevier: Amsterdam, The Netherlands, 2022; pp. 359–400.
  5. World Health Organization. Coronavirus Disease (COVID-19); World Health Organization: Geneva, Switzerland, 2020; Available online: https://www.who.int/health-topics/coronavirus#tab=tab_1 (accessed on 18 October 2022).
  6. Gaurav, A.; Maheedhar, M.; Tiwari, V.N.; Narayanan, R. Cuff-Less PPG Based Continuous Blood Pressure Monitoring—A Smartphone Based Approach. In Proceedings of the 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, 16–20 August 2016.
  7. Xie, Q.; Wang, G.; Peng, Z.; Lian, Y. Machine Learning Methods for Real-Time Blood Pressure Measurement Based on Photoplethysmography. In Proceedings of the 2018 IEEE 23rd International Conference on Digital Signal Processing (DSP), Shanghai, China, 19–21 November 2018.
  8. Radha, M.; de Groot, K.; Rajani, N.; Wong, C.C.P.; Kobold, N.; Vos, V.; Fonseca, P.; Mastellos, N.; Wark, P.A.; Velthoven, N.; et al. Estimating Blood Pressure Trends and the Nocturnal Dip from Photoplethysmography. Physiol. Meas. 2019, 40, 025006.
  9. Slapničar, G.; Mlakar, N.; Luštrek, M. Blood Pressure Estimation from Photoplethysmogram Using a Spectro-Temporal Deep Neural Network. Sensors 2019, 19, 3420.
  10. Attivissimo, F.; D’Alessandro, V.I.; De Palma, L.; Lanzolla, A.M.L.; Di Nisio, A. Non-Invasive Blood Pressure Sensing via Machine Learning. Sensors 2023, 23, 8342.
  11. Kachuee, M.; Kiani, M.M.; Mohammadzade, H.; Shabany, M. Cuff-Less High-Accuracy Calibration-Free Blood Pressure Estimation Using Pulse Transit Time. In Proceedings of the 2015 IEEE International Symposium on Circuits and Systems (ISCAS) 2015, Lisbon, Portugal, 24–27 May 2015.
  12. Omer, O.A.; Salah, M.; Hassan, A.M.; Abdel-Nasser, M.; Sugita, N.; Saijo, Y. Blood Pressure Estimation from Photoplythmography Using Hybrid Scattering–LSTM Networks. BioMedInformatics 2024, 4, 139–157.
  13. Kachuee, M.; Kiani, M.M.; Mohammadzade, H.; Shabany, M. Cuffless Blood Pressure Estimation Algorithms for Continuous Health-Care Monitoring. IEEE Trans. Biomed. Eng. 2017, 64, 859–869.
  14. Liu, M.; Po, L.-M.; Fu, H. Cuffless Blood Pressure Estimation Based on Photoplethysmography Signal and Its Second Derivative. Int. J. Comput. Theory Eng. 2017, 9, 202–206.
  15. CDC. About High Blood Pressure (Hypertension); Centers for Disease Control and Prevention: Atlanta, GA, USA, 2021. Available online: https://www.cdc.gov/bloodpressure/about.htm (accessed on 5 November 2022).
  16. Kyriacou, P.A.; Chatterjee, S. The origin of photoplethysmography. In Photoplethysmography; Academic Press: Cambridge, MA, USA, 2022; pp. 17–43.
  17. MAX30102. Available online: https://www.analog.com/media/en/technical-documentation/data-sheets/max30102.pdf (accessed on 11 December 2022).
  18. ESP32-WROOM-32 Datasheet. Available online: https://www.espressif.com/sites/default/files/documentation/esp32-wroom-32_datasheet_en.pdf (accessed on 11 December 2022).
  19. Sah, S. Machine Learning: A Review of Learning Types. Preprints 2020.
  20. Schonlau, M.; Zou, R.Y. The Random Forest Algorithm for Statistical Learning. Stata J. Promot. Commun. Stat. Stata 2020, 20, 3–29.
  21. What Is Random Forest?|IBM. Available online: https://www.ibm.com/id-en/topics/random-forest#:~:text=Random%20forest%20is%20a%20commonly (accessed on 1 March 2024).
  22. BaselineRemoval: Perform Baseline Removal, Baseline Correction and Baseline Substraction for Raman Spectra Using Modpoly, ImodPoly and Zhang Fit. Returns Baseline-Subtracted Spectrum. PyPI. Available online: https://pypi.org/project/BaselineRemoval/ (accessed on 26 December 2022).
  23. Parsons, V.L. Stratified Sampling. In Wiley StatsRef: Statistics Reference Online; Wiley: Hoboken, NJ, USA, 2017; Volume 1, pp. 1–11.
  24. Scikit-Learn. Sklearn.Ensemble.RandomForestClassifier—Scikit-Learn 0.20.3 Documentation. Scikit-Learn.org. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html (accessed on 26 February 2024).
  25. Scikit-Learn. Sklearn.Model_Selection.GridSearchCV—Scikit-Learn 0.22 Documentation. Scikit-Learn.org. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html (accessed on 1 March 2024).
  26. Zhou, Z.-H. Ensemble Methods: Foundations and Algorithms; CRC Press: Boca Raton, FL, USA; London, UK; New York, NY, USA, 2012.
  27. Zhang, C.; Ma, Y. (Eds.) Ensemble Machine Learning; Springer: Boston, MA, USA, 2012.
  28. Brownlee, J. A Gentle Introduction to Ensemble Learning Algorithms. Machine Learning Mastery. Available online: https://machinelearningmastery.com/tour-of-ensemble-learning-algorithms/ (accessed on 2 December 2022).
  29. Ogutu, J.O.; Schulz-Streeck, T.; Piepho, H.-P. Genomic Selection Using Regularized Linear Regression Models: Ridge Regression, Lasso, Elastic Net and Their Extensions. In BMC Proceedings; BioMed Central: London, UK, 2012; Volume 6.
  30. XGBoost Parameters—Xgboost 1.5.2 Documentation. XGBoost.Readthedocs.io. Available online: https://xgboost.readthedocs.io/en/stable/parameter.html (accessed on 3 March 2024).
  31. Brownlee, J. How to Tune the Number and Size of Decision Trees with XGBoost in Python. Machine Learning Mastery. Available online: https://machinelearningmastery.com/tune-number-size-decision-trees-xgboost-python/ (accessed on 3 March 2024).
Figure 1. Schematic of the PPG waveform [16].
Figure 2. Overall system flow.
Figure 3. Device assembly.
Figure 4. Block diagram of AI model development.
Figure 5. Distribution of the SBP and DBP in the dataset before cleaning.
Figure 6. The results following the baseline wander correction.
Figure 7. PPG signal.
Figure 8. Results of the feature extraction.
Figure 9. Results of the GUI.
Figure 10. Comparison of the validation with the predicted value of the BP from the training model.
Table 1. The ranges and results of the random forest hyperparameters.
Hyperparameter | Range ¹ | Best Hyperparameter Value
N_estimators | [5, 100, 5] | 60
Max_depth | [1, 20, 1] | 15
Min_samples_split | [2, 10, 1] | 6
Min_samples_leaf | [1, 10, 1] | 1
Max_features | [1, 4, 1] | 1
¹ The notation [x, y, z] signifies the initial value, final value, and increment, respectively.
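The range notation in Table 1 can be expanded into explicit candidate lists to see how large the search space is. The sketch below assumes that the final value of each range is inclusive, which the footnote suggests but does not state outright; the names `ranges`, `expand`, and `grid` are illustrative, and the resulting dictionary has the shape that a grid search routine such as scikit-learn's GridSearchCV [25] accepts as its `param_grid` argument.

```python
# Ranges from Table 1 as (initial, final, increment).
ranges = {
    "n_estimators": (5, 100, 5),
    "max_depth": (1, 20, 1),
    "min_samples_split": (2, 10, 1),
    "min_samples_leaf": (1, 10, 1),
    "max_features": (1, 4, 1),
}

# Best values reported in Table 1.
best = {
    "n_estimators": 60,
    "max_depth": 15,
    "min_samples_split": 6,
    "min_samples_leaf": 1,
    "max_features": 1,
}


def expand(initial, final, step):
    """List every candidate value, treating the final value as inclusive."""
    return list(range(initial, final + 1, step))


grid = {name: expand(*spec) for name, spec in ranges.items()}

n_combinations = 1
for values in grid.values():
    n_combinations *= len(values)

for name, values in grid.items():
    print(f"{name}: {len(values)} candidates, best = {best[name]}")
print(f"total combinations: {n_combinations}")
```

Under this reading, an exhaustive search evaluates every one of these combinations, and each combination is refit once per cross-validation fold, so the cost grows quickly with every added hyperparameter.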
Table 2. Training evaluation results for the SBP and DBP values using the random forest algorithm.
BP | MAE | R²
SBP | 4.38 | 0.37
DBP | 4.49 | 0.46
Table 3. The testing evaluation results of the random forest model using the testing dataset.
BP | Actual (mmHg) | Prediction (mmHg) | Absolute Error | MAE | R²
SBP | 95 | 103 | 8 | 4.43 | −1.08
 | 104 | 104 | 0 | |
 | 96 | 100 | 4 | |
 | 95 | 105 | 10 | |
 | 103 | 104 | 1 | |
 | 97 | 104 | 7 | |
 | 104 | 103 | 1 | |
DBP | 57 | 63 | 6 | 3.71 | −2.13
 | 63 | 65 | 2 | |
 | 61 | 61 | 0 | |
 | 59 | 66 | 7 | |
 | 63 | 64 | 1 | |
 | 55 | 64 | 9 | |
 | 61 | 62 | 1 | |
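The MAE and R² values in Table 3 can be recomputed from the listed actual and predicted values. The sketch below uses the usual definitions, MAE as the mean of the absolute errors and R² as one minus the ratio of the residual sum of squares to the total sum of squares; that the authors used exactly these definitions is an assumption, although the results agree with the table to within rounding.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Test records from Table 3 (mmHg).
sbp_actual = [95, 104, 96, 95, 103, 97, 104]
sbp_predicted = [103, 104, 100, 105, 104, 104, 103]
dbp_actual = [57, 63, 61, 59, 63, 55, 61]
dbp_predicted = [63, 65, 61, 66, 64, 64, 62]

sbp_mae, sbp_r2 = mae(sbp_actual, sbp_predicted), r2(sbp_actual, sbp_predicted)
dbp_mae, dbp_r2 = mae(dbp_actual, dbp_predicted), r2(dbp_actual, dbp_predicted)

print(f"SBP: MAE = {sbp_mae:.3f}, R2 = {sbp_r2:.3f}")
print(f"DBP: MAE = {dbp_mae:.3f}, R2 = {dbp_r2:.3f}")
```

A negative R² means that the residual sum of squares exceeds the total sum of squares, that is, on these seven records the predictions are further from the actual values than the mean of the actual values would be. With so few test records spanning a narrow BP range, the denominator is small, so R² is very sensitive to individual errors.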
Table 4. Results of the XGBoost hyperparameters.
Hyperparameter | Best Hyperparameter Value
Learning_rate | 0.1
Max_depth | 6
N_estimators | 40
Gamma | 1
Alpha | 3
Lambda | 1
Table 5. Training evaluation results for the SBP and DBP values using the random forest and XGBoost algorithms.
Algorithm | Blood Pressure | MAE | R²
Random Forest | SBP | 4.38 | 0.37
 | DBP | 4.49 | 0.46
XGBoost | SBP | 4.44 | 0.35
 | DBP | 4.69 | 0.36
Table 6. Comparison with other studies.
Author | Data Acquisition | Method | MAE SBP/DBP (mmHg) | Other Metrics SBP/DBP
Gaurav et al. [6] | Samsung Galaxy Note5 | Min-max scaling and artificial neural network | 6.9/5 | -
Xie et al. [7] | Two PPG probes | Extracted eight features (the time and area under selected points such as pulse onset, systolic peak, dicrotic notch, and diastolic peak) and random forest regression | - | MAD & STD = 4.21 ± 7.59/3.24 ± 5.39
Slapničar et al. [9] | MIMIC III database | First and second derivative signal and the spectro-temporal deep neural network | 9.43/6.88 | -
Attivissimo et al. [10] | MIMIC III | Maximal overlap discrete wavelet transform (MODWT) and XGBoost | 3.12/2.11 | RMSE = 5.67/3.95; R = 0.95/0.91; ME = 0.01/0.02
Kachuee et al. [11] | MIMIC II (1000 subjects) | Pulse transit time and support vector machine method | 12.38/6.34 | -
Omer et al. [12] | MIMIC II | Wavelet scattering transform (WST) and LSTM | 13.3852/9.5390 | RMSE = 15.4742/11.2034
Kachuee et al. [13] | MIMIC II (1000 subjects) | Feature extraction and AdaBoost | 8.21/4.31 | -
Radha et al. [8] | Wrist-worn PPG sensor (106 subjects) | LSTM | - | RMSE = 8.22 ± 1.49/6.55 ± 1.39
Liu et al. [14] | MIMIC II | Second derivative PPG features and support vector regression | 8.54/4.34 | -
Our study | PPG sensor MAX30102 | Baseline wander correction, feature extraction, and random forest | 4.38/4.49 | R² = 0.37/0.46
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Tjitra, M.A.; Anju, N.E.; Sudiana, D.; Rizkinia, M. A Wireless Noninvasive Blood Pressure Measurement System Using MAX30102 and Random Forest Regressor for Photoplethysmography Signals. Computers 2024, 13, 125. https://doi.org/10.3390/computers13050125
