1. Introduction
Wearable devices have garnered increasing attention owing to their various applications [1,2]. Among them, smart watches provide healthcare services, such as measuring body composition and heart rate, by employing electrocardiography (ECG), bioelectrical impedance analysis (BIA), and photoplethysmography (PPG) sensors [3,4]. In contrast, true wireless stereo (TWS) does not yet provide healthcare services, although heart rate and oxygen saturation can be measured from the arterial blood of the outer ear. Providing healthcare services with TWS requires mounting a new biosensor, which is difficult because TWS offers only a small area. Therefore, a single sensor needs to provide multiple functions simultaneously.
PPG sensors are widely employed in wearable devices [5,6]. PPG sensors emit infrared rays to the skin and measure the amount of blood flow by determining the amount of rays absorbed by red blood cells. Because this operating principle makes the PPG data depend on the heartbeat, PPG sensors support healthcare services such as heart rate measurement, breathing rate estimation, atrial fibrillation detection, and blood pressure measurement [7,8,9,10]. In addition, since the heart rate has specific patterns, deep learning has been applied to pattern recognition on PPG signals [11,12]. Atrial fibrillation detection with a hybrid model of a convolutional neural network (CNN) and a recurrent neural network (RNN) achieved an accuracy of over 99% [13]. The DeepCNAP model for heart rate measurement using PPG signals was presented [14]. A deep learning model for robust PPG wave detection was proposed in [5]. The best-performing model was a CNN-long short-term memory (LSTM) algorithm with a PPG synchro-squeezed Fourier transform (SSFT); its accuracy, precision, and recall were 0.894, 0.923, and 0.914, respectively.
The PPG data are also influenced by the subject’s skin characteristics and motion artifacts; these factors introduce noise into the raw data [6,15,16]. To reduce the signal noise, a motion reduction technique for respiratory rate estimation was proposed [17]. The technique reduced motion interference by removing similar spectra with an accelerometer sensor and an adaptive filter. Applying the technique to the raw PPG data produced a clear spectrum. Similarly, the extraction of heart rate and respiration rate values from raw PPG data, with a three-axis accelerometer as the motion reference, was studied [18]. This study proposed an adaptive notch-filtration architecture, which comprises an adaptive moving average filter, an adaptive notch filter, and the extraction of physiological parameters. With the proposed filter, the heart rate and respiratory rate calculated from the filtered PPG signals were similar to measurements from commercial devices for both the IEEE-SPC dataset and the in-house dataset. For noise reduction, an enhanced empirical wavelet transform algorithm was proposed [19]. This algorithm employs a fast Fourier transform and an order statistical filter. Compared with other conventional methods, the proposed method showed the best accuracy.
In addition to studies that employ the raw PPG data [7], studies that reduce the instability of the raw PPG data by applying several filters have been conducted [20,21,22,23,24,25,26]. Multi-mode particle filtering methods were introduced in [23]; on approximately 47 PPG recordings, they achieved an average error of less than 2 BPM, an improvement over single-mode particle filtering and other advanced methods. Two cutting-edge pulse detection algorithms were studied on actual raw PPG data [24]. This work demonstrated the effect of preprocessing on pulse peak positions, and the performance of the peak detection algorithm was analyzed on 21,806 pulses [25]. Meanwhile, a data compression method with stochastic modeling for power efficiency was studied [26]. The method, which models a single cardiac period of the PPG waveform by applying two sets of Gaussian functions to the forward and backward waves of the PPG pulse, outperformed conventional delta-modulation-based methods.
In this paper, we propose an algorithm that estimates the distance between the user and the sensor based on a waveform adjustment (WA) filter for the PPG data of TWS. Mounting the PPG sensor on TWS allows various healthcare services to be implemented with one sensor. However, because of the size limitation of the TWS, an existing sensor has to be removed when the PPG sensor is built into the TWS. Accordingly, the PPG sensor must take over the function that was implemented with the removed sensor.
Figure 1 shows the working principle of the PPG sensor mounted on TWS. Because the PPG data are output as amplitude values, the data differ according to the distance between the user and the sensor. By utilizing this characteristic of the PPG data, the existing distance estimation function of the wearable device can be replaced. We designed a PPG monitoring testbed for collecting analog PPG signals and a signal processing logic for distance estimation. Owing to the instability of the PPG signal, the distance estimation logic includes a filter for noise reduction. We developed our PPG dataset according to three criteria for distance estimation. Various machine learning models were trained on the dataset, and we analyzed the performance of each model according to the inference results. The highest accuracy was 92.5% with the proposed model when the signal length was 15.
The contributions of this paper are as follows. To the best of our knowledge, it is the first work that proposes distance estimation with a PPG sensor. To provide healthcare services by mounting PPG sensors on the TWS, we designed a distance estimation algorithm that increases the area efficiency of the TWS by replacing the existing distance estimation sensor with the PPG sensor. A digital filter IP and an analog-to-digital converter (ADC) controller were designed in Verilog HDL and implemented on a field-programmable gate array (FPGA).
The remainder of this paper is organized as follows. In Section 2, we introduce the system architecture of the distance estimation for the PPG sensor, which includes the function for waveform adjustment and MobileNet, which is a lightweight deep-learning model. Section 3 presents the flow of the proposed algorithm. Section 4 explains the implementation of the proposed algorithm and analyzes the results. Finally, a discussion is provided in Section 5.
4. Experiment
Figure 7 illustrates the experimental environment. A Xilinx Artix-7 FPGA development board was used for the EISC processor and the digital filter IP. We collected the PPG dataset with the monitoring testbed. The total dataset comprised 144,000 sampling points: for each criterion of the distance estimation, we collected 600 s twice from one person and another 600 s from each of six people. To reduce the similarity between the datasets, out of the 6000 sampling points of each recording, the 1500 sampling points from 500 to 1999 were set as the training dataset, and the 3500 sampling points from 2000 to 5499 were set as the inference dataset. Accordingly, the total numbers of sampling points in the training and inference datasets are 36,000 and 84,000, respectively.
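The split sizes above can be checked with a short calculation. The following is a minimal sketch that assumes 24 recordings of 6000 sampling points each (three distance criteria, with two recordings from one person and one from each of six people per criterion); this layout is our reading of the description, and the names are illustrative only.

```python
# Assumed layout (not stated explicitly in the text): 3 distance criteria,
# each with 2 recordings from one person plus 1 recording from each of 6 people.
NUM_CRITERIA = 3
RECORDINGS_PER_CRITERION = 2 + 6
POINTS_PER_RECORDING = 6000  # 600 s per recording


def split_recording(samples):
    """Return the (training, inference) parts of one recording.

    Points 500..1999 are used for training and points 2000..5499 for inference.
    """
    return samples[500:2000], samples[2000:5500]


num_recordings = NUM_CRITERIA * RECORDINGS_PER_CRITERION
train_part, infer_part = split_recording(list(range(POINTS_PER_RECORDING)))

total_points = num_recordings * POINTS_PER_RECORDING
total_train = num_recordings * len(train_part)
total_infer = num_recordings * len(infer_part)
print(total_points, total_train, total_infer)
```

Under this assumed layout, the totals agree with the 144,000, 36,000, and 84,000 sampling points reported above.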
To verify the WA filter for the PPG signal, we designed a Kalman filter, a short-time Fourier transform (STFT), a modified average filter, and a bandpass filter (BPF)+single moving average (SMA) filter for performance comparison. The Kalman filter is a recursive algorithm that estimates unknown variables from previous and present data while reducing noise [30]. This filter is applicable when the motion and measurement models are linear and the noise follows a Gaussian distribution. The Kalman filter process comprises prediction and update steps. In the prediction step, the prediction vector is calculated using the motion model and the previous state vector. In the update step, the Kalman gain is computed and applied to the difference between the measurement and prediction vectors to determine the state vector. Through this recursive process, the Kalman filter outputs the state vector as the denoised data.
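The prediction and update steps can be illustrated with a scalar example. The sketch below assumes a constant-state motion model and hand-picked noise variances (`process_var`, `measurement_var`); these are illustrative assumptions and not the parameters of the filter used in the experiment.

```python
def kalman_filter_1d(measurements, process_var=1e-3, measurement_var=0.1):
    """Denoise a scalar signal with a constant-state Kalman filter.

    process_var and measurement_var are assumed noise variances.
    """
    estimate = measurements[0]  # initial state taken from the first sample
    error_var = 1.0             # initial uncertainty of the estimate
    filtered = []
    for z in measurements:
        # Prediction step: the constant model keeps the state, uncertainty grows.
        predicted = estimate
        predicted_var = error_var + process_var
        # Update step: the gain weights the measurement-prediction difference.
        gain = predicted_var / (predicted_var + measurement_var)
        estimate = predicted + gain * (z - predicted)
        error_var = (1.0 - gain) * predicted_var
        filtered.append(estimate)
    return filtered


noisy = [1.0, 1.2, 0.8, 1.1, 0.9] * 10
smoothed = kalman_filter_1d(noisy)
print(max(noisy) - min(noisy), max(smoothed[-10:]) - min(smoothed[-10:]))
```

A smaller `process_var` relative to `measurement_var` makes the filter trust its prediction more, which smooths the output at the cost of a slower response to real changes.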
STFT is a technique used in audio signal processing [31]. We expected the STFT to be appropriate for the PPG signal because it computes the distribution of frequency over time. STFT performs the Fourier transform while moving a window of a specific length along the signal. In this case, the Fourier transform is calculated several times for a specific time, and the frequency spectrum at that specific time is obtained by averaging the calculated results. The most influential variable is the window length. It is important to set a proper window length because the resolution in the frequency domain decreases if the window is short, and the resolution in the time domain decreases if the window is long. Hence, we set the window length to 60 sampling points.
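The sliding-window computation can be sketched with the standard library alone. The example below uses the window length of 60 sampling points from the text; the hop size, the Hann window, the 10 Hz sampling rate (inferred from 6000 points per 600 s), and the 1 Hz test tone are assumptions made only for illustration.

```python
import cmath
import math


def stft(signal, window_length=60, hop=30):
    """Return one magnitude spectrum per window position."""
    # Hann window to limit spectral leakage (an assumption, not from the paper).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_length - 1))
              for n in range(window_length)]
    frames = []
    for start in range(0, len(signal) - window_length + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(window_length)]
        spectrum = []
        # Direct DFT over the non-negative frequency bins.
        for k in range(window_length // 2 + 1):
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / window_length)
                      for n in range(window_length))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


fs = 10.0  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 1.0 * n / fs) for n in range(300)]
frames = stft(tone)
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
print(len(frames), peak_bin * fs / 60)
```

With a window of 60 points at the assumed 10 Hz rate, each frequency bin spans 1/6 Hz; a longer window would narrow the bins but blur changes over time, which is the trade-off described above.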
The modified average filter is a filter for noise reduction. Because amplitude fluctuation due to signal bouncing is fatal in estimating the distance between the user and the sensor, we tried to correct bounced PPG signals based on the average. Accordingly, among the sampling points of the PPG signal, values greater than twice the average or less than half the average are regarded as incorrect data and replaced with the average. However, this filter discards the relationship between preceding and following data, so its accuracy in the performance analysis indicates how important this relationship is to the proposed algorithm. The modified average filter is employed to verify whether distance estimation is feasible with noise reduction alone.
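The replacement rule can be written in a few lines. This is a minimal sketch that assumes positive amplitude values and computes the average over the whole input segment; the segment length used in the experiment is not specified here, so that choice is an assumption.

```python
def modified_average_filter(samples):
    """Replace values above 2x or below 1/2 of the average with the average.

    Assumes positive amplitudes; the average is taken over the given segment.
    """
    mean = sum(samples) / len(samples)
    upper, lower = 2.0 * mean, 0.5 * mean
    return [mean if (x > upper or x < lower) else x for x in samples]


segment = [10, 11, 9, 40, 10, 2]
print(modified_average_filter(segment))
```

Each sample is judged independently of its neighbors, which is why the filter does not preserve the relationship between preceding and following data.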
The BPF+SMA filter is a hybrid filter for noise reduction. The BPF discards noise by passing only a specific frequency band. The SMA filter outputs the mean of previous data. As the number of previous data points increases, the SMA becomes less sensitive to changes in the data and more robust to noise. In contrast, the SMA becomes more sensitive to changes and less robust to noise when the number of previous data points decreases. Therefore, the BPF+SMA filter is more effective in denoising than either filter alone.
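The effect of the number of previous data points on the SMA can be seen in a small example. The sketch below covers only the moving-average stage (the BPF stage is omitted), and the window sizes and the test signal are illustrative assumptions.

```python
def moving_average(samples, window):
    """Mean of the current sample and up to window - 1 previous samples."""
    out = []
    running = 0.0
    for i, x in enumerate(samples):
        running += x
        if i >= window:
            running -= samples[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


spike = [0.0] * 5 + [10.0] + [0.0] * 5  # a single noisy spike
short = moving_average(spike, 2)
long_ = moving_average(spike, 5)
print(max(short), max(long_))
```

The longer window attenuates the spike more strongly, but it would also respond more slowly to a genuine change in amplitude.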
Analysis of the raw PPG signal at each distance shows that the amplitude becomes smaller and the noise increases as the distance between the user and the sensor increases. If the noise is significant, the amplitude of the near-distance data becomes similar to the amplitude of the far-distance data. Therefore, it is important to minimize this effect when estimating the distance between the user and the sensor.
Figure 8b, Figure 9b and Figure 10b present the results obtained by the Kalman filter. Because the Kalman filter is a recursive filter based on the original data, no significant differences exist between the raw PPG signal and the filtered data. However, the values of the raw and Kalman-filtered data differ from each other, and large noise components are clearly removed. The STFT results demonstrate that Figure 10c, the spectrogram for 0.8 mm, differs from Figure 8c, the spectrogram for 0 mm; however, Figure 8c and Figure 9c do not differ significantly. The modified average filter exerts more influence when the average of the amplitudes is low because the filter is based on the average. A comparison of Figure 10a,d shows that the noise disappears; however, the filter is not effective for large amplitudes, as illustrated in Figure 8a. Figure 8e, Figure 9e and Figure 10e present the results of the BPF+SMA filter. Because the SMA filter was applied after noise reduction, the overall amplitude decreased significantly. In addition, the relationship between preceding and following data disappeared. Figure 8f, Figure 9f and Figure 10f show the results of the proposed filter. Although the waveform appears to converge to one value, it is filtered while maintaining the relationship for all distance criteria.
To verify the suitability of MobileNet for the distance estimation, we compared it with Intellino, LeNet-5, and a calculation method that uses the difference between amplitudes without AI. Intellino is an AI based on a distance-calculation k-nearest neighbor algorithm rather than a layer architecture [32]. By reducing the multiplier with the Manhattan distance, its suitability for embedded systems was verified. The accuracy of Intellino on audio signal and image data was measured at 0.91 and 0.94, respectively [31,33]. Intellino can be evaluated by freely reconfiguring the size of the input data and the number of neuron cells using the simulator [34]. LeNet-5 is a representative AI for optical character recognition, which has a seven-layer CNN architecture [28]. It includes convolution layers, sub-sampling layers, and a fully-connected layer. Because the sub-sampling layers reduce the size of the output data, the minimum size of the input data is 32 × 32. Accordingly, we only utilized 120 and 60 as signal lengths. We analyzed the accuracy of the proposed algorithm via the combinations of various filters and four distance estimation methods; the obtained results are presented in Table 1, Table 2, Table 3 and Table 4. An Intel Core i5-2500 CPU and 6 GB of RAM were used for the performance analysis. For the four distance-estimation methods, the tables demonstrate that the accuracy of the WA filter is higher than the accuracy of the other filters. Among all the combinations of the filters and distance estimation methods, Intellino exhibits the highest accuracy. However, as presented in Table 5, its inference time is long compared to that of MobileNet.
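The distance-based classification mentioned for Intellino can be illustrated with a generic k-nearest neighbor classifier that uses the Manhattan distance, which requires only subtractions, absolute values, and additions. This is a sketch of the general technique, not Intellino's implementation or API; the function names, amplitude vectors, and labels are hypothetical.

```python
from collections import Counter


def manhattan(a, b):
    """Sum of absolute differences; no multiplications are needed."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(train_vectors, train_labels, query, k=1):
    """Return the majority label among the k training vectors closest to query."""
    ranked = sorted(range(len(train_vectors)),
                    key=lambda i: manhattan(train_vectors[i], query))
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical amplitude windows for three distance classes.
train_vectors = [[100, 98, 102], [60, 61, 59], [20, 22, 19]]
train_labels = ["near", "middle", "far"]
print(knn_classify(train_vectors, train_labels, [58, 63, 60]))
```

Because every query is compared against the stored vectors, the inference cost of this kind of classifier grows with the number of stored vectors.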
We analyzed other metrics such as precision, recall, and F1 score for MobileNet. Precision is the proportion of the samples that the model classifies as true that are actually true. Recall is the proportion of the actually true samples that the model classifies as true. The F1 score is the harmonic mean of precision and recall. The precision, recall, and F1 score of MobileNet for the WA-filtered data are shown in Table 6. Similar to the accuracy results, the precision, recall, and F1 score were the highest when the signal length was 15. In conclusion, the combination of the WA filter and MobileNet for distance estimation achieves high accuracy and practical inference time.
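These metrics can be computed per class from the predicted and true labels. The sketch below treats one class as positive at a time; the labels are made-up examples, and how the per-class values were averaged for Table 6 is not specified here.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 score for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Made-up labels for three distance classes (0, 1, 2).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(precision_recall_f1(y_true, y_pred, positive=1))
```

In this example, class 1 has two true positives, one false positive, and no false negatives, so its recall is higher than its precision.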
In addition, we verified the proposed algorithm for PPG signals measured at the wrist. The dataset obtained from wrists comprised 54,000 sampling points. MobileNet was trained on the finger and wrist datasets and evaluated on the wrist dataset. The amplitudes of the PPG signal measured at the wrist did not differ as clearly with distance as those of the PPG signal measured at the finger. As a result, the accuracy, precision, recall, and F1 score were 80.7%, 80.7%, 81.0%, and 80.8%, respectively, when the signal length was 20.