#### *2.5. The Detection Algorithm*

Two main approaches were taken in the design of the heartbeat detector: classical shallow feed-forward neural networks (FF ANN) and the nonlinear autoregressive exogenous model network (NARX), as a representative of the feedback-based topologies. The NARX model took as input the current sample in the radar data stream and the previous 100 samples, together with their calculated outputs. The NARX topology was tested with a single hidden layer of 10 and of 20 neurons (NARX 10 1 and NARX 20 1, respectively). For the classic shallow FF ANN, four configurations were tested: (1) a single hidden layer with 10 neurons, FF 10 1; (2) a single hidden layer with 20 neurons, FF 20 1; (3) two hidden layers with 20 and 2 neurons, respectively, FF 20 2; and (4) two hidden layers with 40 and 4 neurons, respectively, FF 40 4. The output layer in all cases consisted of a single neuron with a sigmoid activation function. The activation functions of the hidden layers were also varied during this topology search and included hyperbolic tangent, log-sigmoid and linear transfer functions. Furthermore, for the FF ANNs, different loss functions were tested: the mean squared normalized error (MSE), the mean squared error with regularization (MSEREG) and the sum squared error (SSE).

Because the FF ANN with a single hidden layer of 20 neurons and a hyperbolic tangent activation function, trained using Levenberg–Marquardt optimization and the MSE loss function, outperformed all the other ANN topologies, the results presented in this paper focus mainly on this FF 20 1 ANN topology. The complete flowchart, including the preprocessing and the detection algorithm based on the FF 20 1 ANN, is presented in Figure 5. The ANN input consisted of the unprocessed in-phase and quadrature components of the Doppler radar signal (the discretized and resampled *I*(*t*) and *Q*(*t*) signals from Equation (3) and Equation (4), respectively). In the FF ANN topology, each neuron in the hidden and output layers calculated a linear combination of its inputs, as given in Equation (5):

$$a_i^j = \sum_{i=1}^{N} w_i^j a_i^{j-1} + b_i^j, \tag{5}$$

where *j* refers to the current layer, *N* is the number of inputs to the current layer, *w* are the weights and *b* are the corresponding biases. The output of each neuron was passed through the hyperbolic tangent function, with the exception of the output neuron, which used a sigmoid function. The weights were calculated through a numerical optimization of the mean squared error loss function in Equation (6):

$$MSE = \frac{1}{M} \sum_{i=1}^{M} \left( b_i - a_i \right)^2, \tag{6}$$

where *M* is the number of data points, *b<sub>i</sub>* is the binarized target signal and *a<sub>i</sub>* is the network output. This was an iterative procedure that was set to run for a maximum of 1000 epochs or to stop early if the solution became sufficiently close to the minimum, that is, if the gradient became smaller than 10<sup>−7</sup>. The training would also stop if the error on the portion of the data set aside for validation (30%) failed to decrease for 6 consecutive epochs.
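To make Equations (5) and (6) concrete, the following is a minimal Python sketch of a forward pass through a network with the FF 20 1 shape (200 inputs, 20 hyperbolic tangent hidden neurons, one sigmoid output) and of the MSE loss. The function names and the random placeholder weights are illustrative assumptions; they are not the trained weights or the MATLAB implementation used in this study, and the Levenberg–Marquardt training itself is not shown.

```python
import math
import random


def tanh_layer(inputs, weights, biases):
    """Hidden layer: weighted sum of the inputs plus bias (Equation (5)), then tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def sigmoid(z):
    """Logistic sigmoid used by the single output neuron."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: one tanh hidden layer followed by one sigmoid output neuron."""
    hidden = tanh_layer(x, w_hidden, b_hidden)
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(z)


def mse(targets, outputs):
    """Equation (6): mean squared error between binarized targets and network outputs."""
    return sum((b - a) ** 2 for b, a in zip(targets, outputs)) / len(targets)


# Placeholder weights for illustration only (assumption: NOT the trained values).
rng = random.Random(0)
N_IN, N_HID = 200, 20
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(N_IN)] for _ in range(N_HID)]
b_hidden = [0.0] * N_HID
w_out = [rng.uniform(-0.1, 0.1) for _ in range(N_HID)]
b_out = 0.0

x = [rng.uniform(-1.0, 1.0) for _ in range(N_IN)]  # stand-in for one 200-sample input
y = forward(x, w_hidden, b_hidden, w_out, b_out)   # a value strictly between 0 and 1
```

Because the output neuron is a sigmoid, each output can be read as a per-sample heartbeat probability, which is what the later smoothing and peak detection stages operate on.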

To remove fast noisy changes, the sequence of outputs of the ANN calculated for each sample was smoothed. This stage of the detection algorithm was implemented as a moving average filter with a width of 10 consecutive ANN outputs.
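The smoothing stage can be sketched as a running mean over the last 10 network outputs. The sketch below assumes a causal (trailing) window, which suits sample-by-sample processing; whether the original filter was causal or centered, and how it handled the first few samples, is not stated in the text.

```python
from collections import deque


def moving_average(outputs, width=10):
    """Causal moving average over the most recent `width` ANN outputs.

    During start-up, before `width` outputs are available, the mean is taken
    over the outputs seen so far (an assumption made for this sketch).
    """
    window = deque(maxlen=width)
    running_sum = 0.0
    smoothed = []
    for value in outputs:
        if len(window) == width:
            running_sum -= window[0]  # the oldest value is about to be dropped
        window.append(value)
        running_sum += value
        smoothed.append(running_sum / len(window))
    return smoothed
```

At a 100 Hz output rate, a 10-sample window spans 100 ms, so the filter suppresses isolated sample-to-sample spikes while leaving the much slower heartbeat-related probability bumps intact.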

The next stage of the algorithm was the peak detection subroutine, which marked local maxima of the continuous probability output while imposing constraints on the minimal distance between consecutive peaks, based on the known physiological range at rest of 40–120 beats/min [38,39], and on their prominence (detection amplitude). When the duration of a detected inter-pulse interval (IPI) was twice as large as the previously detected IPIs (within the established constraints), a beat was interpolated as having occurred at the arithmetic mean of the times of the current and previously detected heartbeats. The detection amplitude was defined empirically for each ANN topology on a small test sample, using the error in the number of detected heartbeats as the metric. The same detection amplitude was then used for all the subjects.
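The peak detection and beat interpolation described above can be sketched in a few lines of Python. This is an illustrative simplification under stated assumptions: a plain height threshold (`min_height`, a hypothetical value of 0.5) stands in for the empirically chosen prominence, the minimal peak distance is derived from the 120 beats/min upper limit at a 100 Hz rate, and the comparison is made against the single preceding interval.

```python
def detect_peaks(signal, fs=100.0, max_bpm=120.0, min_height=0.5):
    """Indices of local maxima above `min_height`, spaced by at least the
    shortest plausible beat period (0.5 s, i.e. 50 samples, at 120 beats/min)."""
    min_distance = int(round(fs * 60.0 / max_bpm))
    peaks = []
    for n in range(1, len(signal) - 1):
        is_local_max = signal[n - 1] < signal[n] >= signal[n + 1]
        if not is_local_max or signal[n] < min_height:
            continue
        if peaks and n - peaks[-1] < min_distance:
            # Too close to the previous peak: keep only the taller of the two.
            if signal[n] > signal[peaks[-1]]:
                peaks[-1] = n
            continue
        peaks.append(n)
    return peaks


def fill_missed_beats(peaks, ratio=2.0):
    """Insert a beat midway across any interval at least `ratio` times the previous one."""
    if len(peaks) < 3:
        return list(peaks)
    filled = list(peaks[:2])
    prev_ipi = peaks[1] - peaks[0]
    for p in peaks[2:]:
        ipi = p - filled[-1]
        if ipi >= ratio * prev_ipi:
            mid = (filled[-1] + p) // 2  # arithmetic mean of the two beat positions
            filled.append(mid)
            ipi = p - mid
        filled.append(p)
        prev_ipi = ipi
    return filled
```

For example, detections at samples 100, 200 and 400 would be completed to 100, 200, 300 and 400, since the second interval is twice the first.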

**Figure 5.** Flowchart for the proposed method for heartbeat detection based on the classical shallow feed-forward neural network with a single hidden layer with 20 neurons (FF 20 1 ANN). The artificial neural network (ANN) input is the 200-sample long vector containing resampled 100 in-phase and 100 quadrature component samples. For the FF 20 1 ANN there are 20 neurons in the single hidden layer and 1 neuron in the output layer. MA Filter—moving average filter.
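As an illustration of how the 200-sample input in Figure 5 could be assembled from the two radar channels, the sketch below concatenates the latest 100 in-phase samples with the latest 100 quadrature samples. The ordering of the samples within the vector (all I samples first, then all Q samples, oldest to newest) and the function name are assumptions made for this example; the text does not specify the layout.

```python
def input_vector(i_samples, q_samples, n, window=100):
    """Return the 2*window-long ANN input ending at sample index n (inclusive).

    Assumed layout: `window` in-phase samples followed by `window` quadrature
    samples, each ordered from oldest to newest.
    """
    start = n - window + 1
    if start < 0:
        raise ValueError("not enough samples to fill the input buffer yet")
    return list(i_samples[start:n + 1]) + list(q_samples[start:n + 1])
```

At a 100 Hz sampling rate, the buffer holds 1 s of signal, so the first network output is only available once 100 samples per channel have been collected.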

#### *2.6. Error Estimation and Statistical Analysis*

The inter-pulse interval was calculated as the time elapsed between every two adjacent heartbeats. The classification error was determined through the percentage error in the total number of detected heartbeats and the error in the estimation of median IPIs. The similarities were also assessed between the distributions of the ANN-detected IPIs and those extracted from the reference ECG method.
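The two error measures can be written down compactly. The sketch below is an illustration only: the function names are hypothetical, and the sign convention of the median difference (detected minus reference) is an assumption, since the text does not state the direction of the subtraction.

```python
from statistics import median


def inter_pulse_intervals(beat_times):
    """Time elapsed between every two adjacent heartbeats."""
    return [t1 - t0 for t0, t1 in zip(beat_times, beat_times[1:])]


def count_error_percent(detected_beats, reference_beats):
    """Percentage error in the number of beats; negative means fewer detected."""
    n_ref = len(reference_beats)
    return 100.0 * (len(detected_beats) - n_ref) / n_ref


def median_ipi_difference(detected_beats, reference_beats):
    """Median detected IPI minus median reference IPI (assumed sign convention)."""
    return (median(inter_pulse_intervals(detected_beats))
            - median(inter_pulse_intervals(reference_beats)))
```

With 104 beats missing from a reference of 5144, `count_error_percent` evaluates to roughly −2%, matching the way negative values are described in the footnote of Table 1.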

The performance was evaluated using a three-fold cross-validation: the data were split into three equal subject groups (folds), each containing recordings acquired from 7 subjects, out of which two folds were used for training and one for testing. The training and testing were repeated three times, each time with a different fold held out for evaluation (as shown in Figure 6).
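A subject-wise three-fold split of this kind can be sketched as follows. How the 21 subjects were actually assigned to folds is not stated, so the consecutive grouping of subject identifiers used here, like the function name, is an assumption for illustration.

```python
def subject_folds(subject_ids, n_folds=3):
    """Split subjects into equal folds and return (train, test) pairs.

    Every subject appears in exactly one test fold, and never in the training
    set of the split in which it is tested.
    """
    ids = list(subject_ids)
    if len(ids) % n_folds:
        raise ValueError("number of subjects must be divisible by n_folds")
    size = len(ids) // n_folds
    folds = [ids[k * size:(k + 1) * size] for k in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        splits.append((train, test))
    return splits


splits = subject_folds(range(1, 22))  # 21 subjects -> 14 for training, 7 for testing
```

Splitting by subject rather than by recording or by sample is what keeps all signals from one person on the same side of the train/test boundary.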

The evaluation metrics were calculated over the set of predictions obtained on the three folds used in the test mode. A statistical analysis was performed on the results obtained via radar and those extracted from the reference ECG signal. The number of detected heartbeats and the median IPI for all the recordings used for testing were compared to those calculated from the reference signal. Apart from group evaluations, a statistical comparison was also performed at the level of individual heart event detection for each of the 21 subjects in the test mode. Because one or more samples in every statistical test were found to be significantly non-normal (Lilliefors test with a 0.05 significance level), the Wilcoxon signed-rank test was used for the statistical comparisons.

All processing was done in MATLAB R2018b (The MathWorks, Inc., Natick, Massachusetts, United States).

**Figure 6.** Database organization for the purpose of the ANN training and the cross-validation. The dataset was divided into three equal subsets and two subsets were used for training, while the remaining subset was used for testing. This process was repeated for all the combinations of the subsets in the training set. The three training datasets comprised 3473, 3429 and 3386 heartbeats, while the three testing datasets comprised 1671, 1715 and 1758 heartbeats.

#### **3. Results**

The results obtained using six different tested ANN topologies are presented in Table 1.

The smallest error in the number of detected beats (count error, CNT error) was achieved by the FF ANN with 20 neurons in the hidden layer (2%), which was notably better than any other tested topology (12.3% < CNT Error < 34.3%). This topology also had the smallest error when the median IPIs were compared. When compared at the level of the individual IPIs, the number of subjects whose median was not significantly different from the reference was also higher than for the other topologies, but was still relatively low.


**Table 1.** Error metrics for the six different ANN topologies with reference to the ECG-derived number of heart beats and inter-pulse intervals.

<sup>1</sup> Percentage error of the number of the detected heartbeats out of a total of 5144 heartbeats. Negative values correspond to fewer heartbeats detected by the ANN compared with the "ground truth"; <sup>2</sup> Inter-pulse interval (IPI) mean relative error; <sup>3</sup> Number of subjects with no significant difference between the medians of the estimated IPIs and the IPIs from the ECG reference, out of the total of 21 subjects; <sup>4</sup> Feed-forward ANN; <sup>5</sup> Nonlinear autoregressive exogenous model. The numbers in the topology descriptions stand for the number of units in each layer.

Figure 7 shows an example of the output of the FF 20 1 ANN, the topology which showed the best performance, smoothed with a moving average window, with the prominent peaks detected and marked. This example shows the typical behavior and errors of the methodology.

**Figure 7.** An example of the heartbeat detection method. Panel (a) shows the output of the FF 20 1 ANN smoothed with a moving average window, with the detected prominent peaks in a 50 s interval. Panel (b) shows the output of the FF 20 1 ANN smoothed with a moving average window on a shorter time scale plotted against the reference ECG (derived heartbeat events) for an easier visualization of the detected heart events.

The error in the total number of detected heart events for the FF 20 1 ANN configuration was −2% (104 undetected beats out of a total of 5144 heartbeats extracted from the ECG). The statistical tests showed no significant difference between the two methods in terms of the number of detected events (*p* > 0.05). The difference between the medians of the IPIs calculated using the reference ECG and the FF 20 1 ANN was −2 samples (−20 ms) and was not statistically significant. At the level of individual heart event detection, for 11 out of the 21 subjects the medians of the ANN detections were not significantly different from the reference.

For the specific ANN that showed the best performance with the recorded database, there were 20 neurons in the first hidden layer, resulting in ~4000 multiply–accumulate operations. In the implementation on an embedded platform (Teensy 4.0 programmed in the Arduino IDE), this calculation took 66 μs, which was more than fast enough for executing the proposed method in real time. As the calculation was done at a 100 Hz rate, there were ~10 ms between consecutive heart event estimations.
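The operation count and the real-time margin quoted above follow from simple arithmetic, reproduced in the short sketch below. The 200-sample input size is taken from the caption of Figure 5, and the variable names are chosen for this example only.

```python
# Multiply-accumulate (MAC) count for a 200-20-1 network.
N_INPUTS = 200
N_HIDDEN = 20
N_OUTPUTS = 1

hidden_macs = N_INPUTS * N_HIDDEN        # first hidden layer: 4000 MACs
output_macs = N_HIDDEN * N_OUTPUTS       # output neuron: 20 MACs
total_macs = hidden_macs + output_macs   # 4020 MACs per network evaluation

# Real-time budget at a 100 Hz evaluation rate.
SAMPLE_RATE_HZ = 100
period_us = 1_000_000 / SAMPLE_RATE_HZ   # 10 000 us between evaluations
INFERENCE_US = 66.0                      # measured time reported in the text
utilisation = INFERENCE_US / period_us   # fraction of each period spent on inference
```

The first hidden layer accounts for 4000 of the 4020 operations, and the reported 66 μs occupies well under 1% of each 10 ms period.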

#### **4. Discussion**

The work presented in this paper is intended for the detection of individual heartbeats using a state-of-the-art mm-wave radar sensor. The radar sensor relied on the Doppler shift in the signal reflected from the objects within its field of view to detect any movements, even small ones. The real challenge of such a measurement was separating the influence from the different sources. Furthermore, smaller displacements, such as chest movements due to heart activity could be completely hidden or distorted by other physiological sources, such as breathing, talking or change of posture. Thus, the focus of this research was on the specific radar signal footprint in the time domain resulting from the heartbeats and the method by which to identify such small signal ripples. The computational tool selected for this task was shallow artificial neural networks for their high capacity for generalization, ability to be trained without prior knowledge of the signal properties and being computationally inexpensive, enabling easy implementation in an embedded or a high-level system. For the shallow ANNs that were tested, the dominant part of the computational complexity was related to the first hidden layer which performed multiplications of all the input signal samples with the weights (Figure 5).

With the aim of tracking vital signs and the presence of vehicle drivers in a contactless manner, a database of radar signals, alongside ECG and respiration, was gathered from 21 participants sitting comfortably in a cushioned chair. To date, the number of participants in published papers that presented traditional or machine learning approaches for heartbeat detection was fewer than 12 [6–32], but considering our goal of obtaining realistic results when using a trained ANN on unseen data, we acquired a larger dataset than in the previous studies. This objective imposed the strict condition of testing the performance on radar signals that were completely new to the ANN. Care was taken to split the data in such a manner that the signals obtained from the same person could all be found either in training or in testing (Figure 6). Although the subjects were instructed to sit quietly in the chair, some of them did substantially move their upper body and head, but these recordings were nevertheless included in the database, bearing in mind the potential of neural networks to abstract over a wide variety of inputs and their robustness to noise.

After analyzing the performance of various ANNs it was somewhat surprising to find that a relatively basic ANN outperformed more complex networks. The more complex networks were able to pick up minute details in the signals and use them to model the training set more closely at the cost of loss of generalization, while the reduced single layer network did not have such capacity to overfit. One of the conditions that favored the simpler network architectures could still be the limited database used for the training. The acquisition of yet larger amounts of data in the future could make space for more complex network architectures, such as sequential deep learning models, to further improve the results. This would, however, come at the cost of higher processing requirements.

With respect to the main idea of the ANN-based method, which was the identification of individual heartbeats, the most important metric of the ANN testing was the number of detections. Due to the lack of similar metrics in other scientific publications, the result presented in this paper of 2% undetected heartbeats could not be put into perspective with other approaches.

As another performance evaluation metric, we calculated the time between consecutive detections. This metric was directly compared with the IPIs from the reference ECG signal, and it was shown that the difference between their medians was also not statistically significant, confirming that the method could be used to accurately track averaged heart rates over longer periods. These results were also comparable with the findings presented in [22], where the relative error between the averaged radar and ECG rates was between 0.55% and 1.97%. The method proposed by the authors of [24] outperformed all of the previously published methods based on the CW Doppler radar technology regarding the mean relative error (2.07% on a dataset of ten subjects, algorithm latency ~2.5 s). The ANN-based approach presented here brings multiple advantages over the other methods presented in relevant papers. The ANN method used raw radar signals without the need for any preprocessing or calibration, which can require the implementation of complex algorithms, or for prior knowledge of the signal properties and process models. The implementation of the shallow ANN within a microprocessor system was computationally inexpensive, as it comprises only a small number of basic arithmetic operations (multiplications and additions for the neural layers, while the activation functions can be obtained using look-up tables). In comparison with the FFT-based approaches and the heavy use of digital filters, this feature represented a significant improvement in computational complexity. This method is also beneficial in applications that require the fast detection of human presence, as the latency was below the width of the processing window (1 s), while, to our knowledge, the shortest latency reported in relevant publications is 2.5 s [24]. This estimate of 1 s was made for the worst-case scenario of system power-up, which requires the initial filling of the ANN input buffer. Once the system is up and running, with a human entering its field of view, this latency is expected to be significantly lower. Namely, due to the training procedure, the heart detections could occur in the 400 ms window that contains the latest signal samples. The smoothing by the moving average filter was done on only 10 ANN outputs, which introduced a negligible delay. The last step of the detection chain, peak detection, required only a few extra samples to identify a local maximum.

To summarize the above contributions of our work, we listed all the relevant previously published methods for heartbeat extraction in normal breathing conditions based on the CW Doppler radar technology in Table 2. This work is the first approach that used artificial neural networks for heartbeat detection based on the CW Doppler radar technology. The method was not person-specific (as opposed to the supervised machine learning approach applied in [32]), and we tested a more realistic scenario on 21 subjects in which all the results were obtained on "unseen data". This was a fast and reliable calibration-free method, with a low percentage of failed heartbeat detections and with a latency that outperformed all relevant previously published non-person-specific approaches.


**Table 2.** Comparison of the methods for heartbeat extraction in normal breathing condition based on the continuous-wave (CW) Doppler radar technology.

<sup>1</sup> Number of subjects (N); <sup>2</sup> Test conditions (TC) during the measurement (S XX = Sitting at distance XX, T XX = Sitting and typing at distance XX); <sup>3</sup> Total measurement time (*T*) of normal breathing for each session; <sup>4</sup> Data processing approach; <sup>5</sup> Tested on data that were "unseen" in the training process; <sup>6</sup> Time window (W); <sup>7</sup> Calibration-free (CF) for I/Q imbalance, the offset compensation or usage of any demodulation techniques; <sup>8</sup> Failed detection (FD) of heartbeats; <sup>9</sup> Average error of estimated heart rate or IPIs (HR/IPIs Avg. Error).

As shown in Table 1, the ANN implemented in this paper was weak at estimating the IPIs with high accuracy. Consequently, the usefulness of the methodology for evaluating heart rate variability (HRV) parameters is limited. The goal of the present study was to develop a method for the fast detection of individual heartbeats; thus, all of the ANN optimizations were governed by the error in the count of detected heartbeats. The HRV-related errors were not included in any of the methodology steps; therefore, it would be unrealistic to expect accuracies as high as those of methods specifically designed for HRV estimation. In addition, the choice of the targets has a significant influence on the estimation of the HRV parameters. As mentioned above, the main goal was the robust detection of individual heartbeats, so the target window was made relatively wide (400 ms) to enable the identification of any signal waveform related to the mechanical displacements due to heartbeats. Besides providing more room for heartbeat detection, this wide window means that a heartbeat detected in any part of the target window was considered a success during the training phase. For example, one heartbeat could be detected with the highest probability at the beginning of its target window, while the following heartbeat could be detected with the highest probability at the end of its window, without any penalty to the ANN performance during training. On the other hand, this kind of detection would result in a 400 ms error in the estimate of the beat-to-beat interval.
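The effect of the wide target window on a single beat-to-beat interval can be illustrated with a small worked example. The sketch assumes that both target windows are placed identically relative to their true beats, so that only the position of each detection inside its window matters; the function and parameter names are hypothetical.

```python
def estimated_ipi_ms(true_ipi_ms, pos_first, pos_second, window_ms=400.0):
    """Estimated inter-pulse interval when each beat is detected somewhere
    inside its target window.

    pos_first, pos_second: detection position within the window, from
    0.0 (window start) to 1.0 (window end).
    """
    return true_ipi_ms + (pos_second - pos_first) * window_ms


# True interval of 800 ms (75 beats/min):
worst_long = estimated_ipi_ms(800.0, 0.0, 1.0)   # first at start, second at end
worst_short = estimated_ipi_ms(800.0, 1.0, 0.0)  # first at end, second at start
unbiased = estimated_ipi_ms(800.0, 0.5, 0.5)     # same position in both windows
```

In the two extreme cases the estimate deviates from the true 800 ms interval by the full 400 ms window width, whereas consistent positioning within the window leaves the interval unchanged, which is the behavior an IPI-aware loss would be expected to encourage.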

In future work, the focus will be on increasing the HRV estimation accuracy. To achieve this goal, the error of the R–R interval estimation will be introduced as an additional loss function during the training procedure. This would inherently force the ANN to bind the maximal detection probability with a specific part of the mechanical oscillation. However, this approach would require some topological ANN changes, such as a feedback loop with the previously detected IPI and an extension of the memory depth.

In this study, there were several limitations that should be noted. All the subjects that participated in the study were young and physically fit adults. During the selected time period they were mostly sitting calmly and were instructed to refrain from prominent body movements. The chair was placed in a room with no moving objects. In a scenario that involves nonstationary objects, or recording within confined environments, such as inside a car, the performance of the system could deteriorate due to clutter and multipath effects. Given the continuous nature of the unmodulated signal, Doppler radar has no exact information on the absolute position of the observed person. Tracking the vital signs of multiple people simultaneously would most likely have to be performed using modulated signals that can resolve the observed targets in space.

The use of a high carrier frequency allowed the radar system to have small dimensions, since small antennas were used, unlike lower-frequency radar systems, whose antennas usually need to be larger. An additional advantage of the high-frequency radar was its sensitivity to the small chest displacements that come from heartbeats. Since the method in this paper used the in-phase and quadrature radar signals directly, the radar sensitivity was of crucial importance for the heartbeat detection. The use of an even higher carrier frequency could improve the results, since the sensitivity to small displacements would be larger.

The radar used for this study had a relatively broad field of view, which made it susceptible to picking up clutter from the surroundings. Additionally, in applications that would require a larger distance between the radar and a patient, this broad field of view would pick up even more surrounding movements. Using an antenna with a narrower beam could improve the performance of the system in the future. It is expected that the signal-to-noise ratio (SNR) of the received signals would be smaller if the radar was placed at larger distances. This means that further tests need to be done in order to determine the influence of the SNR on the detection accuracy. Future work will also include tests of the ANN performance in cases when subjects perform natural movements, to estimate the reliability of the sensor and the method in an environment saturated with motion originating from different sources.
