**About the Editors**

**Maysam Abbod** received his PhD in Control Engineering from The University of Sheffield, U.K., in 1992. He is currently a Reader in Electronic Systems with the Department of Electronic and Computer Engineering, Brunel University London, U.K. He has authored over 50 papers in journals, 9 chapters in edited books, and over 50 papers in refereed conferences. His current research interests include intelligent systems for modeling and optimization. He is a member of the IET (U.K.) and a Chartered Engineer (U.K.). He is currently serving as an Associate Editor of Engineering Applications of Artificial Intelligence (Elsevier).

**Jiann-Shing Shieh** received his BSc and MSc degrees in Chemical Engineering from National Cheng Kung University, Taiwan, in 1983 and 1986, respectively, and his PhD in Automatic Control and Systems Engineering from The University of Sheffield, U.K., in 1995. He is currently a Professor with the International Bachelor Program in Engineering, a Joint Professor with the Department of Mechanical Engineering and the Graduate School of Biotechnology and Bioengineering, and also serves as the Provost and Dean of the College of Engineering, Yuan Ze University, Taiwan. His research interests are focused on biomedical engineering, particularly bio-signal processing, intelligent analysis and control, medical automation, pain modeling and control, critical care medicine monitoring and control, dynamic cerebral autoregulation research, and brain death index research.

## *Editorial* **Special Issue "Advanced Signal Processing in Wearable Sensors for Health Monitoring"**

**Maysam Abbod 1,\* and Jiann-Shing Shieh 2,\***


Wearable sensors have recently become very popular due to their ease of use and the flexibility of recording data from home. They range from simple adhesive sensors to more sophisticated, stretchable implants for health monitoring or diagnosis. The basic units of a wearable sensor are the electrodes or wires, the power source, and the interface/communication unit, which can be a smartphone or another type of signal receiver. One of the most important features of a wearable sensor is flexibility: it has to flex, stretch and twist without straining the sensory part, while maintaining the quality of the measured signal. Most wearable sensors measure physiological signals and incorporate a real-time decision system to interpret the signal and detect symptoms or measure context awareness. Different system implementations and technological approaches are used in the design of state-of-the-art wearable biosensors: some have advantages and some have shortcomings. The literature on such techniques is surveyed in [1] in order to provide direction for future research improvements.

Technology is continually improving, making many tools and algorithms available to developers for diverse applications and connectivity. Wearable sensors benefit from these versatile underlying technologies, which enable them to capture rich contextual information that delivers a genuinely personalized experience. An extensive and diverse classification of wearable devices with wireless communication, data processing and on-board classification is reported in a survey that highlights the challenges and future solutions in this field [2].

The market for wearable sensors is growing exponentially, with an annual growth rate of 20%. Moreover, the outbreak of COVID-19 had a tremendous impact on the evolution of wearable devices, driven by the requirements of home sensing and diagnosis devices [2]. Many systems have been developed for different applications, such as protection for the elderly, home health monitoring, gait analysis, and interactive media and animation that help people become familiar with such technologies [3].

In this Special Issue, special attention is given to the AI technologies that are utilized in signal processing and diagnosis. Signals usually need filtering, conditioning and processing. Advanced algorithms are computationally intensive and require fast hardware. This can represent a major hurdle in developing such systems, and most developers resort to performing the heavy signal processing on advanced computing systems such as GPUs, computer clusters, cloud applications and edge computing. This serves wearable sensors very well, as information can be transferred via the Internet, providing access to sophisticated algorithms and high-speed processing power.

Papers published in this Special Issue focus on two subjects: health monitoring using biological signals (EEG, ECG) and physical health monitoring (movements). The health monitoring subject covers the recording of vital signs such as brain activities (EEG) and applications to the cardiovascular system (ECG, HR, PPG); this issue includes seven papers in this field. The application of EEG is rather difficult due to the need for good wearable sensors that can detect the signal reliably. However, filtering and signal

**Citation:** Abbod, M.; Shieh, J.-S. Special Issue "Advanced Signal Processing in Wearable Sensors for Health Monitoring". *Sensors* **2022**, *22*, 2189. https://doi.org/10.3390/ s22062189

Received: 7 March 2022 Accepted: 8 March 2022 Published: 11 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).


conditioning are some of the techniques that can be used to extract meaningful information from noisy signals. EEG can be used to monitor different activities; one particular example is drowsiness detection, which is very useful for drivers who might lose attention while driving [4]. On the other hand, ECG signal processing using deep learning (DL) algorithms represents the latest developments in the field. One particular application is the monitoring of hemodynamics using ECG and DL without the need for invasive sensing [5].

Other biological signal applications are mainly related to the cardiovascular system, such as the detection of heart rate and blood pressure using facial expressions [6] or PPG sensors [7,8], respectively. Such measurements are not easy to realize with high accuracy; hence, there is a need for DL systems to process the images or signals in order to obtain good accuracy. DL is also used for the detection of heart rhythm anomalies, and a short survey presented in [9] looks at the different techniques utilizing wearable sensors. The final application is the use of ECG for the detection of myocardial infarction (MI), one of the most prevalent cardiovascular diseases; an LSTM network is used to detect MI based on the ECG signal [10].

Physical health monitoring using wearable sensors is presented in two papers related to physical movements: the prediction of joint moments, i.e., the force generated by the muscles, using an ANN for the purpose of skeleton control [11], and another related work, presented in [12], based on the detection of chewing events using EMG signals.

The last paper is a mini-review that addresses pain as a subjective feeling. The review presents the correlation between pain and stress and approaches for measuring them with wearable sensors. Various physiological signals (i.e., heart activity, brain activity, muscle activity, electrodermal activity, respiration, blood volume pulse, skin temperature) as well as expression/behavior are listed as signs measurable with wearable sensors. Wearable sensors used in healthcare monitoring systems can thus detect pain and stress. Pain leads to multiple symptoms, such as muscle tension and depression; hence, integrating modern computing techniques with wearable sensor measurements can help in pain control [13].

**Author Contributions:** M.A. and J.-S.S. wrote the paper. These authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **EEG Signal Multichannel Frequency-Domain Ratio Indices for Drowsiness Detection Based on Multicriteria Optimization**

**Igor Stancin, Nikolina Frid, Mario Cifrek and Alan Jovic \***

Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000 Zagreb, Croatia; igor.stancin@fer.hr (I.S.); nikolina.frid@fer.hr (N.F.); mario.cifrek@fer.hr (M.C.) **\*** Correspondence: alan.jovic@fer.hr

**Abstract:** Drowsiness is a risk to human lives in many occupations and activities where full awareness is essential for the safe operation of systems and vehicles, such as driving a car or flying an airplane. Although it is one of the main causes of many road accidents, there is still no reliable definition of drowsiness or a system to reliably detect it. Many researchers have observed correlations between frequency-domain features of the EEG signal and drowsiness, such as an increase in the spectral power of the theta band or a decrease in the spectral power of the beta band. In addition, features calculated as ratio indices between these frequency-domain features show further improvements in detecting drowsiness compared to frequency-domain features alone. This work aims to develop novel multichannel ratio indices that take advantage of the diversity of frequency-domain features from different brain regions. In contrast to the state-of-the-art, we use an evolutionary metaheuristic algorithm to find the nearly optimal set of features and channels from which the indices are calculated. Our results show that drowsiness is best described by the powers in delta and alpha bands. Compared to seven existing single-channel ratio indices, our two novel six-channel indices show improvements in (1) statistically significant differences observed between wakefulness and drowsiness segments, (2) precision of drowsiness detection and classification accuracy of the XGBoost algorithm and (3) model performance by saving time and memory during classification. Our work suggests that a more precise definition of drowsiness is needed, and that accurate early detection of drowsiness should be based on multichannel frequency-domain features.

**Keywords:** drowsiness detection; EEG; frequency-domain features; multicriteria optimization; machine learning

#### **1. Introduction**

Drowsiness is the intermediate state between wakefulness and sleep [1]. Terms such as sleepiness or tiredness are used synonymously with drowsiness in related studies [2–4]. Although it is intuitively clear what drowsiness is, it is not so easy to determine exactly whether a person is in a drowsy state or not. The reason for this is the unclear definition of drowsiness. Some researchers define drowsiness as stage 1 sleep (S1) [5–9], which is also known as non-rapid eye movement 1 (NREM 1) sleep. Da Silveira et al. [10] used S1 sleep stage data in their research of drowsiness. Johns [11] claims that the S1 sleep stage is equivalent to microsleep (episodes of psychomotor insensitivity due to sleep-related wakefulness loss [12]), and states that drowsiness occurs before S1 sleep, without specifying when it begins or what characterizes it. Researchers who do not use any of the aforementioned definitions of drowsiness typically use a subjective assessment of drowsiness, e.g., the Karolinska sleepiness scale [13]. In this paper, the term drowsiness is used as a synonym for the S1 sleep stage.

In a drowsy state, people are not able to function at the level required to safely perform an activity [14], due to the progressive loss of cortical processing efficiency [15]. Drowsiness is, therefore, a significant risk factor for human lives in many occupations, e.g., for air traffic

**Citation:** Stancin, I.; Frid, N.; Cifrek, M.; Jovic, A. EEG Signal Multichannel Frequency-Domain Ratio Indices for Drowsiness Detection Based on Multicriteria Optimization. *Sensors* **2021**, *21*, 6932. https://doi.org/ 10.3390/s21206932

Academic Editors: Maysam Abbod and Jiann-Shing Shieh

Received: 30 August 2021 Accepted: 17 October 2021 Published: 19 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

controllers, pilots and regular car drivers [16]. According to the reports from NASA [17] and the National Transportation Safety Board [18], one of the main factors in road and air accidents is drowsiness. Gonçalves et al. [19] conducted a study across 19 European countries and concluded that in the last two years, 17% of drivers fell asleep while driving, while 7% of them had an accident due to drowsiness. The high frequency and prevalence of drowsiness-related accidents speak in favor of the development of early drowsiness detection systems, which is the subject of this paper.

Many researchers are trying to solve the problem of early detection of drowsiness in drivers. Balandong et al. [20], in their recent review, divided the techniques for detecting driver drowsiness into six categories: (1) subjective measures, (2) vehicle-based systems, (3) driver's behavior-based systems, (4) mathematical models of sleep–wake dynamics, (5) human physiological signal-based systems and (6) hybrids of one or more of these techniques. Currently, the most common techniques used in practice are vehicle-based systems [5], but these systems are mostly unreliable and depend largely on the driver's motivation to drive as well as possible [20].

Physiological signals are a promising alternative for reliable drowsiness detection [21]. The main problem with this approach is that these systems are often not easy to use and are intrusive to drivers [20]. Nevertheless, many researchers are working on small, automated and wearable devices [21–24], or on steering wheel devices [25,26], in order to overcome these obstacles. Techniques for detecting drowsiness based on physiological signals can be further subdivided according to the type of signal used, such as electroencephalogram (EEG) [27], electrooculogram (EOG) [28] or electrocardiogram (ECG) [29].

The most studied and applied physiological signal for detecting drowsiness is the EEG. In this paper, frequency-domain features of the EEG signal are analyzed and two novel multichannel ratio indices for the detection of drowsiness are proposed. Besides the frequency-domain features, there are also other types of features: (1) nonlinear features [30], (2) spatiotemporal (functional connectivity) features [31] and (3) entropies [32]. These three groups of features are used less frequently than the frequency-domain features, so in this paper, we focus only on frequency-domain features. Based on a recent review [33] of EEG-based drowsiness detection systems, 61% of the included papers used frequency-domain features, 38% used entropies, 10% used nonlinear features and 10% used spatiotemporal features (some papers used multiple groups of features, so the sum of the percentages is greater than 100%). This shows how unevenly the feature groups are used in drowsiness detection systems, and the imbalance is even greater in the broader field of neurophysiological research. Although the three feature groups mentioned above are used less frequently, a certain number of papers still include them, especially entropies.

Frequency-domain features estimate the power spectral density in a given frequency band. The bands typically used in the analysis of EEG signals are delta (δ, 0.5–4 Hz), theta (θ, 4–8 Hz), alpha (α, 8–12 Hz), beta (β, 12–30 Hz) and gamma (γ, >30 Hz). An increase in theta activity [34] and an increase in alpha activity [35] indicate drowsiness. An increase in the beta activity, however, is a sign of wakefulness and alertness [36]. There are several widely used frequency-domain ratio indices for detecting drowsiness. Eoh et al. [36] proposed the θ/α and β/α ratio indices, Jap et al. [37] proposed the (θ + α)/β, θ/β and (θ + α)/(α + β) ratio indices and da Silveira et al. [10] proposed the γ/δ and (γ + β)/(δ + α) ratio indices. These ratio indices provide improvement in the detection of drowsiness compared to the frequency-domain features alone and are shown to correlate with drowsiness.
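The cited single-channel ratio indices are simple arithmetic combinations of band powers. A minimal sketch, using made-up band-power values (the variable names and magnitudes are illustrative, not from the paper):

```python
def ratio_indices(delta, theta, alpha, beta, gamma):
    """Single-channel ratio indices from the literature cited in the text.
    Inputs are band powers (arbitrary units) from one EEG channel."""
    return {
        "theta/alpha": theta / alpha,                                # Eoh et al. [36]
        "beta/alpha": beta / alpha,                                  # Eoh et al. [36]
        "(theta+alpha)/beta": (theta + alpha) / beta,                # Jap et al. [37]
        "theta/beta": theta / beta,                                  # Jap et al. [37]
        "(theta+alpha)/(alpha+beta)": (theta + alpha) / (alpha + beta),
        "gamma/delta": gamma / delta,                                # da Silveira et al. [10]
        "(gamma+beta)/(delta+alpha)": (gamma + beta) / (delta + alpha),
    }

# Hypothetical band powers for one epoch
indices = ratio_indices(delta=40.0, theta=12.0, alpha=10.0, beta=5.0, gamma=2.0)
```

An increase in `(theta+alpha)/beta` across epochs would then be read as growing drowsiness, per the studies cited above.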

All these frequency-domain features and ratio indices are calculated from a single EEG channel, i.e., from a single brain region. In recent research, Wang et al. [38] showed that the significance of a decrease in delta and an increase in (θ + α)/β indices depends on the brain region. This significant diversity of the correlation of features with drowsiness in different brain regions is the motivation for this research. Since all currently used frequency-domain features and ratio indices are based on a single channel (single brain region), this work aims to use the best distinguishing features of each brain region for the detection of drowsiness and to combine them into a single multichannel ratio index feature.

In our work, we use a computational method based on multicriteria optimization to extract the multichannel EEG-based frequency-domain ratio index features. This method allows us to discover new multichannel ratio indices that show improvements in the detection of drowsiness compared to single-channel ratio indices. Finally, with the use of machine learning models, we prove that multichannel indices detect drowsiness with higher accuracy, higher precision, reduced memory and faster computation compared to single-channel features.

In the Materials and Methods Section, we show the methodology of our work, including a description of the dataset, preprocessing and feature extraction methods used. Novel multichannel ratio indices and the multi-objective optimization method are also described there. In the Results Section, we present the results of our work, including statistical analysis, drowsiness prediction and computational properties of the proposed indices. In the Discussion Section, we discuss in more detail the topics covered in this paper. Finally, in the last section, we conclude the paper.

#### **2. Materials and Methods**

#### *2.1. Dataset, Preprocessing and Feature Extraction*

The data used in this paper were obtained from the PhysioNet portal [39], in particular from the 2018 PhysioNet computing in cardiology challenge [40]. The original dataset contains data records from 1985 subjects, and each recording includes a six-channel EEG, an electrooculogram, an electromyogram, a respiration signal from the abdomen and chest, airflow and oxygen saturation signals and a single-channel electrocardiogram during the all-night sleep. The records were divided into training and test sets of equal size. The sleep stages [41] of all subjects were annotated by clinical staff based on the American Academy of Sleep Medicine (AASM) manual for the scoring of sleep [42]. There are six types of annotations for different stages: wakefulness (W), stage 1 (S1), stage 2 (S2), stage 3 (S3), rapid eye movement (REM) and undefined.

In this research, we wanted to use a training set (992 subjects) to detect drowsiness. The officially provided way of acquiring the data is through torrent download, but we managed to download only 393 subjects completely, due to a lack of seeders. Of these 393 subjects, EEG signal recordings from 28 subjects were selected, based on the condition that each recording had at least 300 s of the W stage and, immediately after that, at least 300 s of the S1 stage. From each recording, a fragment of 600 s (300 s of W stage and 300 s of S1 stage) was used for analysis. In the original dataset, each EEG signal recording consists of six channels (*F*3, *F*4, *C*3, *C*4, *O*1 and *O*2, based on the International 10/20 System), with a sampling frequency of 200 Hz. Table 1 shows the identification numbers of all the selected subjects. The subjects were divided into two groups, one group used for training of the model (16 subjects) and the other one for the test of the obtained models (12 subjects). The training set was used to obtain novel ratio indices (with the method described below) and the test set was used to check these novel indices on the unseen data.

**Table 1.** The identification numbers of all the selected subjects. The training set is in the upper part and the test set is in the lower part of the table.


Before feature extraction, the EEG signal must be filtered. For this purpose, the DC component was removed from the signal and the signal was filtered with a Butterworth filter to remove high-frequency artifacts and low-frequency drifts. We used the sixth-order Butterworth filter, the low-cut frequency of 1 Hz and the high-cut frequency of 40 Hz. In the selected fragments of the recordings, there was an insignificant number of eye-related artifacts, so we decided not to use the independent component analysis for their removal in order to prevent potential information loss due to component removal.
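The preprocessing step can be sketched as follows with SciPy. The paper specifies a sixth-order Butterworth filter with 1 Hz and 40 Hz cut-offs; whether zero-phase (forward-backward) filtering was used is not stated, so that choice is an assumption here:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 200.0  # sampling frequency of the dataset (Hz)

def preprocess(eeg, fs=FS, low=1.0, high=40.0, order=6):
    """Remove the DC component, then band-pass filter with a Butterworth
    design (1-40 Hz). Note: passing order=6 to a band-pass design yields a
    12th-order filter overall; whether the paper means the design order or
    the final order is ambiguous, so this is an assumption."""
    eeg = np.asarray(eeg, dtype=float) - np.mean(eeg)  # DC removal
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg)  # zero-phase filtering (assumption)

# Synthetic check: slow drift + 10 Hz (in-band) + 80 Hz (out-of-band)
t = np.arange(0, 10, 1 / FS)
raw = (5.0 + 0.8 * np.sin(2 * np.pi * 0.2 * t)
       + np.sin(2 * np.pi * 10 * t)
       + 0.5 * np.sin(2 * np.pi * 80 * t))
clean = preprocess(raw)
```

After filtering, the 10 Hz component survives while the offset, drift and 80 Hz component are strongly attenuated.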

The signals were divided into epochs to calculate features. The epochs were five seconds long with a 50% overlap between them. Frequency-domain features are often used in EEG signal analysis. These features were extracted from the power spectral density (PSD) of the signal. To obtain the PSD of the signal, Welch's method [43] was used. Welch's method is used more often than Fast Fourier transform in the field of EEG signal analysis since it produces PSD with lower variance. The standard frequency-domain features were calculated, i.e., delta (δ, 0.5–4 Hz), theta (θ, 4–8 Hz), alpha (α, 8–12 Hz) and beta (β, 12–30 Hz) bands. We also calculated the less frequently used frequency-domain features, i.e., gamma (γ, >30 Hz), sigma (σ, 12–14 Hz), low alpha (α1, 8–10 Hz) and high alpha (α2, 10–12 Hz) bands [44].
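Band-power extraction via Welch's method can be sketched as follows. The paper does not give Welch's segment length, so the one-second segments here are an assumption, and the gamma band (defined in the text as >30 Hz) is capped at 45 Hz for illustration:

```python
import numpy as np
from scipy.signal import welch

FS = 200.0
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 45), "sigma": (12, 14),
         "low_alpha": (8, 10), "high_alpha": (10, 12)}

def band_powers(epoch, fs=FS, bands=BANDS):
    """Band powers of one 5 s epoch from Welch's PSD estimate.
    Power is taken as the sum of PSD bins in each band."""
    f, psd = welch(epoch, fs=fs, nperseg=int(fs))  # 1 s segments (assumption)
    return {name: float(psd[(f >= lo) & (f < hi)].sum())
            for name, (lo, hi) in bands.items()}

# 5 s epoch dominated by a 10 Hz (alpha-band) oscillation
rng = np.random.default_rng(0)
t = np.arange(0, 5, 1 / FS)
epoch = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
bp = band_powers(epoch)
```

For this synthetic epoch, the alpha band carries the most power, as expected for a 10 Hz oscillation.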

#### *2.2. Novel Multichannel Ratio Indices*

Ratios between frequency-domain features have often been used as new features in different areas of EEG signal analysis [10,36]. All these features have a simple mathematical formulation but often lead to an improvement in drowsiness detection and a reduction in dimensionality. However, they are calculated from a single channel only. The idea behind the novel indices we present in this work is to design the feature formulation in such a way that frequency-domain features from different channels can be combined. Figure 1 illustrates the difference between these two approaches. For simplicity of visualization, only four epochs, two channels (*F*3 and *F*4) and three features per channel are shown in Figure 1.



**Figure 1.** A visualization of tables with features. The green color represents the possibilities for creating a ratio index, the first table (**top**) are the possibilities reported in the related work to create a single-channel ratio index, while the second table (**bottom**) are the possibilities explored in our novel multichannel approach.

We define a new index, *I*, for each epoch, *e*, which is calculated as a ratio of the feature values, *F*(*e*), for all six channels in the epoch, *e*. In both the numerator and denominator, the feature value of each channel, *j*, is multiplied by a dedicated coefficient, *Cij* or *Kij*, respectively, as indicated in Equation (1):

$$I(e) = \frac{\sum_{i \in \mathrm{features}} \sum_{j \in \mathrm{channels}} C_{ij} F_{ij}(e)}{\sum_{i \in \mathrm{features}} \sum_{j \in \mathrm{channels}} K_{ij} F_{ij}(e)} \tag{1}$$

The purpose of the coefficients is to reduce or even eliminate the influence of certain channels' frequency-domain features, by setting the corresponding coefficient to a value in the range [0, 1), or to increase their influence by setting it to a value in the range [1, ∞). There are 48 *C* coefficients and 48 *K* coefficients (6 channels and 8 features per channel).
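Equation (1) amounts to a ratio of two coefficient-weighted sums over the feature matrix of one epoch. A minimal sketch (the random feature values and coefficient choices are illustrative only):

```python
import numpy as np

N_FEATURES, N_CHANNELS = 8, 6  # 8 band features x (F3, F4, C3, C4, O1, O2)

def multichannel_index(F, C, K):
    """Equation (1): I(e) = sum(C * F) / sum(K * F), where F holds the
    frequency-domain feature values of one epoch (features x channels)
    and C, K are the numerator/denominator coefficient matrices."""
    return float(np.sum(C * F)) / float(np.sum(K * F))

rng = np.random.default_rng(1)
F = rng.uniform(0.5, 2.0, size=(N_FEATURES, N_CHANNELS))

# A coefficient of 0 eliminates a feature/channel pair, values in (0, 1)
# damp it, and values above 1 amplify it, as described in the text.
C = np.zeros((N_FEATURES, N_CHANNELS)); C[0, :] = 1.0  # e.g. row 0 = delta
K = np.zeros((N_FEATURES, N_CHANNELS)); K[2, :] = 1.0  # e.g. row 2 = alpha
idx = multichannel_index(F, C, K)
```

With these particular 0/1 coefficient matrices, the index degenerates to a classic single-band ratio (summed over channels), showing that the multichannel formulation contains the single-channel indices as special cases.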

The ideal output of *I*(*e*) should look like a step function (or an inverse step function), which would indicate a clear difference between the two stages: W and S1. Figure 2 illustrates the main features of the output. The output can be divided into two parts: the left one corresponds to stage W and the right one to S1. While the output in each part should be as smooth as possible, i.e., with minimal oscillations, it is expected that there will be a transition period between the phases, which may have significant oscillations. This transition period would ideally be the step function, but in realistic settings, it is expected that the transition between phases of brain activity will probably last several epochs and would not be considered as either stage W or S1.

In order to determine the appropriate value of the coefficients that would provide the output as close as possible to the ideal, at least two criteria must be taken into account: the absolute difference between the mean values left and right of the transition window and the quantification of the oscillations in each part. This can be defined as a multi-objective optimization problem that we want to solve using a metaheuristic multi-objective evolutionary optimization method, as described in the next section. To the best of our knowledge, this state transition problem has never been approached with evolutionary computation.

#### *2.3. Multi-Objective Optimization*

The optimization of a step function that is representative of the problem of flat surfaces is generally a challenge for any optimization algorithm because it does not provide information about which direction is favorable and an algorithm can get stuck on one of the flat plateaus [45]. To overcome this challenge, instead of optimizing the function according to one criterion, we define two objectives that we optimize simultaneously: (1) to maximize the absolute difference between the mean value of *I*(*e*) output for the W

and S1 stages, and (2) to minimize the oscillations of the output value around the mean value in each stage. According to Figure 2, the left part of the *I*(*e*) output, occurring before the transition phase, corresponds to the W stage, and the right part, occurring after the transition phase, corresponds to the S1 stage. Since optimization problems are usually expressed as minimization problems, the first objective function, *O*1, is defined as the inverse absolute difference between the mean value of *I*(*e*) of the left part (*avgleft*) and the right part (*avgright*), as established in Equation (2):

$$\text{O1} = \frac{1}{\left|avg\_{right} - avg\_{left}\right|}\tag{2}$$

The second objective function, *O*2, expresses the oscillations in the function and is defined as the number of times the difference between the output values of *I*(*e*) for two adjacent epochs was greater than a given limit. The exact value of this limit will be discussed later in this section as it is closely related to the specifics of the optimization method used. The main goal of the objective function *O*2 is to minimize the influence of the biggest flaw in the way that the objective *O*1 is calculated, i.e., to use the averaging function. For example, if a possible solution is a completely straight line, except for a large negative spike in the left part and a large positive spike in the right part, based only on the objective function *O*1, this would be a good solution, while the objective function *O*2 would penalize this solution.

As mentioned above, the transition between two stages will probably take several epochs and show significant oscillations of the function output values. According to the annotation made by clinical personnel, the transition phase should be approximately in the middle of the *I*(*e*) output, but it cannot be determined exactly how long it will last. In our work, which is based on expert knowledge of human behavior in the case of drowsiness, we assume that it lasts about one minute, which corresponds to about 30 epochs. Within the transition window, neither one of the two objective functions is calculated, since it is assumed to belong neither to the W nor to the S1 stages. We also allow it to move around the center, shifting left and right, due to a possible error of the human observer who marked the data.
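The two objectives can be sketched as follows. For brevity, the transition window is fixed at the center here (the paper lets it shift), and the jump limit for counting oscillations is a placeholder, since the exact value is discussed later in the text:

```python
import numpy as np

def objectives(I, window=30, jump_limit=0.5):
    """Compute O1 and O2 for an index trace I(e).
    O1 = 1 / |mean(right) - mean(left)|   (Equation (2))
    O2 = number of adjacent-epoch jumps exceeding jump_limit,
    counted separately in the left (W) and right (S1) parts;
    the ~30-epoch transition window is excluded from both."""
    I = np.asarray(I, dtype=float)
    mid = len(I) // 2
    left = I[: mid - window // 2]
    right = I[mid + window // 2:]
    o1 = 1.0 / abs(float(np.mean(right)) - float(np.mean(left)))
    o2 = int(np.sum(np.abs(np.diff(left)) > jump_limit)
             + np.sum(np.abs(np.diff(right)) > jump_limit))
    return o1, o2

# Near-ideal step output: W part around 0, S1 part around 2
I = np.concatenate([np.zeros(100), 2.0 * np.ones(100)])
o1, o2 = objectives(I)
```

For this ideal step, the stage means differ by 2 (so O1 = 0.5) and there are no in-stage oscillations (O2 = 0); a noisy but well-separated trace would keep O1 small while O2 grows, which is exactly the trade-off the multi-objective search balances.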

The multi-objective optimization problem can now be expressed as min{*O*1, *O*2}, where *O*1 and *O*2 are the conflicting objective functions, as defined above. The evolutionary metaheuristic algorithm NSGA-II [46] was applied to solve this multi-objective optimization problem. Genetic algorithms (GAs) are normally used to solve complex optimization and search problems [47]. NSGA-II is one of the most popular evolutionary multicriteria optimization methods due to its versatility and ability to easily adapt to different types of optimization problems. The strong points of this MO algorithm are: (1) the fast nondominated sorting ranking selection method used to emphasize Pareto-optimal solutions, (2) maintaining the population diversity by using the crowding distance and (3) the elitism approach, which ensures the preservation of the best candidates through generations without setting any new parameters other than the normal genetic algorithm parameters, such as population size, termination parameter, crossover and mutation probabilities. Additionally, it has often been used for the elimination of EEG channels with a similar purpose as in our case, i.e., dimensionality reduction [48]. This paper uses the implementation of NSGA-II provided by the MOEA framework [49] and is based on the guidelines defined in [46,50].
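The ranking step at the heart of NSGA-II, non-dominated sorting, can be illustrated with a plain sketch (this naive version is quadratic per front, not the bookkeeping-optimized variant of the original paper, and it stands in for the MOEA framework implementation the authors actually used):

```python
def non_dominated_sort(points):
    """Partition solutions into Pareto fronts for min{O1, O2}.
    points: list of (O1, O2) tuples; lower is better in both."""
    def dominates(a, b):
        # a dominates b if it is no worse in every objective
        # and strictly better in at least one
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# (O1, O2) pairs for four hypothetical candidate coefficient sets
pts = [(1.0, 5.0), (2.0, 1.0), (3.0, 3.0), (1.5, 4.0)]
fronts = non_dominated_sort(pts)
```

Here three of the four candidates are mutually non-dominated (each trades O1 against O2) and form the first Pareto front, while (3.0, 3.0) is dominated by (2.0, 1.0) and falls to the second front.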

NSGA-II was used with the following configuration. The chromosome was divided into two parts: in the first part, genes represented the numerator coefficient values (*Cij*), and in the second part, genes represented the denominator coefficient values (*Kij*). In each part, the genes were grouped by frequency-domain features and channels, as illustrated in Figure 3. The genes were encoded as real values in the range [0.0, 10.0], and standard NSGA-II crossover and mutation operators were used to support operation on real values.

**Figure 3.** An illustration of a chromosome structure in the proposed optimization problem solution.

Each solution is evaluated based on the values of objectives *O*1 and *O*2, as described in the pseudocode in Algorithm 1. First, the chromosome is decoded (line 1). Then, for each test fragment, two values are calculated: (1) the inverse absolute difference (IAD) between the mean index value, *I*(*e*), of the left part and the right part, represented by the invAbsDiff variable in the pseudocode, and (2) the oscillations in the function, represented by the oscillation variable in the pseudocode (lines 3–5). Finally, the value of each objective *O*1 for the given solution is defined as the average value of invAbsDiff for all test fragments, and the value of objective *O*2 is defined as the average value of oscillation for all test fragments (lines 7–8).


The algorithm for the IAD calculation is provided in the pseudocode in Algorithm 2. The calculation of the IAD for each fragment was slightly modified compared to Equation (1) to allow a faster convergence of the search algorithm. The transition phase was not fixed at the same position in each fragment but was allowed to move around the center, because the annotation in the original dataset was performed manually and there was a possibility of human error in case the observer registered a transition from W to S1 a little too early or too late. The algorithm allows the transition phase to begin no earlier than 30 epochs from the fragment start, and end no later than 60 epochs before the fragment end (line 2). The algorithm locates the transition phase by looking for a window of 30 epochs with the maximum difference of index, *I*(*e*), values between the left and the right part (lines 9–13).
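The core of Algorithm 2 can be sketched as a sliding-window search (the penalty gradation and the relative-value shift described next in the text are omitted for brevity; the margin values follow the text):

```python
import numpy as np

def inv_abs_diff(I, win=30, margin_left=30, margin_right=60):
    """Slide a 30-epoch transition window over the index trace I(e),
    starting no earlier than margin_left epochs in and ending no later
    than margin_right epochs before the end. Pick the position with the
    maximum left/right mean difference and return its inverse (the IAD)."""
    I = np.asarray(I, dtype=float)
    best = 0.0
    for start in range(margin_left, len(I) - margin_right - win + 1):
        left, right = I[:start], I[start + win:]
        diff = abs(float(np.mean(right)) - float(np.mean(left)))
        best = max(best, diff)
    return 1.0 / best if best > 0 else float("inf")

# Step placed off-center, mimicking an early human-annotated transition:
# transition at epoch 90 of a 200-epoch fragment
I = np.concatenate([np.zeros(90), np.ones(110)])
iad = inv_abs_diff(I)
```

Because the window is allowed to slide, the algorithm still finds the clean split (IAD = 1.0 here) even though the annotated transition is not exactly in the middle of the fragment.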

The gradation of the absolute difference between the mean values of the left and the right parts is also introduced (lines 19–22) to allow easier and faster convergence of the algorithm. The optimization of objective *O*1 can be considered an optimization problem with soft constraints related to how much *O*1 deviates from the optimal value. However, it is quite difficult to determine the optimal value precisely a priori. As indicated in [51,52], constraints are often treated with penalties in optimization techniques. The basic idea is to transform a constrained optimization problem into an unconstrained one by introducing a penalty into the original objective function to penalize violations of constraints. According to a comprehensive overview in [51], the penalty should be based on the degree of constraint violation of an individual. In [53], it is also recommended that, instead of one fixed penalty coefficient, the penalty coefficient should increase as higher levels of constraint violation are reached. The greatest challenge, however, is to determine the exact penalty values. If the penalty is too high or too low, evolutionary algorithms spend either too much or too little time exploring the infeasible region, so it is necessary to find the right trade-off between the objective function and the penalty function so that the search moves towards the optimum in the feasible space. As the authors have shown in [54], the choice of penalty boundaries is problem-dependent and difficult to generalize. Since we cannot strictly determine the optimal value of *O*1 in our case, we chose several thresholds for the absolute difference value, with the penalty increasing by a factor of 10 for each new threshold. The exact thresholds were selected based on experience gained from the first few trial runs of the algorithm. Based on the observations from the trial runs, a third modification was also introduced: the difference is calculated with a relative, instead of absolute, value of *I*(*e*). The relative value of *I*(*e*) is calculated by using the lowest *I*(*e*) value as the reference point instead of zero, i.e., the zero is "moved", as shown in code lines 16–18 in Algorithm 2.
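The tiered penalty idea can be illustrated with a short sketch. The threshold values here are placeholders, not the paper's tuned values; the tenfold growth per crossed threshold follows the description above.

```python
def graded_penalty(violation, thresholds=(0.5, 1.0, 2.0)):
    """Tiered soft-constraint penalty: the multiplier grows tenfold each
    time the degree of constraint violation crosses another threshold.
    (Thresholds are illustrative, not the paper's tuned values.)"""
    factor = 1.0
    for t in thresholds:
        if violation > t:
            factor *= 10.0
    return violation * factor
```

A small violation is returned essentially unchanged, while one that crosses two thresholds is amplified a hundredfold, steering the search back towards the feasible region.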

**Algorithm 2.** IAD Calculation.


The pseudocode for calculating the oscillations in the function, the second objective *O*2, is provided in Algorithm 3. Again, minimizing the oscillations can be considered a constrained optimization problem, so, as in the IAD calculation discussed previously, a gradation of the difference between the output values of *I*(*e*) for two adjacent epochs is used to penalize larger differences more severely (lines 7–10 and 15–18). The exact thresholds were chosen based on experience gained from the first few trial runs of the algorithm. To make the algorithm converge more easily and quickly, the concept of the "moved zero" was used again (lines 2, 3, 6 and 14).
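A simplified sketch of such an oscillation objective follows. The threshold values and the exact gradation are illustrative (the paper tuned them from trial runs); the "moved zero" step uses the lowest *I*(*e*) value as the reference point, as described above.

```python
import numpy as np

def oscillation_score(index_values, thresholds=(0.1, 0.5, 1.0)):
    """Sketch of objective O2: penalize the change in I(e) between adjacent
    epochs, with larger jumps penalized tenfold more per crossed threshold.
    (Thresholds are illustrative placeholders.)"""
    I = np.asarray(index_values, dtype=float)
    I = I - I.min()          # "moved zero": lowest value becomes the reference
    total = 0.0
    for prev, curr in zip(I[:-1], I[1:]):
        diff = abs(curr - prev)
        factor = 1.0
        for t in thresholds:
            if diff > t:
                factor *= 10.0
        total += diff * factor
    return total
```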


Finally, to further minimize the oscillations and help the search algorithm converge more quickly, the maximum change in the *I*(*e*) value between two adjacent epochs is limited to 10% of the value at the first of the two epochs. The mathematical formulation of this limit is provided in Equation (3):

$$Index(e) = \begin{cases} 1.1 \ast I(e-1), & \text{if } I(e) > 1.1 \ast I(e-1) \\ 0.9 \ast I(e-1), & \text{if } I(e) < 0.9 \ast I(e-1) \\ I(e), & \text{otherwise} \end{cases} \tag{3}$$
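Equation (3) translates directly into code. The following sketch applies the bound literally, i.e., relative to the raw previous value *I*(*e*−1), as the equation is written; the function name is ours.

```python
def clamp_index(I):
    """Apply Equation (3): cap the epoch-to-epoch change of the index at
    +/-10% of the previous epoch's raw value I(e-1)."""
    out = [I[0]]
    for e in range(1, len(I)):
        if I[e] > 1.1 * I[e - 1]:
            out.append(1.1 * I[e - 1])
        elif I[e] < 0.9 * I[e - 1]:
            out.append(0.9 * I[e - 1])
        else:
            out.append(I[e])
    return out
```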

#### **3. Results**

The optimization algorithm was executed over 107 generations, using 100 randomly selected chromosomes as a starting point. Ideally, the optimization algorithm would have many *C* and *K* coefficients equal to zero and only a few non-zero coefficients in order to obtain a simple and easily understandable mathematical formulation of a novel multichannel ratio index. Unfortunately, even the best solutions of the optimization algorithm had only up to 20 *C* and *K* coefficients equal to zero. Although such a novel multichannel ratio index showed good behavior in detecting drowsiness, it is impractical to use a formula with 76 coefficients. We consider anything above 15 coefficients to be impractical.

To reduce the number of coefficients and simplify the formulation of the novel multichannel ratio index, some coefficients were manually set to zero. To decide which coefficients have the least influence on the final solution, we counted how often a large coefficient value was associated with each frequency-domain feature. By analyzing the coefficients of all solutions in the final population of the optimization algorithm, we concluded that the most frequently selected features were δ, α, α1 and α2. After manually fixing the coefficients of all other frequency-domain features to zero, the search range of the optimization algorithm was halved.

Although 48 *C* and *K* coefficients remained in the solution at that point, the algorithm provided equally good results in terms of drowsiness detection, but with a much simpler mathematical formulation. In addition to the 48 coefficients that were manually set to zero, the algorithm often set many more coefficients to zero. The best solution in the final population was chosen based on the *O*1 and *O*2 values of the optimization algorithm in combination with the number of coefficients set to zero after applying the floor operator to the coefficients. The floor operator was used to simplify the equation by removing the decimals. Solutions with a higher number of zero coefficients were preferred. Our choice was the solution with 13 non-zero coefficients, shown in Equation (4):

$$I1(e) = \frac{\alpha\_{F3} + 4\alpha\_{O2} + 9\alpha 1\_{F3} + 3\alpha 1\_{C3} + 9\alpha 1\_{C4} + \alpha 1\_{O2} + 4\alpha 2\_{O1} + 8\alpha 2\_{O2}}{\delta\_{F3} + 3\delta\_{F4} + 3\delta\_{C3} + 2\delta\_{C4} + 9\delta\_{O2}}\tag{4}$$

All *C* and *K* coefficients were rounded down (floor operator). Here, *e* represents the current epoch, and all features on the right-hand side are taken from that same epoch.

The goal of the second objective of the optimization algorithm was to minimize the oscillations of the *I*(*e*) function. The results were much better with this objective than without it, but the resulting function still oscillated strongly. To further suppress the oscillations, the maximum change between any two adjacent samples was limited to 10% of the value of the first sample. Equation (5) shows the mathematical formulation of this limitation:

$$Index1(e) = \begin{cases} 1.1 \ast I1(e-1), & \text{if } I1(e) > 1.1 \ast I1(e-1) \\ 0.9 \ast I1(e-1), & \text{if } I1(e) < 0.9 \ast I1(e-1) \\ I1(e), & \text{otherwise} \end{cases} \tag{5}$$

where *I*1(*e*) is defined by Equation (4) and *e* is the current epoch. Limiting the maximum change of adjacent samples further improves the detection model, and therefore Equation (5) presents the first novel multichannel ratio index.

We tried to further simplify the formulation of the multichannel ratio index. This time, a brute-force search for the best solution was applied with the following constraints: (1) for simplicity, all *C* and *K* coefficients were encoded as integer values of zero or one, and (2) a maximum of five addends was allowed in the equation. With these constraints, we obtained Equation (6):

$$I2(e) = \frac{\delta\_{F3} + \delta\_{F4} + \delta\_{O2}}{\alpha\_{C3} + \alpha 2\_{O2}} \tag{6}$$

Again, similar to the first index, the maximum change was limited, so that the final equation for the second ratio index was obtained as:

$$Index2(e) = \begin{cases} 1.1 \ast I2(e-1), & \text{if } I2(e) > 1.1 \ast I2(e-1) \\ 0.9 \ast I2(e-1), & \text{if } I2(e) < 0.9 \ast I2(e-1) \\ I2(e), & \text{otherwise} \end{cases} \tag{7}$$

where *I*2(*e*) is defined by Equation (6) and *e* is the current epoch. After obtaining the two novel indices, they were normalized to the range [0, 1] for each subject to eliminate interindividual differences between the subjects.
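The per-subject normalization described above is plain min–max scaling; a minimal sketch (the function name is ours):

```python
import numpy as np

def normalize_per_subject(index_values):
    """Min-max normalize one subject's index values to [0, 1], removing
    inter-individual scale differences between subjects."""
    I = np.asarray(index_values, dtype=float)
    return (I - I.min()) / (I.max() - I.min())
```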

The two novel multichannel ratio indices defined by Equations (5) and (7) were compared with the seven existing indices θ/α and β/α [36], (θ + α)/β, θ/β and (θ + α)/(α + β) [37], and γ/δ and (γ + β)/(δ + α) [10]. The indices γ/δ and (γ + β)/(δ + α) were calculated based on the wavelet transform, i.e., in the same way as in the original paper. Figure 4 shows a comparison of our novel indices with the best and the worst channel for θ/α and (θ + α)/β single-channel indices for subject tr08-0111. These two single-channel indices were selected because they are the best predictors of drowsiness for a given subject among all single-channel indices.

**Figure 4.** The comparison of the two novel multichannel indices with the best and the worst channel for θ/α and (θ + α)/β single-channel indices for subject tr08-0111. The white part of the diagram represents an awake state, while the yellow part of the diagram represents stage 1 of sleep, i.e., a drowsiness state.

#### *3.1. Statistical Analysis*

The Wilcoxon signed-rank test [55] was used to analyze the statistical differences between the awake state and the S1 state. This test was chosen because it does not assume that the data follow a normal distribution. Table 2 shows *p*-values for each subject in the training set and each index. The significance level α<sub>0</sub> = 0.01 was used together with the Bonferroni correction [55] to reduce the probability of false-positive results: as the test was repeated 144 times (16 subjects and 9 indices), the corrected threshold is α<sub>p</sub> = 6.9 × 10<sup>−5</sup>. For the existing indices, the *p*-value was calculated for each channel, but only the *p*-values of the best channel (the lowest average *p*-value over all subjects) are shown in Table 2.
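The testing procedure can be reproduced in outline with SciPy. The index values below are synthetic stand-ins for one subject's W and S1 epochs, not the paper's data.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(42)
# Synthetic index values: 119 awake (W) and 119 stage-1 (S1) epochs,
# with S1 shifted upward to mimic a well-separated index.
w_epochs = rng.normal(0.3, 0.05, 119)
s1_epochs = rng.normal(0.6, 0.05, 119)

# Paired Wilcoxon signed-rank test between the two states.
stat, p = wilcoxon(w_epochs, s1_epochs)

# Bonferroni correction: 16 subjects x 9 indices = 144 repeated tests.
alpha_0 = 0.01
alpha_p = alpha_0 / 144   # ~6.9e-5, matching the threshold in the text
```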

The two novel indices show *p*-values lower than α<sub>p</sub> for most subjects: for Index1, 14 of 16 subjects show significantly different distributions for the W stage and the S1 stage, while for Index2 this holds for 13 of 16 subjects. Only two existing indices have a *p*-value lower than α<sub>p</sub> in more than ten cases: θ/β and (θ + α)/(α + β), both by Jap et al. [37].

Table 3 shows *p*-values for each subject in the test set and each index. Again, the two novel indices, together with the (γ + β)/(δ + α) index [10], show *p*-values lower than α<sub>p</sub> for most subjects.

#### *3.2. Drowsiness Prediction Analysis*

An additional comparison of the ratio indices was performed by analyzing drowsiness detection accuracy and precision obtained with the XGBoost algorithm [56]. Default parameters were applied: a learning rate (eta) of 0.3, gamma of 0, and a maximum tree depth of 6. For a detailed comparison of the indices, classification accuracy and precision were calculated for each subject. Each subject has 238 epochs of the measured signal, with the first half representing the W state and the second half the S1 state. The algorithm classified the subject's state for each epoch (238 classifications per subject), and the accuracy for each subject was calculated from these classifications. The leave-one-subject-out cross-validation method was applied on the training set: the algorithm was trained on the data of 15 subjects and tested on the subject excluded from the training set, and this was repeated 16 times to evaluate drowsiness detection for each subject in the training set. Table 4 shows the classification accuracy achieved on the training set.
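The leave-one-subject-out scheme can be sketched with scikit-learn. The single-feature data below are synthetic, and `GradientBoostingClassifier` stands in for XGBoost (whose defaults the paper uses); both the data and the stand-in classifier are our assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_epochs = 16, 238

# Synthetic data: one index value per epoch; first half W (label 0),
# second half S1 (label 1), with S1 shifted upward on average.
X, y, groups = [], [], []
for subject in range(n_subjects):
    w = rng.normal(0.3, 0.1, n_epochs // 2)
    s1 = rng.normal(0.6, 0.1, n_epochs // 2)
    X.append(np.concatenate([w, s1]).reshape(-1, 1))
    y.append(np.array([0] * (n_epochs // 2) + [1] * (n_epochs // 2)))
    groups.append(np.full(n_epochs, subject))
X, y, groups = np.vstack(X), np.concatenate(y), np.concatenate(groups)

# Leave-one-subject-out: train on 15 subjects, test on the held-out one,
# repeated once per subject (16 folds).
scores = cross_val_score(GradientBoostingClassifier(), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
```

`scores` then holds one per-subject accuracy per fold, which the paper averages into the per-index figures of Table 4.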




**Table 3.** Statistical significance *p*-values were obtained by the Wilcoxon signed-rank test for distinguishing the awake state from the S1 state. The shaded green cells with bold text represent the lowest *p*-value for each subject in the test set. At the bottom, the index having *p*-values lower than α<sub>p</sub> for most subjects is marked in the same way.

**Table 4.** The classification accuracy was obtained with the XGBoost algorithm for each subject in the training set. The shaded green cells with bold text show the highest accuracy obtained for each subject. At the bottom, the best mean accuracy for each ratio index is marked in the same way.


Index1 has the highest average accuracy and the highest classification accuracy for 3 of 16 subjects. Index2 has the second-highest average accuracy and the highest classification accuracy for 4 of 16 subjects, which is the most of all indices. θ/α [36] and (θ + α)/(α + β) [37] are the only other indices with an average classification accuracy above 0.58, while θ/α [36] and θ/β [37] are the only other indices with the highest accuracy for 3 of 16 subjects. The β/α [36] index has the lowest average classification accuracy on the training set (0.5515).

Table 5 shows the classification accuracy on the test set. Index1 has the highest average accuracy and the highest classification accuracy for 3 of 12 subjects. Index2 has the third-highest average accuracy and the highest classification accuracy for 4 of 12 subjects, which is the most of all indices. The only other index with comparable accuracy is θ/α [36], with the second-highest average accuracy. All other indices have at least 2.5% lower accuracy than the two novel indices.

Table 6 shows the precision of drowsiness detection on the training set. Index2 has the highest average precision of drowsiness detection and the highest precision for five subjects, which is the most of all indices. Index1 has the second-best average precision. (θ + α)/(α + β) [37] and γ/δ [10] have a precision of drowsiness detection comparable to Index1 and Index2, while all other ratio indices have lower precision.

**Table 5.** The classification accuracy was obtained with the XGBoost algorithm for each subject in the test set. The shaded green cells with bold text show the highest accuracy obtained for each subject. At the bottom, the best mean accuracy for each ratio index is marked in the same way.


**Table 6.** The precision of drowsiness detection was obtained with the XGBoost algorithm for each subject in the training set. The shaded green cells with bold text show the highest precision obtained for each subject. At the bottom, the best mean precision for each ratio index is marked in the same way.


Table 7 shows the precision of drowsiness detection achieved on the test set. Index1 has the highest average precision and Index2 the second-highest. θ/α [36] and γ/δ [10] have 1% lower precision than Index1, while all other indices have at least 4% lower precision.


**Table 7.** The precision of drowsiness detection was obtained with the XGBoost algorithm for each subject in the test set. The shaded green cells with bold text show the highest precision obtained for each subject. At the bottom, the best mean precision for each ratio index is marked in the same way.

#### *3.3. Computational Analysis*

With regard to classification and the use of machine learning algorithms, an additional advantage of the novel multichannel indices over the existing single-channel indices is the saving of memory and time due to the reduction in dimensionality. The accuracies of Index1 and Index2 in Table 4 were achieved with a model built from a single feature, while all other indices used six features, since the dataset contains six EEG channels. For this reason, storing the novel indices consumes one-sixth of the memory. The time consumption was measured as an average of 100 executions; the measured time included classifier initialization, classifier training, classification on the test subject, and calculation of classification accuracy. Table 8 shows the results of the time consumption measurements. Using the novel multichannel indices saves about 30% of the time compared to the traditionally used single-channel ratio indices.
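The averaging scheme for the timing measurement can be expressed as a small helper. This is only a sketch: in the paper, the timed function would wrap classifier initialization, training, classification on the test subject and the accuracy calculation, and the helper name is ours.

```python
import time

def average_time_ms(fn, repeats=100):
    """Average wall-clock time of `fn` over `repeats` executions,
    in milliseconds (the paper averages over 100 executions)."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats * 1000.0
```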

**Table 8.** The average time of 100 executions of the XGBoost classifier's initialization, training, classifications on the test subject and calculation of classification accuracy, expressed in milliseconds. The shaded green cells with bold text represent the best values for each subject and the best average value.


#### **4. Discussion**

The main idea of our research was to combine frequency-domain features from different brain regions into a multichannel ratio index to improve frequency-domain features for the detection of drowsiness and to gain new insights into drowsiness. The results in Tables 2–8 suggest that two novel multichannel ratio indices improve the detection of drowsiness based on the frequency-domain features and reduce the time required for detection.

We must note that the main idea of this research was not to create the best possible model for drowsiness detection but to improve the frequency-domain features that are often used for drowsiness detection. Our focus was on developing the method for obtaining these novel indices, which is explained in Section 2.3 "Multi-Objective Optimization". To confirm that our conclusions also hold for classifiers other than XGBoost, Table 9 shows the average accuracy on the test set obtained with the Naïve Bayes, k nearest neighbors, logistic regression, decision tree, random forest and support vector machine classifiers (using the scikit-learn library at default settings). The average accuracies of the two novel indices vary from 56% to 65% among the algorithms. All the algorithms show that our novel multichannel indices outperform the existing single-channel indices.
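The default-settings comparison can be outlined with scikit-learn as follows. The single-feature data are synthetic stand-ins for an index, and, unlike the paper's test-set evaluation, this sketch reports training accuracy only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
# Synthetic single-feature data: index value per epoch, W vs. S1.
X = np.concatenate([rng.normal(0.3, 0.1, 200),
                    rng.normal(0.6, 0.1, 200)]).reshape(-1, 1)
y = np.array([0] * 200 + [1] * 200)

# The six classifiers compared in Table 9, all at default settings.
classifiers = {
    "NB": GaussianNB(), "KNN": KNeighborsClassifier(),
    "Logistic": LogisticRegression(), "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(), "SVM": SVC(),
}
accuracies = {name: clf.fit(X, y).score(X, y)
              for name, clf in classifiers.items()}
```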

**Table 9.** The average accuracy was obtained on the test set with different classification algorithms. Each row is colored with a palette ranging from dark green for the highest number in the row to dark red for the lowest. The algorithms are: NB—Naïve Bayes, KNN—k nearest neighbors, Logistic—logistic regression, DT—decision tree, RF—random forest and SVM—support vector machine.


Our results were compared with the seven existing single-channel ratio indices that are currently state-of-the-art frequency-domain features. The newest one was introduced in 2016 [10], but all of these single-channel ratio indices are often used in the more recent drowsiness detection papers [57–59].

The authors in the aforementioned research report 92% accuracy as the best-obtained accuracy [57]. This accuracy was obtained based on the epoch-level validation. Epoch-level validation is a cross-validation procedure on the epoch level, which means that there is a very high probability that all subjects will have epochs in the training set and in the test set at the same time. On the other hand, subject-level validation is validation where it is ensured that subjects in the test set are not contained in the training set. An example of a subject-level validation is the leave-one-subject-out cross-validation that we used in this research. The only proper way for model validation is subject-level validation, as it represents the real-life setting in which the data from a new subject are used only for testing the model. Empirical tests conducted in related research showed a large difference in the accuracies between epoch-level validation and subject-level validation [60].

In the study by Mehreen et al. [57], the authors also provide subject-level validation, for which the accuracy achieved was 71.5% based on 15 frequency-domain features. The highest accuracy achieved in our research, shown in Table 9, was 65.45%, achieved by logistic regression. This is relatively close to 71.5%, and it must be noted that it was obtained based only on the Index2 feature, with a simple algorithm and without any parameter optimization. For this reason, we are confident that adding our two multichannel ratio indices would improve state-of-the-art drowsiness detection systems that use frequency-domain features. Again, our aim was not to create the best possible drowsiness detection model but to show that the novel multichannel indices outperform the existing single-channel frequency-domain features.

The Equations (4) and (6) for these multichannel ratio indices, obtained after optimizing the parameters with the optimization algorithm, suggest that alpha and delta are two of the most important frequency power bands for drowsiness detection. Equation (6) suggests that delta power in the frontal region describes drowsiness better than in the central region, while alpha power in the occipital and central regions describes drowsiness better than in the frontal region.

These results are consistent with several previous research papers on drowsiness detection that reported the importance of increasing alpha power [22,35,61,62]. Delta power is usually only present in deep sleep stages [36], so some researchers studying drowsiness do not include delta in their research [63]. However, there is still much research that includes delta power. The increase in delta power is considered to be an indicator of drowsiness [4]. Our research found that theta and beta powers are not as good drowsiness indicators as alpha and delta powers, while many other research studies disagree. A decrease in beta power was found to be an indicator of drowsiness in [4,36,64,65] and an increase in theta power was found to be an indicator of drowsiness in [27,34,61,62,65]. Wang et al. [38], in their study of microsleep events, found that alpha and delta rhythms characterize microsleep events. As mentioned earlier, there is an inconsistency in terminology, and some researchers consider sleep stage S1 as drowsiness [5–9], while Johns [11] considers it equivalent to microsleep events in the driving scenario. We used the data from sleep stage S1 and referred to it in this research as drowsiness. Since our results suggest that delta and alpha are the most significant for the detection of drowsiness, as in the work of Wang et al. [38] on microsleep events, our work suggests that sleep stage S1 may be more similar to microsleep events than to drowsiness, but further research is needed to support this as a fact.

Apart from the indication that drowsiness is closely related to microsleep events, it may also be closely linked to driver fatigue. Some researchers even use the term fatigue as a synonym for drowsiness [66]. Fatigue is a consequence of prolonged physical or mental activity [67] and can lead to drowsiness [68]. Normally, rest and inactivity relieve fatigue; however, they exacerbate drowsiness [69]. Lal and Craig [70] found that delta and theta band activities increase significantly during fatigue. Craig et al. [71] reported significant changes in the alpha 1, alpha 2, theta and beta bands, but found no significant changes in the delta band when observing driver fatigue. Simon et al. [68] report that alpha band power and alpha spindles correlate with fatigue.

These three research papers [68,70,71] all use visual inspection to define the ground truth of fatigue. This approach to defining the ground truth is prone to subjectivity. A similar problem occurs when drowsiness is defined by using subjective drowsiness ratings, such as the Karolinska sleepiness scale [72].

Driver drowsiness, driver fatigue and microsleep events are defined as different internal states of the brain, but show similar behavior when observing the features obtained from the EEG. Possible explanations could be that fatigue, drowsiness and microsleep have a similar effect on brain functions and cause the driver's inability to function at the desired level. Most researchers of these three driver states only use frequency-domain features, while there are a number of other features (nonlinear features [30], spatiotemporal features [31] and entropies [32]) that could be used. Further studies with these features could find some features of the EEG signal that distinguish drowsiness, fatigue and microsleep. Distinguishing features of these three brain states could lead to the exact definitions of these terms. Precise and standardized definitions of fatigue, drowsiness and microsleep would help researchers to compare their work more easily.

Figure 4 shows that the proposed procedure for creating the novel ratio indices succeeded in creating step-like indices for a given subject. In addition to Index1 and Index2, which show the desired behavior, the indices (θ + α)/β and θ/β show similarly favorable behavior for a few channels. Figure 5 shows a comparison of the novel ratio indices with the best and the worst channel for the γ/δ and (γ + β)/(δ + α) single-channel indices for subject tr04-0726. Index1, Index2, θ/α [36], (θ + α)/β, (θ + α)/(α + β) [37] and γ/δ [10] show similar behavior. These indices seem to detect drowsiness well, but with a delay of about 50 epochs. Since several single-channel indices previously shown to correlate with drowsiness, together with the two novel multichannel indices, show the same delay in detecting drowsiness, this suggests that there may be shortcomings in the labeling of the initial signals. The manual for scoring sleep [42] provides guidelines for labeling, and it is possible that the professionals who labeled the sleep signals marked only an approximate time of transition from the W state to the S1 state, as labeling any kind of several-hour-long EEG signal is a very tedious, hard and time-consuming job [73]. For this reason, the loose transition window is applied in the optimization algorithm, as described in Section 2.3.

The main shortcoming of our approach is the need to place six EEG electrodes on the driver's scalp while driving. Apart from being intrusive, there is also a problem with noise in real-world applications that cannot be neutralized with current state-of-the-art filter technology. All electrophysiological signals measured with wearable devices suffer from similar intrusiveness and noise problems. ECG measurements, for example, are somewhat less susceptible to noise than EEG. Several recent works have shown that ECG can be a good predictor of sleep stages when combined with deep learning classifiers. Sun et al. [74] combined ECG with abdominal respiration and obtained a kappa value of 0.585, while Sridhar et al. [75] obtained a kappa value of 0.66. Combining EEG and ECG measurements has also been proposed in the context of driver drowsiness detection under simulator-based laboratory conditions [76]. Despite the problems of intrusiveness and noise susceptibility, research based on electrophysiological signals brings a shift towards a precise definition of drowsiness. Once there is an exact definition of drowsiness, or at least guidelines and manuals that accurately describe it (similar to the manuals for evaluating sleep stages), a big step will have been taken towards solving the problem of early drowsiness detection [77]. It is doubtful that wearable systems based on electrophysiological signals will ever be widely used in real-world driving, but they still need to be developed. In our opinion, such wearable electrophysiological devices are more likely to be used for the calibration/validation of non-intrusive systems (such as driving performance-based or video-based systems) in controlled/simulated driving scenarios, where ambient noise can be controlled and the effects of noise sensitivity reduced.

An additional limitation of this work is that we were able to completely download data for only 393 of 992 subjects, and only 28 of these 393 subjects were included in our study due to the inclusion condition described in Section 2.1 "Dataset, Preprocessing and Feature Extraction". Although this is a small subset of the data, by using 12 subjects as a test set we showed that the dataset is large enough to provide good generalization (as seen in Tables 3, 5 and 7). In a recent review of state-of-the-art drowsiness detection [33], the authors surveyed 39 papers with an average of 23.5 subjects per included work, which also indicates that the number of subjects included in the current study (28) is acceptable.

**Figure 5.** The comparison of the two novel multichannel indices with the best and the worst channel for γ/δ and (γ + β)/(δ + α) single-channel indices for subject tr04-0726. The white part of the diagram represents the awake state, while the yellow part of the diagram represents the stage 1 of sleep, i.e., the drowsiness state.

#### **5. Conclusions**

This paper presented two novel multichannel ratio indices for the detection of drowsiness obtained by multi-objective optimization based on evolutionary computation. The results suggested that alpha and delta powers are good drowsiness indicators. The novel multichannel ratio indices were compared with seven existing single-channel ratio indices and showed better results in detecting drowsiness measured with precision and in the overall classification accuracy of both states using several machine learning algorithms. Our work suggests that a more precise definition of drowsiness is needed, and that accurate early detection of drowsiness should be based on multichannel frequency-domain ratio indices. The multichannel features also reduced the time needed for classification. The process of obtaining these indices by using a multi-objective optimization algorithm can also be applied to other areas of EEG signal analysis.

Research such as this, together with research on small hardware for physiology-based drowsiness detection, can eventually lead to an easy-to-use, non-intrusive device that reliably detects drowsiness. In addition, research towards a reliable and standardized definition of drowsiness is needed and would lead to improvements in the field of drowsiness detection.

**Author Contributions:** Conceptualization, I.S., N.F. and A.J.; methodology, I.S., N.F. and A.J.; software, I.S. and N.F.; validation, I.S., N.F., M.C. and A.J.; formal analysis, I.S., N.F. and A.J.; investigation, I.S.; resources, I.S. and N.F.; data curation, I.S.; writing—original draft preparation, I.S. and N.F.; writing—review and editing, I.S., N.F., M.C. and A.J.; visualization, I.S.; supervision, A.J.; project administration, M.C.; funding acquisition, M.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been carried out within the project "Research and development of the system for driver drowsiness and distraction identification—DFDM" (KK.01.2.1.01.0136), funded by the European Regional Development Fund in the Republic of Croatia under the Operational Programme Competitiveness and Cohesion 2014–2020.

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of the University of Zagreb, Faculty of Electrical Engineering and Computing (on 26 September 2020).

**Informed Consent Statement:** All the available information regarding patients is available from the PhysioNet portal [47], from the 2018 PhysioNet computing in cardiology challenge [48], at: https://physionet.org/content/challenge-2018/1.0.0/ (accessed on 24 September 2021).

**Data Availability Statement:** The data used in this paper were obtained from the PhysioNet portal [47], from the 2018 PhysioNet computing in cardiology challenge [48], at: https://physionet.org/ content/challenge-2018/1.0.0/ (accessed on 24 September 2021).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Fusion Method to Estimate Heart Rate from Facial Videos Based on RPPG and RBCG**

**Hyunwoo Lee 1, Ayoung Cho <sup>1</sup> and Mincheol Whang 2,\***


**Abstract:** Remote sensing of vital signs has been developed to improve the measurement environment by using a camera instead of a skin-contact sensor. The camera-based method is based on two concepts, namely color and motion. The color-based method, remote photoplethysmography (RPPG), measures the color variation of the face generated by the reflectance of blood, whereas the motion-based method, remote ballistocardiography (RBCG), measures the subtle motion of the head generated by the heartbeat. The main challenge of remote sensing is overcoming the noise of illumination variance and motion artifacts. Studies on remote sensing have focused on blind source separation (BSS) methods applied to RGB colors or to the motions of multiple facial points to overcome the noise. However, they have still been limited in their real-world applications. This study hypothesized that BSS-based combining of colors and motions can improve the accuracy and feasibility of remote sensing in daily life. Thus, this study proposed a fusion method to estimate heart rate based on RPPG and RBCG using BSS methods such as ensemble averaging (EA), principal component analysis (PCA), and independent component analysis (ICA). The proposed method was verified by comparing it with previous RPPG and RBCG methods on three datasets varying in illumination variance and motion artifacts. The three main contributions of this study are as follows: (1) the proposed method based on RPPG and RBCG improved remote sensing with the benefits of each measurement; (2) the proposed method was demonstrated by comparing it with previous methods; and (3) the proposed method was tested in various measurement conditions for more practical applications.

**Keywords:** heart rate measurement; remote HR; remote PPG; remote BCG; blind source separation

#### **1. Introduction**

Remote sensing of vital signs has been studied to reduce the measurement burden. In particular, camera-based methods allow one to measure heart rate using a smartphone anywhere and anytime. Despite the potential of camera-based methods, their applications in daily life have still been limited by the noise generated by illumination variance and motion artifacts [1].

The camera-based method is based on two concepts, i.e., color and motion. First, the color-based method uses the principle of photoplethysmography (PPG), and thus it is called remote PPG (RPPG). RPPG measures the color variation generated by the reflectance of blood due to the cardiac cycle, carried from the heart to the head through the carotid arteries [2]. The color variance is most clearly measured at the green wavelength (i.e., 510–560 nm) since the pulsations of the arteries can be monitored through the elastic-mechanical interaction of deep arteries with the superficial dermis [3]. However, the illumination variance of various frequency ranges that occurs in daily life distorts the color variance caused by the heartbeat, so it is difficult to remove the noise using only the green channel. Studies on RPPG have focused on blind source separation (BSS) methods applied to the RGB colors to overcome the noise of illumination variance. Poh et al. [4] first proposed RPPG, which extracts the color variation from the RGB colors and estimates the plethysmographic signal by using the BSS

**Citation:** Lee, H.; Cho, A.; Whang, M. Fusion Method to Estimate Heart Rate from Facial Videos Based on RPPG and RBCG. *Sensors* **2021**, *21*, 6764. https://doi.org/10.3390/ s21206764

Academic Editor: Yvonne Tran

Received: 17 August 2021 Accepted: 7 October 2021 Published: 12 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

method based on independent component analysis (ICA). Xu et al. [5] reduced the noise of illumination variance using the partial least squares (PLS) method and multivariate empirical mode decomposition (MEMD). They evaluated their method under changing illumination conditions and showed better accuracy than the ICA-based method. Zhang [6] applied luminance separation to RPPG by converting from the RGB color space to the LAB color space. They demonstrated that the luminance separation is resistant to natural, motion-induced, and artificial illumination variances. Although RPPG has been improved to overcome the noise of illumination variance, it is still difficult to apply in daily life since all RGB colors are affected by lighting at the same time and most proposed noise removal methods were validated only in limited scenarios.

Second, the motion-based method uses the principle of ballistocardiography (BCG), so it is called remote BCG (RBCG). RBCG measures the subtle motion generated by the contraction of the heart and the ejection of blood from the ventricles into the vasculature [7]. The subtle motion is difficult to measure since its displacement is small, about 0.5 mm [8]. Thus, major motions such as head movement and facial expression make the measurement of the subtle motion difficult. Studies on RBCG have advanced by applying the BSS method to the subtle motions extracted from as many feature points as possible. Balakrishnan et al. [8] first proposed RBCG, which extracts the subtle motions from feature points of the forehead and nose regions and applies the BSS method based on principal component analysis (PCA). Shan et al. [9] extracted the subtle motion from one feature point of the forehead region and applied ICA to the displacements calculated in both the x-axis and y-axis. Haque et al. [10] increased the number of feature points by combining good features to track (GFTT) with facial landmark detection based on the supervised descent method (SDM). They demonstrated that increasing the number of feature points overcomes the tracking noise when extracting the subtle motion. Hassan et al. [11] improved the subtle motion extraction by skin color-based foreground segmentation and the motion artifact removal by a BSS method based on singular value decomposition (SVD). Despite these improvements of RBCG, it is still difficult to apply in daily life due to the limitations of real-time computational performance on the measurement device and of noise removal for irregular motion artifacts.

Recent studies on remote sensing have focused on combining RPPG with RBCG. Shao et al. [12] developed simultaneous monitoring of RPPG and RBCG. They demonstrated that it potentially provides a low-cost solution based on a single camera in daily life, but did not consider combining the two. Liu et al. [13] combined RPPG with BCG measured using an additional motion sensor rather than a camera. They corrected the tracking noise caused by motion artifacts in RPPG by referring to BCG. They only used BCG to remove motion artifacts in RPPG and did not consider their combination. Thus, it is still necessary to develop a fusion method of RPPG and RBCG that considers their interaction effect.

In summary, the BSS method has been developed to overcome the noise of illumination variance and motion artifacts in both RPPG and RBCG. Also, the fusion method of RPPG and RBCG needs to be further developed by considering their interaction effect. Thus, this study hypothesized that BSS-based combining of RPPG and RBCG can improve the accuracy and feasibility of remote sensing in daily life. This study examined BSS methods using ensemble averaging (EA), PCA, and ICA to estimate heart rate by combining RPPG with RBCG. The proposed method was compared with the previous methods, which only used the color variation (i.e., RPPG) or the subtle motion (i.e., RBCG). The contributions of this study can be summarized as follows: (1) the proposed method based on RPPG and RBCG improved remote sensing with the benefits of each measurement; (2) the proposed method was demonstrated by comparing it with previous methods; and (3) the proposed method was tested in various measurement conditions for more practical applications.

#### **2. Proposed Method**

This study proposed a fusion method to estimate heart rate based on RPPG and RBCG through their ensemble averaging. Figure 1 depicts the procedure of the proposed method. First, the face was detected and tracked from consecutive frames of the facial video. Then, photoplethysmographic and ballistocardiographic signals were extracted from the face by RPPG and RBCG, respectively. These signals were combined with each other to minimize noise and to maximize cardiac components. Finally, the heart rate was estimated from the combined signal in the frequency domain.

**Figure 1.** Overview of proposed method.

#### *2.1. Face Detection and Tracking*

To extract the color variation and subtle motion, the face was detected and tracked from the facial video. This study focused on low misdetection and fast inference time for practical application. The Viola-Jones algorithm [14] is a basic and widely known face detection algorithm using AdaBoost with Haar features. AdaBoost selects candidates from an image using simple Haar features and then detects the face among the candidates using complex Haar features to improve inference time. It is easy to use since it is implemented in the OpenCV library [15], but it frequently mis-detects faces. The histogram of oriented gradients (HOG) algorithm [16] is another representative face detection algorithm, implemented in the DLIB library [17]. It computes spatial gradients from an image and detects the face using a histogram of gradient orientations. Although it produces fewer misdetections than the Viola-Jones algorithm, it can detect only frontal faces and has a slow inference time.

Recently, face detection methods using deep learning have been proposed and have shown better performance than the Viola-Jones and HOG algorithms. Although these algorithms are complex and difficult to implement, they have become easier to use thanks to the development of open-source frameworks such as TensorFlow [18]. Thus, this study employed the single shot detector (SSD) [19] with ResNet [20] trained on the WIDER FACE dataset [21]. The ResNet extracts a high-dimensional feature map containing facial features such as contrast or facial contours from an image. The SSD detects the face from the feature map by extracting image pyramids of various sizes. It has recently been implemented and made available in the OpenCV library [22].

In addition, face tracking is important to extract the same facial region from successive frames. The facial region was divided into sub-regions by cropping the middle 50% of the width and top 20% of the height (i.e., forehead) and the middle 50% of the width and middle 25% of the height (i.e., nose). Then, 80 facial points were determined within the sub-regions by dividing the forehead region into 32 cells and the nose region into 48 cells, respectively. The facial points were tracked on xy-coordinates by the Kanade-Lucas-Tomasi (KLT) tracker [23]. By empirically considering tracking accuracy and computing cost, the window size of the small sub-area cells was set to the detected face size divided by 10. If the tracked facial points suddenly moved more than 10 pixels, the facial points were re-defined in the forehead and nose regions as described above. Then, the similarity transformation matrix was estimated from the locations of the facial points on the previous and next frames to describe the transformation of the facial region. The similarity transformation modeled three characteristics (i.e., translation, rotation, and scaling). Finally, the facial region on the next frame was tracked from the previous frame by the similarity transformation matrix. Figure 2 shows the procedure of face detection and tracking.

**Figure 2.** Procedure of face detection and tracking.
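The similarity-transform step described above can be sketched as a least-squares (Umeyama-style) fit between two tracked point sets. The following numpy sketch is illustrative, not the authors' code; the function names are assumptions, and a real pipeline would obtain the point correspondences from the KLT tracker (e.g., OpenCV's `calcOpticalFlowPyrLK`).

```python
import numpy as np

def estimate_similarity(prev_pts, next_pts):
    """Least-squares similarity transform (scale s, rotation R,
    translation t) mapping prev_pts onto next_pts.
    Returns the 2x3 matrix [s*R | t]."""
    prev_pts = np.asarray(prev_pts, dtype=float)
    next_pts = np.asarray(next_pts, dtype=float)
    mu_p = prev_pts.mean(axis=0)
    mu_n = next_pts.mean(axis=0)
    p = prev_pts - mu_p
    n = next_pts - mu_n
    cov = n.T @ p / len(p)                        # 2x2 cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U) * np.linalg.det(Vt))
    D = np.diag([1.0, d])                         # guard against reflection
    R = U @ D @ Vt                                # rotation
    var_p = (p ** 2).sum() / len(p)               # variance of source points
    s = np.trace(np.diag(S) @ D) / var_p          # scale
    t = mu_n - s * R @ mu_p                       # translation
    return np.hstack([s * R, t[:, None]])

def warp_points(M, pts):
    """Apply a 2x3 similarity matrix to (N, 2) points."""
    return np.asarray(pts, dtype=float) @ M[:, :2].T + M[:, 2]
```

With noiseless correspondences, this fit recovers the transform exactly; with tracking noise it returns the least-squares optimum, which is what makes it suitable for propagating the facial region between frames.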

#### *2.2. Photoplethysmographic Signal Extraction*

Color variation caused by the heartbeat is most prominent in the cheek area [24], so the photoplethysmographic signal was extracted from the RGB spectrums of the middle 50% of the width and middle 25% of the height (i.e., nose), the left 20–35% of the width and top 45–70% of the height (i.e., left cheek), and the right 20–35% of the width and top 45–50% of the height (i.e., right cheek). Each RGB signal was normalized by subtracting its mean since the mean reflects melanin components (i.e., skin color) [25]. Then, the RGB signals were combined with each other based on pulse blood-volume vector (PBV) modeling [26] for each face area. The combined signals were filtered by a second-order Butterworth bandpass filter with a cut-off of 0.75–2.5 Hz, corresponding to 45–150 bpm. Finally, the photoplethysmographic signal was extracted by applying ICA to the filtered signals and selecting the signal with the highest signal-to-noise ratio (SNR) from the three components. The SNR was calculated from the frequency domain of each component as:

$$SNR = \max(PS) / (\sum PS - \max(PS)),\tag{1}$$

where *SNR* is the signal-to-noise ratio and *PS* is the power spectrum of the signal. Figure 3 depicts the procedure of photoplethysmographic signal extraction.

**Figure 3.** Procedure of photoplethysmographic signal extraction.
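Equation (1) can be sketched directly in numpy. This is an illustrative implementation, not the authors' code; restricting the spectrum to the 0.75–2.5 Hz passband is an assumption consistent with the bandpass filtering described above.

```python
import numpy as np

def snr(signal, fs=30.0, lo=0.75, hi=2.5):
    """Equation (1): ratio of the peak spectral power to the sum of the
    remaining power, evaluated inside the heart rate passband."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    ps = np.abs(np.fft.rfft(signal)) ** 2        # power spectrum
    ps = ps[(freqs >= lo) & (freqs <= hi)]       # keep 0.75-2.5 Hz band
    return ps.max() / (ps.sum() - ps.max())
```

A component dominated by a single cardiac peak yields a high SNR, whereas a component with power spread across the band yields a low one, which is why the SNR is used to pick among the ICA outputs.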

#### *2.3. Ballistocardiographic Signal Extraction*

Ballistocardiographic head movements are generated up and down by the heartbeat, so the ballistocardiographic signal was extracted from the y-coordinate of each facial point on successive frames. In this study, 80 ballistocardiographic signals were extracted from the 80 facial points and were normalized by subtracting their means to make the unit the same as RPPG. The normalized signals were filtered by a second-order Butterworth bandpass filter with a cut-off of 0.75–2.5 Hz. Voluntary head movements distorted the signals with large amplitudes, so samples whose amplitude was larger than twice the standard deviation were replaced with the mean. Also, facial expressions distorted the signals extracted from specific muscles (e.g., a smile moves the cheek and eye muscles). Thus, the SNR was calculated for each signal, and the signals with an SNR lower than the mean SNR were removed from the next step. Finally, the ballistocardiographic signal was extracted by applying PCA to the filtered signals and selecting the signal with the highest SNR from the five components. Figure 4 shows the procedure of ballistocardiographic signal extraction.

**Figure 4.** Procedure of ballistocardiographic signal extraction.
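Two of the steps above, i.e., replacing large-amplitude samples with the mean and extracting principal component time courses from the 80 point trajectories, can be sketched in numpy as follows. This is a simplified, hypothetical illustration (not the authors' code); bandpass filtering and the SNR-based pruning are omitted.

```python
import numpy as np

def suppress_large_motion(sig):
    """Replace samples whose deviation from the mean exceeds twice the
    standard deviation with the mean (voluntary head movement correction)."""
    out = np.asarray(sig, dtype=float).copy()
    mu, sd = out.mean(), out.std()
    out[np.abs(out - mu) > 2.0 * sd] = mu
    return out

def pca_time_courses(signals, k=5):
    """signals: (n_points, n_samples) array of per-point trajectories.
    Returns the top-k principal component time courses via SVD."""
    X = np.asarray(signals, dtype=float)
    X = X - X.mean(axis=1, keepdims=True)         # center each trajectory
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k]                                  # one candidate BCG signal per row
```

The component with the highest SNR per Equation (1) would then be selected as the ballistocardiographic signal.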

#### *2.4. Signal Combining*

This study hypothesized that RPPG is robust to motion artifacts but sensitive to illumination variance, whereas RBCG is robust to illumination variance but sensitive to motion artifacts. Thus, the photoplethysmographic signal and the ballistocardiographic signal were combined with each other to improve robustness to both the illumination variance and the motion artifacts. This study tested three BSS methods (i.e., EA, PCA, and ICA) to combine RPPG and RBCG. EA was hypothesized to reduce random noise by averaging repetitive signals, and its benefit as a fusion method was verified in a previous BCG study [27]. EA was calculated as:

$$EA = (RPPG + RBCG)/2,\tag{2}$$

where *RPPG* is the signal extracted by RPPG and *RBCG* is the signal extracted by RBCG. In addition, PCA and ICA are representative BSS methods for reducing various kinds of noise. PCA and ICA were applied to the PPG and BCG signals extracted in Sections 2.2 and 2.3, and two components were extracted because the BSS was applied to two signals. The combined signal was then selected as the component with the higher SNR. Figure 5 depicts the procedure of signal combining.
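The EA combining of Equation (2) and the PCA variant can be sketched as follows. This is an illustrative numpy sketch with assumed function names, not the authors' code; an ICA variant would follow the same pattern using an ICA implementation such as scikit-learn's `FastICA`.

```python
import numpy as np

def ensemble_average(rppg, rbcg):
    """Equation (2): element-wise average of the two normalized signals."""
    return (np.asarray(rppg, dtype=float) + np.asarray(rbcg, dtype=float)) / 2.0

def pca_combine(rppg, rbcg):
    """Stack the two signals and return both principal component time
    courses; the component with the higher SNR would be kept."""
    X = np.vstack([rppg, rbcg]).astype(float)
    X = X - X.mean(axis=1, keepdims=True)      # center each signal
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt                                   # two components, one per row
```

Because the noise sources of RPPG (illumination) and RBCG (motion) are largely independent, even the simple average tends to track the shared cardiac component better than either input alone.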

#### *2.5. Heart Rate Estimation*

The heart rate was estimated from the frequency domain of the signal. The power spectrum was computed from the combined signal by the fast Fourier transform (FFT). This study extracted the band of the power spectrum between 0.75 and 2.5 Hz, corresponding to 45–150 bpm. The dominant frequency was identified as the frequency with the highest power in this band. The heart rate was calculated by multiplying the dominant frequency by 60 as:

$$HR = 60 \times freq,\tag{3}$$

where *HR* is the heart rate and *freq* is the dominant frequency. The heart rate for RPPG (i.e., *HRRPPG*) was estimated from the PPG signal extracted in Section 2.2, whereas the heart rate for RBCG (i.e., *HRRBCG*) was estimated from the BCG signal extracted in Section 2.3. The heart rates for the proposed methods (i.e., *HREA*, *HRPCA*, and *HRICA*) were estimated from the combined signals based on RPPG and RBCG extracted in Section 2.4.
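The heart rate estimation step can be sketched as below; this is an illustrative implementation assuming a 30 fps signal as in the experiments, not the authors' code.

```python
import numpy as np

def estimate_hr(signal, fs=30.0, lo=0.75, hi=2.5):
    """Equation (3): 60 times the dominant frequency of the power
    spectrum within the 0.75-2.5 Hz (45-150 bpm) band."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    ps = np.abs(np.fft.rfft(signal)) ** 2       # power spectrum via FFT
    band = (freqs >= lo) & (freqs <= hi)        # restrict to 45-150 bpm
    return 60.0 * freqs[band][np.argmax(ps[band])]
```

For a 30 s window at 30 fps the frequency resolution is 1/30 Hz, i.e., 2 bpm, which bounds the precision of this peak-picking estimate.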

**Figure 5.** Procedure of signal combining.

#### **3. Experiments**

The experiments were conducted to evaluate the proposed method under three measurement conditions. The measurement conditions were determined by illumination and motion artifacts (i.e., normal, facial expressions, and human computer interactions). The experimental procedure was approved by the Institutional Review Board of Sangmyung University, Seoul, Korea (BE2018-35).

#### *3.1. Experiment 1: Normal*

This experiment collected the normal dataset without illumination variance and motion artifacts. The participants consisted of 20 persons (12 males) and were asked to sit 1 m away from a camera for 3 min in a stationary state. The facial video was recorded by an RGB webcam (Logitech Webcam C270) with 640 × 360 resolution at 30 fps. Also, the ECG signal was simultaneously measured by an ECG measurement system with Lead-I (BIOPAC Systems Inc., Goleta, CA, USA) at a sampling rate of 500 Hz. It was employed as the ground-truth for the evaluation of the proposed methods.

#### *3.2. Experiment 2: Facial Expressions*

This experiment collected the dataset with facial expressions. The 20 persons who participated in experiment 1 were also asked to sit in front of a camera for 3 min in a stationary state. They then followed the six basic facial expressions (i.e., happiness, sadness, surprise, anger, disgust, and fear), which were displayed on the monitor for 30 s each in random order to minimize the ordering effect. The facial video and the ECG signal were recorded and measured in the same manner as in experiment 1.

#### *3.3. Experiment 3: Human Computer Interactions*

This experiment collected a dataset including the illumination variance and motion artifacts that occur in human computer interactions. Seventeen persons (8 males) participated and were asked to watch four videos intended to elicit emotions. They then reported their emotions on a self-report based on a two-dimensional model [28]. While watching the videos and writing the self-report, they were asked to freely express facial expressions and move their heads. Also, the illumination on the face was changed by the light reflected from the video. The facial video was recorded by an RGB webcam (Logitech Webcam C270) with a 1920 × 1080 resolution at 30 fps. The ECG signal was also measured in the same manner as in experiment 1.

#### **4. Results**

This study demonstrated the improvement of the proposed fusion methods by comparing them with the previous RPPG and RBCG methods. The heart rate of ECG was calculated by the QRS detection algorithm [29] and used as the ground-truth for the evaluation of the proposed methods. The proposed method was verified by calculating four metrics and one plot as follows: mean absolute error (MAE), standard deviation of absolute error (SDAE), root mean squared error (RMSE), Pearson's correlation coefficient (CC), and the Bland-Altman plot. MAE and RMSE describe the mean error of the estimated heart rate, while SDAE describes its variation. CC determines the statistical similarity of heart rates over time; a coefficient approaching one indicates a strong positive similarity. The Bland-Altman plot graphically represents the statistical differences by plotting the mean (x-axis) against the difference (y-axis) of the two measurements. The lines on the plot indicate the 95% limits of agreement, based on the mean difference ± 1.96 standard deviations of the differences. The statistical parameters were calculated from the heart rates for RPPG (i.e., *HRRPPG*), RBCG (i.e., *HRRBCG*), and the proposed methods (i.e., *HREA*, *HRPCA*, and *HRICA*) by comparing them with ECG.
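The evaluation metrics and the Bland-Altman limits of agreement described above can be sketched as follows (an illustrative numpy implementation, not the authors' code; the 95% LOA is taken as the mean difference ± 1.96 standard deviations, as described above):

```python
import numpy as np

def evaluate(hr_est, hr_ref):
    """MAE, SDAE, RMSE, Pearson's CC, and the Bland-Altman mean
    difference with 95% limits of agreement."""
    est = np.asarray(hr_est, dtype=float)
    ref = np.asarray(hr_ref, dtype=float)
    err = est - ref
    ae = np.abs(err)
    sd = err.std(ddof=1)                        # SD of the differences
    return {
        "MAE": ae.mean(),
        "SDAE": ae.std(ddof=1),
        "RMSE": np.sqrt((err ** 2).mean()),
        "CC": np.corrcoef(est, ref)[0, 1],
        "mean_diff": err.mean(),
        "LOA": (err.mean() - 1.96 * sd, err.mean() + 1.96 * sd),
    }
```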

#### *4.1. Experiment 1: Normal*

Table 1 shows the estimation of heart rates from the normal dataset using RPPG, RBCG, and the proposed fusion methods (i.e., EA, PCA, and ICA). All fusion methods were more accurate than RPPG and RBCG. In addition, the ICA-based fusion method showed the lowest errors in the normal dataset (MAE = 1.04, SDAE = 0.91, RMSE = 1.39, CC = 0.999).

**Table 1.** Estimation of heart rates from the normal dataset without illumination variance and motion artifacts.


MAE, mean absolute error; SDAE, standard deviation of absolute error; RMSE, root mean square error; CC, Pearson's correlation coefficient. Two asterisks represent significant correlation at *p* < 0.01. The lowest error and highest correlation values are bolded.

The Bland-Altman plots of estimated heart rates from the normal dataset without illumination variance and motion artifacts using RPPG, RBCG, and the proposed fusion methods are shown in Figure 6. The mean errors were 2.19 with 95% limits of agreement (LOA) in −3.43 to 7.81 (RPPG), −1.93 with 95% LOA in −8.35 to 4.49 (RBCG), 0.37 with 95% LOA in −1.19 to 1.93 (EA), 0.34 with 95% LOA in −1.26 to 1.94 (PCA), and 0.18 with 95% LOA in −0.63 to 0.98 (ICA). The heart rates estimated using the ICA-based fusion method showed the lowest mean difference and variances.

#### *4.2. Experiment 2: Facial Expressions*

The heart rates were estimated from the dataset with facial expressions using RPPG, RBCG, and the proposed fusion methods, as shown in Table 2. All fusion methods showed lower errors than RPPG and RBCG. In addition, the errors of the PCA-based fusion method were lower than those of the other methods (MAE = 2.76, SDAE = 2.34, RMSE = 3.23, CC = 0.968).

Figure 7 shows the Bland-Altman plots of the heart rates estimated from the dataset with facial expressions using RPPG, RBCG, and the proposed fusion methods. The mean errors were 5.54 with 95% limits of agreement (LOA) in −6.39 to 17.48 (RPPG), −2.48 with 95% LOA in −8.56 to 3.59 (RBCG), 2.48 with 95% LOA in −4.59 to 9.55 (EA), 2.20 with 95% LOA in −4.04 to 8.45 (PCA), and 1.87 with 95% LOA in −4.12 to 7.86 (ICA). The heart rates estimated using the PCA-based fusion method showed the lowest mean difference and variances.

**Table 2.** Estimation of heart rates from the dataset with facial expressions.


MAE, mean absolute error; SDAE, standard deviation of absolute error; RMSE, root mean square error; CC, Pearson's correlation coefficient. Two asterisks represent significant correlation at *p* < 0.01. The lowest error and highest correlation values are bolded.

**Figure 6.** Bland-Altman plots of heart rates estimated from the normal dataset without illumination variance and motion artifacts using (**a**) RPPG, (**b**) RBCG, (**c**) EA, (**d**) PCA, and (**e**) ICA. The lines are the mean errors and 95% LOA.

#### *4.3. Experiment 3: Human Computer Interactions*

Table 3 shows the estimation of heart rates from the dataset in human computer interactions using RPPG, RBCG, and the proposed fusion methods. All fusion methods showed lower errors than RPPG and RBCG. Unlike the other datasets, the EA-based fusion method showed the lowest errors in this dataset (MAE = 4.79, SDAE = 2.13, RMSE = 5.181, CC = 0.629).

Figure 8 shows the Bland-Altman plots of the heart rates estimated from the dataset in human computer interactions using RPPG, RBCG, and the proposed fusion methods. The mean errors were 6.05 with 95% limits of agreement (LOA) in −9.71 to 21.81 (RPPG), −12.35 with 95% LOA in −32.88 to 8.18 (RBCG), −0.09 with 95% LOA in −15.49 to 15.32 (EA), −0.49 with 95% LOA in −15.14 to 15.16 (PCA), and 0.25 with 95% LOA in −16.15 to 16.65 (ICA). The heart rates estimated using the EA-based fusion method showed the lowest mean difference and variances.


**Table 3.** Estimation of heart rates from the dataset in human computer interactions.

MAE, mean absolute error; SDAE, standard deviation of absolute error; RMSE, root mean square error; CC, Pearson's correlation coefficient. Two asterisks represent significant correlation at *p* < 0.01. The lowest error and highest correlation values are bolded.

**Figure 7.** Bland-Altman plots of heart rates estimated from the dataset with facial expressions using (**a**) RPPG, (**b**) RBCG, (**c**) EA, (**d**) PCA, and (**e**) ICA. The lines are the mean errors and 95% LOA.

**Figure 8.** Bland-Altman plots of heart rates estimated from the dataset in human computer interactions using (**a**) RPPG, (**b**) RBCG, (**c**) EA, (**d**) PCA, and (**e**) ICA. The lines are the mean errors and 95% LOA.

#### **5. Discussion**

In this study, a fusion method based on RPPG and RBCG was developed to enhance heart rate estimation using EA, PCA, and ICA. This study evaluated the proposed method on three datasets according to illumination variance and motion artifacts as follows: (1) normal, (2) facial expressions, and (3) human computer interactions. The proposed method was more accurate than the previous RPPG and RBCG in all datasets. This result indicates that, with the advancement of fusion methods based on RPPG and RBCG, ECG could eventually be replaced by remote sensing in daily life. Thus, this study strongly encourages the fusion of RPPG and RBCG for more accurate camera-based heart rate estimation.

Overall, this study has drawn four significant findings. First, PCA and ICA were better than EA on the datasets with less noise, i.e., normal and facial expressions. This indicates that the BSS algorithms can be improved if RPPG and RBCG are enhanced by reducing the noise. On the other hand, EA was better than PCA and ICA on the noisier dataset resembling daily life (i.e., human computer interactions).

Second, experiment 3 evaluated the proposed method and the previous RPPG and RBCG in a human computer interactions environment, which is similar to daily life. As shown in the Bland-Altman plots (Figure 8), the previous RPPG and RBCG estimated the heart rate higher (LOA in −9.71 to 21.81) and lower (LOA in −32.88 to 8.18) than the ground-truth (i.e., ECG), respectively. This indicates that RPPG suffers from high-frequency noise whereas RBCG suffers from low-frequency noise. The illumination variance has high frequency because it appears at a rate too high for humans to perceive. On the other hand, the motion artifacts have low frequency because they occur instantaneously and briefly. RPPG and RBCG are sensitive to the illumination variance and motion artifacts, respectively, so this result is reasonable. Note that the proposed method reduced both illumination variance and motion artifacts (LOA in −15.49 to 15.32), since its residual low-frequency and high-frequency noise was small and evenly balanced.

Third, this study showed significant results compared to other fusion methods. In particular, Liu et al. [13] developed a fusion method that combined RPPG with BCG measured using an additional motion sensor and showed an MAE of 6.20 bpm in a motion state. Although the measurement conditions differ, our proposed method showed an MAE of 4.79 bpm in experiment 3, which included illumination variance and motion artifacts. Note that this indicates the possibility of combining PPG and BCG using only a camera, without an additional sensor.

Finally, this study presented the fusion method based on RPPG and RBCG to enhance heart rate estimation using a camera. Although it requires further improvement, a novel approach was developed toward the practical use of remote sensing in daily life. Many studies have been presented before, but remote sensing has not been developed into a product for real users due to restrictions on the use environment. A product for real users should be developed and tested to minimize the measurement burden and to improve the use environment, so that it can be used in practical domains such as self-driving cars or non-face-to-face communication.

#### **6. Conclusions**

This study developed a fusion method to estimate heart rate from facial videos based on RPPG and RBCG. The proposed methods using EA, PCA, and ICA were compared with the previous RPPG and RBCG. As a result, the proposed methods showed enhanced accuracy on three datasets varying in illumination variance and motion artifacts. The findings are a significant step toward the enhanced development of RPPG and RBCG. This study is expected to contribute to enhanced heart rate measurement by overcoming the noise of illumination variance and motion artifacts, and consequently to improve the applicability of remote sensing in daily life.

**Author Contributions:** H.L. and M.W. conceived and designed the experiments; H.L. and A.C. performed the experiments; H.L. analyzed the data; H.L. and M.W. wrote the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2021R1I1A1A01052752) and by Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government. [21ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System].

**Institutional Review Board Statement:** This study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of the Sangmyung University, Seoul, Korea (BE2018-35).

**Informed Consent Statement:** Written informed consent has been obtained from the patients to publish this paper.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Non-Invasive Hemodynamics Monitoring System Based on Electrocardiography via Deep Convolutional Autoencoder**

**Muammar Sadrawi 1, Yin-Tsong Lin 2, Chien-Hung Lin 2, Bhekumuzi Mathunjwa 1, Ho-Tsung Hsin 1,3, Shou-Zen Fan 4, Maysam F. Abbod <sup>5</sup> and Jiann-Shing Shieh 1,\***


**Abstract:** This study evaluates cardiovascular and cerebral hemodynamics systems by only using non-invasive electrocardiography (ECG) signals. The Massachusetts General Hospital/Marquette Foundation (MGH/MF) and Cerebral Hemodynamic Autoregulatory Information System Database (CHARIS DB) from the PhysioNet database are used for cardiovascular and cerebral hemodynamics, respectively. For cardiovascular hemodynamics, the ECG is used for generating the arterial blood pressure (ABP), central venous pressure (CVP), and pulmonary arterial pressure (PAP). Meanwhile, for cerebral hemodynamics, the ECG is utilized for the intracranial pressure (ICP) generator. A deep convolutional autoencoder system is applied for this study. The cross-validation method with Pearson's linear correlation (R), root mean squared error (RMSE), and mean absolute error (MAE) is used for the evaluations. Initially, the ECG is used to generate the cardiovascular waveform. For the ABP system—the systolic blood pressure (SBP) and diastolic blood pressure (DBP)—the R evaluations are 0.894 ± 0.004 and 0.881 ± 0.005, respectively. The MAE evaluations for SBP and DBP are, respectively, 6.645 ± 0.353 mmHg and 3.210 ± 0.104 mmHg. Furthermore, for the PAP system—the systolic and diastolic pressures—the R evaluations are 0.864 ± 0.003 and 0.817 ± 0.006, respectively. The MAE evaluations for systolic and diastolic pressures are, respectively, 3.847 ± 0.136 mmHg and 2.964 ± 0.181 mmHg. Meanwhile, the mean CVP evaluations are 0.916 ± 0.001, 2.220 ± 0.039 mmHg, and 1.329 ± 0.036 mmHg, respectively, for R, RMSE, and MAE. For the mean ICP evaluation in cerebral hemodynamics, the R and MAE evaluations are 0.914 ± 0.003 and 2.404 ± 0.043 mmHg, respectively. This study, as a proof of concept, concludes that non-invasive cardiovascular and cerebral hemodynamics systems can potentially be investigated by only using the ECG signal.

**Keywords:** non-invasive system; hemodynamics; electrocardiography; arterial blood pressure; central venous pressure; pulmonary arterial pressure; intracranial pressure; deep convolutional autoencoder

#### **1. Introduction**

In the intensive care unit (ICU), precise health monitoring systems are utilized to thoroughly observe critically ill patients. Hemodynamics refers to the physical behavior of blood in the circulatory system and is a fundamental measure used in the ICU. Arterial blood pressure (ABP), pulmonary arterial pressure (PAP), and central venous pressure (CVP) are hemodynamics measures related to the cardiovascular system. Meanwhile, the intracranial pressure (ICP) is the pressure exerted by the fluid inside the skull, measured

**Citation:** Sadrawi, M.; Lin, Y.-T.; Lin, C.-H.; Mathunjwa, B.; Hsin, H.-T.; Fan, S.-Z.; Abbod, M.F.; Shieh, J.-S. Non-Invasive Hemodynamics Monitoring System Based on Electrocardiography via Deep Convolutional Autoencoder. *Sensors* **2021**, *21*, 6264. https://doi.org/ 10.3390/s21186264

Academic Editor: Biswanath Samanta

Received: 10 August 2021 Accepted: 15 September 2021 Published: 18 September 2021


**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

to evaluate the cerebral hemodynamics system. These measures are essential for patients in the ICU.

For cardiovascular hemodynamics, CVP evaluation is fundamental for heart failure patients, as it provides information on cardiac preload and blood volume. However, its measurement requires a central venous catheter, which is highly invasive. Another important cardiovascular hemodynamics measure is PAP, the force generated by the heart to pump blood from the heart to the lungs. Elevated PAP indicates pulmonary hypertension. Its evaluation, like that of CVP, is highly invasive, requiring right heart catheterization. ABP is the least invasive of the three cardiovascular hemodynamics measures. For cerebral hemodynamics, ICP is critical to assessing the brain condition; elevated ICP is a significant indicator for traumatic brain injury (TBI) and hemorrhagic stroke patients [1–3]. However, despite their benchmark accuracy, these evaluations are highly invasive procedures and can potentially lead to infection [4]. Therefore, these measurements are not well suited to a continuous monitoring system.

Recently, non-invasive technologies have been developed for monitoring the cardiovascular and cerebral hemodynamics systems. Specifically, for cardiovascular hemodynamics, artificial intelligence (AI) combined with photoplethysmography (PPG) has been widely used for non-invasive ABP evaluation. One study applied artificial neural networks (ANN) to a single PPG signal to predict ABP [5]. Furthermore, electrocardiography (ECG) and PPG, with ANN and long short-term memory (LSTM) methods, were applied for these evaluations [6]. Backpropagation with a genetic algorithm (GA) has also been used for hemodynamics evaluation [7]. Meanwhile, another study utilized ballistocardiography (BCG) and ECG alongside PPG with convolutional neural networks (CNN) and a gated recurrent unit (GRU)-based technique [8]. Additionally, other studies investigated continuous arterial blood pressure evaluation based on a single PPG signal using an LSTM network [9] and a hybrid GA-based optimization with a convolutional autoencoder [10]. However, even though PPG provides well-classified blood pressure estimates, the sensor is very sensitive to motion artifacts during vertical movement and several typical activities [11,12].

Several previous studies have also addressed non-invasive CVP and PAP measurement. An ultrasound-based system was developed for non-invasive CVP estimation without central venous access [13]. In addition, a linear regression method using ICU patient data was implemented to evaluate the CVP signal from echocardiography, with comparison to right heart catheterization [14]. Another study also applied linear regression to data from heart failure patients undergoing right heart catheterization for CVP evaluation, using echocardiography of the inferior vena cava as the input signal. For non-invasive PAP, one study utilized electrical impedance tomography as the input of the system [15]; that study was limited to healthy subjects. However, a single-lead ECG system is more suitable than these methods for an intensive monitoring system.

For cerebral hemodynamics, several previous studies have been conducted, most of them using ABP and cerebral blood flow velocity (CBFV). A study by Jaishankar et al. [2] utilized ABP and CBFV as inputs for interpreting the ICP, selecting a frequency-based method for the evaluation. Another study, by Imaduddin et al. [3], applied a Bayesian-based approach to patients with TBI, hydrocephalus, and hemorrhagic stroke, using ABP and CBFV signals to evaluate the ICP. The utilization of ABP makes these previous studies less invasive, but not fully non-invasive.

In general, ECG is the measurement most regularly used to evaluate cardiovascular activity over long periods. Most importantly, ECG is non-invasive. Furthermore, it has been fundamental to physiological signal evaluation in arrhythmia [16], anesthesia [17,18], and sleep-related studies [19–21].

Prior studies have investigated the interconnection between ECG alterations and cardiovascular hemodynamics. Increased P wave incidence in the ECG has been found to correlate with hypertension [22]. The P wave also appeared later in the ECG of hypertensive patients than of normal subjects, especially when the diastolic blood pressure exceeded 120 mmHg. Another P wave phenomenon has also been revealed: one study showed that hypertensive patients have a wider P wave than normal subjects [23]. Furthermore, a higher T wave with extended PR intervals was also seen in hypertensive patients [24]. In addition, ST segment depression [25] and higher QRS amplitude [26] also carry information about hypertension. Meanwhile, the ECG can potentially be used for pulmonary hypertension evaluation [27,28]. Recently, a single ECG signal has been utilized to investigate hypertension [29,30]. As can be seen, previously conducted studies have morphologically illuminated some relationships between cardiac electrical activity and blood pressure.

On the other hand, the ECG also shows several promising results when evaluating cerebrovascular hemodynamics. Brain–heart interaction studies have revealed several relationships between these associated signals. ECG abnormalities (increased corrected QT (QTc) interval, much higher P wave amplitude, higher QRS amplitude, and longer ST segment) appeared in patients with head injury compared to healthy subjects [31,32]. Furthermore, another study revealed that the more obvious the abnormalities, the more deteriorated the consciousness level of the patients. Meanwhile, subarachnoid hemorrhage (SAH) can be seen morphologically in T and R wave abnormalities [33]. Specifically, in head trauma, abnormalities are discovered as prolonged QTc [34]; that study also showed that the more severe the SAH, the more prolonged the QTc. For intracranial investigation, morphological changes appear in the ECG in the U wave, T wave, ST-T segment, QT interval, and J wave [35,36]. In addition, ECG changes are considered a secondary effect of traumatic brain injury (TBI) [37].

As previously mentioned, the evaluation of hemodynamics is significant, especially for cardiovascular- and cerebral-related conditions. However, most of the precise procedures are invasive. Therefore, the aim of this study is to build non-invasive cardiovascular and cerebral hemodynamics monitoring systems using the ECG signal and a deep convolutional autoencoder system.

#### **2. Materials and Methods**

This study used two databases from PhysioNet [38]: the Massachusetts General Hospital/Marquette Foundation (MGH/MF) Waveform Database [39] and the Cerebral Hemodynamic Autoregulatory Information System Database (CHARIS DB) [40]. For the hemodynamics system, shorter window sizes were evaluated and yielded better predictions, compensating for rapid changes in some circumstances. However, the ECG is impractical to analyze over a very short period because, for normal subjects, the QRS cycle is sometimes not fully formed within too short a window. Therefore, a 2 s window was selected for both the input and output of the system. The windows were then combined and randomly separated across all patients into training and testing sets with no overlapping selection.
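The 2 s windowing and random (non-patient-based) split described above can be sketched as follows; the function name, 80/20 split ratio, and synthetic signals are illustrative assumptions, not the authors' exact pipeline:

```python
import numpy as np

def segment_pairs(ecg, target, fs, win_s=2.0, seed=0):
    """Cut paired input/output signals into non-overlapping 2 s windows
    and shuffle them for a random (non-patient-based) split."""
    win = int(fs * win_s)                      # samples per window
    n = min(len(ecg), len(target)) // win      # complete windows only
    x = ecg[:n * win].reshape(n, win)
    y = target[:n * win].reshape(n, win)
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(0.8 * n)                     # assumed 80/20 split
    tr, te = idx[:n_train], idx[n_train:]
    return (x[tr], y[tr]), (x[te], y[te])

# illustrative use: 360 Hz signals, as in the MGH/MF database
fs = 360
t = np.arange(fs * 60) / fs                    # one minute of data
ecg = np.sin(2 * np.pi * 1.2 * t)              # synthetic stand-ins
abp = 90 + 30 * np.sin(2 * np.pi * 1.2 * t - 0.3)
(train_x, train_y), (test_x, test_y) = segment_pairs(ecg, abp, fs)
```

With one minute of 360 Hz data, this yields 30 windows of 720 samples each, split 24/6.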

A deep convolutional autoencoder (DCAE) system was used as the signal generation model. The DCAE model is a modification of the original U-Net model used in a biomedical image segmentation system [41]. The modified model deploys a multi-atrous U-Net-based deep convolutional autoencoder (MA-UDCAE) system, originally applied in [10]. Figures 1 and 2 show the MA-UDCAE model structures for the cardiovascular and cerebral hemodynamics systems, respectively.

Figure 1 shows the convolutional autoencoder system applied to the cardiovascular hemodynamics system. In the first half, the encoder reduces the shape of each layer, while the decoder increases it. The structure consists of several layer types: convolution layers (CV), down pooling layers (DP), up sampling layers (UP), and merging layers (ME). The input of the system is the ECG. The first convolution layer (CV\_01), with multiple dilation rates, is applied next; its multi-dilation branches are merged into ME\_01, followed by the down pooling layer (DP\_01). For the down pooling, this study utilizes max pooling. The second convolution layer (CV\_02) is applied and merged into the ME\_02 layer, which is then down pooled into DP\_02. This block of convolution, concatenation, and down pooling is replicated up to the 4th level. The decoder is identical to the encoder, except that down sampling is replaced with up sampling. Furthermore, U-Net-style skip connections are applied: a layer in the encoder is merged with the corresponding decoder layer. There are four such merges: DP\_03-ME\_07, DP\_02-ME\_09, DP\_01-ME\_11, and the input layer with ME\_14. Finally, several convolution layers are applied before the output layer, which has three channels: the ABP, CVP, and PAP signals.
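A minimal Keras sketch of such a multi-atrous U-Net-style 1-D autoencoder (a reduced two-level version; the filter counts, dilation rates, and layer arrangement are our own assumptions, not the exact MA-UDCAE configuration):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def atrous_block(x, filters, rates=(1, 2, 4)):
    """Parallel dilated (atrous) 1-D convolutions merged by
    concatenation, mimicking the CV_*/ME_* pairs in the paper."""
    branches = [layers.Conv1D(filters, 3, padding="same",
                              dilation_rate=r, activation="relu")(x)
                for r in rates]
    return layers.Concatenate()(branches)

def build_ma_udcae(win=720, out_channels=3, filters=16):
    inp = tf.keras.Input(shape=(win, 1))       # 2 s ECG at 360 Hz
    # encoder: atrous conv + merge + max pooling (CV/ME/DP)
    e1 = atrous_block(inp, filters)
    p1 = layers.MaxPooling1D(2)(e1)            # DP_01
    e2 = atrous_block(p1, filters * 2)
    p2 = layers.MaxPooling1D(2)(e2)            # DP_02
    # bottleneck
    b = atrous_block(p2, filters * 4)
    # decoder: up sampling + U-Net skip merges (UP/ME)
    u1 = layers.UpSampling1D(2)(b)
    d1 = atrous_block(layers.Concatenate()([u1, e2]), filters * 2)
    u2 = layers.UpSampling1D(2)(d1)
    d2 = atrous_block(layers.Concatenate()([u2, e1]), filters)
    # three output channels: ABP, CVP, and PAP waveforms
    out = layers.Conv1D(out_channels, 1, activation="linear")(d2)
    return Model(inp, out)

model = build_ma_udcae()
model.compile(optimizer="adam", loss="mae")
```

For the cerebral system, the same sketch would use `out_channels=1` (the ICP waveform).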

**Figure 1.** Cardiovascular hemodynamics MA-UDCAE model structure. Note: CV: convolution layer; ME: merge layer; DP: down pooling layer; UP: up sampling layer.

Figure 2 shows the cerebral hemodynamics system, which is structurally identical to Figure 1 but has only two skip merges and a single output channel, the intracranial pressure (ICP) signal. In the encoder, the ECG input is processed by the multi-atrous convolution system (CV\_01), followed by the merging layer (ME\_01) and a down pooling layer (DP\_01). The next block is identical: a convolutional layer followed by a concatenating layer and a down pooling layer. In the decoder block, the first up sampling layer (UP\_01) is likewise processed by a convolutional layer (CV\_04) and a concatenating layer (ME\_04). Moreover, in the decoder, ME\_02 is merged with ME\_04 to form the ME\_05 layer, and similarly ME\_07 combines the ME\_01 and ME\_06 layers. Finally, the last layer outputs the ICP waveform.

**Figure 2.** Cerebral hemodynamics MA-UDCAE model structure. Note: CV: convolution layer; ME: merge layer; DP: down pooling layer; UP: up sampling layer.

Initially, the Massachusetts General Hospital/Marquette Foundation (MGH/MF) Waveform Database was used for the cardiovascular hemodynamics system. It contains several physiological signals (ECG, ABP, CVP, and PAP), uniformly sampled at 360 Hz. From this database, this study uses only the lead II ECG signal as the input to generate the ABP, CVP, and PAP waveforms as outputs. Lead II was chosen because it is the lead most commonly used in ECG recording [42]. Furthermore, the systolic and diastolic pressures of ABP and PAP were calculated by taking the maximum and minimum values, respectively, of the signal, while the mean CVP value was obtained by averaging the 2 s CVP waveform. There were 59,401 and 14,850 two-second segments for training and testing, respectively.
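The extraction of the systolic, diastolic, and mean values from a 2 s window, as described above, can be illustrated directly (the function name and toy values are hypothetical):

```python
import numpy as np

def pressures(abp_win, cvp_win):
    """Per the paper: systolic = maximum and diastolic = minimum of a
    2 s ABP (or PAP) window; mean CVP = average of the CVP window."""
    sbp = float(np.max(abp_win))
    dbp = float(np.min(abp_win))
    mean_cvp = float(np.mean(cvp_win))
    return sbp, dbp, mean_cvp

abp = np.array([80.0, 95.0, 120.0, 100.0, 85.0])   # toy ABP window (mmHg)
cvp = np.array([4.0, 6.0, 8.0, 6.0])               # toy CVP window (mmHg)
print(pressures(abp, cvp))                         # (120.0, 80.0, 6.0)
```

Because systolic/diastolic values are taken as the raw extrema, a single noisy sample can bias them, a limitation the Discussion returns to.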

Furthermore, the Cerebral Hemodynamic Autoregulatory Information System Database (CHARIS DB) was used for the intracranial pressure (ICP) evaluation. This database was sampled at 50 Hz. The input signal was the ECG, while the output signal was the ICP. A 2 s window was again selected for both the input and output. There were 1,451,281 and 362,560 two-second segments for training and testing, respectively. Even though this database contains a smaller number of patients, the data collection period was much longer than in the dataset used for the cardiovascular hemodynamics system.

For pre-processing the data and post-processing the results, MATLAB R2014b (MathWorks, Inc., Natick, MA, USA) was utilized. The data were initially filtered manually by visual inspection: the input and output signals must both be suitable for training and testing, so if either the input or the output signal was noisy, both were deleted. For training the

deep convolutional autoencoder system, Python 3.6 was utilized with TensorFlow (Ver. 1.15.2) [43] and Keras (Ver. 2.3.1) under Google Colaboratory (Google Inc., Mountain View, CA, USA). In more detail, the training of both the cardiovascular and cerebral hemodynamics systems was conducted with a checkpoint system.
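A checkpoint-based training loop of the kind described can be sketched with the Keras `ModelCheckpoint` callback (the tiny model, checkpoint file name, and synthetic data are placeholders, not the paper's setup):

```python
import numpy as np
import tensorflow as tf

# save the best weights seen so far, judged by validation loss
# (file name is a placeholder)
ckpt = tf.keras.callbacks.ModelCheckpoint(
    "ckpt.weights.h5", monitor="val_loss",
    save_best_only=True, save_weights_only=True, verbose=0)

# tiny stand-in model: 2 s window in, waveform out
model = tf.keras.Sequential([
    tf.keras.Input(shape=(720, 1)),
    tf.keras.layers.Conv1D(8, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv1D(1, 1),
])
model.compile(optimizer="adam", loss="mae")    # MAE, as in the paper

x = np.random.randn(32, 720, 1).astype("float32")  # synthetic data
y = np.random.randn(32, 720, 1).astype("float32")
history = model.fit(x, y, validation_split=0.25,
                    epochs=2, batch_size=8,
                    callbacks=[ckpt], verbose=0)
```

After training, the best-performing weights can be restored from the checkpoint file rather than using the final epoch's weights.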

For the evaluation, 5-fold cross-validation was performed with shuffled training data. Generally, this strategy evaluates how consistently the data fit the model. Pearson's linear correlation (R), root mean squared error (RMSE), and mean absolute error (MAE) evaluations were conducted, and the Bland–Altman plot was generated for further evaluation. The R, RMSE, and MAE evaluations are shown in Equations (1)–(3).

$$R_{x,y} = \frac{\sum_{i=1}^{n}(x_i - \overline{x})(y_i - \overline{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \overline{x})^2 \sum_{i=1}^{n}(y_i - \overline{y})^2}} \tag{1}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|x_i - y_i| \tag{2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - y_i)^2} \tag{3}$$

where $x_i$ is the reference, $y_i$ is the predicted result, $n$ is the number of samples, $\overline{x}$ is the mean of the reference, and $\overline{y}$ is the mean of the predicted result.
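Equations (1)–(3) can be implemented compactly with NumPy (the toy reference/prediction values are illustrative):

```python
import numpy as np

def r_rmse_mae(x, y):
    """Pearson's R (Eq. 1), MAE (Eq. 2), and RMSE (Eq. 3) between
    a reference x and a prediction y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    mae = np.abs(x - y).mean()
    rmse = np.sqrt(((x - y) ** 2).mean())
    return r, mae, rmse

x = [100.0, 110.0, 120.0, 130.0]   # toy reference pressures (mmHg)
y = [102.0, 108.0, 123.0, 128.0]   # toy predicted pressures (mmHg)
r, mae, rmse = r_rmse_mae(x, y)
```

In the paper, these three metrics are computed per fold and then summarized as mean ± standard deviation over the 5 folds.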

#### **3. Results**

The multi-atrous deep convolutional autoencoder models were applied to both the cardiovascular and cerebral hemodynamics systems using a single ECG signal. Pearson's linear correlation, root mean squared error, and mean absolute error were deployed as performance indicators. Finally, a Bland–Altman plot was generated to investigate the performance of the deep learning models for both the cardiovascular and cerebral hemodynamics systems. The models first generated the hemodynamics waveform; this waveform-generating step is fundamental because it accommodates the morphological cardiovascular conditions [44]. Finally, from the generated waveform, the systolic, diastolic, or mean values were extracted.

#### *3.1. Cardiovascular Hemodynamics*

The Massachusetts General Hospital/Marquette Foundation (MGH/MF) Waveform Database was used for the cardiovascular hemodynamics system, with the ECG used to generate the ABP, CVP, and PAP signals. The model was trained for 500 epochs, as shown in Figure 3, and starts to saturate at 400 epochs; the training and validation curves are relatively stable. The 5-fold cross-validation results are 0.942 ± 0.001, 7.833 ± 0.061 mmHg, and 4.959 ± 0.044 mmHg for the R, RMSE, and MAE of the ABP waveform evaluation, respectively. Furthermore, the CVP evaluations are 0.852 ± 0.002, 3.155 ± 0.018 mmHg, and 2.024 ± 0.020 mmHg for R, RMSE, and MAE, respectively. Finally, for the PAP waveform evaluation, the R, RMSE, and MAE are 0.873 ± 0.002, 4.853 ± 0.031 mmHg, and 3.263 ± 0.034 mmHg, respectively. Details of the waveform evaluations are given in Table 1.

Furthermore, the systolic and diastolic pressures were calculated for the ABP and PAP, and the mean CVP value was also predicted from the waveform. For the ABP system, the R evaluations of the systolic and diastolic pressures are 0.894 ± 0.004 and 0.881 ± 0.005, respectively. The RMSE evaluations are 8.997 ± 0.401 mmHg and 4.726 ± 0.129 mmHg, respectively, and the MAE evaluations are 6.645 ± 0.353 mmHg and 3.210 ± 0.104 mmHg, respectively. Furthermore, for the PAP system, the R evaluations of the systolic and diastolic pressures are 0.864 ± 0.003 and 0.817 ± 0.006, respectively. The RMSE evaluations are 5.833 ± 0.075 mmHg and 4.360 ± 0.179 mmHg, respectively, and the MAE evaluations are 3.847 ± 0.136 mmHg and 2.964 ± 0.181 mmHg, respectively.

Meanwhile, the mean CVP evaluations are 0.916 ± 0.001, 2.220 ± 0.039 mmHg, and 1.329 ± 0.036 mmHg for R, RMSE, and MAE, respectively. Details of these evaluations are shown in Tables 2–4.

**Figure 3.** MA-UDCAE model convergence for cardiovascular hemodynamics.



**Table 2.** Systolic and diastolic arterial blood pressure evaluations.


**Table 3.** Mean central venous pressure evaluations.



**Table 4.** Systolic and diastolic pulmonary arterial pressure evaluations.

The waveform evaluation results are shown in Figure 4, which also reports the systolic, diastolic, and mean pressures from the ABP, CVP, and PAP. Most importantly, the figure illustrates how the MA-UDCAE deals with several conditions of the ECG, ABP, CVP, and PAP signals within two seconds. Figure 4a shows how a relatively normal heartbeat generates normal ABP, CVP, and PAP signals; here the model predicts accurate systolic, diastolic, and mean values. In Figure 4b, the ECG has a higher heart rate; in this case, the subject has very low ABP, high CVP, and high PAP measures. A high error appears in the systolic ABP and in the last period of the CVP waveform, whereas the PAP prediction is relatively good. A normal-heart-rate ECG is given in Figure 4c, but this subject has relatively high systolic ABP and relatively high CVP and PAP. Furthermore, Figure 4d shows a relatively higher heart rate generating normal ABP, normal CVP, and high PAP; here the systolic PAP has a relatively high error.

**Figure 4.** MA-UDCAE-generated waveforms of ABP, CVP, and PAP: (**a**) Normal ECG with normal hemodynamics; (**b**) Fast ECG with abnormal hemodynamics; (**c**) Slow ECG with abnormal hemodynamics; (**d**) Normal ECG with abnormal hemodynamics. Note: Systolic reference (Sr); diastolic reference (Dr); systolic prediction (Sp); diastolic prediction (Dp); mean reference (Mr); mean prediction (Mp).

Finally, the Pearson's linear correlation and Bland–Altman results are given in Figures 5 and 6. Figure 5 shows the Pearson's linear correlation: the correlation coefficients of systolic and diastolic ABP are 0.89 and 0.86, respectively; the SPAP and DPAP are 0.86 and 0.81, respectively; and the mean CVP is 0.92. Figure 6 shows the Bland–Altman plot for the cardiovascular hemodynamics system. For systolic ABP, the reference and prediction have a mean difference of −4.182 mmHg, a −1.96 STD of −21.268 mmHg, and a +1.96 STD of 12.904 mmHg. For diastolic ABP, the mean difference is 0.202 mmHg, the −1.96 STD is −9.987 mmHg, and the +1.96 STD is 10.391 mmHg. Furthermore, for systolic PAP, the mean difference is 0.668 mmHg, the −1.96 STD is −10.563 mmHg, and the +1.96 STD is 11.899 mmHg. For diastolic PAP, the mean difference is −1.827 mmHg, the −1.96 STD is −9.794 mmHg, and the +1.96 STD is 6.140 mmHg. Finally, the mean CVP has a mean difference of −0.121 mmHg, a −1.96 STD of −4.341 mmHg, and a +1.96 STD of 4.099 mmHg. The graphs show some negative values appearing in the prediction. This likely happens due to the complexity of patient monitoring and the method of selecting the systolic and diastolic peaks using maximum and minimum values.

**Figure 5.** Pearson's linear correlation coefficient results. Note: Red dotted line is the diagonal line.

**Figure 6.** Bland–Altmann plots for cardiovascular hemodynamics system. Note: Red line is the mean, and dash-dot lines are ±1.96 standard deviations.
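The Bland–Altman quantities reported above (mean difference and ±1.96 SD limits of agreement) can be computed as follows; the sign convention (prediction minus reference) and toy values are our assumptions:

```python
import numpy as np

def bland_altman(ref, pred):
    """Mean difference and ±1.96 SD limits of agreement between
    reference and predicted values (computed here as pred - ref)."""
    d = np.asarray(pred, float) - np.asarray(ref, float)
    mean_diff = d.mean()
    sd = d.std(ddof=1)                 # sample standard deviation
    return mean_diff, mean_diff - 1.96 * sd, mean_diff + 1.96 * sd

ref  = [118.0, 95.0, 130.0, 104.0, 122.0]   # toy SBP references (mmHg)
pred = [115.0, 99.0, 126.0, 103.0, 120.0]   # toy predictions (mmHg)
md, lo, hi = bland_altman(ref, pred)
```

In the figures, each point is plotted at the pair's mean on the x-axis and its difference on the y-axis, with horizontal lines at `md`, `lo`, and `hi`.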

#### *3.2. Intracranial Pressure*

For the intracranial pressure evaluation, the Cerebral Hemodynamic Autoregulatory Information System Database (CHARIS DB) was utilized. The MA-UDCAE model was applied to the ECG signal to learn the ICP pattern. The model was trained for 20 epochs, with the mean absolute error used for the evaluation. The ICP training convergence is shown in Figure 7.

After the training session, the testing data were evaluated with the trained model. Some of the ECG-generated ICP waveforms can be seen in Figure 8; these results were selected based on the variation of the reference ICP. From this figure, it can be seen that the model is not only reasonably robust in handling data with either a relatively low or high ICP index, but also gives some insight into how the model decodes the ICP waveform.

**Figure 7.** The MA-UDCAE model convergence for intracranial pressure.

**Figure 8.** MA-UDCAE-generated waveform of ICP. Note: Mean reference (Mr); mean prediction (Mp).

In more detail, for the intracranial pressure waveform evaluations, the cross-validation results are 0.887 ± 0.003, 5.306 ± 0.041 mmHg, and 2.765 ± 0.026 mmHg for R, RMSE, and MAE, respectively. For the mean ICP evaluation, the R, RMSE, and MAE are 0.914 ± 0.003, 4.582 ± 0.044 mmHg, and 2.404 ± 0.043 mmHg, as shown in Table 5. The intracranial evaluations for Pearson's linear correlation and Bland–Altman are shown in Figure 9. Pearson's linear correlation for the ICP is 0.89, as given in Figure 9a; the model likely starts to generate a bigger error when dealing with an ICP higher than 30 mmHg. In addition, Figure 9b shows the Bland–Altman plot for the ICP evaluation, which supports the Pearson's linear correlation result: the mean difference is 0.492 mmHg, the −1.96 STD is −8.32 mmHg, and the +1.96 STD is 9.310 mmHg. Even though the model has a relatively low prediction error, it gives lower accuracy for higher ICP measures. The entire ICP evaluation is shown in Table 5, indicating that all evaluations of R, MAE, and RMSE have fairly low standard deviations. However, some negative values can be seen in the prediction, likely due to the complexity of the patient data.


**Table 5.** Intracranial pressure evaluations.

**Figure 9.** Intracranial pressure evaluations: (**a**) Pearson's linear correlation result for intracranial pressure evaluation. Note: Red dotted line is the diagonal line; (**b**) Bland–Altmann plot for intracranial pressure evaluation. Note: Red line is the mean, and dash-dot lines are for ±1.96 standard deviations.

#### **4. Discussion**

The novelty of this study is the utilization of a non-invasive ECG signal to investigate cardiovascular and cerebral hemodynamics. This study used two databases from PhysioNet: the Massachusetts General Hospital/Marquette Foundation (MGH/MF) Waveform Database and the Cerebral Hemodynamic Autoregulatory Information System Database (CHARIS DB). The models first generated the cardiovascular hemodynamics signals ABP, PAP, and CVP from the MGH/MF database. The MA-UDCAE deep learning model was deployed for modeling these systems.

Most of the previously conducted studies shown in Table 6 utilized the PPG signal as the model input. Slapničar et al. [45] used the PPG signals of 510 MIMIC III subjects from PhysioNet with a ResNet-based model to investigate hypertension, obtaining MAEs of 9.43 mmHg and 6.88 mmHg for SBP and DBP, respectively. Chowdhury et al. used the PPG signals of 126 subjects and a regression model, achieving correlation coefficients of 0.95 and 0.96 for SBP and DBP, respectively, and MAEs of 3.02 mmHg and 1.74 mmHg [46]. Meanwhile, Zadi et al. [47] applied the ARMA algorithm to PPG signals from 15 subjects, achieving RMSEs of 7.21 mmHg and 5.12 mmHg for SBP and DBP, respectively. However, none of these studies provides information about continuous waveform evaluation.


**Table 6.** Cardiovascular hemodynamics evaluations. Note: RMSE and MAE are in mmHg.

Several previous studies, shown in Table 6, have utilized the PPG signal as their input and investigated the waveform evaluation. Sideris et al. [9] performed continuous blood pressure evaluation using PPG signals with the LSTM method, delivering an overall RMSE of 6.04 ± 3.26 mmHg, RMSEs of 2.58 ± 1.23 mmHg and 1.98 ± 1.06 mmHg on the SBP and DBP evaluations, respectively, and a waveform correlation coefficient of 0.95 ± 0.045. Meanwhile, Sadrawi et al. [10] utilized PPG for arterial blood pressure estimation, reporting a 0.984 linear correlation and MAEs of 2.54 mmHg and 1.48 mmHg for the systolic and diastolic pressures, respectively. Furthermore, Aguirre et al. [48] utilized the PPG signals of 1131 subjects from the PhysioNet database with a Seq2seq algorithm to evaluate the hemodynamics. This study also investigated the arterial blood pressure waveform: the generated waveform had a correlation coefficient of 0.98, while the MAEs of the systolic and diastolic pressures were 12.08 mmHg and 5.56 mmHg, respectively [48].

On the other hand, prior studies shown in Table 6 have also correlated the ECG signal with blood pressure estimation. Tanveer et al. [6] used a combination of ECG and PPG with ANN and LSTM, reporting MAEs of 0.93 mmHg and 0.52 mmHg for SBP and DBP, respectively, and very high correlation coefficients of 0.999 and 0.998. Another study, by Wu et al. [7], also used ECG and PPG with a hybrid ANN method and produced MAEs of 3.404 mmHg and 3.289 mmHg for SBP and DBP, respectively. Meanwhile, Eom et al. [8] investigated multiple input signals (ECG, PPG, and BCG) combined with CNN, Bi-GRU, and attention methods, producing MAEs of 4.06 ± 4.04 mmHg and 3.33 ± 3.42 mmHg for the SBP and DBP evaluations, respectively, with R² values of 0.52 and 0.49. The study that used only the ECG signal to investigate hypertension was conducted by Fan et al. [23], who reported MAEs of 7.69 mmHg and 4.36 mmHg for the systolic and diastolic pressures, respectively.

For intracranial pressure evaluation, this study is compared to several previous studies, given in Table 7. These works were undertaken utilizing the ABP and cerebral blood flow velocity (CBFV). Imaduddin et al. [3] used these signals to estimate the ICP of 13 patients with a Bayesian system, providing an RMSE of 3.7 mmHg. Another study, by Jaishankar et al. [4], had RMSEs of 5.1 mmHg and 4.5 mmHg for 13 pediatric and 5 adult subjects, respectively, using a spectral method. Even though CBFV is non-invasive, ABP measurement is classified as an invasive technique. This study reported an RMSE of 4.582 ± 0.044 mmHg over five cross-validation folds. This result is slightly inferior to those of the compared studies. However, the novelty of this study is the utilization of the non-invasive ECG signal as the input to generate the ICP waveform, with further evaluation of the mean ICP.


**Table 7.** Intracranial pressure evaluations. Note: RMSE and MAE are in mmHg.

The association between the ECG and hemodynamics stems from the fact that the ECG and ABP have a very close temporal relationship, especially in the evaluation of the heart-related system. A similar relationship may exist for CVP, PAP, and ICP due to the blood circulation. Furthermore, the systolic peak and the R peak are separated by a transmission delay. However, more morphological investigation of the full cycle between the ECG and the cardiovascular and cerebral hemodynamics signals is required, especially for CVP and ICP.

In order to evaluate the network performance, an ablation study was performed. The autoencoder structure was based on [10]; in that previous study, specifically for hemodynamics, the U-Net-based model provided better results than the classical autoencoder network. Hence, based on reference [10], the structure was modified by utilizing the multi-atrous system. For simplification, this study investigates the ablation results only at 25 and 5 epochs for the cardiovascular and cerebral hemodynamics systems, respectively.

The ablation network evaluation is given in Figure 10. For the cardiovascular system, the ablated network convergence is shown in Figure 10a. Several ablations were conducted: the removed connections are, sequentially, those between DP\_04 and CV\_05, DP\_03 and CV\_06, DP\_02 and CV\_07, DP\_01 and CV\_08, and the input layer and CV\_09. From this figure, it can be seen that the network with the ablated connection between DP\_03 and CV\_06 tends to converge faster than the others; however, the results are relatively similar at the 25th epoch. In testing, the networks with deleted connections tend to oscillate during this early period compared to the non-ablated networks.

**Figure 10.** Ablation network: (**a**) cardiovascular hemodynamics; (**b**) cerebral hemodynamics.

The cerebral hemodynamics ablation study was conducted by removing the concatenating layers. The first deleted connection is between ME\_02 and CV\_04, and the next is between ME\_01 and CV\_05. The results are shown in Figure 10b: in the training phase, most of the cross-validation models without ablation have relatively lower MAE values at the 5th epoch than the ablated models. In this system, the models with deleted connections tend to saturate earlier than the system without any ablated connection. Furthermore, the ablated networks also converge more slowly in testing.

This study has several limitations. The technique of selecting the systolic and diastolic values as the maximum and minimum of each window is sensitive to noise: noise biases the systolic pressure upward and the diastolic pressure downward, generating higher error during the evaluation.

Due to data limitations, another main limitation of this study is that the testing data were separated randomly across all patients rather than by a patient-based partition. Even though the data do not fully overlap, this strategy allows the model to learn and memorize some patient-specific patterns linking the output to the input. A further limitation is the high dependency on the quality of the ECG signal. Mitigations of dataset imbalance can be investigated further [49], and other deep learning structures can be applied in the future [50–52].

There are also concerns regarding the evaluation of arrhythmia conditions. In this study, some of the arrhythmic ECG factors affecting the cardiovascular and cerebral hemodynamics systems were investigated. The cardiovascular hemodynamics dataset was well annotated for some arrhythmia conditions; for the cerebral hemodynamics, however, the heartbeats in the dataset were not labeled. Nevertheless, rapid and irregular beats could still be investigated: premature ventricular contractions (PVC) and supraventricular premature/ectopic beats were evaluated. Figure 11 shows how arrhythmia affects the cardiovascular hemodynamics. It can be seen that most of the abnormal ECG signals still generate the ABP relatively well, although there are some shifts in the SBP and DBP, as shown in Figure 11a,f. Larger differences appear in the CVP and PAP signals. The ICP generated from the abnormal ECG is shown in Figure 12. From Figure 12a,b, R-R interval irregularity degrades the ICP prediction more than arrhythmia arising from rapid R intervals. However, a deeper evaluation with many more arrhythmia cases should be performed in the future to investigate the effect of arrhythmia on the hemodynamics.

**Figure 11.** Arrhythmic ECG predictions for cardiovascular hemodynamics. (**a**) Rapid R waves with premature beat; (**b**) rapid heart rate with downward R waves; (**c**) slow heart rate with relatively small downward R wave; (**d**) slow heart rate with bigger downward R wave; (**e**) rapid upward R wave with irregular interval; (**f**) rapid heart rate with multiple downward R waves.

**Figure 12.** Arrhythmic ECG evaluations for the intracranial pressure. (**a**) Irregular heart rate intervals; (**b**) irregular and rapid heart rate; (**c**) rapid heart rate with downward-shifted R waves; (**d**) rapid heart rate.

Since this study is preliminary and a proof of concept, it may not provide superior results compared to previous studies that used multi-input signals such as the combination of PPG and ECG, in which the shape of the PPG signal is much more similar to the ABP waveform than that of the ECG. Nevertheless, as a proof of concept, this study demonstrates the feasibility of using the ECG alone in hemodynamics investigations. Finally, the cross-validation test using the same group of subjects in the current study was inner-loop cross-validation, which only rotates the validation and training data. To make the results more convincing, the testing data also need to be rotated into training and validation, which is known as outer-loop cross-validation and is to be considered in future work. In addition, the pre-processing of the data was initially filtered manually by visual inspection; such manual filtering may not be practical in deployment, and automatic filtering of all these vital signs requires further investigation in future work.
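The distinction between inner-loop and outer-loop (nested) cross-validation mentioned above can be sketched as follows: the outer loop also rotates the testing fold instead of holding one test split fixed. The fold count and record indices below are hypothetical.

```python
import numpy as np

n_folds = 5
folds = np.array_split(np.arange(50), n_folds)  # hypothetical 50 record indices

schedules = []
for outer in range(n_folds):                 # outer loop: rotate the TEST fold
    test_idx = folds[outer]
    remaining = [f for i, f in enumerate(folds) if i != outer]
    for inner in range(len(remaining)):      # inner loop: rotate the VALIDATION fold
        val_idx = remaining[inner]
        train_idx = np.concatenate(
            [f for j, f in enumerate(remaining) if j != inner])
        schedules.append((train_idx, val_idx, test_idx))

# Each fold serves as the test set exactly once, yielding 5 * 4 = 20
# train/validation/test combinations instead of a single fixed test split.
```

Reporting the metric averaged over all outer folds removes the dependence of the results on one particular choice of test set.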

#### **5. Conclusions**

In order to design a precise health monitoring system for the hemodynamics in the ICU, in this study, as a proof of concept, a deep convolutional autoencoder system was implemented as a non-invasive approach using only a single ECG signal, utilizing two databases for the cardiovascular and cerebral hemodynamics systems. The preliminary results show that the ECG has great potential for generating the ABP, CVP, PAP, and ICP waveforms, as well as their essential information, for extensive evaluation.

**Author Contributions:** Conceptualization, M.S., H.-T.H., S.-Z.F., M.F.A. and J.-S.S.; Formal analysis, M.S.; Investigation, M.S., H.-T.H., S.-Z.F., M.F.A. and J.-S.S.; Methodology, M.S., Y.-T.L., M.F.A. and J.-S.S.; Project administration, J.-S.S.; Software, M.S., C.-H.L. and B.M.; Supervision, Y.-T.L., C.-H.L., H.-T.H., S.-Z.F., M.F.A. and J.-S.S.; Validation, M.S.; Visualization, M.S.; Writing—original draft, M.S., M.F.A. and J.-S.S.; Writing—review and editing, M.S., M.F.A. and J.-S.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** This study utilizes the publicly available dataset from https://physionet.org, accessed on 14 January 2021.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

