**Determining the Online Measurable Input Variables in Human Joint Moment Intelligent Prediction Based on the Hill Muscle Model**

**Baoping Xiong 1,2, Nianyin Zeng 3,\*, Yurong Li 4, Min Du 1,5,\*, Meilan Huang 1, Wuxiang Shi 1, Guojun Mao 2 and Yuan Yang 6,\***


Received: 15 January 2020; Accepted: 19 February 2020; Published: 21 February 2020

**Abstract:** *Introduction*: Human joint moment is a critical parameter in rehabilitation assessment and human-robot interaction, and it can be predicted using an artificial neural network (ANN) model. However, a challenge remains: there is no effective approach to determining the input variables of the ANN model for joint moment prediction, which determine the number of input sensors and the complexity of prediction. *Methods*: To address this research gap, this study develops a mathematical model based on the Hill muscle model to determine the online measurable input variables of the ANN for the prediction of joint moments. In this method, the muscle activation, muscle-tendon velocity and length in the Hill muscle model, together with the muscle-tendon moment arm, are translated to online measurable variables, i.e., muscle electromyography (EMG) and the angles and angular velocities of the joints the muscles span. To test the predictive ability of these input variables, an ANN model is designed and trained to predict joint moments. The ANN model with the online measurable input variables is tested on experimental data collected from ten healthy subjects running at speeds of 2, 3, 4 and 5 m/s on a treadmill. The variance accounted for (VAF) between the predicted and inverse dynamics moments is used to evaluate the prediction accuracy. *Results*: The results suggest that the method can predict joint moments with higher accuracy (mean VAF = 89.67 ± 5.56%) than that obtained using other combinations of joint angles and angular velocities as inputs (mean VAF = 86.27 ± 6.6%), evaluated by jack-knife cross-validation. *Conclusions*: The proposed method provides a powerful tool to predict joint moments from online measurable variables, which establishes the theoretical basis for optimizing the input sensors and detection complexity of the prediction system. It may facilitate research on exoskeleton robot control and real-time gait analysis in motor rehabilitation.

**Keywords:** artificial neural network; joint moment prediction; extreme learning machine; Hill muscle model; online input variables

#### **1. Introduction**

Human joint moment prediction is crucial to rehabilitation evaluation [1–3], athlete training evaluation [4–6], prosthesis and orthosis design [7–9], intramedullary device design [10–12] and human-robot interaction [13–21]. Precise prediction of joint moment can be achieved with instrumented implants [22], which measure the relevant parameters of joint load in real time. However, this approach is not always feasible, since only a few people (likely those suffering from musculoskeletal deficits) have implants.

Although computational models can serve as alternative methods for joint moment prediction when implants are not available, they face the challenge of eliminating measurement error, which arises from individual differences in the anatomical and functional characteristics of the musculoskeletal system [22]. Furthermore, the joint moment is not easily measured in real time. Previous studies [23–26] indicated that this challenge may be addressed by the artificial neural network (ANN) model, because of its excellent ability to adapt to individual characteristics [27,28]. For example, Uchiyama et al. [29] used an ANN model to predict the elbow joint moment with inputs of EMG signals and elbow and shoulder joint angles, while Luh et al. [30] and Song and Tong [31] utilized an ANN model with EMG signals, elbow joint angle and angular velocity for the same purpose. Hahn [32] predicted the isokinetic knee extensor and flexor moments with inputs of EMG signals, gender, age, height and body mass. Ardestani et al. [33] combined EMG signals and ground reaction forces (GRFs) with an ANN model to study the lower limbs' joint moments. Recently, Xiong et al. [34] used optimized EMG signals and joint angles as the inputs of an ANN model to calculate lower extremity joint moments.

As listed above, different studies used different input variables in their ANN models to predict joint moments. However, the number of input variables determines the number of sensors and the complexity of the system, and a mathematical model for determining the optimal online measurable input variables has yet to be developed. Such a model would provide a theoretical basis for designing a system with few sensors and highly accurate joint moment prediction. Therefore, the purpose of this study is to introduce a novel method for determining the online measurable input variables for intelligent prediction of human joint moments.

In this method, a musculoskeletal geometry [35,36] comprising Hill muscle models [37,38] is utilized to represent the muscle mechanical response. The input variables for predicting joint moment based on the Hill muscle model include four time-varying quantities: the muscle activation, muscle-tendon moment arm, velocity and length [39], which generally cannot be measured online in vivo. Thus, a surrogate model is built for each tested muscle to convert these four input variables to online measurable variables, i.e., the muscle's EMG and the angles and angular velocities of the joints the muscle actuates.

To test the predictive ability of the online measurable input variables, a commonly used ANN model, i.e., Extreme Learning Machine (ELM), is designed and trained to predict joint moments. The ELM is a feedforward ANN [40], which has a much lower computational cost than traditional machine learning algorithms, especially for the single hidden layer mode [41–43]. The method is tested on the experimental data of ten healthy male subjects running at different speeds, i.e., 2, 3, 4 and 5 m/s on a treadmill. The ELM predictions are validated against inverse dynamics and compared with those obtained by jack-knife cross-validation with other online measurable variables as inputs [29–31,34].

#### **2. Materials and Methods**

#### *2.1. Experimental Data*

The lower limbs' kinematic and dynamic experimental data of ten healthy male subjects (height 1.77 ± 0.04 m, age 29 ± 5 years, mass 70.9 ± 7.0 kg) were obtained from an open database (https://simtk.org/projects/nmbl\_running; accessed on 18 October 2019). In the experiment, motion data, EMG signals and ground reaction forces were measured while the subjects ran at speeds of 2, 3, 4 and 5 m/s on a treadmill. At least six gait cycles were recorded for each speed. EMG signals were recorded from the gluteus medius, rectus femoris, gluteus maximus, vastus lateralis, biceps femoris long head, vastus medialis, tibialis anterior, soleus, gastrocnemius medialis and gastrocnemius lateralis. All EMG signals were rectified, filtered and normalized, and the motion and force data were filtered accordingly. A complete description of these data can be found in [44].

After obtaining the experimental data, all ten subjects' moments of ankle plantar-dorsiflexion, knee flexion-extension, hip adduction-abduction and hip flexion-extension are first calculated using the inverse dynamics method [45] with OpenSim software; then the moment, force, motion and EMG signals are resampled to obtain 101 time points per gait cycle. All the inverse dynamics moments are used as the target values of the ANN model's training samples.
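The per-cycle resampling step can be sketched as follows. This is an illustrative Python sketch only: the function name and the use of linear interpolation are assumptions, not the authors' exact processing pipeline.

```python
import numpy as np

def resample_gait_cycle(signal, n_points=101):
    """Resample one gait cycle onto n_points samples (0-100% of the cycle)
    by linear interpolation over a normalized time base."""
    signal = np.asarray(signal, dtype=float)
    t_orig = np.linspace(0.0, 1.0, len(signal))   # original sample times, normalized
    t_new = np.linspace(0.0, 1.0, n_points)       # target 101-point time base
    return np.interp(t_new, t_orig, signal)

# Example: a cycle recorded with 250 samples becomes 101 points
cycle = np.sin(np.linspace(0.0, 2.0 * np.pi, 250))
resampled = resample_gait_cycle(cycle)
```

Each channel (moment, force, motion, EMG) would be resampled this way so that every gait cycle contributes exactly 101 time points.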

#### *2.2. Determination of Online Measurable Variables*

In order to obtain the online measurable input variables, the Hill muscle model [37,38] and musculoskeletal geometry [35] are used to establish a mathematical model of the input-output relation for joint moment prediction. The data processing pipeline is shown in Figure 1.

**Figure 1.** Data processing pipeline of the method based on the Hill muscle model, where *l*(θ) is a polynomial function of the angles of the joints the muscle spans.

In the Hill muscle model, the muscle moment about the spanned joint [46] is indicated by:

$$M = r \cdot F_o^M \cdot \left[ a\big(emg(t - d)\big) \cdot f_l\left(\frac{l - l_s^T}{l_o^M \cos \phi}\right) \cdot f_v\left(\frac{v}{10 \cdot l_o^M}\right) + f_p\left(\frac{l - l_s^T}{l_o^M \cos \phi}\right) \right] \cdot \cos(\phi) \tag{1}$$

where *M* and *r* are the muscle moment and moment arm about the joint the muscle actuates, *F<sub>o</sub><sup>M</sup>* is the muscle's peak isometric force, *a*() is the muscle activation, which can be calculated as a function of the EMG data, *t* is time, *d* is the electromechanical delay, *v* and *l* are the muscle-tendon velocity and length, φ is the pennation angle of the muscle, *l<sub>o</sub><sup>M</sup>* is the optimal fiber length and *l<sub>s</sub><sup>T</sup>* is the tendon slack length. The relationship between muscle-tendon length, muscle fiber length, tendon length and pennation angle is shown in Figure 2. *f<sub>v</sub>*(), *f<sub>l</sub>*() and *f<sub>p</sub>*() represent the muscle force-velocity, active force-length and passive force-length curves. *F<sub>o</sub><sup>M</sup>*, *d*, φ, *l<sub>s</sub><sup>T</sup>* and *l<sub>o</sub><sup>M</sup>* are assumed to remain constant for an individual. *l*, *v* and *r* are time-varying quantities that can be calculated as polynomial functions of joint angles and angular velocities with constant coefficients [47,48]. With θ denoting the angles of the joints the muscle spans, these time-varying quantities can be expressed as follows:

$$l(t) = l(\theta) \tag{2}$$

$$v(t) = \frac{\partial l(t)}{\partial t} = \frac{\partial l(\theta)}{\partial \theta} \cdot \frac{\partial \theta}{\partial t} = v(\theta, \dot{\theta}) \tag{3}$$

$$r(t) = -\frac{\partial l(\theta)}{\partial \theta} = r(\theta) \tag{4}$$

where θ(*t*) and θ̇(*t*) are the angles and angular velocities of the joints the muscle spans; *l*(θ) is the muscle-tendon length, a polynomial function of those joint angles; *v*(θ, θ̇) is the muscle-tendon velocity, the first derivative of *l*(θ) with respect to time *t*; and *r*(θ) is the muscle-tendon moment arm, the negative of the first derivative of *l*(θ) with respect to θ. The sign of each variable determines the direction of the moment.
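Equations (2)–(4) can be illustrated with a short Python sketch for a hypothetical uniarticular muscle. The polynomial coefficients below are invented for illustration only; they are not fitted values from [47,48].

```python
import numpy as np

# Hypothetical coefficients (lowest order first) of l(theta) = c0 + c1*theta + c2*theta^2
coeffs = np.array([0.35, -0.04, 0.01])           # illustrative values, in meters/radians

def mt_length(theta):
    """Muscle-tendon length l(theta), Eq. (2): a polynomial in the joint angle."""
    return np.polyval(coeffs[::-1], theta)        # polyval wants highest order first

def moment_arm(theta):
    """Moment arm r(theta) = -dl/dtheta, Eq. (4)."""
    dl_dtheta = np.polyder(np.poly1d(coeffs[::-1]))
    return -dl_dtheta(theta)

def mt_velocity(theta, theta_dot):
    """Muscle-tendon velocity v = (dl/dtheta) * dtheta/dt = -r(theta)*theta_dot, Eq. (3)."""
    return -moment_arm(theta) * theta_dot
```

The key point of Equations (2)–(4) is visible here: once the polynomial *l*(θ) is known, both the velocity and the moment arm follow from the same coefficients, so only θ and θ̇ need to be measured online.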

**Figure 2.** Diagram of the muscle-tendon unit showing the relationship between muscle-tendon length, muscle fiber length, tendon length and pennation angle, where *l* is the muscle-tendon length, *l<sup>m</sup>* is the muscle fiber length, *l<sup>t</sup>* is the tendon length and φ is the pennation angle.

From Equations (1)–(4), the muscle moment about the spanned joint can be calculated as a function of the muscle's EMG signal, and the muscle actuates joints' angle and angular velocity (Figure 1):

$$M(emg, \theta, \dot{\theta}) = r(\theta) \cdot F_o^M \cdot \left[ a\big(emg(t - d)\big) \cdot f_l\big(l(\theta)\big) \cdot f_v\big(v(\theta, \dot{\theta})\big) + f_p\big(l(\theta)\big) \right] \cdot \cos(\phi) \tag{5}$$

where *d* is the electromechanical delay, whose value is generally 10–100 ms [49]. From Equations (1)–(5), the *j*-th joint moment is represented by the following equation:

$$M^j = \sum_{i=1}^{m} M\big(emg(i), \theta(i), \dot{\theta}(i)\big) \tag{6}$$

where m is the number of muscles associated with the joint moment.
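The activation term *a*(emg(*t* − *d*)) in Equations (1), (5) and (6) can be sketched as below. The exponential shaping function and the default delay of 40 ms are common modeling choices assumed for illustration; they are not values specified in this paper.

```python
import numpy as np

def delayed_activation(emg, fs, d_ms=40.0, A=-1.5):
    """Shift a normalized EMG envelope by the electromechanical delay d and map
    it to activation with a(u) = (exp(A*u) - 1) / (exp(A) - 1), so that
    a(0) = 0 and a(1) = 1. fs is the sampling rate in Hz."""
    emg = np.asarray(emg, dtype=float)
    shift = int(round(d_ms / 1000.0 * fs))        # delay expressed in samples
    if shift > 0:
        # Pad the front with the first sample to keep the array length unchanged
        delayed = np.concatenate([np.full(shift, emg[0]), emg[:-shift]])
    else:
        delayed = emg
    return (np.exp(A * delayed) - 1.0) / (np.exp(A) - 1.0)
```

The shape parameter `A` controls the nonlinearity of the EMG-to-activation mapping; `A` near zero makes the mapping approximately linear.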

It can be seen from Equation (6) that the online measurable input variables for human joint moment prediction are the EMG signals of the muscles associated with the joint moment, together with the angles and angular velocities of the joints those muscles actuate.
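Putting Equations (1)–(6) together, the joint moment is a sum of Hill-model muscle moments over the spanning muscles. The sketch below uses a hypothetical toy parameterization: the dictionary fields and the flat unit curves are placeholders, not the calibrated force-length and force-velocity curves of the Hill model.

```python
import numpy as np

def joint_moment(muscles, theta, theta_dot):
    """Eq. (6): sum of the Hill-model muscle moments of Eq. (5) about one joint."""
    total = 0.0
    for m in muscles:
        l = m["l"](theta)                         # muscle-tendon length, Eq. (2)
        v = m["v"](theta, theta_dot)              # muscle-tendon velocity, Eq. (3)
        r = m["r"](theta)                         # moment arm, Eq. (4)
        norm_len = (l - m["l_ts"]) / (m["l_om"] * np.cos(m["phi"]))
        norm_vel = v / (10.0 * m["l_om"])
        active = m["a"] * m["f_l"](norm_len) * m["f_v"](norm_vel)
        total += r * m["F_o"] * (active + m["f_p"](norm_len)) * np.cos(m["phi"])
    return total

# Toy single-muscle example with flat unit curves, for illustration only
toy = dict(
    l=lambda th: 0.30, v=lambda th, thd: 0.0, r=lambda th: 0.05,
    l_ts=0.20, l_om=0.10, phi=0.0, a=1.0, F_o=100.0,
    f_l=lambda x: 1.0, f_v=lambda x: 1.0, f_p=lambda x: 0.0,
)
```

With these placeholder values the single muscle contributes 0.05 m × 100 N = 5 N·m about the joint; in the real model, the curves and constants would come from the Hill-model literature.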

#### *2.3. The Designed ANN*

To confirm the predictive effect of the online measurable input variables, the ELM, a feedforward ANN algorithm [40], is designed and trained as the ANN model to predict joint moments. It can be seen from Equation (6) that different joint moments correspond to different inputs, which makes a multi-output ANN model unsuitable, so the ELM has only one output neuron. Its structure, shown in Figure 3, consists of an input layer, a hidden layer and an output layer. Its expression is as follows:

$$O = \beta \cdot g(W \cdot X + b) \tag{7}$$

where *X* is the input, *O* is the output, *W* = [*W*<sub>1</sub>, *W*<sub>2</sub>, ···, *W<sub>L</sub>*] is the matrix of input-to-hidden-layer weights, β = [β<sub>1</sub>, β<sub>2</sub>, ···, β*<sub>L</sub>*] is the matrix of hidden-to-output-layer weights, *b* = [*b*<sub>1</sub>, *b*<sub>2</sub>, ···, *b<sub>L</sub>*] is the vector of hidden-node thresholds and *g*() is the activation function. The feature distinguishing the ELM from traditional feedforward neural networks is that *W* and *b* are randomly selected and need not be adjusted during training, while β is computed directly in the training process [45]. This makes determining the network parameters iteration-free, reduces the parameter-tuning time and greatly improves the learning speed. The ELM is widely used in regression analysis and classification [41,50].
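A minimal ELM regressor in the spirit of Equation (7) can be sketched as follows. The tanh activation and the Moore-Penrose pseudoinverse for computing β are standard ELM choices; nothing here reproduces the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class ELM:
    """Minimal single-hidden-layer Extreme Learning Machine for regression (Eq. 7)."""

    def __init__(self, n_hidden=20):
        self.n_hidden = n_hidden

    def fit(self, X, y):
        # W and b are drawn at random and never adjusted -- the defining ELM property
        self.W = rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)          # hidden-layer output g(W*X + b)
        # beta is the least-squares solution of H @ beta = y (Moore-Penrose pseudoinverse)
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta
```

Because only the linear output weights β are solved for, training amounts to one pseudoinverse, which is why ELM training is so fast compared with backpropagation.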

**Figure 3.** Structure of the designed ELM.

The ELM is trained to predict the moments of four DOFs in the right leg: ankle plantar-dorsiflexion (Ankle PDF), knee flexion-extension (Knee FE), hip adduction-abduction (Hip AA) and hip flexion-extension (Hip FE), with the inverse dynamics moment used as the target value of the training samples. As can be seen from Table 1 together with Equation (6), the input variables for Hip FE joint moment prediction comprise the EMG signals of four muscles plus three joint angles and angular velocities, i.e., 10 input variables in total.

**Table 1.** The list of EMG signal sources and the joints their muscles actuate.


#### *2.4. Prediction Evaluation*

Equation (6) is obtained under the assumption that *F<sub>o</sub><sup>M</sup>* (peak isometric force), *d* (electromechanical delay), φ (pennation angle), *l<sub>s</sub><sup>T</sup>* (tendon slack length) and *l<sub>o</sub><sup>M</sup>* (optimal fiber length) remain constant for an individual; this assumption does not hold across subjects, so each ELM is trained on one joint moment of one subject. A generic three-layer ELM is designed and trained using two strategies to evaluate the generalization ability of the method at two levels: (1) training with all four speeds (level 1) and (2) training with only the three low speeds (2, 3 and 4 m/s) (level 2). During supervised training, the inverse dynamics moment is used as the target value of the training samples. The variance accounted for (VAF) [51] is used to evaluate the accuracy of the ELM; its expression is as follows:

$$\text{VAF} = \left[1 - \frac{\text{var}(\hat{y} - y)}{\text{var}(y)}\right] \times 100\% \tag{8}$$

where *y* is the inverse dynamics moment and *ŷ* is the predicted joint moment. For each speed, six gait cycles (6 × 101 = 606 time points) are selected for training and testing. Since a complete gait cycle may contain all gait features at the current speed, training and testing must take whole gait cycles as input; otherwise, feature loss easily makes the prediction unstable. Because the data set is small, a relatively large testing share of 30% is used, with the remaining 70% for training: four (6 × 0.7 ≈ 4) gait cycles (4 × 101 = 404 time points) are randomly selected from each tested speed for training, and the remaining two (6 × 0.3 ≈ 2) gait cycles (202 time points) are used for testing. Then, to set an appropriate number of hidden-layer neurons for good prediction, an experiment is conducted to observe the relationship between the number of hidden neurons and the prediction accuracy. In this experiment, four gait cycles are selected from each speed for training and two for testing. The ten subjects' average prediction accuracy, evaluated by the VAF (%), is shown in Figure 4.
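Equation (8) translates directly into code; a small sketch:

```python
import numpy as np

def vaf(y_true, y_pred):
    """Variance accounted for (Eq. 8), in percent; 100% means a perfect fit."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return (1.0 - np.var(y_pred - y_true) / np.var(y_true)) * 100.0
```

Unlike a plain correlation coefficient, the VAF penalizes amplitude errors as well as shape errors, since the variance of the residual appears in the numerator.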

**Figure 4.** The ten subjects' average prediction accuracy evaluated by the variance accounted for (%) as the number of hidden neurons increases.

It can be seen from Figure 4 that the VAF increased rapidly with the number of neurons at first, but its growth slowed once the number of neurons exceeded 10. Considering the structural complexity of the ELM and the training time cost, the number of neurons in the hidden layer is set to 20.

#### **3. Results**

When training with all four speeds (level 1), the trained ANN model is used to predict the lower limbs' joint moments of all subjects at different speeds. The joint moment predictions of a typical subject at each speed are shown in Figure 5. As shown, the general pattern of lower limb joint moment can be predicted well at each speed; compared with the inverse dynamics moment, there are only small differences in the minima and maxima of the waveforms (cross-correlation coefficient > 0.987). The VAF of the predicted joint moment for Ankle PDF, Knee FE, Hip FE and Hip AA at level 1 has mean values (± standard deviation) of 97.15 ± 0.99%, 94.23 ± 2.99%, 95.39 ± 3.62% and 95.01 ± 7.46%, as shown in Table 2.

**Figure 5.** Joint moment prediction of a typical subject at each speed when all four speeds are used for training (level 1).



When training with the three low speeds (level 2), the trained ANN model is likewise used to predict the lower limbs' joint moments of all subjects at different speeds. The joint moment predictions of a typical subject at each speed are shown in Figure 6. As shown, the errors between the predicted and inverse dynamics moments were slightly larger than the corresponding errors at level 1 (cross-correlation coefficient > 0.984), especially at the speed of 5 m/s. The VAF of the predicted joint moment for Ankle PDF, Knee FE, Hip FE and Hip AA at level 2 has mean values (± standard deviation) of 94.31 ± 7.13%, 93.04 ± 3.62%, 92.08 ± 2.93% and 89.95 ± 2.31%, as shown in Table 3.

**Figure 6.** Joint moment prediction of a typical subject at each speed when only the three low speeds are used for training (level 2).


**Table 3.** Joint moment prediction performances for level 2, evaluated by VAF (%).

In order to examine generalizability over multiple conditions, a more exhaustive validation is conducted using jack-knife cross-validation [52], in which every cross-validation subset consists of only one data set. In the jack-knife cross-validation, the six gait cycles at each speed are taken as one data set, giving four data sets in total. In each test, three data sets are selected for training and one for testing; the average VAF of the ten subjects' predicted joint moments for Ankle PDF, Knee FE, Hip FE and Hip AA is shown in Table 4. As shown in Table 4, the obtained results differ little from level 2.
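The leave-one-speed-out scheme described above can be sketched generically; `train_fn` and `eval_fn` are placeholders standing in for the ELM training and the VAF evaluation.

```python
import numpy as np

def jackknife_by_speed(datasets, train_fn, eval_fn):
    """Leave-one-data-set-out validation: each running speed's six gait cycles
    form one (X, y) data set; train on the rest, evaluate on the held-out set."""
    scores = []
    for i, (X_test, y_test) in enumerate(datasets):
        train = [d for j, d in enumerate(datasets) if j != i]
        X_train = np.vstack([X for X, _ in train])       # stack the 3 training sets
        y_train = np.concatenate([y for _, y in train])
        model = train_fn(X_train, y_train)
        scores.append(eval_fn(model, X_test, y_test))
    return scores
```

With four speed data sets this yields four scores, one per held-out speed, which can then be averaged across subjects as in Table 4.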


**Table 4.** Joint moment prediction performances for jack-knife cross-validation, evaluated by VAF (%).

Furthermore, the method (EAV) is compared with other combinations of inputs using jack-knife cross-validation, evaluated by VAF (Figure 7).

**Figure 7.** Comparison of performance by jack-knife cross-validation for several combinations of inputs: EAV = relevant muscles' EMG and the Angles and angular Velocities of the joints their muscles actuate; EA = relevant muscles' EMG and the Angles of the joints their muscles actuate; EV = relevant muscles' EMG and the angular Velocities of the joints their muscles actuate; EJAV = relevant muscles' EMG and the Joint's Angle and angular Velocity; EJA = relevant muscles' EMG and the Joint's Angle; E = relevant muscles' EMG signals.

There are five alternative input combinations: (1) relevant muscles' EMG signals and the angles of the joints their muscles actuate (EA); (2) relevant muscles' EMG signals and the angular velocities of the joints their muscles actuate (EV); (3) relevant muscles' EMG signals plus the joint's angle and angular velocity (EJAV); (4) relevant muscles' EMG signals and the joint's angle (EJA); (5) relevant muscles' EMG signals alone (E). "Relevant muscles' EMG signals" means the EMG signals of the muscles associated with the joint moment. Taking EAV (VAF = 89.67 ± 5.56%) as the reference, the VAF of the moments predicted by EA (86.21 ± 6.60%), EV (45.48 ± 5.08%), EJAV (66.80 ± 5.91%), EJA (54.41 ± 5.70%) and E (15.39 ± 4.81%) is reduced by about 3.85%, 49.27%, 25.50%, 39.31% and 82.83%, respectively.

#### **4. Discussion and Conclusions**

This study demonstrated that the ELM with the online measurable input variables could be used as a real-time surrogate model to predict joint moments under different gait speeds. Compared with previous studies [29–33,53–55], this research extends our knowledge by establishing the mathematical model of the input-output relation in human joint moment prediction based on the Hill muscle model, from which the online measurable input variables of the ANN model are obtained. The method needs neither ground reaction forces nor marker trajectories, which would increase the number of input sensors and the complexity of prediction, and it achieves high prediction accuracy (VAF = 96.07 ± 3.484%). Thus, the proposed method is suitable for online rehabilitation assessment and human-robot interaction, which need to obtain joint moments in real time.

It can be seen from Equations (1)–(6) that the joints a muscle actuates are very limited in number, and inertial magnetic measurement systems are well suited to measuring such a limited set of joint angles and angular velocities [56]. Therefore, unlike previous computational models, such as inverse dynamics [57,58] and EMG-driven models [39,46,59], the method can predict joint moments online without essential 3D motion capture or complicated calculation, which allows hospitals and laboratories to predict joint moments without site requirements, even in free-living conditions. It can also adapt to individual differences during training, and it needs neither a musculoskeletal model nor subject-specific scaling, thereby reducing the error caused by individual differences. Furthermore, the training time is less than one second.

Comparing level 2 with level 1 and with the jack-knife cross-validation results (Table 4), the results suggest that the proposed method has good generalization ability. Thus, in practice, a reduced amount of training data can be used when a large amount is not available. It can be seen from Figure 7 that EAV gives the best prediction results for all joints compared with the other inputs, which verifies the accuracy of the method proposed in this paper. Compared with our method, the VAF of EA is only 3.85% lower; thus, the effect of angular velocities on joint moment prediction is relatively small. Compared with our method, the VAF of E is 82.83% lower, which indicates that: (1) the EMG value alone cannot represent the value of the joint moment [60], and (2) the joint angle has a great influence on joint moment prediction. From Figure 7, it can also be seen that EJAV gives good prediction results, so the joint's own angle and angular velocity are very important to joint moment prediction. This is why musculoskeletal models use joint angles and angular velocities as inputs to calculate joint moments. As the ANN model can adapt to individual differences during training, and the muscle model is applicable to all muscles of any human body, whether male or female, old or young, healthy or not, the proposed method can theoretically also be applied to other joints of any human body.

It should be mentioned that the current study has some limitations. Firstly, only 10 muscles' EMG data from the right leg are used in the method, which cannot represent all muscles associated with the joints; our approach will be developed on a larger muscle set in the future. Secondly, the experimental gait patterns include only running, which is very limited; future studies will collect more gait data, such as squatting, cutting and so on. Finally, the sample is composed only of young male subjects with similar anthropometry and age, which cannot ensure the diversity of the training samples. Data from different groups of people, such as children, elderly people, women and patients, will be collected in the future.

**Author Contributions:** B.X. and M.D. conceived the layout, the rationale, and the plan of this manuscript. B.X. wrote the first draft of the manuscript. N.Z., M.D., Y.L., M.H., G.M., W.S. and Y.Y. edited the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** This work was supported in part by the National Nature Science Foundation of China (61773124, 61773415), in part by the National Key Research and Development Program of China (2016YFE0122700), in part by the UK-China Industry Academia Partnership Programme\276, in part by the Science and Technology Project of the Fujian Province Education Department (JT180344/JT180320/JAT170398), and in part by the Scientific Fund Projects of Fujian University of Technology (GY-Z17151/GY-Z17144). Y.Y. is supported by the Dixon Translational Research Grants Initiative from the Northwestern Memorial Foundation.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

1. Park, H.S.; Peng, Q.; Zhang, L.Q. A Portable telerehabilitation system for remote evaluations of impaired elbows in neurological disorders. *IEEE Trans. Neural Syst. Rehabil. Eng.* **2008**, *16*, 245–254. [CrossRef] [PubMed]


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Pain and Stress Detection Using Wearable Sensors and Devices—A Review**

**Jerry Chen 1, Maysam Abbod 2,\* and Jiann-Shing Shieh 1,\***


**Abstract:** Pain is a subjective feeling; it is a sensation that every human being must have experienced. Yet its mechanism, and how to become immune to it, remains an open question. This review presents the mechanisms and correlation of pain and stress, and approaches to their assessment and detection with medical devices and wearable sensors. Various physiological signals (i.e., heart activity, brain activity, muscle activity, electrodermal activity, respiration, blood volume pulse and skin temperature) and behavioral signals are organized for wearable-sensor detection. By reviewing the wearable sensors used in the healthcare domain, we hope to find a way for wearable healthcare-monitoring systems to be applied to pain and stress detection. Since pain leads to multiple consequences or symptoms, such as muscle tension and depression, that are stress-related, there is a chance to find a new approach to chronic pain detection using daily-life sensors or devices. Then, by integrating modern computing techniques, there is a chance to handle pain and stress management.

**Keywords:** pain detection; stress detection; wearable sensor; physiological signals; behavioral signals

#### **1. Introduction**

Pain is a highly inter-individual and subjective feeling: what makes one person feel excessive pain may not be the same for another. In order to reach a general perception of pain, people have for hundreds of years been looking for a relevant scale or index to quantify this sensation objectively [1]. Extracting further information for a better understanding of pain has required numerous studies based on experiments and clinical observations. Since pain generated in both types of scenario is linked to the same original sensation embedded in the human body, the mechanism of pain is being clarified by conducting more and more experiments and by observing symptoms in the clinic. Back in 1846, when the first anesthetic (ether) was publicly demonstrated for general anesthesia by Morton at Massachusetts General Hospital in Boston (MA, USA), the majority thought the agony of pain had become history. However, looking back from now, that event might just have been the beginning of our understanding of the mechanisms of pain. In general anesthesia during surgery, anesthesiologists use their knowledge of anesthetics to render subjects unconscious and block the sensation of pain so that surgery can be performed more smoothly, but pain is a strong sensation that exists not only during surgery but potentially in any moment of our lives. It is more than just an unpleasant experience; it also plays the role of a useful reminder to avoid potential injuries or tissue damage. Thus, pain research is not only about how to stop pain, but, more importantly, about what problem this sensation is implicitly pointing to. The causation of some pain is easy to identify, and such pain can easily be taken care of by treating the causative wounds or injuries. However, not all types of pain have a clear or obvious cause; some kinds of pain have no clearly correlated injuries or wounds that need to be cured, and sometimes this type of pain is observed even after the original injuries have healed. Unfortunately, pain is also tricky to study, for two reasons:

**Citation:** Chen, J.; Abbod, M.; Shieh, J.-S. Pain and Stress Detection Using Wearable Sensors and Devices—A Review. *Sensors* **2021**, *21*, 1030. https://doi.org/10.3390/s21041030

Academic Editor: Ki H. Chon Received: 25 December 2020 Accepted: 2 February 2021 Published: 3 February 2021



First, pain-inducing tests are rarely viable in research; in fact, it is hard to design a pain-related experiment without ethical conflicts with human rights. For this reason, existing pain-inducing tests (e.g., the hot plate test, tail flick test, etc.) either solely use animals as experimental subjects or focus on the relation between a pre-existing pain condition and some specific movement (e.g., a back pain-inducing test [2] examines the pain condition during the movements of lying supine, rolling over and sitting up). The challenge of designing a proper pain-inducing test leads to the second difficulty: pain detection is thus confined to the medical field and clinical settings. Most pain-related research is done by observing patients in clinics or during surgery, and the devices used to measure pain are usually expensive, only available in hospitals and mostly used in operating rooms. Thus, this review article aims first to emphasize the association between pain and stress and then, by adapting resourceful tests and wearable-sensor-based detection techniques for stress detection, to overcome the bottleneck of the universal pain detection problem.

#### **2. Review Scope**

Different from previous reviews centered on the compliance and usability differences among self-report pain scales [3], such as the Visual Analogue Scale (VAS), Verbal Rating Scale (VRS) and Numerical Rating Scale (NRS) in clinical use, this review also brings up other pain-scaling approaches that are based not solely on subjective self-reporting but rather on objective physiological signals for pain detection. Such equipment is currently only applicable in hospitals for intra- or post-surgery studies; its potential to be universalized by wearable sensors is discussed in this review. On the other hand, although reviews on stress detection [4] are often based on physiological signals and algorithmic approaches for feature extraction, wearable sensor systems for stress detection are rarely discussed together with pain detection and its integration into health-monitoring systems. Furthermore, the difficulty of applying pain-inducing experiments urges this paper to dive into the relations between pain and stress: their mechanisms, correlations, assessment, and the applied medical devices and wearable sensors. The purpose of this review is to sum up modern physiological- and behavioral-based techniques for both pain and stress detection. We also discuss the demands on a wearable-based monitoring system, the evaluation of such a system and its possibilities for overcoming the issues of pain management and stress management.

#### **3. Mechanism of Pain**

The mechanisms of pain are being clarified by a growing body of research, and the pain process is coming to be understood as a dynamic phenomenon [5]. The nociceptive signal travels from receptors (nociceptors) through peripheral nerves to the spinal cord, and then to cerebral structures where the thalamus relays the signals to the somatosensory cortex, frontal cortex and limbic system. Although the sensation of pain is carried by nerve fibers [6], different types of nerve fiber serve different sensations, as shown in Table 1. The first kind is the A-alpha nerve fiber; its diameter is about 13−20 μm, its conduction speed is about 80−120 m/s, and it carries information related to position and spatial awareness. The second type is the A-beta fiber; its diameter is 6−13 μm and it conducts touch signals at a speed of 35−75 m/s. The third type is the A-delta fiber; despite its smaller diameter (around 1−5 μm) and slower conduction speed (5−35 m/s), information such as sharp pain and temperature is delivered through it. The last type is the C fiber, which has the smallest diameter, around 0.2−1.5 μm, and the slowest conduction speed, 0.5−2.0 m/s, but carries information such as dull pain, temperature, and itching. The different conduction speeds can produce a sensed sequence of signals in subjects. For example, since A-delta fibers are larger and surrounded by myelin (a lipid-rich substance that acts as an insulator for nerve cell axons), someone who pricks their finger is expected to sense a sharp sensation followed by a slower ache, due to the difference in speed between A-delta fibers and C fibers.
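The sensed delay between the sharp (A-delta) and dull (C fiber) pain of a finger prick can be worked out directly from the conduction speeds quoted above. The sketch below uses mid-range speeds; the 1.0 m fingertip-to-brain path length is an illustrative assumption, not a value from the text:

```python
# Arrival-time gap between "first" (A-delta) and "second" (C fiber) pain,
# using the conduction speed ranges given in the text.

def arrival_time(distance_m: float, speed_m_per_s: float) -> float:
    """Time in seconds for a signal to travel distance_m at speed_m_per_s."""
    return distance_m / speed_m_per_s

PATH = 1.0  # assumed fingertip-to-brain path length in metres (illustrative)

t_a_delta = arrival_time(PATH, 20.0)  # mid-range A-delta speed (5-35 m/s)
t_c_fiber = arrival_time(PATH, 1.0)   # mid-range C fiber speed (0.5-2.0 m/s)

print(f"A-delta: {t_a_delta:.2f} s, C fiber: {t_c_fiber:.2f} s, "
      f"gap: {t_c_fiber - t_a_delta:.2f} s")
```

With these mid-range speeds the dull ache arrives almost a full second after the sharp prick, consistent with the two-phase sensation described above.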


On the path of transmitting pain to the brain, nerve fibers pass through the dorsal horn, which acts as a relay station or gateway for the signals [7]. Inside the spinal cord, the dorsal horn intervenes in the transmission of nerve signals: it either amplifies the nociceptive signal and passes it through, or decreases the amplitude of the signal and ends it. This gate-like behavior, first proposed by Melzack and Wall, is known as the "gate control theory of pain" [8]. According to this theory, when a pain signal reaches the spinal cord and the central nervous system (CNS), it can be amplified, reduced, or blocked by the system. Such conditions are commonly observed in subjects who have experienced a severe injury and suffer paralysis of the lower limbs. The intervention in pain signals is also related to the type of nerve fiber; that is, signals carried by different nerve fibers have different priorities in the sensation mechanisms at the spinal cord [9]. One instance is when people rub a wounded body part, which seems to attenuate the sensation of pain. This is due to a modulating counter-mechanism in which large-diameter afferent fibers inhibit the transmission and small-diameter afferent fibers facilitate it [10]. Thus, when the A-beta fiber synapse is activated, it tends to close the gate and thereby mediate the sensation carried by C nerve fibers. However, there do exist cases (e.g., phantom limb pain) that gate control theory alone cannot explain; these also involve mechanisms of the brain [8].

Finally, besides the interactions between nerve fibers and the spinal cord, there are other factors that may skew the perception of pain, among them emotions [11] and psychological state [12]. Different mindsets and expectations toward the pain can either enhance or reduce the pain experience. Personal beliefs and values under social or cultural influences may also alter the perception of pain [13]. Physical state changes (e.g., age, health status, etc.) can likewise worsen the perception of pain [14].

#### **4. Classification of Pain**

Generally, pain is evaluated in multiple aspects such as its location, its possible causes, the frequency of its occurrence, its intensity and its duration. Classifying pain benefits the communication between patients and clinicians, which in turn facilitates the assessment task, helps formulate treatment planning and increases the precision of diagnoses in the clinic. In reality, however, not all pains have clear causes linked to them or an adequate treatment. Some pains have apparent causes but no adequate treatment (e.g., deep tissue disorders, peripheral nerve disorders, etc.), while pains like trigeminal neuralgia have adequate treatment without the causation being known. Other pains, such as back pain and fibromyalgia, have neither clear causes nor treatment [15]. Due to this complexity of causes and treatments, numerous methods exist for the classification of pain. Such classifications keep expanding as new types of pain are observed in the clinic, and the classification itself can sometimes, ironically, be confusing to clinicians [16]. However, there is a general consensus on pain classification that is agreed upon by a majority of researchers and clinicians.

#### *4.1. Classification of Pain by Its Mechanisms*

One of the most common classification schemes is based on the different mechanisms from which the pain originates [17], which yields the following types:

#### 4.1.1. Nociceptive Pain

Pain caused by injuries to body tissues is classified as nociceptive pain. This is the pain that comes after a cut, burn or fracture type of body tissue injury. Other pain of this type can commonly be observed in subjects who have undergone surgery during the postoperative period. This type of pain is described as aching, sharp or throbbing. Since the pain is caused by a body tissue injury, any movement (e.g., coughing, touching, etc.) related to the injured part can amplify the pain sensation.

#### 4.1.2. Neuropathic Pain

Neuropathic pain originally meant pain caused by a primary lesion, dysfunction or transitory perturbation of the peripheral or central nervous system, until it was redefined by the International Association for the Study of Pain (IASP) taxonomy as "pain caused by a lesion or disease of the somatosensory system". Neuropathic pain is not a single disease; it is a syndrome caused by different diseases and lesions, for some of which the underlying mechanisms remain unknown [18]. Neuropathic pain is sometimes depicted as a burning, tingling, or numbing sensation. People suffering neuropathic pain can also feel excessive pain from minor stimuli such as a light touch.

#### 4.1.3. Nociplastic Pain

Nociplastic pain is defined in the IASP 2017 taxonomy as "pain that arises from altered nociception despite no clear evidence of actual or threatened tissue damage causing the activation of peripheral nociceptors or evidence for a disease or lesion of the somatosensory system causing the pain". Nociplastic pain is a relatively new term compared with nociceptive and neuropathic pain; in fact, only nociceptive pain and neuropathic pain existed before the IASP added this third mechanistic descriptor to its taxonomy. The call for a third mechanistic descriptor was to fill the lack of a valid pathophysiological descriptor for patient groups with fibromyalgia, complex regional pain syndrome (CRPS) type 1, or other instances of "musculoskeletal" pain and functional visceral pain disorders. As stated in [19]: "This group comprises people who have neither obvious activation of nociceptors nor neuropathy but in whom clinical and psychophysical findings suggest altered nociceptive function". However, signs of this altered nociception have not yet been characterized by the IASP [20], and doing so requires more studies on patients suffering from chronic pain. Furthermore, there is also a proposal in [21] to modify the definition of nociplastic pain to "pain that arises from altered nociceptive function".

#### *4.2. Classification of Pain by Its Time Period*

The most common classification concerns the time duration of the pain. That is, by observing how long the symptom lasts, pain can be divided into two types: acute pain and chronic pain [22,23].

#### 4.2.1. Acute Pain

The term acute pain often refers to the occurrence of damage to tissues. It is a short-lived pain that works as a warning sign from the body. In most cases (e.g., broken bones, surgery, dental work, labor and childbirth, cuts, burns), the pain lasts fewer than six months and disappears once the injury or disease is cured or healed.

#### 4.2.2. Chronic Pain

The features of chronic pain (e.g., osteoarthritis, frequent headaches, low back pain, etc.) are its long duration and its complicated mechanism. It usually lasts for more than six months, even after the original injury has healed. Due to its persistence, people living with chronic pain may develop anxiety, depression, or other conditions (e.g., tense muscles, lack of energy, limited mobility, etc.).

#### **5. What Is Stress and What Is Its Correlation with Pain?**

Pain and stress are interleaved and connected in many ways, and a consensus on the highly intertwined relations between these two mechanisms has been established [24]. Stress is a feeling of emotional strain and psychological pressure: a state of threatened homeostasis, and a reaction that breaks the balance of physiological processes. The sources of stress can be the experience of pain itself, the consequences of ongoing pain, or other psychological causes. Experiencing pain is stressful enough, especially when it is not negligible, but when the pain lasts for a long time it can lead to a vicious cycle. For example, people suffering back pain can easily develop stress through further induced muscle tension or spasms [25]. The muscle tension puts more pressure on the nerves, not only causing more pain and stress but also squeezing the nerves harder and exacerbating the pain. On the other hand, pain whose consequences last longer than half a year, i.e., chronic pain, has an even greater impact on the patient's quality of life [26]. The stress and frustrated mood brought by chronic pain are already tough, but the restricted movement and low physical activity adopted out of fear of amplifying the pain make these situations even worse. People in fear of being in pain tend to avoid any movement that does, or may, induce the pain. This avoidance and anticipation of pain, which itself causes a lot of stress, is the beginning of the vicious cycle; the symptom is called "pain catastrophizing" [27], a negative cognitive-affective response toward actual pain or the anticipation of pain. Furthermore, experiencing stress can also disturb the endocrine balance and induce endocrine disorders, which are in turn linked back to chronic pain [28].

Upon encountering stress, the human body responds with three components: the hypothalamus, the pituitary gland and the adrenal gland. These three components constitute the so-called hypothalamic-pituitary-adrenal (HPA) axis, which reacts to the stress by releasing hormones (the adrenal medulla, for instance, can release norepinephrine) that help or excite other parts of the organism through the sympathetic nervous system (SNS) [29]. When the SNS is activated, the subject's heart rate and blood pressure increase within a short period of time; breathing may get faster, and adrenalin levels rise, as do blood sugar and cholesterol levels. Blood flow is also redirected from lower-priority organs, such as those of the digestive system, to higher-priority vital organs such as the heart and the brain. The immune system is driven up since there is an immediate danger and the body needs to handle the "fight or flight" situation [30]. However, if the situation cannot be resolved quickly (e.g., chronic pain), this self-protecting mechanism may harm the body instead and become maladaptive in the long term. Excessive or prolonged activation of the SNS causes muscle tension, headaches and high blood pressure, and may even promote the development of cancer [31]. People who become physically inactive due to a stressed state may also end up with depression.

#### **6. Assessment for Pain and Stress**

#### *6.1. Pain Assessment*

Pain is gradually being accepted as the fifth vital sign [32], as first proposed by the American Pain Society (APS) in 1996. Different kinds of pain (i.e., acute pain and chronic pain) are assessed separately, and the assessments serve different purposes. The assessment of acute pain aims to avoid provoking the pain onset and to monitor the effect of the suppressant that is used. Conversely, the goal of assessing chronic pain is to collect related signs in the early stages or to gather enough symptoms to track down the origin of the pain. In practice, there are multiple scales and measures helpful for tracking pain-related treatment outcomes; these measurements are resources for clinicians to select a treatment plan and validate the treatment effects. Commonly used measures for pain are: (1) Self-report measures: a self-report measurement is a subjective pain score given by the subject, usually on a numerical rating scale ranging from 0 (no feeling of pain) to 10 (extreme pain); similar measurements include the VAS [33]. (2) Physical performance tests: the 5-minute walk, stair-climbing task, 15-meter walk, sit-to-stand and loaded forward-reach tests [34], and the Abbey Pain Scale for non-verbal individuals (e.g., patients with dementia) [35]. (3) Physiological response measures: physiological and autonomic response measures are the most objective approach to pain. By observing changes in multiple physiological signals, such as skin conductance and heart rate, researchers can formulate a valid index for pain evaluation (e.g., the analgesia nociception index [36]). However, the correlation of such measurements with pain is still under debate: the idea is to use physiological signals that reflect the activation of the autonomic nervous system, yet autonomic activity (mainly the balance of the sympathetic and parasympathetic nervous systems) may react not only to pain but also to other factors. Thus, physiological measurement is in most cases used in the surgical room, where the subject is unconscious and the physiological approach is the only way to obtain any relevant information for pain monitoring.

To date, numeric pain scales based on the patient's self-reporting are still the easiest and most popular assessment for pain. However, the lack of an objective assessment for pain may contribute to the overuse of, and addiction to, opioids in the clinic. This problem can further lead to opioid-related unintended deaths [37].

#### *6.2. Stress Assessment*

Similar to pain assessment, one way to assess stress is by a self-report scale in a clinical environment; rather than having the subject fill out a questionnaire, the VAS provides a rapid quantitative assessment on a 10-point range [38]. Since stress is defined as a state in which homeostasis is threatened, the adaptive processes that are activated cause both physiological and behavioral changes. To comprehend this stress response mechanism, numerous studies have observed the physiological and behavioral changes in the body under stress induction tests. The observation of physiological or pathophysiological changes in response to stress is fundamental to the development of novel pharmacological agents for stress management [39]. In the rapid development of modern society, people deal with stress and work fatigue on a daily basis; thus, like pain assessment, stress assessment can also benefit from observation during daily life, without further inducing stress in the subjects. Such cases are associated with fatigue and work-related stress [40], out of concern for the mental and physical health of employees.

#### Stress Induction Tests

Setting up stress-inducing scenarios helps researchers collect and validate stress-correlated physiological or behavioral signals. These stress induction tests usually ask the subject to finish a certain task or perform a certain action under specific conditions designed by the researchers; the researchers can then conclude which signals are related to stress by monitoring how the signals change during the tests.

• Trier Social Stress Test

The original Trier Social Stress Test (TSST) consists of an anticipation period and a test period of 10 minutes each [41]. During the test, the subject is told to take the role of a job applicant and prepare a 5-minute speech. An audience of three persons plays the interviewers and managers. The subject must convince the interviewers of his/her suitability for the imaginary job without touching on any of the topics noted before the test. If the subject finishes his/her presentation early, he/she will be asked by the interviewers to continue. After the speech period is over, the subject is asked to perform a mental arithmetic task: counting down from 1,022 in steps of 13. Once the subject makes a mistake, he/she must start over again from 1,022. Following the mental arithmetic test, the subject is given the details of the experiment and allowed to rest while his/her physiological signals remain under monitoring. There are also TSST variations, such as using virtual reality (VR) to reduce cost [42].
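For concreteness, the correct answer sequence for the TSST mental-arithmetic task (counting down from 1,022 in steps of 13) can be generated in a few lines; the function name here is ours, not part of the protocol:

```python
# Generate the correct countdown sequence an experimenter would check
# the subject's spoken answers against: 1022, 1009, 996, ... down to
# the last positive value.

def tsst_countdown(start: int = 1022, step: int = 13) -> list[int]:
    """Return the full correct countdown sequence, stopping above zero."""
    seq = []
    n = start
    while n > 0:
        seq.append(n)
        n -= step
    return seq

seq = tsst_countdown()
print(seq[:5])  # first expected answers: 1022, 1009, 996, 983, 970
```

Since 1,022 leaves a remainder of 8 when divided by 13, a subject who never errs would produce 79 answers, ending at 8.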

• Stroop Color-Word Inference Test

The idea of the Stroop color-word inference test is to ask the subject to read out the ink color of printed words, where each word literally names a color different from the ink it is printed in. Stress is induced by the contradiction between the linguistic and visual perceptions. The Stroop color-word test has been widely used in psychology for a long time [43]. It has the advantages of high reliability and stability in measuring individual differences, with only relatively simple rules. The Stroop color-word test has also shown positive outcomes in VR environments [44].
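The incongruent trials described above are simple to generate programmatically. The sketch below uses a hypothetical color set and function name; a real test would additionally control trial timing and the ratio of congruent to incongruent trials:

```python
# Generate incongruent Stroop trials: each trial pairs a color word
# with an ink color deliberately different from the word itself.
import random

COLORS = ["red", "green", "blue", "yellow"]

def stroop_trial(rng: random.Random) -> tuple[str, str]:
    """Return (word, ink) with the ink color different from the word."""
    word = rng.choice(COLORS)
    ink = rng.choice([c for c in COLORS if c != word])
    return word, ink

rng = random.Random(0)       # seeded for reproducibility
trials = [stroop_trial(rng) for _ in range(5)]
print(trials)                # the correct response in each trial is the ink color
```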

• Cold Pressor Test/Hot Water Immersion Test

In the cold pressor test, the subjects are asked to put their hand into a bucket of cold water and keep it there as long as they can. The subjects should notify the researcher when they first feel that the cold water starts causing pain to their hands. At any time after this first notification, the subjects are free to remove their hands when they feel the pain is unendurable. According to the timing of the two notifications (i.e., when the subject starts feeling pain and when they remove their hand due to intolerable pain) and the continuously collected blood pressure and heart rate, researchers can further analyze the physiological features of stress. The cold pressor and hot water immersion tests are basically the same; the only difference is the temperature of the water used. The cold pressor test is an efficient experimental stress induction [45] which has been observed to reliably increase HPA activity [46].

• International Affective Picture System Test

In psychological studies, one of the most common tests for emotion and attention research is the International Affective Picture System (IAPS). By presenting pictures ranging from simple daily objects to extreme pictures involving violent or erotic content, the test induces stress or emotion in the subject. A relevant application uses the IAPS to detect stress levels in human pilots [47].

#### *6.3. Physiological Signals for Assessment*

#### 6.3.1. Heart Activity

Since stress causes fundamental disturbances in the autonomic nervous system (ANS), which has major effects on heart activity [48], some useful stress detection methods are based on heart-related signals [49]. Heart activity can be represented by an electrocardiogram (ECG), recorded by measuring the electrical activity of heartbeats. A normal heartbeat includes three distinguishable waves: the P wave, the QRS complex and the T wave. Most studies on heart activity cover three aspects: time-domain, frequency-domain and non-linear features. Research in the time domain focuses on parameters such as heart rate (HR), inter-beat (RR) intervals and heart rate variability (HRV) [50]; RR intervals can be further characterized by their mean value, standard deviation or root mean square of successive differences. Frequency-domain studies analyze the components in the low-frequency (LF) and high-frequency (HF) bands or the LF/HF ratio. As for the non-linear features, there are methods such as entropy, complexity, Poincaré plots [51], recurrence and fluctuation slopes.
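The time-domain features mentioned above (mean RR, the standard deviation of RR intervals, and the root mean square of successive differences) can be sketched directly from a list of RR intervals. The RR values below are illustrative; a real pipeline would first detect R peaks in the ECG and remove ectopic beats:

```python
# Minimal time-domain HRV feature extraction from RR intervals (ms).
import math

def hrv_time_domain(rr_ms: list[float]) -> dict[str, float]:
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    # SDNN: sample standard deviation of the RR intervals
    sdnn = math.sqrt(sum((r - mean_rr) ** 2 for r in rr_ms) / (n - 1))
    # RMSSD: root mean square of successive RR differences
    diffs = [rr_ms[i + 1] - rr_ms[i] for i in range(n - 1)]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    heart_rate = 60000.0 / mean_rr  # mean HR in beats per minute
    return {"mean_rr": mean_rr, "sdnn": sdnn, "rmssd": rmssd, "hr": heart_rate}

rr = [812, 790, 805, 798, 820, 801]  # illustrative RR intervals in ms
print(hrv_time_domain(rr))
```

Frequency-domain features (LF, HF, LF/HF) would additionally require resampling the RR series to a uniform rate before spectral estimation.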

#### 6.3.2. Brain Activity

Brain activity is recorded as the electroencephalogram (EEG) for brain-related research (e.g., emotion changes, stress-related studies [52] or consciousness studies for anesthesia). The four bands of the EEG signal are: alpha (8−13 Hz), which indicates calmness and a balanced state of mind; beta (13−30 Hz), which is related to emotional and cognitive processes and correlates with stress; delta (0.1−4 Hz), which is associated with deep sleep stages (high activity in this range is viewed as a sign of unconsciousness); and theta (4−8 Hz), which generates the theta rhythm, a neural oscillation in the brain linked to cognition [53] and behaviors such as learning, memory and spatial navigation.
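Splitting an EEG trace into these four bands typically amounts to summing spectral power over each band's frequency range. The hedged sketch below uses a plain DFT and a synthetic 10 Hz tone in place of a real recording; with alpha defined as 8−13 Hz, the power should land in the alpha band. The sampling rate and signal are assumptions for illustration:

```python
# Band-power computation over the four EEG bands via a direct DFT.
import math

FS = 256                       # assumed sampling rate (Hz)
N = FS * 2                     # two seconds of signal
signal = [math.sin(2 * math.pi * 10 * t / FS) for t in range(N)]  # 10 Hz tone

def band_power(x, fs, lo, hi):
    """Sum of squared DFT magnitudes for frequency bins in [lo, hi)."""
    n = len(x)
    power = 0.0
    for k in range(n // 2):
        freq = k * fs / n
        if lo <= freq < hi:
            re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = sum(-x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            power += re * re + im * im
    return power

bands = {"delta": (0.1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
powers = {name: band_power(signal, FS, lo, hi) for name, (lo, hi) in bands.items()}
print(max(powers, key=powers.get))  # expected: "alpha"
```

In practice an FFT (e.g., via NumPy) with windowing would replace the quadratic-time DFT loop.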

#### 6.3.3. Muscle Activity

Muscle tension usually comes along with stress, and researchers study the changes of muscle activity in the human body under certain stress-presenting activities [54]. Muscle states such as stretching and releasing can be monitored by electromyogram measurements: electrodes placed on certain areas of muscle detect the potential changes due to the locomotion of the body. After obtaining the measurement of muscle activity, statistical techniques can be used to enhance the understanding of the signals. Such applications are often referred to as "myomonitoring" and can be adopted by studies interested in muscle tension (e.g., monitoring mandibular closure into maximum intercuspation of the teeth [55]) and muscle fatigue (with a sonic approach [56]).

#### 6.3.4. Electrodermal Activity

Electrodermal activity (EDA) is a useful indicator of neurocognitive stress, capturing changes in the electrical properties of the skin. When a stress-inducing scenario is applied to a subject, the body is expected to start sweating, which increases the skin conductance [57]. The long-term shifts in tonic level are called the skin conductance level (SCL), while the transient responses within seconds are the galvanic skin response (GSR); the tonic and phasic measurements are the two main aspects of EDA. One example of using EDA to validate pain stimulation is given in [58], where Posada-Quintero et al. showed that thermal grill stimulation is highly correlated with the VAS; the observed EDA also showed significant increases as the stimulation level went up. A systematic review of EDA data collection and signal processing presented in [59] by Posada-Quintero and Chon provides a summary of EDA recording devices, signal analysis methods, and a synthesis framework for EDA-related research.
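The tonic/phasic split described above can be illustrated with a simple moving-average baseline: the slow average approximates the tonic SCL and the residual captures the fast phasic responses. This is a deliberate simplification; dedicated methods (e.g., deconvolution-based decomposition) are used in practice, and the data and window size below are assumptions:

```python
# Illustrative decomposition of an EDA trace into tonic (SCL) and
# phasic (GSR) components using a moving-average baseline.

def tonic_phasic(eda: list[float], window: int) -> tuple[list[float], list[float]]:
    """Split EDA into a slow moving average (tonic) and the residual (phasic)."""
    half = window // 2
    tonic = []
    for i in range(len(eda)):
        lo, hi = max(0, i - half), min(len(eda), i + half + 1)
        tonic.append(sum(eda[lo:hi]) / (hi - lo))
    phasic = [e - t for e, t in zip(eda, tonic)]
    return tonic, phasic

# flat 5 uS baseline with a brief skin-conductance response around sample 10
eda = [5.0] * 8 + [5.0, 5.6, 6.2, 5.7, 5.1] + [5.0] * 8
tonic, phasic = tonic_phasic(eda, window=9)
print(round(max(phasic), 2))  # the response spike shows up in the phasic component
```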

#### 6.3.5. Blood Volume Pulse

Blood volume pulse (BVP) captures the change of blood volume between heartbeats and fluctuates along with the changes in heart rate. BVP is measured by optical, non-invasive sensors that compare the amount of light absorbed by the blood. Xie et al. used BVP for distinguishing strong stress from weak stress [60].

#### 6.3.6. Skin Temperature

Stress influences both the core and peripheral body temperature [61]. While the core temperature tends to rise in response to stress, the temperature at distal skin locations tends to decrease. When acute stress is present, it triggers peripheral vasoconstriction, which causes a rapid drop in skin temperature [62].

#### *6.4. Behavioral Signals for Assessment*

#### 6.4.1. Speech

Voice patterns can differ considerably depending on whether a person is under stress. Multiple voice-related features can be altered when stress is present, such as changes in pitch [63], tone and speaking rate, or even the words chosen in the speech.

#### 6.4.2. Facial Expressions

Natural habits such as facial expressions reflect the psychological state and indicate the emotion that a person is experiencing. Many researchers are trying to capture these subtle signs and correlate them with stressful situations through facial electromyography (EMG) [64] or image recognition based on facial expressions [65].

#### 6.4.3. Keystroke and Mouse Dynamics

In modern times, most people use computers on a daily basis, and the way they use them can provide clues about their current mental state or emotions. Typing speed and mouse dynamics can indicate whether a person is stressed or not [66,67]; the use of excessive strength when hitting a keyboard or clicking a mouse is an obvious sign of an upset mind.

#### 6.4.4. Body Gestures and Movements

A stressed person also displays signs of their state of mind through body actions such as jaw clenching, constant finger rubbing, or even posture changes while standing or sitting. Multiple behavioral features for stress detection are provided in [68].

#### 6.4.5. Mobile Phone Usage

A person experiencing stress can either choose to fight it or flee from it, and using a cell phone may provide an easy way to forget about the stress: by distracting themselves with the various features built into a smartphone, the stress may seem to fade away temporarily. Mobile phone addiction can also be a sign of anxiety [69], a common symptom of people under stress. One study [70] finds a significant correlation between mobile phone use and stress.

#### 6.4.6. Questionnaires and Surveys

Questionnaires and surveys are already widely used in psychological research for the assessment of psychological state. When asked questions designed to identify a specific mental state, subjects may expose their deepest concerns or the stress on their minds. Sometimes even the subjects themselves do not know the source of their own stress, which requires questionnaires or consultation with a psychologist to unravel.

#### **7. Medical Devices or Wearable Sensors Used in Pain and Stress Detection**

In this section, the common devices and wearable sensors suitable for detecting pain and stress are organized. Pain within a short time period can usually be located by the person themselves or detected by a clinician in a clinical environment; even in surgery, when the subject is unconscious, there are devices that provide a valid index of the degree of pain being experienced [71]. However, pain such as chronic pain, and stress, are sometimes intermittent rather than constant, featuring highly irregular episode timings that are hard to detect within a specific time window using traditional medical devices or clinical assessments; in fact, the relevant diagnoses mostly depend on self-reporting based on the subject's memory. In this case, wearable sensors can be used to collect data when subjects are not in the clinic, monitoring both physiological and behavioral signals over longer periods and providing useful information for clinicians [72].

#### *7.1. Medical Devices Used in Pain Detection*

Nociception is the most relevant and effective approach for pain detection. Even though pain is a subjective perception, nociception is a physiological reaction to nociceptive stimuli, mediated by the autonomic nervous system (i.e., the balance of the sympathetic and parasympathetic nervous systems). By analyzing the corresponding physiological signal variations caused by the activity of the autonomic nervous system, researchers can formulate an index useful as a pain reference. A popular way to monitor the balance of the sympathetic and parasympathetic nervous systems is by analyzing the heart rate variability. Other methods use the number of skin conductance fluctuations per second (NFSC), the size of the pupil and its variability when illuminated, blood vessel contraction, EMG, EEG, changes of body temperature, etc. Heart rate variability and plethysmography are widely used since these two signals are easier to obtain during surgery and are highly sensitive to the activity of the autonomic nervous system. The two most common medical devices used in assessing pain are the analgesia nociception index, based on heart rate variability, and the surgical pleth index, based on plethysmography.

#### 7.1.1. Analgesia Nociception Index

The analgesia nociception index (ANI) is a technology that provides a measurement of the parasympathetic tone on a scale from 0 to 100. It has been used in the surgical or post-operative room for pain assessment. By analyzing electrocardiographic data reflecting parasympathetic activity, the ANI provides a reference to the sympathetic/parasympathetic balance that allows doctors to control surgical stress. The ANI has shown a correlation with self-rating systems in the postoperative period after volatile agent and opioid-based anesthesia [73]. Since this index is based solely on physiological signals rather than self-rating, it can be applied to patients under general anesthesia or in critical condition who have communication problems. Besides studies of the ANI in pain-related surgical conditions (e.g., during labor [74] or laparoscopic abdominal surgery [71]), there is also research such as [75] that finds relations between the ANI and emotional status. Basically, any mechanism correlated with parasympathetic changes can adopt the ANI as a measurement tool.

#### 7.1.2. Surgical Pleth Index

The surgical pleth index (SPI) is a digital monitor based on the patient's hemodynamic responses to surgical stimuli and analgesic medications during general anesthesia [76]. The SPI measures the sympathetic activity as a reaction to painful stimuli; it creates a single index (on a scale from 0 to 100) by algorithmically integrating the photoplethysmographic amplitude and the photoplethysmographic pulse interval. The SPI has been shown to correlate with self-reporting systems [77]; however, the appropriate selection of SPI target values has not been established yet, and the interpretation of different SPI score ranges (e.g., for the prediction of moderate-to-severe postoperative pain [78]) is still a major topic in the research field.

#### *7.2. Wearable Sensors Used in Stress Detection*

Assessments for stress detection can benefit from the proper use of wearable sensors for data collection, chosen according to the physiological or behavioral signal of interest. The physiological signals of interest in stress detection research are heart activity (ECG), brain activity (EEG), muscle activity (EMG), skin conductance (EDA), BVP, and skin/body temperature; the relevant wearable sensors and devices for stress detection are organized in Table 2. Most of the behavioral signals can be obtained by smartphone sensors [79] or recorded by video cameras for image and voice analysis. Each sensor used in stress detection is listed below, with additional details of the sensor placement illustrated in Figure 1.



**Table 2.** Wearable sensors used in stress detection.

Notes: The Empatica E4 wrist band is used in [80–83]; AutoSense in [84–86]; SleepSense in [87,88]; BN-PPGED in [89]; the Cardiosport TP3 in [90]; the Q-sensor in [70]; the Wahoo chest belt in [91]; the BioHarness 3, Shimmer sensor, and MindWave mobile EEG headset are used as an integrated system for stress monitoring in [92]; DataLOG is used in [93]. Device 1 is an EEG wearable sensor developed in the Online Predictive Tools for Intervention in Mental Illness (PTIMI) project funded by the European Union [94]; Device 2 is a noninvasive physiological sensor for stress assessment presented in [95]; Device 3 is used in [96], in which the authors collect EMG signals from the left trapezius muscle and then remove the contained ECG signal components.

**Figure 1.** Illustration of sensor placement on the human body.

#### **8. Wearable Sensors in Healthcare**

Wearable sensors are under rapid development and have been applied in many fields over the years, especially in healthcare. Recently, due to the severe pandemic, wearable sensors have also been used for COVID-19 detection [97]. The expanding need for wearable sensors in the healthcare domain is motivated by rising healthcare costs; their successful application in this domain is the result of advances in microelectronics and wireless communication [98]. The rising healthcare costs are closely related to the ageing world population [99]. Constant monitoring of physiological and psychological parameters benefits the prognosis of chronic diseases and their detection at an early stage. Real-time feedback on a subject's health status can greatly improve the ability to identify an abnormal condition and prevent it in advance. Wearable sensors are an applicable solution for such tasks; in fact, systems based on wearable sensors, called wearable health-monitoring systems (WHMS), can be adopted for supervising subjects who are elderly, have a chronic disease, or have disabilities [100].

According to Alexandros et al. [98], WHMS are built on various types of miniature sensors, wearable or implantable, for measuring physiological signals. The collected parameters are transmitted through a wireless or wired link to a central node (e.g., a microcontroller board), which processes the information and displays it to the users. These aggregated vital signs can then be sent to a medical center for further analysis and diagnosis by medical professionals. A WHMS has multiple components (i.e., sensors, wearable materials, smart textiles, power supplies, communication modules, a central processing unit (CPU), software and advanced algorithms, etc.). It also has to meet several criteria for practical use, such as light weight and small size for comfort, low radiation and mild heat dissipation for safety, and appropriate security during data transmission for privacy. Once all of these requirements are satisfied, the overall WHMS also has to be low-cost and affordable for the majority, or even for underprivileged minorities. Several available WHMS were reviewed by Alexandros and Nikolaos [98] according to the maturity level of the system, which is evaluated by its ability to measure multiple parameters, the level of detail of its data documentation, the popularity of the system based on its citation count, the hardware technologies applied in the system, and the algorithms used for feature extraction and decision support. Even a WHMS of maximum maturity still faces problems such as limited battery lifetime and private-information security. However, by integrating modern techniques such as cloud and fog computing resources, a WHMS can further improve its performance with minimal cost in electronic components and processing units, and even save space for individual sensors. Such integrations [101] release the front-end data-collecting unit from heavy computing tasks, which can free space and reduce power consumption at the sensors. A smartphone can then serve as a data transmission and processing relay station to build a more comprehensive and compatible platform; with federated learning, explicit data can be turned into implicit data, which reduces the risk of data leakage and accelerates the optimization of diagnostic models.

#### **9. Discussion**

Pain and stress are useful mechanisms that help humans survive and are a product of evolution. Through the proper induction of an unpleasant feeling, individuals can sense danger that is happening or might harm their bodies; this is especially useful when the danger causing the threat is beyond the individual's knowledge. Pain and stress serve as warning signals for incidents that are potentially harmful or fatal. These mechanisms also help prevent the same or similar incidents from happening again by introducing stress before a harmful incident can cause further damage. This kind of threat learning helps to address threats and is useful, not harmful, to the body in the short term. However, with the rapid development of human society, these survival mechanisms are no longer needed as much as they used to be; on the contrary, their long-term downsides have gradually surfaced. Pain-inducing tests are much less practical than stress-inducing tests, as any experiment that may induce pain raises human-rights concerns. Thus, a possible path toward pain monitoring is further research into the relationship between pain and stress, after which WHMS are expected to help resolve the pain-management issue.

Pain management is a rising issue worldwide [102]. Access to pain management has been defined as a human right, regardless of social status and economic condition. Everyone should be free from suffering pain, but in reality, not everyone can afford a physical examination in a hospital or treatment resources. Pain management is also a public health issue [103]. In addition, in developed countries, the aging population must also face chronic age-related diseases [104]. On the other hand, stress management also demands attention, as it is a common issue in modern society. Stress management techniques and relevant education are necessary for students [105] and workers [106] alike.

Fortunately, with the help of well-developed techniques, devices for monitoring pain or stress are becoming more and more accessible. The use of wearable sensors may allow the diagnosis of pain and stress to no longer be restricted to hospitals but performed anywhere, by online doctors or artificial intelligence (AI) models.

#### **10. Conclusions**

Pain is an unpleasant feeling that everyone has experienced, yet it is a subjective sensation: what is considered painful by one person might not be interpreted the same way by others. The mechanisms of pain are so far understood as a signal that travels from receptors to peripheral nerves, to the spinal cord, and then to cerebral structures. Different nerve fibers have their own priorities and duties for carrying sensations. Pain can be classified in several ways according to its characteristics and mechanisms. One way is by duration: if the pain lasts less than 6 months, it is called acute pain; if it lasts more than 6 months, it is known as chronic pain. Acute pain usually comes with a specific cause, and its perception is constant until the cause has been removed or the causing injury has healed.

Chronic pain, on the contrary, affects the subject for a much longer time, but its symptoms are often intermittent, which makes it challenging to detect the pain or find its origins. Moreover, a persistent pain issue can lead to the development of stress, which is likewise hard to detect or treat. A person suffering from either pain or stress can end up with both, due to this vicious cycle. Fortunately, with multiple physiological and behavioral signals collected by wearable sensors, there is hope for early detection and timely treatment. The physiological signals available for stress detection with wearable sensors are heart activity, brain activity, muscle activity, electrodermal activity, respiration, blood volume pulse, and skin temperature. Furthermore, wearable sensors can work with multiple components (e.g., communication modules, a CPU, advanced algorithms, etc.) to construct a wearable health-monitoring system for chronic disease detection and health status monitoring.

This article has presented the mechanisms of pain and stress, the correlation between them, their assessment, and their detection devices. Finally, wearable sensor-based health-monitoring systems were presented and discussed in the hope of addressing the globally imbalanced resources for diagnosing and treating pain. The low-cost and easy-to-use features of wearable sensors may provide a fitting solution. Awareness of the importance of pain management is rising along with its promotion. Integrated with AI algorithms and cloud computing resources, wearable sensors can act as more than components that collect data: they can be the foundation of a health monitoring and treatment system. Furthermore, by analyzing and quantifying pain and stress, they provide an opportunity to deal with the worldwide issues of pain and stress management.

**Funding:** This research was funded by the Ministry of Science and Technology (MOST) of Taiwan (grant number: MOST 107-2221-E-155-009-MY2).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Review* **A Survey of Heart Anomaly Detection Using Ambulatory Electrocardiogram (ECG)**

#### **Hongzu Li \* and Pierre Boulanger**

Computing Science Department, University of Alberta, Edmonton, AB T6G 2R3, Canada; pierreb@ualberta.ca **\*** Correspondence: hongzu@ualberta.ca

Received: 31 January 2020; Accepted: 2 March 2020; Published: 6 March 2020

**Abstract:** Cardiovascular diseases (CVDs) are the number one cause of death globally. An estimated 17.9 million people die from CVDs each year, representing 31% of all global deaths. Most cardiac patients require early detection and treatment. Therefore, many products to monitor a patient's heart condition have been introduced on the market. Most of these devices can record a patient's biometric signals both at rest and during exercise. However, reading the massive amount of raw electrocardiogram (ECG) signals from the sensors is very time-consuming. Automatic anomaly detection for ECG signals could act as an assistant for doctors diagnosing a cardiac condition. This paper reviews the current state-of-the-art of this technology, discusses the pros and cons of the devices and algorithms found in the literature, and outlines possible research directions for the next generation of ambulatory monitoring systems.

**Keywords:** Review; ECG; Signal Processing; Machine Learning; Cardiovascular Disease; Anomaly Detection

#### **1. Introduction**

According to the World Health Organization (WHO), cardiovascular diseases (CVDs) are the number one cause of death globally. An estimated 17.9 million people die from CVDs each year, representing 31% of all global deaths. Four out of five CVD deaths are due to heart attacks and strokes, and one third of these deaths occur prematurely in people under 70 years of age [1]. An electrocardiogram (ECG) can record a patient's cardiac electrical activity over a long period [2] by measuring voltages from electrodes attached to the patient's chest, arms, and legs. ECGs are a quick, safe, and painless way to check heart rate, heart rhythm, and signs of potential heart disease.

A twelve-lead ECG is today's standard tool and is used by cardiologists to detect various cardiovascular abnormalities. However, heart problems may not always be observed on a standard 10-second recording from the 12-lead ECG measurements performed in hospitals or clinics. With the development of new sensing technologies, long-term ECG monitoring that tracks the patient's heart condition at all times and under any circumstances has therefore become possible. Portable ECG recording devices such as the Apple Watch [3], AliveCor [4], Omron HeartScan [5], QardioMD [6], and, more recently, the Astroskin Smart Shirt [7] are revolutionizing cardiac diagnostics by measuring a patient's cardiac activity 24/7 and transmitting this information to a cloud service to be stored and processed remotely.

By itself, this massive data set is not very useful to the medical community, as they usually do not have enough time or resources to read through long ECG recordings (two to three weeks) to detect every possible heart problem. For this technology to work, new automatic and reliable heart anomaly detection algorithms must be developed to assist doctors in coping with this massive data set. Our aim in this paper is to review the current state-of-the-art that addresses the challenges of ambulatory ECG anomaly detection and to highlight possible solutions. To do so, we first review the medical background associated with ECG analysis and then review the state-of-the-art of automated anomaly detection in ambulatory and non-ambulatory contexts. We conclude by discussing possible research directions for the next generation of ambulatory monitoring systems.

#### **2. ECG Monitoring and Its Signals**

#### *2.1. Standard 12-lead ECG*

A standard 12-lead electrocardiogram provides views of the heart in both the frontal and horizontal planes and views the surfaces of the left ventricle from 12 different angles. A 12-lead ECG has six limb leads (I, II, III, aVF, aVL, and aVR), and six chest leads (V1–V6). The standard 12-lead ECG is used as a standard clinical dysrhythmia analysis tool for chest pain or discomfort, electrical injuries, electrolyte imbalances, medication overdoses, ventricular failure, stroke, syncope, and unstable patients. It is widely used in clinics and hospitals for heart disease diagnosis [8]. However, when the patient needs to be monitored continuously, a 12-lead ECG is impractical as the patient needs to be attached to 10 electrodes.

#### *2.2. Three-lead vs. 12-lead ECGs*

Because the standard 12-lead ECG is impractical for continuous ECG recording, three (3)-lead ECGs are widely used in portable ECG devices for 24-h recordings. Frank's lead system [9] is a 3-lead system that is practical for clinical use. In addition, much research has shown that a 3-lead ECG is sufficient to make a valid diagnosis. Antonicelli [10] validated the accuracy of a 3-lead telecardiology (tele)-ECG compared to a 12-lead tele-ECG in an older population. Their study demonstrated a high level of concordance between the ECG diagnosis using a simple home telecardiology device (3-lead tele-ECG) and more complex instruments, like the 12-lead tele-ECG, as well as the standard 12-lead ECG. The study also demonstrated that a simple 3-lead tele-ECG could be used to detect cardiac alterations, such as arrhythmias, atrioventricular blocks, and re-polarization abnormalities, with good agreement with the observations measured by a 12-lead tele-ECG and the standard 12-lead ECG.

Kristensen et al. also evaluated how well an inexpensive portable three-lead ECG monitor (PEM) can detect patients with atrial fibrillation (AF) compared to a standard 12-lead ECG [11]. In their study, the sensitivity of diagnosing AF from PEM recordings was 86.7% and the specificity was 98.7% when compared to a 12-lead ECG. According to the cardiologists, the misclassification of three PEM recordings was due to interpretation errors and not related to the PEM recording itself. They concluded that PEM devices could be used to diagnose AF. Dehnavi et al. performed an analysis of 3-lead vectorcardiogram (VCG) signals for the detection of cardiovascular diseases [12]. In that study, the authors detected ischemia using a VCG algorithm with both 3-lead and 12-lead ECGs and demonstrated a similar performance in the two cases.

Furthermore, many researchers have tried to reconstruct a 12-lead ECG signal from 3-lead signals. Piotr Augustyniak reviewed and compared two transformation functions between 3-lead VCG and 12-lead ECGs [13]: the Dower and Levkov transformations. The author then tested how a synthesized 12-lead ECG and a VCG compared to an actual 12-lead ECG. The results showed that the synthesized 12-lead ECG had 10.08% distortion and the synthesized vectorcardiogram 6.347% distortion. Atoui [14] introduced a neural-network-based model that could derive a standard 12-lead ECG from a serial 3-lead ECG; the derived 12-lead ECG has an average correlation coefficient of 0.93 with the actual 12-lead ECG.

Figueriedo et al. proposed using a 3-signal-lead sensor to synthesize a 12-lead ECG [15]. The authors used a linear equation to combine the signals collected from the 3-signal-lead sensor to output the 12-lead ECG. H. Zhu et al. proposed a novel, lightweight synthetic method [16] that reconstructs the standard 12-lead ECG from three leads: I, II, and V2. The proposed method, called the adaptive region segmentation-based piecewise linear (APSPL) method, consists of adaptive region segmentation, a linear regression operation, and ECG sequence restoration.

Moreover, Nelwan et al. [17] and Drew et al. [18,19] have conducted several studies demonstrating that it is possible to reconstruct a standard 12-lead ECG from a reduced ECG lead set. I. Tomasic et al. performed a study [20] investigating how a regression trees algorithm can transform a 3-lead ECG into a synthesized 12-lead ECG. Their study demonstrated that the regression trees algorithm can synthesize an accurate 12-lead reconstruction and that the reduced ECG lead set contains enough information to detect most heart anomalies.
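The common thread of these studies is that the missing leads can be approximated as linear combinations of the measured ones. The following sketch illustrates that generic idea on synthetic data: a transform matrix `T_fit` mapping 3 measured leads to 9 remaining leads is fitted by least squares. The data and matrix shapes here are illustrative only; the published methods (Dower/Levkov transforms, APSPL) use physiologically derived or regression-fitted coefficients rather than random data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500
leads3 = rng.standard_normal((n_samples, 3))   # measured leads (e.g., I, II, V2)
T_true = rng.standard_normal((3, 9))           # unknown ground-truth mapping
leads9 = leads3 @ T_true                       # the leads to be synthesized

# fit the 3-to-9 transform on training data, then synthesize the leads
T_fit, *_ = np.linalg.lstsq(leads3, leads9, rcond=None)
synthesized = leads3 @ T_fit
```

Because the synthetic relation here is exactly linear and noise-free, the least-squares fit recovers the mapping exactly; on real ECG data, the residual error of such a fit quantifies how much lead-specific information the reduced set cannot capture.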

#### *2.3. Normal ECG Signals*

To detect anomalies in ECG signals, one must first know what a normal heartbeat looks like. In [8], a normal rhythm (see Figure 1) is defined as the result of an electrical impulse that starts from the sinoatrial (SA) node, propagates through the heart muscles, and then to the patient's chest. A normal rhythm is composed of the following segments in sequence: a P wave generated by the atrial depolarization, the QRS complex generated by the ventricular depolarization, and a T wave and U wave generated by ventricular re-polarization. In normal ECG signals, the P wave, QRS complex, and T wave should be similar over time at a frequency ranging from 60 to 100 bpm. A normal ECG signal should have PR intervals within 0.12–0.2 s and QT intervals less than half of the corresponding RR interval. Also, in a normal ECG signal, the variation between the shortest PP/RR interval and the longest PP/RR interval should be less than 0.04 s (see Figure 2).

**Figure 1.** Normal sinus rhythm (NSR) [21].

**Figure 2.** A normal electrocardiogram (ECG) signal and the corresponding notation [21].
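The interval criteria above can be expressed directly in code. The sketch below is illustrative only (function names are ours, not from any toolkit, and all values are in seconds); a clinical check would of course need reliable wave delineation first.

```python
def is_normal_intervals(pr, qt, rr):
    """Check one beat's PR and QT intervals against the stated limits."""
    pr_ok = 0.12 <= pr <= 0.2   # PR interval within 0.12-0.2 s
    qt_ok = qt < rr / 2         # QT less than half the corresponding RR interval
    return pr_ok and qt_ok

def is_regular_rhythm(rr_intervals):
    """Variation between shortest and longest RR interval below 0.04 s."""
    return max(rr_intervals) - min(rr_intervals) < 0.04
```

For example, a beat with PR = 0.16 s, QT = 0.38 s, and RR = 0.85 s satisfies both interval criteria, while a PR of 0.26 s would flag a possible first-degree AV block (Section 2.4).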

#### *2.4. Abnormal ECG Signals*

The anomalies in ECG signals can be categorized into three subsets: irregular heart rate, irregular rhythm, and ectopic rhythm. The heart rate can be counted by measuring the PP/RR intervals on the ECG: a long PP/RR interval indicates a low heart rate, while a short one indicates a high heart rate. If the heartbeats start from the SA node but the PP/RR intervals are longer than 1 s, this may indicate Sinus Bradycardia (Figure 3a), meaning the heart is pumping too slowly. When the PP/RR intervals are shorter than 0.6 s, this may be a sign of Sinus Tachycardia (Figure 3b). Moreover, if the variations between the PP/RR intervals are too large, this may indicate Sinus Arrhythmia, Sinus Block, or Sinus Arrest (Figure 3c–e).

These ECG anomalies may indicate a patient's current conditions. For instance, Sinus Bradycardia may be associated with hypothyroidism, hyperkalemia, sick sinus syndrome, sleep apnea syndromes, carotid sinus hypersensitivity syndrome, and vasovagal reactions. Sinus Tachycardia is commonly associated with anxiety, excitement, pain, drug reactions, fever, congestive heart failure, pulmonary embolism, acute myocardial infarction, hyperthyroidism, pheochromocytoma, intravascular volume loss, and alcohol intoxication or withdrawal. Sinus Block, and Sinus Arrest can be caused by hypoxemia, myocardial ischemia or infarction, digitalis toxicity, and a toxic response to drugs [22].

**Figure 3.** Abnormal sinus rhythms: (**a**) sinus bradycardia, (**b**) sinus tachycardia, (**c**) sinus arrhythmia, (**d**) sinus block, (**e**) sinus arrest [21].
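The rate thresholds above (RR > 1 s for bradycardia, RR < 0.6 s for tachycardia) translate into a simple rule-based classifier. This is a sketch only: the `max_variation` cut-off is an assumed placeholder (the text says only "too large"), and a real detector would also confirm from P-wave morphology that the beats originate at the SA node.

```python
def classify_sinus_rate(rr_intervals, max_variation=0.12):
    """Rule-based sinus-rhythm label from RR intervals (in seconds)."""
    mean_rr = sum(rr_intervals) / len(rr_intervals)
    if max(rr_intervals) - min(rr_intervals) > max_variation:
        # excessive RR variation: Figure 3c-e territory
        return "possible sinus arrhythmia/block/arrest"
    if mean_rr > 1.0:    # slower than 60 bpm
        return "sinus bradycardia"
    if mean_rr < 0.6:    # faster than 100 bpm
        return "sinus tachycardia"
    return "normal sinus rhythm"
```

For instance, RR intervals around 1.2 s would be labeled sinus bradycardia, and intervals around 0.5 s sinus tachycardia.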

Even if the heartbeat starts from the SA node, the heartbeat signal shape can still be abnormal. For example, the ST segment and the T wave in the ECG signal can have abnormal shapes; these are usually called ST-T changes. ST-T changes can indicate hyperkalemia, ischemia, and other conditions [23]. Some examples of ST-T changes can be found in Figure 4.

Ectopic rhythms start from a source other than the sinus node. For example, Atrial Rhythms begin in the atria; in this case, the P wave is shaped differently from a P wave beginning in the SA node. Several abnormal rhythms can occur when the atria fire the heartbeat: Premature Atrial Contraction, Wandering Atrial Pacemaker, Atrial Tachycardia, Atrial Flutter, and Atrial Fibrillation. Examples are shown in Figure 5. The Premature Atrial Contraction is a very common heartbeat that can be caused by emotional stress, excessive intake of caffeine, and hyperthyroidism. If Premature Atrial Contractions occur three or more times consecutively, the rhythm is considered Atrial Tachycardia, which may cause light-headedness or even fainting. Atrial Flutter and Atrial Fibrillation are two distinct but closely related tachyarrhythmias. They can lead to many symptoms, such as palpitations, light-headedness, fainting, angina, and congestive heart failure.

**Figure 5.** Abnormal Atrial Rhythms: (**a**) Premature Atrial Contraction, (**b**) Wandering Atrial Pacemaker, (**c**) Atrial Tachycardia, (**d**) Atrial Flutter, (**e**) Atrial Fibrillation [21].

Junctional Rhythms are another kind of ectopic rhythm. These occur when the atrioventricular (AV) junction paces the heart; in such a case, the P wave on the ECG signal may disappear or become negative. Several anomaly examples are shown in Figure 6: Premature Junctional Complex, Junctional Escape Rhythm, and Junctional Tachycardia. The Premature Junctional Complex usually has the same causes as the Premature Atrial Contraction described previously. A Junctional Escape Rhythm can be caused by sick sinus syndrome, digitalis toxicity, excessive effects of beta-blockers or calcium channel blockers, acute myocardial infarction, hypoxemia, and hyperkalemia. One of the most common anomalies is the Junctional Tachycardia, the atrioventricular nodal re-entrant tachycardia (AVNRT). This is an arrhythmia that results from a rapidly recirculating impulse in the nodal part of the AV junction and can be caused by digitalis toxicity [22].

**Figure 6.** Abnormal Junctional Rhythms: (**a**) Premature Junctional Contraction, (**b**) Junctional Escaped Rhythm, (**c**) Junctional Tachycardia [21].

Ventricular Rhythms are another kind of ectopic rhythm, occurring when an ectopic site within a ventricle assumes responsibility for pacing the heart. As a result, ventricular heartbeats and rhythms usually have QRS complexes with abnormal shapes and longer durations. Examples of abnormal Ventricular Rhythms are Premature Ventricular Contraction, Ventricular Escape Rhythm, Accelerated Idioventricular Rhythm, Ventricular Tachycardia, Ventricular Fibrillation, and Ventricular Asystole; the corresponding ECG signals are shown in Figure 7. In individuals with Premature Ventricular Contractions, the beats may be a marker of severe organic heart disease associated with an increased risk of cardiac arrest and sudden death from Ventricular Fibrillation. Ventricular Tachycardia consists of three or more consecutive Premature Ventricular Contractions and can lead to the more life-threatening Ventricular Fibrillation. With Ventricular Fibrillation, the ventricles do not beat in any coordinated fashion but instead fibrillate, or quiver, asynchronously and ineffectively, causing the patient to become unconscious immediately [22].

**Figure 7.** Abnormal Ventricular Rhythms: (**a**) Premature Ventricular Contraction, (**b**) Ventricular Escaped Rhythm, (**c**) Accelerated Idioventricular Rhythm, (**d**) Ventricular Tachycardia, (**e**) Ventricular Fibrillation, (**f**) Ventricular Asystole [21].

As depolarization and re-polarization are slow in the atrioventricular (AV) node, this area is vulnerable to blocks in conduction. Therefore, when a delay or interruption happens during impulse conduction from the atria to the ventricles, AV blocks may occur. AV blocks, also called Heart blocks, are classified into First-degree AV blocks, Second-degree AV blocks (types I and II), and Third-degree AV blocks (complete); see Figure 8. Lower-degree heart blocks can progress to Third-degree AV blocks, also called Complete Heart blocks, which are the most severe heart anomaly. With Complete Heart block, the atria and ventricles pace independently, which can slow down the ventricular rate and eventually lead to fainting [22].

**Figure 8.** AV Blocks: (**a**) First-degree AV blocks, (**b**) Second-degree AV blocks type I, (**c**) Second-degree AV blocks type II, (**d**) Third-degree AV blocks [21].

#### **3. Automatic Heart Anomaly Detection: A State-of-the-Art**

#### *3.1. Automatic Heart Anomaly Detection*

The objective of detecting anomalies in ECG signals is to find irregular heart rates, heartbeats, and rhythms. To achieve this goal, an anomaly detection system must be able to locate all heartbeats in the sequence in order to obtain the essential metrics stated in Section 2. The system also examines the entire recording to detect any irregular rhythm segments, such as inconsistent R-R intervals and ectopic rhythms. An anomaly detection system is therefore composed of five different sub-systems: noise removal (Section 3.2), heartbeat detection (Section 3.3), heartbeat segmentation (Section 3.3), heartbeat classification (Section 3.4), and rhythm classification (Section 3.5).

A typical heartbeat anomaly detection system can be seen in Figure 9. The noise reduction process aims to minimize the effect on signal interpretation of noise introduced by the recording device or the patient's movement. Heartbeat detection aims to find the locations of the heartbeats in order to calculate the heart rate. Heartbeat segmentation extracts the entire heartbeat based on a known heartbeat location. Heartbeat classification checks for any abnormal heartbeat shape in the ECG signal. Irregular heart rhythm classification is similar to heartbeat classification, but instead of checking a single heartbeat shape, it checks a period of the ECG record. Pertinent research relating to these five sub-systems is introduced in the following sections.

**Figure 9.** Typical Heartbeat Anomaly Detection.
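The five-stage pipeline of Figure 9 can be sketched as composed functions. Every stage below is a deliberately trivial stub (the R-peak "detector" just picks the maximum sample) meant only to show how the sub-systems connect; Sections 3.2–3.5 survey the real algorithms that would replace each stub.

```python
def remove_noise(signal):
    return signal                       # stub: filtering or DWT (Section 3.2)

def detect_beats(signal):
    peak = max(signal)                  # stub R-peak picker (Section 3.3)
    return [i for i, v in enumerate(signal) if v == peak]

def segment_beats(signal, peaks, half_width=2):
    # extract a fixed window around each detected peak (Section 3.3)
    return [signal[max(0, p - half_width):p + half_width + 1] for p in peaks]

def classify_beats(beats):
    return ["normal" for _ in beats]    # stub per-beat label (Section 3.4)

def classify_rhythm(signal, peaks):
    return "normal sinus rhythm"        # stub whole-segment label (Section 3.5)

def detect_anomalies(raw_signal):
    clean = remove_noise(raw_signal)
    peaks = detect_beats(clean)
    beats = segment_beats(clean, peaks)
    return classify_beats(beats), classify_rhythm(clean, peaks)
```

Running `detect_anomalies` on a toy trace such as `[0, 1, 5, 1, 0]` yields one detected beat and its labels, illustrating the data flow from raw signal to per-beat and per-rhythm decisions.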

#### MIT-BIH Database

Before explaining the sub-systems of anomaly detection, the MIT-BIH Arrhythmia Database [24,25] needs to be described first, as it is widely used in ECG analysis research. It was the first generally available set of standard test materials for the evaluation of arrhythmia detectors. It contains 48 half-hour excerpts of two-channel ambulatory ECG recordings from 47 subjects. The ECG data were collected with Del Mar Avionics model 445 two-channel reel-to-reel Holter recorders. The database has annotation labels for 16 different heartbeat types and 15 different rhythm types. All the research selected in this review uses the MIT-BIH database, which allows us to test and compare the performance of the algorithms.

The annotation labels and the corresponding heartbeat types and rhythm types used in this database are listed below. The heartbeat types are:


The rhythm types are:


#### *3.2. Noise Removal*

ECG signals may be distorted by many artifacts that have nothing to do with heart function. ECG artifacts take various and uncertain forms, and some physiologic artifacts can mimic true dysrhythmias, leading to false diagnoses [26]. Therefore, noise removal is a necessary step for anomaly detection in ambulatory ECGs.

There are two main groups of artifacts: non-physiological and physiological. The first is caused by equipment problems, such as power-line interference; the other is caused by muscle activity, skin interference, or body motion, resulting in baseline wander, electromyogram noise, and motion artifacts. For example, baseline wander can significantly affect the measurement of the ST segment in an ECG signal [27]. Among all the artifacts, the motion artifact is the most challenging to remove, as its spectrum overlaps with the ECG signal [28]. Various ECG motion artifact examples are shown in Figures 10 and 11 [29].

**Figure 10.** ECG Artifact examples: (**a**) Baseline Wander, (**b**) Power line Interference, (**c**) Muscle Interference.

**Figure 11.** ECG Motion Artifact.

In this section, various noise removal algorithms in the research literature are categorized and compared. There are four conventional methods used for noise removal in ECG signals.

The first approach consists of using digital low-pass, high-pass, band-pass, and notch filters to remove the noise. Many studies, such as [30–35], use a combination of low-pass and high-pass filters to remove the corresponding noise in an ECG signal. The low-pass filter cut-off frequency is in the range of 11 Hz to 45 Hz and mainly suppresses high-frequency noise. The high-pass filter cut-off frequency is in the range of 1 Hz to 2.2 Hz and focuses on removing the baseline wander in the signal. In [30,32], notch filters in the 50 Hz to 60 Hz range are used for removing power-line interference. Band-pass filters with cut-off frequencies from 0.1 to 100 Hz are used in [36] to remove electronic noise. The advantage of using a fixed digital filter is that it is easy to implement and highly efficient.
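A minimal sketch of this fixed-filter chain is shown below using SciPy: a high-pass at ~1 Hz for baseline wander, a low-pass at ~40 Hz for high-frequency noise, and a 50 Hz notch for power-line interference. The specific cut-offs and filter orders are example values picked from the ranges cited above, not recommendations from any of the referenced studies.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

def clean_ecg(x, fs=360.0):
    """Apply the fixed-filter chain to ECG samples x at sampling rate fs (Hz)."""
    b_hp, a_hp = butter(2, 1.0, btype="highpass", fs=fs)   # baseline wander
    b_lp, a_lp = butter(4, 40.0, btype="lowpass", fs=fs)   # high-frequency noise
    b_n, a_n = iirnotch(50.0, Q=30.0, fs=fs)               # power-line hum
    x = filtfilt(b_hp, a_hp, x)   # filtfilt gives zero-phase filtering,
    x = filtfilt(b_lp, a_lp, x)   # avoiding waveform (e.g., ST) distortion
    return filtfilt(b_n, a_n, x)
```

Applied to a synthetic 5 Hz "ECG" corrupted by a 50 Hz hum, the chain removes the hum while leaving the underlying waveform essentially intact.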

The second approach is to use a discrete wavelet transform (DWT) to remove the noise components from a signal. Wavelet transform is a powerful method for analyzing non-stationary signals, such as ECGs [37]. The DWT noise removal method is used in [38–40]. This method decomposes the signal into the approximation and detail coefficients by using a wavelet function. The selection of the wavelet function in the wavelet transform is the most important task, which depends upon the type of signal [41]. The commonly used Mother Wavelet basis functions are Daubechies filters (Db), Symmlet filters (Sym), Coiflet filters (C), Battle-Lemarie filters (Bt), Beylkin filters (Bl), and Vaidyanathan filters (Vd) [42].

According to studies in [41–43], the Daubechies filters of order 4 and 8 (Figure 12), and the Symmlet filters of order 5 and 6 (Figure 13) are the best wavelet functions for ECG signal analysis due to their similar signal structure to the QRS complex. After decomposing the ECG signal, a threshold method is applied to the DWT coefficients. A clean ECG signal could be reconstructed from the thresholded DWT coefficients.
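The decompose/threshold/reconstruct idea can be shown with a toy single-level Haar transform in plain NumPy. Haar is used here purely for brevity; the studies above recommend db4/db8 or Symmlet 5/6 over several decomposition levels, and the threshold value is an arbitrary illustration.

```python
import numpy as np

def haar_denoise(x, threshold):
    """Single-level Haar DWT with soft-thresholded detail coefficients.

    x must have even length; threshold=0 gives perfect reconstruction.
    """
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-frequency coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-frequency coefficients
    # soft-threshold the detail coefficients to suppress small (noisy) terms
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    # inverse transform from the (possibly modified) coefficients
    y = np.empty_like(x)
    y[0::2] = (approx + detail) / np.sqrt(2)
    y[1::2] = (approx - detail) / np.sqrt(2)
    return y
```

With `threshold=0` the signal is reconstructed exactly, which verifies the transform pair; a large threshold zeroes all detail, leaving only the smoothed approximation.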

DWT relies on the choice of the wavelet basis [44]. The decomposition level of the DWT may also differ between data sets, so re-implementation is needed. An alternative to wavelet analysis is the empirical mode decomposition (EMD). The EMD is an adaptive and fully data-driven technique that obtains the oscillatory modes present in the data [44]. The EMD, similar to wavelet analysis, decomposes a time series signal into individual components without leaving the time domain. In EMD, the high-frequency components are called the intrinsic mode functions (IMFs), and the low-frequency part is called the residual. The procedure can be applied to the residual iteratively until no more IMFs can be extracted. The IMFs must satisfy two conditions: (1) the number of extrema and the number of zero crossings must be equal or differ at most by one; and (2) at every point, the mean of the envelopes defined by the local maxima and the local minima must be zero.


After decomposition using EMD, a set of IMFs and one residual signal are obtained. Let *ci*(*t*) be the IMFs; we have *c*1(*t*) to *cn*(*t*), ordered from higher-frequency to lower-frequency components. Digital filters or thresholds can then be applied to the IMFs that contain the noise. After processing, the signal can be reconstructed using the following equation:

$$x(t) = \sum\_{i=1}^{n} c\_i(t) + r(t) \tag{1}$$

where *x*(*t*) is the reconstructed signal, *ci*(*t*) are the IMFs, and *r*(*t*) is the residual signal.
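A toy illustration of Equation (1): the IMFs and residual below are fabricated (a real EMD routine would produce them from the data), but they show how dropping a noisy high-frequency IMF before summation denoises the signal:

```python
import numpy as np

t = np.linspace(0, 1, 200)
imfs = [np.sin(2 * np.pi * 25 * t),      # c_1: high-frequency mode (noise)
        np.sin(2 * np.pi * 5 * t)]       # c_2: lower-frequency mode
residual = 0.3 * t                        # r(t): slow trend (baseline wander)

# Equation (1): x(t) = sum of c_i(t) plus r(t)
x = sum(imfs) + residual

# Denoising = discard or filter the noisy IMFs before summing, e.g. drop c_1:
denoised = sum(imfs[1:]) + residual
```
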

In [45–47], the authors performed an EMD on the MIT-BIH database to suppress the high-frequency noise and the baseline wander. Ensemble empirical mode decomposition (EEMD) [48] fixed the EMD shortcoming of mode mixing. Mode mixing can cause serious aliasing in the time-frequency distribution and also makes the physical meaning of an individual IMF unclear. The EEMD adds one extra step compared to the EMD: white noise is added to the original signal before it is decomposed into IMFs, and the decomposition is averaged over an ensemble of such noise-perturbed trials. Many noise removal works use the EEMD, such as [49–51].

The previous approaches work well when the noise lies in a fixed frequency range. However, there are cases where they can fail. The first is motion wander removal. Raimon Jane et al. stated in [27] that the motion wander frequency may not always be below 0.05 Hz; it can depend on the heart rate frequency, which can be below 0.8 Hz. Also, a fixed digital filter can introduce nonlinear phase distortion and key point displacement [52]. Nor can these approaches remove the motion artifact from the ECG signal, as its spectrum completely overlaps with that of the ECG. Therefore, many approaches use adaptive filtering to solve this problem.

In 1991, Thakor et al. [28] introduced the least mean squares (LMS) adaptive recurrent filter (ARF) to reduce the baseline wander, 60 Hz power-line noise, muscle noise, and motion wander. In their research, two adaptive filter structures were proposed. The first has the primary input *s*1 + *n*1, while the reference input is a noise *n*2 that can be recorded from another sensor and is correlated with *n*1. In the second, the ECG is recorded from several electrode leads: the primary input is *s*1 + *n*1 from one of the leads, and the reference input is *s*2 from another lead that is noise-free. In both cases, the signal *s*1 can be extracted by recursively minimizing the mean squared error (MSE). The MSE can be calculated as:

$$E[\epsilon^2] = E[(s\_1 - y)^2] + E[n\_1^2].\tag{2}$$

The LMS algorithm is used to minimize the MSE and can be written as:

$$W\_{k+1} = W\_k + 2\mu\epsilon\_k X\_k \tag{3}$$

where *Wk* is the set of filter weights at time *k*, *Xk* is the input vector at time *k* of the samples from the reference signal, *εk* = primary input *dk* − filter output *yk*, and the parameter *μ* is empirically selected to produce convergence at a desired rate. The error *εk* is calculated as:

$$\epsilon\_k = d\_k - y\_k \tag{4}$$

where *dk* is the desired primary input from the ECG to be filtered, and *yk* is the filter output that is the best least squares estimate of *dk*.
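A minimal LMS noise canceller implementing Equations (3) and (4); the filter order, step size, and the sinusoidal stand-ins for the ECG and the power-line noise are illustrative assumptions:

```python
import numpy as np

def lms_filter(primary, reference, order=8, mu=0.01):
    """LMS adaptive noise canceller (Equations (3) and (4)).

    primary:   d_k, the signal plus noise
    reference: noise correlated with the noise in the primary input
    Returns the error e_k, which converges to the cleaned signal.
    """
    w = np.zeros(order)
    error = np.zeros(len(primary))
    for k in range(order, len(primary)):
        x_k = reference[k - order:k][::-1]    # X_k: recent reference samples
        y_k = w @ x_k                         # adaptive filter output
        e_k = primary[k] - y_k                # Equation (4)
        w = w + 2 * mu * e_k * x_k            # Equation (3)
        error[k] = e_k
    return error

# Toy data: a slow "ECG-like" component plus 60 Hz power-line noise.
fs = 360.0
t = np.arange(0, 4, 1 / fs)
signal = np.sin(2 * np.pi * 1.2 * t)
noisy = signal + 0.8 * np.sin(2 * np.pi * 60 * t)
cleaned = lms_filter(noisy, np.sin(2 * np.pi * 60 * t))
```

After the initial convergence transient, the error output tracks the signal while the 60 Hz component is cancelled.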

As LMS adaptive filters are sensitive to the scaling of their input, a power-normalized least mean squares algorithm has been introduced to solve this problem [53]. Another conventional adaptive filter is the recursive least squares (RLS) adaptive filter. The RLS algorithm recursively finds the filter coefficients that minimize a weighted linear least-squares cost function of the input signals. It is known for its excellent performance in time-varying environments, but at the cost of increased computational complexity and some stability problems [54]. The algorithm updates the filter weight vector using the following equations:

$$w(n) = w(n-1) + k(n)e\_{n-1}(n),\tag{5}$$

$$k(n) = u(n)/(\lambda + x^T(n)u(n)),\tag{6}$$

$$u(n) = \Phi\_{\lambda}^{-1}(n-1)x(n),\tag{7}$$

where *w*(*n*) is the weight vector at iteration *n*, *x*(*n*) is the input vector, *k*(*n*) is the gain vector, Φ*λ*(*n* − 1) is the exponentially weighted autocorrelation matrix of the input, and *λ* is the forgetting factor, a positive constant close to but smaller than 1.

The filter output *y*<sub>*n*−1</sub>(*n*) and the error signal *e*<sub>*n*−1</sub>(*n*) are calculated using the filter tap weights of the previous iteration and the current input vector, as in the following equations:

$$y\_{n-1}(n) = w^T(n-1)x(n),\tag{8}$$

$$e\_{n-1}(n) = d(n) - y\_{n-1}(n).\tag{9}$$
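A compact RLS sketch following Equations (5)–(9); the inverse-correlation-matrix update and the initialization constant `delta` are standard textbook choices rather than details from [54], and the test signals are the same illustrative sinusoids as before:

```python
import numpy as np

def rls_filter(primary, reference, order=4, lam=0.99, delta=100.0):
    """RLS adaptive noise canceller following Equations (5)-(9).

    lam is the forgetting factor; P tracks the inverse of the
    exponentially weighted autocorrelation matrix of the reference.
    """
    w = np.zeros(order)
    P = delta * np.eye(order)                 # common initialization
    error = np.zeros(len(primary))
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]
        u = P @ x                             # Equation (7)
        k = u / (lam + x @ u)                 # Equation (6): gain vector
        y = w @ x                             # Equation (8): a-priori output
        e = primary[n] - y                    # Equation (9): a-priori error
        w = w + k * e                         # Equation (5)
        P = (P - np.outer(k, u)) / lam        # inverse-matrix (Riccati) update
        error[n] = e
    return error

fs = 360.0
t = np.arange(0, 2, 1 / fs)
sig = np.sin(2 * np.pi * 1.2 * t)
noisy = sig + 0.8 * np.sin(2 * np.pi * 60 * t + 0.5)   # phase-shifted noise
cleaned = rls_filter(noisy, np.sin(2 * np.pi * 60 * t))
```

Compared with the LMS sketch above, the extra matrix bookkeeping buys much faster convergence, which is why RLS suits time-varying environments.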

An adaptive filtering approach can remove baseline wander, motion artifacts, power-line interference, and muscle noise; however, it requires a reference input that is correlated with the noise in the primary input, and a clean reference ECG signal is very difficult to acquire. Due to this added complexity in data collection, many studies have considered using an accelerometer as the reference noise signal for the adaptive filter. For example, in [55], Raya et al. explored the possibility of using both a single-axis and a dual-axis accelerometer signal as the noise reference input to a least mean squares (LMS) adaptive filter and a recursive least squares (RLS) adaptive filter. The RLS adaptive filter outperformed the LMS adaptive filter, and the single-axis accelerometer signal showed better results than the dual-axis signal. The authors believed that the use of a one-axis reference input, particularly the y-axis, was sufficient to minimize the noise.

#### *3.3. Heartbeat Detection and Segmentation*

Heartbeat detection is often related to the detection of an irregular heart rate and inconsistent RR-intervals, which are explained in Section 2. Heartbeat detection is also the key step in extracting the heartbeats from the ECG signal to be used for classification. Heartbeat detection consists of three main parts: P wave detection, QRS complex detection, and T wave detection. Therefore, it is usually related to heartbeat segmentation, which usually means segmenting a heartbeat from the start point (onset) of its P wave to the end point (offset) of its T wave.

However, the P wave and T wave may not be detectable in certain types of abnormal heartbeats, whereas the QRS complex is the most prominent waveform. Thus, the location of the QRS complex is often used to locate the origin of the heartbeat; see Figure 2. Many studies detect the R peak location in the QRS complex.

The Pan–Tompkins algorithm [56] is one of the earliest and most popular algorithms to be implemented (Figure 14). It is widely used in many applications due to its robustness and computational efficiency. The algorithm uses a filter bank that consists of band-pass filters, a differentiator, a squaring filter, and a moving window integrator to reduce the signal noise so that only R wave information is present. Inspired by the Pan–Tompkins algorithm, many researchers, such as [57–60], developed their own filter banks to improve the accuracy of the detection. To reduce false positive detections, [58,60] used a predefined amplitude threshold, while [59,60] used a predefined RR interval length threshold.
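The derivative–squaring–integration core of the Pan–Tompkins pipeline can be sketched as follows; the band-pass stage is omitted, and the fixed threshold with a 200 ms refractory period is a simplification of the algorithm's adaptive thresholds:

```python
import numpy as np

def pan_tompkins_stages(ecg, fs=360):
    """Derivative, squaring, and moving-window integration stages."""
    deriv = np.diff(ecg, prepend=ecg[0])      # emphasize steep QRS slopes
    squared = deriv ** 2                      # positive values, large slopes boosted
    window = int(0.150 * fs)                  # ~150 ms integration window
    return np.convolve(squared, np.ones(window) / window, mode="same")

def detect_r_peaks(feature, fs=360, thresh_ratio=0.5):
    """Naive fixed threshold + refractory period (simplified)."""
    thresh = thresh_ratio * feature.max()
    refractory = int(0.200 * fs)              # ignore peaks within 200 ms
    peaks, last = [], -refractory
    for i in range(1, len(feature) - 1):
        if (feature[i] > thresh and feature[i] >= feature[i - 1]
                and feature[i] > feature[i + 1] and i - last > refractory):
            peaks.append(i)
            last = i
    return peaks

# Synthetic record: sharp "QRS" spikes every 2 s at 360 Hz.
ecg = np.zeros(3600)
true_peaks = [360, 1080, 1800, 2520, 3240]
for p in true_peaks:
    ecg[p - 2:p + 3] = [0.2, 0.6, 1.0, 0.6, 0.2]
peaks = detect_r_peaks(pan_tompkins_stages(ecg))
```
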

**Figure 14.** The Pan–Tompkins Algorithm.

Zidelmal et al. introduced a QRS detection method based on wavelet decomposition [61]. In the algorithm, the authors decomposed the raw ECG signal using a discrete wavelet transform, then reconstructed the signal by selecting only the sub-signals that contained ECG information. To detect the QRS complex, a threshold was set to select the peaks that have a large amplitude. Similar works have been done in [62].

Manikandan et al. introduced a new algorithm that uses a Shannon energy envelope and Hilbert transform (SEEHT, Figure 15) to detect the QRS complex location [63]. In the preprocessing stage of their algorithm, a band-pass filter is applied to the raw ECG signal to remove the baseline wander and high-frequency noise. After that, a differentiator and a normalizer are applied to the clean signal to highlight the QRS complex components. The Shannon energy of the processed signal is calculated using the following equation:

$$s[n] = -d^2[n] \log(d^2[n]),\tag{10}$$

where d[n] is the processed signal. The calculated Shannon energy sequence is then processed by a zero-phase filter to preserve the sharp peaks around the QRS complex and smooth out the noisy peaks. In the peak finding algorithm, a Hilbert transform is applied on all the candidate R peaks to obtain the R wave envelope. In each R wave envelope, the zero-crossing locations indicate an R peak.
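Equation (10) in code, with a small epsilon guard against log(0) (an implementation detail assumed here, not specified in [63]):

```python
import numpy as np

def shannon_energy(d, eps=1e-12):
    """Equation (10): s[n] = -d^2[n] * log(d^2[n]).

    d is expected to be normalized to [-1, 1]; eps avoids log(0).
    """
    d2 = np.clip(np.asarray(d, dtype=float) ** 2, eps, 1.0)
    return -d2 * np.log(d2)
```

For a normalized signal, −*d*² log *d*² peaks at |*d*| = e^(−1/2) ≈ 0.61 and vanishes at *d* = 0 and |*d*| = 1, so the medium-amplitude QRS slopes are emphasized over both low-level noise and saturated extremes.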

Inspired by SEEHT, [64] introduced an R-peak detection method based on peaks of the Shannon energy envelope (PSEE) that improves on the computational inefficiency of the Hilbert transform by using both predefined amplitude thresholds and predefined RR interval length thresholds. An improved R-peak detection method based on the Shannon energy envelope (ISEE) [65] further improved the SEEHT and PSEE algorithms by using a filter bank consisting of a moving average filter, a differentiator, a normalizer, and a squaring filter to eliminate the noisy peaks. The filter bank's computational cost is lower than that of the Hilbert transform, and the method does not use a predefined threshold. Most recently, Park combined the discrete wavelet transform and ISEE to detect R peaks in ECG signals [66].

As explained previously, the P and T waves represent important information and the heartbeat segmentation depends on the P and T wave detection. Therefore, a good detection of the P and T waves is critical for diagnosis. Pal and Mitra proposed an algorithm that could detect the PQRST peak points [67]. The algorithm is based on discrete wavelet decomposition. It reconstructs the signal from selected wavelet coefficients, which are related to peaks such as: R, QS, and PT. For example, when the algorithm is detecting the R peak, a signal is reconstructed with d3, d4, and d5 coefficients, and this preserves the information for the R peaks but diminishes the other peaks.

**Figure 15.** SEEHT R-peak detection algorithm.

A few years later, Banerjee also developed a T wave and QRS complex detection algorithm based on discrete wavelet decomposition and adaptive thresholding [68]. Karimipour used the discrete wavelet transform and adaptive thresholding to detect the QRS complex location and to estimate the P-wave and T-wave locations [69]. In practice, many studies, such as [31,70], used the 'ecgpuwave' detector from PhysioNet for heartbeat segmentation [25]. However, because P and T wave detection works well for normal heartbeats but not for many abnormal heartbeat types, many researchers choose manual annotation, such as [71], or a fixed window, such as [32,33,38,71,72], for their heartbeat segmentation.

In Table 1, we compare the performance of some of the heartbeat detection algorithms that have been tested on the MIT-BIH Arrhythmia database [24].

The metrics used to compare each algorithm are calculated as follows:
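The exact metric definitions are not reproduced here; the set commonly used for beat-detector comparisons on MIT-BIH (and assumed for Table 1) is sensitivity, positive predictivity, and detection error rate:

```python
def detection_metrics(tp, fp, fn):
    """Standard beat-detection metrics (assumed definitions):

    Se  = TP / (TP + FN)       fraction of true beats found
    +P  = TP / (TP + FP)       fraction of detections that are true beats
    DER = (FP + FN) / (TP + FN)  errors per actual beat
    """
    se = tp / (tp + fn)
    ppv = tp / (tp + fp)
    der = (fp + fn) / (tp + fn)
    return se, ppv, der
```
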



**Table 1.** The heartbeat detection performance comparison using the MIT-BIH data set.

#### *3.4. Irregular Heartbeat Classification*

Irregular heartbeat classification focuses on the shape of the heartbeats and aims at classifying the type of a single heartbeat. As discussed previously, the heartbeat shape may vary when the heartbeat starts from an ectopic location. For example, a premature heartbeat may have a missing P wave. The abnormal shape of a heartbeat may indicate potential heart disease. By classifying and annotating the types of all the heartbeats on the ECG, one can easily notice the frequency of anomalies that happen in the heart and make an appropriate diagnosis and treatment. Heartbeat classification consists of two main parts: feature extraction and model training.

#### 3.4.1. Feature Extraction

The feature extraction step converts the raw ECG signal into machine-readable information. Based on the existing research, there are two common types of features: morphological features and derived features. The morphological features describe the heartbeats based on observations of the signal itself. Many morphological features (see Table 2) have been used in various studies.

Other, derived features are calculated from the ECG signal. Many different methods have been proposed in the literature:






Vectorcardiography (VCG) is one of the ECG analysis tools. It displays the various complexes of the ECG. It provides the possibility to use vector analysis on the cardiac electric potentials [80].

Discrete Wavelet Transform (DWT) decomposes the signal into many sub-signals (detail coefficients) with different frequency ranges, as described in Section 3.2. Not only can the DWT method be used to remove unwanted noise, it can also provide features for the heartbeats, as the heartbeat waves are much clearer in specific detail coefficients, such as D4 and D5. Therefore, much research, such as [38,71], uses features from the detail coefficients to classify the heartbeat.

The conventional DWT technique lacks the property of shift-invariance due to the downsampling operations at each stage of the DWT implementation. Hence, the energy of the wavelet coefficients changes significantly for a small time shift in the input pattern. The dual-tree complex wavelet transform (DTCWT) [81] is a simple technique that overcomes these DWT shortcomings. The DTCWT uses two sets of filters: one for the level 1 decomposition, and the other for the higher levels. In the first-level decomposition, the original signal is decomposed into two trees, each containing two sub-band signals. One tree can be interpreted as the real part of a complex wavelet, and the other tree as the imaginary part. For each tree, the conventional DWT is applied for further decomposition [32]. The DTCWT method was used by Thomas to extract heartbeat features to classify the heartbeat type [32].

Similar to the DWT and DTCWT, the ICA, PCA, and EMD/EEMD also decompose the signal into many sub-signals. The difference is that ICA and PCA aim to reduce the input size and thereby the computation time. The EMD/EEMD, as explained in Section 3.2, does not require knowledge of the decomposition level or the basis function that is needed in the DWT. The ICA method has been used in [71] to produce independent components as part of the heartbeat feature set. The PCA method used in [82] reduces the input size for higher efficiency. Rajesh et al. computed the heartbeat features from IMFs by applying the EMD/EEMD method to the ECG signal.

Eigenvector methods are used for estimating the frequencies and powers of signals from noise-corrupted measurements. These methods are based on an eigendecomposition of the correlation matrix of the noise-corrupted signal [83]. In [83], Ubeyli et al. used three kinds of eigenvector methods to generate the feature set: Pisarenko, Multiple Signal Classification (MUSIC), and Minimum-Norm. The Pisarenko method is particularly useful for estimating a PSD that contains sharp peaks at the expected frequencies. The MUSIC method is a noise subspace frequency estimator that can eliminate the effects of spurious zeros on the noise subspace. The Minimum-Norm method aims to differentiate spurious zeros from real zeros, and it uses a linear combination of all noise subspace eigenvectors.

Dynamic time warping (DTW) measures the similarity between two heartbeat segments by computing the distance between them. Therefore, if we let one of the heartbeat segments be the sample heartbeat of a specific type and the other be the test heartbeat, the distance indicates the similarity between the test heartbeat and the sample heartbeat. This similarity score can be used as a feature that represents the heartbeat, as in the work of [74,76]. Details of the features of each reviewed method can be seen in Table 3.
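A straightforward O(nm) DTW distance, as typically used for variable-length heartbeat comparison (the textbook formulation, not necessarily the exact variant of [74,76]):

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of
    possibly different lengths, via the classic DP recurrence."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]
```

Unlike the Euclidean distance, DTW can align a beat against a slightly stretched or shifted template, which is why it suits variable-length heartbeats.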





#### 3.4.2. Model Training

Once the feature vectors are extracted from the raw ECG signal, then they can be used by a model for training and classification. There are several methods that have been proven to be valid for identifying heartbeat types. They are clustering, traditional machine learning classification, and deep learning classification.

Clustering aims to find the similarity between two groups (heartbeat segments) by computing the distance between them. The conventional distances for ECG signals are the Euclidean distance and the dynamic time warping distance [84].

The Euclidean distance is the most common distance for comparing two groups with the same dimensions. An example of using the Euclidean distance for abnormal heartbeat detection can be found in Chuah and Fu's work [76]. They introduced an adaptive window discord discovery (AWDD) algorithm to detect anomalies in ECG recordings, developed from the brute force discord discovery (BFDD) algorithm [85]. The algorithm finds abnormal heartbeat candidates by selecting the largest Euclidean distance when comparing the heartbeats to each other. They also set a threshold on the Euclidean distance to reduce the false alarm rate. Note that the Euclidean distance only works when both heartbeat segments have the same length.
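The brute-force discord idea behind BFDD/AWDD can be sketched as follows (window extraction, thresholding, and the pruning tricks of the real algorithms are omitted):

```python
def brute_force_discord(beats):
    """Return the index and score of the discord: the beat whose
    nearest-neighbor Euclidean distance to all other beats is largest.
    All beats must share one length."""
    def euclidean(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    best_idx, best_dist = -1, -1.0
    for i, beat in enumerate(beats):
        nearest = min(euclidean(beat, other)
                      for j, other in enumerate(beats) if j != i)
        if nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist

beats = [[0, 1, 0], [0, 1, 0], [0, 1, 0.1], [1, 0, 1]]   # last beat is the discord
idx, dist = brute_force_discord(beats)
```
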

K-means clustering is a popular clustering method that builds on the Euclidean distance; it groups the heartbeat segments into several clusters. Veeravalli et al. developed an algorithm for real-time and personalized anomaly detection with wearable health care ECG devices [86]. The K-means clustering algorithm is used to cluster all the heartbeat classes. To avoid calibrating the technique for individual users, they assigned the most frequent heartbeat segments as the normal heartbeat segments. The authors tested their algorithm on the MIT-BIH database and the European ST-T database, achieving 97.1% sensitivity and 99.5% specificity.
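The calibration-free rule of taking the most populous cluster as "normal" can be sketched with a tiny K-means (the feature vectors, k, and iteration count are illustrative assumptions):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal K-means; returns a cluster label per feature vector."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):           # skip empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def flag_anomalies(X, k=2):
    """Most populous cluster is taken as 'normal'; the rest is flagged."""
    labels = kmeans(X, k)
    normal = np.bincount(labels).argmax()
    return labels != normal

# Toy feature vectors: four similar "normal" beats and one outlier.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.05, 0.05], [5.0, 5.0]])
flags = flag_anomalies(X)
```
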

Sivarake and Ratanamahatana proposed a robust and accurate anomaly detection algorithm (RAAD) that reduces the false alarm rate in ECG anomaly detection [34]. They extracted heartbeat morphological features as their input feature vectors and calculated the dynamic time warping distance to measure the similarity between two variable-length heartbeats. In their experiments on INCARTDB01-05 [25], the MIT-BIH arrhythmia database [24,25], and the MIT-BIH long-term database [25], the algorithm achieved 94.35% accuracy and a 0% false alarm rate overall.

Another major approach is the traditional machine learning classification algorithms: Kth nearest neighbor (KNN), linear discriminant analysis (LDA), quadratic discriminant functions (QDF), support vector machine (SVM), and multilayer perceptron neural network (MLPNN). These algorithms build a mathematical model based on the provided training data; the trained model correlates the input data with its corresponding label. Much research can be found in this field.

Ivaylo Christov et al. used both ECG morphology features and VCG features to represent the heartbeat, and then trained the feature vectors and their labels with a Kth nearest neighbor classifier. The classification performance on both feature sets is over 96% for five heartbeat types (N, PVC, LBBB, RBBB, and P) [30].

Philip de Chazal et al. used linear discriminant analysis as the classification algorithm, with ECG morphology features as the input feature vectors. This algorithm achieved around 97% accuracy on the MIT-BIH database for five-type (N, S, V, F, and Q) heartbeat classification [31].

Mariano Llamedo et al. validated a heartbeat classification method for Normal, Supra-ventricular, and Ventricular heartbeats based on ECG interval features, morphological features, and DWT features [38]. The feature vectors are trained with quadratic discriminant functions. The model had a 94% overall classification accuracy on the test dataset.

Li et al. used the concept of transductive transfer learning to detect abnormal instances in an ECG signal. They trained a model to learn from a labeled data set to detect irregular heartbeats, and then used a kernel mean matching (KMM) algorithm [87] to enable knowledge transfer between the labeled data set and an unlabeled data set. The model they used was a weighted transductive one-class support vector machine, which can handle an imbalanced data set [78]. The authors performed experiments on records 100, 101, 103, 105, 109, 115, 121, 210, 215, and 232 from the MIT-BIH database and achieved an 87.89% average accuracy.

Ye et al. classified 16 heartbeat types using both morphological and dynamic features of ECG signals, trained with a support vector machine. The two channels of the ECG signal in the database were trained separately, generating two models, and both models were used in the final classification. The authors introduced two ways of making the final decision: one is rejection, which requires both models to make the same decision, and the other is Bayesian, which is based on the fusion of both models' results [71]. The experimental results of this research are compared in Table 4.

Zhang et al. built 46-dimensional feature vectors to represent the heartbeat and classify abnormal heartbeat shapes on the MIT-BIH database [74]. In the study, the authors applied the ecgpuwave tool from PhysioNet [25] to detect the boundaries of the P wave, QRS complex, and ST waves. They then collected five types of features: five inter-heartbeat intervals, five intra-heartbeat intervals, 29 morphological amplitudes, six morphological areas, and one morphological distance, which together form a feature vector of 46 morphological features. In the classification step, the authors used a support vector machine to learn the patterns of the feature vectors. Additionally, a support vector machine model was trained for each of the two ECG channels, and the results of both models are considered in the final classification. The results of the paper show that the algorithm has nearly 90% accuracy for four-type (N, F, V, and S) heartbeat classification.

Thomas et al. introduced an automatic ECG arrhythmia classification idea using dual-tree complex wavelet-based features to detect normal, paced, RBBB, LBBB, and PVC heartbeats. The authors proposed a feature extraction technique based on a dual-tree complex wavelet transform (DTCWT) technique. Then the feature vectors were input to a multilayer perceptron neural network for abnormal heartbeat detection [32]. The experimental results of this research are compared in Table 4.

Kandala Rajesh et al. used ensemble empirical mode decomposition (EEMD) features to classify normal, PVC, PAC, LBBB, and RBBB heartbeats. For the classification tool, a sequential minimal optimization (SMO) SVM was used to train and classify the different heartbeat types [33]. The experimental results of this research are compared in Table 4.

Wess et al. implemented a multi-layer perceptron (MLP) classifier to detect anomalies in ECG signals. To reduce the size of the feature vectors, the authors applied PCA to the extracted heartbeats. The processed feature vectors were then used as inputs to train an MLP neural network, and the trained model can be used for classifying the anomalies in the ECG signal [82]. The authors tested their model on the MIT-BIH database with an overall accuracy of 99.82%.

Most researchers have used such traditional methods to solve the problem. Traditional machine learning classification methods do not require a considerable amount of training data, and they do not need much computational power. Recently, owing to the development of GPUs, deep learning has proven to be reliable and fast for classification problems. Compared to traditional algorithms, deep learning does not require cardiology experts to extract features, since the network can extract the features automatically. Instead, a deep learning model needs a large amount of labeled data for training. Luckily, public data sets can easily be found on the Internet. Therefore, many studies using various deep learning architectures have published new algorithms to classify heartbeats.

The Ubeyli algorithm uses eigenvector features as the feature vectors and a recurrent neural network as the classification tool. In the experiment, normal, congestive heart failure, VT, and AFIB rhythms were trained and tested [83]. The experimental results of this research are compared in Table 4.


**Table 4.** Heartbeat classification performance on the MIT-BIH dataset.

Chauhan and Vig developed a predictive algorithm that can detect normal, PVC, PAC, and paced heartbeats via a deep LSTM (long short-term memory) neural network (Figure 16). In their algorithm, the feature extraction/selection step is omitted; raw ECG data and the corresponding labels are used as inputs to a stacked (two-layer) LSTM neural network. In the experiment, they split the MIT-BIH database into four sets: a non-anomalous training set (*SN*), a non-anomalous validation set (*VN*), a validation set mixing abnormal and normal heartbeats (*SN+A*), and a test set (*tN+A*). The LSTM network was trained on *SN* and used *VN* for early stopping. The trained LSTM network was then applied to *SN+A* to find the threshold for detecting abnormal heartbeats. Finally, the chosen threshold was used on *tN+A* to discriminate regular from anomalous heartbeats while predicting [79]. The presented model achieved a 97.5% precision with a 46.47% recall on the test set (*tN+A*).
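The threshold-selection step on the mixed validation set can be sketched independently of the LSTM itself: given an anomaly score per heartbeat (e.g., the network's prediction error) and the labels, pick the threshold that maximizes an F-score. The scoring function and the F-beta criterion are assumptions for illustration, not details from [79]:

```python
def best_threshold(scores, labels, beta=1.0):
    """Pick the anomaly-score threshold maximizing the F-beta score on a
    labeled validation set; labels are 1 for anomalous, 0 for normal."""
    best_t, best_f = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue
        prec, rec = tp / (tp + fp), tp / (tp + fn)
        f = (1 + beta ** 2) * prec * rec / (beta ** 2 * prec + rec)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```
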

Kiranyaz et al. presented a fast and accurate patient-specific ECG classification and monitoring system. In their experimental setup, they picked five heartbeat types, N, V, S, F, and Q, from 20 ECG records (100–124) of the MIT-BIH database as the training samples. The raw heartbeat segments were submitted to a 1-D adaptive convolutional neural network (CNN) for pattern recognition; the 1-D CNN acted as both a feature extraction tool and a classification tool. The classification times for this model are 0.58 and 0.74 ms for 64- and 128-sample heartbeat resolutions, respectively. The speed is more than 1000x faster than the real-time requirement [36]. The experimental results of this research are compared in Table 4.

**Figure 16.** Long short-term memory layers.

Sahoo et al. improved Rai's algorithm [39] by using a multi-resolution wavelet transform and machine learning to detect normal, LBBB, RBBB, and paced heartbeats [75]. The authors used the Q-peak, R-peak, S-peak, T-peak, QR-interval, ST-interval, RR-interval, and QRS duration as the input feature vector, and used an MLP and an SVM classifier as the classification tools. In their experimental results on the MIT-BIH database [24], the overall classification accuracy for normal, LBBB, RBBB, and paced heartbeats was 96.67% for the SVM classifier and 98.39% for the MLP classifier.

In addition to training with a public data set, some researchers used a patient-specific approach to train the model. The first step of a patient-specific approach is to train an initial classifier with the public data set. The second step requires a local cardiologist to review and correct the labels produced by the initial classifier. The final step consists of retraining the initial classifier with the corrected labels to produce the final classifier for this specific patient. The patient-specific approach can eliminate the inter-patient variations of ECG signals; Biel et al.'s research shows that the variance between different human heartbeats can be very high [88]. Many research works [31,89–93] have shown that, by using a patient-specific model, detection algorithms achieve a higher accuracy than traditional systems in practical cases.

#### *3.5. Irregular Rhythm Classification*

Unlike irregular heartbeat classification, rhythm classification focuses on finding abnormal rhythms among normal rhythms. To find a rhythm anomaly, an algorithm needs to process more than one heartbeat.

Ge et al. [94] used an auto-regressive (AR) modeling technique to classify the normal, PAC, PVC, SVT, VT, and VF rhythms. The algorithm uses Burg's algorithm to compute the AR coefficients X. In their paper, the authors attempted two ways to classify the AR coefficients X: a generalized linear model (GLM) and a multi-layer feed-forward neural network. The GLM equation is:

$$Y = X\beta + \varepsilon,\tag{11}$$

where *Y* = [*y*1, *y*2, ..., *yN*] is an N-dimensional vector of the observed responses, X is the N × P matrix of the AR coefficients, *β* is a P-dimensional vector, and *ε* is an N-dimensional error vector. The GLM outputs, *y*1 to *yN*, are compared with predefined conditions to classify the various rhythm types. An artificial neural network with the AR coefficients as inputs was also used for training and classification. Their experimental results show that the artificial neural network performs better than the GLM.
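Fitting the GLM of Equation (11) reduces to ordinary least squares; the sketch below uses fabricated AR coefficients and a fabricated response vector rather than Burg-estimated ones:

```python
import numpy as np

# Toy GLM fit for Equation (11): Y = X beta + epsilon.
rng = np.random.default_rng(1)
N, P = 40, 4                                   # N rhythm segments, P AR coefficients
X = rng.standard_normal((N, P))                # stand-in AR coefficient matrix
true_beta = np.array([0.5, -1.0, 2.0, 0.1])
Y = X @ true_beta + 0.01 * rng.standard_normal(N)   # epsilon: small error term

# Least-squares estimate of beta; the fitted outputs X @ beta_hat are then
# compared against predefined conditions to assign rhythm classes.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```
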

Ozbay et al. integrated type-2 fuzzy clustering and the discrete wavelet transform to build a neural network-based ECG classifier that detects Normal, Br, VT, SA, PAC, P, RBBB, LBBB, AF, and AFI rhythms [95]. The proposed diagnostic algorithm can distinguish 10 different rhythm types. The system was formed by combining fuzzy clustering layers, feature extraction layers, and a final classifier layer. The fuzzy clustering layer selects segments that represent the arrhythmia classes in the ECG, and a wavelet transform is applied to the ECG segments to generate features. The authors trained three Type-2 Fuzzy Clustering Wavelet Neural Network models (T2FCWNN-1, T2FCWNN-2, and T2FCWNN-3) with three different training data sets containing the same number of ECG segments but segment lengths of 101, 52, and 27 sample points, respectively. As a result, T2FCWNN-3 had the lowest training time (4.86 s) and the lowest test error rate (0.23%) among the three models.

Patel et al. used a thresholding technique to detect arrhythmias on ECGs collected from a mobile platform [35]. In the paper, they first used the Pan–Tompkins algorithm [56] to detect the R peaks in the ECG recordings. They then characterized SB, ST, PVC, PAC, and sleep apnea using predefined thresholds to classify the different rhythms. Their system achieved a 97.3% detection accuracy.

Rajpurkar et al. developed an algorithm that can outperform a board-certified cardiologist in the detection of 12 types of arrhythmia using a 34-layer CNN [96]. The network takes a 30 s raw ECG recording as input and outputs a sequence of label predictions, producing a new prediction every second. The training data set contained 64,121 ECG records from 29,163 patients, and the testing data set contained 336 records from 328 patients. The model performed with 80.9% precision, 82.7% sensitivity, and a 0.809 F1 score.

Acharya et al. used two 11-layer CNNs to distinguish AFIB, AFL, and VF (VFL) from normal heartbeat rhythms [40]. The two networks, Net A and Net B, took a 2-second and a 5-second raw ECG recording as input, respectively, and output the corresponding label. No wave detection was performed on the input data; before being fed to the 1-D deep CNN, the ECG segments were Z-score normalized. The results of Net A and Net B are compared in Table 5.
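Z-score normalization of a segment, as applied before the 1-D CNN, is straightforward; this is a generic sketch rather than the authors' code, with a guard for flat segments added as our own assumption:

```python
import numpy as np

def zscore_segment(segment):
    """Z-score normalize an ECG segment to zero mean and unit variance,
    so that amplitude differences between recordings do not bias the CNN."""
    x = np.asarray(segment, dtype=float)
    sd = x.std()
    if sd == 0:                 # flat segment: only center it
        return x - x.mean()
    return (x - x.mean()) / sd
```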


#### *3.6. Heartbeat/Rhythm Classification Algorithm Comparison*

In the previous sections, we reviewed many algorithms that classify ECGs into various categories. Table 4 summarizes the classification results obtained on the MIT-BIH database. In addition, some algorithms' performance metrics were converted to binary classification, which distinguishes normal from abnormal heartbeats. The reason is that computer diagnoses are not 100% accurate: doctors still need to make the final diagnosis, as they are the only ones who know the clinical context. The methods should therefore focus on binary classification, which treats all abnormal heartbeats as one class. The terms used in the table are explained:


Similarly, Table 5 compares all methods that classify rhythms on the MIT-BIH database. The table only includes the algorithms that provided enough information to compute our metrics.

#### **4. Discussion**

#### *4.1. Challenges for Heart Anomaly Detection with Ambulatory Electrocardiograms*

There are still several challenges in heart anomaly detection:


#### *4.2. Future Works*

The next generation of heart anomaly detection algorithms should be able to deal with ambulatory health measurements, taking advantage of multiple synchronized signals: accelerometer, real-time blood pressure (based on pulse transit time), skin temperature, and upper- and lower-chest breathing sensors. An excellent review of the state of the art in body sensor fusion can be found in Gravina et al. [98]. One commercial example of such a data fusion system is Astroskin from Carre Technologies Inc. The Astroskin space-grade garments offer state-of-the-art continuous real-time monitoring for 48 h of blood pressure, pulse oximetry, 3-lead ECG, respiration, skin temperature, and activity. Using Astroskin, one can develop new fusion algorithms that compensate for ECG motion artifacts through correlations with synchronized accelerometer and breathing data.

This can be accomplished by using advanced long short-term memory (LSTM) and recurrent neural networks (RNN). An example of this approach can be found in Shrimanti et al. [99], where ECG, photoplethysmography (PPG), and accelerometer measurements were combined using an LSTM-based RNN to compute motion-compensated blood pressure in real time. Such technology could open the door to real-time patient-specific anomaly detection that goes far beyond simple ECG measurements, for example correlating cardiac and respiratory events with patient activities.

#### *4.3. Conclusions*

In this survey, we first introduced the definition of anomaly detection on ambulatory electrocardiograms (ECG) and its importance. We then discussed the basic medical background (Section 2) of electrocardiogram interpretation and the types of anomalies that need to be detected. Most electrocardiogram anomalies fall into two major categories: irregular heart rates and irregular heart rhythms. Irregular heart rates on the ECG can indicate bradycardia, tachycardia, heart block, arrhythmia, and so on. Irregular heart rhythms, such as ectopic heartbeats, can be identified by examining a period of the ECG signal.

Therefore, based on the different irregularities on the ECG, anomaly detection can be divided into several categories: heartbeat detection (Section 3.3) for detecting the location of each heartbeat; heartbeat segmentation (Section 3.3) for segmenting the heartbeats from the entire ECG signal; heartbeat classification (Section 3.4) for classifying the type of one heartbeat; and rhythm classification (Section 3.5) for classifying the type of a period of ECG signal. In addition, as the ECG signal is frequently contaminated with electrical noise and motion artifacts, noise removal (Section 3.2) is important for anomaly detection on the ambulatory ECG.

From the literature, we have reviewed the conventional methods for each part. For noise removal on the ECG, fixed digital filters, the discrete wavelet transform, empirical mode decomposition, and adaptive filters have been used by many researchers. For heartbeat detection, many researchers used fixed digital filters, the discrete wavelet transform, and Shannon energy envelopes to remove the noise and unwanted waves while preserving the R peak information; they then used the R peak locations to identify each heartbeat. For heartbeat segmentation, the most common method was to use a predefined window to cut the heartbeat signal out of the entire signal.
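The fixed-window segmentation strategy can be sketched as follows; the `pre`/`post` sample counts are purely illustrative and depend on the sampling rate:

```python
import numpy as np

def segment_heartbeats(ecg, r_peaks, pre=90, post=110):
    """Cut a fixed window of `pre` samples before and `post` samples after
    each R peak, skipping peaks too close to the recording boundaries."""
    ecg = np.asarray(ecg, dtype=float)
    beats = []
    for r in r_peaks:
        if r - pre >= 0 and r + post <= len(ecg):
            beats.append(ecg[r - pre:r + post])
    return np.array(beats)
```

Every extracted beat has the same length (`pre + post` samples), which is what allows the downstream feature extractors and classifiers to operate on fixed-size inputs.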

For heartbeat classification, authors used morphological and derived features to represent the heartbeat signal. The morphological features were calculated directly from the ECG signal, while the derived features were computed using other methods, such as the discrete wavelet transform, independent component analysis, empirical mode decomposition, and many more. Both morphological and derived features were then used for training in order to generate a mathematical model of the heartbeat signal.

The most popular models were k-nearest neighbors, linear discriminant analysis, support vector machines, multilayer perceptron neural networks, and deep neural networks such as CNNs and RNNs. Similarly, for rhythm classification, the algorithms take a period of the ECG signal as the input to the model.

The current challenges of anomaly detection for ambulatory electrocardiograms were also analyzed in this paper. We identified three major challenges. First, motion artifacts on the ECG signal interfere with anomaly detection and must be reduced. Second, model training requires a massive amount of labeled data, which are hard to come by. Third, ECG databases contain highly imbalanced data, which makes deep learning models difficult to train.

**Author Contributions:** H.L. did the literature review and analysis, and P.B. did editing. All authors have read and agreed to the published version of the manuscript.

**Funding:** Funded by the CISCO Chair in Healthcare.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
