*Article* **Motion Artifact Reduction in Wearable Photoplethysmography Based on Multi-Channel Sensors with Multiple Wavelengths**

#### **Jongshill Lee 1,†, Minseong Kim 1,†, Hoon-Ki Park 2,\* and In Young Kim 1,\***


Received: 12 February 2020; Accepted: 7 March 2020; Published: 9 March 2020

**Abstract:** Photoplethysmography (PPG) is an easy and convenient method by which to measure heart rate (HR). However, PPG signals that optically measure volumetric changes in blood are not robust to motion artifacts. In this paper, we develop a PPG measuring system based on multi-channel sensors with multiple wavelengths and propose a motion artifact reduction algorithm using independent component analysis (ICA). We also propose a truncated singular value decomposition for 12-channel PPG signals, which contain direction and depth information measured using the developed multi-channel PPG measurement system. The performance of the proposed method is evaluated against the R-peaks of an electrocardiogram in terms of sensitivity (Se), positive predictive value (PPV), and failed detection rate (FDR). The experimental results show that Se, PPV, and FDR were 99%, 99.55%, and 0.45% for walking, 96.28%, 99.24%, and 0.77% for fast walking, and 82.49%, 99.83%, and 0.17% for running, respectively. The evaluation shows that the proposed method is effective in reducing errors in HR estimation from PPG signals with motion artifacts in intensive motion situations such as fast walking and running.

**Keywords:** photoplethysmography; motion artifact; independent component analysis; multi-wavelength

#### **1. Introduction**

Photoplethysmography (PPG) is an optical method used to detect volume changes in blood in the peripheral circulation. PPG can determine these volume changes from the surface of the skin, and is a low-cost and noninvasive method. This technique provides useful information related to cardiovascular systems such as heart rate, oxygen saturation, blood pressure [1], and cardiac output [2,3], and is also used to determine stress levels by analyzing the response of the autonomic nervous system based on pulse rate variability (PRV) [4].

Recently, wearable PPG sensors have attracted attention because they can continuously measure and monitor heart rate (HR), and numerous devices in the form of bands or watches (e.g., Apple watch, Fitbit, and Samsung Gear) are being used to monitor instantaneous heart rate using PPG. While these PPG-based devices have the advantages of being lightweight, portable, and easy to use, the distortion of the signal due to motion artifacts in the PPG signal is a challenge to overcome. Currently, these devices are only used for general wellness purposes because they are accurate only in limited conditions such as resting or walking slowly.

PPG uses a sensor composed of light-emitting and light-receiving elements. When light is emitted into body tissue by the light-emitting element, it is transmitted, reflected, and scattered according to the Beer–Lambert law in the tissues, blood vessels, and blood of the body, and is detected by the light-receiving element [5,6]. In general, a green, red, or infrared light-emitting diode (LED) is used as the light-emitting unit, and a photodiode (PD) is used as the light-receiving unit [7]. LEDs used for PPG measurement generally have wavelengths of 400–1000 nm. Short wavelengths reveal little cardiac activity and blood-vessel information because of their low skin penetration depth, but are less affected by motion artifacts because of the shorter light path. Long wavelengths penetrate the skin deeply and can clearly show the activity of the heart and blood vessels, such as the dicrotic notch, but are more affected by motion artifacts because of the longer light path [8].

The normal frequency range of PPG signals is 0.5 to 5 Hz, while that of motion artifacts is 0.01 to 10 Hz [9–11]. It is therefore not easy to obtain a clean signal by applying a general filter to a PPG signal contaminated by motion artifacts. To address this problem, adaptive filters or moving average filters are commonly used in industry. However, satisfactory performance in removing or reducing motion artifacts has not yet been achieved, and various signal processing methods for reducing motion artifacts in PPG signals have been proposed.

Poh M.-Z. et al. developed an earlobe-wearable PPG measuring device and presented a method of removing motion artifacts by applying adaptive noise cancellation (ANC) using an accelerometer [12]. The correlation between the heart rate calculated from the electrocardiogram (ECG) and the heart rate measured via PPG was used for performance evaluation. The results showed a correlation coefficient of 0.75 (*p* < 0.001) when ANC was applied while running. Due to its size, the earlobe-wearable measurement system is difficult to apply in everyday life, and measurement may be inconvenient because the attachment method uses neodymium magnets. In addition, the ANC method requires an additional sensor, such as an accelerometer, to provide a reference signal for the motion artifacts.

In a recent study, Zhang Y. et al. proposed the use of optical signals rather than accelerometers as the motion reference for the cancellation of motion artifacts [13]. The proposed framework uses the infrared (IR) PPG signal as the motion reference and the green PPG signal for HR estimation. This approach helps to reduce burden on additional hardware such as accelerometers and the computational complexity.

Reddy K. A. et al. proposed the CFSA (Cycle-by-cycle Fourier Series Analysis) method using Fourier series analysis for each cycle using the autocorrelation of PPG signals [14]. The results show that randomly applied Gaussian noise is removed. However, due to the limitation that CFSA can only be applied to periodic signals, it is difficult to apply to situations where loss of periodicity occurs due to distortion caused by motion artifacts during the actual PPG measurement.

In order to improve the performance of the noise reduction algorithm, methods using a multi-channel PPG system have been proposed. Warren K. et al. measured six-channel PPG signals at the forehead using six red and infrared LEDs [15]. They proposed an algorithm that selects the channel with the least influence of noise by quantifying the amount of motion artifacts for each channel during exercise. As a result, it was possible to automatically select an accurate channel from the measured multi-channel PPG signals. However, since the algorithm does not include noise reduction, and selects the signal with the lowest noise level from the measured signals, it is difficult to apply when motion artifacts exist in all channels.

It is necessary to design a filter suitable for multiple channels, and various algorithms such as independent component analysis (ICA), principal component analysis (PCA), and singular value decomposition (SVD) have been actively studied [9,16–19]. PCA is used to find an orthogonal linear transformation that maximizes the variance of variables, whereas ICA is used to find the linear transformation of the basis vectors that are statistically independent and non-Gaussian. Unlike PCA, the major feature of ICA is that the basis vectors are neither orthogonal nor ranked in order. The PPG signal and the motion artifacts are independent components of the detected signal, so ICA or PCA can be used to separate the cleaned PPG signal from these artifacts. However, in most studies, the number of PPG channels used for verifying a multi-channel signal processing algorithm is limited, and the relationship between the multi-channel signal and the algorithm is not clear.

In this study, we develop a multi-channel PPG measurement system to consider the effects of motion artifacts with respect to the direction of the sensor module and the change of penetration depth in the skin according to wavelength. Further, we propose a multi-channel motion artifact reduction algorithm based on the signals obtained through this system. Using the multi-channel PPG system, 12-channel PPG signals for three wavelengths are acquired in four directions (up, down, left, and right). We present a method to reduce motion artifacts by applying ICA and a truncated SVD to the 12 channels of PPG signals. We extract the independent components using ICA on the three channels of PPG signals measured by each module, then select the most pulsatile components. Using PCA, the statistical method in which the basis vectors are ranked in order, we can obtain the cleaned PPG signal. PCA can be implemented with powerful, robust techniques such as singular value decomposition (SVD) [20]. Ultimately, we implement PCA with SVD.

#### **2. Materials and Methods**

Our research focuses on reducing motion artifacts using multi-channel PPGs. In this section, we introduce our in-house-built wearable multi-channel PPG measurement system and describe the proposed motion artifact reduction algorithm using a multi-sensor module. We also describe in detail the experimental protocol and data acquisition using the measurement system.

#### *2.1. Wearable Multi-Channel PPG System*

We developed a wearable multi-channel PPG system consisting of the main system, inertial measurement units (IMUs), and PPG sensors, as shown in Figure 1a. The sensor module connected to the main system has sensors arranged in four directions perpendicular to its center, with each sensor consisting of a green, red, and infrared LED and one PD (Figure 1b). In addition, the sensor module includes a nine-axis IMU to detect movement.

**Figure 1.** In-house-built wearable multi-channel PPG acquisition system: (**a**) wearable PPG hardware system consisting of a main system, sensor module (PPG and IMU), and battery; (**b**) the direction and wavelength (G: green; R: red; IR: infrared) of the sensor for each channel when the wearable system is worn on the wrist.

The system architecture of the in-house-built wearable multi-channel PPG measurement system is shown in Figure 2. The main system consists of an ARM Cortex-M4-based microcontroller (STM32F407VGT, STMicroelectronics, Geneva, Switzerland), an analog front-end (AFE4900, Texas Instruments, Dallas, TX, USA), and a Bluetooth module (PAN1321i, Panasonic, Osaka, Japan). The sensor module was designed and implemented using four SFH7050 sensors (OSRAM, Munich, Germany) and one motion sensor (MPU9250, InvenSense, San Jose, CA, USA). The SFH7050, intended for heart rate monitoring or oximetry, is an integrated sensor containing three LEDs (green, red, and IR) of different wavelengths.

**Figure 2.** System diagram of the wearable multi-channel PPG acquisition system including PC application software (S/W). The solid line indicates a wired connection (USART (Universal Asynchronous Receiver/Transmitter), SPI (Serial Peripheral Interface) and I2C (Inter-Integrated Circuit)) while the dashed line indicates a wireless connection.

The readings of the PPG sensor are acquired via the analog front-end under the control of the microcontroller, then the amplified digital signals are sent to the microcontroller using serial peripheral interface (SPI) communication. Motion sensor data are transmitted to the microcontroller via I2C communication. The main system includes a Bluetooth module so that the data can be transferred to a PC via wireless communication. Additionally, we have implemented in-house data acquisition software for wireless transmission and storage to the PC.

The PPG signal of each channel was acquired at a 100-Hz sampling rate with a 24-bit high resolution, while the three-axis acceleration data were acquired at 16 bits and 100 Hz. In this paper, we use the vector sum magnitude of three-axis acceleration to determine the degree of motion.

#### *2.2. Data Acquisition*

The subjects were seven healthy males and one female without cardiovascular disease between the ages of 20 and 30 years (mean age: 27.1 years). This study was approved by Hanyang University IRB (IRB Approved no. HYI-17-048-4) and informed consent was received from all subjects before the experiment.

Since ECG is robust to motion artifacts, most studies use ECG as the ground truth [21,22]. Therefore, in this study, the experiment was conducted using a Polar chest strap electrode, which is highly reliable for ECG measurement. In addition, the electrode was wetted with water before the experiment to minimize contact noise between the skin and the electrode. R-peaks were extracted from the measured ECG using the Pan–Tompkins algorithm.

PPG and ECG were measured in walking, fast walking, and running environments that may occur in everyday life. The developed multi-channel PPG measurement system was worn on the subject's wrist to measure 12-channel PPG and acceleration signals. The ECG signal to be used as a reference for the heart rate was obtained at a sampling rate of 300 Hz using the developed prototypes of ECG acquisition systems [23] based on ADS1298 (Texas Instruments, Dallas, TX, USA) and the chest strap (Polar Pro Strap, Polar Electro, Ltd., Kempele, Finland). In order to minimize the time delay difference between the multi-channel PPG and ECG systems, two systems can simultaneously perform Analog-to-Digital Conversion (ADC) and transmit measured data based on trigger signals transmitted from the in-house program. Figure 3 shows a photograph of a subject wearing an ECG and multi-channel PPG measurement system. The ECG system for measuring the signal to be used as a reference for HR is mounted on the chest strap, and the main system of the multi-channel PPG measuring system is fixed with a stretchable band on the arm. The sensor module is also fixed with a wrist support band.

**Figure 3.** Photograph of a subject wearing the ECG and multi-channel PPG measurement systems.

Experimental protocols consisted of resting, walking, fast walking, and running sessions. The measured signals were normalized using the PPG measured during the first 1-min resting session. In order to minimize the effect between sessions, each session was followed by a 1-min rest period. In the walking session (about 1 m/s), the subjects walked for 2 min at the typical pace of their daily activities. Next, 2 min of fast walking (about 1.8 m/s) and 2 min of running (about 2.2 m/s) were performed. The experiment was carried out with the subject walking back and forth over a distance of 20 m in a corridor of the building. Subjects were allowed to move close to the walls rather than along the center of the corridor to allow smooth turns at both turning points. In order to maintain the subject's speed in each session, the observer paced the subject to complete each round trip in a fixed time. By conducting this protocol twice for each subject, a total of 16 datasets were obtained (one additional dataset, whose ECG was damaged by a lead fault of the electrode, was excluded). The numbers of beats for walking, fast walking, and running were 2757, 2913, and 3563, respectively, for a total of 9233 beats.

#### *2.3. Motion Artifact Reduction Algorithm Based on Multi-Sensors*

The penetration depth of radiation in human skin is known to increase with increasing wavelengths [24]. When measuring PPG in different directions and locations on the wrist, different signals are detected for the same movement due to the diversity of the directions and distribution of blood vessels in the skin of the wrist. Therefore, in the presence of motion artifacts caused by movement, multi-channel PPG signals measured by sensors with multiple wavelengths and in various locations contain information about blood vessel characteristics under the skin and movement in various directions and depths.

In this study, we extracted three independent component signals via preprocessing and independent component analysis for three channels of PPG signals with three different wavelengths (green: solid line arrow; red: dashed line arrow; infrared: dotted line arrow) for each of the four sensors, as shown in Figure 4. Among these signals, the signal with the component pulsating the most was selected, and a reconstructed PPG signal was obtained by applying the truncated SVD to a total of four pulsating component signals selected one by one.

**Figure 4.** Block diagram of signal flows for each sensor and proposed algorithm.

Figure 5 shows the proposed algorithm in detail. The raw signals measured by the sensors contained various noise components such as power line interference, baseline drift, and ambient noise. To remove or reduce these noises, digital filtering was performed using a third-order Butterworth band-pass filter with cutoff frequencies of 0.5 and 5 Hz.
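This preprocessing step can be sketched as follows. It is a minimal illustration using SciPy; the use of zero-phase (forward-backward) filtering via `filtfilt` is an assumption, since the paper does not state whether causal or zero-phase filtering was used:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_ppg(ppg, fs=100.0, low=0.5, high=5.0, order=3):
    """Third-order Butterworth band-pass (0.5-5 Hz) at fs = 100 Hz.

    filtfilt applies the filter forward and backward for zero phase shift;
    this is one plausible choice, not necessarily the authors' exact setup.
    """
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, np.asarray(ppg, dtype=float))
```

In-band components (e.g., a heart-rate fundamental near 1–2 Hz) pass essentially unchanged, while baseline drift (below 0.5 Hz) and high-frequency interference (above 5 Hz) are strongly attenuated.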

**Figure 5.** Flowchart of the proposed multi-sensor-based motion artifact reduction algorithm (MMR: maximum-to-mean ratio).

In environments with little or no motion, all channel signals can be measured with high quality. Heart rate can therefore be obtained simply by detecting peak points in the PPG signals, without any special processing. In this case, we used the green-wavelength signal, which is known to be well measured on the wrist. To quantify motion, we used the three-axis accelerometer of the IMU mounted on the sensor module. High-pass filtering was performed to remove the gravity component from the acceleration, and the vector sum (*accsum*) of the three gravity-free acceleration axes was calculated, as shown in Equation (1). A threshold on the vector sum value was used to determine the presence or absence of motion, distinguishing resting from walking.

$$acc_{sum} = \sqrt{acc_x^2 + acc_y^2 + acc_z^2}\tag{1}$$
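Equation (1) and the motion/rest decision can be sketched as follows. The gravity-removal filter (a moving-average subtraction standing in for the high-pass filter) and the threshold value are illustrative assumptions, since the paper does not specify them:

```python
import numpy as np

def motion_vector_sum(acc_xyz, fs=100.0, hp_cutoff=0.5):
    """Vector-sum magnitude of gravity-free 3-axis acceleration, Equation (1).

    Gravity removal is approximated by subtracting a moving average
    (window of about one cutoff period); the paper's exact high-pass
    filter is not specified.
    """
    acc = np.asarray(acc_xyz, dtype=float)          # shape (N, 3)
    win = max(1, int(fs / hp_cutoff))
    kernel = np.ones(win) / win
    gravity = np.column_stack(
        [np.convolve(acc[:, i], kernel, mode="same") for i in range(3)]
    )
    acc_hp = acc - gravity                          # slow (gravity) trend removed
    return np.sqrt(np.sum(acc_hp ** 2, axis=1))     # Equation (1)

def has_motion(acc_sum, threshold=0.1):
    """Binary motion flag; the threshold value here is illustrative only."""
    return float(np.mean(acc_sum)) > threshold
```

For a stationary sensor the vector sum stays near zero once gravity is removed, while walking or running produces a clearly elevated value that trips the threshold.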

In the presence of movement, PPG signals are a mixture of pulsation and motion artifact components. In this multi-PPG system, the PPG measured by sensors facing different directions showed the change of blood volume of various depths at each location, so the pulsatile component of each sensor can be extracted through ICA algorithms. However, because the output components of the ICA appear in random order and the signals can be reversed, a reverse-check and pulsatile component selection algorithm is required. The reverse-check algorithm (shown as a dotted box in Figure 5) performs differential and peak detection on each independent component (IC), calculates the average of the peak values, and applies the same process to the inverted signal. Due to the morphological characteristics of the PPG, the IC signal with the largest average peak value among the inverted and non-inverted signals becomes the correct PPG. Therefore, the IC with a large average value is the output.
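The reverse-check step can be sketched as follows. The simple peak detector and the exact comparison rule are assumptions based on the flowchart description: because the PPG upstroke is steep, the derivative of a correctly oriented pulse has larger positive peaks than that of an inverted one.

```python
import numpy as np

def mean_peak_height(x):
    """Mean height of the strict local maxima of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    is_peak = (x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])
    vals = x[1:-1][is_peak]
    return vals.mean() if vals.size else -np.inf

def reverse_check(ic):
    """Orient an independent component so pulse upstrokes point upward.

    Following the described procedure: differentiate, detect peaks,
    average the peak values, repeat for the inverted signal, and keep
    whichever orientation has the larger average peak value.
    """
    d = np.diff(ic)
    return ic if mean_peak_height(d) >= mean_peak_height(-d) else -ic
```

Applied to an IC and its negation, the function returns the same (correctly oriented) waveform in both cases, which is exactly what the subsequent pulsatile-component selection requires.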

The number of outputs of the ICA is the same as the number of input channels, and the output signals are randomly output regardless of the order of the input signals. The pulsation component is periodic because it is a change of blood volume that occurs with every beat of the heart. Therefore, by analyzing the periodicity for each IC signal, it is possible to select an IC with a pulsation component. We applied fast Fourier transform (FFT) to IC signals of each sensor module and selected one pulsatile component based on the maximum to mean ratio (MMR) in power spectral density.
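The MMR-based selection can be sketched as follows. The PSD estimator (a plain FFT periodogram) and the 0.5–5 Hz analysis band are illustrative choices; the paper states only that the MMR of the power spectral density is compared:

```python
import numpy as np

def mmr(x, fs=100.0, fmin=0.5, fmax=5.0):
    """Maximum-to-mean ratio of the power spectral density in the PPG band.

    A periodic (pulsatile) component concentrates its power in a narrow
    spectral peak, yielding a large MMR; broadband noise does not.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    psd = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    p = psd[(freqs >= fmin) & (freqs <= fmax)]
    return p.max() / p.mean()

def select_pulsatile(ics, fs=100.0):
    """Return the IC (one row of `ics`) with the largest MMR."""
    scores = [mmr(ic, fs) for ic in ics]
    return ics[int(np.argmax(scores))]
```

A clean sinusoid-like pulsation scores orders of magnitude higher than white noise, so the pulsatile IC is reliably picked out of the three ICA outputs per module.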

As shown in Figure 4, the four components selected based on the MMR were pulsatile components measured at different locations. The truncated SVD [25] was applied to reduce the motion artifacts remaining on the pulsatile components extracted from four sensor modules with different directions, and to obtain the PPG signal that best reflected the change in blood volume. By applying the truncated SVD, the PPG is reflected in the largest singular value. Therefore, reconstruction using only this singular value yields a pure PPG signal with motion artifacts removed.
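The truncated-SVD step can be sketched as follows. How the rank-1 approximation is reduced to a single output signal (here, the leading right singular vector scaled by the leading singular value) is an illustrative choice, as the paper does not specify it:

```python
import numpy as np

def truncated_svd_reconstruct(components, rank=1):
    """Rank-1 reconstruction of the four selected pulsatile components.

    Stacking the components as rows, the largest singular value captures
    the pulsation common to all sensor modules; keeping only that value
    suppresses the residual, module-specific motion artifacts.
    Returns (representative signal, rank-`rank` approximation matrix).
    """
    X = np.asarray(components, dtype=float)              # shape (4, N)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank]  # truncated reconstruction
    return S[0] * Vt[0], X_hat
```

With four noisy copies of the same pulsation as input, the recovered signal correlates almost perfectly with the underlying common component (up to the usual SVD sign ambiguity).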

Figures 6–8 show an example of applying the proposed motion artifact reduction algorithm to a subject's data. Figure 6 shows the 12-channel PPG signals measured in the running state most affected by motion artifacts. The *x*-axis represents the time elapsed after 70 s, including 1 min of resting and 10 s of running.

Figures 7 and 8 show the results of applying the algorithm to each step shown in Figure 5. Figure 7 shows the 12 ICs after passing the ICA. Figure 8a shows four ICs selected through pulsatile component selection for each sensor module, and Figure 8b shows the results of applying the truncated SVD.

Figure 6 shows the 12-channel PPG signals measured from sensors placed in four directions while running at about 8 km/h. The four rows represent signals measured by the four sensor modules, and each column is a wavelength-specific (green, red, and infrared) signal reflecting information from a particular depth. It can be seen that all channels are contaminated by motion artifacts. The R-peak time points extracted from the ECG are marked with the symbol (-), and purple dotted lines represent the heart rate. The Pan–Tompkins algorithm was used to detect R-peaks from the ECG data [26]. In this paper, all algorithms (including the motion artifact reduction algorithm) were implemented in MATLAB 2019 (MathWorks, Inc., Natick, MA, USA).

Figure 7 shows the 12 independent components (ICs) applying ICA to the 12-channel input PPG signals shown in Figure 6. This represents the same number of ICs as the number of input signals and includes pulsatile, motion artifacts, and other noise components for the signals acquired from each sensor. Each component appears in random order and some may appear as inverted signals. Each sensor module has three ICs for PPG signals from three wavelengths, which are the signals shown in each row of Figure 7.

In Figure 7, the most periodic signals among the IC signals and their inverted versions are IC1, IC1, and IC3 for sensor modules 1, 2, and 3, respectively; for sensor module 4 the choice is unclear. The results of applying the reverse-check and pulsatile component selection algorithm to select these pulsatile components are shown in Figure 8a; one pulsatile signal was selected for each sensor. In this example, the signals selected for sensor modules 1, 2, and 3 are the reversed signal of IC1, the reversed signal of IC1, and IC3, respectively. After applying the truncated SVD to the four selected ICs, reconstruction using only the largest singular value yielded a pure PPG signal with motion artifacts reduced, as shown in Figure 8b.

**Figure 6.** Twelve-channel PPG signals measured while running at about 8 km/h.

**Figure 7.** ICA components of 12-channel PPG signals.

**Figure 8.** Selected independent components and motion artifact reduced PPG signals. (**a**) Selected ICs applying the reverse-check and pulsatile component selection algorithm, (**b**) reconstructed PPG signal based on the truncated SVD.

#### **3. Experimental Results**

In order to evaluate the proposed algorithm, the motion artifact reduction algorithm was applied to the signals measured by the multi-channel PPG measurement system, and performance indexes were calculated against the ECG as the reference signal. Sensitivity (Se), Equation (2); positive predictive value (PPV), Equation (3); and failed detection rate (FDR), Equation (4), were used as performance indexes [27]. In these equations, TP (true positive) is the number of correctly detected peaks, FN (false negative) is the number of missed peaks, and FP (false positive) is the number of artifacts or noise events classified as peaks. Algorithms based on the best signal selection [15], ICA [28], or SVD [25] were compared with the proposed method using these performance indexes.

$$\text{Se} = \frac{TP}{TP + FN} \times 100\% \tag{2}$$

$$\text{PPV} = \frac{TP}{TP + FP} \times 100\% \tag{3}$$

$$\text{FDR} = \frac{FP}{TP} \times 100\% \tag{4}$$
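Equations (2)–(4) follow directly from the TP, FN, and FP counts; a sketch, including a simple greedy peak-matching step whose 150 ms tolerance window is an illustrative value not taken from the paper:

```python
def match_peaks(ref, det, tol=0.15):
    """Greedily match detected peak times to reference R-peak times
    within a tolerance window (150 ms here, an illustrative value).
    Returns (TP, FN, FP)."""
    det = sorted(det)
    used = [False] * len(det)
    tp = 0
    for r in sorted(ref):
        for i, d in enumerate(det):
            if not used[i] and abs(d - r) <= tol:
                used[i] = True
                tp += 1
                break
    return tp, len(ref) - tp, len(det) - tp

def detection_metrics(tp, fn, fp):
    """Se, PPV, and FDR (Equations (2)-(4)), each in percent."""
    se = tp / (tp + fn) * 100.0
    ppv = tp / (tp + fp) * 100.0
    fdr = fp / tp * 100.0
    return se, ppv, fdr
```

For example, three reference beats against detections at 1.02 s, 2.5 s, and 2.98 s yield TP = 2, FN = 1, FP = 1, giving Se = PPV ≈ 66.7% and FDR = 50%.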

Figure 9 shows the results of applying the proposed method and algorithms based on the best signal selection, ICA, or SVD to the measured data at about 8 km/h. The R-peak time points extracted from the ECG as a reference for the heartbeat are indicated by the (-) symbol and purple dotted lines. Compared with other methods, the proposed algorithm is clearest in terms of the heart rate extraction of PPG signals and had a low FDR.

**Figure 9.** Example of applying the proposed method and algorithms based on best signal selection, ICA, and SVD during running at about 8 km/h.

Table 1 shows Se, PPV, and FDR as performance indexes for peak detection on the 12-channel data measured during walking, fast walking, and running. As the intensity of the motion increases, Se decreases while PPV changes little. The results show that as the motion artifacts became more severe, the number of false negatives (FN) in beat detection increased substantially while the number of false positives (FP) decreased slightly. In terms of wavelength, the green-wavelength channels Ch1, Ch4, Ch7, and Ch10 had high sensitivity.


**Table 1.** Comparison of the results of Se, PPV, and FDR for the 12-channel signals acquired during walking, fast walking, and running conditions.



Table 2 shows the performance indexes Se, PPV, and FDR for algorithms based on the best signal selection method, SVD, ICA, and the proposed method for the data measured during walking (2757 beats), fast walking (2913 beats), and running (3563 beats). As the movement intensity increased from walking to fast walking to running, the heart-beat detection rate decreased. For walking, the performance of the proposed method was 99%, 99.55%, and 0.45% for Se, PPV, and FDR, respectively. The proposed method outperformed the other algorithms, and the SVD-based algorithm had the lowest accuracy. The proposed method also showed the best performance for the fast walking and running conditions, which involve the most motion.

**Table 2.** Comparison of the results of Se, PPV, and FDR from algorithms based on the best signal selection method, SVD, ICA, or the proposed method for signals acquired under the walking, fast walking, and running conditions.


#### **4. Discussion**

ECG is the representative signal for calculating heart rate; it measures the bio-potential generated by the electrical signals that control the expansion and contraction of the heart. PPG, in contrast, is a light-based technology that senses volumetric changes in the blood in the peripheral circulation driven by the heart's pumping action. The ECG is an electrical signal that is robust in the presence of motion artifacts and has the advantage of stably extracting heartbeats even when the measurement is compromised by motion. However, ECG signals are obtained by measuring a weak electrical potential difference between two points, and thus cannot be measured on a single arm. When using both hands, which provide a larger potential difference, the user must intentionally make contact with the electrodes. In contrast, PPG has an advantage over ECG in terms of user convenience and wearability, as it can take measurements at any location with a high concentration of blood vessels.

Over the past decade, HR monitors and wearable fitness equipment have become commercially available. Many people use HR monitors to inform their training and to assess aerobic fitness.

In the clinical field, physicians and trainers often refer to physiological and behavioral data such as energy consumption, step counts, sleep/wake information, and HR obtained from patients' wearable devices. Energy consumption is calculated using HR and accelerometer motion information. If accurate HR information is provided, more accurate energy consumption can be estimated. Accurate HR monitoring is an essential component of a systematic exercise prescription because a target HR is set to guide patient-specific exercise intensity.

In another field of clinical application of HR, heart rate variability (HRV) is widely used to investigate the state of the autonomic nervous system and related diseases. HRV is calculated using beat-by-beat HRs, and accurate HR detection is required because HR errors lead to erroneous ANS (autonomic nervous system) analysis.

Currently, many devices in the form of bands or watches (e.g., Apple Watch, Fitbit, and Samsung Gear) are used to monitor heart rate, and these PPG devices have the advantages of being lightweight, portable, and easy to use. However, the PPG signal is very vulnerable to motion artifacts. It is well known that the depth of penetration of light into human skin increases with increasing wavelength [29]. Therefore, the wavelength of the LED and the direction of the PPG sensor module are important factors when analyzing the influence of motion artifacts on the PPG signal.

In this study, we developed a 12-channel PPG measurement system with sensor modules facing up, down, left, and right, using three wavelengths (green, red, and infrared) for each direction, and proposed a multi-channel PPG motion artifact reduction algorithm based on independent component analysis and truncated singular value decomposition. PPG signals were measured at multiple skin penetration depths according to LED wavelength, and at various positions and directions relative to the blood vessels in the skin through the four sensor modules. To apply independent component analysis to the multi-channel PPG signals, signal inversion was checked and the pulsatile components were detected by comparing the MMR of the power spectral density computed via FFT. Truncated singular value decomposition was then applied to the detected pulsatile components to reduce motion artifacts. When the proposed algorithm was applied to the signals measured during running with the developed system, a sensitivity of 82.49%, a positive predictive value of 99.83%, and a failed detection rate of 0.17% were obtained. These results are attributable to the motion artifact reduction algorithm applied to the multi-channel PPG measurement system using various wavelengths and sensor module directions.

As the wavelength increases, changes in blood volume at greater depths can be measured. Therefore, each PPG channel carries a different amount of information, and when the ICA algorithm is applied, a pulsatile component can be obtained. In addition, the pulsatile components acquired from the sensor modules in different directions are combined by the truncated SVD into a single PPG signal that shows the change in blood volume without being affected by motion.

The proposed algorithm extracts the independent component signals from the PPG signals of different wavelengths measured by each sensor module, as shown in Figure 4. In addition, in order to compare the effects of wavelength and sensor direction, we applied the proposed algorithm to PPG signals of the same wavelength across the sensor modules. This approach did not provide clear results compared with applying the ICA-based algorithm within each sensor module. This means that signals measured by LEDs of the same wavelength in different sensors travel through different positions and light paths, so the motion artifact information is inconsistent when independent component analysis is applied. Therefore, applying independent component analysis to the signals measured by the same sensor module, rather than to same-wavelength signals, can increase the motion artifact reduction performance.

The limitation of this study is that the protocols for PPG measurement were limited to walking, fast walking, and running; therefore, more detailed protocols need to be set up to remove the noise caused by various kinds of movements in real life. Additional experiments and research are needed to develop real-world applications. As further work, we plan to quantitatively analyze the effects of motion artifacts on PPG signals at various speeds on the treadmill. In addition, long-term measurements are required to analyze the various features in the signal measured during physical activity. We will try to diversify the protocols and measure data in various environments through long-term experiments.

The proposed method is effective for monitoring heart rate. Furthermore, if beat-to-beat intervals can be obtained as precisely as the R-peaks of the ECG in the presence of motion artifacts, the method can be applied to fields such as emotion and stress monitoring by analyzing the autonomic nervous system on the basis of pulse rate variability.

**Author Contributions:** Conceptualization, J.L. and M.K.; methodology, J.L. and M.K.; formal analysis, J.L. and M.K.; writing—original draft preparation, J.L. and M.K.; writing—review and editing, I.Y.K. and H.-K.P.; supervision, I.Y.K. and H.-K.P.; funding acquisition, H.-K.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2018R1A5A7025522).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Retrieval and Timing Performance of Chewing-Based Eating Event Detection in Wearable Sensors**

#### **Rui Zhang \* and Oliver Amft**

Chair of Digital Health, Friedrich-Alexander Universität Erlangen-Nürnberg (FAU), Henkestraße 91, 91052 Erlangen, Germany; oliver.amft@fau.de

**\*** Correspondence: rui.rui.zhang@fau.de; Tel.: +49-9131-852-3604

Received: 20 December 2019; Accepted: 17 January 2020; Published: 20 January 2020

**Abstract:** We present an eating detection algorithm for wearable sensors based on first detecting chewing cycles and subsequently estimating eating phases. We term the corresponding algorithm class a bottom-up approach. We evaluated the algorithm using electromyographic (EMG) recordings from diet-monitoring eyeglasses in free-living conditions and compared the bottom-up approach against two top-down algorithms. We show that the F1 score was no longer the primary relevant evaluation metric when retrieval rates exceeded approx. 90%. Instead, detection timing errors provided more important insight into detection performance. In 122 hours of free-living EMG data from 10 participants, a total of 44 eating occasions were detected, with a maximum F1 score of 99.2%. Average detection timing errors of the bottom-up algorithm were 2.4 ± 0.4 s and 4.3 ± 0.4 s for the start and end of eating occasions, respectively. Our bottom-up algorithm has the potential to work with different wearable sensors that provide chewing cycle data. We suggest that the research community report timing errors (e.g., using the metrics described in this work).

**Keywords:** automated dietary monitoring; eating detection; eating timing error analysis; biomedical signal processing; smart eyeglasses; wearable health monitoring

#### **1. Introduction**

Eating occasion detection is at the core of automated dietary monitoring (ADM) in humans, targeting healthy diet management [1,2]. We regard intake to consume food pieces with dietary activities including ingestion, chewing, and swallowing [3] as an eating occasion if all dietary activities start and end in a given temporal relation. Meals or snacks are typical examples of eating occasions. Eating occasions thus have a start and end denoting the timing of intake beginning and intake completion. For solid and semi-solid food, chewing (i.e., the cyclic opening and closing of the jaw) is typically the longest activity within eating occasions [3]. We therefore consider chewing as representative of eating occasions, denoted as *eating events* in this work.

Recording chewing to interpret eating has been attempted in a variety of approaches intended for free-living ADM (see Section 2), as accurate eating event timing detection is essential for diet management. For example, users could be reminded to check vital parameters such as glucose level when the initial moment of an eating event is detected. Similarly, users could be asked to confirm food details or take a photo of leftovers immediately after an eating event ends. In both examples it is important that timing errors of the eating event detection are minimal. Hence, timing errors determine whether an eating event detection approach is suitable across the ADM application spectrum.

Detecting dietary activities, including eating events, in wearable or ambient sensor data is a complex pattern analysis and modelling problem due to the inter- and intra-individual variability in free-living behaviour patterns. Approaches to eating event detection and analysis can be categorised as top-down or bottom-up sensor data processing: In the top-down approach, eating events are detected by applying sliding windows to the sensor time series and applying feature pattern models. If necessary, further information details such as chewing cycles, intake gestures, etc. could be derived using the detected eating events. Conversely, in a bottom-up approach, individual dietary activities are modelled and the result is subsequently used to detect eating events. The early abstraction in bottom-up processing may help to deal with varying dietary activity patterns. Furthermore, bottom-up processing fits into hierarchical data processing schemes of resource-constrained wearable and IoT systems, where instead of raw data, derived parameters or events are communicated between system components.

This investigation proposes a bottom-up eating detection algorithm and compares it with two top-down algorithms. The bottom-up eating detection algorithm first detects individual chewing cycles. Retrieved chewing cycles are then used to detect eating events and estimate start and end of eating occasions. In contrast, top-down algorithms apply sliding windows over the sensor time series to detect eating events. The bottom-up algorithm proposed here is potentially agnostic to the particular sensor used, as long as chewing cycle information is acquired. In particular, the following contributions are made:


#### **2. Related Work**

ADM has received increasing research interest over the last decade, where eating event detection based on data from various body-worn and ambient sensors has been frequently considered. Most investigations that considered quantitative performance for eating event detection focused on detection accuracy or retrieval metrics. In this investigation, we highlight that timing errors are critical for detection performance and investigate timing errors specifically.

Eating event detection has often been approached by top-down data processing. For example, Dong et al. used a wrist motion sensor to detect eating, reporting 81% accuracy in 449 hours of free-living data [4]. Thomaz et al. also used a wrist-worn three-axis accelerometer to monitor eating in free-living conditions [5]. The random forest classifier yielded 66% precision and 88% recall for one day of data and intra-individual analysis. Bi et al. implemented a headband carrying a bone-conducting acoustic sensor and reported eating detection performance of over 90% [6]. Farooq et al. used accelerometer-equipped eyeglasses to detect food intake in the lab and in short-term free-living [7]. The highest F1 score of 87.9% ± 13.8% (mean ± standard deviation) was achieved with a 20 s sliding window using a *k*-nearest neighbour classifier. Studies involving multiple sensor modalities are a recent trend in eating event detection applications. Wahl et al. implemented an eyeglasses prototype equipped with an inertial measurement unit (IMU), an ambient light sensor, and a photoplethysmogram (PPG) sensor for the recognition of nine daily activities, including eating [8]. The classification reached an average accuracy of 77%. Merck et al. realised a multi-device monitoring system involving in-ear audio, head motion, and wrist motion sensors, which could recognise eating with 92% precision and 89% recall [9]. Papapanagiotou et al. proposed an ear-worn eating monitoring system based on PPG, audio, and accelerometer data, achieving an accuracy of up to 93.8% and a class-weighted accuracy of up to 89.2% in eating detection [10]. Bedri et al. used an ear-worn system for chewing instance detection. An F1 score of over 80% and accuracy of over 93% were reported [11]. The timing error for eating start was 65.4 s; the authors did not report the timing error at eating ends. Doulah et al. investigated the effect of the temporal resolution of eating microstructure analysis, including the duration of eating events [12]. The analysis did not yield insight into start and end time estimates for eating events. In our prior investigation of top-down eating detection based on free-living EMG recordings, a one-class support vector machine (ocSVM) yielded an F1 score of 95%. Timing error analysis showed 21.8 ± 29.9 s for eating start and 14.7 ± 7.1 s for eating end [13].

For the bottom-up data processing approach, dietary activities that characterise eating are modelled, and eating is subsequently derived from these activities. Chewing has frequently been investigated as a basis for subsequent eating analysis. Amft et al. investigated chewing detection for ADM using an ear-plug acoustic sensor, capturing vibration patterns during chewing [14]. Bedri et al. proposed earwear using proximity sensors for the detection of tiny deformations of the outer ear during chewing [15]. Eating could be detected with 95.3% accuracy with a user-dependent classification. Zhang et al. was the first to use smart eyeglasses to detect chewing, analysing EMG electrode positions in eyeglasses frames and the effect of hair on the EMG signal [16]. EMG electrodes were embedded into the eyeglasses' temples, and chewing cycles were detected with a precision and recall of 80%. In subsequent work [17], a refined version of the eyeglasses was used for eating detection, yielding an accuracy of above 95% in natural, free-living data. Furthermore, it was demonstrated that soft foods such as banana provide identifiable EMG signatures. Chung et al. incorporated a force-sensitive load cell in eyeglasses hinges to monitor temple movement during chewing, head movement, talking, and winking. A classification of these activities yielded an F1 score of 94% [18]. Farooq et al. attached a strain sensor at the temporalis muscle area to obtain chewing cycle information [19]. With additional accelerometer data, the authors reported an F1 score of 99.85% for recognising eating from other physical activities in laboratory recordings.

So far, timing performance has been rarely reported, partly because methods to derive eating reference in free-living studies were missing. Here, we evaluated three algorithms in free-living EMG recordings with a realistic ratio of eating vs. non-eating time. All algorithms can be used with one or more sensors and in multimodal configurations. In particular, the bottom-up algorithm builds on chewing cycle information extracted from sensor data, and thus can be applied with other sensors besides EMG by adapting the chewing cycle extraction. Our current work focuses in particular on the analysis of timing errors.

#### **3. Eating Event Detection Algorithms**

We propose a bottom-up eating event detection algorithm and compare it to two top-down algorithms. As input for all algorithms we consider a multi-source sensor data stream of chewing cycle measurements, corresponding to a random process *X*n(*t*), where *n* indexes the random variables (e.g., sensor channels or features) and *t* is the time index. For example, the sensor could be an EMG monitor measuring the temporalis muscle contraction or acoustic transducers measuring vibration patterns due to food fracture. An overview of the algorithm pipelines for all algorithms considered is shown in Figure 1. Below, we formally describe the algorithms.

#### *3.1. Bottom-Up Algorithm*

The idea of this algorithm is to estimate eating events from the density of chewing cycles, where a relatively high frequency of chewing cycles indicates eating. After pre-processing the multi-source sensor signals *X*n(*t*), chewing cycle onsets *C*n were detected. Subsequently, a sliding window of length *w*0 was applied around each retrieved onset of *C*n, i.e., the window moved from one detected onset to the next with a step size of one onset. At every onset, we calculated the chewing cycle frequency *f*n as the number of detected onsets per time interval *w*0. A chewing segment start *t*n, start was detected as the first onset in *C*n at the signal start, or an onset after a preceding detected chewing segment, where *f*n equalled or exceeded *θ*0. The end of a chewing segment *t*n, end was determined as the onset in *C*n where *f*n equalled *θ*0 and the (*θ*0 − 1)-th subsequent *f*n equalled 1. Detection results of the *n* sensor sources were combined and post-processed by eliminating gaps between adjacent groups of chewing segments. The details of each step are described below.

**Figure 1.** Overview of the top-down and bottom-up eating event detection algorithms investigated in this work. White processing blocks indicate functions shared by the algorithms. Shaded processing blocks are specific functions for each algorithm. Both top-down algorithms follow the same detection pipeline with different implementations of the "Chewing segment detection" block. ocSVM: one-class support vector machine.

#### 3.1.1. Signal Pre-Processing

Pre-processing steps vary depending on the type of sensor used. It is likely that the human body acts as an antenna and picks up power line noise. Thus, we applied a notch filter to the raw signal *X*n(*t*) to eliminate potential power line interference at frequency *f*nf. In this study, we used dual-channel smart eyeglasses EMG data sampled at 256 Hz per channel. Hence, *X*n(*t*)(*n* = 1, 2) represents EMG data in this case. The notch filter frequency was set to *f*nf = 50 Hz. Baseline wander and motion artifacts were removed using a high-pass filter with a cut-off frequency of *f*hpf = 20 Hz—a typical value for EMG signal processing. The resulting data *X*n, hpf were rectified for detection. The pre-processed and rectified data were abbreviated as *X*n. The pseudo code is shown in Algorithm block 1.

**Algorithm block 1** : Signal pre-processing.

**Input:** Multi-source free-living data *X*n(*t*)

**Parameter:** Notch filter band-stop frequency *f*nf, high-pass filter cut-off frequency *f*hpf

**Output:** Pre-processed data *X*n
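The pre-processing chain can be sketched in Python with SciPy filters. The filter order and notch quality factor below are assumptions not specified in the text:

```python
import numpy as np
from scipy import signal

def preprocess(x, fs=256.0, f_nf=50.0, f_hpf=20.0):
    """Notch out power-line interference at f_nf, high-pass at f_hpf
    to remove baseline wander and motion artifacts, then rectify."""
    # Notch filter (Q = 30 is an assumed quality factor)
    b_n, a_n = signal.iirnotch(f_nf, Q=30.0, fs=fs)
    x = signal.filtfilt(b_n, a_n, x)
    # 4th-order Butterworth high-pass (order is an assumption)
    b_h, a_h = signal.butter(4, f_hpf, btype="highpass", fs=fs)
    x = signal.filtfilt(b_h, a_h, x)
    return np.abs(x)  # rectification
```

Zero-phase filtering (`filtfilt`) is used here so the onset timing of EMG bursts is not shifted by the filters.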


#### 3.1.2. Chewing Cycle Detection

Chewing cycle detection was performed by adapting the EMG onset detection principle initially proposed by Abbink et al. [20]. Every chewing cycle has an onset time corresponding to the moment when the muscle contraction starts, and an offset time corresponding to the contraction end. Hence, the number of onsets should represent the number of chewing cycles. First, a sliding window of size *w* was applied to *X*n. The value of *w* should be no larger than the duration of a typical chewing cycle. Here we used 0.4 s (100 samples for the EMG signal), as chewing cycle frequency typically ranges between 0.94 and 2.17 Hz [21]. We derived a conditional summation of sensor samples within the window: for the first half-window starting at *i*0, *index*1 counted the samples with *X*n[*i*0 + *i*] < *θ*C for *i* = 0, . . . , *w*/2; for the second half-window, *index*2 counted the samples with *X*n[*i*0 + *i*] > *θ*C for *i* = *w*/2 + 1, . . . , *w*. Finally, *index* = *index*1 + *index*2 was derived. Parameter *θ*C was set to *μ* + 3 × *σ*, where *μ* was the mean and *σ* the standard deviation derived from the baseline noise of *X*n. Both *μ* and *σ* were estimated across training data of all participants. The amplitude of the baseline noise was assumed to be Gaussian distributed and threshold *θ*C was set to cover a 99% confidence interval. As the window of size *w* was slid with a step size of one sample, an *index* in the range [0, *w*] was obtained for each sample, forming a new time series *I*n per signal source *n*. To determine chewing onsets, we selected local maxima of *I*n that exceeded *θ*P × *w*, with *θ*P in the range [0, 1]. Considering the chewing frequency, the temporal distance between neighbouring detection points of *I*n should be larger than *t*interval = 1/3 s. Detected chewing cycle onsets were sequentially saved in a list *C*n. The pseudo code is shown in Algorithm block 2.

**Algorithm block 2** : Chewing cycle detection.

**Input:** Pre-processed data *X*n

**Parameter:** EMG burst threshold *θ*C, sliding window size *w*, peak threshold *θ*P, peak interval *t*interval **Output:** A list of detected chewing cycle onsets *C*n

```
1: index = 0, In ← ∅, Cn ← ∅
2: for (i = 1, i < w/2, i++) do
3:   if Xn[i] < θC then
4:     index += 1
5: for (i = w/2, i < w, i++) do
6:   if Xn[i] > θC then
7:     index += 1
8: for (i = w/2 + 1, i < length(Xn) − w/2, i++) do
9:   if Xn[i − w/2 − 1] < θC then
10:    index −= 1
11:   if Xn[i − 1] < θC then
12:     index += 1
13:   if Xn[i − 1] > θC then
14:     index −= 1
15:   if Xn[i + w/2] > θC then
16:     index += 1
17:   In.append(index)
18: for (i = 0, i < length(In) − 2, i++) do
19:   if In[i] < In[i + 1] and In[i + 1] > In[i + 2] and In[i + 1] > θP × w then
20:     Cn.append(i + 1)
21:     i += tinterval
```
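A Python rendering of the onset detection above; it uses cumulative sums instead of the incremental *index* update (same result, simpler code) and resolves plateaus of the index series to their first sample. The function name and default values are illustrative:

```python
import numpy as np

def detect_chewing_onsets(x, fs=256, w=100, theta_c=None, theta_p=0.7,
                          t_interval=1 / 3):
    """Count sub-threshold samples in the first half-window (index1) and
    supra-threshold samples in the second half-window (index2); peaks of
    index1 + index2 above theta_p * w mark chewing cycle onsets."""
    if theta_c is None:
        # Gaussian baseline assumption: mu + 3*sigma covers ~99% of the noise
        theta_c = x.mean() + 3.0 * x.std()
    half = w // 2
    csb = np.cumsum(x < theta_c)  # running count of sub-threshold samples
    csa = np.cumsum(x > theta_c)  # running count of supra-threshold samples
    idx = np.empty(len(x) - w, dtype=int)
    for i in range(len(x) - w):
        index1 = csb[i + half - 1] - (csb[i - 1] if i > 0 else 0)
        index2 = csa[i + w - 1] - csa[i + half - 1]
        idx[i] = index1 + index2
    onsets, min_dist, last = [], int(t_interval * fs), None
    for i in range(1, len(idx) - 1):
        # local maximum above theta_p * w; plateau ties go to the first sample
        if idx[i - 1] < idx[i] >= idx[i + 1] and idx[i] > theta_p * w:
            if last is None or i - last >= min_dist:
                onsets.append(i)
                last = i
    return onsets
```

For an idealised burst (quiet baseline, then sustained activity), a single onset is reported half a window before the burst begins, where the window is exactly split into a quiet first half and an active second half.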
3.1.3. Chewing Segment Detection

We applied a sliding window of size *w*0 to *C*n, with the start of the window located at the first chewing cycle onset *C*n[0]; the window was subsequently slid to the adjacent onset until reaching the end of *C*n. With the window starting at *C*n[*j*], the chewing cycles in the window were counted and noted as the *j*-th chewing cycle frequency *f*n[*j*]. We applied the criterion *f*n[*j*] ≥ *θ*0 to confirm that onset *C*n[*j*] belonged to a chewing segment. Correspondingly, the first onset in *C*n that satisfied the criterion *f*n[*j*start] ≥ *θ*0 was considered as the start of the first chewing segment *t*n, start[0]. An onset with *f*n[*j*end] = *θ*0 and *f*n[*j*end + *θ*0 − 1] = 1 indicated that *C*n[*j*end + *θ*0 − 1] was the only onset in the latest window, that is, the final onset/end of the *k*-th estimated chewing segment, denoted as *t*n, end[*k*]. The next onset after *t*n, end[*k*] that satisfied the criterion *f*n[*j*] ≥ *θ*0 was considered as the (*k* + 1)-th chewing segment start *t*n, start[*k* + 1]. The pseudo code is shown in Algorithm block 3.

**Algorithm block 3** : Chewing segment detection.
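The segment detection can be sketched in Python, assuming onset times in seconds. `detect_segments` is a hypothetical name, and the end criterion is simplified here to closing a segment at the first onset whose window contains only that onset:

```python
def detect_segments(onsets, w0, theta0):
    """Group sorted chewing-cycle onset times (seconds) into chewing
    segments. f[j] counts onsets within w0 seconds of onsets[j]."""
    f = [sum(1 for t in onsets if o <= t < o + w0) for o in onsets]
    segments, start = [], None
    for j, o in enumerate(onsets):
        if start is None and f[j] >= theta0:
            start = o                      # segment start: f >= theta0
        elif start is not None and f[j] == 1:
            segments.append((start, o))    # o is the only onset in its window
            start = None
    if start is not None:                  # close a segment left open at the end
        segments.append((start, onsets[-1]))
    return segments
```

With two well-separated chewing bouts, each bout is returned as one segment spanning its first and last onset.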


#### 3.1.4. Fusion of Multi-Source Detection

The fusion of *N* sensor or feature channels was made by taking the union of source-specific chewing segments:

$$T\_{\text{merge}} = \bigcup\_{n=1}^{N} \bigcup\_{k=1}^{K\_n} \left[ t\_{\text{n, start}}[k], \ t\_{\text{n, end}}[k] \right] \tag{1}$$

where *T*merge was a list of the merged chewing segments of *N* sources, and *K*<sup>n</sup> was the number of chewing segments in Channel *n*. All detected segments were collected chronologically regardless of any overlapping among sources. For the evaluation data used in this investigation, bilateral EMG channels yielded two lists of chewing segments. Hence, *N* = 2.
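Equation (1) amounts to a chronological union of the per-source segment lists, sketched here with (start, end) tuples; the helper name is an assumption:

```python
def fuse_sources(per_source_segments):
    """Chronologically merge chewing segments from N sources (Equation (1)),
    keeping overlapping segments as-is; overlaps are resolved later by
    gap elimination."""
    return sorted(seg for segs in per_source_segments for seg in segs)
```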

#### 3.1.5. Gap Elimination

In free-living, eating is often accompanied by interrupts (e.g., conversations). Thus, an eating event is usually represented by several chewing segments in *T*merge, where the gaps indicate interrupts without chewing cycles. Depending on the detection application and choice of eating event definition, it is reasonable to combine temporally close segments into one final eating event. We denote the start and end of the *k*-th segment *T*seg[*k*] in *T*merge as *start*[*k*] and *end*[*k*] respectively, and the gap between *T*seg[*k*] and *T*seg[*k* + 1] as *T*gap[*k*]. We generated a new list *T*concatenated by removing all gaps that were smaller than *t*gap:

$$T\_{\text{concatenated}} = \bigcup\_{k \in S} \left( T\_{\text{seg}}[k] \cup T\_{\text{gap}}[k] \cup T\_{\text{seg}}[k+1] \right), \tag{2}$$

where

$$S = \{ k \mid \operatorname{start}[k+1] - \operatorname{end}[k] \leq t\_{\text{gap}} \}. \tag{3}$$

An estimated eating event start *T*ˆ start[*q*] and end *T*ˆ end[*q*] with (*q* = 1, 2, ..., *Q*) were thus obtained as the start and end of every segment in *T*concatenated, where *Q* was the number of segments (i.e., detected eating events) in *T*concatenated. In the present investigation, *t*gap was set to 5 min.
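The gap elimination can be sketched as follows, assuming segments as (start, end) pairs in seconds and the paper's *t*gap = 5 min default; `eliminate_gaps` is a hypothetical name:

```python
def eliminate_gaps(segments, t_gap=300.0):
    """Merge chronologically adjacent (start, end) chewing segments whose
    separating gap is at most t_gap seconds (Equations (2)-(3))."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= t_gap:
            # absorb the gap: extend the previous segment to cover this one
            merged[-1] = (merged[-1][0], max(end, merged[-1][1]))
        else:
            merged.append((start, end))
    return merged
```

Each entry of the returned list corresponds to one estimated eating event with its start and end time.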

#### *3.2. Top-Down Algorithms*

Two top-down algorithm variants were considered with different chewing segment detection blocks (see Figure 1): Threshold-based top-down and ocSVM top-down. Several blocks of the top-down and bottom-up pipelines were identical, including signal pre-processing (Section 3.1.1), fusion of multi-source detection (Section 3.1.4), and gap elimination (Section 3.1.5). Here we concentrate on the individual variants of the chewing segment detection.

#### 3.2.1. Threshold-Based Top-Down Algorithm

A sliding window of size *w*<sup>1</sup> and step size *s*<sup>1</sup> was applied to *X*n. We computed the chewing intensity feature *F* in each sliding window and applied threshold *θ*1. If *F* > *θ*1, the window was reported as chewing. For the present investigation, we considered EMG readings as time series containing chewing information and extracted EMG work as chewing intensity feature *F*. EMG work was defined as the summation of rectified EMG samples within the sliding window. For the EMG data, *s*<sup>1</sup> was 256 samples (1 s). The pseudo code is shown in Algorithm block 4.

#### **Algorithm block 4** : Chewing segment detection.

**Input:** Preprocessed signals *X*n

**Parameter:** Sliding window size *w*1, window step size *s*1, chewing intensity feature threshold *θ*<sup>1</sup>

**Output:** Detected eating starts/ends from each signal source *n*: *t*n, start and *t*n, end

```
1: tn, start ← ∅, tn, end ← ∅
2: for (i = s1, i < length(Xn) − w1, i += s1) do
3:   extract Fprevious from Xn[i − s1 : i + w1 − s1]
4:   extract Fcurrent from Xn[i : i + w1]
5:   extract Fnext from Xn[i + s1 : i + w1 + s1]
```
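A minimal Python sketch of the threshold rule described in the text (report a window as chewing when its EMG work exceeds *θ*1); the helper name and the list-of-windows return format are assumptions:

```python
import numpy as np

def emg_work_windows(x, w1, s1, theta1):
    """Slide a w1-sample window with step s1 over signal x; a window is
    reported as chewing when its EMG work (sum of rectified samples)
    exceeds theta1. Returns (start, end) sample indices per window."""
    chewing = []
    for i in range(0, len(x) - w1 + 1, s1):
        if np.abs(x[i:i + w1]).sum() > theta1:
            chewing.append((i, i + w1))
    return chewing
```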


#### 3.2.2. ocSVM Top-Down Algorithm

We applied a non-overlapping sliding window of size *w*<sup>2</sup> to the EMG data. An ocSVM model was trained based on the windows to detect chewing segments using the same features as described in [13]. The radial basis function (RBF) was used as the kernel. The hyper-parameters *γ* and *ν* were varied, where *γ* weighted the influence of the non-support vectors on the hyperplane, and *ν* was an upper bound on the fraction of margin errors as well as a lower bound on the fraction of support vectors relative to the number of training samples. The ocSVM predicted the class of each sliding window as either eating or non-eating.

#### **4. Evaluation Methodology**

We evaluated the algorithms using a free-living dataset collected from smart eyeglasses with integrated EMG electrodes. Details of the eyeglasses design and data collection process can be found in [17]. Here we summarise the relevant data collection procedures, as well as evaluation methods.

#### *4.1. Participants and Recording Protocol*

The dataset was collected from a group of 10 participants (6 male, 4 female, average age of 25.1 years, average BMI of 23.8 kg/m²) each wearing the smart eyeglasses for one day of regular activity without script or specific protocol. The study was approved by the Ethical Committee of FAU Erlangen-Nürnberg. All participants were healthy and consented to participate after having received oral and written study information.

Each participant received a pair of 3D-printed smart eyeglasses mechanically fitted to their head using a personalisation procedure similar to [22], ensuring that the effect of hair, loss of contact between skin and electrodes, or movement was minimal. In each temple of the eyeglasses frame, dry stainless-steel electrodes of 3 mm × 20 mm (EL-DRY-STEEL-5-20, BITalino, Lisbon, Portugal) were integrated, yielding a two-channel EMG recording system on each side of the head. The EMG electrode pairs were positioned to capture activity of the temporalis muscle. A reference EMG channel was recorded from the right temporalis muscle via gel electrodes attached to the skin at the corresponding forehead region. All EMG channels were acquired with an EMG recorder (ACTIWAVE, CamNtech, Cambridgeshire, United Kingdom) at a sampling rate of 256 Hz per channel.

Participants were asked to wear the eyeglasses during one entire recording day (i.e., attaching the system right after getting up and removing it before going to bed at night). Recordings were conducted in free-living conditions without dietary constraints. Participants chose their diets and conducted other daily activities at their choice. Participants were asked to log activities in a paper-based 24-h activity journal with 1 min resolution, including any food intake as well as start and end times of eating events. The study setup is illustrated in Figure 2.

**Figure 2.** Illustration of the EMG eyeglasses and study: (**A**) Eyeglasses frame with electromyographic (EMG) electrodes symmetrically integrated on the temples. (**B**) Study participant wearing the EMG eyeglasses. Reference EMG electrodes were attached to the skin at the right forehead temporalis muscle position.

#### *4.2. Data Corpus*

By the end of the recording, we collected a total of 122.3 h of free-living data including 44 eating events ranging from 54 s to 35.8 min, which summed up to 429 min of eating for all participants combined. Eating took up 5.8% of the whole dataset. Participants took off the eyeglasses for a total of 12 min during the recordings, which corresponds to 0.16% of the total recordings. Known activities reported by participants in the activity journal included cooking, eating, walking, transportation, attending lectures, performing office work, having conversations, doing housework, brushing teeth, playing video games, going to the cinema, and engaging in physical exercise. Through visual inspection we observed various artefacts in the data corpus including, for example, suspected teeth grinding [17].

#### *4.3. Free-Living Eating/Non-Eating Reference Construction*

Obtaining accurate reference information on eating events in unsupervised free-living studies is particularly challenging. Here, we propose a combination of participant activity journal and EMG reference recordings. All eating events were annotated using a custom Matlab annotation software. Our annotation process comprised two steps: coarse manual annotation using the activity journal and fine-tuning through reference EMG recordings. Coarse manual annotation was realised by searching the journal for the participant-logged start time *T*start[*i*] and end time *T*end[*i*] of each annotated eating event, indexed *i*. As manual journaling is often imprecise in identifying event times, a fine-tuning step was used to adjust coarse eating event times: Start and end times *T*S[*i*] and *T*E[*i*] of eating event *i* were adjusted by visually searching the reference EMG data for chewing cycle patterns in the neighbourhood of approx. ± 1 min (journal resolution) around the coarse annotations *T*start[*i*] and *T*end[*i*]. Since each chewing cycle had a duration of around 1/3 s, the fine-tuned eating event labels *T*S[*i*] and *T*E[*i*] resulted in a chew-accurate eating/non-eating reference with resolution of approximately 1/3 s. The derived start and end times were considered as eating/non-eating reference for algorithm evaluation. The eating/non-eating reference construction is illustrated in Figure 3.

**Figure 3.** Illustration of the free-living eating/non-eating reference construction. *T*start and *T*end are start and end times of an eating event obtained from the participant journal, while *T*<sup>S</sup> and *T*<sup>E</sup> are the corrected start and end times derived by searching the EMG reference ± 1 min around *T*start and *T*end. The eating/non-eating reference construction is described in Section 4.3.

Type 1 errors (false positives) could occur in the eating/non-eating reference if an activity journal entry could not be matched to any chewing-like pattern in the reference EMG signal. We inspected all entries in the participant journal and compared them to the reference EMG signal. In the present dataset, all participant-annotated events could be matched to the EMG reference.

Type 2 errors (false negatives) could occur in the eating/non-eating reference if participants omitted annotations. To amend potential omissions from the activity journal, we first inspected the entire reference EMG data for chewing-like signal patterns that did not correspond to any entry in the journal. For each chewing-like pattern found, we inspected the activity journal to obtain insight into the participant's momentary context. We observed that concise activations in the EMG reference occurred occasionally without corresponding eating annotations (e.g., during a lecture). Yet, EMG activations were typically short (i.e., less than five consecutive activations with lower EMG work compared to confirmed chewing). Given a non-eating context and the clear non-chewing signal patterns, we attributed the activations to teeth grinding. Jaw motion during speaking does not involve profound temporalis muscle activation, as there is hardly any teeth clenching and thus substantially lower EMG work than during chewing [16]. In addition, non-chewing muscle activity is typically non-periodic, thus observable and distinguishable during time series inspection. Overall, we did not find Type 2 errors in the dataset, supporting our eating/non-eating reference construction approach for free-living recordings.

#### *4.4. Evaluation Metrics*

A grid search over the window length parameters *w*i and thresholds *θ*i, with *i* = 0, 1, 2 and *θ*2 = (*γ*, *ν*) representing the combination of the ocSVM hyper-parameters, was performed to investigate optimal parameter combinations. To evaluate the eating event detection algorithms, we derived the overlap between retrieved eating events and the eating/non-eating reference labels. The precision and recall of each algorithm were calculated as *Recall* = *T*tp/*T*gt and *Precision* = *T*tp/*T*ret, where *T*gt was the summed duration of all *P* eating events according to the constructed eating/non-eating reference labels, calculated as:

$$T\_{\rm gt} = \sum\_{p=1}^{P} \left( T\_{\rm end}[p] - T\_{\rm start}[p] \right) \,, \tag{4}$$

while *T*ret was the summed duration of all *Q* detected eating events by the algorithm:

$$T\_{\text{ret}} = \sum\_{q=1}^{Q} \left( \hat{T}\_{\text{end}}[q] - \hat{T}\_{\text{start}}[q] \right), \tag{5}$$

and *T*tp was the summed overlap duration between retrieved eating events and the eating/non-eating reference:

$$T\_{\rm tp} = \sum\_{p=1}^{P} \sum\_{q=1}^{Q} \left( \min(T\_{\rm end}[p], \hat{T}\_{\rm end}[q]) - \max(T\_{\rm start}[p], \hat{T}\_{\rm start}[q]) \right), \tag{6}$$

given the following premise:

$$\min\left(T\_{\text{end}}[p], \hat{T}\_{\text{end}}[q]\right) - \max\left(T\_{\text{start}}[p], \hat{T}\_{\text{start}}[q]\right) > 0. \tag{7}$$

*T*ˆ start[*q*] and *T*ˆ end[*q*] were the start and end time points of the *q*-th retrieved eating event, *Q* was the number of retrieved eating events, and *P* was the number of eating events in the eating/non-eating reference. All times were computed at a resolution of one sample (1/256 s). Finally, the F1 score was calculated as the harmonic mean of precision and recall.
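The duration-based metrics of Equations (4)–(7) can be sketched as follows; the helper name is an assumption, and events are (start, end) pairs in seconds:

```python
def retrieval_metrics(reference, retrieved):
    """Duration-based precision, recall, and F1 (Equations (4)-(7)) for
    lists of (start, end) eating events."""
    t_gt = sum(e - s for s, e in reference)    # Eq. (4): total reference time
    t_ret = sum(e - s for s, e in retrieved)   # Eq. (5): total retrieved time
    t_tp = 0.0
    for p_start, p_end in reference:           # Eq. (6): summed overlaps
        for q_start, q_end in retrieved:
            overlap = min(p_end, q_end) - max(p_start, q_start)
            if overlap > 0:                    # premise (7): true overlaps only
                t_tp += overlap
    recall, precision = t_tp / t_gt, t_tp / t_ret
    return precision, recall, 2 * precision * recall / (precision + recall)
```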

The evaluation was performed using leave-one-participant-out (LOPO) cross-validation. In each evaluation fold, the EMG data were split into a training set of nine participants and a test set of one participant. This process was repeated ten times, so that every participant's data appeared in the test set exactly once. The training data were used in a grid search to estimate performance under different parameter combinations. The optimal parameter combination was chosen according to the training data performance and applied to the test data to estimate algorithm performance. The test results of all folds were averaged to obtain the total algorithm performance. For the bottom-up algorithm, *w*0, *θ*0, and *θ*<sup>P</sup> were analysed. For the threshold-based top-down algorithm, *w*<sup>1</sup> and *θ*<sup>1</sup> were analysed, and for the ocSVM top-down algorithm, *w*2, *γ*, and *ν* were analysed.
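The LOPO procedure with per-fold grid search can be outlined as follows. This is a schematic sketch under stated assumptions: `train_score` and `test_score` are hypothetical callables wrapping an algorithm run and returning its F1 score, and the actual per-algorithm evaluation is of course more involved:

```python
from itertools import product

def lopo_grid_search(data_by_participant, param_grid, train_score, test_score):
    """Leave-one-participant-out evaluation with a per-fold grid search.

    data_by_participant: dict mapping participant id -> recording data
    param_grid: dict mapping parameter name -> list of candidate values
    train_score(params, train_data), test_score(params, test_data):
        assumed evaluation callables returning an F1 score
    """
    fold_results = []
    for held_out in data_by_participant:
        # split: all other participants form the training set
        train = {p: d for p, d in data_by_participant.items() if p != held_out}
        # grid search on the training folds only
        best_params, best_f1 = None, float("-inf")
        for combo in product(*param_grid.values()):
            params = dict(zip(param_grid, combo))
            f1 = train_score(params, train)
            if f1 > best_f1:
                best_params, best_f1 = params, f1
        # apply the selected parameters to the held-out participant
        fold_results.append(test_score(best_params, data_by_participant[held_out]))
    # average over all folds to obtain the total algorithm performance
    return sum(fold_results) / len(fold_results)
```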

#### *4.5. Detection Timing Errors*

We further investigated the detection timing error of every algorithm. The average start and end timing errors of the algorithms were calculated as follows:

$$\Delta \overline{T}_{\mathrm{S}} = \frac{\sum_{q=1}^{Q} \min_{p=1,\dots,P} \left| \hat{T}_{\mathrm{S}}[q] - T_{\mathrm{S}}[p] \right|}{Q}, \tag{8}$$

and

$$\Delta \overline{T}_{\mathrm{E}} = \frac{\sum_{q=1}^{Q} \min_{p=1,\dots,P} \left| \hat{T}_{\mathrm{E}}[q] - T_{\mathrm{E}}[p] \right|}{Q}. \tag{9}$$

Δ*T*<sup>S</sup> and Δ*T*<sup>E</sup> were the average absolute detection errors at the start and end of eating events, where each retrieved event boundary was compared against the nearest reference event boundary.
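Equations (8) and (9) reduce to a nearest-boundary search per retrieved event. A minimal sketch (`mean_timing_errors` is a hypothetical helper, not the authors' code):

```python
def mean_timing_errors(ref, ret):
    """Average absolute start/end timing errors (Eqs. 8 and 9).

    ref: list of (start, end) reference eating events
    ret: list of (start, end) retrieved eating events
    Each retrieved boundary is matched to its nearest reference boundary.
    """
    if not ret:
        return None, None  # timing errors undefined without retrievals
    # Eq. (8): mean distance of each retrieved start to the nearest reference start
    d_s = sum(min(abs(s_q - s_p) for s_p, _ in ref) for s_q, _ in ret) / len(ret)
    # Eq. (9): mean distance of each retrieved end to the nearest reference end
    d_e = sum(min(abs(e_q - e_p) for _, e_p in ref) for _, e_q in ret) / len(ret)
    return d_s, d_e
```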

To investigate retrieval performance in detail and characterise the algorithms' behaviour, different optimisation objectives were analysed. Using the grid search over the parameter space, the best performance points according to maximal F1 score (termed *P*X), minimal start timing error Δ*T*<sup>S</sup> (termed *P*S), and minimal end timing error Δ*T*<sup>E</sup> (termed *P*E) were derived.

#### **5. Results**

Algorithm detection performances according to the test data are shown in Figure 4 for varying parameter combinations. The threshold-based top-down algorithm could not reach meaningful F1 scores, indicating that detecting eating events is not a trivial task. The performance map of the ocSVM algorithm shows a periodic landscape due to the variation of parameters *γ* and *ν*. The best performance of the bottom-up algorithm was achieved with *θ*<sup>P</sup> = 0.7. The bottom-up algorithm had a smooth landscape across the parameters. For all algorithms, the three performance points (*P*X, *P*S, *P*E) did not coincide at the same parameter settings. The performance points are summarised quantitatively in Table 1. The bottom-up algorithm yielded comparable performance values across all performance points (*P*X, *P*S, *P*E). At best, the bottom-up algorithm reached an F1 score of 99.2%, yielding start/end errors (Δ*T*<sup>S</sup> and Δ*T*E) of 2.4 ± 0.4 s and 4.3 ± 0.4 s, respectively. The results show that the bottom-up algorithm outperformed the top-down algorithms.

**Table 1.** Performance comparison among algorithms using optimal parameter settings for each performance point (*P*X, *P*S, *P*E). For timing metrics, mean performance ± std. dev. are shown. For example, the bottom-up algorithm reached an F1 score of 99.2% at best, where the start/end error was 2.4 ± 0.4 s and 4.3 ± 0.4 s, respectively.


**Figure 4.** F1 score, average start and end timing errors for test data and each eating event detection algorithm using grid search over the parameter space. The highest F1 score location was denoted as *P*X, while *P*<sup>S</sup> and *P*<sup>E</sup> indicate the minimal start timing error Δ*T*<sup>S</sup> and minimal end timing error Δ*T*E, respectively. The bottom-up algorithm performance was obtained with fixed peak detection threshold *θ*<sup>P</sup> = 0.7.

Figure 5 shows the effect of varying the peak detection threshold *θ*<sup>P</sup> of the bottom-up algorithm, indicating robust retrieval and timing performance (*P*X, *P*S, *P*E) for a parameter range of 0.65 < *θ*<sup>P</sup> < 0.8. The best retrieval and timing performances were achieved at *θ*<sup>P</sup> = 0.7.

**Figure 5.** Retrieval and timing performance of the bottom-up algorithm at different peak detection thresholds *θ*P. In the timing error diagrams, caps on vertical line ends indicate the standard deviation.

Figure 6 illustrates retrieved eating events as point pairs across F1 scores, where the line ends represent average start and end timing errors (Δ*T*<sup>S</sup> and Δ*T*E). Δ*T*<sup>S</sup> and Δ*T*<sup>E</sup> were obtained by varying the algorithm parameters and averaging the individual timing errors obtained for specific retrieval performances. For the bottom-up algorithm, the graph shows the performance obtained by varying sliding window size *w*<sup>0</sup> and chewing cycle frequency threshold *θ*<sup>0</sup> at fixed peak detection threshold *θ*<sup>P</sup> = 0.7. There was no parameter combination for the threshold-based top-down algorithm that yielded an F1 score above 40%. In contrast, the bottom-up and ocSVM top-down algorithms provided retrieval performances of up to 99% and 95%, respectively. With increasing F1 score, timing errors tended to decline. It can be derived from Figure 6 that the relation between start and end timing errors varied between algorithms. For the bottom-up algorithm at F1 scores above 80%, the start timing error Δ*T*<sup>S</sup> became smaller than the end timing error Δ*T*E.

**Figure 6.** Relation of retrieval and timing performance of all three algorithms. Δ*T*<sup>S</sup> and Δ*T*<sup>E</sup> were obtained by varying algorithm parameters. Blue lines link average start and end timing errors of all eating events at a given algorithm parameter set. With increasing F1 score, timing errors declined. Note that timing error analysis could be performed only for eating events retrieved by an algorithm. The bottom-up algorithm (*θ*<sup>P</sup> = 0.7) achieved the highest F1 score at smallest timing errors among all algorithms investigated. Point pairs were down-sampled for visualisation.

Figure 7 shows examples of the detected eating event starts and ends. The bottom-up algorithm yielded similar detected labels to the eating/non-eating reference whereas the ocSVM top-down algorithm incurred larger timing errors for some eating event instances.

**Figure 7.** *Cont*.

**Figure 7.** Examples of data situations with the corresponding retrieval results of the bottom-up and ocSVM top-down algorithms obtained at each algorithm's performance point *P*<sup>S</sup> (left column) and *P*<sup>E</sup> (right column). As the diagrams illustrate, the ocSVM algorithm may anticipate or delay the start of eating events, as ocSVM uses time-domain sliding windows with a fixed step size, whereas the bottom-up algorithm does not.

#### **6. Discussion**

The F1 score describes an algorithm's retrieval performance in terms of retrieved and missed eating instances, while timing errors reveal the accuracy of estimated event timing. Considering the varying eating durations in a free-living context, the two metrics are not necessarily similar in their sensitivity; thus, we argue that both are relevant metrics for evaluation. Among the few investigations on event timing in ADM, Dong et al. [4] reported event start-timing errors of 0.6 min and end-timing errors of 1.5 min. The authors determined intake from bites using arm motion, while the present investigation was based on chewing. Bedri et al. [11] evaluated eating event detection using a metric called delay, measuring the time from the beginning of an eating event until it was recognised. The average delay reported was 65.4 s. In contrast to the investigation of Bedri et al. [11], we also evaluated the timing error at the end of eating events. Our bottom-up algorithm yielded average start/end timing errors of 2.4 s and 4.3 s.

We believe that the bottom-up method is practically useful for detecting the start and end of eating events, as well as for downstream applications such as sending reminders, sampling user responses, and gathering environmental variables. Study participants did not complain about, or reject, wearing the eyeglasses for one day. Hence, the combination of the bottom-up algorithm and smart eyeglasses could be adopted in unconstrained free-living applications. In contrast to several previous investigations of eating detection that require the training of many parameters, our bottom-up approach requires that only four parameters be set (*w*0, *θ*0, *θ*P, and *t*gap). Our analysis indicates that performance was unaffected by parameter changes across a wide value range (i.e., shown as a smooth performance space in Figure 4). Pattern learning may work reliably when trained on sufficient data with proper features. However, considering the variability in free-living behaviour and the unbalanced distribution of eating and non-eating times, substantial training data is needed to implement any learning method; a minimal number of free parameters is therefore key. The bottom-up method outperformed our top-down methods, with a higher F1 score and lower detection timing errors. We attribute the higher performance yielded in the present investigation to the expert knowledge incorporated in the bottom-up approach.

In both the top-down and bottom-up methods, the sliding window size *w*<sup>i</sup> influenced algorithm performance. In the top-down methods, a small sliding window of length *w*<sup>i</sup> contained fewer data samples, which usually led to less representative features. Thus, the lowest timing errors were typically not achieved with the smallest sliding window sizes (e.g., *w*<sup>i</sup> < 10 s). Similarly, in the bottom-up method, both the window size *w*<sup>0</sup> and the chewing cycle frequency threshold *θ*<sup>0</sup> influenced the detection performance. Hence, a small window size *w*<sup>0</sup> did not always give the best performance.

The timing errors of the top-down methods were highly dependent on the combination of sliding window length and window step size. Large sliding windows included more dietary activity information, but usually failed to accurately detect the starts and ends of eating events, as the window was filled with both eating and non-eating data. Figure 7 clearly shows that the ocSVM top-down algorithm indeed incurred larger timing errors due to the larger sliding window size. In our previous investigation [13], we adopted window overlaps and majority voting on windows, with differing results. We observed that retrieval performances differed only marginally when comparing overlapping and non-overlapping windowing approaches. The bottom-up algorithm was not affected by the window parameterisation problem, as its window step size is determined by the distance between neighbouring chewing onsets. Thus, eating and non-eating rarely coincided in one window.

The bottom-up algorithm is based on chewing cycle detection, which decouples the eating event detection from the sensor type. The detection leverages event frequency information (i.e., chewing cycle frequencies), which can be obtained with different chewing monitoring approaches. We expect that the algorithms could be applied with various sensors or sources that provide chewing cycle information, including acoustics [1], ear canal deformation [15], strain on head skin [19], eyeglasses temple motion [18], etc.
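As a rough illustration of the bottom-up idea, grouping detected chewing cycle onsets into chewing sequences by their inter-onset frequency can be sketched as follows. This is a simplified, hypothetical reading of the approach (a single frequency threshold only, omitting the sliding window *w*0 and the later gap elimination step), not the authors' implementation:

```python
def group_chewing_onsets(onsets, freq_threshold):
    """Merge consecutive chewing-cycle onsets into chewing sequences.

    onsets: sorted chewing-cycle onset times in seconds
    freq_threshold: minimal instantaneous frequency in Hz
        (1 / inter-onset interval) for two onsets to belong
        to the same chewing sequence
    Returns a list of (start, end) sequence bounds.
    """
    if not onsets:
        return []
    sequences, current = [], [onsets[0]]
    for prev, nxt in zip(onsets, onsets[1:]):
        if 1.0 / (nxt - prev) >= freq_threshold:
            current.append(nxt)          # same chewing sequence
        else:
            sequences.append(current)    # pause too long: close sequence
            current = [nxt]
    sequences.append(current)
    return [(seq[0], seq[-1]) for seq in sequences]
```

With onsets at 0, 0.8, and 1.6 s, a long pause, and onsets at 10 and 10.7 s, a 1 Hz threshold yields two sequences: (0, 1.6) and (10, 10.7).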

The present investigation analysed relevant free parameters of the proposed algorithms to determine their stability. For example, the sweep of the peak detection threshold *θ*<sup>P</sup> showed desirable performance trends (Figure 5), allowing us to set *θ*<sup>P</sup> within a suitable range of approximately [0.65, 0.8]. In addition, the pipeline block "gap elimination" used the parameter *t*gap = 5 min to merge temporally close eating detections. The parameter *t*gap supports our informal definition of eating events as temporally linked sequences of dietary activities during one meal or snack [3] and was set based on experience. Varying *t*gap changes the representation of eating occasions (i.e., meals and snacks), which is outside the scope of this investigation.
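The "gap elimination" step described above amounts to merging detections separated by less than *t*gap. A minimal sketch, assuming detections as (start, end) pairs in seconds (not the authors' code):

```python
def eliminate_gaps(events, t_gap=300.0):
    """Merge detections whose inter-event gap is below t_gap seconds
    (corresponding to the pipeline's gap elimination with t_gap = 5 min).

    events: iterable of (start, end) detections in seconds
    Returns a sorted list of merged (start, end) eating events.
    """
    merged = []
    for start, end in sorted(events):
        if merged and start - merged[-1][1] < t_gap:
            # gap shorter than t_gap: extend the previous event
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, detections at (0, 60) s and (200, 260) s are separated by a 140 s gap and merge into one event (0, 260), while a detection starting at 1000 s remains separate.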

While this investigation focused on retrieval performance, the computational complexity of the algorithms is an important consideration for resource-limited wearable systems. For a single detection, the computational complexity is O(*n*) for the threshold-based top-down algorithm and O(*n*sv × *n*) for the ocSVM top-down algorithm, where *n* is the input data dimension and *n*sv is the number of support vectors of the ocSVM model. The complexity of the bottom-up algorithm is determined by the chewing cycle detection method; for the proposed bottom-up algorithm, the corresponding complexity is O(*n*). With a proper chewing cycle detection approach, the bottom-up algorithm is thus suitable for execution on, for example, wearables at minimal computational cost. The delay due to processing was not addressed in this investigation. However, given the low complexity of all algorithms, the processing delay is expected to have a negligible effect compared to the algorithm timing errors.

This investigation was supported by a new method to obtain reference data on eating times in a free-living context, in which we combined the participants' activity journals with reference EMG measurements. While the activity journals yielded rather coarse timing, they provided us with context information on the users' behaviour. The reference EMG measurement complemented the journal with accurate timing resolution of individual chewing cycles. However, adherence to journals is known to decline quickly over several days of measurement [23]. Hence, it is reasonable to assume that journals alone would be too inaccurate. We avoided video recordings for constructing the eating/non-eating reference due to privacy concerns and the potential impact of cameras on natural, free-living behaviour.

One limitation of our study is that only young healthy participants were involved. For other populations, the eating structure could vary, which could generate different eating durations. However, our present investigation already showed that eating events ranging from short snacks of 54 s to 35.8 min meals could be recognised. Other populations may benefit from different pre-processing steps or other sensors to apply the discussed bottom-up algorithm. We are planning longer-term studies in the future.

#### **7. Conclusions**

We proposed a bottom-up eating event detection algorithm that uses chewing cycle information as input and compared it to two top-down algorithms, a threshold-based algorithm and an ocSVM algorithm. Evaluation of the algorithms was performed using free-living data from smart eyeglasses recording EMG data bilaterally from the temporalis muscles. Our results indicate that the F1 score became less meaningful at high retrieval rates above 0.9. The analysis of timing errors revealed substantial differences, of several tens to hundreds of seconds on average, between the top-down and bottom-up algorithms. The grid search analysis showed smooth performance transitions during parameter variation for the bottom-up algorithm. We conclude that timing error analysis is an important component of performance estimation, besides a relevant retrieval metric such as the F1 score. We suggest that the research community report timing errors (e.g., using the metrics described in this work). The bottom-up algorithm yielded the overall best results, with the lowest timing errors of 2.4 ± 0.4 s for eating start and 4.3 ± 0.4 s for eating end. The bottom-up algorithm is thus suitable for eating event detection.

**Author Contributions:** R.Z. and O.A. devised the methodology. R.Z. performed data curation and implemented the algorithms. O.A. provided feedback throughout the implementation phase. R.Z. and O.A. prepared the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research did not receive external funding.

**Acknowledgments:** The present work was performed in partial fulfilment of the requirements for obtaining the degree "Dr. rer. biol. hum." We are thankful to the participants for the time and effort spent in the study.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
