Anomaly Detection Framework for Wearables Data: A Perspective Review on Data Concepts, Data Analysis Algorithms and Prospects

Sunny, Jithin S.; Patro, C. Pawan K.; Karnani, Khushi; Pingle, Sandeep C.; Lin, Feng; Anekoji, Misa; Jones, Lawrence D.; Kesari, Santosh; Ashili, Shashaanka

doi:10.3390/s22030756

Open Access Review

by Jithin S. Sunny ¹,†, C. Pawan K. Patro ²,*,†, Khushi Karnani ¹, Sandeep C. Pingle ², Feng Lin ², Misa Anekoji ², Lawrence D. Jones ², Santosh Kesari ³,⁴ and Shashaanka Ashili ²

¹ Rhenix Lifesciences, Hyderabad 500038, India
² CureScience, San Diego, CA 92121, USA
³ Pacific Neuroscience Institute, Providence Saint John’s Health Center, Santa Monica, CA 90404, USA
⁴ Department of Translational Neurosciences, Saint John’s Cancer Institute at Providence Saint John’s Health Center, Santa Monica, CA 90404, USA
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Sensors 2022, 22(3), 756; https://doi.org/10.3390/s22030756

Submission received: 16 December 2021 / Revised: 11 January 2022 / Accepted: 15 January 2022 / Published: 19 January 2022

(This article belongs to the Section Wearables)

Abstract

Wearable devices use sensors to evaluate physiological parameters, such as the heart rate, pulse rate, number of steps taken, body fat and diet. The continuous monitoring of physiological parameters offers a potential solution for assessing personal health. Identifying outliers or anomalies in heart rates and other features can reveal patterns that play a significant role in understanding the underlying causes of disease states. Because anomalies are embedded within the vast amount of data generated by wearable device sensors, identifying them requires accurate automated techniques. Given the clinical significance of anomalies and their impact on diagnosis and treatment, a wide range of methods have been proposed to detect them. Much of what is reported herein is based on previously published literature. Clinical studies employing wearable devices are also increasing. In this article, we review the nature of wearables-associated data and the downstream processing methods for detecting anomalies. In addition, we review supervised and unsupervised techniques, as well as semi-supervised methods that overcome the challenges of missing and unannotated healthcare data.

Keywords:

anomaly detection; heart rate; wearables; missing data; machine learning

1. Introduction

Wearable technologies are revolutionizing healthcare monitoring and the medical field as a whole. Wearable devices allow real-time monitoring of vital signs, such as the heart rate, and of other parameters, such as the number of steps taken, calories and elevation [1]. These devices enable the continuous and longitudinal monitoring of these physiological parameters. A key advantage of such a system is that it can be used anywhere and at any time.

Given the clinical significance of wearable devices and the physiological parameters they measure, wearable devices are likely to play a role in reducing health check-up costs and the burden on already overloaded healthcare systems around the world. The role of wearables in the future of precision health is currently being developed on a large scale [2]. To obtain an overview of the number of studies in the field of wearables, we performed a keyword-based search for the term “wearable sensors healthcare monitoring” in the Google Scholar database (accessed 18 October 2021). The number of wearables-related studies has increased markedly over the last two decades (Figure S1).

The increase in wearables-associated studies published over the last two decades indicates growing interest in wearable applications and in advancing the associated data analysis. The trend also reflects the growth of the wearables market, with several companies now developing biosensors for monitoring physiological parameters.

Several smart wearable devices and/or applications, ranging from smart clothes to skin-mounted devices and other gadgets, have been developed [3]. Popular clinical and research-grade wearable sensor technologies are manufactured by Ava Science, Abbott, Zoll and Medtronic, among others, and devices from these manufacturers have been approved by the United States (US) Food and Drug Administration (FDA). Fitbit®, together with Jawbone®, has the largest user base. These devices have shown good accuracy in measuring user physiological parameters, which has led to their use in various scientific studies.

Apart from these devices, the Apple® Watch 2, Samsung Gear S3®, Xiaomi Mi® and Huawei Talk Band B2® also provide healthcare monitoring [4,5]. In addition to these established names in the wearables industry, several other manufacturers produce devices, including Welch Allyn, Scanadu Scout and iHealth-finger devices, and these devices have been used in clinical studies [6]. Regarded as a medical revolution, these devices provide continuous, longitudinal monitoring.

Processing the physiological parameters recorded by wearable devices, such as heart rate, blood pressure and body temperature, can provide a clinical yield that may help assess patient health. Because these features are available continuously within well-defined time and date frames, many unique data points become available for each patient as well as for larger population groups. In addition, such continuous monitoring may foster more efficient and reliable diagnoses.

Progress in wearable technologies has also facilitated the development of algorithms for automated health event prediction, along with approaches for prevention and focused clinical intervention [7]. Extensive reviews of remote patient health sensing and sensor development have been published in the past and continue to appear [8,9,10,11]. Literature reviews and meta-analyses have also focused on wearable-based interventions, specifically their role in enabling a healthier lifestyle [12].

In this review article, we address the process of detecting clinically relevant features extracted from wearable sensors and the associated data. As part of this process, a major task is to identify relevant data points or instances from raw wearable outputs which are indicative of patient health. For example, the heart rate is a vital physiological parameter, and abnormal heart rates that span a period of time can be translated into indicators of various diseases by utilizing mathematical models, which are elaborated from Section 3 onwards.

These abnormal data points are called anomalies, and this review focuses primarily on understanding and reviewing algorithms that can detect anomalies, in addition to decision-based systems that can handle constantly evolving personalized data [13]. Although anomaly detection methods exist, the current literature still faces several challenges. The first hurdle is evaluating the detected anomalies and distinguishing true positives from false positives.

Secondly, various statistical and machine-learning-based anomaly detection techniques are employed depending on the field of application, such as bank fraud, malware detection and healthcare. However, anomaly prediction in each field relies on trends and signatures that are unique to that field [14]. In healthcare, applying existing anomaly detection methods would require significant restructuring and domain-specific assumptions [15]. In this article, we provide an overview of anomaly detection, data types, imputation strategies and the prospects of the field.

2. Overview of Anomaly Detection

This section details the data types and strategies involved in analyzing wearable data to detect anomalies. Anomaly detection involves finding patterns in heart rate and steps, among other parameters, that differ significantly from the remaining data. Anomalies in heart rate typically translate into significant and often actionable information.

In addition to the enormous value of heart data, other physiological data that can be collected from wearables include steps, blood pressure, respiration rate, SpO2 levels, electrocardiogram, calories and skin temperature [16,17]. Various studies have also combined capnography, stroke volume, pain, level of consciousness and urine output to accurately determine the associated physiological changes in patients. Recent studies, discussed further below, have shown that a few traditional vital parameters are crucial and can accurately evaluate human health.

Heart rate is considered a standard vital sign reflecting changes in the cardiac cycle. Recent studies show increasing use of this primary attribute to infer various cardiac pathologies [18,19,20]. Evidence is mounting in support of using heart rate data to assess and prevent cardiovascular disease [21]. A high resting heart rate correlates with an increased risk of coronary artery disease (CAD) [22]. Even in healthy people, heart rate monitoring gives insight into normal cardiac physiology.

Monitoring healthy individuals is necessary because anomaly detection techniques, whether supervised, unsupervised or semi-supervised, all require continuous temporal data for analysis. Some unusual patterns identified by anomaly detection may be false positives with no medical relevance; hence, the results obtained by anomaly detection methods should always be cross-checked against the user’s electronic health record (EHR) data. A general overview of anomaly detection is described below (Figure 1).

The process starts with acquiring data, including the heart rate and steps, from patients’ wearable devices. Various application programming interfaces (APIs) are available to download this information through secure gateways. The raw data are then pre-processed so that all parameters share a uniform, timestamp-based data structure. Pre-processing is followed by imputation of missing data using algorithms such as the Expectation-Maximization (EM) algorithm. The processed data are then passed to statistical and machine-learning algorithms for anomaly detection.

Anomaly detection can be carried out using several methods, as shown at the end of the workflow. Some are hybrid methods combining statistics, machine learning and data analysis, for example, HROS-AD, RHR-diff and LAAD. The detected anomalies are used to predict and infer clinically relevant information for the wearer of the device. Abbreviations: EM, Expectation Maximization; KNN, K-nearest neighbor; HROS-AD, Heart rate over steps anomaly detection; RHR-Diff, Resting Heart Rate difference; LAAD, Long Short-Term Memory Network-based autoencoder; and MICE, Multivariate imputation by chained equations.

2.1. Noise and Outliers

Anomalies are strictly distinct from noise in the data. As the word suggests, noise is of no interest to data analysis and must be removed before anomaly detection is carried out. Because the judgement is subjective, deciding how large a deviation must be for a point to be called an outlier is difficult in real applications. Anomalies can be embedded in a large amount of noise, and it is noteworthy that an outlier may represent either a genuine abnormality or noise. The noise problem is far worse for electronic health records, since very little information is available on the patient’s whereabouts, which is needed to interpret the physiological parameters in context.

Under such conditions, noise is treated as a weak outlier, and it can be detected with quantitative methods, such as nearest-neighbor algorithms [23]. Since wearable data comprise continuous time-series data points, careful evaluation is required to distinguish normal data, anomalies and noise. However, some studies suggest that even outliers may contain valuable information [24]. Raw data comprising multiple components are illustrated in Figure 2.

These data instances require meticulous differentiation for further processing as discussed by Aggarwal (2017) [23]. In wearables-associated data, the distinction between normal and other data types is primarily performed using machine-learning-based techniques.

Various statistical tests, along with proximity models, provide a good approximation for differentiating normal data points from the rest [24]. Outlier detection also requires an understanding of the different categories of machine-learning methods. Studies of anomaly detection problems in finance, climate and internet applications have broadly used supervised, unsupervised and semi-supervised machine-learning approaches [25,26].

Semi-supervised methods have been successful, and applications in these fields have used techniques including the Mahalanobis distance [27], Cook’s distance [28], Tukey’s method [29], the Z-score [30], K-means [31] and K-medoids [24].
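As a minimal illustration of two of the statistical rules named above, the sketch below flags heart rate values by Z-score and by Tukey’s fences (values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR). The function names, the thresholds (3 and 1.5, common textbook defaults) and the sample values are illustrative assumptions, not taken from the cited studies.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return indices whose absolute Z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]

def tukey_outliers(values, k=1.5):
    """Return indices outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical minute-level heart rates with one spike.
heart_rate = [62, 64, 63, 65, 61, 66, 64, 63, 140, 62, 65, 64]
print(zscore_outliers(heart_rate))  # → [8]
print(tukey_outliers(heart_rate))   # → [8]
```

With only 12 points, the single spike inflates the standard deviation so much that its Z-score only just exceeds 3; Tukey’s fences rely on quartiles and are much less affected by this masking effect.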

However, a few challenges must be addressed before applying anomaly detection to wearables-associated data. First, anomalous regions often lie at the boundaries of normal patterns, so normal physiological behaviors can be masked as anomalous and vice versa. Secondly, in the case of heart rate data, normal behavior may evolve over time, and without adequate knowledge of the underlying physiological changes, this evolution may be misrepresented. Another important challenge is the unavailability of labelled data, which is usually needed for training or validation. Under such conditions, appropriate modifications to the above-mentioned algorithms, as elaborated in Section 4 of this article, are required.

2.2. Data Types

Data collected from wearables can take the form of objects, points or vectors, among others. Their attributes can be binary, continuous or categorical, and data instances can be related to each other as sequence data, spatial data or graph data. Anomalies in these data can be divided into point anomalies and contextual anomalies. Point anomalies are individual instances that lie outside the boundary of the normal region, whereas contextual anomalies are anomalous only within a specific context: a value may be an outlier in one context while being normal in another [32].

Contextual anomalies depend on the structure of the data set, and anomalies detected in time-series data are a prominent example. A third group, collective anomalies, consists of points that are not anomalous individually but are anomalous as a collection; these occur when data instances are related [33]. Chandola et al. (2009) illustrated this with a human electrocardiogram output in which a low value persisted for an abnormally long period. The low value by itself was not an anomaly; its prolonged occurrence as a collection of consecutive points was.
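A collective anomaly of the kind described above can be sketched as a run-length check: isolated low values are tolerated, but a run of at least `min_length` consecutive low values is flagged. The function name, threshold, run length and signal are hypothetical.

```python
def long_runs(values, low, min_length):
    """Return (start, end) index pairs (end exclusive) of runs in which
    every value is below `low` for at least `min_length` samples."""
    runs = []
    start = None
    for i, v in enumerate(values):
        if v < low:
            if start is None:
                start = i  # a new low run begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    # Close a run that lasts until the end of the series.
    if start is not None and len(values) - start >= min_length:
        runs.append((start, len(values)))
    return runs

# Hypothetical signal: one isolated low value (index 2) and one long low run.
signal = [1.0, 0.9, 0.2, 1.1, 0.2, 0.2, 0.2, 0.2, 0.2, 1.0, 0.95]
print(long_runs(signal, low=0.5, min_length=4))  # → [(4, 9)]
```

The isolated low value at index 2 is not reported, mirroring the point that a low value alone is not anomalous.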

Labels provide additional information about each instance in the wearables data. A significant feature of advanced wearable devices is that they record several measurements that can enhance the value of each data point. Along with heart rate, information on the number of steps, calorie consumption and temperature can aid anomaly detection methods. Retrieving these labels, however, poses a challenge [34]. Best practice in anomaly detection includes linking wearable device data with patient health records, which can annotate each instance of heart rate, steps, calorie consumption, etc., into medically relevant information clusters [35]. However, obtaining such data from healthcare management systems depends heavily on accurate measurements and a thorough knowledge of the underlying mechanisms.

2.3. Data Pre-Processing

Wearable devices, depending on their capabilities, can record various physiological measurements from their users. These include heart rate, steps, calories burned, elevation and various activity summary data at second- and/or minute-level resolution. Several wearable manufacturers also provide dedicated servers for storing user information; e.g., Fitbit® stores user data on its servers. The raw data can be accessed via an application programming interface (API) with user authentication [36]. Once downloaded, these files can be converted into programmable formats for further processing.

The most popular of these formats is comma-separated values (CSV), owing to its predominance in the data landscape [37]. The wearable data, including all the physiological properties described above, can be downloaded separately for each day. These intraday data are typically processed to obtain uniform date–time stamps. Anomaly detection pipelines have different requirements depending on their usage; however, all of them need the heart rate and the corresponding timestamps, which must be constructed accordingly.
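The sketch below shows one way to build uniform minute-level timestamps from an intraday CSV export: samples falling in the same minute are averaged, and minutes without samples are left empty (`None`) for later imputation. The column names `time` and `value`, the timestamp format and the sample rows are assumptions for illustration; actual export formats differ between manufacturers and APIs.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical intraday export: irregular, second-level heart rate samples.
RAW_CSV = """time,value
2021-10-18 08:00:05,71
2021-10-18 08:00:35,73
2021-10-18 08:01:10,75
2021-10-18 08:03:50,80
"""

def to_minute_grid(csv_text, start, end):
    """Map every minute from `start` to `end` (inclusive) to the mean of the
    samples recorded in that minute, or None if there were none."""
    buckets = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.strptime(row["time"], "%Y-%m-%d %H:%M:%S")
        minute = ts.replace(second=0, microsecond=0)
        buckets.setdefault(minute, []).append(float(row["value"]))
    grid = {}
    t = start
    while t <= end:
        samples = buckets.get(t)
        grid[t] = sum(samples) / len(samples) if samples else None
        t += timedelta(minutes=1)
    return grid

start = datetime(2021, 10, 18, 8, 0)
grid = to_minute_grid(RAW_CSV, start, start + timedelta(minutes=4))
print([grid[t] for t in sorted(grid)])  # → [72.0, 75.0, None, 80.0, None]
```

Keeping the gaps explicit, rather than dropping them, lets the imputation step that follows see exactly which minutes are missing.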

Machine-learning-based methods generally require a single file containing all the physiological annotations that can be extracted from the device. These methods are mostly unsupervised and can detect anomalies in user data accurately. The LAAD framework has been particularly successful in this respect; it and other existing methods will be elaborated in greater detail in Section 3. A major challenge in the pre-processing step is filling and/or imputing missing data [38], which is detailed in the following section.

2.4. Missing Data and Data Imputation

The physiological data from daily activities gathered from wearable devices at a highly detailed level can yield accurate clinical information. However, retrieving continuous wearables data is a difficult task due to inherent “missing data” challenges. There can be a number of reasons why data is missing from a wearable device.

For example, instrument malfunctions and incomplete data extraction caused by irregular wearing behavior are among the reasons [39]. Imputed values may be questionable and can raise concerns about the validity of time-series data analysis. Missing data can be categorized into three types according to how the probability of missingness relates to the data: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). In MCAR, there is no systematic difference between records with missing data and those without.

Missing data caused by power loss in the wearable device is an example of MCAR, as the loss of information is due to random chance. In the case of MAR, there is a pattern: data may be missing during particular time periods even when the battery is still active. If the data were missing only at a specific time of day, they would be considered missing at random. In MAR, it is assumed that the missing values can be estimated from the remaining values, i.e., we still have information about the instances with missing data, because the missingness depends only on the data we observe.

In other words, within groups defined by the observed data, the probability of a value being missing is the same. In both MAR and MCAR, there is a good chance of recovering the missing data. In MNAR, however, the missingness depends on the unobserved values themselves; even when the source of the missing data is known, no mechanism exists to recover it effectively [40]. Wearable devices may be removed for various reasons, sometimes purposely, which can lead to MNAR data. When the missingness follows an MCAR or MAR mechanism, the observed data remain representative of the population, unlike in MNAR. The main purpose of identifying the appropriate category of missing data is to avoid filling in missing data inappropriately.

Omitting the missing data can lead directly to biased estimates. Data imputation offers an alternative by estimating the missing values from knowledge contained in the observed data. Approaches to handling missing data broadly fall into statistical imputation methods and imputation mechanisms integrated with machine-learning algorithms. One of the most basic methods is mean value imputation.

In this method, empty instances are filled with the mean of the available observed data. However, this method does not preserve the relationships between variables. Moreover, because the imputed values are estimates placed exactly at the mean, they understate the true variability of the data; the resulting standard errors are too small, which can inflate type I errors, i.e., rejecting a null hypothesis that is actually true. Although there are several methods for handling missing data, two widely used approaches are the maximum likelihood (ML) and multiple imputation (MI) methods.
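The variance-shrinking effect of mean imputation can be seen in a short sketch: imputed values sit exactly at the observed mean, so the standard deviation of the completed series is smaller than that of the observed values alone. The function name and heart rate values are hypothetical.

```python
import statistics

def mean_impute(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

hr = [60, None, 70, None, 80, 90, None, 100]  # hypothetical series with gaps
filled = mean_impute(hr)
observed = [v for v in hr if v is not None]
print(filled)                                # → [60, 80.0, 70, 80.0, 80, 90, 80.0, 100]
print(round(statistics.stdev(observed), 2))  # → 15.81
print(round(statistics.stdev(filled), 2))    # → 11.95
```

The three imputed points add no spread, so any test statistic computed on the completed series treats the data as less variable than they are.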

The maximum likelihood method is considered more efficient for several reasons, including minimum sampling variance, consistent results and a single uniform model. The multiple imputation method, on the other hand, requires several decisions, multiple imputation models, etc. Hot deck imputation is a simple and effective method that fills each missing value with an observed value from a similar unit (a donor) in the same data set. This method is less developed than the others, and its applications may have challenges that are yet to be studied in detail.
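A minimal hot deck sketch, assuming a deterministic nearest-neighbor donor rule: each missing heart rate is copied from the observed minute whose step count is closest. Random hot deck variants instead draw a donor at random from a pool of similar records. The function name, record layout and field names are hypothetical.

```python
def hot_deck_impute(records, target, match_on):
    """Fill missing `target` values by copying the value from the donor
    record (with `target` observed) whose `match_on` value is closest."""
    donors = [r for r in records if r[target] is not None]
    result = []
    for r in records:
        if r[target] is None:
            donor = min(donors, key=lambda d: abs(d[match_on] - r[match_on]))
            r = {**r, target: donor[target]}  # copy, leaving the input unchanged
        result.append(r)
    return result

# Hypothetical minute records.
minutes = [
    {"steps": 0, "hr": 62},
    {"steps": 110, "hr": 98},
    {"steps": 5, "hr": None},
    {"steps": 95, "hr": None},
    {"steps": 40, "hr": 75},
]
print([r["hr"] for r in hot_deck_impute(minutes, "hr", "steps")])  # → [62, 98, 62, 98, 75]
```

Because donors are real observations, imputed values always lie within the observed range, which is one reason the method is described as simple and effective.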

The multivariate imputation by chained equations (MICE) method imputes data under the MAR (or MCAR) assumption [41]. Its iterative series of prediction models uses multiple variables to impute each missing variable, making it highly robust, and it is particularly useful for imputing variables that are well correlated with others. Expectation maximization with bootstrapping (EMB), which combines bootstrapping with multiple imputation, is another robust method. It is reported to be effective for up to 30% missing data and is far superior to single imputation methods, which obtain values from the mean, median or mode and therefore do not account for the uncertainty of the predicted missing values, unlike machine-learning-based methods such as MICE. A brief comparison of these methods is presented in Table 1.
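To make the chained-equations idea concrete, the sketch below alternately regresses each of two variables on the other and refreshes the missing entries until they stabilize. It is a deliberately simplified, deterministic single chain using ordinary least squares; MICE proper draws imputations from predictive distributions and produces several completed datasets that are later pooled. The function names and data are hypothetical; the data lie exactly on a line, so the imputations converge to the values on that line.

```python
import statistics

def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def chained_impute(x, y, iterations=50):
    """Impute None entries in two paired lists by alternately regressing
    each variable on the other (deterministic chained-equations sketch)."""
    miss_x = [v is None for v in x]
    miss_y = [v is None for v in y]
    # Start from mean imputation.
    mean_x = statistics.fmean(v for v in x if v is not None)
    mean_y = statistics.fmean(v for v in y if v is not None)
    x = [mean_x if m else v for v, m in zip(x, miss_x)]
    y = [mean_y if m else v for v, m in zip(y, miss_y)]
    for _ in range(iterations):
        # Refit y ~ x on rows where y was observed, then refresh missing y.
        a, b = ols([xi for xi, m in zip(x, miss_y) if not m],
                   [yi for yi, m in zip(y, miss_y) if not m])
        y = [a + b * xi if m else yi for xi, yi, m in zip(x, y, miss_y)]
        # Refit x ~ y on rows where x was observed, then refresh missing x.
        a, b = ols([yi for yi, m in zip(y, miss_x) if not m],
                   [xi for xi, m in zip(x, miss_x) if not m])
        x = [a + b * yi if m else xi for xi, yi, m in zip(x, y, miss_x)]
    return x, y

steps = [0, 50, 100, 150, None, 125]  # hypothetical steps per minute
hr = [60, 70, 80, 90, 75, None]       # hypothetical heart rates
steps_f, hr_f = chained_impute(steps, hr)
print(round(steps_f[4], 3), round(hr_f[5], 3))  # → 75.0 85.0
```

Each regression uses the other variable’s current imputations, which is what lets information flow between the chained models.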

3. Basic Categorization of Anomaly Detection

Wearable data have two main features: a timestamp and the continuity of data points. Each variable has a time dimension, and its data points are continuous within a certain period. Repeated measurements over time, otherwise referred to as time-series data, are the primary form of data structure retrieved from wearables [48].

Physiological measurements, such as heart rate and steps, collected at regular intervals over time allow trends in patient health to be identified and aid health monitoring. Time-series data can be further classified into two types: univariate and multivariate. Univariate time series consist of scalar values, whereas multivariate time series consist of multidimensional values, or vectors. Wearables-associated data are often multivariate time series, as they contain more than one time-dependent variable [49], and these variables are also dependent on each other.
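Because the variables in wearable data depend on each other, a multivariate distance can flag combinations that look unremarkable one variable at a time. The sketch below computes a two-dimensional Mahalanobis distance for (steps per minute, heart rate) pairs using only the standard library; the function name and data are hypothetical.

```python
import statistics

def mahalanobis_2d(points, query):
    """Mahalanobis distance of `query` from the mean of the 2-D `points`,
    using the sample covariance matrix of `points`."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(points)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Entries of the inverse of the covariance matrix [[sxx, sxy], [sxy, syy]].
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dx, dy = query[0] - mx, query[1] - my
    return (dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy) ** 0.5

# Hypothetical (steps per minute, heart rate) pairs: heart rate rises with steps.
data = [(0, 60), (20, 66), (40, 71), (60, 77), (80, 83),
        (100, 88), (10, 62), (50, 75), (90, 86), (30, 68)]
active = mahalanobis_2d(data, (100, 88))  # high heart rate while walking
resting = mahalanobis_2d(data, (0, 88))   # same heart rate with no steps
print(round(active, 1), round(resting, 1))  # → 1.9 58.5
```

A heart rate of 88 bpm with no steps lies within the observed range of each variable separately, yet it is far from the joint pattern, so its distance is roughly 30 times that of the same heart rate during activity.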

In addition to the type of data, it is important to understand the types of outliers commonly discussed in the context of anomaly detection in multivariate data. Point outliers are abnormal points at a specific instance of time; their abnormality can be judged by their relationship with the surrounding values, as detailed in the next section. The other type is subsequence outliers, which are collections of consecutive outlier points that can affect multiple variables in the dataset [23,50]. Detecting these outliers/anomalies in wearables-associated data requires a combination of several existing methods. These are benchmark anomaly detection techniques that have previously been employed in fields such as bank fraud, data networking, spam detection, insurance and others.
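A point outlier in a time series can be judged against its local context rather than the whole series. The sketch below compares each value with the mean and standard deviation of the preceding window; the function name, window length, threshold and values are illustrative assumptions.

```python
import statistics

def rolling_point_outliers(series, window=5, threshold=3.0):
    """Flag index i if series[i] deviates from the mean of the preceding
    `window` values by more than `threshold` of their standard deviation."""
    flagged = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mu, sd = statistics.fmean(past), statistics.pstdev(past)
        if sd > 0 and abs(series[i] - mu) / sd > threshold:
            flagged.append(i)
    return flagged

hr = [70, 72, 71, 73, 72, 71, 95, 72, 71, 73]  # hypothetical heart rates
print(rolling_point_outliers(hr))  # → [6]
```

Once the spike enters the window it widens the local spread for the next few steps, so the values that follow it are not falsely flagged.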

The methods employed to identify anomalies in these areas are now finding extended applications [25,51]. Similarities between the data types of these fields and wearable data, for example, time instances, longitudinal data and the existence of baseline features, have led researchers to apply existing anomaly detection algorithms to identify health-related anomalies in wearable data. As previously stated, anomaly detection for wearables-associated data can be broadly divided into supervised, semi-supervised and unsupervised methods.

3.1. Supervised Anomaly Detection

In this framework, the dataset includes labels for both training and test data. The training data are used to build a classification model that separates normal from anomalous instances. The general principles behind these methods are often independent of whether the data are spatial or temporal [52]. The chief goal of supervised anomaly detection is to support the learning method with application-specific information. For wearables, the goal is to identify physiological data points that reveal hidden abnormalities in the users’ data. Such markers are, however, rarely available, because extracting continuous, biologically relevant information over long periods without missing data is difficult. As a consequence, building a supervised learning model is very difficult [53].

Supervised training requires a clear distinction between normal and abnormal data, so acquiring label information for these data points is of the utmost importance. Supervised anomaly detection methods ignore unlabeled data entirely during training. Beyond labeled data, supervised methods can also use active learning, drawing on the extensive knowledge available about outlier candidates [54]. By virtue of this additional knowledge, supervised methods can be expected to achieve better detection accuracy.

Popular methods include supervised neural networks, support vector machines (SVMs) and decision trees. SVMs have been extended to various non-standard scenarios, such as anomaly detection from wearables [55]. One-class SVMs use the training data to learn a boundary enclosing the region where the normal data lie (the support of their distribution), which is then used to distinguish normal from abnormal data instances. Another popular approach is the decision tree, which classifies anomalies using decision rules learned from the training data.
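The decision-rule idea can be illustrated with the simplest possible tree, a single split (a decision stump) on one feature, learned from labelled training data. This is a hypothetical toy with made-up labels: real decision trees consider many features, both split directions and multiple levels, which is also where over-complex trees arise.

```python
def fit_stump(values, labels):
    """Learn the threshold t that minimizes training errors for the rule
    'anomalous if value > t' (a one-level decision tree on one feature)."""
    best_t, best_err = None, len(values) + 1
    for t in sorted(set(values)):
        # Count training points the rule would misclassify at this threshold.
        err = sum((v > t) != lab for v, lab in zip(values, labels))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Hypothetical labelled resting heart rates (True = anomalous).
train_hr = [58, 62, 65, 70, 74, 96, 101, 110]
train_lab = [False, False, False, False, False, True, True, True]
t = fit_stump(train_hr, train_lab)
print(t)                            # → 74
print([hr > t for hr in (68, 99)])  # → [False, True]
```

The learned rule depends entirely on the labels supplied, which is why label availability is the bottleneck for supervised detection.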

The class assigned to a data point depends on the model learned from the training data: the point is passed through the nodes of the tree, each of which applies its decision rule, until it reaches a leaf that determines its class. However, decision trees can become over-complex and overfit the training data [56]. Depending on the type of device, healthcare data from wearables can carry several annotations, which can increase the accuracy of machine-learning-based predictions. Several new approaches have been designed to address the current drawbacks, and these are detailed in Section 4.

3.2. Unsupervised Anomaly Detection

In contrast to supervised methods, this framework does not require labelled training data. These algorithms rest on two assumptions about the data: first, that normal instances far outnumber anomalous ones, and second, that anomalies are qualitatively different from normal instances [57,58]. Frequently occurring data groups can therefore be assumed normal. Based on these assumptions, unsupervised anomaly detection methods can be broadly categorized into clustering-based, statistical and nearest-neighbor-based methods, the last being frequently used in this category [59].

In anomaly detection, the k-nearest neighbor (KNN) method scores each point by its distance to its nearest neighbors and labels points whose score exceeds a user-defined threshold as anomalies. Although KNN is also used in supervised anomaly detection, it is applicable to unsupervised anomaly detection as well. The underlying idea is that an anomaly tends to lie farther from clusters of similar observations. Without a learning step, a threshold has to be chosen to decide what counts as an anomaly [60]. A major drawback, however, is that the method identifies only global or point anomalies. In wearables data, local anomalies are equally important, as an anomaly at one instance of time can have a significant impact on nearby values. In any medical condition, the instances before and after an anomaly are particularly relevant and important.
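A distance-based KNN detector can be sketched as follows: each point is scored by the distance to its k-th nearest neighbor, and points above a user-defined threshold are flagged. Using the k-th neighbor distance is one common variant; others average the distances to all k neighbors. The function names, the (steps, heart rate) pairs, k and the threshold are illustrative assumptions.

```python
import math

def knn_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def knn_outliers(points, k=2, threshold=10.0):
    """Indices whose k-NN distance exceeds a user-defined threshold."""
    return [i for i, s in enumerate(knn_scores(points, k)) if s > threshold]

# Hypothetical (steps per minute, heart rate) pairs.
data = [(0, 62), (2, 63), (1, 61), (80, 95), (85, 97), (82, 96), (0, 120)]
print(knn_outliers(data, k=2, threshold=10.0))  # → [6]
```

The elevated heart rates during activity are not flagged because they have close neighbors, whereas 120 bpm at zero steps is isolated.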

Other widely used unsupervised anomaly detection methods include clustering techniques [60,61], unsupervised neural networks [62] and K-means-based clustering [63]. Neural networks are particularly important for anomaly detection in multivariate data, which is primarily the case with wearables-associated data, because they can capture correlations between the different variables of a time series. A major drawback of supervised methods, such as SVM and KNN, is their low effectiveness on multivariate data.

Several modifications of neural network-based methods, such as deep autoencoding [51], Gaussian mixture models [64], convolutional long short-term memory networks [65] and multi-scale convolutional recurrent encoder-decoders [66], are among the latest improved methods for overcoming such issues. Many algorithms have been proposed for unsupervised anomaly detection, but selecting the appropriate one for a given anomaly detection task is considered difficult. In unsupervised anomaly detection, deep learning allows features to be extracted from the data; however, this approach has a few limitations.

First, deep learning requires large volumes of training data, which are difficult to obtain. A large portion of these data is unstructured and requires various pre-processing steps, which may vary by source [67]. There is also innate variability among anomalies themselves: previous work has described 3 broad types, 9 basic types and 63 subtypes of anomalies, and these should be understood when analyzing healthcare-associated data as anomaly detection research progresses [68].

3.3. Semi-Supervised Anomaly Detection

Semi-supervised detection methods are considered better suited than supervised methods when labels are scarce. Because data are often incomplete (outlier labels are mostly missing, and outlier values vary randomly), semi-supervised techniques are used to check whether a data point has outlier properties. This class of anomaly detection is used when a large portion of the data is unlabeled and the data are high-dimensional, which is especially the case with clinical observation data [69].

Traditional unsupervised methods can work on unlabeled data; however, the unavailability of large-scale continuous data can be a major limitation. This is another reason for employing semi-supervised methods, which can work with largely unlabeled data at the cost of a few labeled instances.

The aim of such methods is to build a model of the normal class; anomalies are then detected as instances that deviate from this model [70]. Widely used methods in this category include one-class SVMs, deep autoencoders and Gaussian models [64,71,72,73]. Deep semi-supervised anomaly detection was also introduced recently; it learns not only from labelled normal data but also from labelled anomalies [74].
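A minimal normal-class model, assuming a single Gaussian fitted only to samples labelled normal: new values are flagged when their Z-score relative to the normal class exceeds a threshold. The class name, threshold and data are hypothetical; practical systems use richer models such as the one-class SVMs or autoencoders cited above.

```python
import statistics

class GaussianNormalModel:
    """Normal-class model: a univariate Gaussian fitted only to samples
    labelled normal; values far from it are flagged as anomalous."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, normal_values):
        self.mu = statistics.fmean(normal_values)
        self.sigma = statistics.stdev(normal_values)
        return self

    def is_anomalous(self, value):
        # Deviation from the normal class, measured in standard deviations.
        return abs(value - self.mu) / self.sigma > self.threshold

# Hypothetical resting heart rates labelled as normal.
model = GaussianNormalModel().fit([60, 62, 61, 63, 64, 62, 61, 63])
print([model.is_anomalous(v) for v in (62, 65, 90)])  # → [False, False, True]
```

Only normal labels are needed to fit the model, which matches the semi-supervised setting where anomalies are rare and mostly unlabeled.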

Like supervised and unsupervised methods, semi-supervised methods must be applied under explicit assumptions and with due consideration of their limitations. In practice, a combination of statistical inference and machine learning is most often used for anomaly detection in a disease framework, and these methods are tailored to the availability of continuous data, the type of device and the disease condition.

4. Applications of Anomaly Detection Methods on Wearables Associated Data

There are several wearables-associated studies with clinical implications. In this section, we describe some of the most relevant studies in detail. Li et al. (2017) demonstrated that wearables have the potential to monitor activity as well as physiology. They combined sensor data with frequent medical measurements to predict the onset of Lyme disease and inflammation, and they also observed a distinction between insulin-sensitive and insulin-resistant subjects.

Various devices were used to measure heart rate, SpO2 and temperature, along with other activity-related parameters, over a span of two years. A baseline was established for sleep and non-sleep periods, and a Z-transformation was then used to scale and standardize the measurements relative to this baseline. Activity-specific data were then compared with the overall period to reveal outliers [6].
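A minimal sketch of state-specific baseline standardization follows, assuming heart-rate records tagged as "sleep" or "awake". The records, the function names and the |z| > 3 cutoff are hypothetical illustrations, not values or code from Li et al.

```python
from statistics import mean, stdev

def baseline_stats(records):
    """Baseline mean and standard deviation of heart rate for each state (sleep / awake)."""
    stats = {}
    for state in ("sleep", "awake"):
        values = [hr for s, hr in records if s == state]
        stats[state] = (mean(values), stdev(values))
    return stats

def z_scores(records, stats):
    """Standardize each heart-rate value against the baseline of its own state."""
    return [(hr - stats[s][0]) / stats[s][1] for s, hr in records]

# Hypothetical (state, heart rate in bpm) records
baseline = [("sleep", 55), ("sleep", 57), ("sleep", 56), ("sleep", 54),
            ("awake", 70), ("awake", 74), ("awake", 72), ("awake", 68)]
new = [("sleep", 56), ("sleep", 66), ("awake", 71)]

stats = baseline_stats(baseline)
z = z_scores(new, stats)
outliers = [rec for rec, score in zip(new, z) if abs(score) > 3]
print(outliers)  # [('sleep', 66)]
```

With these numbers, 66 bpm during sleep gives z of about 8.1, whereas the same value scored against the awake baseline would give z of about -1.9. This is why separate sleep and non-sleep baselines matter.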

In a study by Mishra et al. (2020), heart rate and step data were used to detect COVID-19. They analyzed physiological and activity data from over 5200 participants, including 32 individuals diagnosed with COVID-19. The study showed elevated resting heart rates relative to each individual's baseline. With missing values imputed as zeros, two algorithms were developed: resting heart rate difference (RHR-diff) and heart rate over steps anomaly detection (HROS-AD). RHR-diff standardized the resting heart rate over a fixed time frame to obtain residuals relative to the baseline.
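The residual step of RHR-diff can be sketched as follows, assuming daily resting heart rate values and a fixed baseline period at the start of the series. The moving-window rule is a deliberately simplified, hypothetical stand-in for the scan statistic used in the study. The window width, threshold, function names and data are illustrative only.

```python
from statistics import mean, stdev

def rhr_residuals(daily_rhr, baseline_days):
    """Standardize each post-baseline day against the mean and SD of a fixed baseline period."""
    base = daily_rhr[:baseline_days]
    mu, sd = mean(base), stdev(base)
    return [(x - mu) / sd for x in daily_rhr[baseline_days:]]

def flag_windows(residuals, width, threshold):
    """Start indices of windows whose mean residual exceeds the threshold.

    A simplified stand-in for a scan statistic, not the published procedure."""
    return [i for i in range(len(residuals) - width + 1)
            if mean(residuals[i:i + width]) > threshold]

# Hypothetical daily resting heart rate (bpm): a 7-day baseline, then 7 follow-up days
rhr = [60, 61, 59, 60, 62, 61, 60,
       61, 60, 66, 68, 67, 61, 60]
residuals = rhr_residuals(rhr, baseline_days=7)
alerts = flag_windows(residuals, width=3, threshold=3.0)
print(alerts)  # [1, 2, 3]
```

The alerts (follow-up windows starting at indices 1 to 3) cover the run of elevated days. The first window, which contains only one elevated day, stays below the threshold.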

Time intervals were then flagged as anomalous using a scan statistic method [75]. In HROS-AD, heart rate and step data were combined in a machine-learning-based elliptic envelope approach. The data were assumed to follow a Gaussian distribution, and the method identified both univariate and multivariate outliers based on the distance of each HROS point from the overall mean. Such points are regarded as contaminants that are too extreme for the assumed Gaussian distribution of the data [76].
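The Gaussian-distance idea behind HROS-AD can be sketched in NumPy. This sketch makes several assumptions. The two features (heart rate, and heart rate divided by steps), the hourly data and the contamination fraction are all hypothetical. The code also uses the plain sample mean and covariance, whereas scikit-learn's EllipticEnvelope fits a robust Minimum Covariance Determinant estimate.

```python
import numpy as np

def gaussian_envelope_flags(X, contamination):
    """Flag roughly the `contamination` fraction of rows farthest from the mean.

    Distance is the squared Mahalanobis distance under a Gaussian fitted with the
    plain sample mean and covariance (not a robust estimate)."""
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return d2 > np.quantile(d2, 1 - contamination)

rng = np.random.default_rng(7)
# Hypothetical hourly data: heart rate (bpm) and step counts
hr = np.append(rng.normal(70, 5, size=200), 110.0)          # last hour: high heart rate...
steps = np.append(rng.uniform(500, 1500, size=200), 300.0)  # ...with few steps
X = np.column_stack([hr, hr / steps])  # features: heart rate and heart rate over steps
flags = gaussian_envelope_flags(X, contamination=0.01)
print(np.flatnonzero(flags))  # indices of flagged hours
```

The final, hypothetical anomalous hour (index 200) is among the flagged points. A robust covariance estimate matters in practice because several extreme points can inflate the plain sample covariance and mask one another.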

In a similar study, Bogu et al. (2021) detected abnormal resting heart rate during COVID-19 infection. Resting heart rate was identified using the corresponding step count values. The study employed a deep-learning approach based on a Long Short-Term Memory (LSTM) network-based autoencoder (LAAD). Data were labelled as infectious, non-infectious or recovery periods based on a literature review. Abnormal resting heart rate (RHR) was estimated from its distance from the baseline before the training set was generated.

To prevent overfitting of the deep learning model, several data augmentation techniques were used to increase the size of the training data. RHR indicative of COVID-19 infection was detected in 14 individuals [77]. These studies emphasize the importance of characterizing each individual's baseline: irrespective of the model applied, setting a personal baseline can substantially improve the performance of anomaly detection.
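The augmentation techniques of the cited study are not detailed here. The sketch below therefore shows two generic techniques for physiological time series: jittering (adding small Gaussian noise) and magnitude scaling. The function name, noise levels and data are hypothetical.

```python
import numpy as np

def augment(series, n_copies, rng, jitter_sd=0.5, scale_sd=0.02):
    """Return n_copies augmented versions of a 1-D series (rows of the output array).

    Each copy is multiplied by one random scaling factor close to 1 and then
    perturbed by independent Gaussian noise at every time step (jittering)."""
    series = np.asarray(series, dtype=float)
    copies = []
    for _ in range(n_copies):
        scale = rng.normal(1.0, scale_sd)
        noise = rng.normal(0.0, jitter_sd, size=series.shape)
        copies.append(series * scale + noise)
    return np.stack(copies)

rng = np.random.default_rng(42)
rhr = [62, 63, 61, 64, 62, 63]  # hypothetical daily resting heart rate (bpm)
augmented = augment(rhr, n_copies=10, rng=rng)
print(augmented.shape)  # (10, 6)
```

Keeping the perturbations small preserves the shape of the original signal, so the augmented copies plausibly share its label.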

Zhu et al. used a neural network prediction-based method to predict the COVID-19 epidemic trend from heart rate and sleep data collected with wearable devices. In addition to the wearable data, the study used categorical features such as holiday activity, the current season and weather patterns. This information was combined with historically detected anomaly rates to construct the input for anomaly detection [78]. To accurately determine anomaly points, it is important to identify their source.
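A prediction-based detector flags observations that deviate strongly from what their context predicts. For brevity, the sketch below replaces the neural network of Zhu et al. with ordinary least squares. The features (one-hot season, a holiday flag, sleep hours), the target (daily resting heart rate), the coefficients and the 3-SD rule are all hypothetical.

```python
import numpy as np

SEASONS = ["winter", "spring", "summer", "autumn"]

def encode(season, holiday, sleep_hours):
    """Intercept, one-hot season (winter as reference), holiday flag and sleep hours."""
    season_vec = [1.0 if season == s else 0.0 for s in SEASONS[1:]]
    return np.array([1.0, *season_vec, float(holiday), sleep_hours])

def fit(X, y):
    """Least-squares fit; returns coefficients and the residual standard deviation."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_sd = np.std(y - X @ coef, ddof=X.shape[1])
    return coef, resid_sd

def is_anomalous(x, observed, coef, resid_sd, k=3.0):
    """Flag an observation that exceeds the model's prediction by more than k residual SDs."""
    return bool(observed - x @ coef > k * resid_sd)

rng = np.random.default_rng(1)
true_coef = np.array([62.0, 1.0, 2.0, 0.5, 1.5, -0.8])  # hypothetical effects
X = np.vstack([encode(SEASONS[rng.integers(4)], rng.integers(2), rng.uniform(5, 9))
               for _ in range(300)])
y = X @ true_coef + rng.normal(0.0, 1.0, size=len(X))  # daily resting heart rate (bpm)
coef, resid_sd = fit(X, y)

day = encode("summer", 1, 6.0)
expected = day @ true_coef
high = is_anomalous(day, expected + 8.0, coef, resid_sd)  # 8 bpm above expectation
normal = is_anomalous(day, expected, coef, resid_sd)
print(high, normal)  # True False
```

Conditioning on context means that the same heart rate can be normal on one day and anomalous on another.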

Because various physiological conditions can give rise to anomalies, the study proposed a wider range of categorical features. Apart from COVID-19, heart-related conditions can also be monitored using wearables. Wang et al. (2021) studied a total of 16,320 patients with atrial fibrillation (AF) and examined the association of wearable device use with pulse rate and health care use [79]. Lown et al. (2020) successfully detected AF using wearable technology.

They used an online library of RR intervals (the time between two consecutive R-waves of an electrocardiogram) and studied how each change in RR interval correlated with the previous change, which reflects heart rate variability. These changes were represented in a Lorenz plot. A support vector machine with a Gaussian kernel then identified AF with 100% accuracy [80].
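The Lorenz-plot representation can be sketched as pairs of successive RR-interval changes, each change plotted against the previous one. This sketch computes only the plot coordinates and a simple spread measure on synthetic data. The Gaussian-kernel SVM used by Lown et al. is omitted, and the spread measure and synthetic rhythms are illustrative assumptions.

```python
import numpy as np

def lorenz_points(rr_ms):
    """Pair each RR-interval change with the previous change: (dRR[n-1], dRR[n])."""
    d = np.diff(np.asarray(rr_ms, dtype=float))
    return np.column_stack([d[:-1], d[1:]])

def lorenz_spread(points):
    """Median distance of Lorenz-plot points from the origin (a simple dispersion measure)."""
    return float(np.median(np.hypot(points[:, 0], points[:, 1])))

rng = np.random.default_rng(3)
regular = 800 + rng.normal(0, 10, size=60)    # hypothetical regular rhythm, RR in ms
irregular = rng.uniform(400, 1000, size=60)   # hypothetical highly irregular rhythm
pts_regular = lorenz_points(regular)
pts_irregular = lorenz_points(irregular)
print(pts_regular.shape)  # (58, 2)
print(lorenz_spread(pts_regular), lorenz_spread(pts_irregular))
```

Regular rhythms cluster near the origin, while irregular rhythms scatter widely. In the study, a classifier trained on such representations, rather than a single spread value, separated AF from non-AF recordings.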

These studies exemplify wearable technology and its application to real-time monitoring of disease states. Hybrid algorithms that draw on both statistical and machine learning frameworks are expected to become more common in the near future. Further examples of such research are listed in Table 2.

5. Prospects

5.1. Handling and Transparency of Wearables Associated Data

Extracting clinically relevant outputs from wearables-associated data has several key requirements. The anomaly detection algorithms discussed above have all benefited greatly from the availability of large wearable datasets; hence, the development of data repositories should go hand in hand with algorithm development. A slow but steady shift from hospital-based care to patient-centric care is being observed worldwide. This shift, along with the increasing popularity of wearables, is leading to a surge in data [87,88]. Tracking the progression of any disease state demands uninterrupted wearables-associated data, which is beyond the scope of hospital care.

Longitudinal data collected daily over long periods accumulate into large volumes. As discussed throughout this review, the analytical use of these data depends on how they are accessed and distributed. The development of wearables-associated software aimed at both accurate health monitoring and state-of-the-art data collection, analysis and visualization is of the utmost importance [89]. Both standalone and hybrid methods for anomaly detection require accurate interpretations that can be correlated with user activity and daily experience.

Generating recommendations involves decision making by users and goal synchronization, both of which may require platforms in addition to mobile applications. Existing recommendations regarding the correlation of various device variables should be tested. For example, exercise and sleep are independent variables that, when correlated, can conceal an underlying physiological anomaly. Additionally, because even trivial relationships become statistically significant as datasets grow, statistical significance should be weighed against practical significance.

Anomaly detection methods also need gold-standard data against which to compare their results. Databases such as the UK Biobank [90] and similar libraries of cohort studies can be very useful for benchmarking standalone wearables-associated studies. The transparency of the algorithms designed to compute step or sleep data should also be encouraged. The limitations of the large-scale data generated from wearables need to be addressed more often [89].

5.2. Application of Wearables in Healthcare

The COVID-19 pandemic has renewed interest in novel healthcare solutions. Public awareness and adoption of wearable technology are increasing significantly. Wearables are already being integrated into clinical practice. Vital signs measured by wearable devices, such as heart rate, temperature, blood pressure and blood oxygen saturation, are being used cautiously for clinical applications. The extraction of useful health-related markers is slowly making its way into mobile health interventions [90].

Intensive care units (ICUs) have reported many benefits from using wearables to support precise medical decisions. Heart rate and sleep measurements have helped manage post-ICU care. Patients recovering from major surgery and the associated stress can benefit from continuous monitoring of their physiological changes. Heart rate and sleep are often associated with pain levels and the underlying recovery.

Wearables-derived data combined with machine-learning algorithms can, therefore, provide accurate patient monitoring. A wide array of disorders is now being investigated using wearables. Metabolic disorders are marked by high blood pressure, high blood sugar and abnormal cholesterol levels, among other conditions. With advances in the non-invasive monitoring of such conditions, disease management and control become easier [7,42]. Efforts are being made to monitor and diagnose hypertension by measuring blood pressure longitudinally. Blood pressure monitoring can be helpful for many associated primary conditions [91,92].

Sleep disorders, mental health, and maternal and neonatal care also require active tracking and can benefit from wearable technologies. Monitoring patient fitness has allowed a better understanding of patients' states before and after disease. Such initiatives have yielded significant observations in various disease models. In cancer treatment, biometric data from wearables can be analyzed to predict pain levels, stress and other outcomes. Similarly, activity and sleep data have been collected and used to predict migraine attacks.

In this particular clinical trial, machine learning was applied to a dataset comprising physiological parameters and environmental variables to predict the likelihood of a migraine attack [93]. Such studies were identified in the ClinicalTrials.gov database. There are 82 studies that use wearable devices in their clinical trial process: 20 have been completed thus far, 36 are recruiting, and 14 have been initiated but are pending recruitment, while the remaining studies are either suspended or of unknown status. A detailed account can be found in Table S1.

Among the current ongoing clinical trials, wearables-associated interventions have been particularly successful in managing lifestyle diseases. Metabolic disorders and their role in worsening cardiovascular disease are an active field of study in the application of wearables. Based on the disease conditions evaluated in the ongoing trials, 23 clinical trials use wearables to understand and/or monitor lifestyle diseases. Hypertension requires a longitudinal measure of heart rate variability, and wearable monitoring devices are an affordable and non-invasive means of monitoring heart rate [94].

Monitoring energy expenditure in cardiac patients can be crucial to their rehabilitation. A Fitbit-based clinical trial assessed the improvement in physical activity of patients with chronic heart failure and coronary artery disease. Similarly, to demonstrate the effectiveness of a fitness tracker for children with deep vein thrombosis, a research team used Fitbit devices to evaluate the patients' physical activity [95]. Several such studies suggest that wearables are robust healthcare monitoring devices.

5.3. Impact of Wearables on Managing Healthcare

Steady improvements are being made in anomaly detection algorithms. At the same time, large datasets have been made available to improve predictive performance. Wearables and their data analysis are thus tied to certain key areas, such as the cloud and data security. Internet connectivity, in particular Wi-Fi, enables communication between wearable devices, hand-held devices such as smartphones, and the cloud.

The development of a dedicated cloud infrastructure for storing and analyzing data can enhance the use of wearables in healthcare. The security of these devices is another aspect to consider. A systematically regulated online cloud infrastructure, along with device-to-device connectivity, is necessary in a digital healthcare framework [96]. These factors can eventually help in applying wearables-associated big data and accurate anomaly detection to the study of disease pathogenesis.

Smartphones and other hand-held devices are typically used to obtain and transmit the information collected by wearables. An integrated system would be useful in managing disease states. Carefully detected anomalies with medical relevance can be translated into knowledge, which can then be instrumental in handling a global health crisis. The COVID-19 pandemic has shifted the focus toward a more digital, telemedicine-based healthcare system and saw a rise in teleconsultations.

In such situations, wearable devices can provide health parameters that are otherwise collected during patients' clinical visits. Primary health data collected in this way are already being used to assess chronic diseases. Tracking heart rate and other primary parameters has proved helpful in the past for developing models of the spread of influenza [97]. The Scripps Research Translational Institute has initiated such physiological tracking studies using Fitbit®, Apple® Watch and Garmin® devices [98].

Another study, TemPredict, conducted at the University of California, San Francisco (UCSF), used the Oura ring to track physiological data from users [99]. The Internet of Things (IoT) can integrate and store data, perform smart identification, trace patient data and support the exchange of knowledge. The entire infrastructure is a potential tool for doctors to handle crisis situations.

6. Conclusions

Wearables-associated data are becoming increasingly popular in clinical settings. Such data are highly valuable for predicting disease biomarkers. The application of anomaly detection methods in the rapidly growing healthcare field can aid clinical practice by detecting inconsistencies and physiological changes that would otherwise go undetected. The machine-learning approaches discussed in this review are being used not only to detect outliers but also to identify novel data points that lie far from otherwise normal-seeming physiological values.

These methods also address the drawbacks of unannotated wearables data through semi-supervised methods, such as one-class support vector machines and deep autoencoders, which are trained on both labelled and unlabelled data and thereby enable more accurate analysis of user data. Overall, applying anomaly detection to the real-time data produced by wearable devices can reveal valuable clinical information for disease diagnosis, prediction, treatment and rehabilitation.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/s22030756/s1, Figure S1: Comparison of number of research articles on wearables sensors and healthcare monitoring; Table S1: List of wearables-associated clinical trials.

Author Contributions

Conceptualization, C.P.K.P.; writing—original draft preparation, J.S.S. and C.P.K.P.; writing—review and editing, J.S.S., C.P.K.P., K.K., S.C.P., F.L., M.A., L.D.J., S.K. and S.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank Manasa Tata and Omnia Heikal from Rhenix Lifesciences for their contributions in reviewing the manuscript and preparing the figures, respectively.

Conflicts of Interest

The authors declare no conflict of interest.

References

Lee, Y.-D.; Chung, W.-Y. Wireless sensor network based wearable smart shirt for ubiquitous health and activity monitoring. Sens. Actuators B Chem. 2009, 140, 390–395.
Smuck, M.; Odonkor, C.A.; Wilt, J.K.; Schmidt, N.; Swiernik, M.A. The emerging clinical role of wearables: Factors for successful implementation in healthcare. NPJ Digit. Med. 2021, 4, 45.
Li, J.; Ma, Q.; Chan, A.H.; Man, S. Health monitoring through wearable technologies for older adults: Smart wearables acceptance model. Appl. Ergon. 2018, 75, 162–169.
Xie, J.; Wen, D.; Liang, L.; Jia, Y.; Gao, L.; Lei, J. Evaluating the Validity of Current Mainstream Wearable Devices in Fitness Tracking Under Various Physical Activities: Comparative Study. JMIR mHealth uHealth 2018, 6, e94.
Erdmier, C.; Hatcher, J.; Lee, M. Wearable device implications in the healthcare industry. J. Med. Eng. Technol. 2016, 40, 141–148.
Li, X.; Dunn, J.; Salins, D.; Zhou, G.; Zhou, W.; Rose, S.M.S.-F.; Perelman, D.; Colbert, E.; Runge, R.; Rego, S.; et al. Digital Health: Tracking Physiomes and Activity Using Wearable Biosensors Reveals Useful Health-Related Information. PLoS Biol. 2017, 15, e2001402.
Dunn, J.; Runge, R.; Snyder, M. Wearables and the medical revolution. Pers. Med. 2018, 15, 429–448.
Suh, M.-K.; Chen, C.-A.; Woodbridge, J.; Tu, M.K.; Kim, J.I.; Nahapetian, A.; Evangelista, L.S.; Sarrafzadeh, M. A Remote Patient Monitoring System for Congestive Heart Failure. J. Med. Syst. 2011, 35, 1165–1179.
Youm, S.; Lee, G.; Park, S.; Zhu, W. Development of remote healthcare system for measuring and promoting healthy lifestyle. Expert Syst. Appl. 2011, 38, 2828–2834.
Patel, S.; Park, H.; Bonato, P.; Chan, L.; Rodgers, M. A review of wearable sensors and systems with application in rehabilitation. J. Neuroeng. Rehabil. 2012, 9, 21.
Tricoli, A.; Nasiri, N.; De, S. Wearable and Miniaturized Sensor Technologies for Personalized and Preventive Medicine. Adv. Funct. Mater. 2017, 27, 1605271.
Dargazany, A.R.; Stegagno, P.; Mankodiya, K. WearableDL: Wearable internet-of-things and deep learning for big data analytics—Concept, literature, and future. Mob. Inf. Syst. 2018, 2018, 8125126.
Ringeval, M.; Wagner, G.; Denford, J.; Paré, G.; Kitsiou, S. Fitbit-Based Interventions for Healthy Lifestyle Outcomes: Systematic Review and Meta-Analysis. J. Med. Internet Res. 2020, 22, e23954.
Patcha, A.; Park, J.-M. An overview of anomaly detection techniques: Existing solutions and latest technological trends. Comput. Netw. 2007, 51, 3448–3470.
Zamini, M.; Hasheminejad, S.M.H. A comprehensive survey of anomaly detection in banking, wireless sensor networks, social networks, and healthcare. Intell. Decis. Technol. 2019, 13, 229–270.
Salamon, J.; Mouček, R. Heart rate and sentiment experimental data with common timeline. Data Brief 2017, 15, 851–861.
Dias, D.; Paulo Silva Cunha, J. Wearable health devices—Vital sign monitoring, systems and technologies. Sensors 2018, 18, 2414.
Agliari, E.; Barra, A.; Barra, O.A.; Fachechi, A.; Vento, L.F.; Moretti, L. Detecting cardiac pathologies via machine learning on heart-rate variability time series and related markers. Sci. Rep. 2020, 10, 8845.
Melillo, P.; Izzo, R.; Orrico, A.; Scala, P.; Attanasio, M.; Mirra, M.; DE Luca, N.; Pecchia, L. Automatic Prediction of Cardiovascular and Cerebrovascular Events Using Heart Rate Variability Analysis. PLoS ONE 2015, 10, e0118504.
Fox, K.; Borer, J.S.; Camm, A.J.; Danchin, N.; Ferrari, R.; Sendon, J.L.L.; Steg, P.G.; Tardif, J.-C.; Tavazzi, L.; Tendera, M. Resting Heart Rate in Cardiovascular Disease. J. Am. Coll. Cardiol. 2007, 50, 823–830.
Bayoumy, K.; Gaber, M.; Elshafeey, A.; Mhaimeed, O.; Dineen, E.H.; Marvel, F.A.; Martin, S.S.; Muse, E.D.; Turakhia, M.P.; Tarakji, K.G.; et al. Smart wearable devices in cardiovascular care: Where we are and how to move forward. Nat. Rev. Cardiol. 2021, 18, 581–599.
Zhang, D.; Wang, W.; Li, F. Association between resting heart rate and coronary artery disease, stroke, sudden death and noncardiovascular diseases: A meta-analysis. Can. Med. Assoc. J. 2016, 188, E384–E392.
Aggarwal, C.C. An Introduction to Outlier Analysis; Springer: Berlin/Heidelberg, Germany, 2016; pp. 1–34.
Salgado, C.M.; Azevedo, C.; Proença, H.; Vieira, S.M. Noise versus outliers. Second Anal. Electron. Health Rec. 2016, 163–183.
Torr, P.H.; Murray, D.W. Outlier detection and motion segmentation. In Sensor Fusion VI; International Society for Optics and Photonics: Boston, MA, USA, 1993; pp. 432–443.
Marsland, S. On-Line Novelty Detection Through Self-Organisation, with Application to Inspection Robotics; The University of Manchester: Manchester, UK, 2001.
Penny, K.I. Appropriate Critical Values When Testing for a Single Multivariate Outlier by Using the Mahalanobis Distance. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1996, 45, 73.
Cook, R.D.; Weisberg, S. Residuals and Influence in Regression; Chapman and Hall: New York, NY, USA, 1982.
Tukey, J.W. Exploratory Data Analysis; Addison Wesley: Reading, MA, USA, 1977.
Shiffler, R.E. Maximum Z scores and outliers. Am. Stat. 1988, 42, 79–80.
Ramaswamy, S.; Rastogi, R.; Shim, K. Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 16–18 May 2000; pp. 427–438.
Song, X.; Wu, M.; Jermaine, C.; Ranka, S. Conditional Anomaly Detection. IEEE Trans. Knowl. Data Eng. 2007, 19, 631–645.
Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 1–58.
Banaee, H.; Ahmed, M.U.; Loutfi, A. Data Mining for Wearable Sensors in Health Monitoring Systems: A Review of Recent Trends and Challenges. Sensors 2013, 13, 17472–17500.
Melstrom, L.G.; Rodin, A.S.; Rossi, L.A.; Fu, P., Jr.; Fong, Y.; Sun, V. Patient generated health data and electronic health record integration in oncologic surgery: A call for artificial intelligence and machine learning. J. Surg. Oncol. 2021, 123, 52–60.
Szydło, T.; Konieczny, M. Mobile and wearable devices in an open and universal system for remote patient monitoring. Microprocess. Microsyst. 2016, 46, 44–54.
Mitlohner, J.; Neumaier, S.; Umbrich, J.; Polleres, A. Characteristics of Open Data CSV Files. In Proceedings of the 2016 2nd International Conference on Open and Big Data (OBD), Vienna, Austria, 22–24 August 2016; pp. 72–79.
Lin, S.; Wu, X.; Martinez, G.; Chawla, N.V. Filling Missing Values on Wearable-Sensory Time Series Data. In Proceedings of the 2020 SIAM International Conference on Data Mining; SIAM: Philadelphia, PA, USA, 2020; pp. 46–54.
Mack, C.; Su, Z.; Westreich, D. Managing Missing Data in Patient Registries: Addendum to Registries for Evaluating Patient Outcomes: A User’s Guide; Agency for Healthcare Research and Quality (US): Rockville, MD, USA, 2018.
Li, P.; Stuart, E.A.; Allison, D.B. Multiple imputation: A flexible tool for handling missing data. JAMA 2015, 314, 1966–1967.
Hicks, J.L.; Althoff, T.; Sosic, R.; Kuhar, P.; Bostjancic, B.; King, A.C.; Leskovec, J.; Delp, S.L. Best practices for analyzing large-scale health data from wearables and smartphone apps. NPJ Digit. Med. 2019, 2, 45.
Newgard, C.D.; Lewis, R.J. Missing data: How to best account for what is not known. JAMA 2015, 314, 940–941.
Allison, P. Handling Missing Data by Maximum Likelihood; Keynote presentation at the SAS Global Forum: Orlando, FL, USA, 2012.
Joenssen, D.W.; Bankhofer, U. Hot Deck Methods for Imputing Missing Data. In International Workshop on Machine Learning and Data Mining in Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2012; pp. 63–75.
Manly, C.A.; Wells, R. Reporting the Use of Multiple Imputation for Missing Data in Higher Education Research. Res. High. Educ. 2014, 56, 397–409.
Hegde, H.; Shimpi, N.; Panny, A.; Glurich, I.; Christie, P.; Acharya, A. MICE vs PPCA: Missing data imputation in healthcare. Inf. Med. Unlocked 2019, 17, 100275.
Honaker, J.; King, G. What to Do about Missing Values in Time-Series Cross-Section Data. Am. J. Polit. Sci. 2010, 54, 561–581.
Gupta, M.; Gao, J.; Aggarwal, C.; Han, J. Outlier Detection for Temporal Data. Synth. Lect. Data Min. Knowl. Discov. 2014, 5, 129.
Feng, T.; Narayanan, S. Imputing Missing Data in Large-Scale Multivariate Biomedical Wearable Recordings Using Bidirectional Recurrent Neural Networks with Temporal Activation Regularization. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. 2019, 2019, 2529–2534.
Blázquez-García, A.; Conde, A.; Mori, U.; Lozano, J.A. A Review on Outlier/Anomaly Detection in Time Series Data. ACM Comput. Surv. 2021, 54, 1–33.
Sagheer, A.; Kotb, M. Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems. Sci. Rep. 2019, 9, 19038.
Goernitz, N.; Kloft, M.; Rieck, K.; Brefeld, U. Toward Supervised Anomaly Detection. J. Artif. Intell. Res. 2013, 46, 235–262.
Cho, S.; Ensari, I.; Weng, C.; Kahn, M.G.; Natarajan, K. Factors Affecting the Quality of Person-Generated Wearable Device Data and Associated Challenges: Rapid Systematic Review. JMIR mHealth uHealth 2021, 9, e20738.
Paulheim, H.; Meusel, R. A decomposition of the outlier detection problem into a set of supervised learning problems. Mach. Learn. 2015, 100, 509–531.
Zhang, T.; Wang, J.; Xu, L.; Liu, P. Fall detection by wearable sensor and one-class SVM algorithm. In Intelligent Computing in Signal Processing and Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2006; pp. 858–863.
Indikawati, F.I.; Winiarti, S. Stress Detection from Multimodal Wearable Sensor Data; IOP Publishing: Bristol, UK, 2020; p. 012028.
Zhu, C.; Sheng, W.; Liu, M. Wearable Sensor-Based Behavioral Anomaly Detection in Smart Assisted Living Systems. IEEE Trans. Autom. Sci. Eng. 2015, 12, 1225–1234.
Eskin, E.; Arnold, A.; Prerau, M.; Portnoy, L.; Stolfo, S. A Geometric Framework for Unsupervised Anomaly Detection. In Applications of Data Mining in Computer Security; Springer: Berlin/Heidelberg, Germany, 2002; pp. 77–101.
Gosavi, J.S.; Wadne, V.S. Unsupervised distance-based outlier detection using nearest neighbours algorithm on distributed approach: Survey. Int. J. Innov. Res. Comput. Commun. Eng. 2014, 2, 7510–7514.
Amer, M.; Goldstein, M. Nearest-neighbor and clustering based anomaly detection algorithms for rapidminer. In Proceedings of the 3rd RapidMiner Community Meeting and Conference (RCOMM 2012), Budapest, Hungary, 28–31 August 2012; pp. 1–12.
Syarif, I.; Prugel-Bennett, A.; Wills, G. Unsupervised Clustering Approach for Network Anomaly Detection. In Networked Digital Technologies; Springer: Berlin/Heidelberg, Germany, 2012; pp. 135–145.
Zhang, C.; Song, D.; Chen, Y.; Feng, X.; Lumezanu, C.; Cheng, W.; Ni, J.; Zong, B.; Chen, H.; Chawla, N.V. A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), Honolulu, HI, USA, 27 January–1 February 2019; pp. 1409–1416.
Veeravalli, B.; Deepu, C.J.; Ngo, D. Real-time, personalized anomaly detection in streaming data for wearable healthcare devices. In Handbook of Large-Scale Distributed Computing in Smart Healthcare; Springer: Berlin/Heidelberg, Germany, 2017; pp. 403–426.
Li, L.; Hansman, R.J.; Palacios, R.; Welsch, R. Anomaly detection via a Gaussian Mixture Model for flight operation and safety monitoring. Transp. Res. Part C Emerg. Technol. 2016, 64, 45–57.
Sudhakaran, S.; Lanz, O. Convolutional long short-term memory networks for recognizing first person interactions. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2339–2346.
Choo, S.; Seo, W.; Jeong, D.-J.; Cho, N.I. Multi-scale recurrent encoder-decoder network for dense temporal classification. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; IEEE: Beijing, China, 2018; pp. 103–108.
Tayefi, M.; Ngo, P.; Chomutare, T.; Dalianis, H.; Salvi, E.; Budrionis, A.; Godtliebsen, F. Challenges and opportunities beyond structured data in analysis of electronic health records. Wiley Interdiscip. Rev. Comput. Stat. 2021, 13, e1549.
Foorthuis, R. On the nature and types of anomalies: A review of deviations in data. Int. J. Data Sci. Anal. 2021, 12, 297–331.
Filzmoser, P.; Maronna, R.; Werner, M. Outlier identification in high dimensions. Comput. Stat. Data Anal. 2008, 52, 1694–1711.
Song, H.; Jiang, Z.; Men, A.; Yang, B. A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data. Comput. Intell. Neurosci. 2017, 2017, 8501683.
Schölkopf, B.; Platt, J.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the Support of a High-Dimensional Distribution. Neural Comput. 2001, 13, 1443–1471.
Ruff, L.; Vandermeulen, R.A.; Görnitz, N.; Binder, A.; Müller, E.; Müller, K.R.; Kloft, M. Deep semi-supervised anomaly detection. arXiv 2019, arXiv:1906.02694.
Goldstein, M.; Uchida, S. A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data. PLoS ONE 2016, 11, e0152173.
Pang, G.; Shen, C.; Cao, L.; Hengel, A.V.D. Deep learning for anomaly detection: A review. ACM Comput. Surv. CSUR 2021, 54, 1–38.
Mishra, T.; Wang, M.; Metwally, A.A.; Bogu, G.K.; Brooks, A.W.; Bahmani, A.; Alavi, A.; Celli, A.; Higgs, E.; Dagan-Rosenfeld, O.; et al. Pre-symptomatic detection of COVID-19 from smartwatch data. Nat. Biomed. Eng. 2020, 4, 1208–1220.
Dwivedi, R.K.; Kumar, R.; Buyya, R. Gaussian Distribution-Based Machine Learning Scheme for Anomaly Detection in Healthcare Sensor Cloud. Int. J. Cloud Appl. Comput. 2021, 11, 52–72.
Bogu, G.K.; Snyder, M.P. Deep learning-based detection of COVID-19 using wearables data. medRxiv 2021.
Zhu, G.; Li, J.; Meng, Z.; Yu, Y.; Li, Y.; Tang, X.; Dong, Y.; Sun, G.; Zhou, R.; Wang, H.; et al. Learning from Large-Scale Wearable Device Data for Predicting the Epidemic Trend of COVID-19. Discret. Dyn. Nat. Soc. 2020, 2020, 6152041.
Wang, L.; Nielsen, K.; Goldberg, J.; Brown, J.R.; Rumsfeld, J.S.; Steinberg, B.A.; Zhang, Y.; Matheny, M.E.; Shah, R.U. Association of Wearable Device Use with Pulse Rate and Health Care Use in Adults with Atrial Fibrillation. JAMA Netw. Open 2021, 4, e215821.
Lown, M.; Brown, M.; Brown, C.; Yue, A.M.; Shah, B.N.; Corbett, S.J.; Lewith, G.; Stuart, B.; Moore, M.; Little, P. Machine learning detection of Atrial Fibrillation using wearable technology. PLoS ONE 2020, 15, e0227401.
Nemati, S.; Ghassemi, M.M.; Ambai, V.; Isakadze, N.; Levantsevych, O.; Shah, A.; Clifford, G.D. Monitoring and detecting atrial fibrillation using wearable technology. In Proceedings of the 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL, USA, 17–20 October 2016; pp. 3394–3397.
Liu, J.; Zhao, Y.; Lai, B.; Wang, H.; Tsui, K.L. Wearable Device Heart Rate and Activity Data in an Unsupervised Approach to Personalized Sleep Monitoring: Algorithm Validation. JMIR mHealth uHealth 2020, 8, e18370.
Chow, H.-W.; Yang, C.-C. Accuracy of Optical Heart Rate Sensing Technology in Wearable Fitness Trackers for Young and Older Adults: Validation and Comparison Study. JMIR mHealth uHealth 2020, 8, e14707.
Stehlik, J.; Schmalfuss, C.; Bozkurt, B.; Nativi-Nicolau, J.; Wohlfahrt, P.; Wegerich, S.; Rose, K.; Ray, R.; Schofield, R.; Deswal, A.; et al. Continuous wearable monitoring analytics predict heart failure hospitalization: The LINK-HF multicenter study. Circ. Heart Fail. 2020, 13, e006513.
Chen, E.; Jiang, J.; Su, R.; Gao, M.; Zhu, S.; Zhou, J.; Huo, Y. A new smart wristband equipped with an artificial intelligence algorithm to detect atrial fibrillation. Hear. Rhythm. 2020, 17, 847–853.
Benedetto, S.; Caldato, C.; Bazzan, E.; Greenwood, D.C.; Pensabene, V.; Actis, P. Assessment of the Fitbit Charge 2 for monitoring heart rate. PLoS ONE 2018, 13, e0192691.
Perez-Pozuelo, I.; Spathis, D.; Clifton, E.A.; Mascolo, C. Wearables, smartphones, and artificial intelligence for digital phenotyping and health. In Digital Health; Elsevier: Amsterdam, The Netherlands, 2021; pp. 33–54.
Al-Turjman, F.; Baali, I. Machine learning for wearable IoT-based applications: A survey. Trans. Emerg. Telecommun. Technol. 2019, e3635.
Angelides, M.C.; Wilson, L.A.C.; Echeverría, P.L.B. Wearable data analysis, visualisation and recommendations on the go using android middleware. Multimed. Tools Appl. 2018, 77, 26397–26448.
Beach, C. A Flexible Temperature Sensing Insole for Diabetic Foot Ulcer Monitoring with an Investigation into the Self Powering of Wearables via Energy Harvesting; The University of Manchester: Manchester, UK, 2020.
Mahabala, C.; Kamath, P.; Bhaskaran, U.; Pai, N.D.; Pai, A.U. Antihypertensive therapy: Nocturnal dippers and nondippers. Do we treat them differently? Vasc. Health Risk Manag. 2013, 9, 125–133.
Semaan, S.; Dewland, T.A.; Tison, G.; Nah, G.; Vittinghoff, E.; Pletcher, M.J.; Olgin, J.E.; Marcus, G.M. Physical activity and atrial fibrillation: Data from wearable fitness trackers. Hear. Rhythm. 2020, 17, 842–846.
Siirtola, P.; Koskimäki, H.; Mönttinen, H.; Röning, J. Using Sleep Time Data from Wearable Sensors for Early Detection of Migraine Attacks. Sensors 2018, 18, 1374.
Sannino, G.; De Falco, I.; De Pietro, G. Non-Invasive Risk Stratification of Hypertension: A Systematic Comparison of Machine Learning Algorithms. J. Sens. Actuator Netw. 2020, 9, 34.
Hasan, R.; Hanna, M.; Zhang, S.; Malone, K.; Tong, E.; Salas, N.; Sarode, R.; Journeycake, J.; Zia, A. Physical activity in children at risk of postthrombotic sequelae: A pilot randomized controlled trial. Blood Adv. 2020, 4, 3767–3775.
Wei, J. How Wearables Intersect with the Cloud and the Internet of Things: Considerations for the developers of wearables. IEEE Consum. Electron. Mag. 2014, 3, 53–56.
Hill, E.M.; Petrou, S.; de Lusignan, S.; Yonova, I.; Keeling, M.J. Seasonal influenza: Modelling approaches to capture immunity propagation. PLoS Comput. Biol. 2019, 15, e1007096.
Quer, G.; Gouda, P.; Galarnyk, M.; Topol, E.J.; Steinhubl, S.R. Inter- and intraindividual variability in daily resting heart rate and its associations with age, sex, sleep, BMI, and time of year: Retrospective, longitudinal cohort study of 92,457 adults. PLoS ONE 2020, 15, e0227709.
Jeong, H.; Rogers, J.A.; Xu, S. Continuous on-body sensing for the COVID-19 pandemic: Gaps and opportunities. Sci. Adv. 2020, 6, eabd4794.

Figure 1. Flow chart describing a general anomaly detection workflow.

Figure 2. A schematic of normal, anomalous and noisy data points mixed in a wearable dataset.

Table 1. Data imputation methods.

Methods	Definition	Accuracy	References
Mean value imputation (MVI)	Each missing value is replaced with the mean of the observed values of the same variable	Biased; shrinks the variance	[42]
Maximum Likelihood (ML)	The likelihood of the observed data is obtained by summing or integrating the complete-data likelihood over the missing values, and the parameters are estimated by maximizing it	Unbiased parameter estimation	[43]
Hot Deck Imputation	Each missing value is replaced with an observed value taken from a similar record (a donor) in the same data matrix	Replication of values may cause bias	[44]
Multiple Imputation (MI)	Introduces random variation to generate several datasets with slightly different imputed values; the statistical analysis is run on each dataset and the results are pooled into a single estimate	Comparable to ML	[45]
Multivariate Imputation by Chained Equations (MICE)	Specifies an imputation model for each incomplete column and fills the missing entries with random draws based on the observed data, cycling through the columns iteratively	Comparable to ML	[46]
Expectation–Maximization with Bootstrapping (EMB)	The expectation (E) step evaluates the expected likelihood under the current model parameters; the maximization (M) step maximizes it to update the parameters, and the two steps repeat until convergence. Running EM on bootstrapped samples yields multiple imputed datasets	Comparable to ML	[47]
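The contrast between mean value imputation and hot deck imputation in Table 1 can be made concrete with a small sketch. The Python below is illustrative only: the function names, the toy step counts and heart-rate values, and the use of a single covariate (step count) to pick donors are assumptions, not part of the reviewed studies. It fills sensor dropouts with the observed mean and with a nearest-neighbour hot deck, and checks that mean imputation shrinks the sample variance.

```python
import statistics
from typing import List, Optional


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace every missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    return [mu if v is None else v for v in values]


def hot_deck_impute(covariate: List[float], values: List[Optional[float]]) -> List[float]:
    """Nearest-neighbour hot deck: copy the value of the complete record (donor)
    whose covariate is closest to that of the incomplete record."""
    donors = [(c, v) for c, v in zip(covariate, values) if v is not None]
    filled = []
    for c, v in zip(covariate, values):
        if v is None:
            # Donor with the smallest covariate distance; on ties, min() keeps the first.
            _, donor_value = min(donors, key=lambda d: abs(d[0] - c))
            filled.append(donor_value)
        else:
            filled.append(v)
    return filled


# Hypothetical per-minute records: step count and heart rate (None = sensor dropout).
steps = [0, 10, 120, 130, 4, 125]
hr = [62.0, 64.0, 110.0, None, None, 114.0]

print(mean_impute(hr))             # [62.0, 64.0, 110.0, 87.5, 87.5, 114.0]
print(hot_deck_impute(steps, hr))  # [62.0, 64.0, 110.0, 114.0, 62.0, 114.0]

observed = [v for v in hr if v is not None]
print(statistics.variance(observed) > statistics.variance(mean_impute(hr)))  # True
```

In the resting minute (4 steps), mean imputation inserts 87.5 bpm, whereas the hot deck borrows 62 bpm from a resting donor. Because the imputed means add no spread, the sample variance falls from about 803.7 to 482.2, which is the bias noted for MVI.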
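For the multiple imputation row of Table 1, the standard way to combine the per-dataset analyses is Rubin's rules: the pooled estimate is the mean of the m estimates, and its total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The sketch below assumes each imputed dataset has already been analysed; the function name and the example numbers (a mean resting heart rate estimated on three imputed datasets) are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def pool_rubin(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Pool m per-dataset estimates and their variances (squared standard errors)
    with Rubin's rules. Returns (pooled estimate, total variance)."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputed datasets and one variance per estimate")
    q_bar = statistics.fmean(estimates)  # pooled point estimate
    w = statistics.fmean(variances)      # within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance (denominator m - 1)
    t = w + (1 + 1 / m) * b              # total variance
    return q_bar, t


# Hypothetical: mean resting heart rate estimated on three imputed copies of one dataset.
est, total_var = pool_rubin([72.0, 74.0, 73.0], [1.0, 1.2, 1.1])
print(est, total_var, math.sqrt(total_var))  # pooled estimate, total variance, pooled SE
```

Pooling, rather than keeping a single imputed dataset, is what lets the final standard error reflect the uncertainty introduced by the missing values.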

Table 2. Wearables-associated studies with clinical implications.

Disease under Study	Wearables Used	Method Applied	Major Finding	References
COVID-19	Huami wearable devices	Anomaly detection algorithm; neural network prediction modelling	Prediction model with the potential to give advance warning of COVID-19 outbreaks as part of a health surveillance system	[78]
Atrial Fibrillation (AFib)	Not mentioned	Not mentioned	Follow-up health care use was higher among wearable users, indicating better disease management	[79]
Atrial Fibrillation (AFib)	Samsung Simband	Noise-resistant machine learning approach	The screening algorithm can enable large-scale detection of undiagnosed AFib from noisy photoplethysmogram (PPG) wearable sensor data	[81]
Sleep/wake identification	Fitbit Alta (Fitbit Inc.)	Hidden Markov models	Accurate measurement of the sleep/wake cycle and an effective personalized model	[82]
Real-time heart rate monitoring during moderate exercise	Xiaomi Mi Band 2 and Garmin Vivosmart HR+	Not mentioned	Estimation of accurate heart rate signals during physically strenuous activity	[83]
Prediction of heart failure exacerbation	Wearable sensor (Vital Connect, San Jose, CA, USA)	Machine learning analytics algorithm	Multivariate data from wearables accurately predict the need for rehospitalization in patients at risk of heart failure	[84]
Atrial fibrillation (AFib)	Amazfit Health Band 1S	Artificial intelligence (AI) algorithm	PPG sensor-derived data combined with AI can be an efficient way to detect AFib	[85]
Distance walked or run, calorie consumption, quality of sleep and heart rate	Fitbit Charge 2 (Thought Technology Ltd., Toronto, Canada)	HR-derived algorithms	Heart rate measured by the wearable for fitness tracking showed several significant differences from electrocardiograph readings, which need further study	[86]
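Several studies in Table 2 look for deviations in wearable heart-rate data. A minimal sketch of one generic idea, comparing each day's resting heart rate (RHR) against a personal rolling baseline and flagging large positive z-scores, is given here. It is not the algorithm of any cited study; the window length, the threshold k, the function name and the example values are hypothetical choices.

```python
import statistics
from typing import List


def flag_rhr_anomalies(daily_rhr: List[float], window: int = 7, k: float = 3.0) -> List[int]:
    """Return indices of days whose resting heart rate exceeds the mean of the
    last `window` non-flagged days by more than k sample standard deviations."""
    flagged: List[int] = []
    baseline_days: List[float] = []  # RHR values of days not flagged so far
    for i, value in enumerate(daily_rhr):
        if len(baseline_days) >= window:
            recent = baseline_days[-window:]
            mu = statistics.fmean(recent)
            sd = statistics.stdev(recent)
            # Only elevations are flagged; a perfectly flat baseline (sd == 0) flags nothing.
            if sd > 0 and (value - mu) / sd > k:
                flagged.append(i)
                continue  # keep anomalous days out of the baseline
        baseline_days.append(value)
    return flagged


# Hypothetical daily RHR (bpm): a stable week, then a two-day elevation.
rhr = [60, 61, 59, 60, 62, 61, 60, 60, 61, 70, 71, 60]
print(flag_rhr_anomalies(rhr))  # [9, 10]
```

Excluding flagged days from the baseline keeps a multi-day elevation from inflating the reference: with a plain trailing window that includes day 9, day 10 would score only about 2.5 and fall below k = 3.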
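The sleep/wake study in Table 2 relies on hidden Markov models. To show how an HMM turns a heart-rate series into a sleep/wake labelling, the sketch runs Viterbi decoding on a two-state model with Gaussian heart-rate emissions. All parameters (state means and standard deviations, start and transition probabilities) and function names are hypothetical and fixed by hand; in an unsupervised setting they would be learned from each user's data, and the cited study also used activity data.

```python
import math
from typing import Dict, List, Tuple

STATES = ("sleep", "wake")


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    """Log-density of a normal distribution N(mean, sd**2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_sleep_wake(
    hr: List[float],
    start: Dict[str, float],
    trans: Dict[Tuple[str, str], float],
    emit: Dict[str, Tuple[float, float]],
) -> List[str]:
    """Most likely hidden state sequence for the observed heart rates (log-space Viterbi)."""
    score = [{s: math.log(start[s]) + gaussian_logpdf(hr[0], *emit[s]) for s in STATES}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(hr)):
        score.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state for landing in state s at time t.
            prev = max(STATES, key=lambda p: score[t - 1][p] + math.log(trans[(p, s)]))
            score[t][s] = (score[t - 1][prev] + math.log(trans[(prev, s)])
                           + gaussian_logpdf(hr[t], *emit[s]))
            back[t][s] = prev
    # Backtrack from the best final state.
    path = [max(STATES, key=lambda s: score[-1][s])]
    for t in range(len(hr) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical parameters: lower, steadier heart rate during sleep; states tend to persist.
start = {"sleep": 0.5, "wake": 0.5}
trans = {("sleep", "sleep"): 0.95, ("sleep", "wake"): 0.05,
         ("wake", "wake"): 0.95, ("wake", "sleep"): 0.05}
emit = {"sleep": (55.0, 4.0), "wake": (75.0, 8.0)}

hr = [78, 74, 70, 56, 54, 53, 55, 57, 72, 80]
print(viterbi_sleep_wake(hr, start, trans, emit))
```

The high self-transition probabilities penalize rapid switching, so a single borderline reading does not flip the label unless its emission evidence outweighs the switching cost.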

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

MDPI and ACS Style

Sunny, J.S.; Patro, C.P.K.; Karnani, K.; Pingle, S.C.; Lin, F.; Anekoji, M.; Jones, L.D.; Kesari, S.; Ashili, S. Anomaly Detection Framework for Wearables Data: A Perspective Review on Data Concepts, Data Analysis Algorithms and Prospects. Sensors 2022, 22, 756. https://doi.org/10.3390/s22030756


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.

