Biosensors
  • Article
  • Open Access

19 August 2021

A Study of One-Class Classification Algorithms for Wearable Fall Sensors

1 Departamento de Tecnología Electrónica, Universidad de Málaga, 29071 Málaga, Spain
2 Departamento de Tecnología Electrónica, Universidad de Málaga, Instituto TELMA, 29071 Málaga, Spain
* Author to whom correspondence should be addressed.

Abstract

In recent years, the popularity of wearable devices has fostered the investigation of automatic fall detection systems based on the analysis of the signals captured by transportable inertial sensors. Due to the complexity and variety of human movements, the detection algorithms that offer the best performance when discriminating falls from conventional Activities of Daily Living (ADLs) are those built on machine learning and deep learning mechanisms. In this regard, supervised machine learning binary classification methods have been massively employed in the related literature. However, the learning phase of these algorithms requires mobility patterns caused by falls, which are very difficult to obtain in realistic application scenarios. An interesting alternative is offered by One-Class Classifiers (OCCs), which can be exclusively trained and configured with movement traces of a single type (ADLs). In this paper, a systematic study of the performance of several typical OCCs (for diverse sets of input features and hyperparameters) is carried out on nine public repositories of falls and ADLs. The results show the potential of these classifiers, which are capable of achieving performance metrics very similar to those of supervised algorithms (with values of specificity and sensitivity higher than 95%). However, the study warns of the need to have a wide variety of types of ADLs when training OCCs, since activities with a high degree of mobility can significantly increase the frequency of false alarms (ADLs identified as falls) if they are not considered in the data subsets used for training.

1. Introduction

According to the World Health Organization (WHO), a fall is defined as an involuntary event that results in a person losing their balance and coming to lie unintentionally on the ground or other lower level [1]. Despite the fact that the majority of falls are not fatal, it is estimated that 646,000 fatal falls occur annually, which makes them the second worldwide cause of death due to accidental injuries [1].
Fall-related health problems are particularly serious among older people, as they are strongly associated with loss of autonomy, impairment, and early death. Worldwide, about 28–35% of adults over 65 suffer one or more falls per year, and this percentage rises to 32–42% among those over 70 [2]. This situation poses a logistical and economic challenge for national health systems, especially considering that the share of the population aged over 60 will double by 2050, reaching 2 billion people, compared to 900 million in 2015 [3]. The problem is aggravated by the fact that a significant proportion of older adults live alone, so that if an accident occurs, a caregiver (a family member, medical or nursing staff, etc.) must be alerted to provide help. In this context, the time that elapses between a fall and the moment the person is assisted has been shown to determine the physical aftermath of the accident and even the probability of survival [4]. Consequently, the last decade has witnessed an increasing interest in the development of affordable Fall Detection Systems (FDSs), which are able to permanently monitor patients and to trigger an automatic alarm message to a remote agent as soon as the occurrence of a fall is presumed.
Existing FDSs can be categorized into two generic groups. Firstly, context-aware systems are grounded on the deployment of cameras, microphones, and/or other environmental sensors in the specific locations where the user must be monitored. On the other hand, wearable-based systems utilize small transportable sensors that can be easily integrated or attached to the users’ clothing or garments to measure different parameters that describe their mobility.
When compared to context-aware solutions, the monitoring provided by wearable architectures offers a more ubiquitous service as they are not restricted to the particular area where the contextual sensors are installed. In addition, they are less privacy intrusive than camera-based methods and more robust to the presence of external artifacts or the alteration of the user’s setting. In addition, this type of FDS can benefit from the widespread acceptability and decreasing costs of wearable devices (smartwatches, sport bands, etc.).
The fundamental purpose of automatic fall detectors is to achieve the most accurate discrimination between falls and other movements or Activities of Daily Living (ADLs), simultaneously minimizing the number of undetected falls and false alarms (ADLs misjudged as falls). The efficiency of an FDS relies on the algorithm that makes the detection decision after processing and analyzing the measurements constantly captured by the wearable sensors (mainly accelerometers; less frequently, gyroscopes; and in some prototypes, magnetometers, barometers, or heart rate sensors).
Detection strategies can be roughly classified into two groups [5]: threshold-based and machine learning methods. Threshold-based algorithms assume that a fall has occurred when one or several parameters (derived from the sensor measurements) exceed or drop below a certain threshold. These algorithms are easy to implement and have a low computational load, although they are too simplistic and rigid to correctly classify many complex movements (especially those ADLs that involve intense physical activity). Contrariwise, algorithms based on machine learning models usually outperform thresholding schemes [6], as they have a greater potential to self-adapt to a wider typology of ADLs and falls by learning directly from a set of samples or movement traces, without requiring the explicit and heuristic definition of a threshold value.
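As a reference point for the thresholding family, the following Python sketch implements a minimal free-fall-plus-impact detector. The two thresholds (in g) and the 0.5 s impact window are illustrative assumptions, not values taken from any of the cited works.

```python
import numpy as np

def threshold_fall_detector(accel, fs, impact_g=2.5, freefall_g=0.4):
    """Flag a fall when the acceleration magnitude drops below a 'free-fall'
    threshold and, within 0.5 s, exceeds an 'impact' threshold.
    accel: (n, 3) triaxial samples in g; fs: sampling rate in Hz.
    Threshold values are illustrative, not taken from the literature."""
    smv = np.linalg.norm(accel, axis=1)   # acceleration magnitude per sample
    window = int(0.5 * fs)                # impact expected shortly after free fall
    for i in np.where(smv < freefall_g)[0]:
        if np.any(smv[i:i + window] > impact_g):
            return True
    return False
```

Rules of this kind are cheap enough to run on a microcontroller, which explains their popularity despite the rigidity discussed above.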
In most studies in the related literature, machine learning algorithms follow a fully supervised approach, so they need to be trained with labeled examples of both ADLs and falls. However, falls are rare events, and most studies on FDSs are strongly constrained by the lack of real-world fall examples. Owing to the evident difficulties of capturing samples of actual falls experienced by the target public of these systems (older adults), the falls used to train and test new FDS proposals normally have to be generated in a testbed by young and healthy volunteers who emulate falls on cushioned surfaces according to a systematic and predefined test plan.
The validity of this procedure is still under discussion. Some related studies [7,8] have compared the dynamics of the falls experienced by older people with those ‘mimicked’ by young subjects in an experimental environment. The authors concluded that although there are similarities between the characteristics of both fall patterns, there are also relevant differences in the monitored magnitudes related to the reaction time and the mechanisms of the compensatory movements to avoid falling or further damage. In this respect, Aziz et al. showed in [9] that the effectiveness of some supervised learning algorithms may dramatically decrease when they are evaluated in real scenarios.
To cope with this problem, one-class classifiers (OCCs), a subtype of machine learning architectures, are particularly well suited to building binary pattern classifiers with heavily unbalanced datasets [10]. OCCs bypass the need to obtain laboratory samples of the minority class (falls), as they are conceived to be exclusively trained with traces of the most common class (ADLs). In this way, in the case of FDSs, once the training of the system is accomplished, a fall is detected whenever a certain movement is classified as an ‘anomaly’ (‘novelty’ or ‘outlier’). This occurs when its features substantially diverge from the samples of the majority class used during the training phase.
In a real use scenario, FDSs will most likely have to be adjusted or ‘tuned’ to the particular dynamics of the movements of the user to be monitored. In this vein, Medrano et al. evinced in [11] the benefits of ‘personalizing’ the configuration of the FDS by training the models with movements generated by the final user. Obviously, this process should not oblige the patient to emulate or generate fall patterns to particularize the FDS. In this regard, OCCs may greatly ease the implementation of this personalization, since any user could train a certain machine learning method from scratch just by wearing the system during a training period in which the sensors collect the traces generated by the daily routines of the user and feed the detector.
The idea of utilizing OCCs as the decision core of an FDS is not new. Table 1 summarizes the works that have assessed the performance of anomaly detectors when they are programmed to detect falls with a wearable device. In some specific cases, the FDS develops a ‘hybrid’ approach by combining an OCC and a thresholding method (such as the proposal by Viet et al. in [12]) or an OCC and a fully supervised classifier (such as that proposed by Lisowska et al. in [13]).
Table 1. Works that have proposed and compared one-class classifiers to detect falls as anomalies.
In all cases, the algorithms are primarily based on the analysis of the signals captured by a triaxial accelerometer, a strategy that has been massively adopted in the related literature on wearable FDSs. In only six papers is the information provided by the accelerometer complemented by other inertial sensors (a gyroscope, a magnetometer, or an orientation sensor), and in just two cases is a more complex sensor-fusion policy applied, so that the classifiers are also fed with signals captured by other types of wearable sensing units (e.g., a heart rate monitor in the paper by Nho et al. [14]).
Table 1 indicates the best reported performance metrics (normally expressed in terms of sensitivity or specificity) of the corresponding OCC in the reviewed literature. When more than one type of classifier is compared, the best performing algorithm in each study is marked in bold in the third column of the table. The results show that in some works, OCCs may achieve a noteworthy efficacy in discriminating ADLs from falls (with sensitivities and specificities higher than 0.98, or 98%). Furthermore, in [15], Medrano et al. illustrate that one-class classifiers may even exhibit a significantly better performance than their supervised counterparts. However, as can also be appreciated from the last column of the table, all the works employ only one or at most two datasets to evaluate these algorithms. In some studies, these datasets are not obtained from a public repository but directly generated (and not released) by the authors. Due to the limited number of subjects and types of ADLs and falls considered in these datasets, it is legitimate to question whether these results can be extrapolated to other repositories. Furthermore, the design criteria of these benchmarking datasets do not follow any particular recommendation and strongly rely on particular decisions of their creators. In a recent work [16], we showed that even a deep learning method may achieve very divergent results when it is applied to different datasets. Thus, the good performance metrics obtained with a certain repository should be confirmed by training and testing the classifier with other datasets.
Another key problem of OCCs that is normally neglected in the related literature is that these detectors may produce false alarms when tested with types of ADLs that were not part of the training subset [17]. This situation would not be so uncommon in a realistic scenario, where the monitored user may execute unexpected movements (not caused by falls) that may consequently be catalogued as ‘anomalies’ by the detector and trigger an undesired alerting message. Contrariwise, in previous works on OCC-based FDSs, the ADLs included in the data subsets used for testing incorporate the same types of movements utilized for the configuration of the detector, which inherently minimizes the possibility of experiencing these false alarms.
In this paper, we thoroughly analyze these two issues. To this end, we systematically analyze the behavior of five basic types of anomaly detectors (with diverse hyperparameter configurations and input feature sets) when they are employed with nine different well-known datasets captured on the same body position (the waist). We also investigate whether the classification efficacy degrades when new types of ADLs (not considered for training) are used for testing.
The paper is organized as follows: after the introduction and analysis of the related works presented in this section, Section 2 describes the different aspects of the methodology followed to evaluate the classifiers. Section 3 presents and discusses the main results for the considered study cases. Finally, Section 4 recapitulates the main conclusions of the article.

2. Methods

2.1. Selection of the Datasets

To date, about 25 datasets have been released to benchmark detection algorithms for transportable FDSs (see [33] for a comprehensive review on this topic). These databases consist of sets of numerical traces describing the signals captured by inertial sensors placed on one or several locations of the body. To the best of our knowledge, just one released dataset, provided by the FARSEEING project [34], publicly offers a very limited and unrepresentative number of traces captured from actual falls of older adults. In the other cases, the repositories are generated by recruiting a group of volunteers who systematically execute or emulate a series of predetermined ADLs or falls while transporting the corresponding sensor or sensors. For each movement, a trace (labeled as an ADL or a fall) is created.
Several studies [35,36,37,38,39] have shown that FDSs located on the waist outperform those placed on other body positions with a higher and independent mobility (e.g., a limb), since the waist is close to the center of gravity of the human body. Therefore, in order to set up a common reference framework under optimal conditions, we limit our analysis to the 15 repositories that offer inertial data measured at the waist (although some of them also contain measurements captured on other body positions). For the study, we also discard those datasets that do not provide a significant number of samples (fewer than 400) or that were collected with an accelerometer range of 2 g, which is too small to properly characterize the abrupt acceleration peaks caused by falls. After applying these criteria, we selected the 9 datasets (DLR, DOFDA, Erciyes, FallAllD, IMUFD, KFall, SisFall, UMAFall, and UP-Fall) described in Table 2. This quantity is clearly superior to the number of benchmarking repositories typically considered in the related literature to assess the performance of fall detection algorithms (in fact, as confirmed in Table 1, most proposals are validated against a single dataset). The need to evaluate the classifiers with different repositories is critical if we consider the remarkable heterogeneity [33,40] that exists among the available datasets in terms of the typology of the emulated ADLs and falls, the strategies to generate the movements, the duration of the traces, the testbed environment, the selection of the volunteers, etc.
Table 2. Basic data of the employed datasets.

2.2. Compared One-Class Classifying Algorithms

As aforementioned, one-class classifiers constitute a particularization of binary supervised classification systems, in which the detection algorithms are trained only with data of one class. After the classifier is trained on these one-class traces, data corresponding to a category different from that used during training can be detected as anomalies. Therefore, once the model of an OCC is developed, input patterns can be identified as anomalies when a certain parameter derived from the input signals (e.g., a distance) exceeds a predefined decision threshold.
In the case of FDSs, the concept of an anomaly fits well with that of a fall, which can be envisaged as an unexpected movement that presents atypical characteristics with regard to those of the common or majority class (ADLs). Thus, in our evaluation, the classifiers are trained exclusively with part of the ADL samples included in the datasets while they are tested with both the falls and the rest of the ADLs (those not employed during the training stage).
In order to thoroughly evaluate the feasibility of using an OCC as the core of FDSs, we analyze the performance of five well-known one-class classifiers [10]: an autoencoder, a Gaussian Mixture Model (GMM), a Parzen Probabilistic Neural Network (PPNN), a One-Class K-Nearest Neighbor (OC-KNN), and a One-Class Support Vector Machine (OC-SVM). All the classifiers were implemented and executed with Matlab scripts that used the Statistics and Machine Learning Toolbox [48]. Table 3 summarizes the values and possible alternatives considered to hyper-parameterize these classifiers. Through a grid search, we evaluated the performance of the algorithms for the different combinations of these hyperparameters.
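As an illustration of the ADL-only training scheme (the study itself used Matlab's Statistics and Machine Learning Toolbox), the following Python sketch trains two of the five classifier types with scikit-learn on synthetic feature vectors. The data, the hyperparameter values, and the distance-to-the-k-th-neighbor score used for the OC-KNN are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
adl_train = rng.normal(0.0, 0.5, size=(200, 7))   # ADL feature vectors only

# One-Class SVM: predict() == +1 means "normal" (ADL), -1 means anomaly (fall)
ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(adl_train)

# One-Class KNN: anomaly score = distance to the k-th nearest ADL training sample
k = 5
knn = NearestNeighbors(n_neighbors=k).fit(adl_train)

def ocknn_score(x):
    dist, _ = knn.kneighbors(np.atleast_2d(x))
    return dist[:, -1]                            # large distance -> likely a fall

adl_test = rng.normal(0.0, 0.5, size=(50, 7))     # held-out ADLs
fall_test = rng.normal(4.0, 0.5, size=(50, 7))    # atypical movements (toy falls)
```

Note that neither model ever sees a fall during fitting; falls only appear at test time, where they should receive markedly higher anomaly scores than the held-out ADLs.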
Table 3. Values and alternatives of the hyperparameters utilized for the evaluated models of the classifiers.
As the decision threshold to detect the anomaly for each OCC, we employ the variable described in Table 4.
Table 4. Decision thresholds employed to detect anomalies for the five considered OCCs.

2.3. Feature Selection

In order to characterize the mobility samples and feed the machine learning classifiers, we compute a set of features derived from the raw signals collected by the inertial sensors. As all the repositories include the data captured by an accelerometer, which is the most employed sensor in the literature on wearable FDSs, the features are derived from the triaxial acceleration measurements. Falls provoke sudden peaks of the acceleration magnitude when the body hits the ground. The Signal Magnitude Vector ($SMV_i$), for the i-th measurement, is computed as:
$$SMV_i = \sqrt{A_{x,i}^2 + A_{y,i}^2 + A_{z,i}^2}$$
where $A_{x,i}$, $A_{y,i}$, and $A_{z,i}$ denote the triaxial components of the acceleration for each axis. For every movement trace (ADL or fall), the feature extraction exclusively focuses on a time interval of ±1 s around the sample where the maximum value of $SMV_i$ is identified, while the rest of the measurements in the sequence are not considered. The choice of a 2 s observation window (centered around the acceleration peak) is justified by the fact that an interval between 1 and 2 s is a good trade-off between recognition speed and accuracy for recognizing most human activities [49]. In any case, the critical (impact) phase of a fall does not typically last longer than 0.5 s [50,51]. Thus, all the features are derived from the consecutive acceleration components collected in the interval $[i_o - N_W, i_o + N_W]$, where $i_o$ is the index of the sample in which the maximum acceleration module is located:
$$SMV_{i_o} = \max\, SMV_i, \quad i \in [1, N - N_W + 1)$$
where $N$ denotes the number of measurements in the trace (for each axis), while $N_W$ is the number of samples captured during the observation window. $N_W$ can be straightforwardly calculated as:
$$N_W = f_s \cdot t_w$$
where $f_s$ is the sampling rate of the trace and $t_w$ is the total duration of the window (2 s).
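The peak-centred windowing described by the equations above can be sketched in Python as follows. Since the text fixes the total window at 2 s (±1 s around the peak), the sketch interprets the window as $t_w/2$ seconds of samples on each side of $i_o$; that interpretation is an assumption on our part.

```python
import numpy as np

def peak_window(ax, ay, az, fs, tw=2.0):
    """Return the tw-second slice of the SMV sequence centred on its peak,
    together with the peak index i_o (interpreted as +/- tw/2 around i_o)."""
    smv = np.sqrt(np.asarray(ax)**2 + np.asarray(ay)**2 + np.asarray(az)**2)
    half = int(fs * tw / 2)                  # samples on each side of the peak
    i_o = int(np.argmax(smv))                # sample with maximum SMV
    lo, hi = max(0, i_o - half), min(len(smv), i_o + half)
    return smv[lo:hi], i_o
```

The clipping at the trace boundaries handles peaks that occur near the start or end of a recording.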
As a proper selection of the input features is a crucial factor in the design of any machine learning method, we consider different alternative candidate feature sets.
Firstly, we employ a set of twelve statistical candidate features that are physically interpretable as they entail a certain characterization of human dynamics. These features have been utilized by other works in the related literature on fall detection and activity recognition systems (refer, for example, to the comprehensive studies presented by Vallabh in [52] or by Xi in [53]). The symbol, labels (or labeling identifiers) and description of these twelve features are presented in Table 5 (a more detailed formal description of these parameters is provided in [33]).
Table 5. Values and alternatives of statistics analyzed to select the input feature set of the classifiers.
In order to select the most convenient combination of input features from these 12 candidate statistics, we performed a preliminary analysis of their effectiveness when applied to the aforementioned datasets to discriminate falls from ADLs with the classifiers. In all studies, the features were z-score normalized before training and testing. After testing all possible combinations of the statistics to feed the detectors, the results obtained (not presented here for the sake of simplicity) revealed that the two combinations that yielded the best performance metrics (sensitivity and specificity) in the classifiers were the one using the seven features labeled as B, C, D, F, G, I, and K in Table 5 (the ‘BCDFGIK’ feature set) and the set that employed all 12 candidate features (the ‘ABCDEFGHIJKL’ feature set).
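A toy Python sketch of this selection procedure (z-score normalization followed by an exhaustive subset search) is shown below. The synthetic data, the restriction of the search to feature pairs, and the mean-separation score standing in for the classifiers' sensitivity/specificity are all illustrative assumptions.

```python
import numpy as np
from itertools import combinations

# Toy stand-in: rows are traces, columns the 12 candidate statistics (A..L)
rng = np.random.default_rng(1)
adl = rng.normal(0.0, 1.0, size=(80, 12))
falls = rng.normal(0.0, 1.0, size=(20, 12))
falls[:, [1, 2, 3]] += 3.0        # only some features separate the classes

# z-score normalization using statistics of the training (ADL) data only
mu, sigma = adl.mean(axis=0), adl.std(axis=0)
adl_z, falls_z = (adl - mu) / sigma, (falls - mu) / sigma

# Exhaustive search over subsets (pairs here for brevity); the paper scores
# each subset with the classifiers themselves, replaced here by a simple
# class-separation measure
def separation(cols):
    cols = list(cols)
    return np.linalg.norm(falls_z[:, cols].mean(0) - adl_z[:, cols].mean(0))

best = max(combinations(range(12), 2), key=separation)
```

With this toy scoring, the search recovers a pair of the informative columns, illustrating why a well-chosen subset can match the full feature set.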
As the selection of these input feature sets may still seem arbitrary, we also consider another set of features obtained with the hctsa (Highly Comparative Time-Series Analysis) Matlab software package [54]. This software can extract thousands of heterogeneous features from a time-series dataset to produce an optimized low-dimensional representation of the data.
In our case, a set (HCTSA feature set) of 12 features has been selected according to the following procedure:
  • The SisFall [31] repository is selected as the baseline reference as it is considered one of the most complete in terms of types and quantity of movements and number and typology of subjects.
  • The candidate features of the samples are obtained by using HCTSA.
  • The performance resulting from the classification of the data is calculated by using each characteristic as input of a Support Vector Machine classifier with linear kernel and a k-fold analysis (with k = 10).
  • The tool analyzed the correlation among the features that led to the best results. The application was then configured to divide these features into 12 clusters, grouping correlated features into the same cluster. From each cluster, hctsa selected the most representative feature (the one closest to the cluster center).

2.4. Performance Metrics and Model Evaluation

For each combination of hyperparameters, input feature set, and dataset, we trained an instance of each of the five contemplated OCCs with a certain number of ADLs and tested it with both ADLs and falls of the same repository. To assess the capability of the one-class classifiers to discriminate both categories, we employed two metrics universally used in the evaluation of binary classifiers: the sensitivity (Se), or recall, defined as the ratio of falls in the test subset that are properly recognized, and the specificity (Sp), defined as the proportion of test ADLs that are not misidentified as falls. Unlike other metrics (such as the accuracy or the F1 score), sensitivity and specificity are not affected if the data classes in the datasets are unbalanced. Once the model is trained, the classifier is tested with 2500 possible values of the detection threshold (between a minimum and a maximum that respectively guarantee the maximization of the sensitivity and the specificity). By estimating Se and Sp for each value of the discrimination threshold, we compute the Receiver Operating Characteristic (ROC) curve, which represents the evolution of Se (true positive rate) against 1 − Sp (false positive rate). From the curve, we calculate the AUC (Area Under the Curve), a metric commonly used to characterize the overall performance of binary classifiers. Additionally, as another global performance metric of the system, which describes the trade-off between an adequate recognition rate of falls (high sensitivity) and the absence of false alarms (high specificity), we also utilize the geometric mean of Se and Sp ($\sqrt{Se \cdot Sp}$), together with the values of Se and Sp, at the point of the ROC where the maximum of this statistic is found. The selection of this optimal cut-point on the ROC to choose the corresponding decision threshold has also been proposed in works such as [55].
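The threshold sweep, ROC construction, and optimal cut-point computation can be sketched as follows (a Python reimplementation, under the assumption that higher anomaly scores indicate falls; the original analysis was carried out in Matlab):

```python
import numpy as np

def roc_and_gmean(scores_adl, scores_fall, n_thr=2500):
    """Sweep n_thr detection thresholds over the anomaly scores, compute the
    Se/Sp pairs, the trapezoidal AUC of the ROC, and the cut-point that
    maximises the geometric mean sqrt(Se * Sp)."""
    lo = min(scores_adl.min(), scores_fall.min())
    hi = max(scores_adl.max(), scores_fall.max())
    thr = np.linspace(lo, hi, n_thr)
    se = np.array([(scores_fall >= t).mean() for t in thr])  # detected falls
    sp = np.array([(scores_adl < t).mean() for t in thr])    # accepted ADLs
    fpr = 1.0 - sp
    order = np.argsort(fpr)
    auc = np.trapz(se[order], fpr[order])                    # area under ROC
    gmean = np.sqrt(se * sp)
    i = int(np.argmax(gmean))
    return auc, se[i], sp[i], gmean[i], thr[i]
```

For well-separated ADL and fall score distributions, the AUC approaches 1 and the optimal cut-point lands between the two clusters of scores.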
In order to minimize the impact of the choice of the data used for training and testing the models, we evaluated the classifiers by means of k-fold cross-validation [56,57]. For that purpose, the ADL traces of each dataset were split into five partitions (k = 5). Thus, for each combination of OCC, hyperparameters, input feature set, and dataset, the classifier is independently trained and tested five times. In each iteration, one of the five partitions is reserved for the testing phase, while the rest of the ADLs are used to train the model; the model is then tested with the reserved ADLs and all the falls in the corresponding database. The performance metrics obtained with the test data for the five iterations (AUC, Se, and Sp for the threshold value that yields the highest value of $\sqrt{Se \cdot Sp}$) are averaged to characterize the performance of the classifier.
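The five-fold scheme described above can be sketched as follows; `train_and_score` is a hypothetical callback standing in for any of the OCC training and evaluation pipelines.

```python
import numpy as np

def kfold_occ_eval(adl, falls, train_and_score, k=5, seed=0):
    """k-fold scheme from the text: ADLs are split into k partitions; each
    fold trains on k-1 ADL partitions and tests on the held-out ADLs plus
    ALL fall traces. train_and_score(train_adl, test_adl, test_falls) must
    return a performance metric; results are averaged over the folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(adl))
    folds = np.array_split(idx, k)
    metrics = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        metrics.append(train_and_score(adl[train_idx], adl[test_idx], falls))
    return float(np.mean(metrics))
```

Note that the falls never enter any training split, which is the defining constraint of the one-class setting.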

3. Results and Discussion

3.1. Study for the ‘Fair’ Case

As previously commented and indicated in Table 2, the datasets were generated by considering different predetermined types of ADLs and falls, which were executed by the experimental subjects. In our first analysis, we investigate the performance of the OCCs when the different typologies of ADLs are evenly (‘fairly’) distributed among the five subsets for five-fold cross-validation. Thus, we guarantee that all the types of ADL movements are represented in the subsets with which the anomaly detectors are trained.
The performance metrics obtained for the five algorithms and the nine datasets are presented in Table 6. Due to the high number of combinations evaluated, for each dataset and each type of OCC, the table only shows the combination of hyperparameters and input feature set (also indicated in the table) with which the highest value of the geometric mean of sensitivity and specificity ($\sqrt{Se \cdot Sp}$) was achieved. For each dataset, the row corresponding to the classifier with the best global metric is highlighted in bold. To give an insight into the confidence interval of the measurements, together with the mean value of the global metric $\sqrt{Se \cdot Sp}$, the table also includes in the last column (preceded by the sign ±) the standard deviation of this parameter over the five tests of the corresponding k-fold validation of the classifier. To ease the comparison of the algorithms, the particular results of the AUC and $\sqrt{Se \cdot Sp}$ are summarized in Table 7 and Table 8, respectively. The highest values are also emphasized in bold.
Table 6. Performance metrics (AUC, Se, Sp, and $\sqrt{Se \cdot Sp}$) for the best combination of hyperparameters of the classifiers when they are applied to the datasets under study.
Table 7. Obtained AUC (Area Under the Curve) of the ROC for the best combination of hyperparameters of the classifiers.
Table 8. Maximum obtained geometric mean of sensitivity and specificity ($\sqrt{Se \cdot Sp}$) for the best combination of hyperparameters of the classifiers.
From the results, we can draw the following conclusions:
  • The best results are achieved by the OC-KNN classifier, which outperforms the rest of the detection methods for five out of the nine analyzed datasets (in terms of the geometric mean of sensitivity and specificity), while it presents the second or third best results for the other three datasets.
  • The one-class SVM detector produces the best results for three datasets, while it offers the second-best behavior for five repositories. In any case, if we take into account the confidence interval that can be derived from the measurements, we can conclude that the differences in the behavior of OC-KNN and OC-SVM are not statistically significant.
  • In most cases, the best performance is attained with the simplest input feature set (the seven features labeled BCDFGIK and described in Table 5): this suggests that if the features are conveniently selected, a parsimonious OCC architecture can be sufficient to produce efficient detection decisions.
  • The GMM, autoencoder, and, especially, PPNN classifiers offer a more variable and erratic behavior, as the quality of the classification strongly depends on the employed dataset. In several databases, the best achieved geometric mean of sensitivity and specificity is under 0.90.
  • For all the datasets, the OC-KNN classifier yields a specificity and a sensitivity of at least 0.9. In most cases, both metrics are higher than 0.95. These results are in line with most of the supervised (two-class) machine learning methods that can be found in the related literature (see, for example, the surveys presented in [58,59,60,61,62,63]). This implies that if the decision threshold is properly chosen, an OCC can behave like a two-class classifier without requiring the detector to be trained with falls. In a realistic use scenario, the final user of the detector (e.g., an older adult) could be monitored during his/her daily routines to generate a dataset of ADLs. This dataset could then be used to train and personalize an FDS based on an OCC.

3.2. Study of the Benefits of Ensemble Learning

Ensemble methods offer a simple and efficient paradigm to boost the prediction capability of single machine learning methods based on the combined decision of multiple models [64]. In this subsection, we assess whether the aggregate knowledge of the models evaluated in the previous analysis can improve the individual performance of the classifiers. In particular, we re-calculate the detection decision when a simple majority voting of three classifiers is applied (a similar performance is achieved if a higher number of models is considered). In this case, for each dataset, we use as base learners the three combinations of hyperparameters, input feature sets, and OCCs with which the three highest global performance metrics (geometric mean of Se and Sp) were obtained. Thus, during the testing phase, a trace is identified as a fall if a majority of the base classifiers (two or three) classify the movement as a fall.
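The majority-voting rule reduces to a few lines of Python; the boolean prediction matrix below is an illustrative stand-in for the outputs of the three best base learners.

```python
import numpy as np

def majority_vote(predictions):
    """predictions: (n_models, n_traces) boolean array, True = 'fall'.
    A trace is flagged as a fall when more than half of the models agree."""
    predictions = np.asarray(predictions)
    return predictions.sum(axis=0) > predictions.shape[0] / 2

# Toy outputs of three base classifiers for three test traces
votes = np.array([[True,  False, True],
                  [True,  True,  False],
                  [False, True,  False]])
decision = majority_vote(votes)
```

With three voters, two agreeing classifiers are enough to flag a fall, which smooths out occasional errors of any single model.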
The obtained results are presented in Table 9. For comparison purposes, the table also indicates the best results (extracted from Table 6) corresponding to the best discrimination ratio achieved by a single OCC. In the table, the metrics of the ensemble classifier are marked in bold when they improve those generated by the single learner. Conversely, the results are highlighted in italics when the majority voting underperforms the best single model.
Table 9. Comparison of the performance metrics of the majority voting ensemble and those of the best single OCC.
As can be observed, the use of the ensemble improves the global performance metric in six of the nine analyzed datasets (in several cases, a value of $\sqrt{Se \cdot Sp}$ close to 0.99 is attained), while for just one repository (DLR), the application of the voting technique reduces the effectiveness of the binary classification process.

3.3. Impact of the Typology of ADLs Employed in the Training Phase

As mentioned above, OCCs avoid the need to obtain (or generate) the traces of real or emulated falls that are required to train supervised learning algorithms. In contrast, the use of one-class classifiers can suffer from a lower specificity due to a greater number of false alarms (false positives), caused by ADLs that were not contemplated in the training dataset and are therefore identified as anomalies.
To determine the extent of this problem, we repeat the previous study of Section 3.1 after a certain typology of ADLs is removed from the training set and included in the testing subset. For this purpose, as already suggested in our previous studies [33,40], the ADL movements of all the repositories have been split into three categories, displayed in Table 10, depending on the physical effort that they require.
Table 10. Categorization criteria to divide the ADL movements into different types.
For each dataset (except for the DOFDA repository, which does not include sufficient traces of two of the categories), we generated three subsets of ADLs containing the traces of the corresponding categories. For each type of OCC, the best combination of hyperparameters and input feature set obtained in Section 3.1 is trained and tested three times. In each experiment, each model is exclusively trained with the subsets of two categories and then tested with the falls and the ADLs of the remaining category, using the optimal decision threshold computed for the ‘fair’ case.
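The leave-one-category-out protocol of this experiment can be outlined as follows. This Python fragment is a hypothetical sketch: `train_fn`, `score_fn`, the anomaly-score convention (a higher score means more anomalous), and the fixed decision threshold are assumptions for illustration, not the actual pipeline used in the study.

```python
import numpy as np

def leave_one_category_out(adl_traces, adl_categories, fall_traces,
                           train_fn, score_fn, threshold):
    """For each ADL category, train the OCC only on the ADLs of the
    remaining categories and test it on the falls plus the held-out
    category, as described in the text. Returns the geometric mean
    of sensitivity and specificity per held-out category."""
    results = {}
    for held_out in set(adl_categories):
        train_idx = [i for i, c in enumerate(adl_categories) if c != held_out]
        test_idx = [i for i, c in enumerate(adl_categories) if c == held_out]
        model = train_fn([adl_traces[i] for i in train_idx])
        # A trace is labeled a fall when its anomaly score exceeds the
        # decision threshold optimized for the 'fair' case.
        fp = sum(score_fn(model, adl_traces[i]) > threshold for i in test_idx)
        tp = sum(score_fn(model, t) > threshold for t in fall_traces)
        se = tp / len(fall_traces)          # sensitivity on the falls
        sp = 1 - fp / len(test_idx)         # specificity on held-out ADLs
        results[held_out] = np.sqrt(se * sp)  # geometric mean of Se and Sp
    return results
```

In this scheme, a vigorous held-out category whose traces score above the threshold directly lowers the specificity term, which is the degradation mechanism the tables below quantify.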
The results for all the analyzed datasets and the best performing OCC of each type are shown in Table 11, Table 12 and Table 13 for the cases in which the training sets do not include basic, standard, and sporting activities, respectively. The last column of each table (‘Loss’) indicates the difference between the global performance metric obtained with this segregation of the training and test subsets based on the categorization of the ADLs and the performance metric achieved with the ‘fair’ case (Table 6) in which traces of all the categories of ADLs are incorporated into the training subset. Consequently, a negative value of this parameter denotes a deterioration of the recognition capacity of the classifier.
Table 11. Results of the classifiers when they are tested with falls and basic activities and trained with the rest of ADL categories.
Table 12. Results of the classifiers when they are tested with falls and standard movements and trained with the rest of ADL categories.
Table 13. Results of the classifiers when they are tested with falls and sporting activities and trained with the rest of ADL categories (results for the IMUFD dataset are not included, as this repository does not include sporting movements).
As could be expected, the results show that the presence of new types of ADLs in the testing sets (not considered during the training phase) causes a strong degradation of the capability of the classifiers to discriminate falls from ADLs. This loss of effectiveness is particularly remarkable in those repositories (such as FallAllD) that encompass a greater number of types of ADLs.
In this regard, the poorest discrimination rate is achieved when the system is tested with sporting movements. In some datasets, the best results for this situation achieve specificities below 80%, which implies that more than 20% of sporting actions are considered falls and would trigger a false alarm. The abrupt mobility patterns induced by this category of movements cause the classifiers (trained with much less agitated activities) to misinterpret them as anomalies.
Paradoxically, the results also indicate that very basic and less energetic activities result in false positives, as they too can be identified as ‘novelties’ if traces corresponding to low-motion movements are not included in the training subset. Nevertheless, these false alarms originated by ‘sedentary’ actions could most probably be avoided with a simple thresholding technique, so that a movement trace is fed to the OCC only if the magnitude of the acceleration exceeds a certain value and a fall can be reasonably suspected.
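The thresholding pre-filter suggested above could be sketched as follows. This is a minimal illustration; the threshold value (in g) and the function name are hypothetical, and any cut-off separating sedentary motion from potential fall impacts could be used instead.

```python
import numpy as np

# Hypothetical impact threshold in g; not a value from the study.
IMPACT_THRESHOLD_G = 1.5

def should_run_occ(accel_xyz, threshold=IMPACT_THRESHOLD_G):
    """Forward a trace to the one-class classifier only if its peak
    acceleration magnitude exceeds the threshold, so that sedentary
    activities can never raise a false alarm."""
    accel = np.asarray(accel_xyz, dtype=float)   # shape: (n_samples, 3)
    magnitude = np.linalg.norm(accel, axis=1)    # |a| for each sample
    return bool(magnitude.max() > threshold)

# A trace of quiet sitting (~1 g, gravity only) is filtered out,
# while a trace containing an impact peak is passed on to the OCC.
sitting = [[0.0, 0.0, 1.0]] * 50
impact = sitting + [[1.2, 2.0, 2.5]]
print(should_run_occ(sitting), should_run_occ(impact))  # False True
```

Such a guard suppresses the ‘sedentary’ false positives at negligible computational cost, since the magnitude check runs before any feature extraction or classification.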
Finally, the movements included in the standard category seem to be the typology of activities with the lowest impact on the effectiveness of the training. This can be explained by the fact that these activities represent an intermediate degree of physical intensity between basic and sporting movements. Thus, training with movements of both lower and greater intensity (basic activities and sports, respectively) gives the classifiers enough information to avoid regarding standard activities as anomalies. Yet, a relevant decay of the performance of certain OCCs is also perceived when this category is excluded from the training phase.

4. Conclusions

This work has assessed the effectiveness of utilizing one-class classifiers as the decision core of fall detection systems based on wearable inertial sensors. Unlike fully supervised methods, OCCs benefit from the fact that they can be trained exclusively with samples of a single class (conventional Activities of Daily Living), which avoids the need to obtain traces captured during falls to train the classifiers.
In particular, we have analyzed the performance of five well-known OCCs under different input feature sets and a wide selection of hyperparameters. In contrast with most studies in the literature, which base their analysis on the use of a single dataset, we have extended the study to nine public repositories.
The achieved results (with values of the geometric mean of sensitivity and specificity higher than 95%) have shown the capability of the OCCs to discriminate falls from ADLs with high accuracy when the selection of the decision threshold is optimized. This performance is comparable to that obtained with supervised systems in the literature. For almost all tests and datasets, the one-class KNN classifier stood out as the best (or second-best) detection algorithm, a conclusion that is consistent with previous analyses in related works. The study has also revealed that the use of simple ensemble learning methods (such as voting) may improve the hit rate of the detector when the decisions of several OCCs are simultaneously considered.
In any case, the analyses have illustrated the extreme vulnerability of these classifiers to the typology of the ADLs used for the training phase. Actions that involve rapid movements (such as sports) and even very basic activities (which require hardly any physical effort) may be straightforwardly identified as anomalies if they are not represented in the patterns used for training. This problem, which could be alleviated by combining OCCs with other simple methods that prevent certain typical ADLs from being identified as falls, forces a rethinking of the way in which one-class detectors are adjusted and evaluated. The results clearly show the importance of having a sufficiently varied set of samples for training. Likewise, in the test phase, and as a stress test of the system, the evaluation should consider the use of ADLs (not used for training) that entail agitated movements capable of affecting the decision of the classifier. Future studies should also focus on methodologies that automatically optimize the selection of the decision threshold.

Author Contributions

Conceptualization, E.C.; methodology, E.C. and J.A.S.-R.; software, J.A.S.-R.; validation, J.A.S.-R.; formal analysis, E.C. and J.A.S.-R.; investigation, E.C. and J.A.S.-R.; resources, E.C. and J.M.C.-G.; data curation, E.C. and J.A.S.-R.; writing—original draft preparation, E.C.; writing—review and editing, E.C., J.A.S.-R. and J.M.C.-G.; visualization, J.A.S.-R.; supervision, E.C. and J.M.C.-G.; project administration, E.C.; funding acquisition, E.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by FEDER Funds (under grant UMA18-FEDERJA-022), the Andalusian Regional Government (Junta de Andalucía, grant PAIDI P18-RT-1652) and Universidad de Málaga, Campus de Excelencia Internacional Andalucia Tech.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

All the datasets employed in this work are publicly available. The URLs to download the repositories can be found in the corresponding references.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization. Falls: Key Facts. Available online: https://www.who.int/news-room/fact-sheets/detail/falls (accessed on 16 July 2021).
  2. World Health Organization. WHO Global Report on Falls Prevention in Older Age; WHO Press: Geneva, Switzerland, 2007.
  3. World Health Organization. Ageing and Health—Key Facts. Available online: http://www.who.int/mediacentre/factsheets/fs404/en/ (accessed on 21 July 2021).
  4. Lord, S.R.; Sherrington, C.; Menz, H.B.; Close, J.C.T. Falls in Older People: Risk Factors and Strategies for Prevention; Cambridge University Press: Cambridge, UK, 2007.
  5. Casilari, E.; Luque, R.; Morón, M. Analysis of android device-based solutions for fall detection. Sensors 2015, 15, 17827–17894.
  6. Aziz, O.; Musngi, M.; Park, E.J.; Mori, G.; Robinovitch, S.N. A comparison of accuracy of fall detection algorithms (threshold-based vs. machine learning) using waist-mounted tri-axial accelerometer signals from a comprehensive set of falls and non-fall trials. Med. Biol. Eng. Comput. 2017, 55, 45–55.
  7. Klenk, J.; Becker, C.; Lieken, F.; Nicolai, S.; Maetzler, W.; Alt, W.; Zijlstra, W.; Hausdorff, J.M.; Van Lummel, R.C.; Chiari, L. Comparison of acceleration signals of simulated and real-world backward falls. Med. Eng. Phys. 2011, 33, 368–373.
  8. Bagalà, F.; Becker, C.; Cappello, A.; Chiari, L.; Aminian, K.; Hausdorff, J.M.; Zijlstra, W.; Klenk, J. Evaluation of accelerometer-based fall detection algorithms on real-world falls. PLoS ONE 2012, 7, e37062.
  9. Aziz, O.; Klenk, J.; Schwickert, L.; Chiari, L.; Becker, C.; Park, E.J.; Mori, G.; Robinovitch, S.N. Validation of accuracy of SVM-based fall detection system using real-world fall and non-fall datasets. PLoS ONE 2017, 12, e0180318.
  10. Khan, S.S.; Madden, M.G. One-class classification: Taxonomy of study and review of techniques. Knowl. Eng. Rev. 2014, 29, 345–374.
  11. Medrano, C.; Plaza, I.; Igual, R.; Sánchez, Á.; Castro, M. The effect of personalization on smartphone-based fall detectors. Sensors 2016, 16, 117.
  12. Viet, V.; Choi, D.-J. Fall detection with smart phone sensor. In Proceedings of the 3rd International Conference on Internet (ICONI 2011), Sepang, Malaysia, 15–19 December 2011; pp. 15–19.
  13. Lisowska, A.; Wheeler, G.; Inza, V.C.; Poole, I. An evaluation of supervised, novelty-based and hybrid approaches to fall detection using silmee accelerometer data. In Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), Santiago, Chile, 7–13 December 2015; pp. 402–408.
  14. Nho, Y.H.; Lim, J.G.; Kwon, D.S. Cluster-analysis-based user-adaptive fall detection using fusion of heart rate sensor and accelerometer in a wearable device. IEEE Access 2020, 8, 40389–40401.
  15. Medrano, C.; Igual, R.; García-Magariño, I.; Plaza, I.; Azuara, G. Combining novelty detectors to improve accelerometer-based fall detection. Med. Biol. Eng. Comput. 2017, 55, 1849–1858.
  16. Casilari, E.; Lora-Rivera, R.; García-Lagos, F. A study on the application of convolutional neural networks to fall detection evaluated with multiple public datasets. Sensors 2020, 20, 1466.
  17. Khan, S.S.; Hoey, J. Review of fall detection techniques: A data availability perspective. Med. Eng. Phys. 2017, 39, 12–22.
  18. Zhang, T.; Wang, J.; Liu, P.; Hou, J. Fall detection by embedding an accelerometer in cellphone and using KFD algorithm. Int. J. Comput. Sci. Netw. Secur. 2006, 6, 277–284.
  19. Zhang, T.; Wang, J.; Xu, L.; Liu, P. Fall detection by wearable sensor and one-class SVM algorithm. In Intelligent Computing in Signal Processing and Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2006; pp. 858–863.
  20. Yin, J.; Yang, Q.; Member, S.; Pan, J.J. Sensor-based abnormal human-activity detection. IEEE Trans. Knowl. Data Eng. 2008, 20, 1082–1090.
  21. Medrano, C.; Igual, R.; Plaza, I.; Castro, M. Detecting falls as novelties in acceleration patterns acquired with smartphones. PLoS ONE 2014, 9, e94811.
  22. Khan, S.S.; Karg, M.E.; Kulić, D.; Hoey, J. X-factor HMMs for Detecting Falls in the Absence of Fall-specific training data. In International Workshop on Ambient Assisted Living; Springer: Cham, Switzerland, 2014; Volume 8868, pp. 1–9.
  23. Khan, S.S.; Karg, M.E.; Kulić, D.; Hoey, J. Detecting falls with X-factor hidden markov models. Appl. Soft Comput. J. 2017, 55, 168–177.
  24. Frank, K.; Vera Nadales, M.J.; Robertson, P.; Pfeifer, T. Bayesian recognition of motion related activities with inertial sensors. In Proceedings of the 12th ACM International Conference on Ubiquitous Computing-Adjunct, Copenhagen, Denmark, 26–29 September 2010; pp. 445–446.
  25. Vavoulas, G.; Pediaditis, M.; Spanakis, E.G.; Tsiknakis, M. The MobiFall dataset: An initial evaluation of fall detection algorithms using smartphones. In Proceedings of the IEEE 13th International Conference on Bioinformatics and Bioengineering (BIBE 2013), Chania, Greece, 10–13 November 2013; pp. 1–4.
  26. Yang, K.; Ahn, C.R.; Vuran, M.C.; Aria, S.S. Semi-supervised near-miss fall detection for ironworkers with a wearable inertial measurement unit. Autom. Constr. 2016, 68, 194–202.
  27. Khan, S.S.; Taati, B. Detecting unseen falls from wearable devices using channel-wise ensemble of autoencoders. Expert Syst. Appl. 2017, 87, 280–290.
  28. Ojetola, O.; Gaura, E.; Brusey, J. Data Set for Fall Events and Daily Activities from Inertial Sensors. In Proceedings of the 6th ACM Multimedia Systems Conference (MMSys’15), Portland, OR, USA, 18–20 March 2015; pp. 243–248.
  29. Micucci, D.; Mobilio, M.; Napoletano, P.; Tisato, F. Falls as anomalies? An experimental evaluation using smartphone accelerometer data. J. Ambient Intell. Humaniz. Comput. 2017, 8, 87–99.
  30. Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; Reyes-Ortiz, J.L. A public domain dataset for human activity recognition using smartphones. In Proceedings of the European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2013), Bruges, Belgium, 24–26 April 2013; pp. 437–442.
  31. Lisowska, A.; O’Neil, A.; Poole, I. Cross-cohort evaluation of machine learning approaches to fall detection from accelerometer data. In Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018)—Volume 5: HEALTHINF, Funchal, Madeira, Portugal, 19–21 January 2018; Volume 5, pp. 77–82.
  32. Chen, L.; Li, R.; Zhang, H.; Tian, L.; Chen, N. Intelligent fall detection method based on accelerometer data from a wrist-worn smart watch. Measurement 2019, 140, 215–226.
  33. Casilari, E.; Santoyo-Ramón, J.A.; Cano-García, J.M. On the heterogeneity of existing repositories of movements intended for the evaluation of fall detection systems. J. Healthc. Eng. 2020, 2020, 6622285.
  34. Bourke, A.K.; Klenk, J.; Schwickert, L.; Aminian, K.; Ihlen, E.A.F.; Mellone, S.; Helbostad, J.L.; Chiari, L.; Becker, C. Fall detection algorithms for real-world falls harvested from lumbar sensors in the elderly population: A machine learning approach. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS 2016), Orlando, FL, USA, 16–20 August 2016; pp. 3712–3715.
  35. Gjoreski, H.; Luštrek, M.; Gams, M. Accelerometer placement for posture recognition and fall detection. In Proceedings of the 7th International Conference on Intelligent Environments (IE 2011), Nottingham, UK, 25–28 July 2011; pp. 47–54.
  36. Dai, J.; Bai, X.; Yang, Z.; Shen, Z.; Xuan, D. PerFallD: A pervasive fall detection system using mobile phones. In Proceedings of the 8th IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops), Mannheim, Germany, 29 March–2 April 2010; pp. 292–297.
  37. Kangas, M.; Konttila, A.; Lindgren, P.; Winblad, I.; Jämsä, T. Comparison of low-complexity fall detection algorithms for body attached accelerometers. Gait Posture 2008, 28, 285–291.
  38. Fang, S.-H.; Liang, Y.-C.; Chiu, K.-M. Developing a mobile phone-based fall detection system on android platform. In Proceedings of the Computing, Communications and Applications Conference (ComComAp), Hong Kong, China, 21 February 2012; pp. 143–146.
  39. Ntanasis, P.; Pippa, E.; Özdemir, A.T.; Barshan, B.; Megalooikonomou, V. Investigation of sensor placement for accurate fall detection. In Proceedings of the International Conference on Wireless Mobile Communication and Healthcare (MobiHealth 2016), Milan, Italy, 14–16 November 2016; pp. 225–232.
  40. Casilari, E.; Santoyo-Ramón, J.A.; Cano-García, J.M. Analysis of public datasets for wearable fall detection systems. Sensors 2017, 17, 1513.
  41. Cotechini, V.; Belli, A.; Palma, L.; Morettini, M.; Burattini, L.; Pierleoni, P. A dataset for the development and optimization of fall detection algorithms based on wearable sensors. Data Br. 2019, 23, 103839.
  42. Özdemir, A.T.; Barshan, B. Detecting falls with wearable sensors using machine learning techniques. Sensors 2014, 14, 10691–10708.
  43. Saleh, M.; Abbas, M.; Le Jeannes, R.B. FallAllD: An open dataset of human falls and activities of daily living for classical and deep learning applications. IEEE Sens. J. 2021, 21, 1849–1858.
  44. Human Factors and Ergonomics Lab—Korea Advanced Institute of Science and Technology. KFall: A Comprehensive Motion Dataset to Detect Pre-impact Fall for the Elderly based on Wearable Inertial Sensors. Available online: https://sites.google.com/view/kfalldataset (accessed on 30 April 2021).
  45. Sucerquia, A.; López, J.D.; Vargas-Bonilla, J.F. SisFall: A fall and movement dataset. Sensors 2017, 17, 198.
  46. Casilari, E.; Santoyo-Ramón, J.A.; Cano-García, J.M. Analysis of a smartphone-based architecture with multiple mobility sensors for fall detection. PLoS ONE 2016, 11, e01680.
  47. Martínez-Villaseñor, L.; Ponce, H.; Brieva, J.; Moya-Albor, E.; Núñez-Martínez, J.; Peñafort-Asturiano, C. UP-fall detection dataset: A multimodal approach. Sensors 2019, 19, 1988.
  48. Mathworks. Statistics and Machine Learning Toolbox—MATLAB. Available online: https://es.mathworks.com/products/statistics.html (accessed on 18 August 2021).
  49. Banos, O.; Galvez, J.-M.; Damas, M.; Pomares, H.; Rojas, I. Window size impact in human activity recognition. Sensors 2014, 14, 6474–6499.
  50. Becker, C.; Schwickert, L.; Mellone, S.; Bagalà, F.; Chiari, L.; Helbostad, J.L.; Zijlstra, W.; Aminian, K.; Bourke, A.; Todd, C.; et al. Proposal for a multiphase fall model based on real-world fall recordings with body-fixed sensors. Z. Gerontol. Geriatr. 2012, 45, 707–715.
  51. Noury, N.; Rumeau, P.; Bourke, A.K.; ÓLaighin, G.; Lundy, J.E. A proposal for the classification and evaluation of fall detectors. IRBM 2008, 29, 340–349.
  52. Vallabh, P.; Malekian, R. Fall detection monitoring systems: A comprehensive review. J. Ambient Intell. Humaniz. Comput. 2018, 9, 1809–1833.
  53. Xi, X.; Tang, M.; Miran, S.M.; Luo, Z. Evaluation of feature extraction and recognition for activity monitoring and fall detection based on wearable sEMG sensors. Sensors 2017, 17, 1229.
  54. Fulcher, B.D.; Jones, N.S. hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Syst. 2017, 5, 527–531.
  55. Liu, X. Classification accuracy and cut point selection. Stat. Med. 2012, 31, 2676–2686.
  56. Rodríguez, J.D.; Pérez, A.; Lozano, J.A. Sensitivity analysis of k-fold cross validation in prediction error estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 569–575.
  57. Wong, T.T. Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation. Pattern Recognit. 2015, 48, 2839–2846.
  58. Delahoz, Y.S.; Labrador, M.A. Survey on fall detection and fall prevention using wearable and external sensors. Sensors 2014, 14, 19806–19842.
  59. Wang, X.; Ellul, J.; Azzopardi, G. Elderly fall detection systems: A literature survey. Front. Robot. AI 2020, 7, 71.
  60. Andò, B.; Baglio, S.; Castorina, S.; Crispino, R.; Marletta, V. Advanced solutions aimed at the monitoring of falls and human activities for the elderly population. Technologies 2019, 7, 59.
  61. Ren, L.; Peng, Y. Research of fall detection and fall prevention technologies: A systematic review. IEEE Access 2019, 7, 77702–77722.
  62. Islam, M.; Tayan, O.; Islam, R.; Islam, S.; Nooruddin, S.; Kabir, M.N.; Islam, R. Deep learning based systems developed for fall detection: A review. IEEE Access 2020, 8, 166117–166137.
  63. Broadley, R.W.; Klenk, J.; Thies, S.B.; Kenney, L.P.J.; Granat, M.H. Methods for the real-world evaluation of fall detection technology: A scoping review. Sensors 2018, 18, 2060.
  64. Polikar, R. Ensemble learning. In Ensemble Machine Learning; Springer: Boston, MA, USA, 2012; pp. 1–34.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
