1. Introduction
Chronic diseases related to physical inactivity are increasing rapidly as the population grows. Regular physical activity is directly associated with health benefits; therefore, many researchers strongly recommend 30 to 40 minutes of regular physical activity for a healthier life, since it can reduce the risk of many conditions, such as heart attack, diabetes, cancer, and cardiovascular disease [1]. In hospitals, many patients need continuous monitoring, which is quite expensive and inconvenient, especially for children and elderly people [2]. Instead of relying on expensive treatment and delayed intervention, healthcare sensing technology can alert doctors to escalating incidents beforehand. Among the primary healthcare sensing categories, wearable inertial sensors show promising potential for tracking human locomotion [3]. These sensors can monitor human activities conveniently and effectively in a free-living environment, and their use is growing rapidly, representing up to 97% of the market volume by 2020 [4].
Human activity recognition (HAR) approaches can be broadly classified into template-matching, generative, and discriminative categories [5,6]. Firstly, template-matching algorithms, such as the K-Nearest Neighbors classifier, compute distances between event data. Secondly, generative algorithms, e.g., Bayesian networks, use probabilistic graphical models to classify human activities. Finally, discriminative approaches model the boundaries between data events [7,8,9]. Although these machine learning algorithms operate with little prior information, they still provide good classification results [10,11,12,13,14,15]. However, the feature engineering needed to significantly reduce discriminant errors and improve the performance of a recognition system requires deep domain expertise.
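As a toy illustration of the template-matching category, the following sketch classifies a query feature vector by the label of its nearest stored template (1-NN under Euclidean distance). The feature vectors and activity labels are hypothetical; real HAR systems would match features extracted from windowed sensor signals.

```python
import math

def one_nearest_neighbor(templates, query):
    """Classify `query` by the label of the closest stored template.

    `templates` is a list of (feature_vector, label) pairs; distance is
    Euclidean. This is the K-Nearest Neighbors classifier with K = 1,
    the simplest form of template matching.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(templates, key=lambda t: dist(t[0], query))[1]

# Hypothetical 2-D feature vectors (e.g., mean and variance of a window)
templates = [([0.1, 0.2], "sitting"), ([2.5, 3.1], "running")]
print(one_nearest_neighbor(templates, [2.4, 3.0]))
```

Generative and discriminative methods differ in that they fit a model (a probability distribution or a decision boundary) rather than storing raw exemplars.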
In this paper, we propose new robust ‘multi-combined’ features to represent human body movements and to classify human activity patterns using time-series data from tri-axial inertial signals captured by wearable sensors. These new features combine 14 different kinds of features, including statistical features, Mel Frequency Cepstral Coefficients (MFCC), Electrocardiogram (ECG) features, and Gaussian Mixture Model (GMM) features, which efficiently reduce discriminant errors and improve the performance of the activity recognition system. For HAR, novel optimized classifiers, such as a Decision Tree (DT) optimized by Binary Grey Wolf Optimization (BGWO), are applied. To examine the performance, a new continuous wearable human activity dataset (the IM-AccGyro dataset) containing 1D segmented signal sequences is provided; it can supply the training and testing data for different physical exercises and can serve as a benchmark dataset in the field of wearable activity recognition of physical exercises based on inertial sensors. Additionally, we apply the proposed method to public datasets, namely the MOTIONSENSE and MHEALTH datasets. For comparison, we consider state-of-the-art methods such as a Genetic Algorithm (GA) optimized by Ant Colony Optimization (ACO) and a Support Vector Machine (SVM) optimized by Particle Swarm Optimization (PSO). We obtained remarkable improvements in recognition rates over current state-of-the-art methods.
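As a hedged sketch of the statistical subset of these multi-combined features, the following computes per-axis mean, standard deviation, and range over one fixed-length signal window. The window values are hypothetical, and the MFCC, ECG, and GMM components of the full 14-feature set are omitted here.

```python
import statistics

def statistical_features(ax, ay, az):
    """Per-axis mean, standard deviation, and range for one tri-axial window.

    `ax`, `ay`, `az` are equal-length sample lists for the x, y, and z
    axes of an inertial sensor. Returns 9 values (3 statistics x 3 axes),
    a small illustrative subset of a full HAR feature vector.
    """
    feats = []
    for axis in (ax, ay, az):
        feats += [statistics.mean(axis),       # central tendency
                  statistics.pstdev(axis),     # signal variability
                  max(axis) - min(axis)]       # peak-to-peak range
    return feats

# Hypothetical accelerometer window (arbitrary units)
window = ([0.1, 0.3, 0.2], [1.0, 0.9, 1.1], [9.7, 9.8, 9.6])
print(statistical_features(*window))
```

In practice such vectors would be computed over sliding windows of the segmented signal and concatenated with the other feature families before optimization and classification.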
The rest of this paper is organized as follows.
Section 2 presents related works.
Section 3 presents the complete system methodology, which comprises physical activity detection, preprocessing, feature extraction, optimization, and classification.
Section 4 explains the complete experimental setting and describes the datasets.
Section 5 describes the results and evaluation. Finally,
Section 6 presents the conclusion.
2. Related Works
Several studies have classified human activity patterns using two major categories of sensor devices: vision sensors and wearable sensors. In vision-based HAR, video cameras are used to capture image data. Babiker et al. [16] applied digital image processing techniques, including background subtraction, binarization, and morphological operations, and then used a multilayer feed-forward perceptron network to recognize daily human movements and conduct HAR in indoor environments with a single static camera. Jalal et al. [17] obtained video-based invariant features by applying the R transform to depth silhouettes: the silhouettes were encoded into feature values using depth images, scale-invariant 2D and 1D features were computed with the Radon and R transforms, and principal component analysis and a Hidden Markov Model were then applied to the computed features to train and recognize different human activities. Liu et al. [18] focused on a shapelet-based method to recognize four different types of human activities; experiments on two public datasets showed that their approach handled complex activities efficiently.
Instead of relying on image data, many researchers have designed wearable sensor technologies for activity monitoring and classification. Jansi et al. [19] presented multi-domain (time and frequency) features to enhance the classification of eight different human activities from inertial sensors installed in smartphones. Tian et al. [20] proposed a two-layer, diversity-enhanced multi-classifier recognition method using a single triaxial accelerometer to classify four different activities; they extracted features from three domains (time, frequency, and autoregressive coefficients) to optimize the performance of the multi-classifier recognition system. Tahir et al. [21] proposed a multi-fused model to select optimal feature values, which were then optimized and classified using adaptive moment estimation and a maximum entropy Markov model; this method achieved an accuracy of 90.91% on the MHEALTH dataset. Haresamudram et al. [22] introduced a masked-reconstruction-based BERT model for human activity recognition, pre-trained in a self-supervised manner; applying the transformer encoder architecture to continuous data from body-worn sensors, they achieved an accuracy of 79.86% on the MOTIONSENSE dataset. Jordao et al. [23] implemented a convolutional neural network for wearable-sensor HAR data and evaluated it on the MHEALTH dataset using the “leave one subject out” validation protocol, achieving an accuracy rate of 83%. Batool et al. [24] proposed a physical activity detection model based on Mel Frequency Cepstral Coefficients (MFCC) and statistical features; the extracted features were optimized and classified with PSO and SVM algorithms, giving an accuracy of 87.50% on the MOTIONSENSE dataset. Zha et al. [25] presented Logical Hidden Semi-Markov Models (LHSMMs), a combination of Logical Hidden Markov Models (LHMMs) and Hidden Semi-Markov Models, to segment the duration of each activity; a comparison of the two showed that LHSMMs are more robust and yield higher probability results than LHMMs.
Optical sensors, such as digital and Bumblebee cameras, can be used to improve human lifestyles. However, optical sensors have limitations for detecting human activities: the detection of a subject’s movements is restricted to a particular range, and they raise privacy issues, e.g., recording in private places like restrooms or intruding on the user’s personal life. It is also uncomfortable for subjects to carry optical sensors around with them, because these devices are bulky, invasive, and not easily worn during working hours; moreover, such cameras are relatively more expensive than wearable sensors. Despite previous research on human activity classification, challenges remain in computation, multi-sensor support, and precise signal data acquisition. Therefore, this paper proposes a novel method for human activity classification.
5. Experimental Results and Evaluation
The proposed system is evaluated using the “leave one subject out” (LOSO) cross-validation method with training and testing data. The three chosen classifiers with their optimization algorithms are the following: the GA optimized by ACO, the DT optimized by BGWO, and the SVM optimized by PSO. The human activity classification is validated using precision, recall, and F-measure to identify different postures and movements. Precision is defined as the number of True Positive instances (instances correctly assigned to a class) divided by the total number of instances predicted as that class, i.e., True Positives plus False Positives: Precision = TP / (TP + FP).
Recall is defined as the number of True Positive instances divided by the total number of instances that actually belong to the class, i.e., True Positives (TP) plus False Negatives (FN): Recall = TP / (TP + FN).
The F-measure is the harmonic mean of precision and recall and is defined as: F-measure = 2 × (Precision × Recall) / (Precision + Recall).
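These three metrics can be computed directly from per-class counts; the following is a minimal sketch with hypothetical counts for a single activity class.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F-measure from per-class counts.

    precision = TP / (TP + FP)   (correct among predicted positives)
    recall    = TP / (TP + FN)   (correct among actual positives)
    F-measure = harmonic mean of precision and recall
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one activity class
p, r, f = precision_recall_f1(tp=90, fp=10, fn=10)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.9 0.9 0.9
```

In a multi-class HAR setting these values are typically computed per activity and then averaged across classes.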
The classification results of the three classifiers, i.e., the support vector machine, genetic algorithm, and decision tree, on the MOTIONSENSE, MHEALTH, and IM-AccGyro datasets are reported in
Table 1,
Table 2 and
Table 3. All classifiers were trained on the training set, and the results in
Table 1,
Table 2 and
Table 3 were obtained using the testing set. As
Table 3 shows, we obtained a better classification result on the IM-AccGyro dataset, with an F-measure of more than 90%, than on the MOTIONSENSE and MHEALTH datasets reported in
Table 1 and
Table 2. Overall, the results show that our proposed method achieved better performance than other state-of-the-art methods.
Table 4 depicts the confusion matrix of the MOTIONSENSE dataset for six different activities with a mean accuracy of 88.25%.
Table 5 presents the mean accuracy of 93.95% on the MHEALTH dataset of 12 different activities.
Table 6 shows the confusion matrix of the IM-AccGyro dataset for six different activities with an average accuracy of 96.83%.
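Assuming “mean accuracy” here denotes the average of per-class accuracies (the diagonal of a row-normalized confusion matrix) — an assumption of this sketch, since the paper does not spell out the formula — it can be computed as follows with a hypothetical 3-activity confusion matrix.

```python
def mean_class_accuracy(cm):
    """Mean of per-class accuracies from a confusion matrix.

    cm[i][j] counts samples of true class i predicted as class j; each
    class accuracy is its diagonal count divided by its row total.
    NOTE: averaging per-class rates (rather than taking overall
    accuracy, trace / total) is an assumption of this sketch.
    """
    rates = [row[i] / sum(row) for i, row in enumerate(cm)]
    return sum(rates) / len(rates)

# Hypothetical 3-activity confusion matrix (rows = true, cols = predicted)
cm = [[48, 1, 1],
      [2, 45, 3],
      [0, 2, 48]]
print(round(mean_class_accuracy(cm), 4))
```

Per-class averaging weights every activity equally, which matters when the activity classes are imbalanced.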
Table 7 presents the comparison results of the proposed approach over MOTIONSENSE, MHEALTH, and IM-AccGyro datasets, respectively.
6. Conclusions and Future Works
In this paper, we proposed a novel, robust framework, the multi-combined features HAR system, which recognizes human activities via inertial measurements captured from wearable sensors. The multi-combined features capture spatiotemporal variation, optimal patterns, structural uncertainty, rehabilitation motion, and transitional activity features. These features are then passed through three classifiers with optimization algorithms: the Genetic Algorithm (GA) optimized by Ant Colony Optimization (ACO), the Decision Tree (DT) optimized by Binary Grey Wolf Optimization (BGWO), and the Support Vector Machine (SVM) optimized by Particle Swarm Optimization (PSO). In the experiments, we used three challenging inertial sensor datasets: MOTIONSENSE, MHEALTH, and the proposed self-annotated IM-AccGyro human–machine dataset. Recent work by numerous researchers using state-of-the-art classifiers (SVM [41] and GA [42]) has shown good classification results on multiple benchmark HAR datasets, which is why we evaluated our proposed model against these classifiers. Our proposed method achieved remarkable recognition accuracy over the state-of-the-art methods.
As future work, we will improve the efficiency of our multi-combined features by adding wavelet and frequency-domain features. Additionally, we plan to address more complex activities in different scenarios, such as smart homes, offices, and hospitals, using various other wearable sensors.