Article

A Novel Wearable Sensor-Based Human Activity Recognition Approach Using Artificial Hydrocarbon Networks

Faculty of Engineering, Universidad Panamericana, Mexico City 03920, Mexico
*
Author to whom correspondence should be addressed.
Sensors 2016, 16(7), 1033; https://doi.org/10.3390/s16071033
Submission received: 31 March 2016 / Revised: 22 June 2016 / Accepted: 24 June 2016 / Published: 5 July 2016
(This article belongs to the Special Issue Selected Papers from UCAmI, IWAAL and AmIHEALTH 2015)

Abstract

Human activity recognition has gained increasing interest in several research communities, given that understanding user activities and behavior helps to deliver proactive and personalized services. There are many examples of health systems improved by human activity recognition. Nevertheless, the human activity recognition classification process is not an easy task. Different types of noise in wearable sensor data frequently hamper the classification process. In order to develop a successful activity recognition system, it is necessary to use stable and robust machine learning techniques capable of dealing with noisy data. In this paper, we present the artificial hydrocarbon networks (AHN) technique to the human activity recognition community. Our novel artificial hydrocarbon networks approach is suitable for physical activity recognition, tolerant to noise from corrupted sensor data and robust to different issues in sensor data. We show that the AHN classifier is very competitive for physical activity recognition and is very robust in comparison with other well-known machine learning methods.


1. Introduction

The interest in human activity recognition research has been growing in context-aware systems for different domain applications. Human activity recognition (HAR) deals with the integration of sensing and reasoning in order to better understand people’s actions. Research related to human activity recognition has become relevant in pervasive and mobile computing, surveillance-based security, context-aware computing, health and ambient assistive living. Recognizing body postures and movements is especially important to support and improve health systems, as discussed below.
In their survey, Avci et al. [1] reviewed several medical applications of activity recognition for healthcare, wellbeing and sports systems. Regarding medical applications using HAR with wearable sensors, the authors report examples in the literature of healthcare monitoring and diagnosis systems; rehabilitation; systems to find correlations between movement and emotions; and child and elderly care. They also reviewed assisted living and home monitoring systems that improve quality of life and ensure the health, safety and wellbeing of children, the elderly and people with cognitive disorders. The authors also state that numerous activity recognition systems using wearable sensors have been proposed for sports and leisure applications; for example: daily and sport activity recognition; detection of motion sequences in martial arts to increase interaction in video games or martial arts education; and monitoring sport activities in order to train and track performance.
Preece et al. [2] reported activity classification systems to find links between common diseases and levels of physical activity. The authors also reviewed systems that provide information on daily activity patterns to improve the treatment and diagnosis of neurological, degenerative and respiratory disorders. Other reported systems quantify levels of physical activity providing feedback and motivating individuals to achieve physical activity goals. Guidoux et al. [3] presented an approach based on smartphone sensors for estimating energy expenditure recognizing physical activities in free-living conditions. In summary, health systems and assistive technologies can benefit from activity recognition and deliver personalized services.
Automated human activity recognition is a challenging task. Two main approaches are used to perform the task of activity recognition: vision-based and sensor-based activity recognition [4,5]. The vision-based approach is based on image processing of video sequences or digital visual data provided by cameras. No wearable or smartphone sensors are required, but it depends on the image quality. The quality of cameras, lighting and environments, among others, are factors that determine image quality. Visually monitoring the actor's behavior also entails privacy issues. The sensor-based approach is focused on activity monitoring using wearable sensors [5], smartphone sensors and technologies [6] or object-embedded sensors [7]. There are several drawbacks in this approach: wearing sensors or smartphones for a long period of time is necessary, and there might be battery issues. However, the main problem when using sensor-based approaches is the different types of noise found in input features due to sensor errors or noisy environments. The output class can also have errors. Noise in data hampers the human activity recognition classification process.
Nettleton et al. [8] state that “machine learning techniques often have to deal with noisy data, which may affect the accuracy of the resulting data models.” This statement is also true in the activity recognition classification process, given the great variations in the types, number and positioning of sensors. Sensor characteristics also change across different subjects and even for the same individual [2]. Therefore, in order to develop a successful activity recognition system, it is necessary to use stable and robust machine learning techniques capable of dealing with noisy data.
In this paper, we present a novel machine learning technique to the human activity recognition community: artificial hydrocarbon networks (AHN). Our artificial hydrocarbon networks approach is suitable for physical activity recognition, tolerant to noise from corrupted sensor data and robust to different issues in sensor data. To support these claims, a comparison analysis was performed with the most commonly-used supervised classification techniques in the HAR community. The performance of the proposed AHN classifier was compared to fourteen supervised techniques frequently used in the activity recognition classification process and reviewed in the literature [2,9,10,11].
In order to evaluate the performance of the artificial hydrocarbon network-based classifier, four experiments were designed using the public Physical Activity Monitoring dataset (PAMAP2) [12,13]. The first experiment was done using the entire raw dataset. The second experiment was performed after a prior feature reduction using the recursive feature elimination (RFE) method. The third experiment evaluated noise tolerance in all supervised classifiers using three levels of noise: 7%, 15% and 30%. Noise was simulated with random insertion in input features of the testing set. Lastly, a window-based majority voting approach for an HAR system using the proposed method was implemented.
Our results show that the AHN classifier is very competitive for physical activity recognition and is very robust in comparison with the other methods. In addition, this paper contributes a benchmark of fifteen supervised machine learning methods in the human activity recognition field, comparing them in terms of accuracy, macro- and micro-averaged sensitivity, precision and $F_1$-score, and training time, and contrasting the experimental results with recent literature. Notice that the proposed method does not work in real time, and the introduction of artificial hydrocarbon networks in real-time HAR systems is out of the scope of this work.
The rest of the paper is organized as follows. Section 2 describes the state of the art in sensor-based human activity recognition and discusses noise in the classification process. Then, Section 3 introduces the artificial hydrocarbon networks technique as a supervised learning method, and Section 4 describes our proposal for using an AHN-based classifier in human activity recognition. In order to evaluate the proposed classifier, a case study in physical activity monitoring is presented and described in Section 5. In addition, Section 6 presents the results and a discussion of the proposal, as well as a comparison with fourteen supervised classifiers used in HAR. Lastly, Section 7 concludes the paper and highlights future work in this context.

2. Sensor-Based Human Activity Recognition

Recognizing a human activity in a wearable sensor-based approach means that: (i) the activity is present in the physical environment; (ii) sensors are able to provide a reliable representation of the physical parameters of the environment affected by the activity; and (iii) a classification algorithm accurately recognizes an activity [14]. In that sense, this work is focused on the latter component of the wearable sensor-based human activity recognition approach.
Many learning methods have been used in recent years for human activity recognition. Several reviews have been published analyzing the performance of different classifiers in the human activity recognition research area for applications in home care systems, surveillance, physical therapy and rehabilitation, and sports improvement, among others. The literature reports several surveys and comparisons for sensor-based human activity recognition, like in [1,2,4,9,11,15,16,17], and surveys on vision-based human activity recognition can be found in [18,19].
Since our work mainly focuses on sensor-based human activity recognition and data-driven approaches, this section reviews related works regarding the stability and robustness of machine learning techniques when confronted with the task of human activity recognition. Thus, noise in the human activity recognition classification process is discussed first. Subsequently, related works on machine learning techniques used for human activity recognition are reviewed.

2.1. Noise in the Human Activity Recognition Classification Process

A classification process must be done in order to recognize human activity given that the activity is present and wearable sensors reliably represent physical parameters affected by the activity. The goal of the classification task for human activity recognition is to interpret the features of physical parameters and perform a correct classification of the activity [14].
Noisy data are often provided in machine learning processes, making it more difficult to obtain accurate models for real problems [8]. Different types of noise can also be found in the human activity recognition classification process. Input features may have noise for several reasons, such as: (1) sensor miscalibration; (2) dead or blocked sensors; (3) errors in sensor placement; (4) activities registered in noisy environments; (5) activities interleaved, so that the events are not only related to one activity. Classification labels in the output class need human intervention, and they are therefore likely to contain errors as well. As in other classification problems, noise can be located in training and/or test data.
It is difficult to measure the impact of each type of noisy data in the classification process. Nettleton et al. [8] reviewed works that studied the impact of noise for several learners and presented a comparison of the effect of attribute and class noise on models created by naive Bayes, C4.5 decision tree, an instance-based algorithm, and support vector machines. They compared the techniques’ performance on thirteen classification problems (activity recognition is not included). In that work, the authors showed that naive Bayes is relatively more robust to noisy data than the other three techniques, and SVM presented the poorest performance. In this regard, we agree with Nettleton et al. [8] on two statements:
  • Developing learning techniques that effectively and efficiently deal with noisy types of data is a key aspect in machine learning.
  • There is a need for comparisons of the effect of noise on different learning paradigms.
These two statements are also pertinent for the human activity recognition domain.

2.2. Machine Learning Techniques Used for Human Activity Recognition

The growing interest in human activity recognition and the great advances in sensor technologies create the necessity for developing robust machine learning systems. Applications in the field of activity recognition need to deal with a large number of multimodal sensors that provide high-dimensional data with large variability; thus, data may be missing, and labels can be unreliable.
Recently, some efforts have been made to promote the development of robust machine learning techniques, especially in the domain of activity recognition. The workshop on robust machine learning techniques for human activity recognition is one example of these efforts [20]. An overview of activity recognition describing the major approaches, methods and tools associated with vision and sensor-based recognition was presented by Chen et al. [4]. In fact, the authors made the distinction between data-driven and knowledge-driven approaches. The sensor-based approach is focused on activity monitoring using wearable or smartphone sensors and technologies, while the vision-based approach requires image processing of video sequences or digital visual data provided by cameras [21].
Preece et al. [2] present an introduction and research review of different machine learning techniques used for human activity recognition and their failures. The authors discuss findings and results obtained with the following learning techniques used in activity classification: threshold-based classification, hierarchical methods, decision trees, k-nearest neighbors, artificial neural networks, support vector machines, naive Bayes and Gaussian mixture models, fuzzy logic, Markov models, combined classifiers and some unsupervised learning methods. They made a summary of studies comparing different classifiers and an overview of the advantages and drawbacks of each of the aforementioned methods. Their comparison includes the number and type of activities classified, accelerometer placements and inter-subject classification accuracy. From this overview, we extract and highlight the following statements [2]:
  • “The variability in activities, sensors and features means that it is not possible to directly compare classification accuracies between different studies.”
  • “... there is no classifier which performs optimally for a given activity classification problem.”
  • “... there is a need for further studies investigating the relative performance of the range of different classifiers for different activities and sensor features and with large numbers of subjects.”
Regarding noise, Preece et al. [2] only mentioned wavelet analysis techniques for suppressing noise, but they did not discuss the classifiers’ robustness or stability.
Dohnálek et al. [11] present a comparison, only in terms of accuracy, of several classifiers: two orthogonal matching pursuit techniques, k-nearest neighbors, classification and regression tree (CART) techniques and global merged self-organizing maps. Their dataset contains data from sensors that measure temperature and 3D data from the accelerometer, gyroscope and magnetometer of nine healthy human subjects. Their results confirm that a compromise between speed and accuracy must be made, given that the best classifiers are slower than the worst. It is important to notice that only a brief discussion of time complexity was presented, and no discussion regarding the robustness of the classifiers was provided.
Lara et al. presented a summary of classification algorithms used in human activity recognition systems in their survey [9]. They discussed the advantages and limitations of different types of classifiers: decision trees, Bayesian, instance-based, artificial neural networks, domain transform, fuzzy logic, regression methods, Markov models and classifier ensembles. In that survey, the authors did not mention the impact of noise in the process of activity recognition; however, Lara presented experiments addressing this impact in his dissertation [22]. He induced noise by arbitrarily modifying the labels in the dataset to assess the effectiveness of the proposed probabilistic strategies. His results show that some classification algorithms are more tolerant to noise than others.
Lustrek et al. [23] compared the performance of eight machine learning techniques in fall detection and activity recognition. They added Gaussian noise to their input recordings of body tags attached to the shoulders, elbows, wrists, hips, knees and ankles. They presented classification accuracy results for clean and noisy data in support vector machines, random forest, bagging and AdaBoost classifiers. The best accuracy (support vector machines) was 97.7% on clean data and 96.5% on noisy data.
Ross et al. [24] presented a comparative analysis of the robustness of naive Bayes, support vector machines and random forest methods for activity recognition with respect to sensor noise. The authors performed experiments with collections of test data with random insertions, random deletions and dead sensors. They simulated miscalibrated and dead sensors. Random forest models outperformed the other methods in all of their experiments. In their brief study, the three chosen methods were consistent in their relative performance.
To this end, the Opportunity Activity Recognition Challenge was set to provide a common platform to allow the comparison of different machine learning algorithms on the same conditions. Chavarriaga et al. [25] presented the outcome of this challenge. They reported the performance of the following standard techniques over several subjects and recording conditions: k–nearest neighbors, nearest centroid classifier, linear discriminant analysis and quadratic discriminant analysis. One of the subjects had different sensor configurations and noisy data. The dataset used for the challenge is a subset of the one presented by Roggen et al. in [10]. These efforts provide a method of comparison of machine learning techniques using common benchmarks.

3. Artificial Hydrocarbon Networks as a Supervised Learning Method

Nature-inspired computing promotes methodologies, techniques and algorithms focusing on the computation that takes place in nature [26]. Moreover, in machine learning, heuristic- and meta-heuristic-based methods have been widely explored in order to efficiently tackle real-life problems that are difficult to solve due to their high complexity and the limited resources available to analyze and extract experience from them [26]. Recent works have introduced artificial hydrocarbon networks as a supervised learning algorithm [27], which we use as a classifier for human activity recognition. Thus, this section briefly describes the high-level framework of artificial hydrocarbon networks, called artificial organic networks, and then gives a full description of the artificial hydrocarbon networks algorithm and its characteristics.

3.1. Artificial Organic Networks

The artificial organic networks (AON) technique is a machine learning framework that is inspired by chemical organic compounds [27], such that all definitions and heuristics are based on chemical carbon networks. This technique proposes two representations of artificial organic compounds: a graph structure representing physical properties and a mathematical model behavior representing chemical properties.
The main characteristic of the AON framework is that it packages information into modules, so-called molecules [27]. Similar to chemical organic compounds, artificial organic networks define heuristic mechanisms for generating organized and optimized structures based on chemical energy. In a nutshell, artificial organic networks allow [27]: modularity, inheritance, organizational and structural stability.
Artificial organic networks define a framework for developing learning algorithms that inherit from it [27], as shown in Table 1. Reading bottom-up, the first component of this framework defines the basic units that can be used in the machine learning algorithm, and the second level is related to the interactions among components to compute nonlinear relationships. Then, the third level of the framework refers to the chemical heuristic rules that control the interactions over components. These three levels are also mathematically modeled in terms of their structure and functionality, and lastly, the implementation level considers training learning models and then inferring from them [27,28]. Detailed information on the AON framework can be found in [27,28].

3.2. Artificial Hydrocarbon Networks Algorithm

The artificial hydrocarbon networks (AHN) algorithm is a supervised learning algorithm with a graphical model structure inspired by chemical hydrocarbon compounds [27]. Similar to chemical hydrocarbon compounds, artificial hydrocarbon networks are composed of hydrogen and carbon atoms that can be linked with at most one and four other atoms, respectively. These atomic units interact among themselves to produce molecules. Particular to this method, the basic unit with information is the CH-molecule. It is made of two or more atoms linked to each other in order to define a mathematical function $\varphi$ centered in the carbon atom and parameterized with hydrogen-based values attached to it, as shown in Equation (1); where $\varphi \in \mathbb{R}$ represents the behavior of the CH-molecule, $\sigma$ is a real value called the carbon value, $H_i \in \mathbb{C}$ is the i-th hydrogen atom linked to the carbon atom, $k$ represents the number of hydrogen atoms in the molecule and $x$ is the input to that molecule [27,29,30].
$$\varphi(x) = \sigma \prod_{i=1}^{k \leq 4} (x - H_i) \tag{1}$$
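As a minimal sketch of Equation (1), the behavior of a single CH-molecule can be evaluated as the carbon value times the product of the differences between the input and each hydrogen value. The function name `ch_molecule` and its argument names are hypothetical and chosen only for illustration; Python's built-in complex type is assumed to be an acceptable stand-in for hydrogen values in $\mathbb{C}$.

```python
def ch_molecule(x, sigma, hydrogens):
    """Evaluate Eq. (1): sigma * prod_i (x - H_i) for one CH-molecule.

    `sigma` is the real carbon value and `hydrogens` holds the k hydrogen
    values (real or complex), with 1 <= k <= 4 as stated in the text.
    """
    if not 1 <= len(hydrogens) <= 4:
        raise ValueError("a carbon atom links to between one and four hydrogens")
    value = sigma
    for h in hydrogens:
        value *= (x - h)  # one factor (x - H_i) per hydrogen atom
    return value
```

For instance, with $\sigma = 0.5$ and hydrogen values 1, 0 and -1, the input $x = 2$ gives $0.5 \cdot 1 \cdot 2 \cdot 3 = 3$.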
If a CH-molecule is unsaturated (i.e., $k < 4$), then it can be joined together with other CH-molecules, forming chains of molecules, so-called artificial hydrocarbon compounds. In [29,30,31], the authors suggest using saturated and linear chains of molecules like in Equation (2); where $CH_k$ represents a CH-molecule with $k$ hydrogen atoms associated with it, and the line symbol represents a simple bond between two molecules. Notice that outer molecules are $CH_3$, while inner molecules are $CH_2$.
$$CH_3 - CH_2 - \cdots - CH_2 - CH_3 \tag{2}$$
Artificial hydrocarbon compounds also have an associated function $\psi$ representing their behavior. For instance, the piecewise compound behavior [27] $\psi \in \mathbb{R}$ can be expressed as Equation (3); where $L_t$ represents the t-th bound that limits the action of a CH-molecule over the input space. In that sense, if the input domain is in the interval $x \in [L_{min}, L_{max}]$, then $L_0 = L_{min}$ and $L_n = L_{max}$, and the j-th CH-molecule acts over the interval $[L_{j-1}, L_j]$, for all $j = 1, \ldots, n$.
$$\psi(x) = \begin{cases} \varphi_1(x) & L_0 \leq x < L_1 \\ \vdots & \vdots \\ \varphi_n(x) & L_{n-1} \leq x \leq L_n \end{cases} \tag{3}$$
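The piecewise behavior of Equation (3) can be sketched as a lookup of the interval that contains the input, followed by a call to the corresponding molecule. The sketch below is illustrative only: the name `compound_behavior` is hypothetical, molecules are assumed to be plain Python callables, and the bounds are assumed to be strictly increasing.

```python
from bisect import bisect_right


def compound_behavior(x, bounds, molecules):
    """Evaluate Eq. (3) for bounds [L_0, ..., L_n] and n molecule callables.

    Intervals are half-open, [L_{j-1}, L_j), except the last one, which
    also includes its upper bound L_n.
    """
    if len(bounds) != len(molecules) + 1:
        raise ValueError("n molecules require n + 1 bounds")
    if not bounds[0] <= x <= bounds[-1]:
        raise ValueError("input lies outside [L_min, L_max]")
    # bisect_right counts the bounds <= x; subtract 1 for a 0-based index
    # and clamp so that x == L_n falls into the last molecule.
    j = min(bisect_right(bounds, x) - 1, len(molecules) - 1)
    return molecules[j](x)
```

With bounds `[0, 1, 2]` and two molecules, an input of exactly 1 is handled by the second molecule, matching the half-open intervals of Equation (3).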
To obtain the bounds $L_t$ for all $t = 0, \ldots, n$, a distance $r$ between two adjacent bounds, i.e., $[L_{t-1}, L_t]$, is computed as in Equation (4); where $r$ represents the intermolecular distance between two adjacent molecules. In addition, $\Delta r$ is computed using a gradient descent method based on the energy of the adjacent molecules ($E_{t-1}$ and $E_t$) like in Equation (5), where $0 < \eta < 1$ is a learning rate parameter [27,28,31]. For implementability, the energy of molecules can be computed using a loss function [27].
$$r = r + \Delta r \tag{4}$$
$$\Delta r = -\eta (E_{t-1} - E_t) \tag{5}$$
Lastly, artificial hydrocarbon compounds can interact among themselves in definite ratios forming a mixture $S \in \mathbb{R}$. For this method, weights are called stoichiometric coefficients, and they are represented as elements $\alpha_i \in \mathbb{R}$, as shown in Equation (6); where $c$ is the number of compounds in the mixture [27]. For this work, the artificial hydrocarbon networks structure considers one compound, such that $c = 1$ and $S(x) = \psi_1(x)$.
$$S(x) = \sum_{i=1}^{c} \alpha_i \psi_i(x) \tag{6}$$
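Equation (6) is a weighted sum of compound behaviors, which can be illustrated with the short sketch below. The name `mixture` is hypothetical, and compounds are assumed to be callables that return the value of $\psi_i(x)$.

```python
def mixture(x, alphas, compounds):
    """Evaluate Eq. (6): S(x) = sum_i alpha_i * psi_i(x).

    `alphas` are the stoichiometric coefficients and `compounds` are the
    compound behaviors psi_i, given here as callables.
    """
    if len(alphas) != len(compounds):
        raise ValueError("one stoichiometric coefficient per compound")
    return sum(a * psi(x) for a, psi in zip(alphas, compounds))
```

The single-compound case used in this work corresponds to `mixture(x, [1.0], [psi_1])`, which returns `psi_1(x)` unchanged.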
Formally, an artificial hydrocarbon network is a mixture of artificial hydrocarbon compounds (see Figure 1), each one obtained using a chemical-based metaheuristic rule. The training algorithm is known as the AHN-algorithm [28,29,30]; and for this work, the AHN-algorithm was reduced to Algorithm 1 for saturated and linear hydrocarbon compounds. Notice that Algorithm 1 reflects the restrictions about a saturated linear chain of molecules and a piecewise compound behavior imposed for this work. For a detailed description of the general AHN-algorithm, see [27]; and for the implementability, see [28]. In addition, a numerical example of training and testing AHN is summarized in Appendix A.
Algorithm 1. AHN algorithm for saturated and linear hydrocarbon compounds.
Input: a training dataset $\Sigma = (x, y)$, the number of molecules in the compound $n \geq 2$ and a tolerance value $\epsilon > 0$.
Output: the saturated and linear hydrocarbon compound $AHN$.
  Initialize an empty compound $AHN = \{\}$.
  Create a new saturated linear compound $C$ like Equation (2).
  Randomly initialize intermolecular distances $r_t$.
  while $|y - \psi| > \epsilon$ do
    Determine all bounds $L_t$ of $C$ using $r_t$.
    for each j-th molecule in $C$ do
      Determine all parameters of behavior $\varphi_j$ in Equation (1) using an optimization process.
    end-for
    Build the compound behavior $\psi$ of $C$ using Equation (3).
    Update intermolecular distances using Equations (4) and (5).
  end-while
  Update $AHN$ with $C$ and $\psi$.
  return $AHN$
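The loop of Algorithm 1 can be sketched in plain Python for a one-dimensional input. This is an illustrative sketch under several assumptions that are not taken from the source: each molecule is fitted by least squares in its expanded polynomial form (degree 3 for the outer $CH_3$ molecules, degree 2 for the inner $CH_2$ ones) as a stand-in for the unspecified optimization process; a small ridge term keeps the fit solvable when a segment has few points; the energy of a molecule is its sum of squared errors; distances start equal rather than random so the run is reproducible; bounds are the cumulative distances rescaled to the input range; the first distance is left unchanged because $E_{t-1}$ is undefined for it; and an iteration cap guards the while loop. All function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * sol[j] for j in range(i + 1, n))
        sol[i] = (M[i][n] - s) / M[i][i]
    return sol


def fit_molecule(xs, ys, k, ridge=1e-9):
    """Least-squares degree-k polynomial (assumed expanded form of Eq. (1))."""
    m = k + 1
    A = [[sum(x ** (i + j) for x in xs) + (ridge if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    return solve_linear(A, b)  # coefficients c[0] + c[1] x + ... + c[k] x^k


def evaluate(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def bounds_from(r, lo, hi):
    """Assumed mapping: cumulative distances rescaled onto [lo, hi]."""
    total, acc, out = sum(r), 0.0, [lo]
    for d in r:
        acc += d
        out.append(lo + (hi - lo) * acc / total)
    out[-1] = hi
    return out


def train_ahn(xs, ys, n, eps, eta=0.05, max_iter=100):
    lo, hi = min(xs), max(xs)
    r = [(hi - lo) / n] * n  # assumption: equal start instead of random
    for _ in range(max_iter):  # assumption: cap on the while loop
        bounds = bounds_from(r, lo, hi)
        molecules, energy = [], []
        for j in range(n):
            last = j == n - 1
            seg = [(x, y) for x, y in zip(xs, ys)
                   if bounds[j] <= x and (x < bounds[j + 1] or last)]
            k = 3 if j in (0, n - 1) else 2  # CH3 outside, CH2 inside
            coeffs = fit_molecule([p[0] for p in seg], [p[1] for p in seg], k)
            molecules.append(coeffs)
            energy.append(sum((y - evaluate(coeffs, x)) ** 2 for x, y in seg))
        if sum(energy) <= eps:  # stop once the total error is within tolerance
            break
        for t in range(1, n):  # Eqs. (4) and (5), clamped to stay positive
            r[t] = max(1e-6, r[t] - eta * (energy[t - 1] - energy[t]))
    return bounds, molecules, sum(energy)
```

A call such as `train_ahn(xs, ys, n=3, eps=1e-3)` returns the bounds, the fitted coefficients of each molecule and the final total error; the returned bounds always correspond to the returned molecules.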

3.3. Characteristics of Artificial Hydrocarbon Networks

The artificial hydrocarbon networks algorithm has some characteristics that can be useful in regression and classification problems. In particular, for this work, both monitoring and noise tolerance tasks in human activity recognition are considered. Thus, several characteristics of AHN related to these tasks are discussed below:
  • Stability: This characteristic considers that the artificial hydrocarbon networks algorithm minimizes the changes in its output response when inputs are slightly changed [27]. This is the main characteristic that promotes using AHN as a supervised learning method.
  • Robustness: This characteristic implies that artificial hydrocarbon networks can deal with uncertain or noisy data. The literature reports that AHN can deal with noisy data as it filters information, e.g., AHN has been used in audio filtering [27,31]. Additionally, ensembles of artificial hydrocarbon networks and fuzzy inference systems can also deal with uncertain data, for example in intelligent control systems, like in [29,30].
  • Metadata: Molecular parameters like bounds, intermolecular distances and hydrogen values can be used as metadata to partially understand underlying information or to extract features. As reported in [27], the artificial organic networks method packages information in several molecules that might be interpreted as subsystems in the overall domain. For example, these metadata have been used in facial recognition approaches [27].

4. Artificial Hydrocarbon Networks-Based Classifier for HAR Systems

Building on the above, this work considers training and using an AHN classifier that exploits the stability and robustness characteristics in the field of human activity recognition based on wearable sensors, with particular attention to monitoring and noise tolerance. Previous work in this direction can be found in [21].
In this paper, we propose to build and train an artificial hydrocarbon network for a supervised learning classifier (AHN classifier) aiming to monitor human activities based on wearable sensor technologies. In fact, this AHN classifier is computed and employed in two steps: training-and-testing and implementation, as shown in Figure 2.
The AHN classifier assumes that sensor data have already been processed into $N$ features $x_i$ for all $i = 1, \ldots, N$ and organized in $Q$ samples, each one associated with its proper label $y_j$ representing the j-th activity in the set of all possible activities $Y$ for $j = 1, \ldots, J$; where $J$ is the number of different activities in the dataset. Thus, samples are composed of features and labels as $(N+1)$-tuples of the form $(x_1, \ldots, x_N, y_j)_q$ for all $q = 1, \ldots, Q$.
Given a dataset of $Q$ samples of the form defined above, the AHN classifier is built and trained using the AHN algorithm shown in Algorithm 1. It is worth noting that this proposal uses a simplified version of artificial hydrocarbon networks. Thus, the AHN classifier is composed of one saturated and linear hydrocarbon compound, i.e., no mixtures were considered (see Figure 1 for a hydrocarbon compound reference). In that sense, the inputs of the AHN-algorithm are the following: the training dataset $\Sigma$ is a subset of $R$ samples from the original dataset, as in Equation (7); the number of molecules $n$ in the hydrocarbon compound is proposed to be the number of different activities in the dataset ($n = J$); and the tolerance value $\epsilon$ is a small positive number selected manually. Notice that the number of molecules in the compound is an empirical value; thus, no pairing between classes and molecules occurs. Lastly, the AHN-algorithm computes all parameters in the AHN classifier: hydrogen and carbon values, as well as the bounds of molecules.
$$\Sigma = \begin{pmatrix} (x_1, \ldots, x_N, y_j)_1 \\ \vdots \\ (x_1, \ldots, x_N, y_j)_R \end{pmatrix} \tag{7}$$
For testing and validating the AHN classifier, the remaining $P$ samples from the original dataset (i.e., such that $Q = P + R$) form the testing dataset. Then, the testing dataset is introduced to the previously computed AHN classifier. Lastly, the classifier is validated using several metrics (see Section 5). Moreover, new sample data can also be used in the AHN classifier for recognizing and monitoring a human activity based on the corresponding features.

5. Case Study: Physical Activity Monitoring Using Artificial Hydrocarbon Networks

In this section, a case study on physical activity monitoring is presented and described in order to evaluate the performance of the proposed AHN classifier in terms of both monitoring and noise tolerance tasks. In particular, this case study uses a public dataset, and it compares the performance of the AHN-algorithm against other well-known supervised classifiers in the field of human activity recognition. Lastly, several metrics for classification tasks are also described.

5.1. Dataset Description

This case study employs the public Physical Activity Monitoring Data Set (PAMAP2) [12,13], which consists of 3,850,505 samples of raw signals from inertial sensors. Those samples were collected from three sensors placed on nine subjects with an average age of 27 years (eight men and one woman), as shown in Figure 3. The subjects performed 18 different activities during intervals of 10 h. However, only eight hours are dedicated to the activities, and the remaining two hours are dedicated to rest and change from one activity to another. Notice that resting and transitional period activities were labeled with zero-value in this dataset. For our case study, we eliminated these zero-labeled activities. The 18 different activities in our modified dataset are summarized in Table 2.
Although the PAMAP2 dataset consists of several measurements from inertial sensors and a heart rate monitor, this case study only considers numerical features from the inertial sensors. Each “Colibri” wireless sensor has a total of 17 features: one for temperature, three for 3D-acceleration data from the inertial measurement unit (IMU) sampled at 100 Hz at the scale of ±16 g (13-bit), three for 3D-acceleration data (IMU) sampled at 100 Hz at the scale of ±6 g (13-bit), three for 3D-gyroscope data (rad/s), three for 3D-magnetometer data (μT) and four orientation values. Furthermore, the timestamp was eliminated from the dataset, since it might cause overfitting in supervised classifiers.
To this end, the dataset for the case study is composed of the following samples: 10,200 training samples (600 random samples for each of the first twelve activities and 500 random samples for each of the other activities) and 5100 testing samples (300 random samples for each of the first twelve activities and 250 random samples for each of the remaining activities) chosen randomly from the original dataset. In both cases, samples with missing values were avoided. Notice that since random selection was done, samples in the training and testing sets are not time dependent.

5.2. Methodology for Building Supervised Models

In order to show that our AHN classifier is competitive for physical activity recognition in terms of performance and noise tolerance, we compare it against fourteen supervised classifiers in three experimental cases.
The supervised classifiers used are the following: stochastic gradient boosting (SGB), AdaBoost (AB), C4.5 decision trees (DT), rule-based classifier (RBC), support vector machines with linear kernel (SVM-L), support vector machines with radial basis function kernel (SVM-BF), random forest (RF), k-nearest neighbors (KNN), linear discriminant analysis (LDA), mixture discriminant analysis (MDA), multivariate adaptive regression splines (MARS), naive Bayes (NB), multilayer feedforward artificial neural networks (ANN) and nearest shrunken centroids (NSC). The selection of these techniques is supported by the reviewed literature [2,9,10,11].
In addition, the methodology also considers a cross-validation technique (10 folds and five repetitions) for each classifier in order to build suitable supervised models. For this case study, the accuracy metric was employed within the cross-validation technique to select the best model for each classifier. Table 3 summarizes the configuration parameters for training these models, using the caret package in R. Notice that the configurations column represents the number of different configurations created automatically in the cross-validation technique before selecting a suitable classifier.
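The resampling scheme described above (10 folds, five repetitions) can be sketched as follows. The models in this study were built with the caret package in R; the snippet below is only an illustrative standard-library Python sketch of how the 50 train/validation splits could be generated, and the function name `repeated_kfold_indices` and the seed are assumptions, not part of the original work.

```python
import random

def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (train_idx, val_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for every repetition
        for k in range(n_folds):
            val = order[k::n_folds]  # every n_folds-th shuffled index forms fold k
            val_set = set(val)
            train = [i for i in order if i not in val_set]
            yield train, val

# 10,200 training samples, as in the case study
splits = list(repeated_kfold_indices(n_samples=10200))
```

Each candidate configuration of a classifier would then be scored by its mean accuracy over these splits, and the configuration with the highest mean would be retained.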
On the other hand, each stage of the activity recognition chain (ARC) described by Bulling et al. [32] (i.e., stages from data acquisition, signal preprocessing and segmentation, feature extraction and selection, training and classification) directly influences the overall recognition performance of an HAR system [32]. In particular, feature extraction and selection are common practice to improve the performance of most HAR systems. Hence, if bad design decisions are made, the processed dataset might contain redundant or irrelevant information [9]; the computational demand may unnecessarily increase and also reduce the accuracy of some classification methods [2]. Therefore, some authors choose to experiment with raw data for comparison and evaluation of the recognition performance of supervised and/or unsupervised machine learning techniques [32,33].
In our work, we choose to compare the following cases trying to minimize the influence of feature generation and extraction:
  • Case 1: This experiment uses the raw dataset of the case study as the feature set in order to measure the classification, recognition and monitoring performance over physical activities of all supervised methods, as explained above [32,33].
  • Case 2: This experiment performs a feature reduction over the feature set of the previous case, using the well-known recursive feature elimination (RFE) method [34,35]. Table 4 shows the ten retained features, and Figure 4 shows the accuracy curve of its validation. This experiment aims to perform human activity recognition with a minimal set of raw signals from the sensors’ channels, since minimizing the number of sensors and the usage of their channels is a challenging problem in HAR [9]. The features retained by the automatic RFE method may appear to include redundant features (e.g., the 16 g and 6 g accelerometers) or variables that could lead to overfitting. Regarding these two concerns, Guyon et al. [35] showed with simple examples that “noise reduction and consequently better separation may be obtained by adding variables that are presumably redundant” [35]. Thus, variables that are apparently redundant, as in our case, can sometimes enhance the prediction power when they are combined. Finally, the same measures of classification, recognition and monitoring performance were computed.
  • Case 3: This experiment evaluates noise tolerance in all supervised classifiers using noisy datasets. For instance, Zhu and Wu [36] describe different types of noise generation: in the input attributes and/or in the output class; in training data and/or in test data; in the most noise-sensitive attribute or in all attributes at once. Thus, we decided to generate noise only in some input feature values of some samples of the testing dataset. In order to add noise in a numeric attribute, the authors in [36] suggest selecting a random value that is between the maximal and the minimal values of the feature. For our experimentation, we first randomly removed some feature values using a 7%, 15% and 30% data selection in order to simulate missing values and then automatically replaced the null values with the mean of the related feature, as some data mining tools suggest [37]. In fact, this method can be considered as random noise insertion, given that generated missing data are replaced with a value. Notice that supervised models built for this experiment are the same classifiers as those built in the first experiment.
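The noise generation used in Case 3 can be illustrated with a short sketch: a fraction of the feature values is removed at random, and each removed value is then replaced with the mean of its feature. This is a minimal standard-library Python sketch under stated assumptions: the helper names `inject_missing` and `impute_with_mean`, the seed and the toy data are hypothetical, and the mean is assumed to be computed over the values that remain after removal, which the text does not specify.

```python
import random

def inject_missing(rows, fraction, seed=0):
    """Return a copy of rows with a random fraction of the cells set to None."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    noisy = [list(r) for r in rows]
    for i, j in rng.sample(cells, round(fraction * len(cells))):
        noisy[i][j] = None
    return noisy

def impute_with_mean(rows):
    """Replace every None with the mean of the remaining values of that feature."""
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] is not None]
        if not present:  # assumption: a fully removed feature is left untouched
            continue
        mean_j = sum(present) / len(present)
        for r in filled:
            if r[j] is None:
                r[j] = mean_j
    return filled

# Toy testing set (hypothetical): 100 samples with 4 features
toy = [[float(i + j) for j in range(4)] for i in range(100)]
noisy_sets = {p: impute_with_mean(inject_missing(toy, p)) for p in (0.07, 0.15, 0.30)}
```

Only the testing data are corrupted in this way; the trained models stay untouched, which matches the description of Case 3.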
The overall methodology is shown in Figure 5. The experiments were executed on a computer with an Intel® Core™ i5-2400 CPU at 3.10 GHz and 16 GB of RAM, running the Windows 7 Pro, Service Pack 1, 64-bit operating system.
In addition, we conduct another experiment using a majority voting across windows-based approach and the AHN classifier to simulate the data flow in a real HAR system and to validate the performance of the proposed classifier in that situation. We select the first 30 s of each activity carried out by all of the subjects as the testing set, using the same models obtained in the first experiment. Table 5 shows the activities performed by each subject for at least 30 s [12,13]. Then, we apply a fixed window of 2.5 s in size (i.e., 250 samples) without overlapping during the 30 s of each activity. Lastly, a majority voting strategy [32] is employed inside each window in order to output the recognized activity. For this experiment, we build the models with the same strategy as followed in the previous cases.
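The windowing step can be sketched as follows: per-sample predictions are grouped into non-overlapping windows of 250 samples, and the most frequent label in each window is reported. This is a minimal standard-library Python sketch; the function name `majority_vote_windows` and the toy prediction sequence are hypothetical, and ties are resolved in favor of the label seen first in the window, which is an assumption.

```python
from collections import Counter

def majority_vote_windows(predictions, window_size=250):
    """Collapse per-sample predictions into one label per non-overlapping window."""
    labels = []
    for start in range(0, len(predictions) - window_size + 1, window_size):
        window = predictions[start:start + window_size]
        # most_common(1) returns the most frequent label in the window
        labels.append(Counter(window).most_common(1)[0][0])
    return labels

# 30 s at 100 Hz gives 3000 samples, i.e., 12 windows of 250 samples
per_sample = [4] * 3000
per_sample[10:60] = [5] * 50  # a burst of misclassified samples
window_labels = majority_vote_windows(per_sample)
```

A window is labeled correctly as long as the correct label is the most frequent one inside it, so isolated sample-level errors do not affect the window-level output.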

5.3. Metrics

This case study uses different metrics to evaluate the performance of the AHN classifier in comparison with the other supervised classifiers, such as: accuracy, sensitivity, precision and $F_1\text{-score}$ [38]. In addition, the metrics distinguish two ways of computation: macro-averaging (M) and micro-averaging (μ) [38]. The first one treats all classes equally, while the second one considers the size of each class. Thus, macro-averaging is important to measure the overall classification, and micro-averaging computes the performance of classifiers in a precise way. The $F_1\text{-score}$ was calculated using Equations (8) and (9) [38], respectively.
$$F_1\text{-score}_M = \frac{2 \times \text{precision}_M \times \text{sensitivity}_M}{\text{precision}_M + \text{sensitivity}_M} \quad (8)$$
$$F_1\text{-score}_\mu = \frac{2 \times \text{precision}_\mu \times \text{sensitivity}_\mu}{\text{precision}_\mu + \text{sensitivity}_\mu} \quad (9)$$
Additionally, other metrics on the classifiers are computed as well: training_time specifies the training time (in seconds) to build and train a model, and testing_time specifies the evaluation time of an input sample (in milliseconds).
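As an illustration of Equations (8) and (9), the sketch below computes both averages from a multiclass confusion matrix. It is a minimal standard-library Python sketch under stated assumptions: the per-class counts (true positives, false positives, false negatives) are taken directly from the confusion matrix, the function name `f1_scores` and the toy matrix are hypothetical, and the exact counting procedure used in the experiments is not given in this section.

```python
def f1_scores(confusion):
    """Return (macro F1, micro F1); confusion[i][j] counts true class i predicted as j."""
    n = len(confusion)
    tp = [confusion[i][i] for i in range(n)]
    fp = [sum(confusion[r][i] for r in range(n)) - tp[i] for i in range(n)]
    fn = [sum(confusion[i]) - tp[i] for i in range(n)]

    def ratio(a, b):
        return a / b if b else 0.0

    def f1(precision, sensitivity):
        return ratio(2 * precision * sensitivity, precision + sensitivity)

    # Macro-averaging: average the per-class ratios, so every class weighs the same
    precision_macro = sum(ratio(tp[i], tp[i] + fp[i]) for i in range(n)) / n
    sensitivity_macro = sum(ratio(tp[i], tp[i] + fn[i]) for i in range(n)) / n
    # Micro-averaging: pool the counts first, so larger classes weigh more
    precision_micro = ratio(sum(tp), sum(tp) + sum(fp))
    sensitivity_micro = ratio(sum(tp), sum(tp) + sum(fn))
    return f1(precision_macro, sensitivity_macro), f1(precision_micro, sensitivity_micro)

# Toy three-class confusion matrix (hypothetical values)
toy_confusion = [[8, 2, 0],
                 [1, 9, 0],
                 [0, 0, 30]]
f1_macro, f1_micro = f1_scores(toy_confusion)
```

In this toy matrix the large third class is classified perfectly, so the micro-averaged score is higher than the macro-averaged one, which weighs the two small, partially confused classes equally with the large one.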

6. Experimental Results and Discussion

As described above, three experiments were conducted in order to evaluate the performance in both monitoring and noise tolerance tasks using an artificial hydrocarbon networks-based classifier in the context of the case study previously presented. In addition, a fourth experiment was conducted using a majority voting across windows-based strategy to simulate the data flow in a real HAR system and to validate the performance of the AHN classifier in that situation. This section presents and analyzes the comparative results.

6.1. Comparative Analysis on Physical Activity Monitoring

To evaluate the performance on monitoring physical activities using the AHN classifier, two experiments were conducted. The first experiment considers the complete dataset of the case study, and the second experiment uses the dataset reduced with the RFE technique (see Section 5). Table 6 and Table 7 show comparative results (sorted in descending order by accuracy) of the supervised classifiers in terms of the metrics defined above.
In both cases, the AHN classifier ranks above the mean accuracy and is positioned in the first quartile of the evaluated classifiers. Using the complete dataset, the AHN classifier is placed close to the decision tree (first place), rule-based (second place) and support vector machine (fourth and fifth places) classifiers, as seen in Table 6. In addition, Table 7 shows that the AHN classifier is placed close to the stochastic gradient boosting (first place), AdaBoost (third place), random forest (fourth place) and rule-based (fifth place) classifiers.
For instance, the decision tree-based classifier (the best ranked method in Table 6) is 0.52% above the AHN classifier in accuracy and 0.33% above it in $F_1\text{-score}_\mu$. Using the same comparison, Table 7 shows that stochastic gradient boosting (the best ranked method) is 1.5% and 0.86% above the AHN classifier in accuracy and $F_1\text{-score}_\mu$, respectively.
Comparing Table 6 and Table 7, the performance of the methods changes. For example, the decision tree-based classifier goes down 3.12% in accuracy and 1.35% in $F_1\text{-score}_\mu$, while stochastic gradient boosting goes up 1.77% in accuracy and 0.57% in $F_1\text{-score}_\mu$. In this regard, the AHN classifier goes down 0.89% in accuracy and 0.46% in $F_1\text{-score}_\mu$. These comparisons give some insight into the robustness of the AHN classifier in contrast to the two methods that were ranked in first place on either the complete or the reduced dataset.

6.2. Comparative Analysis on Supervised Model Performance under Noisy Data

A third experiment was conducted in order to measure the noise tolerance of the selected supervised classifiers. In this case, three noisy datasets (7%, 15% and 30% randomly corrupted) were used (see Section 5). Table 8, Table 9 and Table 10 show the overall results, sorted in descending order by accuracy, of this experiment.
In 7% noisy data, the AHN classifier ranks above the mean accuracy and is positioned in the first quartile of the evaluated classifiers. The proposed classifier is placed close to the random forest (first place), stochastic gradient boosting (second place), rule-based (fourth place) and decision tree (fifth place) classifiers. In terms of accuracy, the random forest-based classifier is 1.31% above the AHN classifier, while it is 0.71% above the AHN classifier in terms of $F_1\text{-score}_\mu$.
In 15% and 30% noisy data (Table 9 and Table 10), the AHN classifier also ranks over the mean accuracy, and it is positioned in the first quartile of the evaluated classifiers. In both experiments, the AHN classifier is very close to naive Bayes, k-nearest neighbors, SVM with radial basis function kernel and stochastic gradient boosting. In the 15% noisy dataset, the AHN classifier is ranked at the top of the table; while in the 30% noisy data, it is ranked 0.14% under the naive Bayes-based classifier.

6.3. Comparative Analysis on the Majority Voting Across Windows-Based Strategy

As already mentioned in Section 5, a majority voting across windows-based approach was also conducted to validate the performance of the AHN classifier in a simulated data flow that can be found in a real HAR system.
First, we extracted the first 30 s of each activity carried out by each of the subjects (see Table 5), and we validated that our AHN classifier, as well as the other supervised models, is able to classify human activities correctly. Table 11 reports the performance results of all methods, sorted in descending order by accuracy. Compared with Table 6, it can be seen that the AHN classifier is stable in both circumstances, with small (0.9829 in accuracy) and large (0.9845 in accuracy) testing sets. Furthermore, the other top methods (i.e., random forest, rule-based classifier, SVM, decision tree and stochastic gradient boosting) are consistent in both experiments. In addition, Table 12 shows the confusion matrix of the AHN classifier.
Then, a fixed window of 2.5 s was applied to the sequential data, and a majority voting strategy was computed within the window. The results of the AHN classifier, as well as those of the other fourteen methods, are reported in Table 13, sorted in descending order by accuracy. Notice that the AHN classifier, as well as the rule-based classifier, decision trees, random forest, stochastic gradient boosting and k-nearest neighbors, have 100% accuracy. In particular, the confusion matrix of the AHN classifier is presented in Table 14. The values of this matrix correspond to the number of windows for each activity performed by the related subjects. Compared with the confusion matrix of Table 12, the majority voting across windows-based approach improves on the performance of the sample-based experiment. This can be explained because the latter has fewer false positives than true positives for each activity, so the correct label dominates each window. Finally, an overall perspective of the learning performance of the proposed classifier can be seen in Figure 6, which shows the learning curve of the AHN classifier for this experiment.

6.4. Discussion

From the first two experiments, the artificial hydrocarbon networks-based classifier showed good performance in terms of accuracy and $F_1\text{-score}_\mu$ in comparison with the other 14 supervised methods of classification. In that sense, the AHN classifier can achieve physical activity monitoring tasks.
Besides, Table 15, Table 16 and Table 17 show the confusion matrices of the AHN classifier using the 7%, 15% and 30% noisy datasets, respectively. As shown, the confusion matrices present a few mistaken classifications, most of them close to the diagonal. This behavior can be explained by the nature of the method. For instance, the nature of artificial hydrocarbon networks is mainly for regression tasks; then, classification problems are converted into a regression problem using numeric labels as data values for approximation. In that sense, similar numeric labels are the cause of misclassification. To this end, this misclassification behavior is completely related to the nature of the method and not in terms of the nature of physical activities.
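The near-diagonal errors described above can be illustrated with a short sketch of how a real-valued output is mapped to a class label. This is only a minimal standard-library Python sketch: the function name `to_class_label` is hypothetical, and clipping to the valid label range is an assumption added for the illustration, not a step described in this paper.

```python
def to_class_label(regression_output, n_classes):
    """Round a real-valued output to the nearest integer label in 1..n_classes."""
    label = int(round(regression_output))
    # Clipping keeps the result inside the valid label range (assumption)
    return min(max(label, 1), n_classes)

# A small approximation error around label 7 can flip the result to label 8
close_hit = to_class_label(7.2, n_classes=18)
near_miss = to_class_label(7.6, n_classes=18)
```

Because an approximation error of slightly more than 0.5 is enough to change the rounded label, mistakes tend to land on numerically adjacent labels, regardless of how similar the corresponding activities are.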
On the other hand, large values in the confusion matrix are also analyzed. For instance, ascending stairs, cycling and walking are confused with Nordic walking; also, computer work is confused with watching TV. The human performances of these activities are closely related; thus, the performance of the AHN classifier is related to the nature of the physical activity. To this end, notice that confusion matrices correspond to the AHN classifier performance when data from sensors are corrupted, and as a result, it is more difficult to handle physical activity monitoring for the methods. From Table 8, Table 9 and Table 10, it is shown that the AHN classifier has a suitable performance in contrast with the other methods.
From the above experimental results, all methods have advantages and weaknesses. In that sense, the overall performance of the supervised classifiers is also inspected. Table 18 shows the overall performance of the classifiers in terms of the accuracy metric, and Table 19 summarizes the overall results in terms of the $F_1\text{-score}_\mu$. These tables gather the first three experiments. In order to preserve a more reliable analysis, only the results from the 7% noisy dataset are considered here. The mean ($\hat{x}$) and the standard deviation ($\sigma$) of both metrics were computed. The tables are sorted in descending order by the mean values of the metric, showing that the artificial hydrocarbon networks-based classifier is ranked in second position in both the accuracy and $F_1\text{-score}_\mu$ metrics.
Since the accuracy measures the overall classification performance (Table 18), the AHN classifier is very competitive for physical activity monitoring ($\hat{x} = 0.9756$) because it is close to the best ranked method, stochastic gradient boosting ($\hat{x} = 0.9782$), representing a relative gap of 0.27%. In addition, the AHN classifier not only performs well in monitoring, it also shows the smallest standard deviation ($\sigma = 0.0055$) in comparison with the other methods, indicating that the AHN classifier is very robust across different datasets (complete, reduced and noisy), as shown in Figure 7.
The same analysis can be done using the information from Table 19, in which the $F_1\text{-score}_\mu$ is compared. Since the $F_1\text{-score}_\mu$ measures the tradeoff between sensitivity and precision in unbalanced classes, the AHN classifier is also suitable for physical activity monitoring, with $\hat{x} = 0.9871$. This mean value is close to that of the best ranked method, random forest, which obtained $\hat{x} = 0.9895$, representing a relative gap of 0.24%. Using the $F_1\text{-score}_\mu$, the AHN classifier also showed suitable robustness to different datasets (complete, reduced and noisy), obtaining $\sigma = 0.0029$, which ranks it in second position, below the random forest-based classifier, as depicted in Figure 8.
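The relative gaps quoted above can be checked with a few lines of arithmetic. The sketch below assumes that the relative gap is the difference between the best mean and the AHN mean, divided by the best mean; the function name `relative_gap_percent` is hypothetical, and dividing by the AHN mean instead gives the same figures after rounding to two decimals.

```python
def relative_gap_percent(best, value):
    """Gap between the best score and another score, as a percentage of the best."""
    return 100.0 * (best - value) / best

# Mean accuracy: stochastic gradient boosting vs. AHN (Table 18)
accuracy_gap = relative_gap_percent(0.9782, 0.9756)
# Mean micro-averaged F1-score: random forest vs. AHN (Table 19)
f1_gap = relative_gap_percent(0.9895, 0.9871)
```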
Overall, the AHN classifier is positioned close to the following classifiers in terms of monitoring task performance and noise tolerance (see Table 18 and Table 19) and robustness (see Figure 7 and Figure 8): stochastic gradient boosting, random forest, rule-based classifier, decision trees and artificial neural networks.
A closer look at the results over the noisy datasets is summarized in Table 20. The mean and the standard deviation of accuracy and $F_1\text{-score}_\mu$ were calculated. As shown, the AHN classifier is ranked at the top of the table with 93.43% of accuracy and 96.97% of $F_1\text{-score}_\mu$ on average. In terms of standard deviation, the AHN classifier is the second best classifier in accuracy over the nearest shrunken centroids; and it is the best classifier in $F_1\text{-score}_\mu$. These results indicate that the AHN classifier is tolerant to different ratios of noise in raw sensor data.
On the other hand, the above benchmark agrees closely with the literature. An overall look into Table 18 and Table 19 shows that boosting and bagging methods (e.g., stochastic gradient boosting, AdaBoost and random forest) are positioned above discriminant analysis methods (e.g., linear and mixture), and those are above instance-based classifiers (e.g., k-nearest neighbors and nearest shrunken centroids), as noted in [25]. Furthermore, artificial neural networks are placed above discriminant analysis and instance-based methods, as suggested in [25]. In terms of noise tolerance, instance-based classifiers are easily altered by the exclusion of single noisy data, as mentioned in [8]; this can explain the low positions of these methods observed in the experimental results. Additionally, decision trees obtained good performance in the benchmark (Table 8), which is consistent with the tolerance characteristic detected in [8], which assumes that decision trees trained with noisy data are more tolerant than when the method is trained with filtered data and then test data are corrupted with noise. With respect to support vector machines, the methods used in this benchmark obtained medium to poor performance (see Table 8), which can be explained by the fact that SVMs are easily altered by the exclusion of noisy data, as suggested in [8].
In fact, the above results were computed with raw sensor signals as features in order to minimize the influence of the feature extraction typically done in HAR. Hence, the accuracy of several methods is ranked high. Other factors that influence the high levels of accuracy are the cross-validation process and the selection of the best model based on the latter. In contrast to the sample-based approach, a fourth experiment was conducted using a majority voting across windows-based approach. As noted, the accuracy of the proposed AHN classifier improves (100%), since calculating a majority voting value per window increases the probability of predicting activities correctly, as expected [32]. In addition, other methods can also reach that accuracy in the same way.
To this end, Table 21 summarizes the training time (measured in seconds) that classifiers take to build and train a model and the testing time (measured in milliseconds) that they take to compute a classification of one sample. As shown, the AHN classifier has the longest training times in both the complete (72.61 s) and the reduced (61.53 s) datasets; while it is the third worst classifier in terms of testing times in both the complete (1.71 ms) and the reduced (0.92 ms) datasets.
Finally, from the comparative study of the three experiments run in this benchmark, the majority voting across windows strategy and comparing the results obtained with the literature, it is evident that artificial hydrocarbon networks-based classifiers are: (i) suitable for physical activity monitoring; (ii) noise tolerant of corrupted data sensors; (iii) robust in terms of different issues for data sensors; and (iv) useful for simulated data flow classification; proving that AHN classifiers are suitable in the field of human activity recognition.

7. Conclusions and Future Work

Automated human activity recognition is a challenging task. Sensor-based approaches in particular present several drawbacks, such as the long periods of time for wearing sensors, typical battery issues and the presence of noise in data due to sensor errors or noisy environments. Thus, robust machine learning techniques are required in human activity recognition.
In that sense, this paper presents a novel supervised machine learning method called artificial hydrocarbon networks for human activity recognition. Experimental results over a public physical activity monitoring raw dataset showed that the artificial hydrocarbon networks-based classifier is suitable for human activity recognition when compared to the other fourteen well-known supervised classifiers. In particular, the overall classification performance was measured in terms of accuracy ($\hat{x} = 0.9756$) and micro-averaging $F_1\text{-score}_\mu$ ($\hat{x} = 0.9871$), while robustness was analyzed in terms of the standard deviation of accuracy ($\sigma = 0.0055$) and micro-averaging $F_1\text{-score}_\mu$ ($\sigma = 0.0029$) over three different experiments, concluding that the AHN classifier is robust for different data profiles (complete, reduced and noisy). Experimental results on noisy data also confirm that the AHN classifier is noise tolerant of corrupted raw sensor data (i.e., 7%, 15% and 30% noise levels), achieving 93.43% in accuracy and 96.97% in $F_1\text{-score}_\mu$. Moreover, when using a majority voting across windows-based approach, the AHN classifier is able to provide an accuracy of 100%, validating that it is useful for simulated data flow classification.
For future work, we must address two important challenges in order to prove that our AHN classifier is very well suited for human activity recognition. One important challenge for an activity recognition classifier is to determine if it is sufficiently flexible to deal with inter-person and intra-person differences in the activities’ performance. People can perform the same activity differently if they are in various times and situations (e.g., day or night, energetic or tired, etc.). Similarly, there is great variability in the performance of an activity depending on the person characteristics, such as age, weight, gender, health conditions, etc. [9]. The second challenge is to determine if AHN is capable of finding the most informative and discriminative features with the goal of developing a real-time HAR system to classify as many activities as possible with good performance. To this end, we will also revise the artificial hydrocarbon networks algorithm in order to improve the training time and make it more competitive with respect to the other methods.

Author Contributions

Hiram Ponce, María de Lourdes Martínez-Villaseñor and Luis Miralles-Pechuán conceived of and designed the experiments. Luis Miralles-Pechuán and Hiram Ponce ran the experiments. Hiram Ponce and María de Lourdes Martínez-Villaseñor analyzed the data. Hiram Ponce, María de Lourdes Martínez-Villaseñor and Luis Miralles-Pechuán wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Artificial Hydrocarbon Networks: A Numerical Example

This section shows the training and testing procedures in artificial hydrocarbon networks (AHN) for classification purposes. To this end, a general-purpose numerical example was chosen.

Appendix A.1. Training Step

Consider that there is a dataset of 20 samples with three features and one label, as shown in Table A1. If an artificial hydrocarbon network model is required, then the training process will be as follows: (i) define the training set; (ii) determine configuration parameters; (iii) run Algorithm 1; and (iv) obtain the AHN-model.
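Table A1. Dataset used in the numerical example.
| No. Sample | $x_1$ | $x_2$ | $x_3$ | $y$ | No. Sample | $x_1$ | $x_2$ | $x_3$ | $y$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.3213 | 3.6221 | 5.0002 | 1 | 11 | 6.9039 | 4.9760 | 3.3688 | 2 |
| 2 | 4.2141 | 2.3912 | 6.8321 | 1 | 12 | 7.3675 | 2.3451 | 3.5782 | 2 |
| 3 | 4.3803 | 6.7115 | 6.8618 | 1 | 13 | 7.4727 | 4.6783 | −4.3858 | 2 |
| 4 | 3.6646 | 3.5900 | 6.4806 | 1 | 14 | 7.7866 | 5.0108 | 3.4829 | 2 |
| 5 | 3.2687 | 6.6652 | 6.3268 | 1 | 15 | 6.9858 | 6.8510 | 0.7049 | 2 |
| 6 | 3.6156 | 4.5413 | 6.6895 | 1 | 16 | 6.8830 | 15.7787 | −3.0057 | 3 |
| 7 | 3.3436 | 5.9303 | 6.4620 | 1 | 17 | 7.5271 | 17.8117 | −3.1938 | 3 |
| 8 | 4.4932 | 3.3761 | 5.4943 | 1 | 18 | 7.7992 | 16.9416 | −2.3485 | 3 |
| 9 | 4.8778 | 2.0918 | 5.5466 | 1 | 19 | 6.9773 | 16.6224 | −2.2307 | 3 |
| 10 | 4.9102 | 5.1102 | 5.4166 | 1 | 20 | 7.0529 | 17.0954 | 0.4050 | 3 |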

Appendix A.1.1. Define the Training Set

For this particular example, the training set is defined to be 50% of the original dataset (Table A1), and the remaining 50% will be considered for testing. Random selection is applied. For example, the following samples are part of the training set: {1, 5, 6, 7, 13, 14, 15, 16, 18, 19}.

Appendix A.1.2. Determine Configuration Parameters

Three configuration parameters are required for training an AHN-model: the number of molecules $n$ in the hydrocarbon compound, the learning rate $\eta$ and the tolerance value $\epsilon > 0$. For this case, the selected values are: $n = 3$, $\eta = 0.1$ and $\epsilon = 0.0001$.

Appendix A.1.3. Run AHN-Algorithm

Next, Algorithm 1 is computed. Following the algorithm, the first step is to initialize an empty hydrocarbon compound $AHN = \{\}$. Then, a saturated compound $C$ is created using the number of molecules $n$ and Equation (2), i.e., $C = \mathrm{CH_3{-}CH_2{-}CH_3}$. The latter means that the first molecule has three hydrogen values, the second one has two hydrogen values and the third one has three hydrogen values. Then, three intermolecular distances $r_t$ for $t = 1, \ldots, n$ are randomly created. For this example, Table A2 shows the initial intermolecular distances. Notice that each intermolecular distance is a vector in the feature space, such that $r_t = \{r_1, r_2, r_3\}_t$.
Table A2. Summary of the first update of intermolecular distances.
| $t$ | $r_t$ ($i = 0$) | $L_t$ ($i = 0$) | $r_t$ ($i = 1$) |
|---|---|---|---|
| 1 | (−0.04, 0.33, −0.01) | (3.27, 3.23, 3.56, 3.54) | (0.11, 0.48, 0.14) |
| 2 | (0.36, 0.53, 0.71) | (3.62, 3.98, 4.51, 5.22) | (0.16, 0.33, 0.51) |
| 3 | (0.06, 0.14, 0.15) | (−4.39, −4.32, −4.19, −4.04) | (0.21, 0.29, 0.30) |
Then, a loop runs until a tolerance criterion is met. Inside this loop, the set of bounds $L_t$ is computed using Equation (A1), where $L_0$ is the minimum value of each feature in the training set, i.e., $L_0 = (3.2687, 3.6221, -4.3858)$. Table A2 shows the first iteration of bounds.
$$L_t = L_{t-1} + r_t \quad (A1)$$
Then, these bounds define a subset of samples for each molecule. This subset of samples is used to compute the hydrogen $H_i$ and carbon $\sigma$ values of the specific molecule, using Equation (1) and an optimization process. In this work, the least squares estimates were used, as suggested in [27]. Using these parameters, the compound $C$ is built using Equation (3). A prediction with this compound is made in order to calculate the energy of the molecules. In this example, the mean squared error is employed. It is worth noting that the rounding function was applied to the output of the predicted values. Lastly, the updated values of the intermolecular distances are computed using Equations (4) and (5). For this example, assume that the energy values of the molecules are $E_1 = 1.5$, $E_2 = -0.5$, $E_3 = 1.0$ with a steady state $E_0 = 0.0$; then, the intermolecular distance differences are $\Delta r_1 = 0.15$, $\Delta r_2 = -0.20$, $\Delta r_3 = 0.15$, and the updated intermolecular distances are those summarized in Table A2.
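The numbers of this first iteration can be reproduced with a short sketch. Two points are assumptions made only for this illustration: the bounds are accumulated row by row, starting from the corresponding component of $L_0$, as the rows of Table A2 suggest; and the update $\Delta r_t = \eta (E_t - E_{t-1})$ is inferred from the values quoted in this example, since Equations (4) and (5) are not restated in this appendix. The function names are hypothetical.

```python
def bounds_from_distances(lower, distances):
    """Apply L_t = L_{t-1} + r_t cumulatively, starting from the lower bound L_0."""
    bounds = [lower]
    for r in distances:
        bounds.append(bounds[-1] + r)
    return bounds

def update_distances(distances, energies, eta, steady_state=0.0):
    """Shift every distance in row t by delta_r_t = eta * (E_t - E_{t-1}) (inferred rule)."""
    previous = [steady_state] + energies[:-1]
    deltas = [eta * (e - p) for e, p in zip(energies, previous)]
    updated = [[r + d for r in row] for row, d in zip(distances, deltas)]
    return updated, deltas

# Initial distances (Table A2) and minimum feature values of the training set
r0 = [[-0.04, 0.33, -0.01], [0.36, 0.53, 0.71], [0.06, 0.14, 0.15]]
L0 = [3.2687, 3.6221, -4.3858]

bounds = [bounds_from_distances(low, row) for low, row in zip(L0, r0)]
r1, deltas = update_distances(r0, energies=[1.5, -0.5, 1.0], eta=0.1)
```

Rounded to two decimals, `deltas` and `r1` match the distance differences and the last column of Table A2; small differences in the last digit of some bounds are expected because the tabulated distances are themselves rounded.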
Once the loop stops, the artificial hydrocarbon network $AHN$ is completed with the following information: the set of $CH$-molecules, the hydrogen and carbon values of each molecule and the complete set of bounds, such that $AHN = \{C, \psi(x)\}$. Table A3 summarizes the parameters of the resultant AHN-model.
Table A3. Parameters obtained after training the AHN-model.
| Parameters | Values |
|---|---|
| $H_i$ | (0.0, 0.0, 10.94; 0.0, 0.0, 26.99; 0.0, 0.0, 0.0) |
| | (0.0, 0.0; 0.0, 16.02) |
| | (0.0, 0.0, 10.94; 0.0, 0.0, 26.99; 0.0, 0.0, 0.0) |
| $\sigma$ | (1, 1, 0) |
| $L$ | (3.27, 3.38, 3.86, 3.99) |
| | (3.62, 3.78, 4.11, 4.62) |
| | (−4.39, −4.17, −3.89, −3.59) |
Notice that the order of molecules is defined by the algorithm; but once the AHN-model is trained, the order has to remain constant when testing.

Appendix A.2. Testing Step

Once the AHN-model is trained, the testing step validates the output predictions of the classifier. For this, the testing set is required. Continuing with this example, the testing set is composed of the samples {2, 3, 4, 8, 9, 10, 11, 12, 17, 20}. Then, the functional $\psi(x)$ with the parameters shown in Table A3 is used. The inputs of this function are the features of the testing set. For instance, consider the first sample in the testing set, $x = (4.2141, 2.3912, 6.8321)$. This value is evaluated in $\psi(x)$, which yields $\psi(x) = 1$. As noted, the result is the same as the label. Table A4 shows a comparison between the predicted values using the AHN classifier and the target values. For an extended description of training and testing artificial hydrocarbon networks, see [27,28].
Table A4. Comparison between the predicted values $y_{AHN}$ and the target values $y$.
| No. Sample | $y$ | $y_{AHN}$ |
|---|---|---|
| 2 | 1 | 1 |
| 3 | 1 | 1 |
| 4 | 1 | 1 |
| 8 | 1 | 1 |
| 9 | 1 | 1 |
| 10 | 1 | 1 |
| 11 | 2 | 2 |
| 12 | 2 | 2 |
| 17 | 3 | 3 |
| 20 | 3 | 3 |

References

  1. Avci, A.; Bosch, S.; Marin-Perianu, M.; Marin-Perianu, R.; Havinga, P. Activity recognition using inertial sensing for healthcare, wellbeing and sports applications: A survey. In Proceedings of the 23rd International Conference on Architecture of Computing Systems, Hannover, Germany, 22–25 February 2010; pp. 1–10.
  2. Preece, S.J.; Goulermas, J.Y.; Kenney, L.P.; Howard, D.; Meijer, K.; Crompton, R. Activity identification using body-mounted sensors-a review of classification techniques. Physiol. Meas. 2009, 30, 1–33.
  3. Guidoux, R.; Duclos, M.; Fleury, G.; Lacomme, P.; Lamaudiere, N.; Maneng, P.; Paris, L.; Ren, L.; Rousset, S. A Smartphone-driven methodology for estimating physical activities and energy expenditure in free living conditions. J. Biomed. Inform. 2014, 52, 271–278.
  4. Chen, L.; Hoey, J.; Nugent, C.D.; Cook, D.J.; Yu, Z. Sensor-based activity recognition. IEEE Trans. Syst. Man. Cybern. C Appl. Rev. 2012, 42, 790–808.
  5. Ugulino, W.; Cardador, D.; Vega, K.; Velloso, E.; Milidiú, R.; Fuks, H. Wearable computing: Accelerometers’ data classification of body postures and movements. In Advances in Artificial Intelligence—SBIA 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 52–61.
  6. Reyes, J. Smartphone-Based Human Activity Recognition; Springer Theses; Springer: Cham, Switzerland, 2015.
  7. Bouarfa, L.; Jonker, P.; Dankelman, J. Discovery of high-level tasks in the operating room. J. Biomed. Inform. 2011, 44, 455–462.
  8. Nettleton, D.F.; Orriols-Puig, A.; Fornells, A. A study of the effect of different types of noise on the precision of supervised learning techniques. Artif. Intell. Rev. 2010, 33, 275–306.
  9. Lara, O.D.; Labrador, M.A. A survey on human activity recognition using wearable sensors. IEEE Commun. Surv. Tutor. 2013, 15, 1192–1209.
  10. Roggen, D.; Calatroni, A.; Rossi, M.; Holleczek, T.; Förster, K.; Tröster, G.; Lukowicz, P.; Bannach, D.; Pirkl, G.; Ferscha, A.; et al. Collecting complex activity datasets in highly rich networked sensor environments. In Proceedings of the IEEE Seventh International Conference on Networked Sensing Systems, Kassel, Germany, 15–18 June 2010; pp. 233–240.
  11. Dohnálek, P.; Gajdoš, P.; Moravec, P.; Peterek, T.; Snášel, V. Application and comparison of modified classifiers for human activity recognition. Prz. Elektrotech. 2013, 89, 55–58.
  12. Reiss, A.; Stricker, D. Introducing a new benchmarked dataset for activity monitoring. In Proceedings of the IEEE 16th International Symposium on Wearable Computers (ISWC), Newcastle, UK, 18–22 June 2012; pp. 108–109.
  13. Reiss, A.; Stricker, D. PAMAP2 physical activity monitoring data set. In Dataset from the Department Augmented Vision, DFKI, Saarbrücken, Germany, August 2012.
  14. Gordon, D.; Schmidtke, H.; Beigl, M. Introducing new sensors for activity recognition. In Proceedings of the Workshop on How to Do Good Research in Activity Recognition at the 8th International Conference on Pervasive Computing, Helsinki, Finland, 17 May 2010; pp. 1–4.
  15. Ravi, N.; Dandekar, N.; Mysore, P.; Littman, M.L. Activity recognition from accelerometer data. In Proceedings of the 17th Conference on Innovative Applications of Artificial Intelligence (IAAI), Pittsburgh, PA, USA, 9–13 July 2005; AAAI Press: Pittsburgh, PA, USA, 2005; Volume 3, pp. 1541–1546.
  16. Mannini, A.; Sabatini, A.M. Machine learning methods for classifying human physical activity from on-body accelerometers. Sensors 2010, 10, 1154–1175.
  17. Sagha, H.; Digumarti, S.T.; Millán, J.D.R.; Chavarriaga, R.; Calatroni, A.; Roggen, D.; Tröster, G. Benchmarking classification techniques using the opportunity human activity dataset. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Anchorage, AK, USA, 9–12 October 2011; IEEE: Anchorage, AK, USA, 2011; pp. 36–40.
  18. Shoaib, M.; Bosch, S.; Incel, O.D.; Scholten, H.; Havinga, P.J. A survey of online activity recognition using mobile phones. Sensors 2015, 15, 2059–2085.
  19. Vishwakarma, S.; Agrawal, A. A survey on activity recognition and behavior understanding in video surveillance. Vis. Comput. 2013, 29, 983–1009.
  20. Chavarriaga, R.; Roggen, D.; Ferscha, A. Workshop on Robust Machine Learning Techniques for Human Activity Recognition; IEEE: Anchorage, AK, USA, 2011.
  21. Ponce, H.; Martinez-Villaseñor, L.; Miralles-Pechuan, L. Comparative Analysis of Artificial Hydrocarbon Networks and Data-Driven Approaches for Human Activity Recognition, Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2015; Volume 9454, pp. 150–161.
  22. Lara, O. On the Automatic Recognition of Human Activities Using Heterogeneous Wearable Sensors. Ph.D. Thesis, University of South California, Los Angeles, CA, USA, 2012.
  23. Luštrek, M.; Kaluža, B. Fall detection and activity recognition with machine learning. Informatica 2009, 33, 205–212.
  24. Ross, R.; Kelleher, J. A Comparative Study of the Effect of Sensor Noise on Activity Recognition Models. In Evolving Ambient Intelligence; Springer: Berlin/Heidelberg, Germany, 2013; pp. 151–162.
  25. Chavarriaga, R.; Sagha, H.; Calatroni, A.; Digumarti, S.T.; Tröster, G.; Millán, J.D.R.; Roggen, D. The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition. Pattern Recognit. Lett. 2013, 34, 2033–2042.
  26. Ponce, H.; Ayala-Solares, R. The Power of Natural Inspiration in Control Systems; Studies in Systems, Decision and Control; Springer: Berlin/Heidelberg, Germany, 2016; Volume 40, pp. 1–10.
  27. Ponce, H.; Ponce, P.; Molina, A. Artificial Organic Networks: Artificial Intelligence Based on Carbon Networks; Studies in Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2014; Volume 521.
  28. Ponce, H.; Ponce, P.; Molina, A. The Development of an Artificial Organic Networks Toolkit for LabVIEW. J. Comput. Chem. 2015, 36, 478–492.
  29. Ponce, H.; Ponce, P.; Molina, A. A Novel Robust Liquid Level Controller for Coupled-Tanks Systems Using Artificial Hydrocarbon Networks. Expert Syst. Appl. 2015, 42, 8858–8867.
  30. Ponce, H.; Ponce, P.; Molina, A. Artificial Hydrocarbon Networks Fuzzy Inference System. Math. Probl. Eng. 2013, 2013, 1–13.
  31. Ponce, H.; Ponce, P.; Molina, A. Adaptive Noise Filtering Based on Artificial Hydrocarbon Networks: An Application to Audio Signals. Expert Syst. Appl. 2014, 41, 6512–6523.
  32. Bulling, A.; Blanke, U.; Schiele, B. A tutorial on human activity recognition using body-worn inertial sensors. ACM Comput. Surv. 2014, 46, 1–33.
  33. Attal, F.; Mohammed, S.; Dedabrishvili, M.; Chamroukhi, F.; Oukhellou, L.; Amirat, Y. Physical Human Activity Recognition Using Wearable Sensors. Sensors 2015, 15, 31314–31338.
  34. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 2002, 46, 389–422.
  35. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.
  36. Zhu, X.; Wu, X. Class Noise vs. Attribute Noise: A Quantitative Study. Artif. Intell. Rev. 2004, 22, 177–210.
  37. Hall, M.; Frank, E.; Holmes, G.; Pfahringer, B.; Reutemann, P.; Witten, I.H. The WEKA Data Mining Software: An Update. SIGKDD Explor. 2009, 11, 10–18.
  38. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437.
Figure 1. Structure of an artificial hydrocarbon network using saturated and linear chains of molecules [29]. For this work, the topology of the proposed classifier considers just one hydrocarbon compound (see Section 4).
Figure 2. Diagram of the proposed artificial hydrocarbon network-based classifier (AHN classifier). First, data from sensors and activity labeling are used for training the AHN-model, then it is used as the AHN classifier in the testing step.
Figure 3. Location of the three wearable sensors used in the dataset (hand, chest and ankle), adapted from [21].
Figure 4. Average accuracy of models with respect to the number of features selected using RFE.
Figure 5. Methodology of experimentation.
Figure 6. Learning curve of the AHN classifier for the windowing-based approach.
Figure 7. Overall variability of supervised classifiers present in the experiments with respect to the accuracy metric.
Figure 8. Overall variability of supervised classifiers present in the experiments with respect to the $F_1$-score$_\mu$ metric.
Table 1. Framework of artificial organic networks.

| Framework Level | Description |
|---|---|
| implementation | training and inference |
| mathematical model | structure and functionality |
| chemical heuristic rules | first molecules, then compounds, lastly mixtures |
| interactions | bonds in atoms and molecules, relations in compounds |
| types of components | atoms, molecules, compounds, mixtures |
Table 2. Physical activities identified in this case study, adapted from [13].

| No. | Performed Activities | Activity Description |
|---|---|---|
| 1 | Lying | This movement is lying flat, slightly changing position or stretching a little bit. |
| 2 | Sitting | Refers to sitting in a chair in any posture. It also includes more comfortable positions as leaning or crossing your legs. |
| 3 | Standing | This position includes the natural movements of a person who is standing, swaying slightly, gesturing or talking. |
| 4 | Walking | This activity is a stroll down the street at a moderate speed of approximately 5 km/h. |
| 5 | Running | The people who made this activity ran at a moderate speed; taking into account non-high level athletes. |
| 6 | Cycling | A bicycle was used for this movement, and people pedaled as on a quiet ride. An activity requiring great effort was not requested. |
| 7 | Nordic walking | For this activity, it was required that persons that were inexperienced walked on asphalt using pads. |
| 8 | Watching TV | This position includes the typical movements of someone who is watching TV and changes the channel, lying on one side or stretching his or her legs. |
| 9 | Computer work | The typical movements of someone who works with a computer: mouse movement, movement of neck, etc. |
| 10 | Car driving | All movements necessary to move from the office to the house for testing sensors. |
| 11 | Ascending stairs | During this activity, the necessary movements up to a distance of five floors were recorded; from the ground floor to the fifth floor. |
| 12 | Descending stairs | This movement is the opposite of the former. Instead of climbing the stairs, the activity of descending them was recorded. |
| 13 | Vacuum cleaning | Refers to all of the activities necessary to clean a floor of the office. It also includes moving objects, such as rugs, chairs and wardrobes. |
| 14 | Ironing | It covers the necessary movements to iron a shirt or a t-shirt. |
| 15 | Folding laundry | It consists of folding clothes, such as shirts, pants and socks. |
| 16 | House cleaning | These are the movements that a person makes while cleaning a house; such as moving chairs to clean the floor, throwing things away, bending over to pick up something, etc. |
| 17 | Playing soccer | In this activity, individuals are negotiating, running the ball, shooting a goal or trying to stop the ball from the goal. |
| 18 | Rope jumping | There are people who prefer to jump with both feet together, and there are others who prefer to move one foot first and then the other. |
Table 3. Configuration parameters of supervised models employed with the caret package in R.

| No. | Method Name | Parameters | Case 1 & Case 3 | Case 2 | Configurations |
|---|---|---|---|---|---|
| 1 | AdaBoost | size, decay, bag | (150, 3, 3) | (150, 3, 3) | 27 |
| 2 | Artificial Hydrocarbon Networks | molecules, eta, epsilon | (18, 0.1, 0.0001) | (18, 0.1, 0.0001) | 1 |
| 3 | C4.5 Decision Trees | c | (0.25) | (0.25) | 1 |
| 4 | k-Nearest Neighbors | kmax, distance, kernel | (9, 2, 1) | (9, 2, 1) | 3 |
| 5 | Linear Discriminant Analysis | | | | 1 |
| 6 | Mixtures Discriminant Analysis | subclasses | (4) | (4) | 3 |
| 7 | Multivariate Adaptive Regression Splines | degree | (1) | (1) | 1 |
| 8 | Naive Bayes | fl, use_kernel | (0, true) | (0, true) | 2 |
| 9 | Nearest Shrunken Centroids | threshold | (2.512) | (1.363) | 3 |
| 10 | Artificial Neural Networks | size, decay | (5, 0) | (5, 0) | 9 |
| 11 | Random Forest | mtry | (26) | (6) | 3 |
| 12 | Rule-Based Classifier | threshold, pruned | (0.25, true) | (0.25, true) | 1 |
| 13 | Stochastic Gradient Boosting | n.trees, depth, shrinkage | (150, 3, 0.1) | (150, 2, 0.1) | 9 |
| 14 | SVM with Linear Kernel | c | (1) | (1) | 1 |
| 15 | SVM with Radial Basis Function Kernel | sigma, c | (0.0179, 1) | (0.0748, 1) | 3 |
Table 4. Retained features using the recursive feature elimination (RFE) technique.

| Sensor | Features Selected | Feature Number |
|---|---|---|
| hand | temperature | 4 |
| chest | temperature | 21 |
| chest | z-axis 3D-accelerometer 16 g | 24 |
| chest | z-axis 3D-accelerometer 6 g | 27 |
| chest | y-axis 3D-magnetometer | 32 |
| chest | z-axis 3D-magnetometer | 33 |
| ankle | temperature | 38 |
| ankle | x-axis 3D-magnetometer | 48 |
| ankle | z-axis 3D-magnetometer | 50 |
| ankle | first orientation | 51 |
Table 5. Summary of activities performed by each subject for at least 30 s.
ActivitiesSub 1Sub 2Sub 3Sub 4Sub 5Sub 6Sub 7Sub 8Sub 9Total
Lying--------8
Sitting--------8
Standing--------8
Walking--------8
Running-----5
Cycling-------7
Nordic walking-------7
Watching TV-1
Computer work----4
Car driving-1
Ascending stairs--------8
Descending stairs--------8
Vacuum cleaning--------8
Ironing--------8
Folding laundry----4
House cleaning-----5
Playing soccer--2
Rope jumping-----5
Table 6. Comparison of the supervised classifiers using the complete dataset of the case study.

| No. | Method Name | Accuracy | Sensitivity$_\mu$ | Precision$_\mu$ | $F_1$-Score$_\mu$ | Sensitivity$_M$ | Precision$_M$ | $F_1$-Score$_M$ |
|---|---|---|---|---|---|---|---|---|
| 1 | C4.5 Decision Trees | 0.9880 | 0.9994 | 0.9893 | 0.9943 | 0.9886 | 0.9885 | 0.9938 |
| 2 | Rule-Based Classifier | 0.9876 | 0.9993 | 0.9890 | 0.9942 | 0.9883 | 0.9882 | 0.9937 |
| 3 | Artificial Hydrocarbon Networks | 0.9829 | 0.9829 | 0.9831 | 0.9910 | 0.9830 | 0.9832 | 0.9910 |
| 4 | SVM with Linear Kernel | 0.9827 | 0.9989 | 0.9828 | 0.9908 | 0.9833 | 0.9834 | 0.9911 |
| 5 | SVM with Radial Basis Function Kernel | 0.9745 | 0.9985 | 0.9752 | 0.9867 | 0.9746 | 0.9747 | 0.9865 |
| 6 | Random Forest | 0.9727 | 0.9986 | 0.9819 | 0.9902 | 0.9743 | 0.9797 | 0.9890 |
| 7 | Stochastic Gradient Boosting | 0.9725 | 0.9725 | 0.9805 | 0.9893 | 0.9740 | 0.9815 | 0.9899 |
| 8 | k-Nearest Neighbors | 0.9718 | 0.9984 | 0.9724 | 0.9852 | 0.9711 | 0.9717 | 0.9849 |
| 9 | Mixture Discriminant Analysis | 0.9714 | 0.9982 | 0.9717 | 0.9848 | 0.9724 | 0.9729 | 0.9854 |
| 10 | AdaBoost | 0.9710 | 0.9982 | 0.9774 | 0.9877 | 0.9726 | 0.9780 | 0.9880 |
| 11 | Multivariate Adaptive Regression Splines | 0.9553 | 0.9974 | 0.9621 | 0.9794 | 0.9572 | 0.9609 | 0.9788 |
| 12 | Linear Discriminant Analysis | 0.9382 | 0.9962 | 0.9398 | 0.9672 | 0.9399 | 0.9418 | 0.9683 |
| 13 | Naive Bayes | 0.9327 | 0.9962 | 0.9459 | 0.9704 | 0.9350 | 0.9438 | 0.9692 |
| 14 | Artificial Neural Networks | 0.8976 | 0.9939 | 0.9019 | 0.9457 | 0.8984 | 0.9020 | 0.9458 |
| 15 | Nearest Shrunken Centroids | 0.7031 | 0.9820 | 0.7006 | 0.8178 | 0.7073 | 0.7063 | 0.8218 |
| | Average | 0.9468 | 0.9941 | 0.9502 | 0.9716 | 0.9480 | 0.9504 | 0.9718 |
Table 7. Comparison of the supervised classifiers using the reduced dataset of the case study.

| No. | Method Name | Accuracy | Sensitivity$_\mu$ | Precision$_\mu$ | $F_1$-Score$_\mu$ | Sensitivity$_M$ | Precision$_M$ | $F_1$-Score$_M$ |
|---|---|---|---|---|---|---|---|---|
| 1 | Stochastic Gradient Boosting | 0.9898 | 0.9898 | 0.9907 | 0.9950 | 0.9900 | 0.9900 | 0.9947 |
| 2 | Artificial Hydrocarbon Networks | 0.9741 | 0.9741 | 0.9746 | 0.9864 | 0.9744 | 0.9741 | 0.9861 |
| 3 | AdaBoost | 0.9657 | 0.9979 | 0.9674 | 0.9824 | 0.9675 | 0.9686 | 0.9831 |
| 4 | Random Forest | 0.9655 | 0.9982 | 0.9768 | 0.9874 | 0.9673 | 0.9744 | 0.9861 |
| 5 | Rule-Based Classifier | 0.9604 | 0.9979 | 0.9716 | 0.9846 | 0.9617 | 0.9693 | 0.9833 |
| 6 | C4.5 Decision Trees | 0.9571 | 0.9976 | 0.9647 | 0.9809 | 0.9586 | 0.9629 | 0.9799 |
| 7 | k-Nearest Neighbors | 0.7222 | 0.9834 | 0.7217 | 0.8325 | 0.7291 | 0.7255 | 0.8351 |
| 8 | Multivariate Adaptive Regression Splines | 0.7061 | 0.9825 | 0.7357 | 0.8413 | 0.7159 | 0.7392 | 0.8437 |
| 9 | SVM with Radial Basis Function Kernel | 0.6549 | 0.9791 | 0.6526 | 0.7832 | 0.6619 | 0.6601 | 0.7887 |
| 10 | Naive Bayes | 0.6069 | 0.9758 | 0.6506 | 0.7807 | 0.6125 | 0.6615 | 0.7888 |
| 11 | Mixture Discriminant Analysis | 0.5853 | 0.9750 | 0.5684 | 0.7181 | 0.5929 | 0.5746 | 0.7232 |
| 12 | SVM with Linear Kernel | 0.5386 | 0.9720 | 0.5196 | 0.6772 | 0.5487 | 0.5296 | 0.6858 |
| 13 | Artificial Neural Networks | 0.5092 | 0.9703 | 0.4631 | 0.6270 | 0.5190 | 0.4716 | 0.6349 |
| 14 | Linear Discriminant Analysis | 0.4792 | 0.9683 | 0.4676 | 0.6307 | 0.4853 | 0.4773 | 0.6396 |
| 15 | Nearest Shrunken Centroids | 0.4216 | 0.9645 | 0.4131 | 0.5784 | 0.4191 | 0.4209 | 0.5863 |
| | Average | 0.7358 | 0.9818 | 0.7359 | 0.8257 | 0.7403 | 0.7400 | 0.8293 |
Table 8. Comparison of the supervised classifiers using the 7% noisy dataset of the case study.

| No. | Method Name | Accuracy | Sensitivity$_\mu$ | Precision$_\mu$ | $F_1$-Score$_\mu$ | Sensitivity$_M$ | Precision$_M$ | $F_1$-Score$_M$ |
|---|---|---|---|---|---|---|---|---|
| 1 | Random Forest | 0.9825 | 0.9990 | 0.9830 | 0.9909 | 0.9833 | 0.9830 | 0.9909 |
| 2 | Stochastic Gradient Boosting | 0.9722 | 0.9722 | 0.9754 | 0.9868 | 0.9733 | 0.9747 | 0.9864 |
| 3 | Artificial Hydrocarbon Networks | 0.9696 | 0.9696 | 0.9700 | 0.9839 | 0.9691 | 0.9696 | 0.9837 |
| 4 | Rule-Based Classifier | 0.9484 | 0.9970 | 0.9532 | 0.9747 | 0.9497 | 0.9521 | 0.9740 |
| 5 | C4.5 Decision Trees | 0.9459 | 0.9969 | 0.9519 | 0.9739 | 0.9478 | 0.9507 | 0.9732 |
| 6 | SVM with Radial Basis Function Kernel | 0.9276 | 0.9957 | 0.9281 | 0.9607 | 0.9278 | 0.9285 | 0.9609 |
| 7 | AdaBoost | 0.9151 | 0.9949 | 0.9242 | 0.9582 | 0.9181 | 0.9256 | 0.9590 |
| 8 | k-Nearest Neighbors | 0.9059 | 0.9944 | 0.9066 | 0.9484 | 0.9060 | 0.9075 | 0.9490 |
| 9 | Mixture Discriminant Analysis | 0.8816 | 0.9928 | 0.8830 | 0.9347 | 0.8831 | 0.8856 | 0.9362 |
| 10 | SVM with Linear Kernel | 0.8796 | 0.9928 | 0.8786 | 0.9322 | 0.8800 | 0.8792 | 0.9326 |
| 11 | Multivariate Adaptive Regression Splines | 0.8755 | 0.9924 | 0.9055 | 0.9469 | 0.8759 | 0.9083 | 0.9486 |
| 12 | Linear Discriminant Analysis | 0.8339 | 0.9899 | 0.8367 | 0.9069 | 0.8353 | 0.8398 | 0.9088 |
| 13 | Artificial Neural Networks | 0.7820 | 0.9870 | 0.7764 | 0.8691 | 0.7842 | 0.7782 | 0.8703 |
| 14 | Naive Bayes | 0.7769 | 0.9868 | 0.7957 | 0.8810 | 0.7811 | 0.7964 | 0.8815 |
| 15 | Nearest Shrunken Centroids | 0.6202 | 0.9771 | 0.6076 | 0.7493 | 0.6235 | 0.6125 | 0.7531 |
| | Average | 0.8811 | 0.9892 | 0.8851 | 0.9332 | 0.8825 | 0.8861 | 0.9339 |
Table 9. Comparison of the supervised classifiers using the 15% noisy dataset of the case study.

| No. | Method Name | Accuracy | Sensitivity$_\mu$ | Precision$_\mu$ | $F_1$-Score$_\mu$ | Sensitivity$_M$ | Precision$_M$ | $F_1$-Score$_M$ |
|---|---|---|---|---|---|---|---|---|
| 1 | Artificial Hydrocarbon Networks | 0.9547 | 0.9547 | 0.9596 | 0.9781 | 0.9546 | 0.9592 | 0.9779 |
| 2 | k-Nearest Neighbors | 0.9425 | 0.9425 | 0.9433 | 0.9692 | 0.9418 | 0.9433 | 0.9692 |
| 3 | SVM with Radial Basis Function Kernel | 0.9249 | 0.9249 | 0.9308 | 0.9621 | 0.9238 | 0.9314 | 0.9624 |
| 4 | Naive Bayes | 0.9182 | 0.9182 | 0.9343 | 0.9638 | 0.9209 | 0.9328 | 0.9630 |
| 5 | Stochastic Gradient Boosting | 0.8402 | 0.8402 | 0.8729 | 0.9278 | 0.8387 | 0.8767 | 0.9302 |
| 6 | SVM with Linear Kernel | 0.8400 | 0.8400 | 0.8550 | 0.9177 | 0.8394 | 0.8579 | 0.9195 |
| 7 | Mixture Discriminant Analysis | 0.7975 | 0.7975 | 0.8113 | 0.8909 | 0.7986 | 0.8148 | 0.8931 |
| 8 | AdaBoost | 0.7755 | 0.7755 | 0.8209 | 0.8960 | 0.7724 | 0.8252 | 0.8988 |
| 9 | Random Forest | 0.7498 | 0.7498 | 0.8263 | 0.8986 | 0.7477 | 0.8313 | 0.9017 |
| 10 | Linear Discriminant Analysis | 0.7478 | 0.7478 | 0.7668 | 0.8622 | 0.7489 | 0.7710 | 0.8650 |
| 11 | Rule-Based Classifier | 0.7414 | 0.7414 | 0.8039 | 0.8851 | 0.7370 | 0.8043 | 0.8855 |
| 12 | C4.5 Decision Trees | 0.7335 | 0.7335 | 0.7984 | 0.8815 | 0.7290 | 0.8014 | 0.8835 |
| 13 | Multivariate Adaptive Regression Splines | 0.6824 | 0.6824 | 0.7610 | 0.8569 | 0.6791 | 0.7684 | 0.8619 |
| 14 | Nearest Shrunken Centroids | 0.6761 | 0.6761 | 0.6820 | 0.8044 | 0.6809 | 0.6884 | 0.8090 |
| 15 | Artificial Neural Networks | 0.5214 | 0.5214 | 0.5339 | 0.6892 | 0.5253 | 0.5307 | 0.6865 |
| | Average | 0.7897 | 0.7897 | 0.8200 | 0.8922 | 0.7892 | 0.8225 | 0.8938 |
Table 10. Comparison of the supervised classifiers using the 30% noisy dataset of the case study.

| No. | Method Name | Accuracy | Sensitivity$_\mu$ | Precision$_\mu$ | $F_1$-Score$_\mu$ | Sensitivity$_M$ | Precision$_M$ | $F_1$-Score$_M$ |
|---|---|---|---|---|---|---|---|---|
| 1 | Naive Bayes | 0.8798 | 0.8798 | 0.9090 | 0.9491 | 0.8816 | 0.9087 | 0.9489 |
| 2 | Artificial Hydrocarbon Networks | 0.8786 | 0.8786 | 0.9052 | 0.9470 | 0.8793 | 0.9044 | 0.9466 |
| 3 | k-Nearest Neighbors | 0.8608 | 0.8608 | 0.8681 | 0.9257 | 0.8604 | 0.8708 | 0.9273 |
| 4 | SVM with Radial Basis Function Kernel | 0.8043 | 0.8043 | 0.8349 | 0.9051 | 0.8024 | 0.8374 | 0.9067 |
| 5 | Stochastic Gradient Boosting | 0.6967 | 0.6967 | 0.7822 | 0.8705 | 0.6950 | 0.7905 | 0.8759 |
| 6 | SVM with Linear Kernel | 0.6747 | 0.6747 | 0.7365 | 0.8410 | 0.6750 | 0.7419 | 0.8448 |
| 7 | Nearest Shrunken Centroids | 0.6302 | 0.6302 | 0.6605 | 0.7883 | 0.6354 | 0.6675 | 0.7935 |
| 8 | AdaBoost | 0.6302 | 0.6302 | 0.7315 | 0.8367 | 0.6249 | 0.7392 | 0.8421 |
| 9 | Mixture Discriminant Analysis | 0.6098 | 0.6098 | 0.6525 | 0.7823 | 0.6116 | 0.6579 | 0.7863 |
| 10 | Random Forest | 0.5514 | 0.5514 | 0.7378 | 0.8390 | 0.5474 | 0.7463 | 0.8449 |
| 11 | Rule-Based Classifier | 0.5496 | 0.5496 | 0.6913 | 0.8082 | 0.5446 | 0.6926 | 0.8093 |
| 12 | Linear Discriminant Analysis | 0.5449 | 0.5449 | 0.5988 | 0.7412 | 0.5483 | 0.6050 | 0.7461 |
| 13 | C4.5 Decision Trees | 0.5308 | 0.5308 | 0.6693 | 0.7925 | 0.5255 | 0.6760 | 0.7976 |
| 14 | Multivariate Adaptive Regression Splines | 0.4688 | 0.4688 | 0.6439 | 0.7731 | 0.4634 | 0.6548 | 0.7814 |
| 15 | Artificial Neural Networks | 0.4559 | 0.4559 | 0.5271 | 0.6824 | 0.4563 | 0.5295 | 0.6845 |
| | Average | 0.6511 | 0.6511 | 0.7299 | 0.8321 | 0.6501 | 0.7348 | 0.8357 |
Table 11. Comparison of the supervised classifiers using the first 30 s of each activity carried out by all subjects.

| No. | Method Name | Accuracy | Sensitivity$_\mu$ | Precision$_\mu$ | $F_1$-Score$_\mu$ | Sensitivity$_M$ | Precision$_M$ | $F_1$-Score$_M$ |
|---|---|---|---|---|---|---|---|---|
| 1 | Stochastic Gradient Boosting | 0.9952 | 0.9952 | 0.9996 | 0.9974 | 0.9963 | 0.9962 | 0.9980 |
| 2 | Random Forest | 0.9951 | 0.9951 | 0.9996 | 0.9974 | 0.9963 | 0.9959 | 0.9978 |
| 3 | Rule-Based Classifier | 0.9866 | 0.9866 | 0.9990 | 0.9928 | 0.9894 | 0.9857 | 0.9924 |
| 4 | Artificial Hydrocarbon Networks | 0.9845 | 0.9845 | 0.9991 | 0.9919 | 0.9844 | 0.9752 | 0.9870 |
| 5 | SVM with Radial Basis Function Kernel | 0.9635 | 0.9635 | 0.9973 | 0.9804 | 0.9698 | 0.9662 | 0.9818 |
| 6 | C4.5 Decision Trees | 0.9630 | 0.9630 | 0.9973 | 0.9797 | 0.9709 | 0.9652 | 0.9812 |
| 7 | Multivariate Adaptive Regression Splines | 0.9574 | 0.9574 | 0.9966 | 0.9778 | 0.9669 | 0.9673 | 0.9822 |
| 8 | AdaBoost | 0.9474 | 0.9474 | 0.9959 | 0.9736 | 0.9498 | 0.9621 | 0.9792 |
| 9 | SVM with Linear Kernel | 0.9441 | 0.9441 | 0.9958 | 0.9690 | 0.9550 | 0.9508 | 0.9732 |
| 10 | k-Nearest Neighbors | 0.9141 | 0.9140 | 0.9942 | 0.9530 | 0.9183 | 0.9122 | 0.9518 |
| 11 | Naive Bayes | 0.9093 | 0.9093 | 0.9939 | 0.9518 | 0.9208 | 0.9150 | 0.9532 |
| 12 | Artificial Neural Networks | 0.8879 | 0.8879 | 0.9923 | 0.9388 | 0.9032 | 0.8909 | 0.9393 |
| 13 | Mixture Discriminant Analysis | 0.8868 | 0.8868 | 0.9922 | 0.9379 | 0.9026 | 0.8914 | 0.9396 |
| 14 | Linear Discriminant Analysis | 0.7748 | 0.7748 | 0.9847 | 0.8700 | 0.7941 | 0.7797 | 0.8711 |
| 15 | Nearest Shrunken Centroids | 0.6199 | 0.6199 | 0.9727 | 0.7350 | 0.6285 | 0.5972 | 0.7414 |
| | Average | 0.9153 | 0.9153 | 0.9940 | 0.9498 | 0.9231 | 0.9167 | 0.9513 |
Table 12. Confusion matrix of the AHN classifier using the first 30 s of each activity carried out by all subjects.
Actual Values
LyingSittingStandingWalkingRunningCyclingNordic walkingWatching TVComputer WorkCar DrivingAscending StairsDescending StairsVacuum CleaningIroningFolding laundryHouse CleaningPlaying SoccerRope Jumping
Lying239851912230713306010000030
Sitting132356762281100004000001
Standing02023542301051100014000000
Walking2109482356719163800078000000
Running01334770147601478000711100010
Cycling04775851920616660001021100011
Nordic walking0741291343212020676065343411420731
Watching TV04693126733529613510272917201911
Computer work04181721512133117907373131931820
Car driving0712622371403729574950412012742
Ascending stairs016110211903642363855664291162
Predicted valuesDescending stairs0023101010017164235697272123388
Vacuum cleaning001731712130202901032368419070903040
Ironing0726040000071630235923434820
Folding laundry0001100000620244111820271515
House cleaning000030000028141922146901114
Playing soccer0000400000612592832588369
Rope jumping08284175301068021122414826
Table 13. Comparison of the supervised classifiers using a majority voting across windows-based approach (2.5-s window size).

| No. | Method Name | Accuracy | Sensitivity$_\mu$ | Precision$_\mu$ | $F_1$-Score$_\mu$ | Sensitivity$_M$ | Precision$_M$ | $F_1$-Score$_M$ |
|---|---|---|---|---|---|---|---|---|
| 1 | Artificial Hydrocarbon Networks | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 2 | Rule-Based Classifier | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 3 | C4.5 Decision Trees | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 4 | Random Forest | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 5 | Stochastic Gradient Boosting | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 6 | k-Nearest Neighbors | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 7 | SVM with Radial Basis Function Kernel | 0.9841 | 0.9841 | 0.9846 | 0.9916 | 0.9873 | 0.9886 | 0.9938 |
| 8 | SVM with Linear Kernel | 0.9786 | 0.9786 | 0.9791 | 0.9887 | 0.9843 | 0.9821 | 0.9904 |
| 9 | Multivariate Adaptive Regression Splines | 0.9770 | 0.9770 | 0.9785 | 0.9882 | 0.9832 | 0.9844 | 0.9914 |
| 10 | AdaBoost | 0.9627 | 0.9627 | 0.9651 | 0.9808 | 0.9681 | 0.9732 | 0.9853 |
| 11 | Naive Bayes | 0.9571 | 0.9571 | 0.9604 | 0.9785 | 0.9651 | 0.9603 | 0.9785 |
| 12 | Artificial Neural Networks | 0.9460 | 0.9460 | 0.9478 | 0.9714 | 0.9570 | 0.9481 | 0.9718 |
| 13 | Mixture Discriminant Analysis | 0.9365 | 0.9365 | 0.9412 | 0.9677 | 0.9500 | 0.9421 | 0.9684 |
| 14 | Linear Discriminant Analysis | 0.8444 | 0.8444 | 0.8508 | 0.9148 | 0.8583 | 0.8528 | 0.9166 |
| 15 | Nearest Shrunken Centroids | 0.6857 | 0.6857 | 0.6593 | 0.7872 | 0.7018 | 0.6773 | 0.8014 |
| | Average | 0.9515 | 0.9515 | 0.9511 | 0.9713 | 0.9570 | 0.9539 | 0.9732 |
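Table 13 reports results for a majority voting across windows-based approach. One plausible reading of that idea is sketched below: per-sample predictions are grouped into consecutive windows and each window is assigned its most frequent label. The function names, the window length in samples and the example labels are all hypothetical; the sketch is not taken from the authors' implementation.

```python
from collections import Counter


def majority_vote(window_predictions):
    """Return the most frequent label in one window (ties go to the label seen first)."""
    return Counter(window_predictions).most_common(1)[0][0]


def vote_across_windows(predictions, window_size):
    """Split per-sample predictions into consecutive windows and vote within each one."""
    return [
        majority_vote(predictions[start:start + window_size])
        for start in range(0, len(predictions), window_size)
    ]


# Hypothetical per-sample predictions using the activity numbers of Table 2
# (4 = walking, 5 = running, 11 = ascending stairs, 12 = descending stairs).
sample_predictions = [4, 4, 5, 4, 4, 11, 12, 11, 11, 11]
window_labels = vote_across_windows(sample_predictions, window_size=5)
print(window_labels)
```

In this toy input each window contains one isolated misprediction, and the vote absorbs it. That smoothing effect is one possible explanation for why several classifiers reach 1.0 in Table 13.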
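Table 14. Confusion matrix of the artificial hydrocarbon networks (AHN) classifier using a majority voting across windows-based approach (2.5-s window size). Rows are predicted values; columns are actual values.

| Predicted \ Actual | Lying | Sitting | Standing | Walking | Running | Cycling | Nordic Walking | Watching TV | Computer Work | Car Driving | Ascending Stairs | Descending Stairs | Vacuum Cleaning | Ironing | Folding Laundry | House Cleaning | Playing Soccer | Rope Jumping |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lying | 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Sitting | 0 | 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Standing | 0 | 0 | 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Walking | 0 | 0 | 0 | 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Running | 0 | 0 | 0 | 0 | 60 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Cycling | 0 | 0 | 0 | 0 | 0 | 84 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Nordic walking | 0 | 0 | 0 | 0 | 0 | 0 | 84 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Watching TV | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Computer work | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 48 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Car driving | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Ascending stairs | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Descending stairs | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 96 | 0 | 0 | 0 | 0 | 0 | 0 |
| Vacuum cleaning | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 96 | 0 | 0 | 0 | 0 | 0 |
| Ironing | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 96 | 0 | 0 | 0 | 0 |
| Folding laundry | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 48 | 0 | 0 | 0 |
| House cleaning | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60 | 0 | 0 |
| Playing soccer | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 24 | 0 |
| Rope jumping | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60 |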
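For readers who want to relate a confusion matrix such as Table 14 to the summary tables, the sketch below derives accuracy and the macro-averaged sensitivity and precision from a matrix whose rows are predicted values and whose columns are actual values. It uses the standard textbook definitions; the function name `confusion_metrics` and the three-class toy matrix are hypothetical, and the sketch does not attempt to reproduce the $F_1$ values reported in the tables, whose exact formula is not restated in this section.

```python
def confusion_metrics(matrix):
    """Summarize a square confusion matrix (rows = predicted, columns = actual)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    sensitivity, precision = [], []
    for i in range(n):
        actual_i = sum(matrix[r][i] for r in range(n))  # column sum: samples truly in class i
        predicted_i = sum(matrix[i])                    # row sum: samples predicted as class i
        sensitivity.append(matrix[i][i] / actual_i if actual_i else 0.0)
        precision.append(matrix[i][i] / predicted_i if predicted_i else 0.0)
    return {
        "accuracy": correct / total,
        "macro_sensitivity": sum(sensitivity) / n,
        "macro_precision": sum(precision) / n,
    }


# Diagonal of Table 14 (all off-diagonal entries are zero).
table_14_diagonal = [96, 96, 96, 96, 60, 84, 84, 12, 48, 12, 96, 96, 96, 96, 48, 60, 24, 60]
table_14 = [
    [count if r == c else 0 for c in range(len(table_14_diagonal))]
    for r, count in enumerate(table_14_diagonal)
]
perfect = confusion_metrics(table_14)

# Hypothetical three-class matrix with some off-diagonal errors, for contrast.
toy = confusion_metrics([[8, 1, 0], [2, 9, 1], [0, 0, 9]])
print(perfect["accuracy"], toy)
```

With a purely diagonal matrix every metric equals 1.0, which agrees with the AHN row of Table 13. For single-label multiclass problems the micro-averaged sensitivity equals the accuracy, which is consistent with the matching Accuracy and Sensitivity$_\mu$ columns in Tables 9 to 13.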
Table 15. Confusion matrix of the AHN classifier using the 7% noisy dataset.
Actual Values
LyingSittingStandingWalkingRunningCyclingNordic WalkingWatching TVComputer WorkCar DrivingAscending StairsDescending StairsVacuum CleaningIroningFolding LaundryHouse CleaningPlaying SoccerRope Jumping
Lying29730100000000000010
Sitting02841020000000000000
Standing03291110000000000000
Walking13029220110000010000
Running03102911100000100000
Cycling01021294110100100000
Nordic walking02332429133151100000
Watching TV00000142414201000100
Computer work01211001241312001100
Car driving00000012224001101000
Ascending stairs00000010012911021000
Predicted valuesDescending stairs00200000011292111110
Vacuum cleaning10000000012129321320
Ironing00000000000002930011
Folding laundry00000000000010241200
House cleaning00000000000101223941
Playing soccer10000000000010132393
Rope jumping00000001000000102295
Table 16. Confusion matrix of the AHN classifier using the 15% noisy dataset.
Actual Values
LyingSittingStandingWalkingRunningCyclingNordic WalkingWatching TVComputer WorkCar DrivingAscending StairsDescending StairsVacuum CleaningIroningFolding LaundryHouse CleaningPlaying SoccerRope Jumping
Lying28800000000000000010
Sitting02860000000000000000
Standing00283000000000000000
Walking00028500000000000000
Running00002841000000000100
Cycling00211285180012020200
Nordic walking61249672946316115122500
Watching TV00112302366020203101
Computer work01312020238411013100
Car driving40230230224230212200
Ascending stairs11303100112873202200
Predicted valuesDescending stairs00202100020283002100
Vacuum cleaning10000000000028900010
Ironing00000000000002840011
Folding laundry00000000000000236010
House cleaning00000000000000023500
Playing soccer00000000000000002415
Rope jumping00000000000000005293
Table 17. Confusion matrix of the AHN classifier using the 30% noisy dataset.
Actual Values
LyingSittingStandingWalkingRunningCyclingNordic WalkingWatching TVComputer WorkCar DrivingAscending StairsDescending StairsVacuum CleaningIroningFolding LaundryHouse CleaningPlaying SoccerRope Jumping
Lying26300000000000000020
Sitting02680000000000000000
Standing00244000000000000000
Walking00025300000000000000
Running02002540002001001000
Cycling34111253291320153010
Nordic walking17222026191627820631617193181010
Watching TV101315582212022542300
Computer work10422550221432905300
Car driving723105630923146429211
Ascending stairs5113591040492695715610
Predicted valuesDescending stairs21214300301265104110
Vacuum cleaning10011200203225411071
Ironing00000000000002560000
Folding laundry00000000000000212012
House cleaning00000000000000022521
Playing soccer00000000000000002256
Rope jumping00000000000000008289
Table 18. Overall performance of supervised classifiers during the experiments with respect to the $\hat{x}$ accuracy metric.

| No. | Method Name | Complete Dataset | 7% Noisy Dataset | Reduced Dataset | $\hat{x}$ Accuracy | $\sigma$ Accuracy |
|---|---|---|---|---|---|---|
| 1 | Stochastic Gradient Boosting | 0.9725 | 0.9722 | 0.9898 | 0.9782 | 0.0082 |
| 2 | Artificial Hydrocarbon Networks | 0.9829 | 0.9696 | 0.9741 | 0.9756 | 0.0055 |
| 3 | Random Forest | 0.9727 | 0.9825 | 0.9655 | 0.9736 | 0.0070 |
| 4 | Rule-Based Classifier | 0.9876 | 0.9484 | 0.9604 | 0.9655 | 0.0164 |
| 5 | C4.5 Decision Trees | 0.9880 | 0.9459 | 0.9571 | 0.9637 | 0.0178 |
| 6 | Artificial Neural Networks | 0.8976 | 0.9151 | 0.9657 | 0.9261 | 0.0289 |
| 7 | k-Nearest Neighbors | 0.9718 | 0.9059 | 0.7222 | 0.8666 | 0.1056 |
| 8 | SVM with Radial Basis Function Kernel | 0.9745 | 0.9276 | 0.6549 | 0.8524 | 0.1409 |
| 9 | Multivariate Adaptive Regression Splines | 0.9553 | 0.8755 | 0.7061 | 0.8456 | 0.1039 |
| 10 | Mixture Discriminant Analysis | 0.9714 | 0.8816 | 0.5853 | 0.8127 | 0.1650 |
| 11 | SVM with Linear Kernel | 0.9827 | 0.8796 | 0.5386 | 0.8003 | 0.1898 |
| 12 | Naive Bayes | 0.9327 | 0.7769 | 0.6069 | 0.7722 | 0.1331 |
| 13 | AdaBoost | 0.9710 | 0.7820 | 0.5092 | 0.7541 | 0.1895 |
| 14 | Linear Discriminant Analysis | 0.9382 | 0.8339 | 0.4792 | 0.7505 | 0.1965 |
| 15 | Nearest Shrunken Centroids | 0.7031 | 0.6202 | 0.4216 | 0.5816 | 0.1181 |
| | Average | 0.9468 | 0.8811 | 0.7358 | 0.8546 | 0.0951 |
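The $\hat{x}$ and $\sigma$ columns of Table 18 summarize each classifier over the three datasets. The sketch below recomputes them for the first three rows, assuming that $\hat{x}$ is the arithmetic mean and $\sigma$ the population standard deviation (dividing by the number of datasets); the latter is an inference from the tabulated numbers, not something stated in the text. The dictionary and variable names are illustrative.

```python
from statistics import mean, pstdev

# Accuracies transcribed from Table 18: (complete, 7% noisy, reduced) datasets.
accuracies = {
    "Stochastic Gradient Boosting": (0.9725, 0.9722, 0.9898),
    "Artificial Hydrocarbon Networks": (0.9829, 0.9696, 0.9741),
    "Random Forest": (0.9727, 0.9825, 0.9655),
}

# Mean and population standard deviation per classifier.
summary = {name: (mean(vals), pstdev(vals)) for name, vals in accuracies.items()}

# Rank by mean accuracy (descending), mirroring the ordering of the table.
ranking = sorted(summary, key=lambda name: summary[name][0], reverse=True)

for name in ranking:
    avg, spread = summary[name]
    print(f"{name}: mean={avg:.4f}, std={spread:.4f}")
```

Because the inputs are already rounded to four decimals, the recomputed values can differ from the tabulated ones in the last digit. A low $\sigma$ together with a high mean is what the table uses to characterize a classifier as stable across the experiments.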
Table 19. Overall performance of supervised classifiers during the experiments with respect to the $\hat{x}$ $F_1$-score$_\mu$ metric.

| No. | Method Name | Complete Dataset | 7% Noisy Dataset | Reduced Dataset | $\hat{x}$ $F_1$-Score$_\mu$ | $\sigma$ $F_1$-Score$_\mu$ |
|---|---|---|---|---|---|---|
| 1 | Random Forest | 0.9902 | 0.9909 | 0.9874 | 0.9895 | 0.0015 |
| 2 | Artificial Hydrocarbon Networks | 0.9910 | 0.9839 | 0.9864 | 0.9871 | 0.0029 |
| 3 | Rule-Based Classifier | 0.9942 | 0.9747 | 0.9846 | 0.9845 | 0.0080 |
| 4 | C4.5 Decision Trees | 0.9943 | 0.9739 | 0.9809 | 0.9830 | 0.0085 |
| 5 | Stochastic Gradient Boosting | 0.9725 | 0.9722 | 0.9898 | 0.9782 | 0.0082 |
| 6 | Artificial Neural Networks | 0.9457 | 0.9582 | 0.9824 | 0.9621 | 0.0152 |
| 7 | Multivariate Adaptive Regression Splines | 0.9794 | 0.9469 | 0.8413 | 0.9226 | 0.0590 |
| 8 | k-Nearest Neighbors | 0.9852 | 0.9484 | 0.8325 | 0.9220 | 0.0651 |
| 9 | SVM with Radial Basis Function Kernel | 0.9867 | 0.9607 | 0.7832 | 0.9102 | 0.0905 |
| 10 | Mixture Discriminant Analysis | 0.9848 | 0.9347 | 0.7181 | 0.8792 | 0.1157 |
| 11 | Naive Bayes | 0.9704 | 0.8810 | 0.7807 | 0.8774 | 0.0775 |
| 12 | SVM with Linear Kernel | 0.9908 | 0.9322 | 0.6772 | 0.8667 | 0.1362 |
| 13 | Linear Discriminant Analysis | 0.9672 | 0.9069 | 0.6307 | 0.8349 | 0.1465 |
| 14 | AdaBoost | 0.9877 | 0.8691 | 0.6270 | 0.8279 | 0.1501 |
| 15 | Nearest Shrunken Centroids | 0.8178 | 0.7493 | 0.5784 | 0.7152 | 0.1006 |
| | Average | 0.9705 | 0.9322 | 0.8254 | 0.9094 | 0.0657 |
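
The last two columns of Table 19 appear to summarize each classifier's three per-dataset scores as a mean and a standard deviation. The following is a minimal sketch of that arithmetic for two rows, assuming the standard deviation is the population form (dividing by the number of datasets); that assumption is inferred from the reported values and is not stated in the table. The names `rows` and `summarize` are illustrative only.

```python
from statistics import mean, pstdev

# F1-score values copied from two rows of Table 19, in column order:
# complete dataset, 7% noisy dataset, reduced dataset.
rows = {
    "Random Forest": (0.9902, 0.9909, 0.9874),
    "Artificial Hydrocarbon Networks": (0.9910, 0.9839, 0.9864),
}


def summarize(scores):
    """Return (mean, population standard deviation), rounded to 4 decimals.

    Assumption: the table uses the population standard deviation
    (pstdev), not the sample standard deviation (stdev).
    """
    return round(mean(scores), 4), round(pstdev(scores), 4)


summary = {name: summarize(scores) for name, scores in rows.items()}

for name, (avg, std) in summary.items():
    print(f"{name}: mean={avg:.4f}, std={std:.4f}")
```

Under this assumption the sketch gives 0.9895 and 0.0015 for Random Forest and 0.9871 and 0.0029 for Artificial Hydrocarbon Networks, which agree with the corresponding table entries.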
Table 20. Overall performance of supervised classifiers during the experiments with 7%, 15% and 30% noisy datasets.

| No. | Method Name | $\hat{x}$ Accuracy | $\sigma$ Accuracy | 7% (acc) | 15% (acc) | 30% (acc) | $\hat{x}$ $F_1$-Score$_\mu$ | $\sigma$ $F_1$-Score$_\mu$ | 7% ($F_1$) | 15% ($F_1$) | 30% ($F_1$) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Artificial Hydrocarbon Networks | 0.9343 | 0.0398 | 0.9696 | 0.9547 | 0.8786 | 0.9697 | 0.0162 | 0.9839 | 0.9781 | 0.9470 |
| 2 | k-Nearest Neighbors | 0.9031 | 0.0334 | 0.9059 | 0.9425 | 0.8608 | 0.9478 | 0.0178 | 0.9484 | 0.9692 | 0.9257 |
| 3 | SVM with Radial Basis Function Kernel | 0.8856 | 0.0575 | 0.9276 | 0.9249 | 0.8043 | 0.9426 | 0.0266 | 0.9607 | 0.9621 | 0.9051 |
| 4 | Naive Bayes | 0.8583 | 0.0597 | 0.7769 | 0.9182 | 0.8798 | 0.9313 | 0.0361 | 0.8810 | 0.9638 | 0.9491 |
| 5 | Stochastic Gradient Boosting | 0.8363 | 0.1125 | 0.9722 | 0.8402 | 0.6967 | 0.9284 | 0.0475 | 0.9868 | 0.9278 | 0.8705 |
| 6 | SVM with Linear Kernel | 0.7981 | 0.0887 | 0.8796 | 0.8400 | 0.6747 | 0.8970 | 0.0400 | 0.9322 | 0.9177 | 0.8410 |
| 7 | AdaBoost | 0.7736 | 0.1163 | 0.9151 | 0.7755 | 0.6302 | 0.8970 | 0.0496 | 0.9582 | 0.8960 | 0.8367 |
| 8 | Mixture Discriminant Analysis | 0.7629 | 0.1136 | 0.8816 | 0.7975 | 0.6098 | 0.8693 | 0.0641 | 0.9347 | 0.8909 | 0.7823 |
| 9 | Random Forest | 0.7612 | 0.1762 | 0.9825 | 0.7498 | 0.5514 | 0.9095 | 0.0625 | 0.9909 | 0.8986 | 0.8390 |
| 10 | Rule-Based Classifier | 0.7465 | 0.1629 | 0.9484 | 0.7414 | 0.5496 | 0.8893 | 0.0680 | 0.9747 | 0.8851 | 0.8082 |
| 11 | C4.5 Decision Trees | 0.7367 | 0.1695 | 0.9459 | 0.7335 | 0.5308 | 0.8826 | 0.0741 | 0.9739 | 0.8815 | 0.7925 |
| 12 | Linear Discriminant Analysis | 0.7089 | 0.1212 | 0.8339 | 0.7478 | 0.5449 | 0.8368 | 0.0700 | 0.9069 | 0.8622 | 0.7412 |
| 13 | Multivariate Adaptive Regression Splines | 0.6756 | 0.1661 | 0.8755 | 0.6824 | 0.4688 | 0.8590 | 0.0710 | 0.9469 | 0.8569 | 0.7731 |
| 14 | Nearest Shrunken Centroids | 0.6422 | 0.0243 | 0.6202 | 0.6761 | 0.6302 | 0.7807 | 0.0231 | 0.7493 | 0.8044 | 0.7883 |
| 15 | Artificial Neural Networks | 0.5864 | 0.1408 | 0.7820 | 0.5214 | 0.4559 | 0.7469 | 0.0864 | 0.8691 | 0.6892 | 0.6824 |
| | Average | 0.7740 | 0.1055 | 0.8811 | 0.7897 | 0.6511 | 0.8858 | 0.0502 | 0.9332 | 0.8922 | 0.8321 |
Table 21. Training and testing times of the supervised classifiers in the complete and the reduced datasets.

| No. | Method Name | Training Time (s), Complete Dataset | Training Time (s), Reduced Dataset | Testing Time (ms), Complete Dataset | Testing Time (ms), Reduced Dataset |
|---|---|---|---|---|---|
| 1 | AdaBoost | 20.39 | 6.19 | 0.55 | 0.60 |
| 2 | Artificial Hydrocarbon Networks | 72.61 | 61.53 | 1.71 | 0.92 |
| 3 | C4.5 Decision Trees | 2.23 | 0.91 | 0.03 | 0.02 |
| 4 | k-Nearest Neighbors | 6.87 | 3.26 | 0.60 | 0.21 |
| 5 | Linear Discriminant Analysis | 10.23 | 0.13 | 0.04 | 0.01 |
| 6 | Mixture Discriminant Analysis | 7.23 | 5.02 | 0.21 | 0.15 |
| 7 | Multivariate Adaptive Regression Splines | 40.26 | 7.72 | 0.07 | 0.03 |
| 8 | Naive Bayes | 29.30 | 5.76 | 56.55 | 11.04 |
| 9 | Nearest Shrunken Centroids | 0.08 | 0.08 | 0.01 | 0.01 |
| 10 | Artificial Neural Networks | 18.29 | 12.14 | 0.02 | 0.01 |
| 11 | Random Forest | 24.73 | 8.97 | 0.03 | 0.04 |
| 12 | Rule-Based Classifier | 3.62 | 1.19 | 0.03 | 0.02 |
| 13 | Stochastic Gradient Boosting | 16.54 | 5.48 | 0.07 | 0.07 |
| 14 | SVM with Linear Kernel | 3.51 | 3.91 | 0.10 | 0.07 |
| 15 | SVM with Radial Basis Function Kernel | 26.20 | 36.38 | 1.90 | 3.02 |

Share and Cite

MDPI and ACS Style

Ponce, H.; Martínez-Villaseñor, M.D.L.; Miralles-Pechuán, L. A Novel Wearable Sensor-Based Human Activity Recognition Approach Using Artificial Hydrocarbon Networks. Sensors 2016, 16, 1033. https://doi.org/10.3390/s16071033

AMA Style

Ponce H, Martínez-Villaseñor MDL, Miralles-Pechuán L. A Novel Wearable Sensor-Based Human Activity Recognition Approach Using Artificial Hydrocarbon Networks. Sensors. 2016; 16(7):1033. https://doi.org/10.3390/s16071033

Chicago/Turabian Style

Ponce, Hiram, María De Lourdes Martínez-Villaseñor, and Luis Miralles-Pechuán. 2016. "A Novel Wearable Sensor-Based Human Activity Recognition Approach Using Artificial Hydrocarbon Networks" Sensors 16, no. 7: 1033. https://doi.org/10.3390/s16071033
