1. Introduction
Gestures are composed of multiple body-part motions and can form activities [
1]. Hence, gesture recognition offers a wide range of applications, including, inter alia, fitness training, human-robot and human-computer interaction, security, and sign language recognition. Likewise, gesture recognition is employed in ambient assisted living systems to tackle burgeoning and worrying public healthcare problems, such as supporting autonomous living for people with dementia or Parkinson's disease. Although a large amount of work has been conducted on image-based sensing technology, camera and depth sensors are limited to the environment in which they are installed. Moreover, they are sensitive to obstructions in the field of vision, variations in luminous intensity, reflections, etc. In contrast, wearable sensors and mobile devices are more suitable for monitoring ambulatory activities and physiological signals.
In a supervised context, a wide range of action or gesture recognition techniques has been explored using wearable sensors. k-Nearest Neighbor (k-NN) might be the most straightforward classifier to utilize, since it does not learn a model but searches for the closest data in the training set using a given distance function. Even though conventional k-NN achieves good performance, it suffers from several shortcomings: low tolerance to attribute and sample noise, poor scalability to high-dimensional spaces, large training dataset requirements, and sensitivity to class imbalance. Yu et al. [
2] recently proposed a random subspace ensemble framework based on hybrid k-NN to tackle these problems, but the classifier has not yet been applied to a gesture recognition task. Hidden Markov Model (HMM) is the most traditional probabilistic method used in the literature [
3,
4]. However, computing the transition probabilities necessary for learning the model parameters requires a large amount of training data. HMM-based techniques may also be unsuitable for hard real-time (synchronized clock-based) systems due to their latency [
5]. Since data sets are not necessarily large enough for training, Support Vector Machine (SVM) is a classical alternative method [
6,
7,
8]. SVM is, nevertheless, very sensitive to the selection of its kernel type and the parameters related to the latter. Novel dynamic Bayesian network approaches are often used for sequence analysis, such as recurrent neural networks (e.g., LSTMs) [
9] and deep learning approaches [
10], and these should become more popular in the coming years.
Dynamic Time Warping (DTW) is one of the most utilized similarity measures for matching two time-series sequences [
11,
12]. Although DTW is often reproached for being slow, Rakthanmanon et al. [
13] demonstrated that DTW can be quicker than Euclidean distance search algorithms and even suggested that the method can spot gestures in real time. However, the recognition performance of DTW is affected by the strong presence of noise, caused either by the segmentation of gestures during the training phase or by gesture execution variability.
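As an illustration of the matching principle, DTW can be computed with a classic dynamic program. The following minimal sketch (function name illustrative; 1-D sequences with the absolute difference as local cost) shows the idea:

```python
def dtw_distance(a, b):
    """Classic O(n*m) dynamic-programming DTW between two 1-D sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = DTW distance between the prefixes a[:i] and b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]
```

Because every element is paired, a single outlier still contributes its full local cost to the total, which is the noise sensitivity discussed above.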
The longest common subsequence (LCSS) method is a precursor to DTW. It measures the closeness of two sequences of symbols as the length of the longest subsequence common to these two sequences. One of the abilities of DTW is to deal with sequences of different lengths, which is why it is often used as an alignment method. In [
14], LCSS was found to be more robust in noisy conditions than DTW. Indeed, since all elements are paired in DTW, noisy elements (i.e., unwanted variation and outliers) are also included, while they are simply ignored in the LCSS. Although some image-based gesture recognition applications can be found in [
15,
16,
17], not much work has been conducted using non-image data. In the context of crowd-sourced annotations, Nguyen-Dinh et al. [
18] proposed two methods, entitled SegmentedLCSS and WarpingLCSS. In the absence of noisy annotation (mislabeling or inaccurate identification of the start and end times of each segment), the two methods achieve similar recognition performances on three data sets compared with DTW- and SVM-based methods and surpass them in the presence of mislabeled instances. Extensions were recently proposed, such as a multimodal system based on WarpingLCSS [
19], S-SMART [
20], and a limited memory and real-time version for resource constrained sensor nodes [
21]. Although the parameters of these LCSS-based methods are application-dependent, they have so far been determined empirically, and a lack of design procedures (parameter-tuning methods) has been pointed out.
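The LCSS principle above can be sketched with the standard dynamic program for the LCS length. Unlike DTW, noisy symbols that match nothing simply do not contribute to the score:

```python
def lcs_length(s, t):
    """Length of the longest common subsequence via O(n*m) dynamic programming."""
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1     # matched pair extends the LCS
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # skip one symbol
    return dp[n][m]
```

For example, an outlier inserted into an otherwise identical sequence leaves the LCS length unchanged, which illustrates the robustness to noise reported in [14].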
In designing mobile or wearable gesture recognition systems, the temptation to integrate many sensing units for handling complex gestures often conflicts with key real-life deployment constraints, such as cost, power efficiency, weight limitations, memory usage, privacy, or unobtrusiveness [
22]. The redundant or irrelevant dimensions introduced may even slow down the learning process and affect recognition performance. The most popular dimensionality reduction approaches include feature extraction (or construction), feature selection, and discretization. Feature extraction aims to generate a set of features from original data with a lower computational cost than using the complete list of dimensions. A feature selection method selects a subset of features from the original feature list. Feature selection is an NP-hard combinatorial problem [
23]. Although numerous search techniques can be found in the literature, they fail to avoid local optima and require a large amount of memory or very long runtimes. Alternatively, evolutionary computation techniques have been proposed for solving the feature selection problem [
24]. Since the abovementioned LCSS technique directly utilizes raw or filtered signals, there is no evidence on whether we should favour feature extraction or selection. However, these LCSS-based methods impose the transformation of each sample from the data stream into a sequence of symbols. Therefore, feature selection coupled with a discretization process could be employed. Similar to feature selection, discretization is also an NP-hard problem [
25,
26].
In contrast to the feature selection field, few evolutionary algorithms have been proposed in the discretization literature [
25,
27]. Indeed, evolutionary feature selection algorithms have the disadvantage of high computational cost [
28] while convergence (close to the true Pareto front) and diversity of solutions (set of solutions as diverse as possible) are still two major difficulties [
29].
Evolutionary feature selection methods focus on maximizing the classification performance and minimizing the number of dimensions. Although it is not yet clear whether removing some features can lead to a decrease in the classification error rate [
24], a multi-objective problem formulation could bring trade-offs. The attribute discretization literature aims to minimize the complexity of the discretization scheme and to maximize the classification accuracy. In contrast to feature selection, these two objectives seem to be conflicting in nature [
30].
A multi-objective optimization algorithm based on particle swarm optimization (a heuristic method) can provide an optimal solution. However, an increase in the number of features enlarges the solution space and thus decreases the search efficiency [
31]. Therefore, Zhou et al. 2021 [
31] noted that particle swarm optimization may converge to a local optimum on high-dimensional data. Some variants have been suggested, such as a competitive swarm optimization operator [
32] and multiswarm comprehensive learning particle swarm optimization [
33], but tackling many-objective optimization is still a challenge [
29].
Moreover, particle swarm optimization can fall into a local optimum (it requires a reasonable balance between convergence and diversity) [
29]. These results are similar to those of filter and wrapper methods [
34] (more details about filter and wrapper methods can be found in [
31,
34]). Yang et al. 2020 [
29] suggested improving the computational burden with a competition mechanism using a new environmental selection strategy to maintain population diversity. Additionally, since mutual information can capture nonlinear relationships within a filter approach, Sharmin et al. 2019 [
35] used mutual information as a selection criterion (joint bias-corrected mutual information) and then suggested adding simultaneous forward selection and backward elimination [
36].
Deep neural networks such as CNNs [
37] are able to learn and select features. As an example, hierarchical deep neural networks were included with a multiobjective model to learn useful sparse features [
38]. Due to the huge number of parameters, a deep learning approach needs a large quantity of balanced samples, a requirement that is sometimes not satisfied in real-world problems [
34]. Moreover, as a deep neural network is a black box (non-causal and non-explainable), evaluating its feature selection ability is difficult [
37].
Currently, feature selection and data discretization are still studied individually and have not been fully explored [
39] using a many-objective formulation. To the best of our knowledge, no studies have tried to solve the two problems simultaneously using evolutionary techniques in a many-objective formulation. In this paper, the contributions are summarized as follows:
- 1.
We propose a many-objective formulation to simultaneously deal with optimal feature subset selection, discretization, and parameter tuning for an LM-WLCSS classifier. This problem was solved using the constrained many-objective evolutionary algorithm based on dominance (minimization of the objectives) and decomposition (C-MOEA/DD) [
40].
- 2.
Unlike many discretization techniques requiring a prefixed number of discretization points, the proposed discretization subproblem exploits a variable-length representation [
41].
- 3.
To agree with the variable-length discretization structure, we adapted the rand-length crossover recently proposed in the random variable-length crossover differential evolution algorithm [
42].
- 4.
We refined the template construction phase of the microcontroller optimized Limited-Memory WarpingLCSS (LM-WLCSS) [
21] using an improved algorithm for computing the longest common subsequence [
43]. Moreover, we altered the recognition phase by reprocessing the samples contained in the sliding windows in charge of spotting a gesture in the stream.
- 5.
To tackle multiclass gesture recognition, we propose a system encapsulating multiple LM-WLCSS and a light-weight classifier for resolving conflicts.
The main hypothesis is as follows: using the constrained many-objective evolutionary algorithm based on dominance and decomposition, an optimal feature subset can be found. The rest of the paper is organized as follows:
Section 2 states the constrained many-objective optimization problem definition, exposes C-MOEA/DD, highlights some discretization works, presents our refined LM-WLCSS, and reviews multiple fusion methods based on WarpingLCSS. Our solution encoding, operators, objective functions, and constraints are presented in
Section 3. Subsequently, we present the decision fusion module. The experiments are described in
Section 4 with the methodology and their corresponding evaluation metrics (two for effectiveness, including Cohen’s kappa, and one for reduction). Finally, our system is evaluated and the results are discussed in
Section 5.
2. Preliminaries and Background
In this section, we first briefly provide some basic definitions of the constrained many-objective optimization problem. We then describe a recently proposed optimization algorithm based on dominance and decomposition, entitled C-MOEA/DD. Additionally, we review evolutionary discretization techniques and successors of the well-known class-attribute interdependence maximization (CAIM) algorithm. Afterward, we expose some modifications to the different key components of the limited-memory implementation of the WarpingLCSS. Finally, we review some fusion methods based on WarpingLCSS that tackle the multi-class gesture problem and recognition conflicts.
2.1. Constrained Many-Objective Optimization
Since artificial intelligence and engineering applications tend to involve more than two and three objective criteria [
40], the concept of many-objective optimization problems must be introduced beforehand. Literally, such problems involve many conflicting objectives simultaneously. Hence, a constrained many-objective optimization problem may be formulated as follows:

minimize F(x) = (f₁(x), f₂(x), …, f_m(x))ᵀ
subject to g_j(x) ≥ 0, j = 1, …, J,
h_k(x) = 0, k = 1, …, K,
x ∈ Ω,

where x = (x₁, x₂, …, x_n)ᵀ is an n-decision-variable candidate solution taking its value in the bounded space Ω. A solution respecting the J inequality constraints (g_j(x) ≥ 0) and the K equality constraints (h_k(x) = 0) is qualified as feasible. These constraints are included in the objective functions and are detailed in our proposed method in
Section 3.3. F: Ω → ℝᵐ associates a candidate solution to the objective space through the m conflicting objective functions. The obtained results are thus alternative solutions that have to be considered equivalent when no information is given regarding the relevance of one over the others.
A solution x₁ is said to dominate another solution x₂, written as x₁ ≼ x₂, if and only if f_i(x₁) ≤ f_i(x₂) for every i ∈ {1, …, m} and f_j(x₁) < f_j(x₂) for at least one index j.
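The dominance relation can be expressed as a small predicate over objective vectors (minimization assumed; the function name is illustrative):

```python
def dominates(f1, f2):
    """True iff objective vector f1 Pareto-dominates f2 under minimization:
    f1 is no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(f1, f2))
            and any(a < b for a, b in zip(f1, f2)))
```

Two solutions where neither dominates the other are incomparable, which is why a many-objective run returns a set of trade-off solutions rather than a single optimum.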
2.2. C-MOEA/DD
MOEA/DD is an evolutionary algorithm for many-objective optimization problems, drawing its strength from MOEA/D [
44] and NSGA-III [
45]. As it combines both dominance-based and decomposition-based approaches, it achieves an effective balance between the convergence and diversity of the evolutionary process. Decomposition is a popular method to break down a multi-objective problem into a set of scalar optimization subproblems. Here, the authors use the penalty-based boundary intersection approach, but they highlight that any approach could be applied. Subsequently, we briefly explain the general framework of MOEA/DD and expose the requisite modifications for solving constrained many-objective optimization problems.
At first, a procedure generates N solutions to form the initial parent population and creates a weight vector set, W, representing N unique subregions in the objective space. As the current problem does not exceed six objectives, only the one-layer weight generation algorithm was used. The T closest weight vectors of each solution are also extracted to form a neighborhood set of weight vectors, E. The initial population, P, is then divided into several non-domination levels using the fast non-dominated sorting method employed in NSGA-II.
In the MOEA/DD main loop, a common process is applied for each weight vector in E until the termination criterion is reached. It consists of randomly choosing k mating parents in the neighboring subregions of the considered weight vector. When no solution exists in the selected subregions, the parents are randomly selected from the current population. These k solutions are then altered using genetic operators. For each offspring, an intricate update mechanism is applied to the population.
First, the associated subregion of the offspring is identified. The considered offspring is then merged with the population into a temporary container, and the non-domination level structure of this container is updated. It is worth noting that an ingenious method was employed to avoid a full non-dominated sorting of the container. Since the population must preserve its size throughout the run of MOEA/DD, three cases may arise. When all solutions are non-dominated, the worst solution of the most crowded weight vector is deleted from the population; this function has been denominated LocateWorst. When there are multiple non-domination levels, the deletion of one solution depends on the number of solutions within the last non-domination level. If there is only one solution in the last level, the density of its associated subregion is investigated so as not to incorrectly alter the population diversity, and LocateWorst is called in the case where that subregion contains only this element. When the most crowded subregion associated with the solutions of the last level contains more than one element, the solution owning the largest scalarized value within it is deleted; otherwise, LocateWorst is called so as not to delete isolated subregions.
Since MOEA/DD is designed to solve unconstrained many-objective optimization problems, Li et al. [
40] also provided an extension for handling constrained many-objective optimization problems, which requires three modifications. First, a constraint violation value, CV(x), henceforth accompanies each solution x. It is determined as follows:

CV(x) = Σ_{j=1..J} ⟨g_j(x)⟩ + Σ_{k=1..K} |h_k(x)|,

where the bracket function ⟨α⟩ returns the absolute value of α if α < 0 and returns 0 otherwise. Second, while the abovementioned update procedure is maintained for feasible solutions, the survival of the infeasible ones is dictated by their association with an isolated subregion. More precisely, a second chance of survival is granted to these infeasible solutions, and the solution with the largest CV(x) or the one that is not associated with an isolated subregion is eliminated from the next population. Finally, the selection-for-reproduction procedure becomes a binary tournament, where two solutions are initially randomly picked, and the solution with the smallest CV(x) is favoured or a random choice is applied in the case of equality.
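Assuming the usual aggregation of inequality violations (counted only when g_j(x) ≥ 0 is broken, i.e., when the value is negative) and absolute equality violations, the constraint violation value can be sketched as:

```python
def constraint_violation(g_values, h_values):
    """Aggregate constraint violation CV(x):
    - each inequality constraint g_j(x) >= 0 contributes |g_j(x)| when negative,
    - each equality constraint h_k(x) = 0 contributes |h_k(x)|."""
    cv = sum(abs(g) for g in g_values if g < 0)   # bracket operator <.>
    cv += sum(abs(h) for h in h_values)
    return cv
```

A solution is feasible exactly when this value is zero, which is what the binary tournament described above compares.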
2.3. Discretization
The discretization process aims to transform a set of continuous attributes into discrete ones. Although there is a substantial number of discretization methods in the literature, Garcia et al. [
26] recently carried out extensive testing of the 30 most representative and newest discretization techniques in supervised classification. Amongst the best-performing algorithms, FUSINTER, ChiMerge, CAIM, and Modified Chi2 obtained the highest average accuracies; Zeta and MDLP can be added to this list if Cohen's kappa metric is considered. In the authors' taxonomy, the evaluation measures for comparing solutions were broken down into five families: information, statistics, rough sets, wrapper, and binning. Subsequently, we review a few evolutionary approaches to solving discretization problems and the successors of CAIM.
In [
46], a supervised method called Evolutionary Cut Points Selection for Discretization (ECPSD) was introduced. The technique exploits the fact that boundary points are suitable candidates for partitioning numerical attributes. Hence, a complete set of boundary points for each attribute is first generated. A CHC model [
47] then searches the optimal subset of cut points while minimizing the inconsistency. Later on, the evolutionary multivariate discretizer (EMD) was proposed on the same basis [
27]. The inconsistency measure was replaced by the aggregate classification error of an unpruned version of C4.5 and a Naive Bayes classifier. Additionally, a chromosome length reduction algorithm was added to cope with large numbers of attributes and instances in datasets. However, the selection of the most appropriate discretization scheme relies on a weighted sum of the objective functions, where a user-defined parameter must be provided. This approach is thus limited, even though varying the parameters of a parametric scalarizing approach may produce multiple different Pareto-optimal solutions. In [
25], a multivariate evolutionary multi-objective discretization (MEMOD) algorithm was proposed. It is an enhanced version of EMD, where CHC has been replaced by the well-known NSGA-II and where the chromosome length reduction algorithm exploits all Pareto solutions instead of only the best one. The following objective functions have been considered: the number of currently selected cut points, the average classification error produced by a CART and a Naive Bayes classifier, and the frequency of the selected cut points.
As previously exposed, CAIM stands out due to its performance amongst the classical techniques. Some extensions have been proposed, such as Class-Attribute Contingency Coefficient [
48], Autonomous Discretization Algorithm (Ameva) [
49], and ur-CAIM [
30]. Ameva has been successfully applied in activity recognition [
50] and fall detection for older adults [
51]. The technique is designed to achieve a low number of discretization intervals without prior user specification and maximizes a contingency coefficient based on the χ² statistic. The Ameva criterion is formulated as follows:

Ameva(k) = χ²(k) / (k(l − 1)),

where k and l are the number of discrete intervals and the number of classes, respectively. The ur-CAIM discretization algorithm enhances CAIM for both balanced and imbalanced classification problems. It combines three class-attribute interdependence criteria: the CAIM criterion scaled into the range [0,1] (denoted CAIM_N), Class-Attribute Interdependence Redundancy (CAIR), and Class-Attribute Interdependence Uncertainty (CAIU). In the ur-CAIM criterion, the CAIR factor has been adapted to handle unbalanced data.
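Assuming the standard formulation Ameva(k) = χ²(k) / (k(l − 1)), the criterion can be computed from a class-by-interval contingency table as follows (illustrative sketch):

```python
def ameva(table):
    """Ameva criterion chi2 / (k * (l - 1)) for a contingency table
    table[class][interval] of counts, with l classes and k intervals."""
    l = len(table)        # number of classes
    k = len(table[0])     # number of discretization intervals
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(l)) for j in range(k)]
    chi2 = 0.0
    for i in range(l):
        for j in range(k):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2 / (k * (l - 1))
```

Dividing χ² by k penalizes schemes with many intervals, which is how Ameva keeps the number of discretization intervals low without user-supplied limits.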
2.4. Limited-Memory Warping LCSS Gesture Recognition Method
SegmentedLCSS and WarpingLCSS, introduced by [
18], are two template matching methods for online gesture recognition using wearable motion sensors, based on the longest common subsequence (LCS) algorithm. Aside from being robust against human gesture variability and noise in the gathered data, they are also tolerant to noisy labeled annotations. On three datasets (10–17 classes), both methods outperform DTW-based classifiers with and without the presence of noisy annotations. WarpingLCSS has a runtime complexity about one order of magnitude smaller than that of SegmentedLCSS. In return, a penalty parameter, which is application-specific, has to be set. Since each method is a binary classifier, a fusion method must be established, as discussed and illustrated in detail later.
A recently proposed variant of the WarpingLCSS method [
21], labeled LM-WLCSS, allows the technique to run on a resource constrained sensor node. A custom 8-bit Atmel AVR motion sensor node and a 32-bit ARM Cortex M4 microcontroller were successfully used to illustrate the implementation of this method on three different everyday life applications. On the assumption that a gesture may last up to 10 s and given that the sample rate is 10 Hz, the chips are capable of recognizing, simultaneously and in real-time, 67 and 140 gestures, respectively. Furthermore, the extremely low power consumption used to recognize one gesture (135
) might suggest an ASIC (Application-Specific Integrated Circuit) implementation.
In the following subsections, we review the core components of the training and recognition processes of an LM-WLCSS classifier, which will be in charge of recognizing a particular gesture. All streams of sensor data acquired using multiple sensors attached to the sensor node are pre-processed using a specific quantization step to convert each sample into a sequence of symbols. Accordingly, these strings allow for the formation of a training data set essential for selecting a proper template and computing a rejection threshold. In the recognition mode, each new sample gathered is quantized and transmitted to the LM-WLCSS and then to a local maximum search module, called SearchMax, to finally output if a gesture has occurred or not.
Figure 1 describes the entire data processing flow.
2.4.1. Quantization Step (Training Phase)
At each time step t, the quantization step maps an n-dimensional vector, representing one sample from all connected sensors, to a symbol. In other words, a data discretization technique is first applied to the training data, and the resulting discretization scheme is used as the basis of a data association process for all incoming new samples. Specifically to the LM-WLCSS, Roggen et al. [
21] applied the K-means algorithm together with a nearest-neighbor assignment. Although K-means is widely employed, it suffers from the following disadvantages: the algorithm does not guarantee the optimality of the solution (the positions of the cluster centers), and the number of clusters must be assessed beforehand and assumed to be optimal. In this paper, we investigate the use of the Ameva and ur-CAIM coefficients as discretization evaluation measures in order to find the most suitable discretization scheme. The nearest-neighbor algorithm is preserved, with the squared Euclidean distance selected as the distance function. More formally, the quantization step assigns to each sample the index of the closest discretization point in the discretization scheme associated with the gesture class c. Therefore, the stream is converted into a succession of discretization points.
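A minimal sketch of this nearest-neighbor quantization, assuming the discretization scheme is stored as a list of n-dimensional points (names illustrative):

```python
def quantize(sample, scheme):
    """Map an n-dimensional sample to the index of the closest
    discretization point, using the squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(scheme)), key=lambda j: sq_dist(sample, scheme[j]))
```

Each incoming sample is thus replaced by a small integer symbol, which is what the LM-WLCSS matching operates on.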
2.4.2. Template Construction (Training Phase)
Let s_i denote sequence i, i.e., the quantized gesture instance i, belonging to the training data set S_c of the gesture class c. Hence, S_c ⊆ S, where S is the training data set. In the LM-WLCSS, the template construction of a gesture class c simply consists of choosing the first motif instance in the gesture class training data set. Here, we adopt the existing template construction phase of the WarpingLCSS. A template s_c, representing all gestures from the class c, is therefore the sequence that has the highest cumulative LCS with all other sequences of the same class:

s_c = argmax_{s_i ∈ S_c} Σ_{s_j ∈ S_c, j ≠ i} |LCS(s_i, s_j)|,

where |LCS(a, b)| is the length of the longest common subsequence of a and b.
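The template choice can be sketched as selecting the instance with the highest summed LCS length against the other instances of its class (helper and names illustrative; the LCS helper is the standard dynamic program):

```python
def lcs_length(s, t):
    """Standard O(n*m) dynamic program for the LCS length."""
    dp = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i, a in enumerate(s, 1):
        for j, b in enumerate(t, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if a == b else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def choose_template(instances):
    """Return the quantized instance maximizing the summed LCS length
    with every other instance of the same gesture class."""
    return max(instances,
               key=lambda s: sum(lcs_length(s, t) for t in instances if t is not s))
```

The chosen sequence is, in this sense, the most representative motif of its class.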
The LCS problem has been extensively studied; a naive solution has an exponential complexity of O(2^n). A major improvement, proposed in [
52], is achieved by dynamic programming with a runtime of O(nm), where n and m are the lengths of the two compared strings. In [
43], the authors suggested three new algorithms that improve on the work of [
53], using a van Emde Boas tree, a balanced binary search tree, or an ordered vector. In this paper, we use the ordered vector approach, whose time and space complexities depend only on the lengths of the two input sequences and on the number of matched pairs between them.
2.4.3. Limited-Memory Warping LCSS
LM-WLCSS instantaneously produces a matching score between the incoming stream and a template. When an incoming symbol matches the template, i.e., the ith sample of the stream and the jth sample of the template are alike, a reward R is given. Otherwise, the current score is equal to the maximum over the following cases: (1) a mismatch between the stream and the template, and (2) a repetition in the stream or in the template. In each case, a penalty D, the normalized squared Euclidean distance between the two considered symbols weighted by a fixed penalty P, is applied. Distances are retrieved from the quantizer, since a pairwise distance matrix between all symbols of the discretization scheme has already been built and normalized. In the original LM-WLCSS, the decision between the different cases is controlled by a tolerance; here, this behavior has been nullified because the exploration capacity of the metaheuristic is relied upon to find an adequate discretization scheme. Hence, modeled on the dynamic computation of the LCS score, the matching score M(i, j) between the first j symbols of the template s_c and the first i symbols of the stream W stems from the following formula:

M(i, j) = M(i − 1, j − 1) + R, if W(i) = s_c(j),
M(i, j) = max{ M(i − 1, j − 1) − D, M(i − 1, j) − D, M(i, j − 1) − D }, otherwise,

with M(i, 0) = M(0, j) = 0 and D = P · d(W(i), s_c(j)), where d is the normalized squared Euclidean distance. It is easily seen that the higher the score, the more similar the pre-processed signal is to the motif. Once the score reaches a given acceptance threshold, an entire motif has been found in the data stream. By updating a backtracking variable with the different cases of the formula that were selected, the algorithm enables retrieving the start-time of the gesture.
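A simplified per-sample update consistent with this kind of warping recurrence can be sketched as follows (the reward, penalty weight, and distance function are illustrative placeholders, and the tolerance is omitted):

```python
REWARD = 1.0   # illustrative reward R
PENALTY = 0.5  # illustrative penalty weight P

def wlcss_update(prev, template, symbol, dist):
    """One streaming step: prev holds the scores M(i-1, .) over template
    prefixes; returns the new row M(i, .) for the incoming symbol."""
    curr = [0.0] * (len(template) + 1)
    for j in range(1, len(template) + 1):
        if symbol == template[j - 1]:
            curr[j] = prev[j - 1] + REWARD            # matched pair
        else:
            d = PENALTY * dist(symbol, template[j - 1])
            curr[j] = max(prev[j - 1] - d,            # mismatch
                          prev[j] - d,                # repetition in the stream
                          curr[j - 1] - d)            # repetition in the template
    return curr
```

Only one row per incoming sample is kept, which is what makes the method suitable for memory-constrained sensor nodes.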
2.4.4. Rejection Threshold (Training Phase)
The computation of the rejection threshold requires computing the LM-WLCSS scores between the template and each gesture instance (except the chosen template) contained in the gesture class c. Let μ_c and σ_c denote the resulting mean and standard deviation of these scores. The rejection threshold then follows as

T_c = μ_c − h_c · σ_c,

where h_c is a positive real coefficient.
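Assuming the threshold is taken as the mean of the training scores minus a multiple of their standard deviation, a minimal sketch:

```python
from statistics import mean, stdev

def rejection_threshold(scores, h):
    """Threshold mu - h * sigma over the matching scores obtained on the
    training instances of a class; h is a positive tuning coefficient."""
    return mean(scores) - h * stdev(scores)
```

Larger values of h lower the threshold and make the spotter more permissive, so h trades false rejections against false detections.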
2.4.5. Searchmax (Recognition Phase)
A SearchMax function is called after every update of the matching score. It aims to find the peak in the matching score curve, indicating the occurrence of a motif, using a sliding window without the necessity of storing that window. More precisely, the algorithm first searches for an ascent of the score by comparing its current and previous values. In this regard, a flag is set, a counter is reset, and the current score is stored in a variable called Max. For each following value that is below Max, the counter is incremented. When Max exceeds the pre-computed rejection threshold and the counter is greater than the length of the sliding window, a motif has been spotted. The original LM-WLCSS SearchMax algorithm has been kept in its entirety. The sliding window length therefore controls the latency of the gesture recognition and must be smaller than the length of the gesture to be recognized.
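The SearchMax logic can be sketched as follows (an offline version over a score list for clarity; variable names are illustrative):

```python
def search_max(scores, threshold, wf):
    """Report peak positions: a detection fires once the running maximum
    exceeds the threshold and has not improved for wf consecutive samples."""
    detections = []
    best, best_i, since_best = None, -1, 0
    for i, s in enumerate(scores):
        if best is None or s > best:
            best, best_i, since_best = s, i, 0   # new local maximum candidate
        else:
            since_best += 1
        if best is not None and best > threshold and since_best >= wf:
            detections.append(best_i)            # peak confirmed
            best, since_best = None, 0           # reset for the next motif
    return detections
```

Waiting wf samples before confirming a peak is exactly the latency trade-off mentioned above.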
2.4.6. Backtracking (Recognition Phase)
When a gesture has been spotted by SearchMax, its start-time is retrieved using a backtracking variable. The original implementation as a circular buffer, whose maximal capacity depends on the length of the template and on the length of the backtracking variable, has been maintained. However, we add an additional behavior. More precisely, a number of elements are skipped to account for the time required by SearchMax to detect the local maximum, and the backtracking algorithm is then applied. The current matching score is then reset, and the symbols of the previous samples are reprocessed. Since only references to the discretization scheme are stored, re-quantization is not needed.
2.5. Fusion Methods Using WarpingLCSS
WarpingLCSS is a binary classifier that matches the current signal with a given template to recognize a specific gesture. When multiple WarpingLCSS instances are considered to tackle a multi-class gesture problem, recognition conflicts may arise. Multiple methods have been developed in the literature to overcome this issue. Nguyen-Dinh et al. [
18] introduced a decision-making module that outputs the class with the highest normalized similarity between the candidate gesture and each conflicting class template. This module has also been exploited for the SegmentedLCSS and LM-WLCSS. However, storing the candidate detected gesture and reprocessing as many LCSS instances as there are gesture classes might be difficult to integrate on a resource-constrained node. Alternatively, Nguyen-Dinh et al. [
19] proposed two multimodal frameworks to fuse data sources at the signal level and the decision level, respectively. The signal fusion combines (by summation) all data streams into a single one-dimensional data stream. However, considering all sensors with equal importance might not give the best configuration for a fusion method. The classifier fusion framework aggregates the similarity scores from all connected template matching modules, each one processing the data stream from one unique sensor, into a single fusion spotting matrix through a linear combination based on the confidence of each template matching module. When a gesture belongs to multiple classes, a decision-making module resolves the conflict by outputting the class with the highest similarity score. The behavior on interleaved spotted activities is, however, not well-documented. In this paper, we decided to settle the final decision using a light-weight classifier.
3. Proposed Method
In this section, we present an evolutionary algorithm for feature selection, discretization, and parameter tuning for an LM-WLCSS-based method. Unlike many discretization techniques requiring a prefixed number of discretization points, the proposed algorithm exploits a variable-length structure in order to find the most suitable discretization scheme for recognizing a gesture using LM-WLCSS. In the remaining part of this paper, our method is denoted by MOFSD-GR (Many-Objective Feature Selection and Discretization for Gesture Recognition).
3.1. Solution Encoding and Population Initialization
A candidate solution integrates all key parameters required to enable data reduction and to recognize a particular gesture using the LM-WLCSS method.
As previously noted, the sample at time t is an n-dimensional vector, where n is the total number of features characterizing the sample. Focusing on a small subset of features could significantly reduce the number of sensors required for gesture recognition, save computational resources, and lessen the costs. Feature selection has been encoded as a binary-valued vector b = (b₁, …, b_n), where b_i = 0 indicates that the corresponding feature is not retained, whereas b_i = 1 signifies that the associated feature is selected. This type of representation is very widespread in the literature.
The discretization scheme is represented by a variable-length vector of m discretization points, where m is a positive integer uniformly chosen in a bounded range. The upper limit of this decision variable is purposely larger than necessary to improve diversity. These limits are selected by trial and error. Each discretization point is an n-dimensional point uniformly chosen in the training space of the gesture class c.
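Assuming, as is common for such point-based schemes, that each incoming sample is quantized to the index of its nearest discretization point (turning the multivariate stream into a one-dimensional symbol stream for LM-WLCSS), a quantizer sketch might look like:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical quantizer sketch: map a sample onto the index of its nearest
// discretization point (squared Euclidean distance). The nearest-point rule
// is an assumption for illustration, not taken verbatim from the paper.
std::size_t quantize(const std::vector<double>& x,
                     const std::vector<std::vector<double>>& points) {
    std::size_t best = 0;
    double best_d = std::numeric_limits<double>::max();
    for (std::size_t k = 0; k < points.size(); ++k) {
        double d = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i) {
            double diff = x[i] - points[k][i];
            d += diff * diff;
        }
        if (d < best_d) { best_d = d; best = k; }
    }
    return best;
}
```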
Amongst the abovementioned LM-WLCSS parameters, only the SearchMax window length, the penalty, and the coefficient of the threshold have been included in the solution representation.
The SearchMax window length controls the latency of the recognition process, i.e., the time required to announce that a gesture peak is present in the matching score; it is a positive integer uniformly chosen in a bounded interval. With the reward fixed to 1, the penalty is a real number uniformly chosen in a bounded range; otherwise, gestures that differ from the selected template would hardly be recognizable.
The coefficient of the threshold is strongly correlated with the reward and the discretization scheme. Since it cannot easily be bounded, its value is investigated locally for each solution.
The backtracking variable length allows us to retrieve the start time of a gesture. Although too short a length degrades the recognition performance of the classifier, its choice can reduce the runtime and memory usage on a constrained sensor node. Since this length is not a major performance limiter in the learning process and can easily be rectified by the decider during system deployment, it was fixed to three times the length of the longest gesture occurrence in c in order to reduce the complexity of the search space.
Hence, the decision vector can be formulated as the concatenation of the feature-selection vector, the variable-length discretization scheme, the SearchMax window length, the penalty, and the coefficient of the threshold.
3.2. Operators
In C-MOEA/DD, selected solutions produce one or more offspring using any genetic operators. In this paper, for each selected pair of parent solutions, a crossover generates two children, which are subsequently mutated. The following subsections explain these two operators.
3.2.1. Crossover Operation
The classical uniform crossover is used for the selected-feature vector. To cross over two discretization schemes, we adapted the rand-length crossover recently proposed for the random variable-length crossover differential evolution algorithm [42]. More precisely, the offspring lengths are first randomly and uniformly selected from a range determined by the lengths of the two parent discretization schemes (to be used for the gesture class c). Then, for each index i of an offspring, three cases might occur. When both parent solutions contain a discretization point at index i, the simulated binary crossover (SBX) is applied to each dimension of the two points. When one of the parent discretization schemes is too short, both children inherit the point from the parent with the longer discretization scheme. Otherwise, a new discretization point is uniformly chosen in the training space for each child solution. All newly created discretization points are randomly assigned to the children solutions. The pseudo-code of the rand-length crossover for discretization schemes is given in Algorithm 1.
Since LM-WLCSS penalties are encoded as real values, the SBX operator is also applied to this decision variable. In contrast, SearchMax window lengths are integers; thus, we incorporate the weighted average normally distributed arithmetic crossover (NADX) [54], which induces greater diversity than the uniform crossover and SBX operators while still proposing values near and between the parents. Although the length of the backtracking variable has been fixed, the NADX operator could also be considered for it. Whenever the selected features, the discretization schemes, the LM-WLCSS penalties, or the SearchMax window lengths of the children solutions differ from those of their parents, the corresponding coefficients of the threshold must be reset to undefined, because the resulting LM-WLCSS classifier is altered.
3.2.2. Mutation Operation
All decision variables are modified equiprobably. The uniform bit-flip mutation operator is applied to the selected-feature binary vector. Each discretization point in the discretization scheme is also equiprobably altered; specifically, when a discretization point has been selected for modification, all of its features are mutated using the polynomial mutation operator. For all remaining decision variables, the polynomial mutation is applied whether they are encoded as integers or real numbers.
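The two operators can be sketched as follows, assuming a per-variable mutation probability pm and a distribution index eta (both hypothetical parameter names) for the standard polynomial mutation:

```cpp
#include <algorithm>
#include <cmath>
#include <random>
#include <vector>

// Uniform bit-flip mutation: each bit of the feature-selection vector is
// flipped independently with probability pm.
void bit_flip(std::vector<int>& b, double pm, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    for (auto& bit : b)
        if (u(rng) < pm) bit = 1 - bit;  // toggle selected/unselected
}

// Standard polynomial mutation of a bounded real variable x in [lo, hi];
// eta is the distribution index controlling the perturbation magnitude.
double polynomial_mutation(double x, double lo, double hi,
                           double eta, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double r = u(rng), delta;
    if (r < 0.5) delta = std::pow(2.0 * r, 1.0 / (eta + 1.0)) - 1.0;
    else         delta = 1.0 - std::pow(2.0 * (1.0 - r), 1.0 / (eta + 1.0));
    return std::clamp(x + delta * (hi - lo), lo, hi);
}
```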
Algorithm 1: Rand-length crossover for discretization schemes.
3.3. Objective Functions
The quality of a candidate solution is measured by the objective functions. In order to find the best solution for recognizing a particular gesture using LM-WLCSS, five functions, given in (13)-(17) and subject to the constraints in (18) and (19), have been considered. Their formulations involve the set of distinct discretization points in the elected template, the number of distinct elements in that template, and the Iverson bracket.
Let us first define the basic terms generated by a confusion matrix: TP (true positives) is the number of correctly identified samples, FP (false positives) refers to the incorrectly identified samples, TN (true negatives) is the number of correctly rejected samples, and FN (false negatives) refers to the incorrectly rejected samples. The objective function in (13) measures how well the trained binary classifier performs on the testing data set. Although accuracy is widely acknowledged, it cannot be used as the exclusive recognition-performance indicator, since the classifier could have exactly zero predictive power [55]. We alternatively selected the F1 score, defined as the harmonic mean of precision and recall, where precision = TP / (TP + FP) and recall = TP / (TP + FN).
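As a quick illustration, the accuracy and F1 definitions above can be computed directly from the four confusion-matrix counts:

```cpp
// Accuracy and F1 score from confusion-matrix counts, following the
// standard definitions recalled in the text.
struct Confusion { double tp, fp, tn, fn; };

double accuracy(const Confusion& c) {
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn);
}

double f1_score(const Confusion& c) {
    double precision = c.tp / (c.tp + c.fp);
    double recall    = c.tp / (c.tp + c.fn);
    return 2.0 * precision * recall / (precision + recall);
}
```

For example, with TP = 8, FP = 2, TN = 85, FN = 5, the accuracy is 0.93 while the F1 score is only about 0.70, illustrating why accuracy alone can be misleading on imbalanced data.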
The objective function in (14) comes directly from the template construction during the training phase of the binary classifier. It is the average sum of the longest common subsequence scores between the elected template and the other quantized gesture instances in the gesture-class training data set. The higher the score, the better the template represents the gesture class c.
The Ameva criterion, determined by the objective function in (15), expresses the quality of the discretization-scheme component of the solution. Its highest values are attained when all samples from a specific class are quantized to a unique discretization point (the other discretization points having no associated samples). Additionally, the criterion favours a low number of discretization points. Since there are only two classes in this problem (the samples from the gesture class c represent the positive class, and all other examples are negative), similarities may arise between the gesture executions of both classes. As a result, negative examples might be quantized into the same discretization points defining the class template, and the Ameva criterion might try to create unnecessary discretization points. To overcome this issue, a constraint on the template, defined in (18), imposes that the latter must be defined by at least three distinct discretization points. Additionally, the objective function in (16) counters this conflicting situation and measures heterogeneity by the normalized entropy of the elected template, which lies between 0 and 1. A low occurrence of a discretization point in the template is thus penalized. The Ameva criterion may be interchanged with ur-CAIM or any other discretization criterion.
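A sketch of this heterogeneity measure, assuming Shannon entropy over the symbol frequencies of the quantized template, normalized by the log of the number of distinct symbols so that the result lies in [0, 1]:

```cpp
#include <cmath>
#include <map>
#include <vector>

// Normalized Shannon entropy of a quantized template (sequence of
// discretization-point indices). Returns 1.0 when all distinct symbols are
// equally frequent and approaches 0.0 when one symbol dominates.
// The normalization choice is an assumption for illustration.
double normalized_entropy(const std::vector<int>& tmpl) {
    std::map<int, double> counts;
    for (int s : tmpl) counts[s] += 1.0;
    if (counts.size() < 2) return 0.0;  // a single symbol carries no entropy
    double h = 0.0, n = static_cast<double>(tmpl.size());
    for (const auto& kv : counts) {
        double p = kv.second / n;
        h -= p * std::log(p);
    }
    return h / std::log(static_cast<double>(counts.size()));
}
```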
In (17), the last objective function indicates the average number of selected features in the current solution, as we aim to reduce the number of features.
Algorithm 2 presents the pseudo-code of the evaluation procedure of a candidate solution. First and foremost, a quantizer is created using the discretization scheme and the feature-selection vector. An LM-WLCSS classifier can thus be trained on the training dataset. Although the template-quality objective function is completely independent of the classifier construction, an infeasible solution may be encountered when the rejection threshold is negative, as stated in (19). Otherwise, the evaluation procedure continues and computes the corresponding objective function from the elected class template and the rejection threshold. As previously mentioned, the coefficient of the threshold must be investigated locally. When the coefficient of variation is different from zero, the procedure increments the coefficient from 0 to an upper bound in fixed steps, because a high amplitude of the coefficients can nullify the rejection threshold. For each coefficient value, the previously constructed LM-WLCSS classifier does not need to be rebuilt; only updating the SearchMax threshold, clearing the circular buffer, and resetting the matching score are necessary. The coefficient yielding the greatest objective-function value (i.e., the best-obtained classifier performance) is preserved, and the evaluated solution and its objective functions are updated accordingly.
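The local search over the coefficient can be sketched as a simple grid sweep that keeps the best-scoring value. The evaluation callback, bounds, and step are placeholders for whatever the evaluation procedure actually computes (here, re-scoring the already-trained classifier at each coefficient value):

```cpp
#include <functional>
#include <utility>

// Sweep a threshold coefficient h from 0 to h_max in fixed steps, score the
// classifier at each value via an arbitrary callback (a stand-in for
// "update SearchMax threshold, clear buffer, reset score, re-evaluate"),
// and return the best (coefficient, score) pair.
std::pair<double, double> sweep_coefficient(
        const std::function<double(double)>& evaluate_f1,
        double h_max, double step) {
    double best_h = 0.0, best_f1 = evaluate_f1(0.0);
    for (double h = step; h <= h_max; h += step) {
        double f1 = evaluate_f1(h);
        if (f1 > best_f1) { best_f1 = f1; best_h = h; }
    }
    return {best_h, best_f1};
}
```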
3.4. Multi-Class Gesture Recognition System
Whenever a new sample is acquired, the required subset of its features is transmitted to each corresponding trained LM-WLCSS classifier to be specifically quantized and instantaneously classified. Each binary decision, forming a decision vector, is sent to a decision-fusion module to eventually yield which gesture has been executed. Among all the aggregation schemes for binarization techniques, we decided to deliberate on the final decision through a light-weight classifier, such as a neural network, decision tree, or logistic regression.
Figure 2 illustrates the final recognition flow.
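As an illustration of the fusion step, the sketch below substitutes a weighted vote for the trained light-weight classifier; the per-classifier weights are hypothetical stand-ins for whatever confidence model is learned:

```cpp
#include <cstddef>
#include <vector>

// Fuse the binary decisions of the per-gesture LM-WLCSS classifiers into a
// single class label. A weighted vote stands in for the trained light-weight
// classifier: among the classifiers that fired, the one with the highest
// (assumed positive) weight wins; -1 denotes the null class (no gesture).
int fuse_decisions(const std::vector<int>& decisions,
                   const std::vector<double>& weights) {
    int best = -1;
    double best_w = 0.0;
    for (std::size_t i = 0; i < decisions.size(); ++i)
        if (decisions[i] == 1 && weights[i] > best_w) {
            best_w = weights[i];
            best = static_cast<int>(i);
        }
    return best;
}
```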
Algorithm 2: Solution evaluation.
5. Results and Discussion
The validation of our simultaneous feature selection, discretization, and parameter tuning for LM-WLCSS classifiers is carried out in this section. The results on recognition performance and dimensionality-reduction effectiveness are presented and discussed. The computational experiments were performed on an Intel Core i7-4770k processor (3.5 GHz, 8 MB cache) with 32 GB of RAM, running Windows 10. The algorithms were implemented in C++. The Euclidean and LCSS distance computations were sped up using Streaming SIMD Extensions and Advanced Vector Extensions. In the following, our method with the Ameva or ur-CAIM criterion used as the objective function in (15) is referred to as MOFSD-GR-Ameva and MOFSD-GR-ur-CAIM, respectively.
On all four subjects of the Opportunity dataset, Table 2 shows a comparison between the best results provided by Nguyen-Dinh et al. [19], using their proposed classifier fusion framework with a sensor unit, and the classification performance obtained by MOFSD-GR-Ameva and MOFSD-GR-ur-CAIM. Our methods consistently achieve better scores than the baseline. Although the use of Ameva brings an average improvement of 6.25%, the F1 scores on subjects 1 and 3 are close to the baseline. The current multi-class problem is decomposed using a one-vs.-all decomposition, i.e., there are m binary classifiers, each in charge of distinguishing one of the m classes of the problem. The learning datasets for the classifiers are thus imbalanced. As shown in Table 2, the choice of ur-CAIM corroborates the fact that this method is suitable for imbalanced datasets, since it improves the average F1 score by over 11%.
Figure 3 illustrates the feature-reduction rates produced by MOFSD-GR-Ameva and MOFSD-GR-ur-CAIM across all 17 gestures of the Opportunity dataset. The following analyses are made.
- 1.
The ur-CAIM criterion consistently leads to a better reduction rate (close to 80% on average). Therefore, from a design point of view, the most effective sensors, and their ideal placements, for recognizing a specific activity can be identified more readily.
- 2.
The Ameva criterion achieves a more stable standard deviation in the reduction rate across all subjects than the ur-CAIM criterion.
- 3.
Since MOFSD-GR-Ameva achieves a better recognition rate than the baseline, its more modest reduction capabilities remain acceptable (>40%).
Figure 3 and Figure 4 depict the number of discretization points yielded by the two discretization strategies across all 17 gestures of the Opportunity dataset. From the results, the following assessments can be made.
- (1)
As intended by the nature of Ameva, MOFSD-GR-Ameva yields a small number of cut points, close to the constraint (18) imposing that the template be made of at least three distinct discretization points. However, this advantage seems to limit the exploration capacity of C-MOEA/DD, since only half of the original features are discarded.
- (2)
In contrast, MOFSD-GR-ur-CAIM tends to generate larger discretization schemes than MOFSD-GR-Ameva. Since the ur-CAIM criterion aggregates two conflicting objectives (CAIM aims to generate a low number of cut points, while the pair CAIR and CAIU advocates a larger number), compromises are made.
Table 3 and Table 4 present more detailed results. They recapitulate the average and standard deviation (SD) of the number of cut points produced and the number of features selected by MOFSD-GR-Ameva and MOFSD-GR-ur-CAIM, respectively. Please note that no substantive conclusions could be drawn from the intersections between the sets of features selected for (1) a particular subject, (2) a particular gesture, and (3) a particular gesture and fold, due to the one-vs.-all decomposition approach used for this multi-class problem.