1. Introduction
Human activity recognition (HAR) from wearable sensors is a key element of many human-centric applications, such as smart personal assistants [1], healthcare assessment [2,3,4,5,6], sports monitoring [7], and aging care [8]. In HAR, one typically first collects a training set of rich, multi-modal sensor observations, such as gravitational acceleration and Global Positioning System (GPS) readings, labeled with corresponding activity annotations, and then uses these data to learn a machine learning classifier that predicts activities from sensor measurements [9,10].
Many state-of-the-art approaches assume the training and testing data to be independent and identically distributed. This assumption, however, usually does not hold in practice: sensor data for HAR are collected from a diverse pool of individuals, and behavior patterns are person-dependent [11] owing to biological and environmental factors, meaning that the same activity can be performed differently by different individuals [1]. In practice, while data from a certain number of participants can be collected and annotated for training, the target users are usually not available at training time [9]. This is what defines cross-individual (or cross-subject) HAR.
The key challenge in cross-individual HAR is to train a model on known users' data that achieves good recognition performance on newly arriving target individuals. A standard solution is to generalize information from well-known training individuals to the target using techniques from transfer learning and domain adaptation [12]. For instance, Zhao et al. [13] tackled cross-individual HAR using a transfer learning approach based on decision trees and k-means clustering. Similarly, Wang et al. [14] explored intra-class knowledge transfer and proposed a transfer learning model for cross-domain activity recognition, while a previous study [15] focused on cross-dataset activity recognition using an approach that extracts both spatial and temporal features. Another algorithm [16] adapts to the characteristics and behaviors of different individuals with reduced training data.
However, the success of domain adaptation approaches is not always guaranteed. Existing domain adaptation approaches assume that source and target domains share an identical label space, ignoring the fact that different behavior patterns yield differently distributed sensory data across individuals. If the source and target domains are not sufficiently similar, or if the source domains contain low-quality labeled data, transferring from such weakly related sources may hinder performance on the target, a phenomenon known as negative transfer [12,17]: the source domain data and task actively degrade learning in the target domain, which happens in real-world scenarios where people behave diversely [12]. Although avoiding negative transfer is an important issue, little research has been published that analyzes or predicts negative transfer for cross-individual HAR tasks. Another issue with these strategies is that they are entirely black-box: it is difficult to extract the reasons why information from a particular source individual was transferred to the target. This is problematic, partly because it makes it hard to identify bugs in the transfer learning step and partly because it prevents stakeholders from controlling and debugging the transfer process.
Motivated by these observations, we propose X-WRAP (eXplain, Weight and Rank Activity Prediction), a novel approach for cross-individual HAR designed to achieve high transfer performance, interpretability, and controllability, and specifically tailored to realistic settings where training data for the target individual are scarce. X-WRAP takes an existing baseline transfer learning algorithm and learns a source selection model that identifies the (approximately) optimal source individuals for the given algorithm. The possible sources are first ranked using a linear scoring function over interpretable meta-features. The latter encode properties of the candidate sources as well as their relation to the target individual, enabling the scoring function to properly disambiguate between promising and unpromising candidates. Then, X-WRAP selects the higher-scoring candidates using a learned threshold and applies the baseline transfer algorithm to them to obtain a predictor for the target. The parameters of the scoring function and the threshold are learned so as to generalize across different sources and targets, enabling application to previously unobserved subjects about whom little is known. The learning problem itself involves repeatedly invoking and evaluating the baseline transfer algorithm, which can be computationally expensive. To cope with this, X-WRAP leverages state-of-the-art Bayesian optimization (BO) algorithms [18]. This architecture offers several major technical novelties. First, X-WRAP facilitates introspection of the transfer process by selecting candidate sources, improving the performance of cross-individual HAR and avoiding negative transfer. Second, it is completely model- and transfer-algorithm-agnostic, making it possible to use any state-of-the-art transfer learning algorithm. Finally, BO enables X-WRAP to find high-quality parameters while keeping the number of evaluations of the learning objective (and hence the number of calls to the costly baseline transfer algorithm) to a minimum.
Summarizing, our contributions are as follows:
We propose X-WRAP, a simple but effective approach for cross-individual HAR that optimizes source selection for transfer learning algorithms in a way that generalizes across individuals with heterogeneous sensor data.
We propose to use a Bayesian optimization strategy for training the meta-transfer learning framework in an efficient and model-agnostic fashion. The training process includes a data masking strategy implemented in the meta-learning loop.
X-WRAP builds on an interpretable ranking step and thus enables stakeholders to obtain a high-level understanding of the reasons behind the selective transfer and to control (or debug) the HAR system.
We report an extensive empirical evaluation of X-WRAP on a real-world dataset with several baseline transfer learning algorithms. Our results indicate that X-WRAP improves the post-transfer performance of cross-individual HAR. In addition, the label-level results indicate the consistent superiority of our approach for almost all activities. Furthermore, we set up experiments demonstrating that X-WRAP is explainable and controllable.
The remainder of the paper is structured as follows: Section 2 positions X-WRAP with respect to existing approaches. Section 4.3.1 motivates selective transfer learning for cross-individual HAR. Section 3 introduces X-WRAP, the proposed explainable meta-transfer algorithm for cross-individual HAR, and Section 4 describes and discusses our experimental evaluation of X-WRAP on real-world data. Finally, Section 6 presents some concluding remarks and illustrates promising directions for future work.
3. Method
3.1. Problem Formulation
In the simplest case of HAR, examples annotated by a given individual are used to learn a machine learning classifier that generalizes to unseen inputs from the same individual. In this work, we tackle cross-individual human activity recognition, a more challenging and realistic setting where a HAR predictor has access to annotations from a whole pool of individuals and is applied to potentially different target individuals. Cross-individual HAR is very common in real-world applications.
In the following, sensor measurements (such as acceleration, GPS coordinates, etc.) are encoded as a vector $\mathbf{x} \in \mathbb{R}^d$ and activities (e.g., "running", "walking", "swimming") as a one-hot vector $\mathbf{y} \in \{0, 1\}^A$, where $A$ is the number of alternative activities and $y_a$ is the annotation of activity $a$. We assume access is given to $N$ training sets $D_1, \dots, D_N$, one for each individual $u \in \{1, \dots, N\}$, and a target user $t$ for which training data are scarce or absent. The goal is to compute a predictor $f_t$ that performs well for user $t$ by leveraging the training data available for one or more of the other users.
In cross-individual human activity recognition settings, each individual $u$ has unique characteristics and behaviors. Different individuals also have different priors over activities. Formally, this means that both the prior over activities $p_u(\mathbf{y})$ and the conditional distribution of sensor observations given activities $p_u(\mathbf{x} \mid \mathbf{y})$ depend strongly on the individual $u$.
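To make the notation concrete, the following is a minimal sketch of this data layout; the dimensions, names, and values are purely illustrative and not taken from our dataset.

```python
import numpy as np

A = 5   # number of alternative activities (illustrative)
d = 6   # sensor feature dimension, e.g., 3-axis acceleration plus GPS-derived features

def one_hot(activity_index: int, num_activities: int = A) -> np.ndarray:
    """Encode an activity as a one-hot vector y with y[a] = 1."""
    y = np.zeros(num_activities)
    y[activity_index] = 1.0
    return y

# A per-user training set D_u: sensor vectors x paired with one-hot activity labels y.
rng = np.random.default_rng(0)
X_u = rng.normal(size=(100, d))                         # 100 sensor measurements
Y_u = np.stack([one_hot(rng.integers(A)) for _ in range(100)])
```

Each of the $N$ known users contributes one such pair, while the target user $t$ may contribute only unlabeled sensor measurements.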
3.2. Overall Architecture of X-WRAP
Based on the motivation discussed in Section 4.3.1, we propose X-WRAP (eXplain, Weight and Rank Activity Prediction), a meta-transfer learning approach for cross-individual human activity recognition tasks, also specifically designed for interpretability and control. As shown in Figure 1, our proposed model has three main modules that selectively transfer from multiple source domains for cross-individual human activity recognition:
Firstly, a meta-training procedure learns a meta-model that selects proper source individuals according to various statistical meta-features.
Then, the meta-model is applied within the transfer learning pipeline to selectively transfer knowledge from existing individuals: a transfer learning model adapts a model trained on the selected source domains, using a domain classifier to minimize the distribution distance between the source and target domains.
Based on these two procedures, we can deploy our system to achieve selective and explainable HAR according to the parameters of the meta-model. In addition, the system empowers the coordinator with control over the meta-model, providing an explainable report and a tool for adjusting the parameters in order to gain better performance.
3.3. Meta-Transfer Mechanism
Specifically, X-WRAP takes a baseline transfer learning algorithm $T$, such as CORAL [61], a set of candidate sources $S \subseteq \{1, \dots, N\}$, a target individual $t$, and an activity $a$, and outputs a subset of sources $S^* \subseteq S$ that is approximately optimal for predicting activity $a$ with $T$.
The core of X-WRAP is the ranking step responsible for identifying the source individuals. X-WRAP sorts the potential sources using a linear scoring function $\sigma(s, a, t)$, defined as follows:

$$\sigma(s, a, t) = \sum_{j} w^a_j \, \phi_j(s, a, t) \qquad \text{(1)}$$

Here, the $\phi_j$'s are meta-features that capture salient information about the source $s$, activity $a$, and target $t$, while the $w^a_j$'s are learned weights chosen specifically so as to score more beneficial source individuals higher than the others, using the training procedure described below. Notice that the weights are shared by all sources and targets but differ across activities $a$; however, the score itself does depend on all three elements through the meta-features.
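As an illustration, scoring and ranking the candidate sources amounts to one dot product per source; the weights and meta-feature values below are made up for exposition.

```python
import numpy as np

def score(weights: np.ndarray, meta_features: np.ndarray) -> float:
    """Linear score: the dot product of activity-specific weights and meta-features."""
    return float(weights @ meta_features)

# Hypothetical learned weights for one activity, and meta-feature vectors
# phi(s, a, t) for three candidate source users (all values illustrative).
w_a = np.array([0.8, -0.5, 0.3])
phi = {
    "user_1": np.array([0.9, 0.2, 0.5]),
    "user_2": np.array([0.4, 0.8, 0.1]),
    "user_3": np.array([0.7, 0.3, 0.9]),
}

# Rank candidate sources from highest to lowest score.
ranking = sorted(phi, key=lambda s: score(w_a, phi[s]), reverse=True)
```

Because the weights are shared across sources, the same scoring function can rank any pool of candidates for a given activity.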
After ranking the candidates, X-WRAP chooses those that score above a certain threshold $\tau_a$, which is also a parameter to be learned. Specifically, our approach returns a selected subset $S^*$, defined as follows:

$$S^* = \{\, s \in S : \sigma(s, a, t) \ge \tau_a \,\} \qquad \text{(2)}$$

where $\mathbf{w}^a$ and $\tau_a$ are learnable parameters, $S$ is the set of source domains, and $\tau_a$ is the threshold used to select proper source users for activity $a$. Thus, Equation (2) selects the source domains whose scores exceed the threshold. X-WRAP then simply applies the transfer learning model to the selected individuals, obtaining a predictor for activity $a$ that leverages the information of the sources in $S^*$. The transfer learning model here is a deep domain adaptation approach, as shown in Figure 1. The transfer model contains three parts, similar to previous work [62], and maximizes the similarity between data classes across domains. The first part is an LSTM-based feature extractor. Then, a domain classifier discriminates between the source and the target domains during training. Finally, fully connected layers classify the activities. In addition, the proposed meta-learning architecture is model-agnostic, meaning it can be applied on top of any transfer learning approach to select proper source domains.
3.4. The Meta-Features
The meta-features are designed for data valuation, i.e., for selecting the best source domains, and they must satisfy three desiderata. First, they should enable the ranking function to identify source individuals that are likely to provide useful information. For instance, candidates on which the HAR predictor performs poorly may be inadequate as sources for the transfer. Second, they should highlight similarities and dissimilarities between the distributions of activities and sensor measurements of the candidate source and target individuals. This further facilitates discriminating between promising and unpromising individuals for a given target. Third, they should be understandable to human stakeholders. This is essential for enabling sufficiently expert users to understand why selected individuals have achieved a high enough score and to provide corrective feedback on the behavior of X-WRAP.
A previous study [63] surveyed the principles for selecting valuable data, which comprise model-driven and data-driven principles. Model-driven principles evaluate the data by training a model and measuring performance on a validation set. Previous studies compute metrics, such as accuracy and negated loss functions, on a validation set to evaluate the data [64,65]. Data-driven principles instead compute metrics, such as frequency, diversity, and similarity to a reference distribution, directly on the dataset rather than through a model. Frequency, or monotonicity, refers to the amount of data: having more data is more valuable [66]. Diversity indicates that the data points cover a larger region of the input space; a dataset can improve the model's predictive performance if it contains more intra-diversity [67,68]. In transfer learning, similarity to the reference distribution is crucial for transfer performance. Tay et al. [67] measured similarity using the translated negative maximum mean discrepancy (MMD) between two distributions to evaluate the quality of a dataset.
Motivated by these previous studies, we implement two groups of meta-features satisfying the above desiderata, which are concatenated by X-WRAP and used together in Equation (1). The meta-features in the first group capture information about a single individual, while the second group is concerned with how dissimilar the source and target individuals are. The computations of the meta-features are shown in Table 1. The meta-features can be categorized as follows:
Meta-features based on predictability: By predictability (or discriminability), we refer to how easy it is to classify a particular activity $a$ using a supervised classifier, 1-nearest neighbor (1-NN), fit on the training set of an individual $u$. In order to estimate predictability, for each candidate source $u$, we hold out part of the training set as a validation set on which we compute multiple metrics, including accuracy and the $F_1$ score, and use the rest as the training set.
Meta-features based on diversity: By diversity, we mean the intrinsic heterogeneity of an individual's activity patterns. For each individual $u$, we measure this by computing the number of distinct activities that they perform and the Shannon entropy of the activity annotations available in the training set. These meta-features model the intrinsic difficulty of predicting the behavior of an individual $u$ and are useful for preventing unpredictable individuals from being used as sources.
Meta-features based on frequency: We also account for the frequency of the context labels by computing the number of times a certain activity occurs in the training annotations.
Meta-features based on dissimilarity: X-WRAP measures dissimilarity using the maximum mean discrepancy (MMD), a well-known statistic that estimates how "different" two (empirical) distributions are and that has found ample application in hypothesis testing [69] and domain adaptation [70].
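For concreteness, the four families of meta-features can be sketched as follows. These are simplified illustrative versions written for exposition, not the exact formulas of Table 1 (in particular, the linear-kernel MMD below is the simplest variant of the statistic).

```python
import numpy as np

def label_entropy(labels: np.ndarray) -> float:
    """Diversity: Shannon entropy (bits) of one individual's activity annotations."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def activity_frequency(labels: np.ndarray, activity: int) -> int:
    """Frequency: number of times a given activity occurs in the annotations."""
    return int((labels == activity).sum())

def one_nn_accuracy(X_train, y_train, X_val, y_val) -> float:
    """Predictability: validation accuracy of a 1-nearest-neighbor classifier."""
    preds = [y_train[np.argmin(((X_train - x) ** 2).sum(axis=1))] for x in X_val]
    return float((np.array(preds) == y_val).mean())

def mmd_linear(X: np.ndarray, Y: np.ndarray) -> float:
    """Dissimilarity: squared MMD under a linear kernel, which reduces to the
    squared Euclidean distance between the two sample means."""
    delta = X.mean(axis=0) - Y.mean(axis=0)
    return float(delta @ delta)
```

The first three functions depend only on the candidate source, while the MMD compares the source and target samples; together they populate the vector $\phi(s, a, t)$ of Equation (1).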
3.5. Fitting X-WRAP Using Meta-Learning
The only remaining element is the learning objective. Recall that our aim is to acquire, for each activity $a$, the parameters $(\mathbf{w}^a, \tau_a)$ that ensure that the selection function in Equation (2) picks source users that are maximally beneficial for the baseline transfer learning algorithm. Moreover, and crucially, we want these parameters to work well regardless of the choice of source and target individuals.
X-WRAP achieves this using a simple but effective meta-learning strategy. At a high level, the idea is to directly optimize the performance of the predictors output by the baseline transfer mechanism $T$ when applied to the known users $\{1, \dots, N\}$. Formally, let $\ell(f, a, t)$ be a loss function that measures the quality of the predictions of a classifier $f$ for activity $a$ and individual $t$, e.g., the number of mistakes or the binary cross-entropy. In addition, let $f_S$ be the predictor output by running the baseline transfer learning algorithm $T$ on sources $S \subseteq \{1, \dots, N\}$. Then, the improvement in prediction performance due to transferring from $S$ compared to not transferring at all is as follows:

$$\Delta(S, a, t) = \ell(f_{\varnothing}, a, t) - \ell(f_S, a, t) \qquad \text{(3)}$$
and the average improvement due to parameters $(\mathbf{w}^a, \tau_a)$ is given by the following:

$$B(\mathbf{w}^a, \tau_a) = \mathbb{E}_{S,\, t}\!\left[\Delta\big(S^*(\mathbf{w}^a, \tau_a),\, a,\, t\big)\right] \qquad \text{(4)}$$
Notice that the expectation runs over all possible choices of source individuals $S$ and targets $t$, and the overall best parameters are those that maximize it. In practice, however, this expectation cannot be computed because the ground-truth distribution of individuals is unknown. X-WRAP works around this issue by optimizing a leave-one-out estimator of the expected benefit over the known users $\{1, \dots, N\}$, namely,

$$\max_{\mathbf{w}^a,\, \tau_a} \; \frac{1}{N} \sum_{u=1}^{N} \Delta\big(S^*(\mathbf{w}^a, \tau_a;\, \{1, \dots, N\} \setminus \{u\},\, u),\, a,\, u\big) \qquad \text{(5)}$$

In words, for each activity $a$, X-WRAP seeks parameters that lead to a high post-transfer performance when applying the baseline transfer learning algorithm to a subset $S^*$ of the individuals known to the system, $\{1, \dots, N\}$ minus the one held-out individual $u$ used as the target. Naturally, both training and validation data are available for these individuals, making it possible to compute the performance and the meta-features in an unbiased manner.
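The leave-one-out estimator can be sketched as follows, with the baseline transfer algorithm $T$ and the loss $\ell$ abstracted as callables; all identifiers are illustrative.

```python
import numpy as np

def select_sources(weights, tau, candidates, meta_features):
    """Equation (2): keep the candidates whose linear score reaches the threshold."""
    return [s for s in candidates if float(weights @ meta_features[s]) >= tau]

def loo_benefit(weights, tau, users, meta_features, transfer, loss):
    """Leave-one-out estimate of the benefit of transfer: hold out each known
    user in turn as the target, transfer from the selected remaining users,
    and average the improvement over not transferring at all."""
    total = 0.0
    for u in users:
        candidates = [s for s in users if s != u]
        selected = select_sources(weights, tau, candidates, meta_features)
        # Improvement = loss without transfer minus loss after transfer.
        total += loss(transfer([]), u) - loss(transfer(selected), u)
    return total / len(users)
```

Here `transfer(selected)` stands for running the baseline algorithm on the selected sources and `loss` for evaluating the resulting predictor on the held-out user; both are expensive in practice, which motivates the Bayesian optimization strategy of the next subsection.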
3.6. Training X-WRAP Using Bayesian Optimization
The issue with optimizing Equation (5) is that computing the loss involves invoking the baseline transfer learning algorithm $T$. The training and testing procedures are shown in Figure 2. X-WRAP is a meta-learning structure, and each round of the training process invokes the transfer model, which is not cheap to evaluate. Therefore, we treat Equation (5) as an expensive global optimization problem and solve it using effective Bayesian optimization (BO) algorithms [18,71]. This brings two advantages. First, BO is designed for expensive-to-evaluate problems and can keep the number of evaluations of the loss function within an acceptable budget, which is 300 rounds in this work. Second, BO does not need gradient information to explore the space of candidate parameters and therefore also works in our model- and algorithm-agnostic setting. In short, BO algorithms seek a global maximum of a black-box function $f$ by repeatedly sampling the value of the function at well-chosen inputs and fitting a surrogate that matches the corresponding outputs. The query inputs are chosen so as to maximize the expected information they convey about the structure of the function, estimated using the surrogate itself. The BO loop proceeds as in the following Algorithm 1.
Algorithm 1 X-WRAP's training procedure: $D_u$ is the training set of user $u$; $t$ and $a$ are the target user and activity, respectively; $I$ is the maximum iteration budget; and $\Theta$ is the space of possible parameters $(\mathbf{w}^a, \tau_a)$.

1:  function Evaluate($\theta = (\mathbf{w}^a, \tau_a)$)
2:    for $u = 1, \dots, N$ do
3:      rank the candidate sources $\{1, \dots, N\} \setminus \{u\}$ according to $\sigma$ (Equation (1))
4:      $S^* \leftarrow \{\, s : \sigma(s, a, u) \ge \tau_a \,\}$ (Equation (2))
5:      $f_{S^*} \leftarrow T(S^*, a, u)$
6:      $L \leftarrow L + \Delta(S^*, a, u)$ (Equation (3))
7:    return $L$

8:  function Train($D_1, \dots, D_N$, $a$, $I$, $\Theta$)
9:    # Initialize parameter-loss data
10:   sample $\theta_1, \dots, \theta_k$ at random from $\Theta$
11:   $L_i \leftarrow$ Evaluate($\theta_i$) for $i = 1, \dots, k$
12:   $H \leftarrow \{(\theta_i, L_i)\}_{i=1}^{k}$
13:   # Optimize using Bayesian optimization
14:   for $i = k + 1, \dots, I$ do
15:     fit surrogate $\hat{g}$ on observations $H$
16:     $\theta_i \leftarrow \arg\max_{\theta \in \Theta} \mathrm{acquisition}(\theta; \hat{g})$
17:     $L_i \leftarrow$ Evaluate($\theta_i$)
18:     $H \leftarrow H \cup \{(\theta_i, L_i)\}$
19:   # Return best parameters found
20:   $\theta^* \leftarrow \arg\max_{(\theta, L) \in H} L$
21:   return $\theta^*$
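For illustration, the surrogate-fit/acquire/evaluate loop can be sketched in a self-contained way with a Gaussian-process surrogate and an upper-confidence-bound acquisition over a random candidate pool. This toy version, written by us, maximizes a generic black-box objective; it is not the BO implementation used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(42)

def rbf(A, B, length=0.3):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xq, noise=1e-6):
    """GP surrogate: posterior mean and variance at the query points Xq."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Kq = rbf(Xq, X)
    mean = Kq @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Kq, np.linalg.solve(K, Kq.T))
    return mean, np.maximum(var, 1e-12)

def bayes_opt(objective, bounds, n_init=5, n_iter=20, kappa=2.0):
    """Minimal BO loop: random initial design, then repeatedly fit the
    surrogate and evaluate the UCB maximizer from a candidate pool."""
    dim = len(bounds)
    lo, hi = np.array(bounds).T
    X = rng.uniform(lo, hi, size=(n_init, dim))          # initial design
    y = np.array([objective(x) for x in X])
    for _ in range(n_iter):
        cand = rng.uniform(lo, hi, size=(256, dim))      # candidate pool
        mean, var = gp_posterior(X, y, cand)
        x_next = cand[np.argmax(mean + kappa * np.sqrt(var))]   # UCB
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    best = np.argmax(y)
    return X[best], y[best]
```

In X-WRAP, `objective` would be the Evaluate function of Algorithm 1 and the search space would be the parameters $(\mathbf{w}^a, \tau_a)$; here a one-dimensional toy objective suffices to exercise the loop.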
X-WRAP follows the meta-learning paradigm, meaning that the parameters are fitted on all training users and then evaluated on a distinct set of test users. The training set participates only in the training procedure and is used to train the meta-model, as in Figure 2 (top figure). Specifically, in the training procedure, we split the training users into source users and target users in a leave-one-user-out setting. In the testing procedure, all test users serve as target users, and the best source users are selected among the training users (who act as source users) according to the trained parameters, as shown in Figure 2 (bottom figure). Since the computation of the meta-features does not require labels from target users, the proposed model is unsupervised in this respect: only the sensory data of test users are used in the test procedure, never their labels.
3.7. Model Interpretation
Compared to traditional transfer learning, which is completely black-box, in X-WRAP it is straightforward to extract from the scoring function the reasons why a particular individual is selected (or not). This follows from two aspects. First, like other interpretable approaches [72] and explainability techniques [73], X-WRAP relies on a linear scoring function from which the relative contribution of each meta-feature can be read off from the associated weight. Second, the meta-features are themselves easy to interpret for sufficiently expert stakeholders. Indeed, prediction accuracy and entropy have an intuitive meaning for trained statisticians, while cross-individual dissimilarity (modeled by MMD) can be broken down along different and intuitive axes, such as geographical similarity. This is precisely what we do in our last experiment. This formulation ignores the information overlap between sources, which is not a serious limitation when $T$ performs learning afterward. It could be addressed by using submodular scoring functions or techniques from deep active learning, but doing so is not entirely straightforward.
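Concretely, reading off per-meta-feature contributions from the linear scoring function might look as follows; the weights, values, and feature names are made up for illustration.

```python
import numpy as np

def explain_score(weights, feature_values, feature_names):
    """Decompose a source's linear score into per-meta-feature contributions
    (weight times feature value), sorted by absolute magnitude."""
    contributions = weights * feature_values
    order = np.argsort(-np.abs(contributions))
    return [(feature_names[i], float(contributions[i])) for i in order]

# Hypothetical learned weights and meta-feature values for one candidate source.
names = ["1nn_accuracy", "label_entropy", "mmd_to_target"]
w = np.array([0.9, 0.2, -0.7])
phi = np.array([0.8, 1.5, 0.4])
report = explain_score(w, phi, names)
```

Such a report tells a stakeholder, for instance, that a source was selected mainly because of its high predictability, and a coordinator can adjust the weights or threshold accordingly.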
6. Conclusions
We introduced X-WRAP, a transfer learning approach for HAR that combines favorable accuracy with explainability and controllability. X-WRAP leverages meta-learning to acquire an interpretable source selection strategy that is (approximately) optimal for any given baseline transfer learning algorithm. Our experiments show that X-WRAP often improves the performance of the baseline while enabling human control.
Our approach can be extended in several directions. First and foremost, extending X-WRAP to sequential prediction would bring performance gains and only require minimal changes to the meta-learning procedure. Moreover, in many settings activities are organized hierarchically, implying that distinct activities may be statistically and logically related to each other. X-WRAP could be easily generalized to this setting, and the relations between activities could be leveraged to design more fine-grained meta-features. Finally, an interactive version of X-WRAP tailored for wearable personal assistants would also be useful, as it would enable the machine to acquire supervision from the target individual that is most informative in terms of transfer learning.