1. Introduction
One of the goals of smart home development is to provide residents with a comfortable and safe living space [1,2]. Smart homes are expected to prompt or warn residents about their health condition [3,4,5,6] by recognizing and forecasting upcoming daily activities [7,8,9,10]. As far as daily activity forecasting is concerned, category forecasting and occurrence time forecasting of daily activities are two key tasks. Category forecasting is devoted to predicting which daily activity is about to occur; occurrence time forecasting is devoted to predicting when a given daily activity will occur.
So far, category forecasts and occurrence time forecasts of daily activity have been explored separately rather than as a whole [11,12,13,14]. Daily activity forecasting is usually separated into several independent sub-tasks, whose results are then combined. However, these serial and separate approaches do not perform well enough [15].
To improve the performance of daily activity forecasting, this paper proposes a forecast model based on multi-task learning. The proposed model assumes that the category forecast and occurrence time forecast of daily activities are related to each other. Combining the two forecast tasks into one network model not only ensures their co-training, but also improves the generalization and performance of the model by weighting the training information shared by the two related tasks.
The key contributions of this paper are:
- (1) A daily activity forecast model based on multi-task learning is proposed. The proposed model decomposes the features of recent sensor events, and then constructs the forecast model from these generated features using multi-task learning technology.
- (2) The proposed model is evaluated in detail on five distinct datasets.
This paper is organized as follows:
Section 2 reviews related work.
Section 3 introduces the problem formulation.
Section 4 describes the datasets used.
Section 5 describes the forecast model of multi-task learning in hybrid networks.
Section 6 provides the metrics for the regression and classification tasks.
Section 7 discusses different task loss weights and sliding window sizes, further validates the proposed approach and analyzes the results.
Finally, Section 8 concludes this paper with a brief summary of our findings.
2. Related Work
For category forecasting of daily activity, Gopalratnam et al. proposed a probabilistic method based on an improved Markov model [16], without considering the uncertainty of daily activities. Alam et al. employed the SPEED model to forecast daily activity categories by analyzing the sequences of daily activities as they occurred [17]. This method was further refined by an All Discoverable Episodes (SPADE) model [12]. Channe et al. used an Apriori model to mine frequent control sequences in sensor data [18]; however, due to the chaotic nature of the control sequences in different time periods, the prediction performance was poor. Neural networks have also been used in sequence prediction research and achieved improved performance. Sungjoon et al. used a hybrid network framework to predict various daily activities [19]. Recurrent neural networks (RNNs) [20,21,22,23,24] and LSTM networks [13] have been studied in depth for daily activity forecasting. Although these neural networks improved forecast performance to some extent, they only trained the category forecast model of daily activities and ignored the time information of the daily activities themselves.
For occurrence time forecasting of daily activity, popular models include the autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models [25]. Scellato et al. forecasted the timing and duration of daily activities by analyzing the averages of data from previous similar sequences [26]. Rule-based models have been employed for occurrence time forecasts, but these could not account for more complex daily activities [2,27]. A non-linear autoregressive network with exogenous inputs (NARX) was used to predict the start and end times of sensor activation, but it was not effective for the related prediction of daily activities [28]. Mahmud et al. forecasted the next daily activity occurrence time based on a Poisson process [29]. Similarly, Minor et al. independently trained a regression model for the occurrence time of specified daily activities based on additional feature sets [14,30]. Due to accessibility limitations, it was not always feasible to add such additional feature sets to the model.
For daily activity forecasting, Nazerfard et al. used Bayesian networks to forecast daily activity, and constructed a normal mixture model based on an expectation maximization (EM) algorithm to obtain the occurrence time range. Since the time forecast relied heavily on the activity label predicted in the previous step, error propagation occurred easily [15]. A combination of LSTM and k-means was used to solve the prediction problem of the next sensor event, but these were essentially independent models for sensor and trigger time forecasting [31]. To the best of our knowledge, all prior forecast strategies dealt with a single forecast task independently, without parallel training of the two tasks. Thus, the correlation information between the originally related tasks was missing.
Multi-task learning has replaced conventional independent learning for multiple related tasks, with the aim of improving model generalization ability [32,33,34,35]. Neural network-based multi-task learning has been applied in many fields [36,37,38,39]. Long et al. added matrix priors to the fully connected layer to learn the relationships between tasks; due to the need for a predefined shared structure, there was an error bias for new tasks [40]. Cross-stitch networks solved the lack of universality of multi-task network structures, but many parameters in the model were redundant [41]. Reference [42] was similar to [41] in essence, but the algorithm was relatively simple. Li et al. utilized a 3D CNN combined with multi-task learning to extract spatiotemporal features, with an attention-based LSTM then used for feature embedding; however, outliers were not handled effectively, which could affect model performance [43]. According to the needs of each task, stochastic filter groups partitioned the convolution kernels of each convolutional layer in CNNs [44]. Other networks, such as branched multi-task networks [45], sluice networks [46] and learning sparse sharing [47], address multi-task sharing issues, but they are difficult to train due to the high complexity of the models. Low-supervision [48] and self-supervised learning [49] have also been used for part-of-speech tagging and other problems in the NLP field. In the image application field, Yang et al. extended model parameter division to obtain the correlation coefficients between shared parameters and tasks [50]. Reference [51] described a soft attention mask learned jointly with features in the shared network to maximize the generalization of shared features across multiple tasks.
The proposed approach falls into the field of daily activity forecasting in smart homes. To the best of our knowledge, the state of the art has focused on either forecasting daily activities or forecasting when a given daily activity will occur. In the approach presented in this paper, multi-task learning is employed for the first time to forecast daily activity. Compared with the state of the art, the proposed approach performs the two tasks as a whole. Based on the nature of multi-task learning, each task's forecast result is learned mutually and iteratively in order to improve the forecast performance of both tasks.
4. Dataset Description
Five publicly available datasets, "MavLab", "Adlnormal", "Cairo", "Tulum2009" and "Aruba", were used to evaluate the proposed approach [52,53]. "MavLab" was published by the University of Texas. "Adlnormal", "Cairo", "Tulum2009" and "Aruba" were published by the Center for Advanced Studies in Adaptive Systems (CASAS). The kinds of sensors, locations of sensors and categories of the involved daily activities are displayed in Table 1.
The training data in this paper consist of a series of raw sensor events E = {e1, e2, ..., en}. As shown in Figure 2, one sensor event e is recorded per line and is expressed as a four-tuple e = (D, T, I, R), where D and T are the date and time when e was generated, I is the identification of the active sensor, and R is the sensor reading. For example, the sensor event shown in line 7 was generated at 07:58:45.794425 on 2011-06-15: the activated sensor is M008, the reading is ON, and the sensor event labels the beginning of an eating activity.
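The four-tuple record above can be read directly from a raw log line. The following sketch assumes whitespace-separated fields in the order shown in Figure 2; the function name is ours, not from the datasets' tooling:

```python
def parse_sensor_event(line):
    """Parse one raw sensor-event line into the four-tuple e = (D, T, I, R).

    Assumes whitespace-separated fields (date, time, sensor id, reading),
    a simplification of the CASAS log format.
    """
    date, time, sensor_id, reading = line.split()[:4]
    return date, time, sensor_id, reading
```

For the example event in line 7, `parse_sensor_event("2011-06-15 07:58:45.794425 M008 ON")` yields `("2011-06-15", "07:58:45.794425", "M008", "ON")`.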
We further use the context of the sensor event to calculate the feature vector X ∈ F for the most recent sensor event e. We also establish a multi-task learning forecast model so that the multiple forecast outputs achieve better test results (such as F-score and NRMSE).
5. Method
Here, the details of the proposed method are described. The overall framework involves three steps: initial feature generation, model architecture and training.
5.1. Initial Feature Generation
For the sequence of sensor events activated by daily activities, the initial feature vector X of the most recent sensor events for model training is generated by Algorithm 1. Algorithm 1 is divided into two phases. In the first phase (lines 2–5), the temporal features of the most recent sensor events are extracted. In the second phase (lines 6–15), the spatial features of the recent sensor events are computed according to the deployed sensor identifications S.
Algorithm 1. Generate initial feature group
Input: S, deployed sensor identifications in the smart house; E, the sequence of sensor events activated in the window
Output: X, the initial feature group
1:  X ← ∅
2:  Tef ← getFirstSensorEventTime(E)      // get time of first sensor event ef in E
3:  Tel ← getLastSensorEventTime(E)       // get time of last sensor event el in E
4:  Δt ← ComputeTimeInterval(Tef, Tel)    // compute time interval between ef and el
5:  X ← X ∪ {Tef, Tel, Δt}
6:  IS ← {I | I ∈ E(e)}                   // extract the set of sensor identifications I in E
7:  for each I in S do
8:      if I ∈ IS then
9:          IN ← ComputeNumberSensor(I, IS)   // calculate the frequency of sensor I in IS
10:         X ← X ∪ {IN}
11:     else
12:         X ← X ∪ {0}
13:     end if
14: end for
15: return X
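A minimal Python sketch of Algorithm 1 follows. The event layout (date, time, sensor id, reading) matches Figure 2, but the encoding of the temporal features as seconds since midnight is our assumption, as the paper does not fix a representation:

```python
from datetime import datetime

def generate_initial_features(events, sensor_ids):
    """Sketch of Algorithm 1: temporal features of the window followed by
    per-sensor activation counts. Each event is a (date, time, sensor_id,
    reading) tuple; timestamps are assumed to be 'YYYY-MM-DD HH:MM:SS'."""
    fmt = "%Y-%m-%d %H:%M:%S"
    t_first = datetime.strptime(events[0][0] + " " + events[0][1], fmt)
    t_last = datetime.strptime(events[-1][0] + " " + events[-1][1], fmt)
    delta_t = (t_last - t_first).total_seconds()  # time interval of ef and el

    def to_secs(t):  # seconds since midnight, an assumed encoding
        return t.hour * 3600 + t.minute * 60 + t.second

    # Phase 1 (lines 2-5): temporal features.
    features = [to_secs(t_first), to_secs(t_last), delta_t]

    # Phase 2 (lines 6-15): activation frequency of every deployed sensor.
    counts = {}
    for _, _, sid, _ in events:
        counts[sid] = counts.get(sid, 0) + 1
    for sid in sensor_ids:
        features.append(counts.get(sid, 0))  # 0 if sensor did not fire
    return features
```

Sensors in S that never fire in the window contribute a 0, so the feature vector has a fixed length regardless of window content.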
5.2. Model Architecture
Multi-task learning based on a neural network is a common method in practical applications. Caruana demonstrated early success in this research field [54]. Next, we provide a brief overview of our multi-task architecture. The network architecture deeply mines the input data in both the vertical and horizontal directions, as shown in Figure 3.
Each task forecasts information about the next activity from the most recent sensor events: one task forecasts the category of the next daily activity, and the other forecasts its start time. In multi-task learning, the two tasks are co-trained to boost the performance of the forecast model.
In particular, the related feature group X = {x1, x2, ..., xn} is input into a one-dimensional convolutional (Conv1D) layer to extract short-term patterns of the series. The Conv1D layer has 32 one-dimensional filters of size 5 and is followed by a rectified linear unit (ReLU) as a non-linear activation function. A max pooling layer is stacked on top of the convolutional layer; it reduces the latent representation dimension and the computation in the network. It is a moving window of size 2, where the maximum value within each window is taken as the output. The latent space consists of two shared Bi-LSTM layers of 32 and 16 units. Bi-LSTM helps efficiently discover more high-level features at different time scales, which improves forecast performance. The feature vector is then passed to shared dense layers, followed by ReLU and dropout (rate 0.2). Finally, the shared feature vector is passed to two independent dense layers: one (with Softmax activation) makes the classification judgment and outputs the probability of each category of the next daily activity; the other (with ReLU activation) makes a regression judgment and outputs the time at which the activity will occur. Task-specific loss functions are then used to learn the weights of the network.
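As a concrete illustration, the layer stack described above can be sketched in Keras. This is only a sketch: the layer sizes are taken from the text, but the input shape, padding and compile settings are our assumptions, not details specified in the paper:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_multitask_model(window_len, n_features, n_classes):
    """Sketch of the hybrid multi-task network of Figure 3."""
    inputs = keras.Input(shape=(window_len, n_features))
    # Conv1D extracts short-term patterns: 32 filters of size 5, ReLU.
    x = layers.Conv1D(32, 5, activation="relu", padding="same")(inputs)
    x = layers.MaxPooling1D(pool_size=2)(x)  # moving window of size 2
    # Two shared Bi-LSTM layers of 32 and 16 units.
    x = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(16))(x)
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    # Two task-specific heads on top of the shared representation.
    activity = layers.Dense(n_classes, activation="softmax", name="activity")(x)
    time_out = layers.Dense(1, activation="relu", name="time")(x)
    model = keras.Model(inputs, [activity, time_out])
    model.compile(
        optimizer="adam",
        loss={"activity": "sparse_categorical_crossentropy",
              "time": keras.losses.Huber()},
        loss_weights={"activity": 1.0, "time": 0.1},  # lambda_A, lambda_T
    )
    return model
```

The two heads branch only at the final dense layers, so gradients from both losses flow through the same Conv1D and Bi-LSTM weights during co-training.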
5.3. Training
Network training in this paper is a multiple regression and classification problem. Hence, it involves different loss functions for activity detection and time estimation training.
5.3.1. Category Forecast of Daily Activity
The forecast model of daily activity can estimate the next most probable activity class ak ∈ A for the features of the most recent sensor events, where A = {a1, a2, ..., aK}. Therefore, the sparse categorical cross-entropy loss function given in Equation (1) is used to train the activity detection task:
In Equation (1), aik is the ground-truth daily activity category of the i-th sample, and âik is the predicted probability of the target daily activity category. The probability values âik are obtained from the last fully connected layer of the network model.
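The loss can be computed directly from the softmax outputs. A minimal NumPy sketch, assuming integer class labels and a standard mean over samples:

```python
import numpy as np

def sparse_categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Sparse categorical cross entropy of Equation (1): y_true holds the
    integer class indices a_ik; y_pred holds the softmax probabilities
    (hat a_ik) produced by the last dense layer."""
    y_pred = np.clip(y_pred, eps, 1.0)  # guard against log(0)
    # For each sample i, take -log of the probability of its true class.
    picked = y_pred[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))
```

Because the labels are integer indices rather than one-hot vectors, only the probability assigned to the true class enters each sample's term.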
5.3.2. Occurrence Time Forecast of Daily Activity
For outliers with large differences in the dataset, we use the Huber loss function to reduce the impact of outliers to a certain extent, making training more robust to outliers. The Huber loss function is defined in Equation (2):
where ŷi is the estimated occurrence time value of the i-th sample, yi is the real value, and δ is a Huber loss hyperparameter. The choice of δ determines the behavior of the model in dealing with outliers.
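The standard Huber loss is quadratic for residuals within δ and linear beyond it, which is what limits the influence of outliers. A short NumPy sketch:

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss of Equation (2): quadratic for |residual| <= delta,
    linear beyond it, so outliers are penalized less than with MSE."""
    r = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    quadratic = 0.5 * r ** 2
    linear = delta * r - 0.5 * delta ** 2
    return float(np.mean(np.where(r <= delta, quadratic, linear)))
```

With δ = 1, a residual of 0.5 contributes 0.125 (quadratic regime) while a residual of 3 contributes only 2.5 rather than the 4.5 of a squared loss.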
The objective of this paper is to minimize the joint loss over all tasks. In particular, the joint loss function Lfull is defined by the weighted loss of all task-specific losses, where the weight parameters λA and λT are determined by the importance of each task in the overall loss. More penalty is imposed for errors on the primary task; hence, we set its weight to 10 times that of the second task.
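The weighted combination reduces to a one-line function; the 10:1 ratio below reflects the weights stated above (λA = 1, λT = 0.1):

```python
def joint_loss(loss_activity, loss_time, lam_a=1.0, lam_t=0.1):
    """Joint loss L_full: weighted combination of the task-specific losses.
    The primary (category) task is weighted 10x the time task, per the text."""
    return lam_a * loss_activity + lam_t * loss_time
```

For example, an activity loss of 0.3 and a time loss of 2.0 give a joint loss of 0.5, so large regression errors are damped rather than dominating training.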
In the forecast model of multi-task learning, as shown in Figure 3, the two tasks share features and network structure during iterative training and are separated only at the last fully connected layer. In each iteration, one task is randomly selected from the M tasks and the model is updated according to that task-specific target. Algorithm 2 is executed repeatedly until the maximum number of training epochs T is reached.
Algorithm 2. Training algorithm for the multi-task forecast model
Input: F, sequence of training datasets for the two forecast tasks; y*, the ground-truth output values
Output: ŷ*, the predicted values
1: P ← 0                                  // initialize model parameters
2: while t ≤ T do
3:     for each subtask m in M do
4:         Lm ← lossfunction_m(F)                 // loss for the m-th task
5:         Δpm ← calculateGradientDescent(Lm)     // the Adam algorithm is used to calculate the Δpm gradient descent
6:     end for
7: end while
8: return ŷ*
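The alternating update can be sketched as a plain training loop. This is a simplified illustration: the `tasks` list of (loss, gradient) callables is a hypothetical stand-in for the two task-specific objectives, and a plain gradient step replaces the Adam update used in the paper:

```python
import random

def train_multitask(params, tasks, max_epochs, lr=0.001):
    """Sketch of Algorithm 2: in each iteration one of the M tasks is
    sampled at random and the shared parameters are updated from that
    task's gradient. `tasks` is a list of (loss_fn, grad_fn) pairs."""
    for _ in range(max_epochs):
        loss_fn, grad_fn = random.choice(tasks)  # randomly select one task
        _ = loss_fn(params)                      # task-specific loss L_m
        grad = grad_fn(params)                   # gradient of L_m
        # Plain gradient step stands in for the Adam update of the paper.
        params = [p - lr * g for p, g in zip(params, grad)]
    return params
```

Because both tasks update the same `params`, information from each task's loss shapes the shared representation, which is the mechanism the paper relies on for mutual improvement.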
7. Experiments and Discussion
7.1. Experimental Setup
Five public datasets, "MavLab", "Adlnormal", "Cairo", "Tulum2009" and "Aruba", are used to evaluate the proposed model. The first is the MavLab dataset, collected in the MavHome testbed at the University of Texas (Arlington, TX, USA) [52]. The others were collected in CASAS smart homes and are provided by Washington State University [53]. Details of the five datasets are shown in Table 2.
We use sliding windows to train and test the proposed model. A fixed-length window is moved across the dataset to segment the time series data. The last event in the window is taken as the most recent sensor event, and the initial feature group for the events in the window is extracted using Algorithm 1. The window is then moved forward by the specified step size (a number of sensor events), and the process is repeated. Finally, the sample data are randomly divided into training (60%), validation (20%) and testing (20%) sets.
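The segmentation step can be sketched in a few lines, assuming the events are held in a list; the default step of 20 events matches the setting used in the experiments below:

```python
def sliding_windows(events, window_size, step=20):
    """Segment the event stream with a fixed-length window moved `step`
    events at a time. The last event of each window is treated as the
    most recent sensor event whose successor activity is forecast."""
    windows = []
    for start in range(0, len(events) - window_size + 1, step):
        window = events[start:start + window_size]
        windows.append((window, window[-1]))  # (window, most recent event)
    return windows
```

Each returned pair feeds Algorithm 1 (the window) and supplies the anchor event (the last element) for the forecast targets.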
Table 3 provides some specific parameter settings during model training.
We evaluate two factors that affect forecast performance. The first is the weight setting of the joint multi-task loss function, which uses a weighted sum of the different task losses. The second is the size of the training window, which controls how much context about recent sensor events is taken into account. Compared with the best single-task learning models, the proposed model achieves better performance.
7.2. Comparison of Different Loss Weight
Table 4, Table 5 and Table 6 show the forecast performance of the multi-task forecast method proposed in this paper. We compare the evaluation metrics of the classification and regression tasks for three groups of loss weights on three datasets (Cairo, Tulum2009 and Aruba). Training window sizes of 1000, 2000, 3000, 4000 and 5000 are used to evaluate the learning capabilities of the models. The best results are highlighted in bold underline.
The deep learning forecast model under the third set of loss weight values (λA = 1, λT = 0.1) is generally better than the other two groups. This shows that, under the optimal weights, the multiple tasks can better coordinate their training and promote each other, improving generalization. For the Cairo dataset, the performance of the third group is significantly higher than that of the other two groups in terms of F-score and NRMSE for window sizes of 1000, 3000, 4000 and 5000. The F-scores of the category forecast of daily activity are 0.9459, 0.9451, 0.9255 and 0.9405, respectively, which are improvements of 3.11%, 4.5%, 1.46% and 2% over the better outcomes of the first two groups. For occurrence time forecasts of daily activity, the NRMSE values improve by at least 6.57% compared with the results of the first two groups. For the Tulum2009 dataset, the third set beats the other groups for window sizes of 1000, 4000 and 5000: the F-score increases by 8.52%, 5.86% and 1.5%, and the NRMSE values improve by 3.79%, 1.8% and 7.15%. The Aruba dataset shows the same pattern, outperforming the other groups for window sizes of 2000 and 4000. Although it lags behind the other groups for the remaining window sizes, the other metrics of the model still perform well. Therefore, seeking better weight settings plays an important role in improving model performance.
Furthermore, to facilitate the comparison of model performance across window sizes, the six evaluation metrics for all weight settings are averaged in this paper. Figure 4 shows that the forecast model performs better under the third set of weights than under the other two sets. In particular, the category forecast of daily activity achieves significant improvements in the classification evaluation metrics.
7.3. Training Window Size
Based on the results in Table 4, Table 5 and Table 6, the performance of a multi-task forecast model partially depends on the size of the training window. Therefore, we perform the relevant verification in Table 7, Table 8 and Table 9. In the tests, a relatively small training window yields a sufficient number of test points to calculate the evaluation indexes and helps prevent overfitting, but it lacks information about daily activity. An oversized training window, in turn, can degrade performance. Therefore, sliding windows of 1000, 2000, 3000, 4000 and 5000 events are used to determine the effect of training window size on performance. For all tests, each iteration moves the window forward by 20 events.
Table 7, Table 8 and Table 9 show the test results of the six average metrics for the three weight values. These tables indicate that the optimal training window size may vary between datasets and activities. For the Cairo dataset, the overall evaluation values of the forecast model do not stand out at any particular window size; model performance is therefore not sensitive to the window size. For the Tulum2009 dataset, the model with the highest performance uses a sliding window of 3000: the average F-score reaches its best value of 0.8409, and the average NRMSE improves by at least 1.8% compared with the other tests. The Aruba dataset performs best with a window of 2000. Its average Recall of 0.8477 is slightly behind that of the 4000-window model, but the model still achieves the best values for all other average evaluation indicators at 2000. Thus, the optimal window size varies across datasets.
7.4. Daily Activity Forecast for Multi-Task and Single-Task
To check the effectiveness of the multi-task learning model for each task in daily activity forecasting, we select benchmark methods and compare the performance of each forecast task. Based on the test results of the two preceding subsections, the loss weights for multi-task learning are all set to λA = 1, λT = 0.1.
Firstly, for category forecasts of daily activity, we compare the proposed model (multi-task CNN+Bi-LSTM) with the SPADE [14], LSTM [22] and CNN+Bi-LSTM models on two datasets (Adlnormal, MavLab). The experimental results are shown in Table 10. Compared with the benchmark methods, the forecast performance of the proposed method is significantly improved. The Accuracy on the two datasets is 0.9323 and 0.8673, respectively, achieving at least 2.93% and 2.22% improvements over the other benchmark models. This shows that the proposed model performs well in the task of category forecast of daily activity.
Secondly, for occurrence time forecasts of daily activity, the proposed model is compared with other single-task learning models to check its generalization ability. We use three datasets (Cairo, Tulum2009 and Aruba) and the three evaluation metrics (NMAE, NRMSE and R2) mentioned above. The baseline methods are Bi-LSTM and CNN+Bi-LSTM. The specific results are shown in Table 11 and Figure 5.
For the Cairo dataset, the proposed model achieves 0.0971, 0.0224 and 0.965 in NMAE, NRMSE and R2, improvements of 35.44%, 24.49% and 2.71% over the best benchmark. For the Tulum2009 dataset, the proposed model is also significantly better than the other benchmark methods, with at least 18.76%, 11.68% and 1.69% improvements in the three metrics. The Aruba dataset shows the same pattern, outperforming the two single-task learning methods. This demonstrates that the proposed model can effectively forecast the occurrence time of a given daily activity.
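For reference, the three regression metrics can be computed as follows. The exact normalization convention is fixed in Section 6; dividing by the range of the ground-truth values is our assumption here:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """NMAE, NRMSE and R2 as used to score the time-forecast task.
    Normalizing MAE/RMSE by the range of y_true is an assumption."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rng = y_true.max() - y_true.min()
    nmae = np.mean(np.abs(y_true - y_pred)) / rng
    nrmse = np.sqrt(np.mean((y_true - y_pred) ** 2)) / rng
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return nmae, nrmse, r2
```

Lower NMAE and NRMSE are better, while R2 approaches 1 for a perfect forecast, which matches the direction of the improvements reported above.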
7.5. Discussion
We discuss in this section a few crucial observations from our experiments. As shown in Figure 4, all tests benefit from setting appropriate weights for the different loss functions. This gain may mainly be due to the fact that the losses of the classification task and the regression task are not of the same magnitude, so their rates of gradient descent are inconsistent. Setting different loss weights can balance them to some extent.
We also analyze the effect of the size of the sliding window on the model. The optimal window size is different for different datasets. We believe that these differences may be caused by factors such as the type of sensor used in the dataset and the relationship between activities and sensor events. Furthermore, the selection of window size must balance the need for a sufficient number of events in the training window, and the need for the number of samples for model training and performance analysis.
The performance of our multi-task learning model is better than that of the single-task learning models on multiple datasets. This may be because residents perform certain daily activities at fixed times. For example, the activity "work" might start at a set time, or residents may habitually start activities such as "sleep" and "cook" at particular times. Therefore, there is a special correlation between daily activities and their occurrence times, and the multi-task learning technique may use this latent information to improve the results of both forecast tasks. Moreover, variations between datasets do not greatly impact the predictions, which allows the multi-task daily activity forecast model to be applied in a variety of situations.
We note that the weight values of the loss function are manually adjusted in this paper, and the selected values are not necessarily the most appropriate. Consequently, further studies may be needed to automatically select more appropriate loss weights. Besides, the forecast model needs to be further improved to mine deeper correlation information between the daily activity forecast tasks.