Figure 1.
The multi-channel asymmetric auto-encoder. The model is composed of an encoder block that dedicates one encoder to each sensor channel and a decoder block that reconstructs every signal at the output. Every encoder starts with a batch normalization layer followed by four repetitions of a convolutional layer and a max pooling layer, the latter used for down-sampling. Each decoder is made up of a batch normalization layer followed by four deconvolutional layers. All convolutional and deconvolutional layers share the same number of features. The model also serves as the unsupervised baseline for examining the impact of signal reconstruction on representation learning.
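To make the layer ordering concrete, here is a minimal sketch of the encoder and decoder blocks described in Figure 1, assuming Keras. The window length, channel count, filter count, kernel sizes, and the final one-channel projection are illustrative assumptions, not values taken from the paper.

```python
from tensorflow.keras import layers, Model

WINDOW = 128    # assumed samples per sliding window
CHANNELS = 6    # assumed number of sensor channels
FILTERS = 16    # assumed; all conv/deconv layers share the same feature size

def build_encoder(name):
    # One encoder per channel: BatchNorm, then 4 x (Conv1D + MaxPool).
    inp = layers.Input(shape=(WINDOW, 1))
    x = layers.BatchNormalization()(inp)
    for _ in range(4):
        x = layers.Conv1D(FILTERS, 3, padding="same", activation="selu")(x)
        x = layers.MaxPooling1D(2)(x)  # down-sampling
    return Model(inp, x, name=name)

def build_decoder(name):
    # One decoder per channel: BatchNorm, then 4 deconvolutional layers.
    inp = layers.Input(shape=(WINDOW // 16, FILTERS))
    x = layers.BatchNormalization()(inp)
    for _ in range(4):
        x = layers.Conv1DTranspose(FILTERS, 3, strides=2, padding="same",
                                   activation="selu")(x)
    out = layers.Conv1D(1, 3, padding="same")(x)  # assumed projection back to one channel
    return Model(inp, out, name=name)

# Wire one encoder/decoder pair per sensor channel.
inputs = [layers.Input(shape=(WINDOW, 1), name=f"ch_{i}") for i in range(CHANNELS)]
latents = [build_encoder(f"enc_{i}")(x) for i, x in enumerate(inputs)]
recons = [build_decoder(f"dec_{i}")(z) for i, z in enumerate(latents)]
autoencoder = Model(inputs, recons, name="multi_channel_ae")
autoencoder.compile(optimizer="adam", loss="mse")
```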
Figure 2.
The supervised baseline. Inspired by the unsupervised multi-channel asymmetric auto-encoder, this supervised baseline only uses the encoder block to perform activity classification. It is also used to examine the recognition performance in the absence of signal reconstruction.
Figure 3.
The proposed multi-task learning model (MCAE). The proposed architecture combines the unsupervised (signal reconstruction) and supervised (HAR classification) tasks. Thus, this multi-task learning model is formed of the multi-channel asymmetric auto-encoder and a classification head for the HAR task. The classification head flattens the latent features and passes them to a batch normalization layer followed by a dense layer with a Softmax activation function.
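Building on the sketch above, the multi-task model of Figure 3 can be assembled by attaching the classification head to the concatenated latent features; the number of classes and the equal loss weighting are assumptions, not values from the paper.

```python
from tensorflow.keras import layers, Model

NUM_CLASSES = 6  # assumed, e.g., for UCI-HAR

# Classification head: Flatten -> BatchNorm -> Dense + Softmax, as in Figure 3.
latent = layers.Concatenate()(latents)  # latent codes from the encoders above
features = layers.BatchNormalization()(layers.Flatten()(latent))
har_out = layers.Dense(NUM_CLASSES, activation="softmax", name="har")(features)

# Joint training on reconstruction (unsupervised) and HAR classification (supervised).
mcae = Model(inputs, recons + [har_out], name="mcae")
mcae.compile(
    optimizer="adam",
    loss=["mse"] * CHANNELS + ["categorical_crossentropy"],
    loss_weights=[1.0] * CHANNELS + [1.0],  # assumed equal weighting
)
```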
Figure 4.
The loss and Root Mean Square Error (RMSE) of the unsupervised baseline for signal reconstruction using different activation functions on the UCI-HAR dataset. Although all the loss values look similar, Selu performs better, which becomes evident when looking at the RMSE.
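For reference, assuming the standard definition, the RMSE over a window of $N$ samples is

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2},$$

where $x_i$ is the original sensor sample and $\hat{x}_i$ its reconstruction.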
Figure 5.
Signal reconstruction from the UCI-HAR dataset using the unsupervised baseline and the Selu activation function. (a) shows the worst signal reconstruction in the dataset in terms of RMSE, which comes from the walking activity. For comparison, (b) presents the best reconstruction of the walking activity.
Figure 6.
Latent space comparison. In (a), the latent space of the unsupervised baseline, one activity, lying, is far from the others. The remaining activities lie closer together but form two clusters: the walking activities form one cluster at the top, while sitting and standing form another in the middle. In contrast, in the latent space of the supervised baseline, (b), the activities are well separated. However, sitting and standing are quite close, and there is a negligible overlap between walking and walking upstairs.
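As a rough illustration, scatter plots like these can be produced by projecting the encoder's latent features to two dimensions. The paper does not state which projection was used, so the t-SNE choice below is an assumption.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_latent_space(latent_features, labels, class_names):
    # Project (n_samples, n_dims) latent features to 2-D and colour by activity.
    coords = TSNE(n_components=2).fit_transform(latent_features)
    for cls, name in enumerate(class_names):
        mask = labels == cls
        plt.scatter(coords[mask, 0], coords[mask, 1], s=5, label=name)
    plt.legend()
    plt.show()
```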
Figure 7.
In the latent space of the UCI-HAR dataset, activities are separable. There is only a small overlap between sitting and standing.
Figure 8.
The confusion matrix shows the model’s efficacy in recognizing activities in the UCI-HAR dataset with high accuracy. As one can see, the overlap between sitting and standing in Figure 7 is well represented in the matrix, where the accuracy drops to 96%.
Figure 9.
In the latent space of the MHealth dataset, static activities, such as standing and lying, are far from the others. Activities performed in a fixed position, such as the frontal elevation of arms and knee bending, are well separated from the other activities.
Figure 10.
The confusion matrix shows the performance of the proposed multi-task learning model on the MHealth dataset. We chose three random subjects for the test set, none of whom appear in the training set.
Figure 11.
The latent space of the model on the PAMAP2 dataset clarifies why the model's performance is lower compared to the MHealth and UCI-HAR datasets. On the right side of the latent space, sitting, standing, and ironing form one cluster. Here, unlike on the other datasets, our model has difficulty separating static activities. This could be due to the fact that standing and ironing share similar patterns on the sensors attached to the ankle and chest.
Figure 12.
As the confusion matrix of the trained model on the PAMAP2 dataset suggests, the model performs well in the HAR task, except that it struggles to distinguish vacuum cleaning and standing from ironing.
Figure 13.
The latent space of the multi-task learning model on the USC-HAD dataset. As one can see at the top right of the scatter plot, the two activities, elevator up and elevator down, are not distinguishable. Additionally, they share the space with standing.
Figure 14.
Confusion matrix of the trained model on the USC-HAD dataset. The accuracy drop in the classification report is well represented in the confusion matrix, where the two activities, elevator up and elevator down, are frequently misclassified as standing.
Figure 15.
The latent space of the multi-task learning model on the alpine skiing dataset. As one can see, Parallel Basic—Long and Parallel Dynamic—Short are far from each other, which explains why there is no confusion between these two techniques in the confusion matrix.
Figure 16.
The confusion matrix of the proposed model on the alpine skiing dataset shows that the model has some difficulty in distinguishing Parallel Dynamic—Short from the other activities. As represented in the latent space, there is no confusion between Parallel Basic—Long and Parallel Dynamic—Short.
Table 1.
The table lists the number of activities and their types (Activities of Daily Life (ADL) or Sports), the number of sensors and their types, the sampling frequency, and the number of subjects in each dataset. A: Accelerometer, G: Gyroscope, M: Magnetometer, ECG: Electrocardiogram, HR: Heart Rate.
| Dataset | Activities | Activity Type | Sensors | Frequency | Subjects |
|---|---|---|---|---|---|
| UCI-HAR | 6 | ADL | A, G | 50 Hz | 30 |
| MHealth | 12 | ADL and Sports | A, G, M, ECG | 50 Hz | 10 |
| PAMAP2 | 12 | ADL and Sports | A, G, M, HR | 100 Hz | 9 |
| USC-HAD | 12 | ADL | A, G | 100 Hz | 14 |
| Alpine Skiing | 4 | Sports | A, G, M | 50 Hz | 8 |
Table 2.
Classification report of the proposed multi-task learning model on the UCI-HAR dataset. The report includes precision, recall, F1 score, macro average, and weighted average. The support column shows the number of samples.
| Label | Precision | Recall | F1 Score | Support |
|---|---|---|---|---|
| walking | 0.99 | 1.00 | 1.00 | 496 |
| walking upstairs | 1.00 | 0.99 | 0.99 | 471 |
| walking downstairs | 1.00 | 1.00 | 1.00 | 420 |
| sitting | 0.96 | 0.96 | 0.96 | 491 |
| standing | 0.96 | 0.96 | 0.96 | 532 |
| lying | 1.00 | 1.00 | 1.00 | 537 |
| accuracy | | | 0.99 | 2947 |
| macro avg | 0.99 | 0.99 | 0.99 | 2947 |
| weighted avg | 0.99 | 0.99 | 0.99 | 2947 |
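Reports of this form, and the confusion matrices discussed above, can be generated with scikit-learn; the class names and label arrays below are illustrative placeholders, not data from the paper.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Illustrative placeholders; in practice, y_true and y_pred come from the test split.
class_names = ["walking", "sitting", "standing"]
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 1, 2, 1, 1, 0])

# Per-class precision/recall/F1 plus macro and weighted averages, as in Table 2.
print(classification_report(y_true, y_pred, target_names=class_names, digits=2))
# Row-normalized confusion matrix, as in the confusion-matrix figures.
print(confusion_matrix(y_true, y_pred, normalize="true"))
```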
Table 3.
Classification report of the proposed multi-task learning model on the MHealth dataset.
| Label | Precision | Recall | F1 Score | Support |
|---|---|---|---|---|
| Standing still | 1.00 | 1.00 | 1.00 | 141 |
| Sitting and relaxing | 1.00 | 1.00 | 1.00 | 141 |
| Lying down | 1.00 | 1.00 | 1.00 | 141 |
| Walking | 1.00 | 1.00 | 1.00 | 141 |
| Climbing stairs | 1.00 | 1.00 | 1.00 | 141 |
| Waist bends forward | 1.00 | 1.00 | 1.00 | 137 |
| Frontal elevation of arms | 1.00 | 1.00 | 1.00 | 132 |
| Knees bending (crouching) | 1.00 | 1.00 | 1.00 | 139 |
| Cycling | 1.00 | 1.00 | 1.00 | 141 |
| Jogging | 1.00 | 0.95 | 0.97 | 141 |
| Running | 0.95 | 1.00 | 0.98 | 141 |
| Jump forward and back | 1.00 | 1.00 | 1.00 | 45 |
| Accuracy | | | 0.99 | 1581 |
| Macro avg | 0.99 | 0.99 | 0.99 | 1581 |
| Weighted avg | 0.99 | 0.99 | 0.99 | 1581 |
Table 4.
Classification report of the proposed multi-task learning model on the PAMAP2 dataset.
| Label | Precision | Recall | F1 Score | Support |
|---|---|---|---|---|
| Lying | 0.96 | 0.98 | 0.97 | 357 |
| Sitting | 0.95 | 0.93 | 0.94 | 379 |
| Standing | 0.91 | 0.87 | 0.89 | 353 |
| Walking | 1.00 | 0.97 | 0.99 | 439 |
| Running | 1.00 | 0.96 | 0.98 | 360 |
| Cycling | 0.96 | 0.99 | 0.97 | 342 |
| Nordic walking | 1.00 | 0.99 | 0.99 | 403 |
| Ascending stairs | 0.89 | 0.94 | 0.91 | 206 |
| Descending stairs | 0.94 | 0.88 | 0.91 | 179 |
| Vacuum cleaning | 0.94 | 0.88 | 0.91 | 346 |
| Ironing | 0.90 | 0.99 | 0.94 | 540 |
| Rope jumping | 1.00 | 0.95 | 0.97 | 56 |
| Accuracy | | | 0.95 | 3960 |
| Macro avg | 0.95 | 0.94 | 0.95 | 3960 |
| Weighted avg | 0.95 | 0.95 | 0.95 | 3960 |
Table 5.
The classification report on the USC-HAD dataset shows serious confusion between standing, elevator up, and elevator down.
| Label | Precision | Recall | F1 Score | Support |
|---|---|---|---|---|
| Walking Forward | 0.98 | 0.95 | 0.96 | 1004 |
| Walking Left | 0.98 | 0.98 | 0.98 | 473 |
| Walking Right | 0.97 | 1.00 | 0.99 | 465 |
| Walking Upstairs | 0.87 | 0.88 | 0.88 | 235 |
| Walking Downstairs | 0.93 | 0.92 | 0.92 | 200 |
| Running Forward | 0.94 | 1.00 | 0.97 | 279 |
| Jumping Up | 1.00 | 0.90 | 0.95 | 232 |
| Sitting | 0.99 | 0.99 | 0.99 | 554 |
| Standing | 0.62 | 0.93 | 0.74 | 574 |
| Sleeping | 1.00 | 1.00 | 1.00 | 700 |
| Elevator Up | 0.47 | 0.31 | 0.37 | 317 |
| Elevator Down | 0.47 | 0.25 | 0.32 | 333 |
| Accuracy | | | 0.88 | 5366 |
| Macro avg | 0.85 | 0.84 | 0.84 | 5366 |
| Weighted avg | 0.87 | 0.88 | 0.87 | 5366 |
Table 6.
The classification report on the alpine skiing dataset shows the highest confusion for Parallel Basic—Short, which is misclassified as Parallel Dynamic—Long.
| Label | Precision | Recall | F1 Score | Support |
|---|---|---|---|---|
| Parallel Basic—Long | 0.96 | 0.93 | 0.94 | 356 |
| Parallel Basic—Short | 0.92 | 0.84 | 0.88 | 231 |
| Parallel Dynamic—Long | 0.84 | 0.94 | 0.89 | 269 |
| Parallel Dynamic—Short | 0.95 | 0.94 | 0.94 | 241 |
| Accuracy | | | 0.92 | 1097 |
| Macro avg | 0.92 | 0.91 | 0.91 | 1097 |
| Weighted avg | 0.92 | 0.92 | 0.92 | 1097 |
Table 7.
Comparison to the SOTA and baselines. Our supervised baseline (STL), multi-task learning with a shared encoder and decoder (CAE), and the proposed multi-task multi-channel AE (MCAE) are compared with the state of the art based on F1 score.
| | Model | UCI-HAR | MHealth | PAMAP2 | USC-HAD | Alpine Skiing |
|---|---|---|---|---|---|---|
| SOTA HAR Models | Zhang et al. [53] | 98.42 | - | - | - | - |
| | Abedin et al. [47] | - | - | 90.08 | - | - |
| | Li et al. [49] | - | - | 97.35 | - | - |
| | Zhang et al. [54] | - | - | 99.00 | 86.00 | - |
| | Sena et al. [48] | - | 93.49 | 75.82 | 80.65 | - |
| | Auh et al. [43] | - | 96.37 | 85.85 | - | - |
| | Abbaspour et al. [44] | - | - | 95.12 | - | - |
| | Tong et al. [50] | 95.45 | - | - | - | - |
| | Ek et al. [51] | 97.67 | - | - | - | - |
| Single Task Learning | Supervised Baseline (STL) | 97.53 | 99.23 | 92.66 | 76.17 | 87.64 |
| Multi-Task Learning | Classical AE (CAE) | 94.61 | 97.33 | 91.00 | 75.69 | 62.08 |
| | Multi-Channel AE (MCAE) | 98.55 | 99.58 | 94.88 | 83.90 | 91.24 |
Table 8.
A comparison of F1 scores between our proposed multi-channel AE (MCAE) trained with Selu and with Relu, and the multi-task learning model with a shared encoder and decoder (CAE).
| Model | UCI-HAR | MHealth | PAMAP2 | USC-HAD | Alpine Skiing |
|---|---|---|---|---|---|
| Ours with Selu | 98.55 | 99.58 | 94.88 | 83.90 | 91.24 |
| Ours with Relu | 95.48 | 99.93 | 93.95 | 71.64 | 86.88 |
| CAE | 94.61 | 97.33 | 91.00 | 75.69 | 62.08 |
Table 9.
A comparison in terms of the number of parameters and F1 score on the UCI-HAR and PAMAP2 datasets. Reported values by [46,49] are weighted F1 and accuracy, respectively. M: Million, K: Thousand.
| Model | UCI-HAR Parameters | UCI-HAR F1 Score | PAMAP2 Parameters | PAMAP2 F1 Score |
|---|---|---|---|---|
| Ours | 90 K | 98.55 | 258 K | 94.88 |
| CAE | 9.7 K | 94.61 | 29 K | 91.00 |
| Tong et al. [50] | 1.1 M | 95.45 | - | - |
| Ek et al. [51] | 1.27 M | 97.67 | - | - |
| Li et al. [49] | - | - | 185 K | 97.35 |
| Gao et al. [46] | - | - | 3.51 M | 93.16 |
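For models built in Keras, such as the sketches shown earlier, parameter totals of the kind reported in Table 9 can be read off directly:

```python
# Total (trainable + non-trainable) parameters of the models sketched above.
print(f"MCAE parameters: {mcae.count_params():,}")
print(f"Auto-encoder parameters: {autoencoder.count_params():,}")
```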