Peer-Review Record

UCA-EHAR: A Dataset for Human Activity Recognition with Embedded AI on Smart Glasses

Appl. Sci. 2022, 12(8), 3849; https://doi.org/10.3390/app12083849
by Pierre-Emmanuel Novac 1,*, Alain Pegatoquet 1, Benoît Miramond 1 and Christophe Caquineau 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 22 November 2021 / Revised: 4 January 2022 / Accepted: 25 March 2022 / Published: 11 April 2022
(This article belongs to the Special Issue Sensor-Based Human Activity Recognition in Real-World Scenarios)

Round 1

Reviewer 1 Report

  1. The abstract is too short and quite poorly written. The contributions are not very clear after we read the abstract. The introduction also fails to convince others that such a scheme is especially helpful to “fall prevention”.
  2. The motivation of knowing the 20 activities of a person wearing smart glasses using the neural network deployed on it may need further justification.
  3. When smart glasses are used, we thought the proposed method is a vision-based approach. However, as only a gyroscope, an accelerometer and a barometer are employed, the proposed scheme looks more like a body-worn-sensor approach.
  4. The focus of the paper is not very obvious. The authors should describe a clearer research scenario first. Then a specific implementation is presented with the related issues being solved by collecting appropriate training data and careful designs.
  5. Temporal segmentation seems a problem since it is not easy to have a series of frames (in the classification phase) within which only one activity exists.
  6. What could be the problems of possible classification delay and wrong classifications in the detection/classification of the considered applications?
  7. Writing seems a serious problem of the paper as many sentences and short paragraphs are not quite organized.
  8. Future work includes adding transition activities to the considered classes. However, it may not be easy to identify them correctly (even in the data labeling phase).
  9. Further proofreading is necessary to correct many grammar errors.

Author Response

Dear reviewer,
thank you for taking the time to review our manuscript and for providing your feedback.

1.    The abstract is too short and quite poorly written. The contributions are not very clear after we read the abstract. The introduction also fails to convince others that such a scheme is especially helpful to “fall prevention”.

 → The “fall prevention” aspect is indeed not the focus of our paper and, to the best of our knowledge, achieving a satisfactory “fall prevention” solution still remains an open issue.
It is therefore more of a long-term objective than a claim of this paper, and the work presented here is a step towards that goal. 

2.    The motivation of knowing the 20 activities of a person wearing smart glasses using the neural network deployed on it may need further justification.

 → Our dataset contains 20 subjects but only 8 activities. The choice of activities was inspired by the UCI-HAR dataset, as outlined in line 74. Additionally, these activities are simple to perform, common, and relevant for elderly activity monitoring. We specifically added the DRINKING activity because we believe dehydration can be a risk for the elderly.

3.    When smart glasses are used, we thought the proposed method is a vision-based approach. However, as only a gyroscope, an accelerometer, and a barometer are employed, the proposed scheme looks more like a body-worn-sensor approach.

 → This is indeed what has been presented in this article as explained in the introduction and throughout the article when talking about inertial measurement units. This is also highlighted at lines 58 to 60. 
Our smart glasses device does not embed any cameras. Vision-based human activity recognition is often performed using cameras from the environment, rather than cameras worn by the subject.
This could indeed be an interesting approach to study, but it is not our focus yet. Moreover, embedding cameras onto the smart glasses would bring additional challenges, since low-power microcontrollers cannot yet easily handle computer vision workloads. Cameras also raise additional concerns such as privacy and acceptability.

4.    The focus of the paper is not very obvious. The authors should describe a clearer research scenario first. Then a specific implementation is presented with the related issues being solved by collecting appropriate training data and careful designs.

 → The focus of our paper is to present a dataset for human activity recognition on smart glasses. This is the title of our paper and it is explained throughout the article.
We also believe that providing only a dataset would not represent a strong contribution to the scientific community. Therefore, we also provide an embedded machine learning solution, deployed onto the smart glasses, with classification results on our dataset, thus demonstrating that our dataset can indeed be used to perform human activity recognition on smart glasses in a real-world, real-time scenario.

5.    Temporal segmentation seems a problem since it is not easy to have a series of frames (in the classification phase) within which only one activity exists.

 → We agree with the reviewer: this can indeed be a problem during the training phase. A window can contain samples with different labels, but a single label must be chosen for the entire window: the one with the highest number of occurrences within the window. A new paragraph providing some explanations on this issue has been added in Section 5.1 (lines 222 to 225) of the revised version of our paper.
Similarly, during the inference phase, a prediction on a window of samples will only produce a single class. Other works have studied this problem (Efficient Dense Labeling of Human Activity Sequences from Wearables using Fully Convolutional Networks, R. Yao et al.), but it is not the focus of our article.
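To make the majority-vote labelling concrete, the following is a minimal sketch in Python. The window size, stride, and integer label encoding below are illustrative assumptions, not the parameters actually used in the paper:

```python
from collections import Counter

def label_windows(labels, window_size, stride):
    """Assign a single label to each sliding window by majority vote.

    `labels` is a sequence of per-sample activity labels. The function
    name and parameters are hypothetical, for illustration only.
    """
    window_labels = []
    for start in range(0, len(labels) - window_size + 1, stride):
        window = labels[start:start + window_size]
        # Keep the label with the highest number of occurrences
        window_labels.append(Counter(window).most_common(1)[0][0])
    return window_labels

# A SIT (1) to STAND (2) transition falling inside the first window:
samples = [1, 1, 1, 2, 2, 2, 2, 2]
print(label_windows(samples, window_size=4, stride=4))  # → [1, 2]
```

The first window contains three SIT samples and one STAND sample, so the whole window is labelled SIT, which is exactly the ambiguity discussed above.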

6.    What could be the problems of possible classification delay and wrong classifications in the detection/classification of the considered applications?

 → The classification delay (defined as the latency from the beginning of an event to its recognition) depends on both the length of the window, which is approximately 2.46 s in our case, and the inference time. The inference time depends on the model and the target microcontroller. With the platform and model used in this paper, the inference time is at worst 173 ms, so the maximum delay would be approximately 2.63 s.
We do not have hard real-time constraints for the applications we develop and we think a few seconds of delay is reasonable for live activity recognition.
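As a small sanity check, the delay bound follows directly from the two figures reported in this response (window length and worst-case inference time); the computation is trivial but makes the bound explicit:

```python
WINDOW_SECONDS = 2.46      # window length reported in this response
INFERENCE_SECONDS = 0.173  # worst-case inference time on the target MCU

def worst_case_delay(window_s: float, inference_s: float) -> float:
    # An event starting just after a window begins is only fully
    # contained in the next window, so the bound is window + inference.
    return window_s + inference_s

print(round(worst_case_delay(WINDOW_SECONDS, INFERENCE_SECONDS), 2))  # → 2.63
```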

Concerning wrong classifications and as shown by the confusion matrix (Figure 7) and the accuracy per class and per subject (Figure 8), we observed that wrong classifications often occur for a specific subject or activity.
For example, as explained in lines 344 to 350, our method cannot differentiate STANDING and SITTING activities properly. For elderly behaviour monitoring, if we want to know whether the person can stay in a standing position rather than having to stay seated due to fatigue or balance disorder, the proposed system cannot help. To better discriminate between these two activities, further work needs to be done.
However, our system can recognize a lying position 99% of the time across 6 subjects, and can monitor the drinking patterns of at least 3 of them (T5, T15, T20).


7.    Writing seems a serious problem of the paper as many sentences and short paragraphs are not quite organized.
 → We reworked several sentences and paragraphs in the revised version of the paper.

8.    Future work includes adding transition activities to the considered classes. However, it may not be easy to identify them correctly (even in the data labeling phase).
 
→ This is indeed not a trivial task, and it is one of the reasons why transitions have not been considered in this paper, but rather planned as future work.
However, in the current dataset, transitions such as STAND-TO-SIT, SIT-TO-STAND, SIT-TO-LIE and LIE-TO-SIT are already labelled.
These transitions between static activities are not too difficult to identify visually.
It is worth noting that some existing works focused on transitions (Transition-Aware Human Activity Recognition Using Smartphones, J.-L. Reyes-Ortiz et al.) as written at lines 75 to 79.

9.    Further proofreading is necessary to correct many grammar errors.

 → We read the whole manuscript again and corrected as many of the remaining mistakes as possible.


Sincerely yours,
Pierre-Emmanuel Novac and the authors of "UCA-EHAR: a dataset for Human Activity Recognition with embedded AI on smart glasses".

Reviewer 2 Report

From the reviewer’s knowledge, this is novel and interesting research that provides a dataset for human activity recognition using smart glasses for the general public. The information presented allows forming an overview of the provided dataset. The experimental results are convincing. Please discuss the limitations of the dataset. Can it be used with a different smart glasses device?
Regards

Author Response

Dear reviewer,
thank you for taking the time to read our manuscript. We genuinely appreciate the positive feedback.


- Please discuss the limitations of the dataset. Can it be used with a different smart glasses device?

→ Unfortunately, it would be quite difficult to reuse this dataset with a different smart glasses device. The reason is that the type of sensors (accelerometer, gyroscope and barometer here), as well as their sensitivity, range, orientation and sampling rate, greatly impact the data collected by the device.
Please note that this is a limitation that we also highlighted with other datasets at the end of the State of the Art section (lines 120 to 126). This was actually a motivation to create our own dataset. 

Keeping this dataset relevant for another device requires that this device uses the same set of sensors with the same characteristics (such as the sensitivity or range of the Ellcie Healthy smart glasses).

Nevertheless, it is likely that the proposed machine learning methodology remains relevant for data collected from another smart glasses device, and we can also expect similar results (i.e., similar accuracy) on another device. Domain adaptation could help in using a model trained with data collected from one device on a different device. Prior works have studied domain adaptation for human activity recognition across different body positions (A Systematic Study of Unsupervised Domain Adaptation for Robust Human-Activity Recognition, Y. Chang et al.). However, this is not a topic we cover in this study.

Sincerely yours,
Pierre-Emmanuel Novac and the authors of "UCA-EHAR: a dataset for Human Activity Recognition with embedded AI on smart glasses".

 

Round 2

Reviewer 1 Report

  1. The authors have addressed the issues mentioned in the 1st review. However, we still suggest the authors improve the abstract, which is a bit short and doesn't clearly show the advantages of this work.
  2. If Sec. 5, an implementation, serves as an illustration of the usefulness of the dataset, then the authors have to describe the reasons, instead of merely showing one implementation (with different levels of complexity) in this section.
  3. If the major contribution of this research is the dataset, then more description about its design is necessary. That is why I asked if it is necessary to explain the (8) different activities defined and collected in this work.

 

Author Response

Dear reviewer,
thank you for providing additional suggestions regarding our manuscript.

1. The authors have addressed the issues mentioned in the 1st review. However, we still suggest the authors improve the abstract, which is a bit short and doesn't clearly show the advantages of this work.
→ The abstract has been extended in the hope that it clarifies our work.
The following sentences have been added or modified:
"Human activity recognition can help in elderly care by monitoring the physical activities of a subject and identifying a degradation in physical abilities.
Vision-based approaches require setting up cameras in the environment, while most body-worn sensor approaches can be a burden on the elderly due to the need to wear additional devices.
Another solution consists in using smart glasses, a much less intrusive device that also leverages the fact that the elderly often already wear glasses.
In this article, we propose UCA-EHAR, a novel dataset for human activity recognition using smart glasses.
UCA-EHAR addresses the lack of usable data from smart glasses for human activity recognition purposes.
[…]
Additionally, the neural network is quantized and deployed on the smart glasses using the open-source MicroAI framework in order to provide a live human activity recognition application based on our dataset.
[…]"

2. If Sec. 5, an implementation, serves as an illustration of the usefulness of the dataset, then the authors have to describe the reasons, instead of merely showing one implementation (with different levels of complexity) in this section.
→ Some details were added at lines 241 to 245 and lines 287 to 290.

3. If the major contribution of this research is the dataset, then more description about its design is necessary. That is why I asked if it is necessary to explain the (8) different activities defined and collected in this work.
→ More details on the activities have been given at lines 176 to 189.

Sincerely yours,
Pierre-Emmanuel Novac and the authors of "UCA-EHAR: a dataset for Human Activity Recognition with embedded AI on smart glasses".
