Peer-Review Record

Marfusion: An Attention-Based Multimodal Fusion Model for Human Activity Recognition in Real-World Scenarios

Appl. Sci. 2022, 12(11), 5408; https://doi.org/10.3390/app12115408

by Yunhan Zhao¹, Siqi Guo¹, Zeqi Chen¹, Qiang Shen¹, Zhengyuan Meng² and Hao Xu¹,³,*

Reviewer 1: Gregor Donaj

Reviewer 2: Philip Moore

Reviewer 3: Anonymous

Submission received: 20 April 2022 / Revised: 23 May 2022 / Accepted: 24 May 2022 / Published: 26 May 2022

(This article belongs to the Special Issue Sensor-Based Human Activity Recognition in Real-World Scenarios)

Round 1

Reviewer 1 Report

The paper presents a system for sensor data acquisition and an algorithm for human activity recognition. The presented systems have merit, but the paper needs improvements.

The authors mention previous research in the field, but they describe the field as if only methods using wearable sensors existed. While this paper also deals with data collected in such a way (by smartphone), the authors should at least acknowledge the subfield of HAR using motion sensors.

The authors refer to the "multilayer perceptron" as "multiple layer perceptron".

Line 393: why would the results of the FFT be time and frequency? They are the real and imaginary components of the coefficients (or magnitude and phase).
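The distinction raised in this comment can be made concrete with a minimal sketch. It assumes NumPy and a synthetic 200-sample window sampled at a hypothetical 50 Hz; the signal, the sampling rate, and the helper name `fft_components` are illustrative assumptions, not taken from the manuscript.

```python
import numpy as np

def fft_components(window):
    """Return the one-sided FFT of a real-valued window in both common forms."""
    coeffs = np.fft.rfft(window)  # complex coefficients, length N // 2 + 1
    return {
        "real": coeffs.real,
        "imag": coeffs.imag,
        "magnitude": np.abs(coeffs),
        "phase": np.angle(coeffs),
    }

# Synthetic example (assumed values): a 5 Hz sine, 200 samples at 50 Hz.
n_samples, rate_hz, tone_hz = 200, 50.0, 5.0
t = np.arange(n_samples) / rate_hz
window = np.sin(2 * np.pi * tone_hz * t)

parts = fft_components(window)
# The frequency axis is derived from the bin index; it is not an FFT output.
freqs = np.fft.rfftfreq(n_samples, d=1.0 / rate_hz)
peak_bin = int(np.argmax(parts["magnitude"]))
```

The FFT itself returns only complex coefficients. The frequency belonging to each coefficient follows from the bin index, the window length, and the sampling rate, which is why the number of FFT points matters when the features are described.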

Figure 2: maybe the authors can comment on the fluctuation in the data for "sitting", since one would expect the data to be more steady.

The text inside several figures is very small.

What is the order of the FFT?

Figure 3: what is "magic"?

Figure 4: what is the purpose of sending the recognition results back to the mobile device?

Formulas are usually referred to as "Equation (1), Equation (2)" etc. in papers.

The acronym OSS is not explained in the paper.

Lines 281-286: these lines mainly repeat the text of the two points above, from line 267 to 280.

Line 299: please add how the phone can or must be used online. Can the phone be online only at certain times to upload stored data, or must it always be online to transmit the data as soon as they are collected?

Please explain the differences between the sensor types in Table 1. What is the difference between "acceleration" and "linear acceleration", etc.?

Line 325: do the authors mean "acceleration" instead of "motion"?

Line 333: please add an explanation as to why the sensors cannot be aligned automatically.

Line 336: what type of format conversion has to be performed?

In Algorithm 1, please check the meaning of "M". The text states that M is the maximum length. However, in the algorithm the interval is C=[A-M,A+M], indicating that the maximum length is 2*M.
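The arithmetic behind this comment can be checked with a short sketch. The values of `a` and `m` and both helper names are hypothetical, chosen only to make the interval length visible; the sketch does not reproduce the authors' Algorithm 1.

```python
def candidate_interval(a, m):
    """Closed interval [a - m, a + m], as written in the review comment."""
    return (a - m, a + m)

def interval_length(interval):
    """Length of a closed interval given as a (low, high) pair."""
    low, high = interval
    return high - low

# Hypothetical values, used only to illustrate the arithmetic.
a, m = 100, 8
c = candidate_interval(a, m)
length = interval_length(c)  # (a + m) - (a - m) = 2 * m
```

If M is meant to be the maximum length, the interval would have to be written with a half-width of M/2; if the interval is correct as written, M is a half-width and the text should say so.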

Figure 9 shows the percentage-wise distribution of the activities. The authors may add the total number.

Line 381: "accuracy rate" is written twice.

Line 413: "training set" is written twice.

In the section on results, please check for all results and comments whether the results were obtained on the training or the test set. Why would they be obtained on the training set?

There is no particular discussion of the results, e.g., which activities are likely to be misclassified and a possible explanation why.

The paper is at some points somewhat harder to read - the English must be improved.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper has a descriptive title and suitable keywords. However, the abstract is too large and needs revision to better reflect the 'how', 'why', and 'what' for the reported study. Additionally, acronyms must be removed from the abstract with their use restricted to the main body text and defined on first use.

This article addresses a topic which is reflected in many commercial [proprietary] applications [I use the Huawei system daily] which measure human metrics including 'running', 'walking', etc. Therefore, based on the detail provided in the manuscript, I find it hard to determine where the benefits claimed by the authors improve on the commercial applications, which also use sensor-based data from mobile devices. This needs much greater discussion, including a comparative analysis between the claimed approach and commercial applications. Therefore, in my view the novelty value is limited.

I have comments:

There are some (albeit minor) issues with the use of English (principally in the use of the indefinite article and plurals). Moreover, there are paragraphs which are much too large and lack focus (see page 3). Such paragraphs need revision into smaller, more focused paragraphs.
There are many figures which are unreadable (far too small) and must be presented in a suitable size with an appropriate caption.
I would question the sampling research method: (a) 1 week is too short, and (b) the research population is too small to be representative.
The dataset must be made publicly available (clearly anonymised) to enable the reproducibility of the proposed method to be validated in independent experiments.
The proposal uses 80% of the data for training and 20% of the data for testing. Why is this approach used, when the generally accepted approach would split the dataset into three parts?
My observation relating to commercial applications is reflected in the lack of any consideration for the practical managerial significance (PMS) for the proposal along with practical application(s). The authors must address the PMS and provide a comparative analysis between the proposed approach and commercial applications.
There are many limitations, constraints, and assumptions related to such studies. Consideration of these factors must be improved in a revised manuscript.
I missed any suitable discussion section where all the topics addressed in the paper are considered, compared, and discussed. This is essential.
In such research there will inevitably be open research questions (ORQ) with related proposed solutions and potential future work. This is missing and is essential.
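The three-part split mentioned in the comment on the 80/20 proportion can be sketched as follows. The 70/10/20 percentages, the function name `three_way_split`, and the dataset size are assumptions for illustration; the review does not prescribe specific proportions.

```python
import random

def three_way_split(n_items, train_pct=70, val_pct=10, seed=0):
    """Shuffle indices 0..n_items-1 and cut them into train/val/test lists."""
    if train_pct <= 0 or val_pct <= 0 or train_pct + val_pct >= 100:
        raise ValueError("train_pct and val_pct must be positive and sum to < 100")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = n_items * train_pct // 100
    n_val = n_items * val_pct // 100
    return (
        indices[:n_train],                 # used to fit the model
        indices[n_train:n_train + n_val],  # used for tuning and model selection
        indices[n_train + n_val:],         # held out for the final evaluation
    )

# Hypothetical dataset of 1000 windows.
train_idx, val_idx, test_idx = three_way_split(1000)
```

With a separate validation part, hyperparameters and stopping criteria are chosen without touching the test part, so the reported test accuracy is not biased by model selection.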

In summary, the paper is potentially interesting but there are significant concerns (see my comments) and requires major revision and extension to provide a basis upon which the study can be effectively considered.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

In this paper, the authors have proposed An Attention-based Multimodal Fusion Model for Human Activity Recognition in Real-World Scenarios. The paper has several flaws and the following comments must be addressed before resubmission.

In Figure 1, how did you decide the parameters of the CNN layers (stride, number of filters, length of each filter, etc.)?
Can you evaluate and show the classification results for each sensor time series separately? I want to see which sensors perform better in terms of accuracy for activity recognition.
The statistical analysis of the comparison table is required to know whether the difference in the classification results is significant or not.
The advantages and the disadvantages of your work must be written.
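One way to carry out the requested significance analysis is an exact McNemar test on paired predictions from two models evaluated on the same test samples. The sketch below is an illustration under that assumption; the choice of test, the function name `mcnemar_exact`, and the correctness flags are not taken from the manuscript, and the flags are made up.

```python
from math import comb

def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired per-sample correctness flags."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same test samples")
    # Discordant pairs: samples where exactly one model is right.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0
    k = min(only_a, only_b)
    # Under H0 each discordant pair favours either model with probability 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)

# Made-up flags: 1 = correct prediction, 0 = wrong prediction.
model_a = [1] * 10 + [0] * 2 + [1] * 20 + [0] * 4
model_b = [0] * 10 + [1] * 2 + [1] * 20 + [0] * 4
only_a, only_b, p_value = mcnemar_exact(model_a, model_b)
```

The test uses only the samples on which the two models disagree, so it needs per-sample predictions rather than the aggregate accuracies shown in a comparison table.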

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have revised the paper after the first submission. My concerns from the first revision have mostly been adequately addressed. Still, I would suggest a few further improvements.

The authors have now explained the meaning of the 7 sensor data. They state that, e.g., acceleration is in 3 axes (x, y, and z), therefore 3 data streams, but that linear acceleration is their combination. That would mean that linear acceleration is only 1 data stream. The authors use a data structure to represent the data in shape [7,200,3], where 7 is the number of sensors and 3 is the number of axes. If, however, some of the sensor data is in only 1 data stream (not 3), then there is redundancy or unused space in the presented shape. How is this handled? Is the data in this case copied 3 times, are the other two padded with zeros, or is it handled in some other way? The authors should clarify this.
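The two options named in this comment, zero padding and copying, can be sketched with NumPy. The layout of six three-axis sensors plus one single-stream sensor, the random data, and the helper `to_three_channels` are assumptions for illustration; the manuscript's actual handling is exactly what the comment asks the authors to state.

```python
import numpy as np

WINDOW, AXES = 200, 3

def to_three_channels(stream, mode="zero"):
    """Expand a (WINDOW,) or (WINDOW, 1) stream to shape (WINDOW, AXES)."""
    stream = np.asarray(stream, dtype=float).reshape(WINDOW, -1)
    if stream.shape[1] == AXES:
        return stream
    if stream.shape[1] != 1:
        raise ValueError("expected 1 or 3 channels")
    if mode == "zero":
        out = np.zeros((WINDOW, AXES))
        out[:, 0] = stream[:, 0]  # real data in channel 0, zeros elsewhere
        return out
    if mode == "repeat":
        return np.repeat(stream, AXES, axis=1)  # same data in all channels
    raise ValueError(f"unknown mode: {mode}")

rng = np.random.default_rng(0)
# Assumed layout: six 3-axis sensors and one single-stream sensor.
tri_axial = [rng.normal(size=(WINDOW, AXES)) for _ in range(6)]
scalar = rng.normal(size=WINDOW)

padded = np.stack(tri_axial + [to_three_channels(scalar, "zero")])
repeated = np.stack(tri_axial + [to_three_channels(scalar, "repeat")])
```

Both variants produce the shape [7,200,3], but they present different inputs to the convolutional layers: zero padding leaves two empty channels, while copying triples the weight of one signal.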

In table 1, please check if the rotation vector is indeed unitless. Also, I am not sure if the difference between rotation vector and orientation is clear.

In line 545, the reference to the figure is broken. The text says, "Figure. ??"

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

I have read the response letter and the revised manuscript. My response is set out as follows, with each comment keyed to the related question:

Q-1: I note the reply and the proofreading (the revisions) are acceptable (subject to final proof reading by the editor)
Q-2: The issue has been generally addressed. However, there remain examples of paragraphs which lack focus and need revision by splitting into smaller more focused paragraphs (e.g., see Section 2.2)
Q-3: The problem is in fact worse in the revised abstract because now there are acronyms and undefined terms. This must be [very carefully] addressed.
Q-4: The abstract is in my view acceptable.
Q-5: The problem remains unresolved. For example see: Figures 2, 3, and 9.
Q-6: I have noted the author response and the related references. However, the very restricted population [the demographics, size, and profile] remains a concern.
Q-7: I note the observation that the dataset will be made available in a published version of the manuscript. Why is this not addressed in the review version? The lack of the dataset availability remains a concern.
Q-8: I note the response related to the related works. However, just because other studies use an 80/20 training/ testing does not make the proportion a good option. The authors need to provide improved discussion addressing why the proportion used is the correct option and why three stage process has not been used.
I note the reply to the practical managerial significance and I would agree that there are commercial applications. I appreciate the difficulty in comparing commercial applications and proposed methodologies and as such I would accept the revision.
The response for the comment related to ORQ is noted. But, ideally, the authors will identify the actual ORQs along with proposed courses of action to address the issues identified. In practice the response is quite general and should have more detail.

In general the revisions address most of the questions. However, there remain concerns which must be addressed to render the paper a suitable candidate for publication.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

My comments are correctly addressed.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
