A Machine-Learning-Based Motor and Cognitive Assessment Tool Using In-Game Data from the GAME2AWE Platform
Round 1
Reviewer 1 Report
The paper analyzes how the GAME2AWE platform can be used to assess the motor and cognitive condition of seniors in a nonintrusive way. The paper addresses a topic of wide interest to the current research community, and it is well written and organized.
I have some minor suggestions for the authors, such as the following:
- create a separate Related Work section to better illustrate the current state of the art in the addressed domain as well as the major contributions of this paper.
- the paper contains many broken references which should be fixed (e.g. on page 4 lines 150-151, etc.)
Author Response
Please see the attached file.
Author Response File: Author Response.pdf
Reviewer 2 Report
This paper explores the potential of the GAME2AWE platform in assessing the motor and cognitive condition of seniors based on their in-game performance data.
Overall:
1. The paper is well organized with a proper structure, and the bibliography is sufficient and well presented.
2. The presented methodology and the results are clearly communicated. The novel contribution of the paper is highlighted.
3. The experimental design is reasonable, and the evaluation results show the superiority of the proposed scheme.
This paper can be accepted. However, there are some points that the authors should address:
1. It would be better if you compared your work with the state of the art.
2. You have to check the references carefully because most of them show this message "Error! Reference source not found".
Author Response
Please see the attached file.
Author Response File: Author Response.pdf
Reviewer 3 Report
Summary of the paper and general considerations
The paper aims to explore the predictive abilities of different machine learning classification models in assessing the motor and cognitive conditions of the elderly based on their game performance data. In particular, the study emphasizes the development of machine learning models based on in-game data to estimate elderly individuals' motor and cognitive states. The paper describes the methodology followed in a precise and structured manner and, secondarily, provides a thorough description of the GAME2AWE platform. The results are well stratified and easily understood. The discussion deepens the results found and raises questions for reflection on the main methodological aspects of the work. The work is generally well done, follows a straightforward approach, and is easy to understand in all its sections. However, its simplicity also means that the objective of the work is not especially articulate or innovative, although it remains interesting. The use of machine learning applied to exergame data has already been covered in some studies, which, however, did not generalize as the present article does, but rather focused on very specific conditions or diseases (take, as a reference on Parkinson's disease, 10.3389/fpsyg.2022.857249).
Specific comments
In 2.1.1, the authors identify three macro-categories of devices within the platform. However, when they indicate the exergame they used, namely the Fruit Harvest exergame, they do not specify to which of these categories it belongs. Is it based on VR technology, or does it use other devices as well?
In 2.2.1, the data used by the authors come from a single game, the Fruit Harvest Game, which combines cognitive and physical activities. Why were only the data from this game used? Were other games tried? The idea is that, if one wants to investigate the ability of an algorithm to predict the physical and, separately as you did, the cognitive condition of the elderly, two different games could be used, one aimed at the cognitive aspect and one at the motor aspect, in order to have in-depth knowledge of the two different aspects. This issue should be discussed.
The authors used the MoCA as the clinical scale to assess MCI in the participants. It should be specified who proposed the cognitive threshold. Were expert neurologists involved as co-investigators? Did the authors find threshold references in the previous literature? Explain more about this important aspect.
In 2.3.2, the authors thoroughly describe the feature engineering process. The authors transformed the original features into new ones. Are the transformed features taken from any reference? Is there a study in which they were validated? Moreover, it should be specified whether a statistical analysis was conducted on the original features. Further to this point, it may be interesting to point out whether there are statistically significant correlations or differences between the new/old variables and the binary output.
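As a rough illustration of what is meant (this is not the authors' code), the point-biserial correlation between each engineered feature and the binary outcome could be reported; the sketch below assumes the features are in a pandas DataFrame and the target is a 0/1 Series.

```python
# Illustrative sketch only: point-biserial correlation between each feature and
# the binary target. `features` (DataFrame) and `target` (0/1 Series) are assumptions.
import pandas as pd
from scipy.stats import pointbiserialr

def feature_target_correlations(features: pd.DataFrame, target: pd.Series) -> pd.DataFrame:
    rows = []
    for name in features.columns:
        r, p = pointbiserialr(target, features[name])
        rows.append({"feature": name, "r_pb": r, "p_value": p})
    return pd.DataFrame(rows).sort_values("p_value")
```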
At lines 421-422, the authors say that the correlation between the features and the target is to be assessed. In this regard, there are two points to note:
· the variable-target correlation is not mentioned later, only the variable-variable correlation.
· A simple statistical analysis, such as an analysis of variance (ANOVA) test, should be done to understand the relationship between the individual variables and the binary target output.
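As a minimal sketch of the kind of analysis suggested (assuming the features are in a pandas DataFrame and the target is a binary Series; the variable names are placeholders), a one-way ANOVA could be run per feature across the two groups:

```python
# Minimal sketch: one-way ANOVA per feature between the two classes of the
# binary target, using scipy. Variable names are assumptions, not the paper's.
import pandas as pd
from scipy.stats import f_oneway

def anova_per_feature(features: pd.DataFrame, target: pd.Series) -> pd.DataFrame:
    results = []
    for name in features.columns:
        f_stat, p_val = f_oneway(features.loc[target == 0, name],
                                 features.loc[target == 1, name])
        results.append({"feature": name, "F": f_stat, "p_value": p_val})
    return pd.DataFrame(results)
```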
In 2.3.3: at the clinical level, it might be reductive, and might introduce bias, to choose a binary target abruptly. Has the choice of a multiclass target, such as score banding, been considered? For example, did the authors consider splitting the cognitive ability target into three or more severity ranges (for example, normal, MCI, dementia)? Discuss this point, possibly within the discussion of limitations.
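For illustration only, a banded target could be derived from the MoCA score along the following lines; the cut-points are deliberately left as parameters, since they should come from the clinical literature or the clinical team, not from this sketch.

```python
# Hypothetical sketch: mapping MoCA scores (0-30) to three severity bands.
# The cut-points are placeholders to be justified clinically, not values from the paper.
import pandas as pd

def band_moca(scores: pd.Series, dementia_cutoff: int, mci_cutoff: int) -> pd.Series:
    bins = [-1, dementia_cutoff - 1, mci_cutoff - 1, 30]
    labels = ["dementia", "MCI", "normal"]
    return pd.cut(scores, bins=bins, labels=labels)
```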
At line 448, the authors specify that they adopted the SMOTE technique to address the class imbalance of the dataset and increased the minority class up to the size of the majority one. Specify precisely how many instances there are before and after the augmentation, reporting explicitly the sample size of the dataset that is then used to train the models.
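For example, something along the lines of the following sketch (assuming a feature matrix X and labels y, and using imbalanced-learn's SMOTE) would make the before/after class counts explicit:

```python
# Illustrative sketch: apply SMOTE and print class counts before and after,
# so the final training sample size can be reported explicitly. X and y are assumed.
from collections import Counter
from imblearn.over_sampling import SMOTE

def oversample_and_report(X, y, random_state=42):
    print("Before SMOTE:", Counter(y))
    X_res, y_res = SMOTE(random_state=random_state).fit_resample(X, y)
    print("After SMOTE: ", Counter(y_res))
    return X_res, y_res
```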
In 2.3.4, at lines 493-494, where it is stated that "…(LOSOCV) protocol, which is commonly used when dealing with relatively low sample sizes, as in our study", a reference should be added.
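For clarity, LOSOCV corresponds to grouping the cross-validation folds by participant; a minimal scikit-learn sketch (with assumed array names, and a random forest only as a placeholder model) would be:

```python
# Minimal sketch: leave-one-subject-out cross-validation via LeaveOneGroupOut.
# `subject_ids` marks which participant each sample belongs to (assumed to exist).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

def losocv_accuracy(X, y, subject_ids):
    logo = LeaveOneGroupOut()
    scores = []
    for train_idx, test_idx in logo.split(X, y, groups=subject_ids):
        clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))
    return float(np.mean(scores))
```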
At line 505, where you state that "MCC provides a comprehensive measure of the model's prediction performance", add a reference.
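For reference, the MCC can be computed directly from the confusion-matrix counts (or with scikit-learn's matthews_corrcoef); the small sketch below is only to make the quoted claim concrete:

```python
# Sketch: Matthews correlation coefficient from binary confusion counts;
# sklearn.metrics.matthews_corrcoef(y_true, y_pred) gives the same value from labels.
from sklearn.metrics import matthews_corrcoef  # shown for reference

def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    return (tp * tn - fp * fn) / denom if denom else 0.0
```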
Section 3 schematically reports the results obtained for each algorithm in predicting the physical and cognitive targets. Taking Table 6 as a reference, we can see that the decision tree has very high accuracy but the remaining metrics are relatively lower. This denotes a poor ability of the model to generalize, which may be owed to the reduced sample size; in such a setting, the most informative metrics are F1 and recall. It may be interesting to deepen this topic within the discussion.
Moreover, why is there a gap of more than 20 percentage points between the SVM, known for its generalization capability on binary problems, and the random forest? The SVM probably needs a different kernel function, avoiding the radial basis kernel.
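One quick check (a sketch with assumed data and scoring, not a prescription) would be to re-run the SVM with several kernels under the same cross-validation scheme and see whether the gap to the random forest narrows:

```python
# Illustrative sketch: compare SVM kernels under the same CV splitter `cv`
# (e.g. the LOSOCV splits). X, y and cv are assumed to be defined elsewhere.
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def compare_svm_kernels(X, y, cv, scoring="f1"):
    results = {}
    for kernel in ("rbf", "linear", "poly", "sigmoid"):
        model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
        results[kernel] = cross_val_score(model, X, y, cv=cv, scoring=scoring).mean()
    return results
```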
Author Response
Please see the attached file.
Author Response File: Author Response.pdf