Peer-Review Record

Exploring the Influencing Factors on User Experience in Robot-Assisted Health Monitoring Systems Combining Subjective and Objective Health Data

Appl. Sci. 2023, 13(6), 3537; https://doi.org/10.3390/app13063537
by Caterina Neef *, Katharina Linden and Anja Richert *
Reviewer 2: Anonymous
Reviewer 4: Anonymous
Submission received: 16 December 2022 / Revised: 4 March 2023 / Accepted: 5 March 2023 / Published: 10 March 2023
(This article belongs to the Special Issue Advances in Intelligent Robotics in the Era 4.0)

Round 1

Reviewer 1 Report

This study evaluated the user experience and usability of a health monitoring system, and the authors achieved successful results in both cases. Obtaining the data both objectively and subjectively reveals the originality of the study: the information provided by the users must be evaluated and verified. Comparing the users with the least robot experience to the experienced users shows the importance of the study. These results indicate that previous robot experience and technological expertise play an essential role in evaluating the user experience of robot-assisted health assessments. The care environment of older adults may also play a role in this assessment, but this needs further investigation. All these factors provide important indications of what to consider when developing and evaluating health monitoring systems that provide a positive user experience, especially for older adults. In this context, publishing the study in Applied Sciences is deemed appropriate.

Author Response

Thank you very much for the positive feedback and the concise summary of our paper; it is greatly appreciated.

Author Response File: Author Response.pdf

Reviewer 2 Report

In the paper “Exploring the Influencing Factors on User Experience in Robot-Assisted Health Monitoring Systems Combining Subjective and Objective Health Data”, the authors presented a case study of user experience in robot-assisted health monitoring and assessment using wearable and assistive technology.

The title and abstract are well chosen and well written. The introduction, along with the related work, provides sufficient background on assistive technologies for health assessment and health monitoring systems. The experimental setup and methods are well defined, and the results are explained well too.

Overall, the authors have presented the work well. It can be accepted in its present form.

Author Response

Thank you very much for the positive feedback and the concise summary of our paper; it is greatly appreciated.

Author Response File: Author Response.pdf

Reviewer 3 Report

The work presents the evaluation of a user interface and health monitoring system.

The abstract needs to briefly mention the method and provide data on the results.

In the introduction, the authors mention social robots from the literature; the paper would be improved if basic information were added, such as the robot type, the tasks performed by the robot, and the key parameters relevant to their acceptance.

P7L211: the authors need to provide the mean and standard deviation of the participants' ages.

P15L474: The authors assume that the results in Session 3 were observed mostly because the participants in this group had previous experience with the robot, leading to a lack of excitement about its novelty. Can this be corroborated in the questionnaire, i.e., by asking about previous usage of the system?

Furthermore, the authors need to make clear what the main advantage is of using a robot rather than a simple computer to administer the questionnaire.

The authors need to be more specific about the tests: what was the time limit, was there ethical approval, and what was the selection process for the participants, i.e., the inclusion and exclusion criteria?

Author Response

Thank you very much for your comments and your helpful feedback, which we have addressed in the attached PDF file.

Author Response File: Author Response.pdf

Reviewer 4 Report

I was quite excited by the app.

The background and literature seemed good.

Generally, I think there is a problem with the framing: with what has been done, you cannot explore the influencing factors on user experience with such a small user base. For example, you write that previous experience had an impact; however, you only had people without experience in the first round, which used a different interface, and the groups were just 5 vs. 3 participants. You would at least need to demonstrate statistical significance.

I think you could instead focus on the development of the system. It would be interesting to hear more about the choice of monitors and the aspects the questions address. From your tests, you can conclude that it has the potential to be used for this purpose, ideally after validating the items you developed by interviewing a single person and conducting a Session 4 that includes people without Pepper experience, including another carer.

There are some missing details: for example, you mention the number of turns, but it was unclear to me how long the tests actually were and whether they covered all functions.

Figure 3 was very nice.

In Figure 4, I would have preferred information about the groups (as in Table 1) rather than User Group A / B / C.

Figure 6 seems odd; perhaps compare before and after for older people with and without Pepper experience.

Regarding the methods: in Section 2.6, I am not sure whether you need the formulas, but you should provide a reference for the calculations.

Regarding the results: according to my understanding, more questionnaires are needed for both the SUS and the UEQ (10+). The statistics chosen seem odd: a standard deviation for only 3 values? You say in the discussion that the usability is quite high, but in Fig. 6 the SUS in Session 3 appears to be below 70, which according to Brooke corresponds to only marginally acceptable usability. The acceptance by young technical men (S3) isn't meaningful.

In the discussion, it would be worthwhile mentioning the users, i.e., that user group B is a convenience sample, that there is only one carer, and that after the changes the system was only tested on people with Pepper experience. Perhaps justify why this may still be okay; it is, after all, exploratory/formative and will be followed up with long-term tests. Could the slow internet connection have affected the results?

Regarding the conclusions: rather than being related to novelty, the results could also mean that the app isn't as good as other Pepper apps. And if the effect is related to novelty, wouldn't this mean that people might not like it long term? Couldn't one use the factors to check whether it is related to novelty?

Author Response

Thank you very much for your comments and your helpful feedback, which we have addressed in the attached PDF file. 

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

The authors have attended to the comments, and the manuscript has been improved; only a small change needs to be made before publishing:

Figure 6: the scale of the y-axis needs to be adjusted to show the complete data.

Author Response

Thank you very much for your very valuable feedback; our response is in the attached PDF.

Author Response File: Author Response.pdf

Reviewer 4 Report

The tests are not conclusive as reported. The comments regarding this from the first version were not addressed.

Re the claim on line 2: "The study conducted in this work suggests that the system has a high rating for usability and user experience". This cannot be determined with only 3 users from the target group (older people): Session 2 includes only young, technical students, and Session 1 tested a different version. Presenting a graphic (Fig. 6) showing the results for students is misleading. You need to either a) make the evaluation qualitative, b) test more older people, or c) combine the results of S1 and S3.

Re the claim on line 3: "potential to be used for self-managing health and increasing health literacy". The paper mentions having participants measure their blood pressure or body temperature; I didn't see anything that could lead to the second conclusion. Did people look at these facts? Was there a question asking whether they learned something? Explain or remove the claim.

The UEQ Handbook V. 3 states: "The more data you have collected the better and more stable will be the scale means and thus the safer will be the conclusions you draw from these data. However, it is not possible to give a minimum number of data you need to collect to get reliable results. ... The more they agree, i.e. the lower the standard deviation of the answers to the items is, the less data you need for reliable results. For typical products evaluated so far around 20-30 persons already give quite stable results." The perspicuity score for the app from Session 3 (1.92) would be in the range of "good", but the number of users is insufficient, especially with this standard deviation, and -0.86 includes below-average results (for at least 1 of the 3 users).

The paper claims the SUS is good (l. 485): Bangor (2009) writes that the adjective "good" applies above 70, and the value for the final evaluation with appropriate users is 67.5. Only the technical students who had experience with the device thought it was good, and they are not like the intended users. Furthermore, Brooke (2013) says that "only" 8-12 users are needed. You would reach that number if you combined the results of S1 + S3 (assuming some of the testers weren't in both).

Fig. 6 presents the mean and standard deviation for datasets with 3 and 5 values. I spoke with a statistics expert, and they agreed that this is not meaningful and maybe even misleading; a range would be more appropriate.

Author Response

Thank you very much for your very valuable feedback; our responses are in the attached PDF.

Author Response File: Author Response.pdf

Round 3

Reviewer 4 Report

Given the authors' letter, I will accept that a "minor" revision may be sufficient. However, that should not give the impression that the paper does not require quite a few changes to the text, only that I do not require additional experiments.

I will accept your argument that, with the people from the first session, you have enough relevant test subjects, if, as you describe in the response, the changes were of a more technical nature. However, the paper needs to be changed a) to make this clear and b) to present the results in this way.

Testing with students is fine if you want to get comments about the design, but their UEQ says nothing about the system. You could perhaps compare their success and completion results to support the claim that older people do well with the system; this is actually a very interesting result. You could report (in the text) that their UEQ is in a similar range, but it is only an indication given the number of testers in S2.

I know what a standard deviation is, but it is not a meaningful value and is not usually used with just 3 values.

Since both questionnaires need more people to provide significant results, you need to combine the results of S1 and S3 for the presentation. Since the sessions are not based on exactly the same version, you need to clearly explain why they can be combined and, in addition, show the results for each individually, especially where the app gets lower results in S3. The SUS and UEQ are both worse with the new version, and this should be mentioned in the discussion. You can, however, argue that it may be an artifact of having so few subjects in S3. Do the test subjects in these sessions differ in other ways?

Invest a little time and you can make it a good paper: a scientific one.

Author Response

Thank you very much for your helpful comments, which we have responded to in the attached PDF document. 

Author Response File: Author Response.pdf
