Electronics
  • Feature Paper
  • Article
  • Open Access

Published: 26 September 2022

Quality Assessment of Virtual Human Assistants for Elder Users

1 Foundation for Research and Technology Hellas, Institute of Computer Science, GR-70013 Heraklion, Greece
2 Department of Computer Science, University of Crete, GR-70013 Heraklion, Greece
* Author to whom correspondence should be addressed.
This article belongs to the Section Computer Science & Engineering

Abstract

Virtual humans (VHs) are gaining increasing attention in various fields, including games and cultural heritage, and in technological contexts such as virtual reality and augmented reality. Since VHs can simulate human-like behavior, they have recently been proposed as virtual assistants (VAs) for all sorts of education and training applications, including applications focused on improving quality of life (QoL) and well-being. In this research work, we consider the quality and efficiency of the VHs implemented as part of the MyHealthWatcher project, which focuses on monitoring health-related parameters of elder users to improve their QoL and the self-management of chronic conditions. To validate our hypothesis that increased VH quality has a positive effect on user satisfaction and on the quality of user interaction with the system, we developed and integrated into the MyHealthWatcher system two VH variations. The first was developed with mainstream technologies and the second using a professional pipeline. The two variations were assessed by representative target users through a between-subjects focus group study. The development and validation process of the two variations allowed us to draw valuable conclusions, which are discussed in this paper.

1. Introduction

Improving the well-being and quality of life (QoL) of the elderly population is closely related to assisting them to effectively cope with anxiety, psychological distress, and chronic illnesses that often coexist with aging. In this context, modular and portable devices may provide such solutions without the need to deploy equipment in the daily environment. This can be complemented with mobile and possibly wearable sensors, thus strengthening the cost-effectiveness and adoption of such technologies and assisting the user even when mobile and outdoors.
Mobile health (mHealth) monitoring solutions are discussed in the literature as a means to assist caregivers in managing the health of older adults []. mHealth systems addressing older adults target disease management (e.g., diabetes or hypertension control), adherence to medication, psychological support, and adoption of a healthy lifestyle []. Approaches reported in the literature also focus on specific conditions faced by older adults, such as Alzheimer’s disease [], fall risk [], or lack of physical activity []. Overall, solutions addressing older adults themselves are more challenging in their design, development, and deployment, considering the need to actively involve older adults throughout the development lifecycle, but also privacy [] and user acceptance aspects [] that arise, especially in the case of approaches involving health monitoring through the integration of sensors.
At the same time, in the ICT domain, several approaches employ VAs to support elders in managing daily activities and monitoring physiological parameters, taking advantage of new technological opportunities for rendering realistic VHs to act as VAs. Integrating the main components of such technology into mHealth solutions is no longer considered a technical challenge, given that modern mobile devices comprise multi-core computing solutions with 3D rendering capacity and that a plethora of wearable devices and sensors can operate together with the mobile device. This paper reports on the optimization of such a solution developed in the context of the MyHealthWatcher project. The developed system is a health monitoring system encompassing an mHealth solution for elderly people to self-manage their general health and a web application for their healthcare professionals and caregivers to monitor the health status of the elderly person. The mHealth system integrates wearable devices to acquire measurements of common vital signs (i.e., heart rate, blood oxygen level, and stress) and a sound processing subsystem for cough detection. Particular attention has been devoted to designing the system in accordance with ethics and privacy-by-design guidelines, ensuring that older adults remain in control of their data and of whom they share them with []. The mobile system also features a VH who communicates the results of vital sign measurements to the elderly, as well as messages from third parties. User interaction with the system is possible through touch, but also through a finite set of voice commands.
The focus of this paper is on the design and development of the VH embedded in the mobile system. We strive to achieve optimal VHs in terms of realistic appearance and behavior, and we present two different approaches that were adopted, resulting in two VH versions. We propose that more realistic VHs, in terms of appearance, motion, and lip-synchronization, have an impact on overall user satisfaction with the system. Finally, we validated our hypothesis through two focus groups with elders organized to evaluate the VA's appearance and behavior.

3. MyHealthWatcher System Overview

In this work, we study the usage of VAs in the context of the MyHealthWatcher system, which comprises a portable sensor-based health monitoring system that additionally offers optional health professional monitoring capabilities through a dedicated online monitoring platform. These functionalities are divided into three physically and conceptually separate subsystems: the user/sensor/mobile device monitoring subsystem, the sound processing subsystem, and the health professional monitoring subsystem.
The user-side subsystem includes portable sensors, described later, that constantly record vital signs, as well as the user's mobile device, where the VA resides as a visual component in the accompanying MyHealthWatcher mobile app's UI. The mobile application can record (a) the user's voice commands towards the VA and (b) environmental sounds. The sound files are sent to the sound processing subsystem to recognize speech commands in the first case, or to analyze the sound wave for pathological findings, such as coughing, in the second.
On the health professional side, doctors, caregivers, or relatives, collectively referred to as “secondary users”, can invite, through the online monitoring platform, mobile users to establish monitoring relationships between them. On the other side, the elderly can also invite secondary users, through the mobile app, to monitor specific vital signs and parameters that they wish to share. The online monitoring platform handles the visualization of collected vital sign measurements, creating alerts for extreme measurement values and messages to primary users. The architecture and communication between the user-side subsystem and the rest of the components can be seen in Figure 1, and detailed descriptions of the whole system can be found in [].
Figure 1. The MyHealthWatcher system components and how they communicate with each other.
A quick overview of the UI of the mobile app and the secondary user side is presented in Figure 2.
Figure 2. (a) Mobile app. (b) Secondary user side example—patient’s profile view page.

3.1. Vital Signs and Other Measurements

For this system, we consider the following measurements: heart rate, oxygen saturation (SpO2), blood pressure, blood glucose levels, and an indicator of stress levels. The selection of the above measurements was derived from our initial user requirement analysis []. The analysis showed that these were the most common measurements taken at home by elderly persons and the most important measurements for doctors in their initial assessment of a person’s health. At the same time, these measurements can be easily collected either automatically by wearable devices or manually by traditional medical home devices. In the latter case, the user would have to manually input the result of the measurement into the system.

3.2. Electronics in the System

The process of selecting the appropriate vital sign measurement equipment for the system took several factors into account. Since the target user group is elderly people, we focused on selecting sensors that (a) did not hamper mobility, (b) were affordable and widely available, and (c) provided as many of the required measurements as possible without resorting to multiple devices. Considering the technical requirements, we opted to rely on the Bluetooth protocol, which is almost universally adopted by the available sensors. Furthermore, the capability to easily extract measurements directly from the sensors through a provided application programming interface positively affected our final selection of the system electronics.
For heart rate and oxygen saturation, we turned to a smart band/smartwatch approach that would be able to measure both. Smart bands/smartwatches are comfortable and nonintrusive everyday devices that do not require any instructions to use other than the initial connection setup and settings. We developed the mobile application with Garmin devices in mind, and in particular, the smart band Vivosmart 4 [] and the smartwatches Vivoactive 4 [] and fēnix 5X []. The mobile application, through a software development kit, automatically and periodically extracts the heart rate and oxygen saturation measurements from the connected device and makes them available to the user.
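The vendor SDK calls for reading the measurements are not detailed in this paper; purely as an illustration, the periodic extraction can be organized as a polling loop over a hypothetical wrapper interface, as in the following C# sketch (the interface, method names, and polling interval are assumptions, not part of the actual Garmin SDK).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper around the wearable SDK; the real vendor calls are not shown here.
public interface IWearableSource
{
    Task<int> ReadHeartRateAsync();        // beats per minute
    Task<int> ReadOxygenSaturationAsync(); // SpO2, percent
}

// Periodically pulls the latest heart rate and SpO2 values from the connected
// device and hands them to the application. Interval and names are illustrative.
public class VitalSignPoller
{
    private readonly IWearableSource source;
    private readonly Action<int, int> onMeasurement;

    public VitalSignPoller(IWearableSource source, Action<int, int> onMeasurement)
    {
        this.source = source;
        this.onMeasurement = onMeasurement;
    }

    public async Task RunAsync(TimeSpan interval, CancellationToken token)
    {
        try
        {
            while (!token.IsCancellationRequested)
            {
                int heartRate = await source.ReadHeartRateAsync();
                int spo2 = await source.ReadOxygenSaturationAsync();
                onMeasurement(heartRate, spo2);
                await Task.Delay(interval, token);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation simply ends the polling loop.
        }
    }
}
```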
Blood pressure and blood glucose levels require dedicated devices which were neither trivial to integrate in the application for automatic extraction nor as easy to acquire as the wrist sensors. The user can still input the values themselves through an intuitive user interface.
Regarding the stress level indicator, we briefly considered galvanic skin response (GSR) sensors, but most of them were either intrusive (i.e., using electrodes) or too expensive for widespread use. We ended up investigating the use of a wearable electrodermal activity (EDA) ring called the Moodmetric Ring [], which has been used successfully both in clinical trials [] and in more practical environments []. The ring provides an easy-to-read measurement of stress that can be used to identify sources of stress and work towards relief []. All considered devices pair with the user's mobile device via Bluetooth and periodically transmit their measurements.

4. Virtual Assistant Design

The MyHealthWatcher VA is a service running on a mobile device that includes a humanoid avatar along with several functionalities that assist the user in their everyday health monitoring needs. The agent’s primary objective is to provide a communication bridge between the user and the vital signs recorded by the wearable sensors of the system, by announcing on demand the recorded measurements requested by the user. The user can utter a plethora of speech commands that correspond to simple requests such as “tell me my heart rate”, “stress”, “show my messages”, and others. The system is flexible enough to understand both single words and short sentences that include the keyword of the request. We created two different versions of the VA which we evaluate in this paper. Each version uses a different 3D model creator, a different animation scheme, and a slightly different approach to lip-synchronization.
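The exact matching logic is not specified here; the following C# sketch illustrates one minimal way to map a recognized phrase to a request by keyword containment (the enum, method, and keyword names are illustrative assumptions).

```csharp
using System;
using System.Collections.Generic;

// Illustrative keyword-based command dispatcher: maps a recognized phrase
// (single word or short sentence) to a request type by scanning for keywords.
public enum VaRequest { HeartRate, Stress, Messages, Unknown }

public static class CommandMatcher
{
    // Keyword table; both single keywords and sentences containing them match.
    private static readonly Dictionary<string, VaRequest> Keywords =
        new Dictionary<string, VaRequest>(StringComparer.OrdinalIgnoreCase)
    {
        { "heart rate", VaRequest.HeartRate },
        { "stress",     VaRequest.Stress },
        { "messages",   VaRequest.Messages }
    };

    public static VaRequest Match(string recognizedText)
    {
        if (string.IsNullOrWhiteSpace(recognizedText)) return VaRequest.Unknown;
        foreach (var entry in Keywords)
        {
            // "tell me my heart rate" and "heart rate" both contain the keyword.
            if (recognizedText.IndexOf(entry.Key, StringComparison.OrdinalIgnoreCase) >= 0)
                return entry.Value;
        }
        return VaRequest.Unknown;
    }
}
```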
The rationale for selecting the concept of a humanoid VA was based on the requirements analysis phase of the project, conducted with end-users and analyzed in our former publication []. Furthermore, this is supported by other research efforts, such as a study on user preferences employing focus groups and interviews, which confirmed the social effects of virtual humanoid agents and highlighted the need for participatory design approaches to enhance acceptability by the target user group []. Additionally, in [], the importance of voice for enhancing the acceptance of VAs is discussed. Several other studies report on the attributes that VAs should have in order to be acceptable to older adults [].

4.1. Agent’s Architecture and Incorporated Technologies

The main technology behind the deployment, maintenance, and functionalities of the VA agent is Unity 3D []. Unity 3D (or just Unity) is a cross-platform game engine that includes, among others, an integrated development environment (editor), a rich toolset, and an asset marketplace. Its main scripting language is C#, and it can be used to activate animations, play sound files, and perform other scripting tasks. An important feature of the engine is that there is plugin support, which can significantly extend the capabilities of the editor and provide a plethora of new tools for building 3D applications.
Figure 3 shows how the different components of the VA fit together to create the final integrated result shown in the mobile application. A lip-synchronization (lip-sync) component that can interoperate with the mobile application allows for realistic communication between the user and the agent. A virtual character creator constructs the static 3D model which will be shown to the user through Unity 3D. Finally, the agent's human-like animation set is drawn from a motion provider, which differs radically between the two implemented versions of the virtual assistant. All of these components and their products are integrated into the Unity 3D environment, which constructs the 3D scene where the agent operates and awaits commands.
Figure 3. The VA is composed of a Unity scene that incorporates different technologies that work together to provide a complete VA experience.
We chose the Android mobile ecosystem to develop the mobile application. Unity 3D fully supports the compilation and export of the 3D scene to Android libraries and provides an API for managing the Unity 3D part of the application through the UnityPlayer class.
The UnityPlayer class behaves similarly to Android “Activity” classes, managing the 3D scene lifecycle to respond according to the Android application lifecycle, including handling starting/stopping (coming into view or being hidden), configuration changes, and key/touch events. The 3D scene can be placed inside an Android layout container view and can be organized flexibly along the other UI elements of the application without many constraints.
The most important part of the UnityPlayer functionalities is the UnitySendMessage method, which, as the name implies, is used to communicate with the imported 3D scene. This method’s purpose is to launch a script on a 3D object on the Unity side by specifying the object’s name, the name of the C# method, and a string parameter. The C# method is defined in a script on the 3D object (e.g., the 3D model of the agent) and can modify its characteristics, launch a new animation, play an audio file, and more. Regarding the opposite part of the communication, the Unity scene has access to Android components such as the current running activity and the UnityPlayer object; thus, any Unity C# script can access parts of the application and provide information such as when the animation has finished running.
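As an illustration of this interoperability pattern, the sketch below shows a hypothetical Unity-side C# receiver attached to the agent's 3D object; the object, method, and trigger names are ours, and the Android side would invoke it through UnityPlayer.UnitySendMessage with the object name, the method name, and a string parameter.

```csharp
using UnityEngine;

// Attached to the agent's GameObject (e.g., named "VirtualAssistant").
// The Android side triggers it with:
//   UnityPlayer.UnitySendMessage("VirtualAssistant", "PlayAnimation", "Greeting");
// Object, method, and trigger names here are illustrative.
public class AssistantMessageReceiver : MonoBehaviour
{
    private Animator animator;

    private void Awake()
    {
        animator = GetComponent<Animator>();
    }

    // Invoked via UnitySendMessage; the single string parameter selects the animation trigger.
    public void PlayAnimation(string triggerName)
    {
        animator.SetTrigger(triggerName);
    }

    // Example of the opposite direction: Unity code can reach back into the
    // hosting Android activity through the UnityPlayer Java class.
    private void NotifyAnimationFinished()
    {
#if UNITY_ANDROID
        using (var unityPlayer = new AndroidJavaClass("com.unity3d.player.UnityPlayer"))
        using (var activity = unityPlayer.GetStatic<AndroidJavaObject>("currentActivity"))
        {
            // Hypothetical Java method assumed to exist on the hosting activity.
            activity.Call("onAgentAnimationFinished");
        }
#endif
    }
}
```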
The VA needs to interact with the user using realistic-sounding speech. To enhance the experience, the humanoid avatar also needs to perform realistic body and face gestures. To avoid using generic body and face movements, we opted to use lip-synchronization technologies. Furthermore, a text-to-speech (TTS) system is required to handle any type of speech required by the project. Due to the volume of different combinations of text output (e.g., different numerical values of various measurement types), we elected to utilize the native TTS system of Android instead of a prerecorded phrase synthesis method.
To implement the aforementioned approach, we construct the text to be spoken using the appropriate information for the occasion, including numerical values if necessary, and pass it through the TTS system to produce a wave file stored in the mobile device. The file can then be played by the lip-synchronization library that is attached to the Unity 3D scene. Since the implementation details differ between the two developed versions of the virtual assistant, they are described in their respective sections.

4.2. First Implementation of VA

In this section and the next, we describe the approach we took to implement each version of the virtual assistant, the virtual character creator used, the motion provider selected, and the lip-synchronization details.

4.2.1. Virtual Character Creator

For the first implementation of the virtual assistant, we used Adobe Fuse CC, a simple 3D character creator developed by Mixamo [] and provided by its parent company Adobe Inc. This software lets users select and adjust character components, such as parts of the human body, through an intuitive interface. After selecting body part types, the user can select a hairstyle and clothes to dress the character. Finally, after the basic structure of the model is defined, the software offers several sliders corresponding to body features that can be adjusted to change the model's appearance. These can be seen in Figure 4. Unfortunately, Adobe has discontinued this tool and it is no longer available.
Figure 4. The Fuse CC user interface. The different model creation categories are shown above: Assemble (choose body parts), Customize (use sliders to adjust body features), Clothing, and Texture.

4.2.2. Motion Provider

While Fuse was available, it supported seamless interoperability with an auto-rigger and animation tool by Mixamo []. After creating the model in Fuse, it was uploaded to the Mixamo service website for auto-rigging, a procedure that applies machine learning to correctly insert a rig (skeleton) into a 3D model. The service has an animation library housing a plethora of canned animations for use in games and other applications, which are compatible with rigged models from Fuse and other humanoid models. Using several animations from Mixamo, we developed a Unity 3D C# script that receives messages from the Android application to start the desired animation.

4.2.3. Lip-Synchronization

For the lip-synchronization component, we utilized version 1.5.5 of the Salsa Lip-Sync Suite []. This version requires the model to contain the appropriate blend shapes, which are a standard approach for creating expressive facial animations []. After attaching the appropriate Salsa components to the Unity 3D model, the developer needs to manually map the blend shapes [] corresponding to mouth positions and hook up the RandomEyes module, which gives models the ability to blink and look around, randomly or programmatically, to appear more human-like.
The lip-synchronization animation takes effect when a sound file is played back: the library performs waveform analysis and animates four mouth positions to approximate a realistic human monologue. This is a quick and efficient method that does not require time-consuming phoneme mapping or keyframing.
To synchronize the TTS system that runs on the Android part of the application with the lip-synchronization library attached to the Unity 3D part, we used the UnitySendMessage interoperability method mentioned previously. The procedure is as follows. First, the application constructs the string of characters and numbers to be spoken by the virtual assistant, e.g., "Your latest heart rate is 88 bpm" or "You have no new messages". The TTS system takes the string and synthesizes it to a temporary file on the mobile device. The application then reads the file and encodes it into a byte array that can be sent through the interoperability layer using the UnitySendMessage method. Afterward, the C# script attached to the VA's 3D model takes the byte stream and calls the appropriate method to play back the sound file, which automatically triggers the lip-synchronization, and the agent finally starts speaking.
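A simplified, illustrative Unity-side sketch of the final step is given below. It assumes the byte array is passed as a Base64-encoded string through UnitySendMessage and that the lip-sync components are already configured to analyze the agent's AudioSource, so starting playback is enough to trigger the mouth animation; all names are illustrative.

```csharp
using System;
using System.Collections;
using System.IO;
using UnityEngine;
using UnityEngine.Networking;

// Receives the synthesized speech as a Base64-encoded WAV byte stream
// (sent from Android via UnitySendMessage) and plays it on the agent's
// AudioSource; the attached lip-sync component analyzes the same AudioSource,
// so starting playback also starts the mouth animation. Names are illustrative.
[RequireComponent(typeof(AudioSource))]
public class SpeechPlayer : MonoBehaviour
{
    private AudioSource audioSource;

    private void Awake()
    {
        audioSource = GetComponent<AudioSource>();
    }

    // Called through UnitySendMessage("VirtualAssistant", "Speak", base64Wav).
    public void Speak(string base64Wav)
    {
        byte[] wavBytes = Convert.FromBase64String(base64Wav);
        string path = Path.Combine(Application.temporaryCachePath, "tts_output.wav");
        File.WriteAllBytes(path, wavBytes);
        StartCoroutine(PlayWav(path));
    }

    private IEnumerator PlayWav(string path)
    {
        using (var request = UnityWebRequestMultimedia.GetAudioClip("file://" + path, AudioType.WAV))
        {
            yield return request.SendWebRequest();
            if (request.result != UnityWebRequest.Result.Success) yield break;

            audioSource.clip = DownloadHandlerAudioClip.GetContent(request);
            audioSource.Play(); // lip-sync is driven by the playing AudioSource
        }
    }
}
```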

4.2.4. Final Result

As our goal in the first iteration of the VA was to provide a quick functional prototype using primarily free tools, the combination of Fuse and several ready-made animations from Mixamo was ideal. The final result can be seen in Figure 5. The rigged, skinned model comprises 21,307 vertices, 35,328 triangles, and 150 blendshapes. The size of the textures used for skinning was 13.1 MB. The animations integrated to support natural prompting by the VA were six for body language while speaking, two for greetings, and three for instructions. All animations were stock animations, since no MoCap was employed in this version.
Figure 5. The first iteration of the VA as it appears in Unity 3D.

4.3. Second Implementation of the VA

For the second iteration of the VA, we decided to use more advanced tools and methods.

4.3.1. Virtual Character Creator

To enhance the quality of the 3D model of our virtual character, we used Reallusion's Character Creator 3 (CC3) [], which can create high-quality VHs easily and intuitively and cooperates with both the Unity 3D game engine and the lip-synchronization library. Using CC3, game developers can combine avatar body, face, hair, and clothes by selecting them from a wide variety offered within the software or downloaded from Reallusion's marketplace. Figure 6 shows a screenshot of CC3, zoomed in on the controls for character configuration. Developers choose a category from the library, and the available items in that category appear; they can then apply an item to their character, whether face, body, hair, clothes, makeup, pose, or other. Once satisfied with the created avatar, developers can export it in various formats, including the Unity-supported .obj and .fbx. The export window also provides an option for selecting the target platform, along with other export options. In our case, we selected Unity 3D as the target platform and the .fbx format.
Figure 6. Character Creator 3 user interface and functionalities.
The CC3 has an auto setup plugin for Unity which can be downloaded for free from the Reallusion site. It essentially creates a subfolder inside the Unity editor’s assets folder, which contains the proper settings for Unity to render the exported avatars correctly. The exported CC3 avatar should thus be put into this folder to trigger the automatic conversion script, enabling the CC3 model to be appropriately converted to a Unity 3D compatible asset.

4.3.2. Motion Provider

For this iteration of the VA, we decided to forego the canned animations, which are better suited for entertainment purposes, and considered a more realistic approach utilizing motion capture equipment. Such animations allow the avatar to make realistic movements when it interacts with users or when it is standing idle. To achieve such realistic results, we recorded a set of animations using a motion capture suit. After carefully studying the motion capture systems available on the market, we decided to use the Rokoko Studio solutions [] for their high performance-to-price ratio, their ease of use, and the variety of environments in which recordings can take place. The system comprises a full-body suit incorporating 19 inertial sensors that track and record body movement, a pair of gloves for the fingers, and Rokoko Studio, the software used to translate the sensor data into an on-screen avatar that copies the moves of the actor in real time. Both the suit and the gloves require an initial setup by plugging them into a computer. The sensors are referred to as inertial measurement units (IMUs) because each contains a combination of gyroscope, magnetometer, and accelerometer. The data they collect are wirelessly transmitted to the computer (with a range of up to 100 m), where Rokoko Studio translates them into movements of an on-screen avatar, so that actors can watch it reflect their moves. The sensor positions within the suit textile for the upper body, along with a screenshot of Rokoko Studio and an example illustrating how an actor's movements are translated into VH movements, are shown in Figure 7.
Figure 7. (Left) The sensors inside the suit textile. (Middle) The Rokoko studio. (Right) How an actor’s movements are translated into avatar movements in real time.
The Rokoko suit also features a recording mode in which actors can record their moves. These recordings can then be processed by automatic filters embedded in Rokoko Studio, trimmed, and exported into various mainstream formats (FBX, BVH, CSV, C3D). As we needed to import the recorded animations into the Unity game engine, we chose to export them in the Unity-preferred .fbx format. Inside Unity, it is important to mark the specific FBX as having a humanoid rig so that Unity creates a humanoid avatar and correctly retargets its bones and the attached animation.
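The rig type can be set manually in the model's import settings or, for a batch of recordings, through an editor script; the following sketch shows one possible approach using Unity's AssetPostprocessor (the folder name is an assumption).

```csharp
#if UNITY_EDITOR
using UnityEditor;

// Editor-only asset postprocessor: marks every model imported under a given
// folder as a Humanoid rig, so Unity builds an avatar and retargets the
// MoCap animation onto it. The folder name is illustrative.
public class MocapImportSettings : AssetPostprocessor
{
    private void OnPreprocessModel()
    {
        if (!assetPath.Contains("Assets/MocapAnimations")) return;

        var importer = (ModelImporter)assetImporter;
        importer.animationType = ModelImporterAnimationType.Human;
    }
}
#endif
```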

4.3.3. Lip-Synchronization

The lip-synchronization for the second implementation of the VA is still the Salsa Lip-Sync suite but updated to version 2.5.2, which is a major upgrade compared to the previous 1.x version due to its increased degree of perceived accuracy and more advanced animation technologies for speech.
Another advantage of this version is that it is fully compatible with both the Unity 3D game engine and the CC3 avatars. The latter essentially means that Salsa can fully exploit the blend shapes of CC3-created avatars and achieve the best possible results. Additionally, the Salsa suite provides a OneClick module for Unity that automatically configures the viseme and emote expressions for virtual characters created by the CC3 software. This way, Unity programmers only have to import the Salsa asset from the Unity Asset Store and install the OneClick component for CC3. Having completed these steps, programmers only need to select the character in the scene and apply the corresponding OneClick module to it. This action adds all the required components to the selected avatar, enabling it to speak any provided audio file.
To simplify the workflow for speaking with lip-synchronization for this version, we improved the way the Android application communicates with the Unity 3D component. Instead of encoding lengthy sound files into byte arrays and recreating them on the Unity side to be played back, we merely send the path to the temporarily created sound file residing inside the mobile device folder structure, and the Unity C# script reads and plays back the sound file directly from there. As in the previous version, the lip-synchronization library takes care of the agent’s lip and head movements while speaking.
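An illustrative sketch of this simplified hand-off is shown below; it assumes the Android side passes the absolute path of the synthesized WAV file as the string parameter of UnitySendMessage, and the names are again ours.

```csharp
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

// Second-version hand-off: the Android side sends only the absolute path of
// the synthesized WAV file, so no byte-array encoding or temporary copy is
// needed on the Unity side. Names are illustrative.
[RequireComponent(typeof(AudioSource))]
public class SpeechPlayerV2 : MonoBehaviour
{
    private AudioSource audioSource;

    private void Awake()
    {
        audioSource = GetComponent<AudioSource>();
    }

    // Called through UnitySendMessage("VirtualAssistant", "SpeakFromFile", path).
    public void SpeakFromFile(string absolutePath)
    {
        StartCoroutine(PlayWav(absolutePath));
    }

    private IEnumerator PlayWav(string path)
    {
        using (var request = UnityWebRequestMultimedia.GetAudioClip("file://" + path, AudioType.WAV))
        {
            yield return request.SendWebRequest();
            if (request.result != UnityWebRequest.Result.Success) yield break;

            audioSource.clip = DownloadHandlerAudioClip.GetContent(request);
            audioSource.Play(); // lip and head movement follow playback
        }
    }
}
```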

4.3.4. Final Result

The final implemented VA can be seen in Figure 8. The second version of the VA comprises 35,404 vertices, 56,685 triangles, and 443 blendshapes. The size of the textures used for skinning was 85 MB. The animations integrated to support natural prompting by the VA were twelve for body language while speaking, two for greetings, and six for instructions. All animations were recorded using MoCap equipment. Furthermore, the instruction animations were linked with parts of the MyHealthWatcher UI so that the VA could point to areas of interest, i.e., navigational buttons, to enhance understanding, as sketched after Figure 8.
Figure 8. (a) Virtual assistant appearance; (b) waving hello to the user; and (c) pointing to a section of the UI.
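As an illustration of how instruction animations can be linked to UI areas of interest, the sketch below exposes one Animator trigger per region; the region and trigger names are illustrative assumptions, not the project's actual identifiers.

```csharp
using UnityEngine;

// Links instruction animations to UI areas of interest: the mobile application
// requests a pointing gesture for a named UI region (e.g., the navigation
// buttons), and the agent plays the MoCap clip recorded for that region.
// Region and trigger names are illustrative.
public class InstructionGestures : MonoBehaviour
{
    [SerializeField] private Animator animator;

    // Called via UnitySendMessage("VirtualAssistant", "PointTo", "NavigationButtons").
    public void PointTo(string uiRegion)
    {
        switch (uiRegion)
        {
            case "NavigationButtons": animator.SetTrigger("PointNavigation"); break;
            case "Measurements":      animator.SetTrigger("PointMeasurements"); break;
            default:                  animator.SetTrigger("GenericInstruction"); break;
        }
    }
}
```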

4.4. Versions Comparison

To summarize, the differences between the two implemented variations in terms of fidelity and implementation approach are presented in Table 1.
Table 1. Differences between the two VA implementations.

5. Focus Groups and Results

The validation of the two variations of avatars was conducted over two focus groups organized in the context of the project. Before these focus groups, co-design sessions were organized to identify the main requirements and characteristics for the avatar. These requirements were used to drive both development variations.
Twelve persons participated in the focus groups, six in each group. The study adopted a between-subjects design, involving different participants in each group and asking each group to assess only one version of the avatar. A convenience sampling approach was used to recruit participants for the focus groups. The inclusion criteria were being in the age range of 65 to 80, willingness to participate, and no major health problems. All participants were given user consent forms to sign prior to the focus groups. No personal or health-related data of the participants were recorded or stored. The specific focus groups targeted the appearance and functionality of the avatar, so the entire set of MyHealthWatcher functionality was demonstrated using a demo dataset of vital signs. The focus groups were moderated by an HCI expert with experience in evaluations involving elderly people. Two more persons were present during the sessions for notetaking and for providing help when necessary. The research activity was conducted in accordance with the General Data Protection Regulation and received the approval of the Ethics Committee of the Foundation for Research and Technology—Hellas (Approval date: 1 March 2019/Reference number: 35/1-3-2019).
The structure of each focus group session was as follows. Initially, the moderator welcomed the participants and thanked them for their time and contribution to the study. Then, the moderator explained to the participants the goals and purpose of the overall MyHealthWatcher system and demonstrated the VA service of the mobile application. To this end, a representative use case scenario was shown in which the operational VH provided information regarding monitored data. This demonstration was repeated both on a mobile device and on a large screen in order to allow participants to concentrate on the VH's physical and gestural attributes.
After the demonstrations, the moderator started an open discussion aimed at gathering information on the participants' initial impression of the VH service and their personal opinions towards specific attributes of the VH, i.e., look and style, gestures, expressions, etc. The discussion was conducted in a relaxed and informal manner to ensure that all participants would feel comfortable sharing their thoughts and feelings. To ensure that the discussion would not stray from the main topic, the moderator posed specific questions, such as "How did you like the look/the voice/the gestures of the VH?" and "Did you find its service useful and why?", and then encouraged all participants to share their thoughts one after the other. The data collected were in the form of notes taken by the two assistants on the expressed comments and thoughts. After the conclusion of the two focus group sessions, the two notetakers held a one-to-one session to cross-validate their findings and produce the final report of the experiment results, which are summarized below.
The main findings of the first focus group (low-quality VA) provided indications regarding the significance of various aspects of the VA. Surprisingly, the users did not focus on the visual appearance of the avatar, i.e., the graphics and texture quality. Instead, they focused on the movement of the avatar, which did not seem natural to them, since generic animations were used that in some cases appeared detached from what the avatar was saying. A couple of participants mentioned that the avatar had "an aggressive stance", while another said that the body movements were "too cartoonish" and "not fitting to the character of a health assistant". Furthermore, the users judged negatively the lack of voice coloring, due to the use of a synthetic voice, and the odd animation of facial expressions, since without expressive blend shapes on the avatar's face, only the mouth moves when speaking. These details were more noticeable on the desktop screen, where the VH was shown at a higher resolution. On the mobile phone, they were less obvious and were not commented on as much as on the larger display. Some representative comments regarding the voice and facial features included the following: "she sounds robotic", "her face looks unnatural", "her mouth is weird", etc.
The second focus group (high-quality VA) started with a positive overall impression due to the quality of the visual representation of the avatar. This made it more difficult to manage the conversation, since the users were positively affected by the attractiveness of the avatar. Representative comments on the overall impression included "looks very professional", "she has perfect skin", and "she emits a sense of security". To compensate for this, the moderator had to steer the conversation towards more specific aspects of the appearance and behavior of the avatar and ask targeted questions to the group, just as in the first group. Regarding the lexical part of the interaction, the users seemed to appreciate that the voice was recorded and that the movement of the avatar was natural with respect to the spoken text, although they found some issues in the representation of facial expressions, such as the mouth not opening as wide as expected in relation to the text. Furthermore, they liked the idle animations that were recorded for the avatar. In general, the only negative comment was that the avatar appeared on top of the UI of the application, which seemed unrealistic; they would prefer a dedicated window or even hiding the UI.

6. Lessons Learned and Future Work

The implementation and validation of the two VA variations allow us to draw several conclusions as a summary of the lessons learned during this research work. First, thanks to the existence of mature software components and tools, it is possible today to create and render human-like VHs on mobile devices, including human-like animation and behavior with integrated speech capabilities. Of course, this comes with some limitations in quality and aesthetics, as underlined by this research work; nevertheless, existing technology is capable of supporting mainstream application needs. Second, we learned that there is a fragile connection between realism and user satisfaction concerning VAs. There are lower and upper limits to realism: below the lower limit the VA appears creepy and unattractive, while near the upper limit the uncanny valley effect comes into play when extreme realism is pursued. In this work, we explored these limits by introducing a mainstream and a professional approach to character creation. We can safely conclude that behavioral improvements, such as idle animations, lip-synchronization, and face morphing, contribute more to user satisfaction, as they enhance the human-like behavior of the model. Third, we learned that mobile devices can partly compensate for the negative effects of reduced avatar quality: since avatars occupy a small region of the screen, human perception focuses on more generic aspects of human behavior, such as the quality and likeness of motion, rather than on facial expressions. This makes movement realism more important than other features of the agent. Fourth, we learned that, due to the nature of our target group, long human-agent dialogues may not be the best way to provide health-related information. We therefore followed a hybrid approach in which the UI of the mobile device presents an overview of the monitored values, while the avatar is used on demand for presenting that overview and for the verbal presentation of notifications. During our study and subsequent improvements, we came to understand that this hybrid approach maximized the usability of the system.
Overall, we acknowledge that there is more to study than the results presented in this work since the conducted experiment focused only on the VA, while the overall system usability and user satisfaction are affected by a combination of the UI of the mobile application in conjunction with the functionality of the VH. Having selected the agent variation based on the results of this experiment, we expect to draw further conclusions through a user-based evaluation of the entire system that is planned as the next step of the MyHealthWatcher project.

Author Contributions

Conceptualization, M.F., I.A., N.P. and S.N.; methodology, M.F., E.K. and N.P.; software, M.F.; formal analysis, I.A.; writing—original draft preparation, N.P., M.F., I.A. and S.N.; writing—review and editing, M.F. and X.Z.; supervision, N.P., X.Z. and C.S.; project administration, M.F.; funding acquisition, X.Z. and N.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Regional Development Fund, in the context of the EPAnEK Greek national co-funded operational program “Competitiveness Entrepreneurship and Innovation” with a grant agreement number MIS 5031635.

Institutional Review Board Statement

The studies conducted by the MyHealthWatcher project were approved by the Ethics Committee of the Foundation for Research and Technology—Hellas (approval date: 1 March 2019, reference number: 35/1-3-2019).

Data Availability Statement

Data are available upon request.

Acknowledgments

The authors would like to thank the members and occupational therapy team of the TALOS Community Care Centre of Heraklion Municipality for their participation in activities that took place during the requirements elicitation. They would also like to thank the residents and staff of the Nursing Home-Charitable Foundations of Andreas and Maria Kalokairinos for their participation in the requirements elicitation phase.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Durán-Vega, L.A.; Santana-Mancilla, P.C.; Buenrostro-Mariscal, R.; Contreras-Castillo, J.; Anido-Rifón, L.E.; García-Ruiz, M.A.; Montesinos-López, O.A.; Estrada-González, F. An IoT system for remote health monitoring in elderly adults through a wearable device and mobile application. Geriatrics 2019, 4, 34. [Google Scholar] [CrossRef] [PubMed]
  2. Helbostad, J.L.; Vereijken, B.; Becker, C.; Todd, C.; Taraldsen, K.; Pijnappels, M.; Aminian, K.; Mellone, S. Mobile health applications to promote active and healthy ageing. Sensors 2017, 17, 622. [Google Scholar] [CrossRef]
  3. Engelsma, T.; Jaspers, M.W.; Peute, L.W. Considerate mHealth design for older adults with Alzheimer’s disease and related dementias (ADRD): A scoping review on usability barriers and design suggestions. Int. J. Med. Inform. 2021, 152, 104494. [Google Scholar] [CrossRef] [PubMed]
  4. Hsieh, K.L.; Fanning, J.T.; Rogers, W.A.; Wood, T.A.; Sosnoff, J.J. A fall risk mhealth app for older adults: Development and usability study. JMIR Aging 2018, 1, e11569. [Google Scholar] [CrossRef]
  5. McGarrigle, L.; Todd, C. Promotion of physical activity in older people using mHealth and eHealth technologies: Rapid review of reviews. J. Med. Internet Res. 2020, 22, e22201. [Google Scholar] [CrossRef]
  6. McNeill, A.; Briggs, P.; Pywell, J.; Coventry, L. Functional privacy concerns of older adults about pervasive health-monitoring systems. In Proceedings of the 10th International Conference on Pervasive Technologies Related to Assistive Environments, Island of Rhodes, Greece, 21–23 June 2017; pp. 96–102. [Google Scholar]
  7. Li, J.; Ma, Q.; Chan, A.H.; Man, S. Health monitoring through wearable technologies for older adults: Smart wearables acceptance model. Appl. Ergon. 2019, 75, 162–169. [Google Scholar] [CrossRef]
  8. Foukarakis, M.; Adami, I.; Ntoa, S.; Koutras, G.; Kutsuras, T.; Stefanakis, N.; Partarakis, N.; Ioannidi, D.; Zabulis, X.; Stephanidis, C. An Integrated Approach to Support Health Monitoring of Older Adults. In Proceedings of the ‘HCII 2022—Late Breaking Work—Posters’ Springer CCIS Volumes, Virtual Event, 26 June–1 July 2022. [Google Scholar]
  9. Campbell, J.C.; Hays, M.J.; Core, M.; Birch, M.; Bosack, M.; Clark, R.E. Interpersonal and leadership skills: Using virtual humans to teach new officers. In Proceedings of the Interservice/Industry Training, Simulation, and Education Conference, Orlando, FL, USA, 3–6 December 2011. [Google Scholar]
  10. Ieronutti, L.; Chittaro, L. Employing virtual humans for education and training in X3D/VRML worlds. Comput. Educ. 2007, 49, 93–109. [Google Scholar] [CrossRef]
  11. Chittaro, L.; Ranon, R.; Ieronutti, L. Guiding visitors of Web3D worlds through automatically generated tours. In Proceedings of the Eighth International Conference on 3D Web Technology, Saint Malo, France, 9–12 March 2003; pp. 27–38. [Google Scholar]
  12. Swartout, W.; Traum, D.; Artstein, R.; Noren, D.; Debevec, P.; Bronnenkant, K.; Williams, J.; Leuski, A.; Narayanan, S.; Piepol, D. Virtual museum guides demonstration. In Proceedings of the 2010 IEEE Spoken Language Technology Workshop, Berkeley, CA, USA, 12–15 December 2010; pp. 163–164. [Google Scholar]
  13. Ringas, C.; Tasiopoulou, E.; Kaplanidi, D.; Partarakis, N.; Zabulis, X.; Zidianakis, E.; Patakos, A.; Patsiouras, N.; Karuzaki, E.; Foukarakis, M.; et al. Traditional Craft Training and Demonstration in Museums. Heritage 2022, 5, 25. [Google Scholar] [CrossRef]
  14. Carre, A.L.; Dubois, A.; Partarakis, N.; Zabulis, X.; Patsiouras, N.; Mantinaki, E.; Zidianakis, E.; Cadi, N.; Baka, E.; Thalmann, N.M.; et al. Mixed-reality demonstration and training of glassblowing. Heritage 2022, 5, 6. [Google Scholar] [CrossRef]
  15. Aylett, R.; Vala, M.; Sequeira, P.; Paiva, A. Fearnot!—An emergent narrative approach to virtual dramas for anti-bullying education. In Virtual Storytelling. Using Virtual Reality Technologies for Storytelling; Springer: Berlin/Heidelberg, Germany, 2007; pp. 202–205. [Google Scholar]
  16. Stefanidi, E.; Partarakis, N.; Zabulis, X.; Papagiannakis, G. An approach for the visualization of crafts and machine usage in virtual environments. In Proceedings of the 13th International Conference on Advances in Computer-Human Interactions, Valencia, Spain, 21–25 November 2020; pp. 21–25. [Google Scholar]
  17. Stefanidi, E.; Partarakis, N.; Zabulis, X.; Zikas, P.; Papagiannakis, G.; Magnenat Thalmann, N. TooltY: An approach for the combination of motion capture and 3D reconstruction to present tool usage in 3D environments. In Intelligent Scene Modeling and Human-Computer Interaction; Springer: Cham, Switzerland, 2021; pp. 165–180. [Google Scholar]
  18. Bergenti, F.; Poggi, A. Developing smart emergency applications with multi-agent systems. Int. J. E-Health Med. Commun. 2010, 1, 1–13. [Google Scholar] [CrossRef][Green Version]
  19. Kimani, E.; Bickmore, T.; Trinh, H.; Ring, L.; Paasche-Orlow, M.K.; Magnani, J.W. A smartphone-based virtual agent for atrial fibrillation education and counseling. In Intelligent Virtual Agents; Springer: Cham, Switzerland, 2016; pp. 120–127. [Google Scholar]
  20. Baptista, S.; Wadley, G.; Bird, D.; Oldenburg, B.; Speight, J.; My Diabetes Coach Research Group. Acceptability of an embodied conversational agent for type 2 diabetes self-management education and support via a smartphone app: Mixed methods study. JMIR mHealth uHealth 2020, 8, e17038. [Google Scholar] [CrossRef] [PubMed]
  21. Philip, P.; Dupuy, L.; Morin, C.M.; de Sevin, E.; Bioulac, S.; Taillard, J.; Serre, F.; Auriacombe, M.; Micoulaud-Franchi, J.A. Smartphone-Based Virtual Agents to Help Individuals with Sleep Concerns During COVID-19 Confinement: Feasibility Study. J. Med. Internet Res. 2020, 22, e24268. [Google Scholar] [CrossRef] [PubMed]
  22. DeVault, D.; Artstein, R.; Benn, G.; Dey, T.; Fast, E.; Gainer, A.; Georgila, K.; Gratch, J.; Hartholt, A.; Ljommet, M.; et al. SimSensei Kiosk: A virtual human interviewer for healthcare decision support. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-Agent Systems, Paris, France, 5–9 May 2014; pp. 1061–1068. [Google Scholar]
  23. Sin, J.; Munteanu, C. An empirically grounded sociotechnical perspective on designing virtual agents for older adults. Hum. –Comput. Interact. 2020, 35, 481–510. [Google Scholar] [CrossRef]
  24. Esposito, A.; Amorese, T.; Cuciniello, M.; Riviello, M.T.; Esposito, A.M.; Troncone, A.; Torres, M.I.; Schlögl, S.; Cordasco, G. Elder user’s attitude toward assistive virtual agents: The role of voice and gender. J. Ambient. Intell. Humaniz. Comput. 2019, 12, 4429–4436. [Google Scholar] [CrossRef]
  25. Esposito, A.; Amorese, T.; Cuciniello, M.; Esposito, A.M.; Troncone, A.; Torres, M.I.; Schlögl, S.; Cordasco, G. Seniors’ acceptance of virtual humanoid agents. In Ambient Assisted Living; Springer: Cham, Switzerland, 2018; pp. 429–443. [Google Scholar]
  26. Hosseinpanah, A.; Krämer, N.C.; Straßmann, C. Empathy for everyone? The effect of age when evaluating a virtual agent. In Proceedings of the 6th International Conference on Human-Agent Interaction, Southampton, UK, 15–18 December 2018; pp. 184–190. [Google Scholar]
  27. Garner, T.A.; Powell, W.A.; Carr, V. Virtual carers for the elderly: A case study review of ethical responsibilities. Digit. Health 2016, 2, 2055207616681173. [Google Scholar] [CrossRef] [PubMed]
  28. Jung, Y.; Kuijper, A.; Kipp, M.; Miksatko, J.; Gratch, J.; Thalmann, D. Believable virtual characters in human-computer dialogs. In Proceedings of the Eurographics 2011—State of The Art Report, Llandudno, UK, 11–15 April 2011; pp. 75–100. [Google Scholar]
  29. Kasap, Z.; Magnenat-Thalmann, N. Intelligent virtual humans with autonomy and personality: SOA. Intell. Decis. Technol. 2007, 1, 3–15. [Google Scholar] [CrossRef]
  30. Shapiro, A. Building a character animation system. In Proceedings of the 4th International Motion in Games Conference, Edinburgh, UK, 13–15 November 2011. [Google Scholar]
  31. Papanikolaou, P.; Papagiannakis, G. Real-time separable subsurface scattering for animated virtual characters. Lecture Notes in Computer Science. In Proceedings of the 2013 Symposium on GPU Computing and Applications by NTU and NVIDIA, 9 October 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 1–16. [Google Scholar]
  32. Partarakis, N.; Zabulis, X.; Foukarakis, M.; Moutsaki, M.; Zidianakis, E.; Patakos, A.; Adami, I.; Kaplanidi, D.; Ringas, C.; Tasiopoulou, E. Supporting sign language narrations in the museum. Heritage 2021, 5, 1. [Google Scholar] [CrossRef]
  33. Kosmopoulos, D.; Constantinopoulos, C.; Trigka, M.; Papazachariou, D.; Antzakas, K.; Lampropoulou, V.; Argyros, A.; Oikonomidis, I.; Roussos, A.; Partarakis, N.; et al. Museum Guidance in Sign Language: The SignGuide project. In Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, Corfu, Greece, 29 June–1 July 2022; pp. 646–652. [Google Scholar]
  34. Adami, I.; Foukarakis, M.; Ntoa, S.; Partarakis, N.; Stefanakis, N.; Koutras, G.; Kutsuras, T.; Ioannidi, D.; Zabulis, X.; Stephanidis, C. Monitoring Health Parameters of Elders to Support Independent Living and Improve Their Quality of Life. Sensors 2021, 21, 517. [Google Scholar] [CrossRef] [PubMed]
  35. Smart Band Vivosmart 4. Available online: https://www.garmin.com/en-US/p/605739 (accessed on 15 August 2022).
  36. Vivoactive 4. Available online: https://www.garmin.com/en-US/p/643382 (accessed on 15 August 2022).
  37. Fēnix 5X. Available online: https://www.garmin.com/en-US/p/560327 (accessed on 15 August 2022).
  38. Moodmetric Ring. Available online: https://moodmetric.com/services/moodmetric-smart-ring (accessed on 15 August 2022).
  39. Torniainen, J.; Cowley, B.; Henelius, A.; Lukander, K.; Pakarinen, S. Feasibility of an electrodermal activity ring prototype as a research tool. In Proceedings of the 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; pp. 6433–6436. [Google Scholar]
  40. Heikkilä, P.; Honka, A.; Mach, S.; Schmalfuß, F.; Kaasinen, E.; Väänänen, K. Quantified Factory Worker: Expert Evaluation and Ethical Considerations of Wearable Self-tracking Devices. In Proceedings of the 22nd International Academic Mindtrek Conference, Tampere, Finland, 10–11 October 2018; Association for Computing Machinery ACM: New York, NY, USA, 2018; pp. 202–211. [Google Scholar] [CrossRef]
  41. Tool for Individual Stress Management. Available online: https://moodmetric.com/services/you/moodmetric-measurement/ (accessed on 2 May 2021).
  42. Malasinghe, L.P.; Ramzan, N.; Dahal, K. Remote patient monitoring: A comprehensive study. J. Ambient. Intell. Humaniz. Comput. 2017, 10, 57–76. [Google Scholar] [CrossRef]
  43. Esposito, A.; Amorese, T.; Cuciniello, M.; Riviello, M.T.; Esposito, A.M.; Troncone, A.; Cordasco, G. The dependability of voice on elders’ acceptance of humanoid agents. In Proceedings of the Interspeech 2019, Graz, Austria, 15–19 September 2019; pp. 31–35. [Google Scholar]
  44. Shaked, N.A. Avatars and virtual agents–relationship interfaces for the elderly. Healthc. Technol. Lett. 2017, 4, 83–87. [Google Scholar] [CrossRef] [PubMed]
  45. Unity 3D. Available online: https://unity.com/ (accessed on 2 May 2021).
  46. Mixamo. Available online: https://www.mixamo.com/ (accessed on 2 May 2021).
  47. Salsa LipSync Suite. Available online: https://crazyminnowstudio.com/unity-3d/lip-sync-salsa/ (accessed on 2 May 2021).
  48. Anjyo, K. Blendshape Facial Animation. In Handbook of Human Motion; Müller, B., Wolf, S., Eds.; Springer: Cham, Switzerland, 2018; pp. 2145–2155. [Google Scholar] [CrossRef]
  49. Character Creator 3. Available online: https://www.reallusion.com/character-creator/ (accessed on 2 May 2021).
  50. Rokoko Studio. Available online: https://www.rokoko.com/studio (accessed on 2 May 2021).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
