1. Introduction
The global population is rapidly aging, leading to an increased demand for quality care and support for the elderly. According to a recent report published by the United Nations, the population aged over 60 is growing faster than all younger age groups and now accounts for about 16% of the Earth's population [1]. People in this age group commonly suffer from loneliness, social isolation, a lack of daily engagement, and a lack of physical and mental health monitoring [2]. Such problems can lead to serious medical conditions such as heart disease, stroke, Type 2 diabetes, depression, anxiety, suicidal ideation, self-harm, and dementia.
As traditional care models face challenges in meeting these needs, emerging technologies offer promising solutions. One such technology is the use of domestic robots designed specifically for elderly care [3]. Eldercare robots have the potential to make a real difference in the lives of elderly people. They can provide companionship, help with daily tasks, offer verbal and social communication, monitor health conditions, and provide early warnings of serious medical conditions [4]. Hence, these robots can have positive impacts on the mental and emotional health of the elderly. Such needs became even more pressing during pandemics such as COVID-19. As technology continues to develop, it is believed that eldercare robots will become more affordable and easier to use. This will make them a more viable option for families with elders and will help to address the growing shortage of human caregivers [5].
Research on eldercare robots focuses on providing versatile robots that can track humans in indoor environments, monitor their vital signs, detect their postures, and detect falls and provide aid [6]. Accordingly, research examples show continuous development in robot design, sensing technologies, indoor path planning, fall-detection techniques, and human–robot interactions in real-world scenarios.
Examples of eldercare robots include the robot Matilda [7], which utilizes a fall-detection algorithm that analyzes sensor data from cameras, microphones, and depth sensors. This algorithm is designed to detect specific indicative fall patterns such as sudden changes in body position, impact, or abnormal behavior. Upon detecting a potential fall event, Matilda triggers immediate alerts for assistance.
Another research study was conducted on the robot Hobbit [8], in which advanced algorithms are applied to monitor the environment using cameras (including RGB-D cameras) and temperature sensors. Several algorithms were used to analyze the sensor data to detect falls based on different criteria, such as changes in body posture, rapid movements followed by a sudden stop, or collisions with objects. Once a fall is detected, Hobbit promptly notifies caregivers or emergency services. Another fall-detection algorithm is used for the Nao humanoid robot [9], which utilizes its sensor suite, including motion sensors, cameras, and depth perception. This algorithm recognizes key fall-related patterns, such as sudden changes in orientation, significant acceleration or deceleration, or rapid descent.
Additionally, the robot Giraff provides remote communication and social interaction, and its fall-related monitoring is achieved using relatively inexpensive hardware [10]. While it does not have a dedicated fall-detection algorithm, Giraff's video and audio capabilities enable remote caregivers to visually monitor older adults and provide immediate assistance if a fall occurs. Another robot named PARO [11], a therapeutic robot resembling a baby harp seal, focuses on emotional support for older adults but does not have fall-detection capability. However, PARO's presence and interaction with older adults have shown positive effects on emotional well-being, potentially reducing the risk of falls caused by psychological factors.
Pepper [12], another humanoid robot, contributes to the emotional well-being of older adults. It can help reduce the fall risks associated with social isolation and depression but lacks a fall-detection feature. Moreover, Mabu [13], an AI-powered robot designed to provide personalized healthcare assistance, does not have specific fall-detection capabilities. However, Mabu's continuous monitoring of older adults' health conditions and proactive intervention can contribute to the prevention of falls by addressing underlying health issues. The robot Zora [14,15] is also used in healthcare settings. It focuses on entertainment and therapy and has a dedicated fall-detection capability based on a pose-estimation technique. In addition, its interactive activities and exercises contribute to cognitive stimulation and physical well-being, potentially reducing the risk of falls through enhanced mobility and engagement.
Other examples of robots in healthcare settings are Care-O-bot [16] and ElliQ [17]. These robots provide social interaction, cognitive activities, and object-detection capabilities and can potentially reduce the risk of falls through enhanced mobility and engagement. However, neither of them has a fall-detection feature.
Another approach for eldercare robots is presented in [18], in which a modular robotic platform with a fixed exoskeleton arm is used for elders who suffer from disabilities such as stroke and multiple sclerosis. Such a design provides indoor mobility and assistance and eliminates the possibility of falls. However, such mountable robotic platforms are not the main concern of this research.
In summary, among the eldercare robots reviewed above, Matilda, Hobbit, and Nao stand out with dedicated fall-detection algorithms, while Giraff supports fall response through remote visual monitoring. These robots utilize different sensing technologies, including cameras, depth sensors, and motion sensors, combined with sophisticated algorithms to accurately detect falls and provide prompt assistance. User feedback supports the effectiveness of these robots in fall detection and their positive impact on older adults' safety. In comparison, a key limitation of PARO, Pepper, Mabu, Zora, Care-O-bot, and ElliQ is the lack of real-time and accurate fall detection: these robots either have no fall-detection capability at all or, as with Zora's pose-estimation approach, rely on slower or less advanced techniques for identifying fall events, which can result in delayed response times or false alarms.
In this research, a new eldercare robot is presented. Complete software and hardware frameworks are developed. The software framework has two main features. The first is SLAM (simultaneous localization and mapping) and indoor path planning, enabling the robot to follow the elders continuously while avoiding indoor obstacles; the implemented path planning technique is RRT* [19]. The second is fall and posture detection using real-time image processing. The YOLOv7 [20] image processing algorithm is used in the fall-detection model. This algorithm employs convolutional neural networks (CNNs) to provide real-time, precise detection of various objects as well as falling and standing postures. By integrating the YOLOv7 and RRT* algorithms into the system, the robot controller can swiftly analyze video feeds from cameras and identify fall events with high precision while continuously tracking the human. This allows for immediate notification and prompt assistance, minimizing the time between the occurrence of a fall and the initiation of aid.
Additionally, a hardware architecture is developed with high-level and low-level controllers in addition to the sensors, motor drivers, and power supply unit. A custom-designed electronic board, the Robot Control Unit (RCU), is developed to act as the low-level controller with SPI, UART, I2C, and Bluetooth communication capabilities. To deploy the generated prediction model on the high-level controller, two optimization techniques are used to decrease the model size by 60%, enabling enhanced real-time execution of the model.
Furthermore, the new caregiver robot is based on a low-cost modular mechanical design, which makes it easy to customize the robot's capabilities to meet the specific needs of the elderly person. By integrating these features, the intended robot can not only enhance the quality of life for the elderly but also provide crucial assistance in their day-to-day activities.
3. Control Architecture
3.1. Low-Level Control
For robot actuation, BLDC (brushless DC) motors are selected. These motors are chosen to provide sufficient torque for the robot's movement so that it can perform its tasks effectively. A capable motor controller is chosen to handle the required power and provide the necessary control interface. A specially designed printed circuit board is developed to be the robot control unit (RCU), as shown in Figure 4. This RCU contains a low-level controller, the ATMEGA2560 microcontroller. It is used to control the motor drivers and motors, obtain sensor readings, and enable seamless communication with other controllers through the SPI and I2C protocols.
Additionally, this RCU contains a relay module that is incorporated to control lights or other high-power devices. Velocity feedback is obtained at a resolution of 1024 CPR using Hall-effect incremental encoders. Digital and analogue I/O pins are available for general-purpose use. The RCU is compatible with motor drivers that operate at a frequency of 20 kHz, as well as lights, sensors, and other peripherals. The robot power is also monitored on the developed RCU board. The battery power in the active mode of the robot is 137 W, calculated by adding the power consumption of the motors, sensors, and electronics. The robot uses a lithium-ion battery with a voltage of 36 V and a capacity of 13 Ah. This battery can power the robot for up to 4.5 h in the active mode. The RCU board is also expandable, with the option to add components or modules for future enhancements. The RCU firmware can be updated OTA (over-the-air), and the board has additional connectors and pinouts for convenient integration with other system components.
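As an illustration only, the following minimal Python sketch shows how the high-level side might consume the RCU's encoder feedback over UART; the serial port, baud rate, and comma-separated packet format are assumptions, while the 1024 CPR resolution comes from the encoder specification above.

import math
import time
import serial  # pyserial

COUNTS_PER_REV = 1024  # encoder resolution (CPR) from the RCU specification

def stream_wheel_velocity(port="/dev/ttyUSB0", baud=115200):
    # Hypothetical protocol: the RCU sends one "left_counts,right_counts"
    # line per sampling period over UART.
    rcu = serial.Serial(port, baud, timeout=1.0)
    last_counts, last_time = None, None
    while True:
        line = rcu.readline().decode(errors="ignore").strip()
        if not line:
            continue
        counts = int(line.split(",")[0])
        now = time.time()
        if last_counts is not None and now > last_time:
            revs = (counts - last_counts) / COUNTS_PER_REV
            omega = revs * 2.0 * math.pi / (now - last_time)  # rad/s
            print("wheel angular velocity: %.3f rad/s" % omega)
        last_counts, last_time = counts, now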
3.2. High-Level Control
A hardware stack architecture is developed with both high and low control levels, as shown in Figure 5. An Nvidia Jetson Nano is used as the high-level controller. Additional sensors are connected to this controller: the 2D Lidar, a 9-DOF IMU (MPU9520), and a Raspberry Pi 2 camera. This high-level controller board is used to accommodate the high-performance processing hardware and software that will later be added to the system. Continuous voltage and current monitoring and power-failure protection are provided by the protection fuse unit. The final prototype of the robot is shown in
Figure 6.
A system software stack was developed, as shown in
Figure 7. It is used to execute the algorithms and techniques for data processing, mapping, localization, and environment perception. The developed software consists of several layers: the perception layer; the mapping and localization layer; the path planning and navigation layer; the actuation layer; and the command execution layer.
This split between low-level and high-level processing has several advantages. First, it allows the low-level sensors to be connected directly to the RCU, which minimizes latency and improves the accuracy of the data. Second, it allows the high-level sensors to be connected to a powerful computer, which enables more sophisticated processing and analysis of the data. The details of these layers are explained in the following subsections.
3.3. Motion Control and Visual Interface Layers
The low-level sensors, which measure physical quantities such as acceleration, velocity, and orientation, and the encoders in the robot system are connected to the robot control unit (RCU). The high-level sensors, which perceive the environment around the robot, such as cameras, Lidar, and inertial measurement units (IMUs), are connected to a powerful computer, the Jetson Nano board. The RCU is responsible for collecting data from the low-level sensors and sending them to the Jetson Nano. The Jetson Nano is responsible for processing the data from the high-level sensors and detecting falls. The complete control system is built using ROS Melodic with a visual user interface and continuous monitoring; a sketch of how a node in this system might be wired up follows.
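For illustration, a minimal ROS Melodic node on the Jetson Nano side might be structured as follows; the topic names and the detect_fall() stub are placeholders, not the actual node graph of this system.

#!/usr/bin/env python
# Sketch of a fall-monitoring node (ROS Melodic). Topic names and the
# detect_fall() stub are placeholders for illustration only.
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import String

def detect_fall(image_msg):
    # Placeholder: the YOLOv7-based detector described in Section 3.6
    # would run here on the decoded camera frame.
    return False

def on_image(msg, alert_pub):
    if detect_fall(msg):
        alert_pub.publish(String(data="FALL_DETECTED"))

if __name__ == "__main__":
    rospy.init_node("fall_monitor")
    alert_pub = rospy.Publisher("/fall_alert", String, queue_size=1)
    rospy.Subscriber("/camera/image_raw", Image, on_image, callback_args=alert_pub)
    rospy.spin()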
3.4. Perception Layers
For the perception system of the robot, the system incorporates a 2D Lidar sensor with a detection range of 12 m, a sampling rate of 4000 S/s, and a 360° scanning range. It is used for map building, obstacle detection, and avoidance. Additionally, an IMU sensor is added to capture orientation and motion data to enhance the robot’s navigation capabilities. These sensors work in synergy to provide a comprehensive perception system for the robot.
3.5. Indoor Navigation and SLAM Layers
The navigation stack and SLAM algorithms are commonly used to enable autonomous navigation in robots. The navigation stack provides the infrastructure for the robot to navigate, while SLAM provides the map of the environment that the robot needs to navigate. Together, these two tools can be used to build accurate, robust, and efficient navigation systems for robots. There are two common SLAM algorithms that can be used: GMapping and Hector SLAM [21]. GMapping is a 2D SLAM algorithm that uses a laser scanner to build a map of the environment. It performs well in relatively simple environments, but it is not suitable for narrow passages, reflective surfaces, and other complex features of indoor environments. Hector SLAM is also a 2D SLAM algorithm that uses a laser scanner to build a map of the environment, but it is a more sophisticated algorithm that can handle complex and detailed features effectively. Hence, Hector SLAM is chosen for this system for environment navigation, where accuracy and robustness are important.
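As a small illustration, the pose estimate published by the hector_mapping package can be consumed as follows; /slam_out_pose is the package's default output topic, and the rest of the node is a sketch.

#!/usr/bin/env python
# Sketch: log the robot pose estimated by Hector SLAM (ROS Melodic).
import rospy
from geometry_msgs.msg import PoseStamped

def on_pose(msg):
    p = msg.pose.position
    rospy.loginfo("Hector SLAM pose: x=%.3f m, y=%.3f m", p.x, p.y)

if __name__ == "__main__":
    rospy.init_node("slam_pose_listener")
    rospy.Subscriber("/slam_out_pose", PoseStamped, on_pose)
    rospy.spin()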
3.6. Fall Detection Capability
Specific algorithms and techniques used for fall detection were explored, considering factors such as accuracy, false positive rate, real-time processing, sensitivity, and specificity. A fall-detection model was developed using a Raspberry Pi 2 camera and an NVIDIA Jetson Nano computing platform. This study employs a custom dataset of 10K images, compiled and annotated by the author using the Roboflow platform [22]; samples are shown in Figure 8. To facilitate robust model training and assessment, the dataset was divided into an 85.5%–9.5%–5% split for training, validation, and testing, respectively.
The fall-detection model was developed using images acquired from various sources on the internet and processed using an NVIDIA Jetson Nano computing platform. These images encompassed a wide range of scenarios and situations in which falls might occur.
The labelling technique involved manually annotating the collected images. Human annotators reviewed each image and marked the regions where a person was falling or not falling. This process generated ground truth labels for the training dataset.
Figure 8 illustrates a subset of these labelled images to provide an overview of the dataset. This approach allowed for a diverse dataset that covered various fall scenarios.
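The split ratios are simple to reproduce; a minimal sketch of an 85.5/9.5/5 partition over a list of annotated image paths is shown below (the file paths and seed are illustrative, and Roboflow can also export such splits directly).

import random

def split_dataset(image_paths, seed=42):
    # Shuffle deterministically, then split 85.5% / 9.5% / ~5% into
    # training, validation, and test subsets.
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    n_train = int(0.855 * len(paths))
    n_val = int(0.095 * len(paths))
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]  # remaining ~5%
    return train, val, test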
Next, YOLOv7 was used as the object-detection algorithm. An object detector performs image-recognition tasks by taking an image as input and then predicting bounding boxes and class probabilities for each object in the image. The YOLO algorithm series uses deep convolutional neural networks (CNNs) to extract features from the image to detect objects [
23]. A block diagram that explains the main components of the YOLOv7 algorithm is shown in
Figure 9. The “Backbone” block is responsible for creating image features. The “Neck” block is where a group of neural network layers combines and mixes the features. The last block, the “Head”, receives the features from the previous block and generates the predictions.
In this research, object detection directly pinpoints fallen individuals, bypassing complex pose analysis. This offers an efficient and reliable solution for fall detection that prioritizes precision and minimizes false alarms.
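As a hedged illustration of the deployment side, the sketch below runs a YOLOv7 model that has been exported to ONNX using onnxruntime; the model file name, input resolution, output layout (post-NMS rows), and class indices are assumptions, not the exact pipeline used in this work.

import cv2
import numpy as np
import onnxruntime as ort

# Hypothetical ONNX export of the trained fall-detection model.
session = ort.InferenceSession("yolov7_fall.onnx",
                               providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name

def detect_falls(frame_bgr, conf_thresh=0.5):
    # Letterboxing omitted for brevity; assumes a 640x640 input export.
    img = cv2.resize(frame_bgr, (640, 640))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    img = np.transpose(img, (2, 0, 1))[None, ...]  # NCHW batch of one
    preds = session.run(None, {input_name: img})[0]
    # Assumed post-NMS row layout: x1, y1, x2, y2, confidence, class id.
    rows = preds.reshape(-1, preds.shape[-1])
    return [row for row in rows if row[4] > conf_thresh]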
To reduce the model size and improve its inference speed on the Jetson Nano, the model was compressed: its size was reduced by 40% through quantization and by an additional 20% through structured pruning for the hardware implementation. Quantization is a common machine-learning technique for reducing model size [24]. Pruning, in turn, is a machine-learning technique that decreases the size of complex models by eliminating some of their parameters so that they run faster on hardware [25]. Structured pruning is convenient for structured models such as CNNs, as it preserves their structure. The final model was able to detect falls at a rate of 30 frames per second.
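In PyTorch terms, the two compression steps can be sketched as follows; the model object, the layer types targeted, and the per-layer pruning amount are illustrative, since this summary does not specify them.

import torch
import torch.nn.utils.prune as prune

def compress(model):
    # Structured (channel-wise) pruning: remove 20% of output channels
    # from each convolutional layer by L2 norm, preserving the CNN
    # structure; the per-layer amount is an illustrative choice.
    for module in model.modules():
        if isinstance(module, torch.nn.Conv2d):
            prune.ln_structured(module, name="weight", amount=0.2, n=2, dim=0)
            prune.remove(module, "weight")  # make the pruning permanent

    # Dynamic quantization: store the weights of linear layers as 8-bit
    # integers to shrink the model and speed up inference on the CPU.
    return torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8)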
4. Experimental Testing
Field testing for the eldercare robot included indoor path planning and real human fall detection. An indoor test map that includes static obstacles of different sizes was prepared, as shown in Figure 10. The robot and patient positions are also shown in the same figure. The map is roughly 3 m × 7.1 m, and the number of static obstacles is 16. The RRT* indoor path planning technique was used for robot navigation.
RRT is a sampling-based path planning technique that is used for single-query problems [19]. It grows a tree of possible trajectories rooted at an initial vertex by iteratively adding random nodes, using a collision-detection method to verify the feasibility of the connecting edges. The probability that the algorithm finds a feasible path, if one exists, approaches 1 as the number of iterations tends to infinity. RRT* is an optimized version of the RRT technique: it considers the path cost when connecting and rewiring nodes, so the returned path converges toward the lowest-cost one. The algorithm is summarized as follows (Algorithm 1):
Algorithm 1 Pseudocode for the path planning algorithm RRT* [19]
Input: number of nodes n, step size δ
Output: tree T = (V, E)
  initialize V ← {q_init}, E ← ∅
  for i = 1 to n do
    q_rand ← random sample from C
    q_near ← nearest neighbor of q_rand in V
    q_new ← q_near + δ (q_rand − q_near)/‖q_rand − q_near‖
    if q_new ∈ C_free then
      V ← V ∪ {q_new}, E ← E ∪ {(q_near, q_new)}
    end if
  end for
  return T = (V, E)
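To make Algorithm 1 concrete, a compact Python sketch of RRT* on a 2D map follows; the map dimensions echo the test arena, but the obstacles, step size, and rewiring radius are illustrative, and edge collision checks are omitted for brevity.

import math
import random

class Node:
    def __init__(self, x, y, parent=None, cost=0.0):
        self.x, self.y, self.parent, self.cost = x, y, parent, cost

def dist(a, b):
    return math.hypot(a.x - b.x, a.y - b.y)

def collision_free(node, obstacles):
    # Obstacles as (cx, cy, radius) circles; point check only for brevity.
    return all(math.hypot(node.x - ox, node.y - oy) > r for ox, oy, r in obstacles)

def rrt_star(start, goal, obstacles, width=3.0, height=7.1,
             step=0.25, radius=0.75, n_iter=2000):
    nodes = [Node(*start)]
    for _ in range(n_iter):
        q_rand = Node(random.uniform(0, width), random.uniform(0, height))
        q_near = min(nodes, key=lambda n: dist(n, q_rand))
        d = max(dist(q_near, q_rand), 1e-9)
        q_new = Node(q_near.x + step * (q_rand.x - q_near.x) / d,
                     q_near.y + step * (q_rand.y - q_near.y) / d)
        if not collision_free(q_new, obstacles):
            continue
        # RRT*: pick the lowest-cost parent among nearby nodes ...
        near = [n for n in nodes if dist(n, q_new) < radius]
        parent = min(near, key=lambda n: n.cost + dist(n, q_new))
        q_new.parent = parent
        q_new.cost = parent.cost + dist(parent, q_new)
        nodes.append(q_new)
        # ... then rewire: reroute neighbors through q_new when cheaper.
        for n in near:
            c = q_new.cost + dist(q_new, n)
            if c < n.cost:
                n.parent, n.cost = q_new, c
    # Trace back from the node closest to the goal.
    node = min(nodes, key=lambda n: dist(n, Node(*goal)))
    path = []
    while node is not None:
        path.append((node.x, node.y))
        node = node.parent
    return path[::-1]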
The experimental robot path and the reconstructed 2D map using Lidar feedback are shown in Figure 11. Robot feedback and control were conducted according to the perception procedures previously described in Section 3. The robot's actual path has a travel distance of 3.528 × 10³ mm, an average velocity of 0.784 m/s, a travel time of 4.5 s, and a maximum acceleration of 0.174 m/s². The final point deviates from the destination point, with mean absolute errors on the x- and y-axes of 0.187 m and 0.248 m, respectively.
Fall detection was simulated by humans, and the results were recorded and analyzed using the YOLOv7 classifier. Fall-detection results were evaluated using standard metrics such as accuracy, precision, recall, F-score, and true negative rate [26]. A true positive (TP) is obtained when an abnormal event is detected between the first and the last frame in which the abnormal action took place. A true negative (TN) is a normal action that is not detected as abnormal. False positives (FPs) are normal actions reported as abnormal, and false negatives (FNs) are abnormal behaviors not reported by the system. These parameters are calculated as follows:
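\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP},\\
\text{Recall} &= \frac{TP}{TP + FN}, \qquad
\text{F-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, \qquad
\text{TNR} = \frac{TN}{TN + FP}.
\end{aligned}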
The experimental results of the prediction model are shown in
Figure 12. The figure is divided into two rows: the first shows the training results together with the corresponding precision and recall, while the second shows the validation results together with the corresponding precision and recall. Three types of loss are shown in this figure: box loss, objectness loss, and classification loss. The box loss reflects the algorithm's ability to locate the object center and estimate its bounding box. The objectness metric quantifies how likely it is that an object can be found in a given area; a high objectness score suggests that an object likely lies inside the visible region of an image. The classification loss, in turn, indicates the accuracy with which the algorithm determines the right class of an object. All these results are obtained over iterations 0–250. The accuracy, precision, recall, F-score, and true negative rate were found to be 96%, 99.1%, 93.5%, 96.2%, and 99%, respectively. The confusion matrix for the prediction model is shown in
Figure 13, while the real images for the experiment are shown in
Figure 14.
6. Conclusions and Future Work
In this paper, an eldercare robot was developed to address the unique needs of the elderly population. The comprehensive approach included hardware design, motor calculations, system integration, and experimental testing. The robot’s control architecture accommodates both hardware and software layers, utilizing Hector SLAM and local path planners within the ROS framework. This enables the robot to navigate, detect obstacles, and provide fall detection, showcasing its potential for enhancing elderly well-being. RRT* was chosen as the path-planning technique in experimental testing. The robot was tested in a real indoor environment and achieved a positioning accuracy of 94.7% and 93% in the x- and y-axes, respectively. Additionally, a fall-detection model was developed using the YOLOv7 algorithm which achieved an accuracy of 96% and a precision of 99.1% after testing. The model size was reduced by 40% through quantization and an additional 20% through pruning to enable its deployment in the Jetson Nano controller.
However, this study is an initial step, and further validation and real-world testing are needed across caregiving settings. Continuous refinement of both hardware and software components is crucial to address evolving elderly needs. Future exploration could involve enhancing human–robot interactions by incorporating natural language processing and emotion recognition. In addition, more advanced fall detection can be achieved by integrating additional sensors for real-time and accurate results. Additional features can be used to classify falls and generate datasets segmented according to different characteristics, such as the human body, head pose, joint angles, gait, background, lighting, and any noise or distortion. Finally, an automatic docking station is planned to be added to the robot.