Falling Detection of Toddlers Based on Improved YOLOv8 Models
Abstract
1. Introduction
- This study uses the GHT to identify forms and extract the shape of the “safe zone” surface using a predefined template.
- A new dataset of 500 video clips (30 FPS) is created from 200 real-time daily videos collected from 100 parents. A separate test set of 100 falling video clips (30 FPS) is created from 30 daily videos collected from 8 parents. Collected from everyday surveillance footage, the datasets feature varying lighting conditions, camera settings, and furniture placement and arrangement, increasing the diversity of the data and enhancing the framework’s generalization capacity.
- Given that the standard YOLOv8 model is insufficient for extracting body and head information at high precision, we improve the original YOLOv8 architecture by replacing its C2f module with the GELAN module. One advantage of GELAN is that, while built on convolutional layers, it can incorporate any computational block; this increases the flexibility of the network topology and allows it to meet the needs of our information-extraction techniques.
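To make the aggregation idea concrete, the following is a toy, shape-level sketch of GELAN-style hierarchical aggregation in NumPy: arbitrary computational branches are chained, every intermediate output is kept and concatenated channel-wise (the CSPNet/ELAN idea), and a transition mixes the aggregate back down. This illustrates the principle only; it is not the authors’ network code, and the uniform transition weights are an assumption for demonstration.

```python
import numpy as np

def gelan_block(x, branches):
    """Toy sketch of GELAN-style hierarchical aggregation.

    Each branch is an arbitrary computational module applied in a chain,
    and every intermediate output is concatenated channel-wise
    (CSPNet/ELAN style) before a transition mixes the aggregate back
    to the input channel count.

    x: feature map of shape (C, H, W); branches: list of callables.
    """
    outs = [x]
    for f in branches:
        outs.append(f(outs[-1]))            # chain of arbitrary modules
    agg = np.concatenate(outs, axis=0)      # channel-wise aggregation
    # 1x1 "transition" back to C channels (uniform weights, illustrative)
    w = np.full((x.shape[0], agg.shape[0]), 1.0 / agg.shape[0])
    return np.tensordot(w, agg, axes=1)
```

Because any callable can serve as a branch, the block can wrap whatever computational module the task needs, which is the flexibility the text attributes to GELAN.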
- A real-time system is proposed to detect toddlers falling in their home surroundings. The system efficiently and precisely captures toddler targets in challenging environments, abstracting the human body as a rectangle, which simplifies the data without sacrificing accuracy and increases the usefulness of real-time artificial intelligence for fall detection. It serves as an early warning stage for a later head-injury assessment. After further enhancement and streamlining, it could be deployed on the hardware of smart home devices.
2. The Proposed Methods
- Posture changes: Toddlers show a wide variety of postural changes, and their limbs are often more flexible than those of adults, who typically maintain an upright posture and move in regular patterns. This variability makes it difficult to identify posture via skeletal analysis.
- Differences in physical characteristics: Toddlers have shorter limbs, larger heads, and notably distinct proportions compared to adults. OpenPose may not capture these traits well, leading to imprecise detection outcomes.
- Differences in movement coherence: Toddlers move less predictably and coherently, and tend to move more quickly and erratically, especially when lying down or crawling, making it challenging for the OpenPose model to track them effectively.
- Problems with loose clothing covering: Toddlers typically wear clothing that fits loosely, which can conceal joints and make it more challenging to identify them.
- The problem of real-time availability: OpenPose primarily captures human skeletal point data, and the excess recognition data may cause interference. In real applications, the approach requires significant computational resources and processes frames slowly.
2.1. Extracting Body Information of Toddler
2.2. Focusing on Toddler’s Head Information
- The method is robust to image noise and pixel-level degradation.
- The method boasts robust feature selection and verification capabilities, effectively preventing the extraction of non-target features. Additionally, it significantly reduces the computation time.
- The approach is trained using the minimal-entropy concept and can be readily parallelized. However, false alarms may still occur, so the precise head position requires further refinement.
Algorithm 1. Body Information Extraction
Inputs: images to be processed
Output: center of mass point K of the child’s body; center of mass point H of the head
1. Segment the information of the toddler’s surveillance frame;
2. Capture the toddler’s body information and extract the centroid K;
3. Decompose the body information into two interconnected boxes: the head and the main torso;
4. Based on the information in the (n-1)-th frame or the n-th frame, predict the possible locations of the human body in the current frame and narrow the search range;
IF a human body has been detected, output the center of mass point H of the head;
OTHERWISE, expand the search area to the entire frame to find the body information, then output the center of mass point H of the head.
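Steps 2 and 3 of the algorithm above can be sketched in a few lines of NumPy. This is a minimal illustration on a binary body mask, not the authors’ implementation, and the head_frac split ratio is an assumed value for demonstration:

```python
import numpy as np

def body_centroids(mask, head_frac=0.25):
    """Steps 2 and 3 of Algorithm 1 on a binary body mask (1 = toddler).

    Returns the body center of mass K and an approximate head center of
    mass H, taking the head to be the top `head_frac` of the body's
    bounding box (an assumed split ratio, for illustration only).
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None, None                      # no body in this frame
    K = (float(xs.mean()), float(ys.mean()))   # body centroid

    y0, y1 = ys.min(), ys.max()                # vertical bounding box
    head = ys <= y0 + head_frac * (y1 - y0)    # top slice = head box
    H = (float(xs[head].mean()), float(ys[head].mean()))
    return K, H
```

The frame-to-frame search narrowing of step 4 would then simply restrict `mask` to a window around the previous K before calling this, falling back to the whole frame when no body is found.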
2.3. “Safe Zone” Demarcation
Algorithm 2. GHT-Based Bed Surface Detection
Inputs: color image to be detected
Output: detected bed boundary line
1. Convert the color image into grayscale;
2. Remove the high-frequency signal and smooth the image by Gaussian kernel denoising;
3. Extract edges using the gradient operator, the Laplace operator, and the Canny and Sobel detectors;
4. Judge edge points by the principle of binarization, i.e., grayscale value = 0/255;
5. Prepare two containers: one to display the Hough-space profile, and an array Hough_space to store the voting values;
6. Take the local maxima, set a threshold, filter out interfering straight lines, draw the lines, and calibrate the corner points. Output the bed boundary.
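The Hough-space voting of steps 5 and 6 can be sketched as follows, assuming the binary edge map from steps 1 through 4 is already available. This is a minimal sketch that returns only the single strongest line; a production system would typically take all local maxima above a threshold, e.g. via cv2.HoughLines:

```python
import numpy as np

def hough_strongest_line(edges, n_theta=180):
    """Steps 5 and 6 of Algorithm 2: vote every edge pixel into
    (rho, theta) Hough space and return the strongest straight line.

    `edges` is a binary edge map (the output of steps 1 through 4).
    Returns (rho, theta_degrees) with rho = x*cos(t) + y*sin(t).
    """
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))             # max possible |rho|
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int32)  # vote array

    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):                        # one vote per theta
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1

    r, t = np.unravel_index(acc.argmax(), acc.shape)  # strongest peak
    return r - diag, float(np.rad2deg(thetas[t]))
```

A long horizontal edge at row y = 5, for example, accumulates all of its votes in the cell (rho = 5, theta = 90 degrees), which is exactly how the bed boundary lines stand out from scattered edge noise.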
2.4. State Label Delineation
Algorithm 3. Security Symbol Division
Inputs: center of mass point K of the child’s body; center of mass point H of the head; safe zone Sp
Output: safety markers S1, S2, S3
IF H is within Sp, the current state is S1;
ELSE IF H leaves Sp but K is within Sp, the output is S2;
OTHERWISE, the output is S3.
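The state labeling reduces to two point-in-region tests. A minimal sketch, modeling the detected bed surface as an axis-aligned rectangle (a simplification assumed here for illustration; the paper demarcates the zone with the GHT):

```python
def classify_state(H, K, safe_zone):
    """Algorithm 3: map head centroid H and body centroid K to a
    safety marker. `safe_zone` is modeled as an axis-aligned
    rectangle (x0, y0, x1, y1) for illustration only.
    """
    def inside(p, z):
        x, y = p
        x0, y0, x1, y1 = z
        return x0 <= x <= x1 and y0 <= y <= y1

    if inside(H, safe_zone):
        return "S1"      # security: head inside the safe zone
    if inside(K, safe_zone):
        return "S2"      # alarm: head out, body centroid still inside
    return "S3"          # dangerous: both centroids outside
```

For a zone (0, 0, 10, 10), a head at (5, 5) yields S1, a head at (12, 5) with the body centroid at (9, 5) yields S2, and both centroids outside yield S3.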
3. Experiments and Results
3.1. Dataset
3.2. Experimental Setting
- The first is that the system correctly identifies the toddler’s current state.
- The second is that an event that did not result in a fall is wrongly classified as an alarm state.
- The third is that no fall occurs and, accordingly, the algorithm does not flag one.
- The fourth is a fall event that the system fails to recognize.
- True Positive (TP): A fall occurred, and the system correctly classified it as a fall.
- False Positive (FP): No fall occurred, but the system reported one.
- True Negative (TN): No fall occurred, and the system correctly identified this.
- False Negative (FN): A fall occurred, but the system failed to detect it.
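The four counts above determine the metrics reported in Section 3.5. A minimal helper (any counts plugged in are illustrative, not the paper’s):

```python
def detection_metrics(tp, fp, tn, fn):
    """Confusion-matrix metrics of the kind reported in Section 3.5."""
    sensitivity = tp / (tp + fn)            # share of falls detected
    specificity = tn / (tn + fp)            # share of non-falls passed
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f_value = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, accuracy, precision, f_value
```

For example, 90 detected falls, 10 missed falls, 95 correctly passed non-falls, and 5 false alarms give a sensitivity of 0.90 and a specificity of 0.95.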
3.3. GELAN Improved Results
- More efficient feature extraction: The GELAN module combines the advantages of multiple network structures, enhancing the model’s ability to recognize toddlers’ body features through effective hierarchical aggregation and the combination of CSPNet and ELAN. This combination not only enriches the features but also speeds up feature extraction, keeping the processing-time difference within 3 s.
- Adaptability: The improved YOLOv8 model is optimized for complex backgrounds and small-target detection. The flexibility of the GELAN module allows the network to better adapt to the large variations in toddlers’ body size and dynamics, which improves detection robustness and increases accuracy by nearly 20% compared to the original YOLO model.
- Optimized network structure: GELAN reduces the computational burden of the model through more efficient data processing and simplification of the network structure, while maintaining high accuracy with an output frame rate of 69.97 FPS, which makes the model more suitable for running on resource-limited devices and improves the model’s practicality and scalability.
3.4. Human Information Extraction Results
3.5. State Classification Results
4. Discussion
- The computer’s performance influences the computing time of the recognition algorithm in this article. In practice, computing platforms may behave differently from the specific setup used in this research. The algorithm’s GPU usage, memory requirements, and energy consumption will be assessed in future work to identify resource bottlenecks. Based on the evaluation results, we will incorporate lightweight optimization strategies such as model pruning and quantization so that the algorithm performs efficiently on conventional CPUs and the edge devices used in real-time applications. These changes will improve the practicality and sustainability of our research.
- We did not examine the safety zone of other pieces of furniture, such as dining chairs, dining tables, strollers, and other surfaces that toddlers frequently interact with and are at a high risk of falling from. Our future research should focus on expanding this further.
- Based on the detection errors observed, corresponding correction mechanisms should be developed. Firstly, certain toys and patterns may still be mistaken for toddlers, leading to detection errors. These confusable objects are so close to toddlers in texture, color, and form that even human viewers find it difficult to distinguish them; the algorithm requires further optimization to filter out such small noise effectively. Secondly, our algorithm is sensitive to noise caused by unusual images and changing scene conditions: when a toddler is in bed with a quilt covering the body, the system fails to capture the physical features. Thirdly, the RGB camera-based method fails to recognize targets under excessively dim illumination. We should enhance the technique’s capacity to identify small, indistinct targets, adjust model parameters, and increase the complexity of model training.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
SDI | Shuffle Dimensionally Integrated |
ELAN | Efficient Layer Aggregation Network |
SPPCSPC | Spatial Pyramid Pooling, Cross Stage Partial |
STGCN | Spatio-Temporal Graph Convolutional Network |
ECA | Efficient Channel Attention |
SSD | Single Shot Detector |
ADLs | Activities of Daily Living |
GELAN | Generalized Efficient Layer Aggregation Network |
CSPNet | Cross Stage Partial Network |
GHT | Generalized Hough Transform |
RF | Random Forest |
TP | True Positive |
FP | False Positive |
TN | True Negative |
FN | False Negative |
Position Relation | Safety Symbol | State Type
---|---|---
H ∈ Sp | S1 | security status
H ∉ Sp, K ∈ Sp | S2 | alarm status
H ∉ Sp, K ∉ Sp | S3 | dangerous status
Model | P | R | F1 | mAP@0.5
---|---|---|---|---
YOLOv3 | 0.8394 | 0.8222 | 0.7903 | 0.8561
YOLOv5 | 0.7699 | 0.7810 | 0.7801 | 0.7994
YOLOv6 | 0.8588 | 0.8817 | 0.8591 | 0.9000
YOLOv8 | 0.9050 | 0.9271 | 0.9211 | 0.9411
Improved YOLOv8 | 0.9491 | 0.9553 | 0.9572 | 0.9726
Safety Status | Warning Status | Dangerous Status |
---|---|---|
60 | 96 | 144 |
Sensitivity | Specificity | Accuracy | Precision | F-Value |
---|---|---|---|---|
96.67% | 95% | 96.33% | 98.72% | 97.68% |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yang, Z.; Tsui, B.; Ning, J.; Wu, Z. Falling Detection of Toddlers Based on Improved YOLOv8 Models. Sensors 2024, 24, 6451. https://doi.org/10.3390/s24196451