Synergistic Integration of Skeletal Kinematic Features for Vision-Based Fall Detection
Abstract
1. Introduction
- (i) Local features, such as the colour, texture, and intensity of the image.
- (ii) Global features, such as the image silhouette, edges, and spatial points.
- (iii) Depth features, which extract the depth information of the image.
- Segmenting the human skeleton into five sections (left hand, right hand, left leg, right leg, and a craniocaudal section).
- Extracting the distances from the segmented parts (Spatial domain).
- Calculating the angles within the segments (Spatial domain).
- Calculating the angle of inclination for every segment (Spatial domain).
- To capture temporal dynamics, the extracted spatial features are arranged in the temporal sequence. As a result, the feature descriptor in the proposed approach preserves the spatiotemporal dynamics.
- Our method achieves strong performance compared with state-of-the-art approaches, reaching 98.32% accuracy with the gradient boosting classifier.
- The performance of our method is evaluated on the UP-Fall dataset.
- First, m frames are extracted from each video of the UP-Fall dataset using Equation (1).
- The AlphaPose pre-trained network was applied to retrieve 17 keypoints of the subject from every frame.
- From the 17 keypoints, 13 keypoints were retrieved and two additional keypoints were computed. In total, 15 keypoints were used for the process.
- These 15 keypoints were used to segment the human skeleton into five sections: left hand, right hand, left leg, right leg, and a craniocaudal section.
- Three features were extracted from each section: the length of the section (a distance), the angle formed by the keypoints that define the section, and the angle of inclination of the section with respect to the x-axis.
- As a result, 15 features were retrieved from each frame, with each feature represented by a column.
- Thus, features were extracted from all m frames and stacked so that each row of the descriptor corresponds to one video frame, preserving the temporal sequence.
- Hence a feature descriptor was formed; this descriptor was the input to the machine learning algorithms (a minimal sketch of this pipeline is given after this list).
- In addition, to provide ground truth, every video was labelled as either fall or non-fall.
- The accuracy of the machine learning algorithms was computed using the ground truth data.
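The sketch below illustrates the feature-extraction pipeline summarised in the list above, assuming Python with NumPy. The keypoint indices, the two computed keypoints, the uniform frame-sampling rule standing in for Equation (1), and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical keypoint layout: 17 COCO-style keypoints (0-16) plus two
# computed points (17: mid-shoulder, 18: mid-hip). Indices are assumptions.
SEGMENTS = {
    "left_hand":    [5, 7, 9],      # left shoulder -> elbow -> wrist
    "right_hand":   [6, 8, 10],     # right shoulder -> elbow -> wrist
    "left_leg":     [11, 13, 15],   # left hip -> knee -> ankle
    "right_leg":    [12, 14, 16],   # right hip -> knee -> ankle
    "craniocaudal": [0, 17, 18],    # head -> mid-shoulder -> mid-hip
}

def sample_frames(num_frames_in_video, m):
    """Uniformly sample m frame indices (a stand-in for Equation (1))."""
    return np.linspace(0, num_frames_in_video - 1, m).astype(int)

def segment_features(keypoints):
    """Compute 3 features per segment from a (19, 2) keypoint array:
    segment length, joint angle within the segment, and inclination with the x-axis."""
    feats = []
    for a_idx, b_idx, c_idx in SEGMENTS.values():
        a, b, c = keypoints[a_idx], keypoints[b_idx], keypoints[c_idx]
        length = np.linalg.norm(a - b) + np.linalg.norm(b - c)            # distance feature
        v1, v2 = a - b, c - b
        cos_ab = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
        joint_angle = np.degrees(np.arccos(np.clip(cos_ab, -1.0, 1.0)))   # angle within segment
        inclination = np.degrees(np.arctan2(c[1] - a[1], c[0] - a[0]))    # angle with x-axis
        feats.extend([length, joint_angle, inclination])
    return feats                                                          # 5 segments x 3 = 15 features

def video_descriptor(per_frame_keypoints):
    """Stack per-frame features row-wise: one row per sampled frame, 15 columns."""
    return np.array([segment_features(kp) for kp in per_frame_keypoints])
```

Stacking the rows in frame order is what gives the descriptor its spatiotemporal character: the columns carry the spatial measurements, while the row order carries the temporal dynamics.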
2. Literature Review
2.1. Sensor-Based Technology
2.2. Vision-Based Technology
3. Proposed Approach
3.1. Dataset
3.2. Pre-Processing
- (i) The SSTN + SPPE method, which combines a symmetric spatial transformer network (SSTN) with a parallel single-person pose estimation (SPPE) branch, is used to generate pose proposals from human bounding boxes. The parallel SPPE is used together with the SSTN to control the output when the network is otherwise unable to produce the desired pose.
- (ii) A technique known as parametric pose non-maximum suppression (NMS) is used to find similar poses among the candidates and keep only the one with the highest score, preventing redundant detections. In this way the pose set is streamlined, and only the most accurate and relevant poses are retained (a simplified sketch of this step is given after this list).
- (iii) The system's accuracy and robustness are increased using a technique called the pose-guided proposals generator. It operates by locating the human subject in the scene and proposing several bounding boxes that correspond to the different poses the person might adopt. These bounding boxes are generated from estimates of the human joint positions, which enables the system to cover a wide range of potential poses and motions.
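The pose-elimination step in (ii) can be sketched as a greatly simplified greedy procedure: candidates are sorted by confidence, and any pose that is too similar (by mean keypoint distance) to an already-kept pose is discarded. This is only an illustrative approximation; AlphaPose's criterion is parametric and data-driven, and the threshold below is an assumption.

```python
import numpy as np

def simplified_pose_nms(poses, scores, dist_thresh=20.0):
    """Greedy non-maximum suppression over pose candidates.

    poses:  (N, K, 2) array of N candidate poses with K keypoints each
    scores: (N,) confidence score per candidate
    Returns the indices of the poses that are kept.

    Note: AlphaPose uses a parametric, learned elimination criterion; the plain
    mean-keypoint-distance rule here is only a simplified stand-in.
    """
    order = np.argsort(scores)[::-1]          # highest-confidence candidates first
    keep = []
    for i in order:
        redundant = False
        for j in keep:
            mean_dist = np.linalg.norm(poses[i] - poses[j], axis=-1).mean()
            if mean_dist < dist_thresh:       # too similar to an already-kept pose
                redundant = True
                break
        if not redundant:
            keep.append(i)
    return keep
```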
3.3. Assumptions
3.4. Kinetic Vector Calculation
4. Experimental Study and Result Analysis
4.1. Decision Tree Algorithm
4.2. Random Forest Algorithm
4.3. Gradient Boost Algorithm
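The three classifiers of Sections 4.1-4.3 all consume the same per-video feature descriptor. A minimal training sketch is given below, assuming scikit-learn and assuming that each video's m × 15 descriptor is flattened into a single vector; the paper's exact input formatting, hyperparameters, and train/test split may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def train_classifiers(descriptors, labels, seed=42):
    """Train the three classifiers on flattened video descriptors.

    descriptors: list of (m, 15) per-video feature matrices
    labels:      1 = fall, 0 = non-fall (ground truth per video)
    """
    X = np.array([d.flatten() for d in descriptors])   # flattening is an assumption
    y = np.array(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)

    models = {
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "gradient_boost": GradientBoostingClassifier(random_state=seed),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        print(f"{name}: test accuracy = {model.score(X_te, y_te):.4f}")
    return models
```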
4.4. Evaluation Metrics
- Accuracy: Accuracy is an important measure of the classifier's performance. It is the proportion of correct predictions among all predictions made. Accuracy is defined as given in Equation (25): $\mathrm{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}} = \frac{TP + TN}{TP + TN + FP + FN}$.
- Confusion matrix: The confusion matrix is a statistic used to evaluate the performance of a predictive model and can be presented in tabular or matrix form. It is a visual summary of a classification model's true positive, false positive, true negative, and false negative predictions, as shown in Figure 9. The number of falls that were predicted as falls is the true positives (TP). The number of non-falls that were predicted as non-falls is the true negatives (TN). False positives (FP) are the non-falls that were mistakenly classified as falls. False negatives (FN) are the falls that were classified as non-falls.
- Precision: Precision measures how accurately the model predicts positive outcomes. Out of all samples classified as positive, the fraction that are true positives is the model's precision; a model with high precision produces few false positives. The precision (Pre) is calculated as in Equation (26): $\mathrm{Pre} = \frac{TP}{TP + FP}$.
- Sensitivity: In machine learning, sensitivity refers to the true positive rate. Out of all positive samples in the ground truth, the fraction predicted as positive defines the model's sensitivity; a model with high sensitivity predicts most of the positive samples correctly, resulting in few false negatives. Sensitivity (Sen) is calculated as in Equation (27): $\mathrm{Sen} = \frac{TP}{TP + FN}$.
- Specificity: In machine learning, specificity refers to the true negative rate. Out of all negative samples in the ground truth, the fraction classified as negative defines the model's specificity; a model with high specificity predicts most of the negative instances correctly. The specificity (Spe) is calculated as given in Equation (28): $\mathrm{Spe} = \frac{TN}{TN + FP}$.
- F1-score: The F1-score represents the balance between the true positive rate (recall) and the precision. It combines the two into a single score that assesses a predictive model's overall effectiveness, ranging from 0 to 1, where 1 is the best case. The F1-score is calculated from precision and recall as given in Equation (29): $F1 = \frac{2 \times \mathrm{Pre} \times \mathrm{Sen}}{\mathrm{Pre} + \mathrm{Sen}}$.
- AUC-ROC: The ROC curve represents the performance of a binary classifier. AUC refers to the area under the curve and ROC to the receiver operating characteristic. The ROC is a probability curve that plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) at various classification thresholds. A higher AUC means the model separates the two classes better. (A short sketch that computes these metrics from model predictions follows this list.)
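The metrics above follow directly from the confusion-matrix counts. A minimal sketch is shown below, assuming scikit-learn (any equivalent implementation would serve) and binary fall/non-fall labels.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true, y_pred, y_score):
    """Compute the metrics of Equations (25)-(29) plus the ROC AUC.

    y_true, y_pred: binary labels (1 = fall, 0 = non-fall)
    y_score: predicted probability of the positive (fall) class, used for the AUC
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    accuracy    = (tp + tn) / (tp + tn + fp + fn)                     # Equation (25)
    precision   = tp / (tp + fp)                                      # Equation (26)
    sensitivity = tp / (tp + fn)                                      # Equation (27), recall / TPR
    specificity = tn / (tn + fp)                                      # Equation (28), TNR
    f1 = 2 * precision * sensitivity / (precision + sensitivity)      # Equation (29)
    auc = roc_auc_score(y_true, y_score)
    return dict(accuracy=accuracy, precision=precision, sensitivity=sensitivity,
                specificity=specificity, f1=f1, auc=auc)
```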
Performance of the Algorithms
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Liu, Y.; Chan, J.S.; Yan, J.H. Neuropsychological mechanisms of falls in older adults. Front. Aging Neurosci. 2014, 6, 64.
- CDC. Fact Sheet. 2021. Available online: https://www.cdc.gov/visionhealth/resources/features/vision-loss-falls.html (accessed on 1 January 2023).
- Alarifi, A.; Alwadain, A. Killer heuristic optimized convolution neural network-based fall detection with wearable IoT sensor devices. Measurement 2021, 167, 108258.
- Şengül, G.; Karakaya, M.; Misra, S.; Abayomi-Alli, O.O.; Damaševičius, R. Deep learning based fall detection using smartwatches for healthcare applications. Biomed. Signal Process. Control 2022, 71, 103242.
- Wu, X.; Zheng, Y.; Chu, C.H.; Cheng, L.; Kim, J. Applying deep learning technology for automatic fall detection using mobile sensors. Biomed. Signal Process. Control 2022, 72, 103355.
- De, A.; Saha, A.; Kumar, P.; Pal, G. Fall detection method based on spatio-temporal feature fusion using combined two-channel classification. Multimed. Tools Appl. 2022, 81, 26081–26100.
- Galvão, Y.M.; Ferreira, J.; Albuquerque, V.A.; Barros, P.; Fernandes, B.J. A multimodal approach using deep learning for fall detection. Expert Syst. Appl. 2021, 168, 114226.
- Inturi, A.R.; Manikandan, V.; Garrapally, V. A novel vision-based fall detection scheme using keypoints of human skeleton with long short-term memory network. Arab. J. Sci. Eng. 2023, 48, 1143–1155.
- Forsyth, D.; Ponce, J. Computer Vision: A Modern Approach; Prentice Hall: Hoboken, NJ, USA, 2011.
- Baumgart, B.G. A polyhedron representation for computer vision. In Proceedings of the National Computer Conference and Exposition, Anaheim, CA, USA, 19–22 May 1975; pp. 589–596.
- Shirai, Y. Three-Dimensional Computer Vision; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
- Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260.
- Sammut, C.; Webb, G.I. Encyclopedia of Machine Learning; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011.
- Liu, B. Sentiment analysis and subjectivity. In Handbook of Natural Language Processing; Routledge: Abingdon, UK, 2010; Volume 2, pp. 627–666.
- Jelinek, F. Statistical Methods for Speech Recognition; MIT Press: Cambridge, MA, USA, 1997.
- Yu, D.; Deng, L. Automatic Speech Recognition; Springer: Berlin, Germany, 2016.
- Pavlidis, T. Algorithms for Graphics and Image Processing; Springer Science & Business Media: Berlin, Germany, 2012.
- Russ, J.C. The Image Processing Handbook; CRC Press: Boca Raton, FL, USA, 2006.
- Huang, T.S.; Schreiber, W.F.; Tretiak, O.J. Image processing. In Advances in Image Processing and Understanding: A Festschrift for Thomas S Huang; World Scientific: Singapore, 2002; pp. 367–390.
- Messing, R.; Pal, C.; Kautz, H. Activity recognition using the velocity histories of tracked keypoints. In Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 104–111.
- Zhang, C.; Tian, Y. RGB-D camera-based daily living activity recognition. J. Comput. Vis. Image Process. 2012, 2, 12.
- Hong, Y.J.; Kim, I.J.; Ahn, S.C.; Kim, H.G. Activity recognition using wearable sensors for elder care. In Proceedings of the 2008 Second International Conference on Future Generation Communication and Networking, Hainan, China, 13–15 December 2008; IEEE: Piscataway, NJ, USA, 2008; Volume 2, pp. 302–305.
- Wu, F.; Zhao, H.; Zhao, Y.; Zhong, H. Development of a wearable-sensor-based fall detection system. Int. J. Telemed. Appl. 2015, 2015, 576364.
- Bourke, A.K.; Lyons, G.M. A threshold-based fall-detection algorithm using a bi-axial gyroscope sensor. Med. Eng. Phys. 2008, 30, 84–90.
- Chaccour, K.; Darazi, R.; el Hassans, A.H.; Andres, E. Smart carpet using differential piezoresistive pressure sensors for elderly fall detection. In Proceedings of the 2015 IEEE 11th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), Abu Dhabi, United Arab Emirates, 19–21 October 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 225–229.
- Feng, G.; Mai, J.; Ban, Z.; Guo, X.; Wang, G. Floor pressure imaging for fall detection with fiber-optic sensors. IEEE Pervasive Comput. 2016, 15, 40–47.
- Jagedish, S.A.; Ramachandran, M.; Kumar, A.; Sheikh, T.H. Wearable Devices with Recurrent Neural Networks for Real-Time Fall Detection. In Proceedings of the International Conference on Innovative Computing and Communications—ICICC 2022, Delhi, India, 19–20 February 2022; Springer: Berlin, Germany, 2022; Volume 2, pp. 357–366.
- Li, W.; Zhang, D.; Li, Y.; Wu, Z.; Chen, J.; Zhang, D.; Hu, Y.; Sun, Q.; Chen, Y. Real-time fall detection using mmWave radar. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 16–20.
- Yao, C.; Hu, J.; Min, W.; Deng, Z.; Zou, S.; Min, W. A novel real-time fall detection method based on head segmentation and convolutional neural network. J. Real-Time Image Process. 2020, 17, 1939–1949.
- Khraief, C.; Benzarti, F.; Amiri, H. Elderly fall detection based on multi-stream deep convolutional networks. Multimed. Tools Appl. 2020, 79, 19537–19560.
- Mobsite, S.; Alaoui, N.; Boulmalf, M.; Ghogho, M. Semantic segmentation-based system for fall detection and post-fall posture classification. Eng. Appl. Artif. Intell. 2023, 117, 105616.
- Ramirez, H.; Velastin, S.A.; Meza, I.; Fabregas, E.; Makris, D.; Farias, G. Fall detection and activity recognition using human skeleton features. IEEE Access 2021, 9, 33532–33542.
- Alaoui, A.Y.; El Fkihi, S.; Thami, R.O.H. Fall detection for elderly people using the variation of key points of human skeleton. IEEE Access 2019, 7, 154786–154795.
- Mansoor, M.; Amin, R.; Mustafa, Z.; Sengan, S.; Aldabbas, H.; Alharbi, M.T. A machine learning approach for non-invasive fall detection using Kinect. Multimed. Tools Appl. 2022, 81, 15491–15519.
- Zhang, X.; Yu, H.; Zhuang, Y. A robust RGB-D visual odometry with moving object detection in dynamic indoor scenes. IET Cyber-Syst. Robot. 2023, 5, e12079.
- Martínez-Villaseñor, L.; Ponce, H.; Brieva, J.; Moya-Albor, E.; Núñez-Martínez, J.; Peñafort-Asturiano, C. UP-fall detection dataset: A multimodal approach. Sensors 2019, 19, 1988.
- Fang, H.S.; Xie, S.; Tai, Y.W.; Lu, C. RMPE: Regional multi-person pose estimation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2334–2343.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Berlin, Germany, 2014; pp. 740–755.
- Espinosa, R.; Ponce, H.; Gutiérrez, S.; Martínez-Villaseñor, L.; Brieva, J.; Moya-Albor, E. A vision-based approach for fall detection using multiple cameras and convolutional neural networks: A case study using the UP-Fall detection dataset. Comput. Biol. Med. 2019, 115, 103520.
- Espinosa, R.; Ponce, H.; Gutiérrez, S.; Martínez-Villaseñor, L.; Brieva, J.; Moya-Albor, E. Application of convolutional neural networks for fall detection using multiple cameras. In Challenges and Trends in Multimodal Fall Detection for Healthcare; Springer: Berlin, Germany, 2020; pp. 97–120.
Approach | Method | Algorithm |
---|---|---|
[29] | Head and Torso tracking | Convolutional neural network |
[30] | Stacked human silhouette | Binary motion is observed |
[31] | Pixel-wise multi scaling skip connection network | Conv LSTM |
[32] | Skeleton keypoints using AlphaPose | Random forest, Support vector machine, k-Nearest neighbors, Multi layer perceptron |
[33] | Distance and angle between same keypoints in successive frames. | Random forest, Support vector machine, k-Nearest neighbors, Multi layer perceptron |
Classifier | Accuracy (%) | Precision (%) | Sensitivity (%) | Specificity (%) | F1-Score (%)
---|---|---|---|---|---
Decision Tree | 88.39 | 84.48 | 92.45 | 84.74 | 88.28 |
Random Forest | 96.43 | 98.03 | 94.33 | 98.30 | 96.15 |
Gradient Boost | 98.32 | 98.11 | 98.11 | 98.30 | 98.11 |
Author | Classifier | Sensitivity (%) | Specificity (%) | Precision (%) | F1-Score (%) | Accuracy (%)
---|---|---|---|---|---|---
[39] | Convolutional neural network (CNN) Lateral camera (cam1) | 97.72 | 81.58 | 95.24 | 97.20 | 95.24
[39] | Convolutional neural network (CNN) Frontal camera (cam2) | 95.57 | 79.67 | 96.30 | 96.93 | 94.78
[36] | K-Nearest Neighbors (KNN) | 15.54 | 93.09 | 15.32 | 15.19 | 34.03
[36] | Support Vector Machine (SVM) | 14.30 | 92.97 | 13.81 | 13.83 | 34.40
[36] | Random Forest (RF) | 14.48 | 92.9 | 14.45 | 14.38 | 32.33
[36] | Multilayer Perceptron (MLP) | 10.59 | 92.21 | 8.59 | 7.31 | 34.03
[36] | Convolutional neural network (CNN) | 71.3 | 99.5 | 71.8 | 71.2 | 95.1
[40] | Convolutional neural network (CNN) | 97.95 | 83.08 | 96.91 | 97.43 | 95.64
[32] | Average (RF, SVM, MLP, KNN) | 96.80 | 99.11 | 96.94 | 96.87 | 97.59
[8] | Convolutional neural network (CNN) + Long short-term memory (LSTM) | 94.37 | 98.96 | 91.08 | 92.47 | 96.72
[31] | ConvLSTM | 97.68 | - | 97.71 | 97.68 | 97.68
Our Proposed Work | Gradient Boost (GB) | 98.11 | 98.30 | 98.11 | 98.11 | 98.32