Gesture Recognition of Filipino Sign Language Using Convolutional and Long Short-Term Memory Deep Neural Networks
Abstract
1. Introduction
- We collected a video dataset of FSL gestures from the partner deaf community.
- We constructed and trained a neural network-based SLR system optimized to recognize dynamic FSL gestures in real time.
- We derived a lightweight subset model with performance comparable to the proposed SLR system.
- We assessed the performance of the designed SLR system and its corresponding lightweight versions.
2. Background and Related Work
2.1. Filipino Sign Language
- Handshape refers to the arrangement of the fingers and their respective joints. For sign languages such as FSL that make heavy use of initialization, a change in handshape alone can change the meaning of a sign.
- Location refers to the placement of the hand(s). Location, particularly for signs made on the head and face, is often what distinguishes otherwise similar signs.
- Palm orientation refers to the direction the palm is facing. For some signs, whether the palm faces upward or downward is the distinguishing feature; the same sign made with the opposite orientation can mean an entirely different word.
- Movement refers to a change in handshape and/or the path of the hands. Signs with the same handshape and location but different movements can mean entirely different words.
- Non-manual signals refer to facial expressions and/or movements of other parts of the body used alongside signing. These signals convey grammar and meaning in addition to the hands, portraying tone, emotion, and intent in visual form.
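The five parameters above amount to a feature tuple that an SLR system must capture per sign. A minimal sketch of that structure (field names and example values are our own illustration, not standard FSL notation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SignParameters:
    """The five phonological parameters of a sign (illustrative field names)."""
    handshape: str                    # finger/joint configuration, e.g. "flat-B"
    location: str                     # where the sign is made, e.g. "forehead"
    palm_orientation: str             # e.g. "up", "down", "outward"
    movement: str                     # path or handshape change, e.g. "arc"
    non_manual: Optional[str] = None  # facial expression / body movement, if any

# Hypothetical entry; the values are placeholders, not an authoritative gloss.
hello = SignParameters("flat-B", "forehead", "outward", "arc")
```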
2.2. Sign Language Recognition (SLR) Systems
2.3. Image Processing
2.4. Contextual Captioning by Neural Networks for Images
2.5. Challenges in FSL Research
3. Methodology
3.1. Design of System Architecture
3.2. System Flow
Algorithm 1: Proposed SLR System Process.
3.3. Ethical Considerations
3.4. Data Protection Plan
3.5. Data Collection and Preparation
3.5.1. Input Image Data Preparation
3.5.2. Data Augmentation
3.6. CNN-LSTM Model
3.7. Training Environment
3.8. Optimization
3.9. Testing Environment
4. Results and Discussion
4.1. Base Model Training History
4.2. Model Size
4.3. Classification Evaluation
4.4. Inference Time
4.5. Memory Utilization
4.6. Results Summary
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- World Health Organization. Deafness and Hearing Loss. 2021. Available online: https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss (accessed on 28 October 2021).
- Nearly One in Six in the Philippines Has Serious Hearing Problems. 2021. Available online: https://hearingyou.org/news/nearly-one-in-six-in-the-philippines-has-serious-hearing-problems/ (accessed on 28 October 2021).
- Newall, J.P.; Martinez, N.; Swanepoel, D.W.; McMahon, C.M. A national survey of hearing loss in the Philippines. Asia Pac. J. Public Health 2020, 32, 235–241. [Google Scholar] [CrossRef] [PubMed]
- Maria Feona Imperial. Kinds of Sign Language in the Philippines. 2015. Available online: https://verafiles.org/articles/kinds-of-sign-language-in-the-philippines (accessed on 28 October 2021).
- Mendoza, A. The Sign Language Unique to Deaf Filipinos. 2018. Available online: https://web.archive.org/web/20221010011356/cnnphilippines.com/life/culture/2018/10/29/Filipino-Sign-Language.html (accessed on 29 October 2021).
- RA 11106—An Act Declaring the Filipino Sign Language as the National Sign Language of the Filipino Deaf and the Official Sign Language of Government in All Transactions Involving the Deaf, and Mandating Its Use in Schools, Broadcast Media, and Workplaces. 2018. Available online: https://www.ncda.gov.ph/disability-laws/republic-acts/ra-11106/ (accessed on 28 October 2021).
- Resources for the Blind, Inc.—Philippines. GABAY (Guide): Strengthening Inclusive Education for Blind, Deaf and Deafblind Children. 2021. Available online: https://www.edu-links.org/sites/default/files/media/file/CIES%202021%20Gabay%20Presentation.pdf (accessed on 30 October 2021).
- Butler, C. Signs of Inclusion. USAID Supports Creation of Filipino Sign Language Dictionary and Curriculum. 2021. Available online: https://medium.com/usaid-2030/signs-of-inclusion-5d78d91bce51 (accessed on 29 October 2021).
- Rivera, J.P.; Ong, C. Facial expression recognition in filipino sign language: Classification using 3D Animation units. In Proceedings of the 18th Philippine Computing Science Congress (PCSC 2018), Cagayan de Oro, Philippines, 15–17 March 2018; pp. 1–8. [Google Scholar]
- Cabalfin, E.P.; Martinez, L.B.; Guevara, R.C.L.; Naval, P.C. Filipino sign language recognition using manifold projection learning. In Proceedings of the TENCON 2012 IEEE Region 10 Conference, Cebu, Philippines, 19–22 November 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 1–5. [Google Scholar]
- Montefalcon, M.D.; Padilla, J.R.; Llabanes Rodriguez, R. Filipino Sign Language Recognition Using Deep Learning. In Proceedings of the 2021 5th International Conference on E-Society, E-Education and E-Technology, Taipei, Taiwan, 21–23 August 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 219–225. [Google Scholar]
- Jarabese, M.B.D.; Marzan, C.S.; Boado, J.Q.; Lopez, R.R.M.F.; Ofiana, L.G.B.; Pilarca, K.J.P. Sign to Speech Convolutional Neural Network-Based Filipino Sign Language Hand Gesture Recognition System. In Proceedings of the 2021 International Symposium on Computer Science and Intelligent Controls (ISCSIC), Rome, Italy, 12–14 November 2021; pp. 147–153. [Google Scholar] [CrossRef]
- Kishore, P.; Kumar, P.R. A video based Indian sign language recognition system (INSLR) using wavelet transform and fuzzy logic. Int. J. Eng. Technol. 2012, 4, 537–542. [Google Scholar] [CrossRef]
- Tolentino, L.K.S.; Juan, R.S.; Thio-ac, A.C.; Pamahoy, M.A.B.; Forteza, J.R.R.; Garcia, X.J.O. Static sign language recognition using deep learning. Int. J. Mach. Learn. Comput. 2019, 9, 821–827. [Google Scholar] [CrossRef]
- Ong, C.; Lim, I.; Lu, J.; Ng, C.; Ong, T. Sign-language recognition through gesture & movement analysis (SIGMA). In Mechatronics and Machine Vision in Practice 3; Springer: Cham, Switzerland, 2018; pp. 235–245. [Google Scholar]
- Adeyanju, I.; Bello, O.; Adegboye, M. Machine learning methods for sign language recognition: A critical review and analysis. Intell. Syst. Appl. 2021, 12, 200056. [Google Scholar] [CrossRef]
- Pansare, J.; Gawande, S.; Ingle, M. Real-Time Static Hand Gesture Recognition for American Sign Language (ASL) in Complex Background. J. Signal Inf. Process. 2012, 3, 364–367. [Google Scholar] [CrossRef]
- Islam, M.M.; Siddiqua, S.; Afnan, J. Real time Hand Gesture Recognition using different algorithms based on American Sign Language. In Proceedings of the 2017 IEEE International Conference on Imaging, Vision & Pattern Recognition (icIVPR), Dhaka, Bangladesh, 13–14 February 2017; pp. 1–6. [Google Scholar] [CrossRef]
- Sia, J.C.; Cronin, K.; Ducusin, R.; Tuaño, C.; Rivera, P. The Use of Motion Sensing to Recognize Filipino Sign Language Movements; De La Salle University: Manila, Philippines, 2019. [Google Scholar] [CrossRef]
- Burns, E. What Is Machine Learning and Why Is It Important? 2021. Available online: https://searchenterpriseai.techtarget.com/definition/machine-learning-ML (accessed on 5 November 2021).
- Murillo, S.C.M.; Villanueva, M.C.A.E.; Tamayo, K.I.M.; Apolinario, M.J.V.; Lopez, M.J.D.; Edd. Speak the Sign: A Real-Time Sign Language to Text Converter Application for Basic Filipino Words and Phrases. Cent. Asian J. Math. Theory Comput. Sci. 2021, 2, 1–8. [Google Scholar]
- Yasrab, R.; Gu, N.; Zhang, X. An Encoder-Decoder Based Convolution Neural Network (CNN) for Future Advanced Driver Assistance System (ADAS). Appl. Sci. 2017, 7, 312. [Google Scholar] [CrossRef]
- What is Long Short-Term Memory (LSTM)? Available online: https://algoscale.com/blog/what-is-long-short-term-memory-lstm/ (accessed on 30 April 2021).
- Brownlee, J. CNN Long Short-Term Memory Networks. 2017. Available online: https://machinelearningmastery.com/author/jasonb/ (accessed on 30 April 2021).
- Tiongson, P.V.; Martinez, L. Initiatives for Dialogue. In Full Access: A Compendium on Sign Language Advocacy and Access of the Deaf to the Legal System; Initiatives for Dialogue and Empowerment through Alternative Legal Services (IDEALS), Inc.: Quezon City, Philippines, 2007. [Google Scholar]
- Abat, R.; Martinez, L.B. The history of sign language in the Philippines: Piecing together the puzzle. In Proceedings of the 9th Philippine Linguistics Congress: Proceedings, Quezon City, Philippines, 25–27 January 2006. [Google Scholar]
- Andrada, J.; Domingo, R. Key Findings for Language Planning from the National Sign Language Committee (Status Report on the Use of Sign Language in the Philippines). In Proceedings of the 9th Philippine Linguistics Congress: Proceedings, Quezon City, Philippines, 25–27 January 2006. [Google Scholar]
- Philippine Deaf Research Center; Philippine Federation of the Deaf. An Introduction to Filipino Sign Language; Philippine Deaf Resource Center: Quezon City, Philippines, 2004. [Google Scholar]
- Ong, S.C.; Ranganath, S. Automatic sign language analysis: A survey and the future beyond lexical meaning. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 873–891. [Google Scholar] [CrossRef] [PubMed]
- Vakunov, A.; Chang, C.L.; Zhang, F.; Sung, G.; Grundmann, M.; Bazarevsky, V. MediaPipe Hands: On-Device Real-Time Hand Tracking. 2020. Available online: https://mixedreality.cs.cornell.edu/workshop (accessed on 3 March 2022).
- Nikhil, B. Image Data Pre-Processing for Neural Networks; Medium: San Francisco, CA, USA, 2017. [Google Scholar]
- Miao, Q.; Pan, B.; Wang, H.; Hsu, K.; Sorooshian, S. Improving Monsoon Precipitation Prediction Using Combined Convolutional and Long Short Term Memory Neural Network. Water 2019, 11, 977. [Google Scholar] [CrossRef]
- Bilgera, C.; Yamamoto, A.; Sawano, M.; Matsukura, H.; Ishida, H. Application of Convolutional Long Short-Term Memory Neural Networks to Signals Collected from a Sensor Network for Autonomous Gas Source Localization in Outdoor Environments. Sensors 2018, 18, 4484. [Google Scholar] [CrossRef] [PubMed]
- Moudhgalya, N.B.; Sundar, S.; Divi, S.; Mirunalini, P.; Aravindan, C.; Jaisakthi, S.M. Convolutional Long Short-Term Memory Neural Networks for Hierarchical Species Prediction. In Working Notes of CLEF 2018, Proceedings of the Conference and Labs of the Evaluation Forum, Avignon, France, 10–14 September 2018; Cappellato, L., Ferro, N., Nie, J.Y., Soulier, L., Eds.; CEUR Workshop Proceedings; CEUR-WS.org: Kyiv, Ukraine, 2018; Volume 2125. [Google Scholar]
- Öztürk, Ş.; Özkaya, U. Residual LSTM layered CNN for classification of gastrointestinal tract diseases. J. Biomed. Inform. 2021, 113, 103638. [Google Scholar] [CrossRef] [PubMed]
- Öztürk, Ş.; Özkaya, U. Gastrointestinal tract classification using improved LSTM based CNN. Multimed. Tools Appl. 2020, 79, 28825–28840. [Google Scholar] [CrossRef]
- Dertat, A. Applied Deep Learning—Part 4: Convolutional Neural Networks. 2017. Available online: https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2 (accessed on 16 April 2022).
- Liu, Z.; Wang, Y.; Han, K.; Zhang, W.; Ma, S.; Gao, W. Post-Training Quantization for Vision Transformer; Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W., Eds.; Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2021. [Google Scholar]
- Wang, P.; Chen, Q.; He, X.; Cheng, J. Towards accurate post-training network quantization via bit-split and stitching. In Proceedings of the 37th International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 9847–9856. [Google Scholar]
- Tensorflow Team. Post-Training Quantization. Available online: https://www.tensorflow.org/lite/performance/post_training_quantization (accessed on 9 December 2022).
- Tailor, S.A.; Fernández-Marqués, J.; Lane, N.D. Degree-Quant: Quantization-Aware Training for Graph Neural Networks. arXiv 2020, arXiv:2008.05000. [Google Scholar]
- Li, Y.; Gong, R.; Tan, X.; Yang, Y.; Hu, P.; Zhang, Q.; Yu, F.; Wang, W.; Gu, S. BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction. arXiv 2021, arXiv:2102.05426. [Google Scholar]
- Gholami, A.; Kim, S.; Dong, Z.; Yao, Z.; Mahoney, M.W.; Keutzer, K. A survey of quantization methods for efficient neural network inference. arXiv 2021, arXiv:2103.13630. [Google Scholar]
- Tensorflow Team. Quantization Aware Training Comprehensive Guide: Tensorflow Model Optimization. Available online: https://www.tensorflow.org/model_optimization/guide/quantization/training_comprehensive_guide.md (accessed on 17 January 2023).
- Warden, P.; Situnayake, D. Tinyml: Machine Learning with Tensorflow Lite on Arduino and Ultra-Low-Power Microcontrollers; O’Reilly Media: Sebastopol, CA, USA, 2019. [Google Scholar]
Paper | Parameters | Data Input Form | Method | Accuracy |
---|---|---|---|---|
[11] | Handshapes (FSL) | Static Images | ResNet Convolutional NN | 86.70% |
[12] | Handshapes (FSL) | Video | Convolutional NN | 95.00% |
[13] | Hand signs and Head movement (Indian Sign Language) | Video | Fuzzy Inference System | 96.00% |
[17] | Handshapes (ASL) | Static Images | Euclidean Distance | 90.19% |
[18] | Handshapes (ASL) | Video | Artificial Neural Network (NN) | 94.32% |
Category | Expressions |
---|---|
Common Expressions | Please, Hello, Yes, Issue, Thank You, My name is, No, Sick, Understand, Don’t Understand |
Transactional Expressions | Payment, Where restroom, Fill-up form, Check-up Doctor, Wait in line |
Parameters | Details |
---|---|
Batch Size | 8 |
Callback Functions | EarlyStopping, TensorBoard, CSVLogger, ModelCheckpoint |
Loss | Categorical Cross-entropy |
Metrics | Accuracy |
Optimizer | Adam |
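The training configuration above can be sketched in Keras. This is illustrative only: the frame count, input resolution, and layer sizes are assumptions, not the paper's exact architecture; the batch size, loss, metric, optimizer, and four callbacks follow the table.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES, H, W, C = 30, 64, 64, 3  # assumed clip length and frame size
NUM_CLASSES = 15                     # the 15 gesture classes evaluated in this work

# Minimal CNN-LSTM: a per-frame CNN feeds an LSTM over the frame sequence.
model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, H, W, C)),
    layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu")),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Flatten()),
    layers.LSTM(64),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Loss, metric, and optimizer as listed in the table.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# The four callback functions from the table (arguments are assumptions).
callbacks = [
    tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
    tf.keras.callbacks.TensorBoard(log_dir="logs"),
    tf.keras.callbacks.CSVLogger("training.csv"),
    tf.keras.callbacks.ModelCheckpoint("best.h5", save_best_only=True),
]
# model.fit(x_train, y_train, batch_size=8, callbacks=callbacks)
```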
Setup | Hardware & Software | Details |
---|---|---|
Setup 1 | Model | Lenovo IdeaPad 3 15ARE05 |
 | CPU | AMD Ryzen 7 4700U |
 | Architecture | Zen 2 (x86) |
 | RAM | 8 GB |
 | OS | Windows 11 22H2 |
 | Python | 3.9.16 |
 | TensorFlow | 2.10.0 |
 | Keras | 2.10.0 |
Setup 2 | Model | Raspberry Pi 4 Model B |
 | CPU | Broadcom BCM2711 |
 | Architecture | ARMv8-A |
 | RAM | 4 GB |
 | OS | Raspberry Pi OS 11 “Bullseye” |
 | Python | 3.9.16 |
 | TensorFlow | 2.10.0 |
 | Keras | 2.10.0 |
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
0 (Check-up Doctor) | 0.99 | 0.99 | 0.99 | 142 |
1 (Don’t Understand) | 0.94 | 0.99 | 0.97 | 130 |
2 (Fill-up Form) | 1.00 | 0.97 | 0.99 | 139 |
3 (Hello) | 0.98 | 0.99 | 0.99 | 139 |
4 (Issue) | 0.99 | 1.00 | 1.00 | 162 |
5 (My Name is) | 0.98 | 0.96 | 0.97 | 131 |
6 (No) | 0.97 | 0.99 | 0.98 | 140 |
7 (Payment) | 1.00 | 0.98 | 0.99 | 160 |
8 (Please) | 0.99 | 0.94 | 0.96 | 158 |
9 (Sick) | 0.98 | 0.98 | 0.98 | 129 |
10 (Thank You) | 0.97 | 0.99 | 0.98 | 136 |
11 (Understand) | 0.93 | 0.96 | 0.94 | 141 |
12 (Wait in Line) | 0.98 | 0.98 | 0.98 | 133 |
13 (Where restroom) | 0.95 | 0.97 | 0.96 | 125 |
14 (Yes) | 0.98 | 0.93 | 0.96 | 135 |
Accuracy |  |  | 0.98 | 2100 |
Macro avg | 0.98 | 0.98 | 0.98 | 2100 |
Weighted avg | 0.98 | 0.98 | 0.98 | 2100 |
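The per-class precision, recall, F1-score, and support reported above can be computed directly from (true, predicted) label pairs. A minimal pure-Python sketch using toy labels (not the paper's data):

```python
from collections import Counter

# Toy labels for three classes; illustrative only.
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 2, 2, 2, 0, 0]

def per_class_metrics(y_true, y_pred, cls):
    # True positives, false positives, false negatives for one class.
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    support = Counter(y_true)[cls]  # number of true samples of this class
    return precision, recall, f1, support

for cls in sorted(set(y_true)):
    p, r, f1, s = per_class_metrics(y_true, y_pred, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={s}")
```

The macro average is the unweighted mean of the per-class scores, while the weighted average weights each class by its support.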
Setup 1 (Laptop with x86 CPU) |
---|
Model | Descr | Accuracy | Precision | Recall | F1-Score | Average Inference Time (s) | Average Memory Utilization (KB) | Model Size (KB) |
---|---|---|---|---|---|---|---|---|
Base | CNN-LSTM | 0.980 | 0.982 | 0.980 | 0.980 | 0.161 | 255.102 | 9843.883 |
LW 1 | PQ | 0.980 | 0.982 | 0.980 | 0.980 | 46.992 | 0.063 | 842.102 |
LW 2 | QAT | 0.980 | 0.982 | 0.980 | 0.980 | 0.401 | 0.063 | 3278.391 |
Setup 2 (Raspberry Pi with ARM CPU) |
---|
Model | Descr | Accuracy | Precision | Recall | F1-Score | Average Inference Time (s) | Average Memory Utilization (KB) | Model Size (KB) |
---|---|---|---|---|---|---|---|---|
Base | CNN-LSTM | 0.990 | 0.991 | 0.990 | 0.990 | 1.450 | 25.141 | 9843.883 |
LW 1 | PQ | 0.970 | 0.972 | 0.970 | 0.970 | 1.615 | 0.063 | 842.102 |
LW 2 | QAT | 0.990 | 0.992 | 0.990 | 0.990 | 2.631 | 0.063 | 3278.391 |
Comparison with Other Works |
---|
Ref # | Descr | Accuracy | Precision | Recall | F1-Score | Average Inference Time (s) | Average Memory Utilization (KB) | Model Size (KB) |
---|---|---|---|---|---|---|---|---|
[11] | ResNet | 0.867 | — | — | — | — | — | — |
[12] | CNN | 0.950 | — | — | — | — | — | — |
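The size gap between the base model and its lightweight versions comes largely from storing weights as 8-bit integers rather than 32-bit floats. The paper relies on TensorFlow Lite's post-training quantization (PQ) and quantization-aware training (QAT); the sketch below only illustrates the underlying int8 idea with NumPy and is not the authors' pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 128)).astype(np.float32)  # stand-in for a trained layer

# Symmetric linear quantization to int8: w ≈ scale * q.
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale

size_ratio = weights.nbytes / q.nbytes   # 4x: float32 -> int8
max_err = np.abs(weights - dequant).max()  # bounded by scale / 2
print(f"compression {size_ratio:.1f}x, max error {max_err:.4f}")
```

This explains why the quantized models above are several times smaller while the classification metrics stay essentially unchanged: the per-weight rounding error is bounded by half the quantization step.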
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cayme, K.J.; Retutal, V.A.; Salubre, M.E.; Astillo, P.V.; Cañete, L.G., Jr.; Choudhary, G. Gesture Recognition of Filipino Sign Language Using Convolutional and Long Short-Term Memory Deep Neural Networks. Knowledge 2024, 4, 358-381. https://doi.org/10.3390/knowledge4030020