A Hybrid Image Augmentation Technique for User- and Environment-Independent Hand Gesture Recognition Based on Deep Learning
Abstract
1. Introduction
- (a) Introduce a hybrid augmentation model that combines geometric transformation with background, brightness, temperature, and blurriness variation to train deep learning networks for improved hand gesture recognition.
- (b) Offer a background-replacement strategy based on green screening as a data augmentation technique. It eliminates the need for manual object annotation and improves training performance with limited data by replacing the background using computer vision algorithms.
- (c) Present a green-screen hand gesture dataset that can be used to train hand gesture recognition (HGR) models and to study the effect of background distortion when simulating HGR in real-world, uncontrolled environments.
- (d) Explore the potential of classical augmentation techniques to generate unlimited data variations while maintaining accuracy.
2. Materials and Methods
2.1. Material
2.1.1. Environmental Surrounding
2.1.2. Simulate Environment as Real World
Augmentation Technique
Geometric Transformation
Image Scaling
Image Rotation
Image Translation
Image Shearing
Image Flipping
Background Augmentation
Temperature
Blurriness
- (a) Simulation of Movement or Vibration: In real-world scenarios, images can often become blurred due to camera shake, hand movements, or object motion. By introducing elements of blurriness, we can simulate this effect and train the model to recognize images that might not always be perfectly sharp.
- (b) Variations in Lighting: Uneven or changing lighting conditions can make images less sharp. By applying blur, we can create variations in lighting in the training dataset, allowing the model to understand objects under different lighting conditions.
- (c) Distance Effects: Images taken from a distance are often blurry. By adding blur elements, we can depict distant objects more realistically, enabling the model to recognize these objects in real-life conditions.
- (d) Visual Uncertainty: Not all objects in an image will always be sharp in real-world situations. Some image elements may appear blurry or less sharp, and the model should be capable of identifying these objects in everyday conditions.
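The blur augmentation motivated above can be sketched as follows. This is only an illustration: the section states that blur is added randomly but does not specify the filter, so the Gaussian kernel, the `probability` and `max_sigma` parameters, and the function names are all assumptions.

```python
import numpy as np


def gaussian_kernel(radius: int, sigma: float) -> np.ndarray:
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Blur an H x W x C uint8 image with a separable Gaussian filter."""
    radius = max(1, int(round(3 * sigma)))
    kernel = gaussian_kernel(radius, sigma)
    height, width = image.shape[:2]
    # Replicate edge pixels so the output keeps the input size.
    padded = np.pad(
        image.astype(np.float64),
        ((radius, radius), (radius, radius), (0, 0)),
        mode="edge",
    )
    # Vertical pass, then horizontal pass.
    vertical = np.zeros((height,) + padded.shape[1:], dtype=np.float64)
    for i, weight in enumerate(kernel):
        vertical += weight * padded[i:i + height, :, :]
    blurred = np.zeros(image.shape, dtype=np.float64)
    for j, weight in enumerate(kernel):
        blurred += weight * vertical[:, j:j + width, :]
    return np.clip(np.rint(blurred), 0, 255).astype(np.uint8)


def random_blur(image, rng, probability=0.5, max_sigma=2.0):
    """Blur the image with the given probability (hypothetical defaults)."""
    if rng.random() >= probability:
        return image.copy()
    sigma = rng.uniform(0.5, max_sigma)
    return gaussian_blur(image, sigma)
```

Calling `random_blur(image, np.random.default_rng(0))` on each training image would leave some samples sharp and soften others, which is the kind of variation items (a) to (d) aim for.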
2.1.3. Dataset
Primary Dataset/Custom Dataset
Public Datasets/Secondary Dataset
Massey University (MU) HandImages American Sign Language (A.S.L)
Sebastian Marcel Static Hand Gesture Dataset
The NUS Hand Posture II
HG 14
2.1.4. Deep Learning and Neural Network
Convolutional Neural Network
Pre-Trained Neural Network
2.1.5. Evaluation Metric
2.2. Research Methodology
- (a) Integration of the green screening technique to replace backgrounds and simulate diverse environments.
- (b) Analysis of the impact of hybrid image augmentation on hand gesture recognition accuracy.
- (c) Quantitative assessment of test accuracy after augmentation on public datasets.
- (d) Exploration of how far classical augmentation can generate varied data while maintaining accuracy.
- (e) Investigation of the contribution of the green screen dataset to implementing hybrid image augmentation.
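The green-screen background replacement in item (a) could look roughly like the sketch below. The keying rule used here (a pixel counts as background when its green channel exceeds both red and blue by a margin) and the `dominance` threshold are assumptions chosen for illustration; the section does not state which chroma-key algorithm was actually used. The sketch also assumes the background has already been cropped or resized to the foreground size.

```python
import numpy as np


def green_screen_mask(image: np.ndarray, dominance: int = 40) -> np.ndarray:
    """Return a boolean H x W mask that is True on green-screen pixels.

    Hypothetical rule: green must exceed red and blue by `dominance`.
    """
    rgb = image.astype(np.int16)  # avoid uint8 wrap-around on subtraction
    red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (green - red > dominance) & (green - blue > dominance)


def replace_background(foreground: np.ndarray,
                       background: np.ndarray,
                       dominance: int = 40) -> np.ndarray:
    """Copy background pixels into the green-screen area of the foreground."""
    if foreground.shape != background.shape:
        raise ValueError("foreground and background must have the same shape")
    mask = green_screen_mask(foreground, dominance)
    result = foreground.copy()
    result[mask] = background[mask]
    return result
```

Because the hand pixels are kept and only the keyed area is swapped, the class label of the image stays valid, which is why no manual annotation is needed for this kind of augmentation.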
3. Results
3.1. Experimental Setup
3.2. Programming Tools
- (a) Finally, a CNN model is built using the ResNet50 architecture as the base, with additional layers suited to the hand gesture recognition task. Each step in building this model is explained below.
- (b) Building the Base Model with ResNet50:
  - ResNet50 is a Convolutional Neural Network (CNN) architecture developed by Microsoft Research. It consists of 50 layers (hence the "50" in the name) and has proven highly effective in various computer vision tasks, especially image classification.
  - "ResNet50(include_top = False, weights = 'imagenet', input_shape = (224, 224, 3))": This function creates the base ResNet50 model. The argument "include_top = False" indicates that the fully connected layers at the top of the model are not included, allowing us to customize the top layers for our task. "weights = 'imagenet'" initializes the model with weights learned from the ImageNet dataset, giving the model an initial understanding of the features present in images. The argument "input_shape = (224, 224, 3)" specifies the size and color channels (RGB) of the input images that the model will receive.
- (c) Setting Trainable Layers
- (d) Adding Additional Layers:
  - After the base ResNet50 model, we add several layers on top of it to tailor the model to the hand gesture recognition task.
  - "Flatten()": This layer flattens the output of the base model into one dimension. This is necessary because the subsequent Dense layers require a one-dimensional vector as input.
  - "Dense(512, activation = 'relu')": This Dense layer consists of 512 units with the ReLU activation function. Dense layers like this aim to learn more abstract feature representations from the image data.
  - "Dropout(0.5)": The Dropout layer reduces overfitting by randomly deactivating units during training; a rate of 0.5 means that about half of the inputs to the layer are dropped at each training step.
  - "Dense(train_generator.num_classes, activation = 'softmax')": The final Dense layer has as many units as there are classes in the training dataset, and it uses the softmax activation function to produce class probabilities.
- (e) Compiling the Model:
  - After the additional layers are added, the model must be compiled before it can be trained.
  - "model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])": In this step, we specify the optimizer (Adam), the loss function appropriate for multi-class classification (categorical cross-entropy), and the metric monitored during training (accuracy).
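The steps above can be assembled into a single function, sketched here with the Keras functional API. The layer calls and compile arguments are the ones quoted in the list; the function name, the `freeze_base` flag, and the decision to freeze the whole ResNet50 base are assumptions, since step (c) does not say which layers were made trainable. The `weights` argument is exposed only so the architecture can be built without downloading the ImageNet weights.

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.models import Model


def build_gesture_model(num_classes, weights="imagenet", freeze_base=True):
    """Build a ResNet50-based classifier for `num_classes` gestures."""
    # (b) Base model without the original fully connected top.
    base_model = ResNet50(
        include_top=False, weights=weights, input_shape=(224, 224, 3)
    )
    # (c) Assumption: freeze every base layer so only the new head trains.
    base_model.trainable = not freeze_base

    # (d) Classification head on top of the convolutional features.
    x = Flatten()(base_model.output)
    x = Dense(512, activation="relu")(x)
    x = Dropout(0.5)(x)
    outputs = Dense(num_classes, activation="softmax")(x)
    model = Model(inputs=base_model.input, outputs=outputs)

    # (e) Compile for multi-class classification.
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

In the original description the class count comes from `train_generator.num_classes`; passing it in explicitly, for example `build_gesture_model(14)` for a 14-class dataset, keeps the sketch independent of any particular data generator.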
 
3.3. Data Preparation
- (a) Image Resizing: All images in our dataset were resized to 224 × 224 pixels with three color channels (RGB). This step matches the input format commonly used by Convolutional Neural Network (CNN) models such as ResNet and Inception.
- (b) Pixel Normalization: We normalized pixel values to a uniform scale so that the model can learn patterns without being affected by differences in raw pixel magnitude.
- (c) Data Augmentation: Data augmentation increased the dataset's variability. The geometric augmentations were image rotation, horizontal and vertical shifts, shear transformations, and zooming. In addition, brightness adjustment, temperature adjustment, and image blurring were applied to some images. These techniques enrich the dataset with more variations to help the model recognize sign language gestures.
- (d) Data Duplication: Before the background was replaced, each image in the training dataset was duplicated 1, 10, 20, or 30 times. Duplication significantly increases the volume of training data, and performing it before background replacement means each copy can later be given different variations. This step was necessary because the original training data were relatively small.
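The resizing, normalization, and duplication steps can be sketched with NumPy as below. The nearest-neighbour interpolation and the [0, 1] target range are assumptions made for the example; the section does not state which interpolation or normalization range was used, and the function names are hypothetical.

```python
import numpy as np


def resize_nearest(image: np.ndarray, size=(224, 224)) -> np.ndarray:
    """Resize an H x W x C image to `size` by nearest-neighbour sampling."""
    height, width = image.shape[:2]
    rows = np.arange(size[0]) * height // size[0]
    cols = np.arange(size[1]) * width // size[1]
    return image[rows[:, None], cols]


def normalize_pixels(image: np.ndarray) -> np.ndarray:
    """Scale uint8 pixel values to floats in [0, 1] (assumed range)."""
    return image.astype(np.float32) / 255.0


def duplicate_samples(samples, factor: int):
    """Repeat every sample `factor` times; factor=1 leaves the list as is."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [sample for sample in samples for _ in range(factor)]
```

With a duplication factor of 10, a 280-image training set grows to 2800 entries, and each copy can then receive its own background and augmentation.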
3.4. Experimental Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

| Category | Technique | Parameter Setting | Value/Range | Direction | Description |
|---|---|---|---|---|---|
| Geometric Transformation | Rotation | rotation_range | 10° | Positive (Clockwise)/Negative (Counterclockwise) | Rotation of the image within the range of −10 degrees to +10 degrees. |
| Geometric Transformation | Translation | width_shift_range | 0.1 | Positive (Rightward)/Negative (Leftward) | Shifting the image horizontally within the range of −10% to +10% of the width. |
| Geometric Transformation | Translation | height_shift_range | 0.1 | Positive (Downward)/Negative (Upward) | Shifting the image vertically within the range of −10% to +10% of the height. |
| Geometric Transformation | Shearing | shear_range | 10° | Positive (Right Shear)/Negative (Left Shear) | Shearing the image within the range of −10 degrees to +10 degrees. |
| Geometric Transformation | Scaling | zoom_range | [1, 1.5] | Positive (Zoom In)/Negative (Zoom Out) | Scaling the image within the range of 1 to 1.5 times the original size. |
| Geometric Transformation | Flipping | Horizontal Flip | Enabled/Disabled | Horizontal Flip | Enables horizontal flipping, so the image can be mirrored left to right. |
| Brightness | Adjustment | brightness | 20 | Positive (Brightening)/Negative (Darkening) | Changing the brightness level within the range of −20 to +20. |
| Temperature | Adjustment | temperature | 20 | Positive (Warming–Red)/Negative (Cooling–Blue) | Adjusting the image’s color temperature within the range of −20 to +20. |
| Blurriness | Random | blurriness | Random | Blurring (No explicit direction) | Randomly adding blur to the image. |

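
The ranges in the table above can be turned into a random parameter draw per training image. The following is a minimal sketch using only the Python standard library, not the authors' implementation. The dictionary keys, the 50% flip and blur probabilities, and the per-pixel formulas (brightness as an additive offset, temperature as a red increase paired with a blue decrease) are assumptions made for illustration.

```python
import random

# Ranges copied from the augmentation-parameter table. The keys are
# illustrative names, not identifiers taken from the authors' code.
RANGES = {
    "rotation_deg": (-10.0, 10.0),
    "width_shift": (-0.1, 0.1),
    "height_shift": (-0.1, 0.1),
    "shear_deg": (-10.0, 10.0),
    "zoom": (1.0, 1.5),
    "brightness": (-20.0, 20.0),
    "temperature": (-20.0, 20.0),
}


def sample_params(rng):
    """Draw one random augmentation setting within the tabulated ranges."""
    params = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    params["horizontal_flip"] = rng.random() < 0.5  # assumed 50% flip chance
    params["blur"] = rng.random() < 0.5  # assumed 50% blur chance
    return params


def clip(value):
    """Keep a channel value inside the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))


def adjust_pixel(rgb, brightness, temperature):
    """Apply brightness, then temperature, to one (R, G, B) pixel.

    Assumption: brightness is an additive offset on all channels, and a
    positive temperature adds to red while subtracting from blue.
    """
    r, g, b = rgb
    return (
        clip(r + brightness + temperature),
        clip(g + brightness),
        clip(b + brightness - temperature),
    )


rng = random.Random(0)  # fixed seed so the draw is repeatable
params = sample_params(rng)
warm_bright = adjust_pixel((120, 130, 140), brightness=20, temperature=20)
```

Sampling all parameters independently for each image is one way to obtain a different variant on every pass over a duplicated dataset; the geometric parameters would then be handed to whichever image library performs the actual warping.
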

| Hardware/Software | Specification |
|---|---|
| Processor (CPU) | Intel Core i5-9300H @ 2.40 GHz |
| Memory (RAM) | 32 GB DDR4 |
| Graphical Processing Unit (GPU) | NVIDIA GTX 1660 Ti, 6 GB VRAM |
| Operating System | Windows 11 Home Edition |
| Python version | 3.6.13 |
| CUDA/cuDNN version | 11.0/8.0 |


| Category | Name of Dataset | Number of Data | Number of Classes | Image Size | Image Background/Content |
|---|---|---|---|---|---|
| Public (as testing data) | HG14 | 14,280 | 14 | 256 × 256 | uniform |
| Public (as testing data) | MU HandImages ASL (Digit 0–9) | 700 | 10 | varies | uniform |
| Public (as testing data) | MU HandImages ASL (Alphabet) | 3490 | 26 | varies | uniform |
| Public (as testing data) | Sebastian Marcel | 659 | 6 | varies | uniform & complex |
| Public (as testing data) | NUS-II | 2000 | 10 | 160 × 120 | complex |
| Custom dataset (green screen background, one per public dataset, as training data) | HG14 | 280 | 14 | 224 × 224 | green screen |
| Custom dataset (green screen background, one per public dataset, as training data) | MU HandImages ASL (Digit 0–9) | 201 | 10 | 224 × 224 | green screen |
| Custom dataset (green screen background, one per public dataset, as training data) | MU HandImages ASL (Alphabet) | 781 | 26 | 224 × 224 | green screen |
| Custom dataset (green screen background, one per public dataset, as training data) | Sebastian Marcel | 120 | 6 | 224 × 224 | green screen |
| Custom dataset (green screen background, one per public dataset, as training data) | NUS-II | 210 | 10 | 224 × 224 | green screen |
| Background images for replacing the green screen | - | 90 | - | 400 × 320 | various outdoor scenarios, gradients, and different colors |

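
To illustrate how a green screen image can be combined with one of the background images without manual annotation, here is a minimal chroma-key sketch in plain Python. It is an illustration under stated assumptions rather than the authors' pipeline: the function names, the `margin` threshold, and the rule "green clearly exceeds red and blue" are hypothetical, and the foreground and background are assumed to have already been resized to the same dimensions.

```python
def is_green(pixel, margin=40):
    """Return True when green clearly dominates red and blue.

    The margin of 40 is an assumed threshold, not a value from the paper.
    """
    r, g, b = pixel
    return g > r + margin and g > b + margin


def replace_background(foreground, background, margin=40):
    """Swap green screen pixels of `foreground` for `background` pixels.

    Both images are lists of rows, each row a list of (R, G, B) tuples,
    and must share the same height and width.
    """
    if len(foreground) != len(background):
        raise ValueError("images must have the same height")
    result = []
    for fg_row, bg_row in zip(foreground, background):
        if len(fg_row) != len(bg_row):
            raise ValueError("images must have the same width")
        result.append(
            [bg if is_green(fg, margin) else fg for fg, bg in zip(fg_row, bg_row)]
        )
    return result


# Tiny 2 x 2 example: two green screen pixels and two hand-colored pixels.
GREEN = (0, 255, 0)
SKIN = (200, 160, 140)
foreground = [[GREEN, SKIN], [SKIN, GREEN]]
background = [[(10, 20, 30), (40, 50, 60)], [(70, 80, 90), (100, 110, 120)]]
composite = replace_background(foreground, background)
```

Because the mask is derived from color alone, each of the 90 background images can be paired with each green screen image to multiply the number of training samples. A production version would typically use an array library and a color space better suited to keying.
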

Results for the Sebastian Marcel dataset:

| Duplication | Change Background | Change Background + Geometric | Change Background + Geometric + Brightness | Change Background + Geometric + Brightness + Temperature | Change Background + Geometric + Brightness + Temperature + Blur |
|---|---|---|---|---|---|
| Original | 0.8027 | 0.8467 | 0.8346 | 0.8574 | 0.8741 | 
| 10× Original | 0.8801 | 0.8832 | 0.8923 | 0.9059 | 0.9272 | 
| 20× Original | 0.8270 | 0.8741 | 0.8771 | 0.8877 | 0.9241 | 
| 30× Original | 0.8528 | 0.8786 | 0.8816 | 0.8786 | 0.9272 | 

Results for the NUS-II dataset:

| Duplication | Change Background | Change Background + Geometric | Change Background + Geometric + Brightness | Change Background + Geometric + Brightness + Temperature | Change Background + Geometric + Brightness + Temperature + Blur |
|---|---|---|---|---|---|
| Original | 0.8135 | 0.8290 | 0.8346 | 0.8535 | 0.8690 | 
| 10× Original | 0.7875 | 0.8905 | 0.8989 | 0.8830 | 0.9040 | 
| 20× Original | 0.7805 | 0.8915 | 0.8974 | 0.8895 | 0.9095 | 
| 30× Original | 0.8035 | 0.8924 | 0.8874 | 0.9059 | 0.9271 | 

Results for the Massey (MU HandImages ASL) digit dataset:

| Duplication | Change Background | Change Background + Geometric | Change Background + Geometric + Brightness | Change Background + Geometric + Brightness + Temperature | Change Background + Geometric + Brightness + Temperature + Blur |
|---|---|---|---|---|---|
| Original | 0.8043 | 0.8057 | 0.8657 | 0.8729 | 0.8929 | 
| 10× Original | 0.8229 | 0.8929 | 0.8857 | 0.9014 | 0.9157 | 
| 20× Original | 0.8057 | 0.8800 | 0.9157 | 0.9143 | 0.9071 | 
| 30× Original | 0.8086 | 0.8814 | 0.8986 | 0.9029 | 0.9000 | 

Results for the Massey (MU HandImages ASL) alphabet dataset:

| Duplication | Change Background | Change Background + Geometric | Change Background + Geometric + Brightness | Change Background + Geometric + Brightness + Temperature | Change Background + Geometric + Brightness + Temperature + Blur |
|---|---|---|---|---|---|
| Original | 0.6667 | 0.7499 | 0.7322 | 0.7140 | 0.7190 | 
| 10× Original | 0.6171 | 0.7658 | 0.8006 | 0.7548 | 0.7840 | 
| 20× Original | 0.6193 | 0.7427 | 0.7168 | 0.7669 | 0.7614 | 
| 30× Original | 0.6705 | 0.7680 | 0.7972 | 0.7801 | 0.7652 | 

Results for the HG14 dataset:

| Duplication | Change Background | Change Background + Geometric | Change Background + Geometric + Brightness | Change Background + Geometric + Brightness + Temperature | Change Background + Geometric + Brightness + Temperature + Blur |
|---|---|---|---|---|---|
| Original | 0.4852 | 0.4740 | 0.4999 | 0.4758 | 0.4880 | 
| 10× Original | 0.5202 | 0.4964 | 0.5705 | 0.5705 | 0.5682 | 
| 20× Original | 0.5124 | 0.4902 | 0.4932 | 0.5588 | 0.5604 | 
| 30× Original | 0.5030 | 0.5064 | 0.4889 | 0.5547 | 0.5599 | 

Average over the four duplication settings for each dataset:

| Dataset | Change Background | Change Background + Geometric | Change Background + Geometric + Brightness | Change Background + Geometric + Brightness + Temperature | Change Background + Geometric + Brightness + Temperature + Blur |
|---|---|---|---|---|---|
| Sebastian Marcel | 0.8407 | 0.8706 | 0.8714 | 0.8824 | 0.9131 |
| NUS-II | 0.7962 | 0.8758 | 0.8796 | 0.8830 | 0.9024 |
| Massey–digit | 0.8104 | 0.8650 | 0.8914 | 0.8979 | 0.9039 |
| Massey–alphabet | 0.6434 | 0.7566 | 0.7617 | 0.7540 | 0.7574 |
| HG14 | 0.5052 | 0.4918 | 0.5131 | 0.5400 | 0.5441 |
| Total Average | 0.7192 | 0.7720 | 0.7834 | 0.7914 | 0.8042 |

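
The summary rows can be reproduced from the per-dataset tables by averaging each column over the four duplication settings and then averaging across datasets. The sketch below checks this for the first and last augmentation columns. It assumes the five duplication tables appear in the same order as the rows of the summary table, and the names `RESULTS`, `background_only`, and `full_hybrid` are illustrative labels introduced here.

```python
from statistics import mean

# Values transcribed from the per-dataset tables, in row order
# Original, 10x, 20x, 30x. Only two of the five columns are included:
# "Change Background" and the full combination ending in "+ Blur".
RESULTS = {
    "Sebastian Marcel": {
        "background_only": [0.8027, 0.8801, 0.8270, 0.8528],
        "full_hybrid": [0.8741, 0.9272, 0.9241, 0.9272],
    },
    "NUS-II": {
        "background_only": [0.8135, 0.7875, 0.7805, 0.8035],
        "full_hybrid": [0.8690, 0.9040, 0.9095, 0.9271],
    },
    "Massey-digit": {
        "background_only": [0.8043, 0.8229, 0.8057, 0.8086],
        "full_hybrid": [0.8929, 0.9157, 0.9071, 0.9000],
    },
    "Massey-alphabet": {
        "background_only": [0.6667, 0.6171, 0.6193, 0.6705],
        "full_hybrid": [0.7190, 0.7840, 0.7614, 0.7652],
    },
    "HG14": {
        "background_only": [0.4852, 0.5202, 0.5124, 0.5030],
        "full_hybrid": [0.4880, 0.5682, 0.5604, 0.5599],
    },
}

# Mean over the four duplication settings, per dataset and column.
per_dataset = {
    name: {column: mean(values) for column, values in columns.items()}
    for name, columns in RESULTS.items()
}

# Mean of the per-dataset means, per column (the "Total Average" row).
total = {
    column: mean(per_dataset[name][column] for name in RESULTS)
    for column in ("background_only", "full_hybrid")
}

# Absolute gain of the full combination over background replacement alone.
gain = total["full_hybrid"] - total["background_only"]
```

Under these assumptions the computed totals agree with the summary table, and `gain` gives the difference between the two "Total Average" entries for these columns.
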

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Awaluddin, B.-A.; Chao, C.-T.; Chiou, J.-S. A Hybrid Image Augmentation Technique for User- and Environment-Independent Hand Gesture Recognition Based on Deep Learning. Mathematics 2024, 12, 1393. https://doi.org/10.3390/math12091393