Fine-Grained Facial Expression Recognition in Multiple Smiles
Abstract
1. Introduction
- We have created a dataset of human smile expressions, named Facial Expression Emotions, which contains six smile classes (i.e., guffaw, laugh, beaming smile, qualifier smile, polite smile, and contempt smile). The dataset can support further research exploring the richness and diversity of human emotions.
- We have developed an image quality evaluation module that assigns each training sample a weight according to its image quality. These weights drive a dynamic weight loss function that adjusts sample importance during training, avoiding emphasis on unidentifiable images while focusing on hard yet recognizable samples.
- We have proposed the Smile Transformer network, built on a Swin Transformer backbone, to enhance the model's local perception and improve the accuracy of fine-grained FER. A convolutional block attention module (CBAM) was designed to focus on the important features of the face image and suppress unnecessary regional responses, so that strongly expression-correlated features are extracted while background interference is effectively suppressed.
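The dynamic weight loss above can be illustrated with a minimal NumPy sketch. This is not the paper's exact formulation: the quality scores, the power-law weighting, and the `gamma` knob are assumptions chosen only to show the mechanism of down-weighting low-quality samples in a cross-entropy loss.

```python
import numpy as np

def quality_weighted_ce(logits, targets, quality, gamma=2.0):
    """Quality-weighted cross-entropy (illustrative, not the paper's formula).

    logits:  (N, C) raw class scores
    targets: (N,)   integer class labels
    quality: (N,)   image-quality scores in [0, 1]
    gamma:   hypothetical knob: larger values suppress low-quality
             (near-unidentifiable) samples more sharply, while hard
             but recognizable samples keep near-full weight.
    """
    # Numerically stable softmax probabilities.
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    # Per-sample cross-entropy.
    ce = -np.log(p[np.arange(len(targets)), targets] + 1e-12)
    # Per-sample weights from image quality; weighted average of the loss.
    w = np.clip(quality, 0.0, 1.0) ** gamma
    return float((w * ce).sum() / max(w.sum(), 1e-8))
```

With uniform quality scores this reduces to plain mean cross-entropy; lowering one sample's quality score shrinks its contribution to the total loss.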
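The CBAM idea can likewise be sketched in a few lines. This is a simplified NumPy illustration, not the paper's implementation: the channel branch follows CBAM's shared two-layer MLP over average- and max-pooled descriptors, while the spatial branch replaces CBAM's 7x7 convolution with a fixed mean of the average and max maps purely for brevity; `w1`/`w2` are untrained stand-in weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """CBAM-style channel attention (sketch).

    feat: (C, H, W) feature map; w1: (C//r, C), w2: (C, C//r) shared MLP.
    Average- and max-pooled channel descriptors pass through the shared
    MLP, are summed, and squashed to per-channel gates in (0, 1).
    """
    avg = feat.mean(axis=(1, 2))                    # (C,)
    mx = feat.max(axis=(1, 2))                      # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)    # shared MLP with ReLU
    gate = sigmoid(mlp(avg) + mlp(mx))              # (C,)
    return feat * gate[:, None, None]               # reweight each channel

def spatial_attention(feat):
    """CBAM-style spatial attention, simplified: channel-wise average and
    max maps combined by a fixed mean (standing in for CBAM's 7x7 conv)
    and squashed to a per-pixel gate that suppresses background regions.
    """
    avg_map = feat.mean(axis=0)                     # (H, W)
    max_map = feat.max(axis=0)                      # (H, W)
    gate = sigmoid(0.5 * (avg_map + max_map))       # (H, W)
    return feat * gate[None, :, :]
```

In the full module the two branches are applied sequentially (channel first, then spatial), so important facial regions are amplified on both axes while unnecessary responses are damped.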
2. Related Work
3. Facial Expression Emotions Dataset
4. Methods
4.1. Data Preprocessing
4.2. Backbone Network
4.3. Dynamic Weight Loss Function
5. Results
5.1. Dataset
5.2. Implementation Details
5.3. Recognition Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Gunes, H.; Schuller, B. Categorical and dimensional affect analysis in continuous input: Current trends and future directions. Image Vis. Comput. 2013, 31, 120–136. [Google Scholar] [CrossRef]
- Mollahosseini, A.; Hasani, B.; Mahoor, M.H. Affectnet: A database for facial expression, valence, and arousal computing in the wild. IEEE Trans. Affect. Comput. 2017, 10, 18–31. [Google Scholar] [CrossRef] [Green Version]
- Kansizoglou, I.; Misirlis, E.; Tsintotas, K.; Gasteratos, A. Continuous Emotion Recognition for Long-Term Behavior Modeling through Recurrent Neural Networks. Technologies 2022, 10, 59. [Google Scholar] [CrossRef]
- Barsoum, E.; Zhang, C.; Ferrer, C.C.; Zhang, Z. Training deep networks for facial expression recognition with crowd-sourced label distribution. In Proceedings of the 18th ACM International Conference on Multimodal Interaction, Tokyo, Japan, 12–16 November 2016; pp. 279–283. [Google Scholar]
- Li, S.; Deng, W.; Du, J.P. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2852–2861. [Google Scholar]
- Ekman, P.; Friesen, W.V. Constants across cultures in the face and emotion. J. Personal. Soc. Psychol. 1971, 17, 124. [Google Scholar] [CrossRef] [Green Version]
- Ma, F.; Sun, B.; Li, S. Robust facial expression recognition with convolutional visual transformers. arXiv 2021, arXiv:2103.16854. [Google Scholar]
- Li, H.; Sui, M.; Zhao, F.; Zha, Z.; Wu, F. MVT: Mask vision transformer for facial expression recognition in the wild. arXiv 2021, arXiv:2106.04520. [Google Scholar]
- Wen, Z.; Lin, W.; Wang, T.; Xu, G. Distract your attention: Multi-head cross attention network for facial expression recognition. arXiv 2021, arXiv:2109.07270. [Google Scholar]
- Savchenko, A.V. Facial expression and attributes recognition based on multi-task learning of lightweight neural networks. In Proceedings of the 2021 IEEE 19th International Symposium on Intelligent Systems and Informatics (SISY), Subotica, Serbia, 16–18 September 2021; pp. 119–124. [Google Scholar]
- Vo, T.H.; Lee, G.S.; Yang, H.J.; Kim, S.H. Pyramid with super resolution for in-the-wild facial expression recognition. IEEE Access 2020, 8, 131988–132001. [Google Scholar] [CrossRef]
- Zhang, Y.; Wang, C.; Ling, X.; Deng, W. Learn from all: Erasing attention consistency for noisy label facial expression recognition. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 418–434. [Google Scholar]
- Zhang, Y.; Wang, C.; Deng, W. Relative Uncertainty Learning for Facial Expression Recognition. Adv. Neural Inf. Process. Syst. 2021, 34, 17616–17627. [Google Scholar]
- Duchenne, G.B.; de Boulogne, G.B.D. The Mechanism of Human Facial Expression; Cambridge University Press: Cambridge, UK, 1990. [Google Scholar]
- Lucey, P.; Cohn, J.F.; Kanade, T.; Saragih, J.; Ambadar, Z. The extended cohn-kanade dataset (ck+): A complete dataset for action unit and emotion-specified expression. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 94–101. [Google Scholar]
- Lyons, M.; Akamatsu, S.; Kamachi, M.; Gyoba, J. Coding facial expressions with gabor wavelets. In Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 14–16 April 1998; pp. 200–205. [Google Scholar]
- Wang, W.; Sun, Q.; Chen, T.; Cao, C.; Zheng, Z.; Xu, G.; Qiu, H.; Fu, Y. A fine-grained facial expression database for end-to-end multi-pose facial expression recognition. arXiv 2019, arXiv:1907.10838. [Google Scholar]
- Valstar, M.; Pantic, M. Induced disgust, happiness and surprise: An addition to the mmi facial expression database. In Proceedings of the 3rd International Workshop on EMOTION (Satellite of LREC): Corpora for Research on Emotion and Affect, Valletta, Malta, 23 May 2010; p. 65. [Google Scholar]
- Wang, K.; Peng, X.; Yang, J.; Meng, D.; Qiao, Y. Region attention networks for pose and occlusion robust facial expression recognition. IEEE Trans. Image Process. 2020, 29, 4057–4069. [Google Scholar] [CrossRef] [Green Version]
- Farzaneh, A.H.; Qi, X. Facial expression recognition in the wild via deep attentive center loss. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2021; pp. 2402–2411. [Google Scholar]
- Fard, A.P.; Mahoor, M.H. Ad-corre: Adaptive correlation-based loss for facial expression recognition in the wild. IEEE Access 2022, 10, 26756–26768. [Google Scholar] [CrossRef]
- Chen, Z.; Huang, D.; Wang, Y.; Chen, L. Fast and light manifold CNN based 3D facial expression recognition across pose variations. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 229–238. [Google Scholar]
- Kansizoglou, I.; Bampis, L.; Gasteratos, A. Deep feature space: A geometrical perspective. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 6823–6838. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893. [Google Scholar]
- Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987. [Google Scholar] [CrossRef]
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Kansizoglou, I.; Bampis, L.; Gasteratos, A. An active learning paradigm for online audio-visual emotion recognition. IEEE Trans. Affect. Comput. 2019, 13, 756–768. [Google Scholar] [CrossRef]
- Cai, J.; Meng, Z.; Khan, A.S.; Li, Z.; O’Reilly, J.; Han, S.; Liu, P.; Chen, M.; Tong, Y. Feature-level and model-level audiovisual fusion for emotion recognition in the wild. In Proceedings of the 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 28–30 March 2019; pp. 443–448. [Google Scholar]
- Liang, L.; Lang, C.; Li, Y.; Feng, S.; Zhao, J. Fine-grained facial expression recognition in the wild. IEEE Trans. Inf. Forensics Secur. 2020, 16, 482–494. [Google Scholar] [CrossRef]
- Parrott, W.G. Emotions in Social Psychology: Essential Readings; Psychology Press: Philadelphia, PA, USA, 2001. [Google Scholar]
- Wang, Y.; Sun, Y.; Huang, Y.; Liu, Z.; Gao, S.; Zhang, W.; Ge, W.; Zhang, W. FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 20922–20931. [Google Scholar]
- Chen, K.; Yang, X.; Fan, C.; Zhang, W.; Ding, Y. Semantic-Rich Facial Emotional Expression Recognition. IEEE Trans. Affect. Comput. 2022, 13, 1906–1916. [Google Scholar] [CrossRef]
- Huang, Q.; Huang, C.; Wang, X.; Jiang, F. Facial expression recognition with grid-wise attention and visual transformer. Inf. Sci. 2021, 580, 35–54. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- Trougakos, J.P.; Jackson, C.L.; Beal, D.J. Service without a smile: Comparing the consequences of neutral and positive display rules. J. Appl. Psychol. 2011, 96, 350. [Google Scholar] [CrossRef] [PubMed]
- Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
- Kazemi, V.; Sullivan, J. One millisecond face alignment with an ensemble of regression trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1867–1874. [Google Scholar]
- Hornik, K.; Stinchcombe, M.; White, H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
Related Work | Methods | Contributions |
---|---|---|
CK+ [15] | Active appearance models | Laboratory dataset
JAFFE [16] | Gabor coding | Laboratory dataset
RAF-DB [5] | Deep locality-preserving CNN | In-the-wild dataset
FG-Emotions [31] | Multi-scale action-unit-based | Fine-grained facial expression recognition in-the-wild dataset
FERV39k [33] | Four-stage candidate clip generation and two-stage annotation workflow | Large-scale multi-scene in-the-wild dataset
135-class FER [34] | Pre-trained facial expression embedding and correlation-guided classification | Semantic-rich facial emotional expression recognition in-the-wild dataset
HOG [26], LBP [27], SIFT [28] | Manually designed filters | Hand-crafted feature extraction
RAN [19] | Region attention networks and region generation | Addresses real-world pose and occlusion problems
DACL [20] | Sparse center loss and attention network | Improves generalization ability
Ad-Corre [21] | Feature and mean discriminator, embedding discriminator | Improves generalization ability
FER-VT [35] | Low-level and high-level feature learning | Addresses real-world pose and occlusion problems and improves generalization ability
Dataset | Annotation | Training Set | Validation Set | Test Set | Total |
---|---|---|---|---|---|
Facial Expression Emotions | 6 fine-grained classes | 7856 | 1572 | 1572 | 11,000
Model | Accuracy (%) |
---|---|
Swin Transformer [24] | 80.03
RAN [19] | 84.10
Ad-Corre [21] | 84.86
DACL [20] | 85.34
FER-VT [35] | 86.36
Smile Transformer | 88.56
Image Quality Evaluation Module and Dynamic Weight Loss Function | CBAM | Accuracy on Facial Expression Emotions (%) |
---|---|---|
× | × | 80.03
✓ | × | 86.53
× | ✓ | 82.36
✓ | ✓ | 88.56
Stage | Accuracy (%) |
---|---|
Stage 1 | 81.01
Stage 2 | 82.36
Stage 3 | 81.53
Stage 4 | 82.20
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jin, Z.; Zhang, X.; Wang, J.; Xu, X.; Xiao, J. Fine-Grained Facial Expression Recognition in Multiple Smiles. Electronics 2023, 12, 1089. https://doi.org/10.3390/electronics12051089