PRNet: A Priori Embedded Network for Real-World Blind Micro-Expression Recognition
Abstract
1. Introduction
- A prior network framework is proposed for micro-expression generation that captures the feature distribution of micro-expression datasets. The framework fine-tunes micro-expression recognition networks with facial structure and expression priors, counteracting the degradation of facial micro-expressions in real-world scenarios, and it is general enough to be applied across a variety of recognition networks.
- A multi-scale dynamic cross-domain module (MSCD) efficiently reconstructs features for integration into the micro-expression recognition task. It dynamically adjusts the features produced by the reconstruction task so that they are passed effectively to the representation layers of the downstream recognition task (a minimal sketch follows this list).
- The proposed method achieves top-tier results on standard benchmarks including CASME II, CASME III, SAMM, and SMIC, outperforming many existing approaches, and it remains effective under complex degradation, which validates its suitability for real-world conditions.
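PRNet's actual module definition appears later in the paper (Section 3.2.2); as a quick illustration of the idea only, the following PyTorch sketch gates reconstruction-branch features with input-dependent weights and fuses them into same-scale recognition-branch features. The class name MSCDBridge, the channel sizes, and the gating scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MSCDBridge(nn.Module):
    """Illustrative multi-scale dynamic cross-domain bridge (not the authors' code)."""

    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        # One input-dependent gate and one 1x1 fusion conv per feature scale.
        self.gates = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),    # per-channel summary of reconstruction features
                nn.Conv2d(c, c, kernel_size=1),
                nn.Sigmoid(),               # dynamic gate in [0, 1]
            )
            for c in channels
        )
        self.fuse = nn.ModuleList(nn.Conv2d(2 * c, c, kernel_size=1) for c in channels)

    def forward(self, recon_feats, recog_feats):
        # Both arguments: lists of (B, C_i, H_i, W_i) tensors, one per scale.
        fused = []
        for gate, fuse, r, f in zip(self.gates, self.fuse, recon_feats, recog_feats):
            g = gate(r)                     # (B, C_i, 1, 1): how much detail to pass through
            fused.append(fuse(torch.cat([g * r, f], dim=1)))
        return fused
```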
2. Related Works
2.1. Face Reconstruction
2.2. Micro-Expression Recognition
2.3. Generative Model
3. Materials and Methods
3.1. Materials
3.2. Network Architecture
3.2.1. Micro-Expression Prior Generation Model
3.2.2. Multi-Scale Cross-Domain Information Bridging
3.2.3. Overview Pipeline
Algorithm 1: Algorithm of PRNet
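The body of Algorithm 1 did not survive extraction. As a placeholder for the overall flow, here is a hedged Python sketch of one PRNet-style training step, assuming the design described in the contributions: a pre-trained prior generation model restores the degraded input, an MSCD-style bridge injects its multi-scale features into the recognizer, and the recognition path is fine-tuned. The prior_gen, recognizer.extract, and recognizer.classify interfaces are assumptions, not the published Algorithm 1.

```python
# Hedged high-level sketch of a PRNet-style training step (assumed, not the
# published Algorithm 1).
import torch

def train_step(prior_gen, bridge, recognizer, optimizer, criterion, degraded, labels):
    with torch.no_grad():                       # prior generator assumed pre-trained and frozen
        restored, recon_feats = prior_gen(degraded)
    recog_feats = recognizer.extract(restored)  # multi-scale recognition features (assumed API)
    fused = bridge(recon_feats, recog_feats)    # MSCD-style dynamic cross-domain fusion
    logits = recognizer.classify(fused)         # classification head (assumed API)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```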
4. Experiment and Results
4.1. Dataset
4.2. Experiment Details
5. Ablation Experiment
5.1. Standard Experimental Comparison
5.2. Degraded Micro-Expression Recognition
5.3. Real-World Face Recognition
6. Discussion and Limitations
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Zhang, F.; Zhang, T.; Mao, Q.; Xu, C. Joint pose and expression modeling for facial expression recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3359–3368.
2. Ekman, P.; Friesen, W.V. Nonverbal leakage and clues to deception. Psychiatry 1969, 32, 88–106.
3. Ekman, P. Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage (Revised Edition); WW Norton & Company: New York, NY, USA, 2009.
4. Li, G.; Shi, J.; Peng, J.; Zhao, G. Micro-expression recognition under low-resolution cases. In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications-Volume 5: VISAPP, Prague, Czech Republic, 25–27 February 2019; Science and Technology Publications: Setúbal, Portugal, 2019; pp. 427–434.
5. Zhang, H.; Yang, J.; Zhang, Y.; Nasrabadi, N.M.; Huang, T.S. Close the loop: Joint blind image restoration and recognition with sparse representation prior. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 770–777.
6. Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018; pp. 63–79.
7. Ma, C.; Jiang, Z.; Rao, Y.; Lu, J.; Zhou, J. Deep face super-resolution with iterative collaboration between attentive recovery and landmark estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5569–5578.
8. Chan, K.C.; Wang, X.; Xu, X.; Gu, J.; Loy, C.C. GLEAN: Generative latent bank for large-factor image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual Conference, 19–25 June 2021; pp. 14245–14254.
9. Wang, X.; Xie, L.; Dong, C.; Shan, Y. Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual Conference, 11–17 October 2021; pp. 1905–1914.
10. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2472–2481.
11. Li, X.; Cheng, Y.; Ren, X.; Jia, H.; Xu, D.; Zhu, W.; Yan, Y. Topo4D: Topology-preserving Gaussian splatting for high-fidelity 4D head capture. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; Springer: Berlin/Heidelberg, Germany; pp. 128–145.
12. Jia, H.; Li, Y.; Cui, H.; Xu, D.; Wang, Y.; Yu, T. DisControlFace: Adding disentangled control to diffusion autoencoder for one-shot explicit facial image editing. arXiv 2023, arXiv:2312.06193.
13. Tu, X.; Zhao, J.; Liu, Q.; Ai, W.; Guo, G.; Li, Z.; Liu, W.; Feng, J. Joint face image restoration and frontalization for recognition. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1285–1298.
14. Liao, Y.; Lin, X. Blind image restoration with eigen-face subspace. IEEE Trans. Image Process. 2005, 14, 1766–1772.
15. Brox, T.; Malik, J. Large displacement optical flow: Descriptor matching in variational motion estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 500–513.
16. Sun, S.; Kuang, Z.; Sheng, L.; Ouyang, W.; Zhang, W. Optical flow guided feature: A fast and robust motion representation for video action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1390–1399.
17. Xu, L.; Jia, J.; Matsushita, Y. Motion detail preserving optical flow estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 1744–1757.
18. Peng, M.; Wang, C.; Chen, T.; Liu, G.; Fu, X. Dual temporal scale convolutional neural network for micro-expression recognition. Front. Psychol. 2017, 8, 273835.
19. Liong, G.B.; See, J.; Wong, L.K. Shallow optical flow three-stream CNN for macro- and micro-expression spotting from long videos. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 2643–2647.
20. Kumar, V.; Durst, F.; Ray, S. Modeling moving-boundary problems of solidification and melting adopting an arbitrary Lagrangian–Eulerian approach. Numer. Heat Transf. Part B Fundam. 2006, 49, 299–331.
21. Kaneko, T.; Harada, T. Blur, noise, and compression robust generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual Conference, 19–25 June 2021; pp. 13579–13589.
22. Oh, T.; Lee, S. Blind sharpness prediction based on image-based motion blur analysis. IEEE Trans. Broadcast. 2015, 61, 1–15.
23. Siarohin, A.; Lathuilière, S.; Tulyakov, S.; Ricci, E.; Sebe, N. First order motion model for image animation. Adv. Neural Inf. Process. Syst. 2019, 32, 641.
24. Siarohin, A.; Lathuilière, S.; Tulyakov, S.; Ricci, E.; Sebe, N. Animating arbitrary objects via deep motion transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2377–2386.
25. Siarohin, A.; Woodford, O.J.; Ren, J.; Chai, M.; Tulyakov, S. Motion representations for articulated animation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual Conference, 19–25 June 2021; pp. 13653–13662.
26. Lu, H.; Kpalma, K.; Ronsin, J. Motion descriptors for micro-expression recognition. Signal Process. Image Commun. 2018, 67, 108–117.
27. Wang, Y.; Wu, C.; Herranz, L.; Van de Weijer, J.; Gonzalez-Garcia, A.; Raducanu, B. Transferring GANs: Generating images from limited data. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 218–234.
28. Frégier, Y.; Gouray, J.B. Mind2Mind: Transfer learning for GANs. In Proceedings of the Geometric Science of Information: 5th International Conference, GSI 2021, Paris, France, 21–23 July 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 851–859.
29. Davison, A.K.; Lansley, C.; Costen, N.; Tan, K.; Yap, M.H. SAMM: A spontaneous micro-facial movement dataset. IEEE Trans. Affect. Comput. 2016, 9, 116–129.
30. Li, X.; Pfister, T.; Huang, X.; Zhao, G.; Pietikäinen, M. A spontaneous micro-expression database: Inducement, collection and baseline. In Proceedings of the 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Shanghai, China, 22–26 April 2013; pp. 1–6.
31. Yan, W.J.; Li, X.; Wang, S.J.; Zhao, G.; Liu, Y.J.; Chen, Y.H.; Fu, X. CASME II: An improved spontaneous micro-expression database and the baseline evaluation. PLoS ONE 2014, 9, e86041.
32. Li, J.; Dong, Z.; Lu, S.; Wang, S.J.; Yan, W.J.; Ma, Y.; Liu, Y.; Huang, C.; Fu, X. CAS(ME)³: A third generation facial spontaneous micro-expression database with depth information and high ecological validity. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2782–2800.
33. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
34. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
35. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
36. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
37. Liong, S.T.; Gan, Y.S.; See, J.; Khor, H.Q.; Huang, Y.C. Shallow triple stream three-dimensional CNN (STSTNet) for micro-expression recognition. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; pp. 1–5.
38. Van Quang, N.; Chun, J.; Tokuyama, T. CapsuleNet for micro-expression recognition. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; pp. 1–7.
39. Zhou, L.; Mao, Q.; Xue, L. Dual-Inception network for cross-database micro-expression recognition. In Proceedings of the 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), Lille, France, 14–18 May 2019; pp. 1–5.
40. Xia, Z.; Peng, W.; Khor, H.Q.; Feng, X.; Zhao, G. Revealing the invisible with model and data shrinking for composite-database micro-expression recognition. IEEE Trans. Image Process. 2020, 29, 8590–8605.
41. Zhou, L.; Mao, Q.; Huang, X.; Zhang, F.; Zhang, Z. Feature refinement: An expression-specific feature learning and fusion method for micro-expression recognition. Pattern Recognit. 2022, 122, 108275.
42. Wang, Z.; Zhang, K.; Luo, W.; Sankaranarayana, R. HTNet for micro-expression recognition. arXiv 2023, arXiv:2307.14637.
43. Park, T.; Liu, M.Y.; Wang, T.C.; Zhu, J.Y. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2337–2346.
44. Bulat, A.; Tzimiropoulos, G. Super-FAN: Integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 109–117.
45. Yang, L.; Wang, S.; Ma, S.; Gao, W.; Liu, C.; Wang, P.; Ren, P. HiFaceGAN: Face renovation via collaborative suppression and replenishment. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1551–1560.
Database | SAMM | CASME II | SMIC | CASME III
---|---|---|---|---
Subjects | 28 | 26 | 16 | 100
Samples | 133 | 145 | 164 | 943
Frame rate (fps) | 200 | 200 | 100 | 30
Cropped image resolution | 224 × 224 | 224 × 224 | 224 × 224 | 224 × 224
Negative | 92 | 88 | 70 | 508
Positive | 26 | 32 | 51 | 64
Surprise | 15 | 25 | 43 | 201
Onset index | ✓ | ✓ | ✓ | ✓
Offset index | ✓ | ✓ | ✓ | ✓
Apex index | ✓ | ✓ | | ✓
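The onset, apex, and offset indices listed above are what optical-flow-based micro-expression pipelines typically consume: motion is estimated between the onset (neutral) frame and the apex (peak-intensity) frame [15,16,19]. As a hedged illustration of that common preprocessing step, not necessarily this paper's exact pipeline, using OpenCV's Farnebäck estimator:

```python
# Common MER preprocessing sketch: dense optical flow from onset to apex frame.
import cv2
import numpy as np

def onset_apex_flow(onset_bgr: np.ndarray, apex_bgr: np.ndarray) -> np.ndarray:
    onset = cv2.cvtColor(onset_bgr, cv2.COLOR_BGR2GRAY)
    apex = cv2.cvtColor(apex_bgr, cv2.COLOR_BGR2GRAY)
    # Returns an (H, W, 2) array of per-pixel (dx, dy) displacements.
    return cv2.calcOpticalFlowFarneback(
        onset, apex, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
    )
```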
Approaches | SMIC UF1 | SMIC UAR | CASME II UF1 | CASME II UAR | CASME III UF1 | CASME III UAR | SAMM UF1 | SAMM UAR
---|---|---|---|---|---|---|---|---
AlexNet [33] | 0.6201 | 0.6373 | 0.7994 | 0.8312 | 0.2570 | 0.2634 | 0.6104 | 0.6642
GoogLeNet [34] | 0.5123 | 0.5511 | 0.5989 | 0.6414 | 0.2658 | 0.2713 | 0.5124 | 0.5992
VGG16 [35] | 0.5800 | 0.5964 | 0.8166 | 0.8202 | 0.3209 | 0.3400 | 0.4870 | 0.4793
ResNet50 [36] | 0.7251 | 0.7615 | 0.8249 | 0.8556 | 0.3491 | 0.3651 | 0.7260 | 0.7435
STSTNet [37] | 0.6801 | 0.7013 | 0.8382 | 0.8686 | 0.3795 | 0.3792 | 0.6588 | 0.6810
CapsuleNet [38] | 0.5820 | 0.5877 | 0.7068 | 0.7018 | 0.2478 | 0.2516 | 0.6209 | 0.5989
Dual-Inception [39] | 0.6645 | 0.6726 | 0.8621 | 0.8560 | 0.3844 | 0.4001 | 0.5868 | 0.5663
RCN [40] | 0.6326 | 0.6441 | 0.8621 | 0.8512 | 0.3928 | 0.3893 | 0.7601 | 0.6715
FeatRef [41] | 0.7011 | 0.7083 | 0.8915 | 0.8873 | 0.3493 | 0.3413 | 0.7372 | 0.7155
HTNet [42] | 0.8049 | 0.7905 | 0.9532 | 0.9516 | 0.5767 | 0.5415 | 0.8131 | 0.8124
PR-AlexNet | 0.6654 | 0.6691 | 0.8518 | 0.8487 | 0.3159 | 0.3209 | 0.6227 | 0.6542
PR-GoogLeNet | 0.6078 | 0.6128 | 0.6499 | 0.7011 | 0.3278 | 0.3317 | 0.5745 | 0.6218
PR-VGG16 | 0.6476 | 0.6550 | 0.8731 | 0.8639 | 0.3815 | 0.3843 | 0.6854 | 0.7032
PR-ResNet50 | 0.8257 | 0.8082 | 0.9625 | 0.9516 | 0.5892 | 0.5760 | 0.8328 | 0.8345
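UF1 (unweighted F1) and UAR (unweighted average recall), as reported in these tables, average per-class F1 and per-class recall without class weighting, which matters because the negative/positive/surprise classes in these datasets are heavily imbalanced. They correspond exactly to macro-averaged F1 and recall; a minimal sketch with scikit-learn:

```python
# UF1 and UAR as macro-averaged F1 and recall over classes.
from sklearn.metrics import f1_score, recall_score

def uf1_uar(y_true, y_pred):
    uf1 = f1_score(y_true, y_pred, average="macro")      # unweighted mean of per-class F1
    uar = recall_score(y_true, y_pred, average="macro")  # unweighted mean of per-class recall
    return uf1, uar

# Example: 0 = negative, 1 = positive, 2 = surprise
print(uf1_uar([0, 0, 1, 2, 2], [0, 1, 1, 2, 0]))
```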
Approaches | SMIC UF1 | SMIC UAR | CASME II UF1 | CASME II UAR | CASME III UF1 | CASME III UAR | SAMM UF1 | SAMM UAR
---|---|---|---|---|---|---|---|---
AlexNet [33] | 0.2340 | 0.3059 | 0.4496 | 0.4787 | 0.0837 | 0.1373 | 0.2242 | 0.3585
GoogLeNet [34] | 0.1962 | 0.3305 | 0.3594 | 0.3948 | 0.1049 | 0.1528 | 0.2150 | 0.3595
VGG16 [35] | 0.2350 | 0.3582 | 0.4699 | 0.4821 | 0.1083 | 0.1940 | 0.1858 | 0.3876
ResNet50 [36] | 0.3125 | 0.4568 | 0.4940 | 0.5314 | 0.1096 | 0.2190 | 0.2904 | 0.4457
STSTNet [37] | 0.2900 | 0.4208 | 0.4171 | 0.5011 | 0.1493 | 0.1746 | 0.2799 | 0.4084
CapsuleNet [38] | 0.2410 | 0.3585 | 0.3387 | 0.4307 | 0.0986 | 0.1506 | 0.2484 | 0.3395
Dual-Inception [39] | 0.2823 | 0.4028 | 0.3552 | 0.5136 | 0.1434 | 0.2200 | 0.2347 | 0.2265
RCN [40] | 0.2663 | 0.4021 | 0.3448 | 0.5101 | 0.1864 | 0.1946 | 0.3300 | 0.3358
FeatRef [41] | 0.3006 | 0.4342 | 0.4457 | 0.4910 | 0.1247 | 0.1707 | 0.3189 | 0.3577
HTNet [42] | 0.3524 | 0.4752 | 0.4766 | 0.5758 | 0.2883 | 0.2708 | 0.4065 | 0.4062
PR-AlexNet | 0.6432 | 0.6378 | 0.8108 | 0.8211 | 0.2942 | 0.3001 | 0.6135 | 0.6158
PR-GoogLeNet | 0.5902 | 0.6005 | 0.6172 | 0.6716 | 0.3018 | 0.3100 | 0.5471 | 0.5822
PR-VGG16 | 0.6342 | 0.6307 | 0.8555 | 0.8389 | 0.3643 | 0.3622 | 0.6611 | 0.6897
PR-ResNet50 | 0.8176 | 0.7937 | 0.9542 | 0.9479 | 0.5690 | 0.5689 | 0.8213 | 0.8235
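The table above reports recognition under degraded inputs, where the PR- variants hold up markedly better than their baselines. A common way to synthesize such degradations, in the spirit of Real-ESRGAN [9], is a blur, downsample, noise, and JPEG-compression chain; the following is a hedged sketch, not necessarily the exact degradation model used in the paper's experiments:

```python
# Hedged sketch of a synthetic degradation pipeline (blur, downsample, noise,
# JPEG), in the spirit of Real-ESRGAN [9].
import cv2
import numpy as np

def degrade(img_bgr: np.ndarray, scale: int = 4, jpeg_q: int = 30) -> np.ndarray:
    h, w = img_bgr.shape[:2]
    x = cv2.GaussianBlur(img_bgr, (7, 7), sigmaX=2.0)
    x = cv2.resize(x, (w // scale, h // scale), interpolation=cv2.INTER_AREA)
    x = np.clip(x + np.random.normal(0, 8, x.shape), 0, 255).astype(np.uint8)
    ok, buf = cv2.imencode(".jpg", x, [cv2.IMWRITE_JPEG_QUALITY, jpeg_q])
    x = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return cv2.resize(x, (w, h), interpolation=cv2.INTER_CUBIC)  # back to input size
```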
Approaches | SMIC PSNR↑ | SMIC FID↓ | CASME II PSNR↑ | CASME II FID↓ | CASME III PSNR↑ | CASME III FID↓ | SAMM PSNR↑ | SAMM FID↓
---|---|---|---|---|---|---|---|---
Pix2PixHD [43] | 19.61 | 73.97 | 20.09 | 80.22 | 20.18 | 80.20 | 20.65 | 79.40
Super-FAN [44] | 20.67 | 132.11 | 21.21 | 142.27 | 21.10 | 140.26 | 22.34 | 133.92
HiFaceGAN [45] | 20.51 | 54.45 | 21.01 | 59.19 | 20.68 | 59.21 | 22.18 | 60.67
PRNet (ours) | 23.12 | 29.72 | 23.28 | 30.34 | 23.62 | 30.02 | 23.94 | 31.34
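The sub-headers of this table were lost in extraction; the metric labels above (PSNR in dB, higher is better; FID, lower is better) are inferred from the magnitudes and from the direction in which PRNet improves over the restoration baselines, so treat them as an editorial assumption. For reference, a minimal PSNR computation for 8-bit images:

```python
# Minimal PSNR for 8-bit images: 10 * log10(MAX^2 / MSE), with MAX = 255.
import numpy as np

def psnr(reference: np.ndarray, restored: np.ndarray) -> float:
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(255.0 ** 2 / mse)
```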