Learning-Based Text Image Quality Assessment with Texture Feature and Embedding Robustness
Abstract
:1. Introduction
- We are the first to propose the text image quality assessment framework from three perspectives: character-level-based evaluation, sharp texture feature, and embedding robustness.
- We propose a learning-based fine-grained, sharp, and recognizable text image quality assessment method. The Wasserstein distance between the intra-class and inter-class similarity distributions is used to evaluate embedding robustness. The multiscale Haralick feature reflects the degree and clarity of the texture feature of character region. A quality score network is designed under the label-free training manner to normalize the texture feature and output the quality score.
- Extensive experiments indicate that FSR-TIQA has significant discrimination for different quality text images on benchmarks and Textzoom. Meanwhile, our method shows good potential to analyze dataset distribution and guide dataset collection.
2. Analysis and Comparison on Related Work
2.1. Image Quality Assessment
2.2. Low-Quality Scene Text Image Recovery
2.3. Face Image Quality Assessment
3. Methodology
3.1. Data Preparing
3.2. Similarity Distribution Distance
Algorithm 1 Pseudocode of Similarity Distribution Distance |
|
3.3. Haralick Feature Extractor
3.4. Quality Score Network
4. Experiments
4.1. Experimental Setup
4.1.1. Datasets and STR System
4.1.2. Implementation Details
4.1.3. Evaluation Protocols
4.2. Results and Discussion
4.2.1. Results on Character Quality Score
4.2.2. Results on Benchmarks
4.2.3. Results on Textzoom
4.2.4. Results on Quality Score Distribution
4.2.5. Generalization on License Plate Datasets
4.3. Ablation Study on FSR-TIQA
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Shi, B.; Bai, X.; Yao, C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 2298–2304. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Shi, B.; Yang, M.; Wang, X.; Lyu, P.; Yao, C.; Bai, X. Aster: An attentional scene text recognizer with flexible rectification. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2035–2048. [Google Scholar] [CrossRef] [PubMed]
- Wang, T.; Zhu, Y.; Jin, L.; Luo, C.; Chen, X.; Wu, Y.; Wang, Q.; Cai, M. Decoupled attention network for text recognition. Aaai Conf. Artif. Intell. 2020, 34, 12216–12224. [Google Scholar] [CrossRef]
- Jia, Z.; Xu, S.; Mu, S.; Tao, Y.; Cao, S.; Chen, Z. IFR: Iterative Fusion Based Recognizer for Low Quality Scene Text Recognition. Chinese Conference on Pattern Recognition and Computer Vision (PRCV); Springer: Berlin/Heidelberg, Germany, 2021; pp. 180–191. [Google Scholar]
- Tao, Y.; Jia, Z.; Ma, R.; Xu, S. TRIG: Transformer-Based Text Recognizer with Initial Embedding Guidance. Electronics 2021, 10, 2780. [Google Scholar] [CrossRef]
- Wang, W.; Xie, E.; Liu, X.; Wang, W.; Liang, D.; Shen, C.; Bai, X. Scene text image super-resolution in the wild. European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 650–666. [Google Scholar]
- Fang, S.; Xie, H.; Wang, Y.; Mao, Z.; Zhang, Y. Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7098–7107. [Google Scholar]
- Chen, J.; Li, B.; Xue, X. Scene Text Telescope: Text-Focused Scene Image Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12026–12035. [Google Scholar]
- Ma, J.; Guo, S.; Zhang, L. Text Prior Guided Scene Text Image Super-resolution. arXiv 2021, arXiv:2106.15368. [Google Scholar]
- Nakaune, S.; Iizuka, S.; Fukui, K. Skeleton-Aware Text Image Super-Resolution; University of Tsukuba: Tsukuba, Japan, 2021. [Google Scholar]
- Chen, J.; Yu, H.; Ma, J.; Li, B.; Xue, X. Text Gestalt: Stroke-Aware Scene Text Image Super-Resolution. arXiv 2021, arXiv:2112.08171. [Google Scholar]
- Mou, Y.; Tan, L.; Yang, H.; Chen, J.; Liu, L.; Yan, R.; Huang, Y. PlugNet: Degradation Aware Scene Text Recognition Supervised by a Pluggable Super-Resolution Unit. In Proceedings of the 16th European Conference on Computer Vision (ECCV 2020), Glasgow, UK, 23–28 August 2020; pp. 1–17. [Google Scholar]
- Zhai, G.; Min, X. Perceptual image quality assessment: A survey. Sci. China Inf. Sci. 2020, 63, 211301. [Google Scholar] [CrossRef]
- Zhou, W.; Chen, Z. Deep multi-scale features learning for distorted image quality assessment. In Proceedings of the 2021 IEEE International Symposium on Circuits and Systems (ISCAS) IEEE, Daegu, Korea, 22–28 May 2021; pp. 1–5. [Google Scholar]
- Ou, F.Z.; Chen, X.; Zhang, R.; Huang, Y.; Li, S.; Li, J.; Li, Y.; Cao, L.; Wang, Y.G. Sdd-fiqa: Unsupervised face image quality assessment with similarity distribution distance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7670–7679. [Google Scholar]
- Terhorst, P.; Kolf, J.N.; Damer, N.; Kirchbuchner, F.; Kuijper, A. SER-FIQ: Unsupervised estimation of face image quality based on stochastic embedding robustness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 5651–5660. [Google Scholar]
- Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-Reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708. [Google Scholar] [CrossRef] [PubMed]
- Zhou, W.; Chen, Z.; Li, W. Dual-Stream interactive networks for no-reference stereoscopic image quality assessment. IEEE Trans. Image Process. 2019, 28, 3946–3958. [Google Scholar] [CrossRef] [PubMed]
- Shen, W.; Ren, Q.; Liu, D.; Zhang, Q. Interpreting Representation Quality of DNNs for 3D Point Cloud Processing. Adv. Neural Inf. Process. Syst. 2021, 34, 1–14. [Google Scholar]
- Qiao, Z.; Zhou, Y.; Yang, D.; Zhou, Y.; Wang, W. Seed: Semantics enhanced encoder-decoder framework for scene text recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 13528–13537. [Google Scholar]
- Schroff, F.; Kalenichenko, D.; Philbin, J. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 815–823. [Google Scholar]
- Haralick, R.M.; Shanmugam, K.; Dinstein, I.H. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 1973, 6, 610–621. [Google Scholar] [CrossRef] [Green Version]
- Xu, J.; Zhou, W.; Chen, Z. Blind omnidirectional image quality assessment with viewport oriented graph convolutional networks. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 1724–1737. [Google Scholar] [CrossRef]
- Zhou, W.; Jiang, Q.; Wang, Y.; Chen, Z.; Li, W. Blind quality assessment for image superresolution using deep two-stream convolutional networks. Inf. Sci. 2020, 528, 205–218. [Google Scholar] [CrossRef] [Green Version]
- Schlett, T.; Rathgeb, C.; Henniger, O.; Galbally, J.; Fierrez, J.; Busch, C. Face image quality assessment: A literature survey. ACM Computing Surveys (CSUR) 2021. [Google Scholar] [CrossRef]
- Gao, X.; Li, S.Z.; Liu, R.; Zhang, P. Standardization of face image sample quality. In Proceedings of the International Conference on Biometrics, Seoul, Korea, 27–29 August 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 242–251. [Google Scholar]
- Wasnik, P.; Raja, K.B.; Ramachandra, R.; Busch, C. Assessing face image quality for smartphone based face recognition system. In Proceedings of the 2017 5th International Workshop on Biometrics and Forensics (IWBF) IEEE, Coventry, UK, 4–5 April 2017; pp. 1–6. [Google Scholar]
- Aggarwal, G.; Biswas, S.; Flynn, P.J.; Bowyer, K.W. Predicting performance of face recognition systems: An image characterization approach. In Proceedings of the CVPR 2011 WORKSHOPS, Colorado Springs, CO, USA, 20–25 June 2011; pp. 52–59. [Google Scholar]
- Meng, Q.; Zhao, S.; Huang, Z.; Zhou, F. Magface: A universal representation for face recognition and quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14225–14234. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man, Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef] [Green Version]
- Lucas, S.M.; Panaretos, A.; Sosa, L.; Tang, A.; Wong, S.; Young, R.; Ashida, K.; Nagai, H.; Okamoto, M.; Yamamoto, H.; et al. ICDAR 2003 robust reading competitions: Entries, results, and future directions. Int. J. Doc. Anal. Recognit. (IJDAR) 2005, 7, 105–122. [Google Scholar] [CrossRef]
- Karatzas, D.; Shafait, F.; Uchida, S.; Iwamura, M.; i Bigorda, L.G.; Mestre, S.R.; Mas, J.; Mota, D.F.; Almazan, J.A.; De Las Heras, L.P. ICDAR 2013 robust reading competition. In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 25–28 August 2013; pp. 1484–1493. [Google Scholar]
- Karatzas, D.; Gomez-Bigorda, L.; Nicolaou, A.; Ghosh, S.; Bagdanov, A.; Iwamura, M.; Matas, J.; Neumann, L.; Chandrasekhar, V.R.; Lu, S.; et al. ICDAR 2015 competition on robust reading. In Proceedings of the 13th International Conference on Document Analysis and Recognition), Tunis, Tunisia, 23–26 August 2015; pp. 1156–1160. [Google Scholar]
- Mishra, A.; Alahari, K.; Jawahar, C. Scene text recognition using higher order language priors. In Proceedings of the British Machine Vision Conference (BMVC), Virtual, 22–25 November 2012. [Google Scholar]
- Wang, K.; Babenko, B.; Belongie, S. End-to-end scene text recognition. In Proceedings of the 2011 International Conference on Computer Vision IEEE, Washington, DC, USA, 20–25 June 2011; pp. 1457–1464. [Google Scholar]
- Quy Phan, T.; Shivakumara, P.; Tian, S.; Lim Tan, C. Recognizing text with perspective distortion in natural scenes. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 2–8 December 2013; pp. 569–576. [Google Scholar]
- Risnumawan, A.; Shivakumara, P.; Chan, C.S.; Tan, C.L. A robust arbitrary text detection system for natural scene images. Expert Syst. Appl. 2014, 41, 8027–8048. [Google Scholar] [CrossRef]
- Grother, P.; Tabassi, E. Performance of biometric quality measures. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 531–543. [Google Scholar] [CrossRef] [PubMed]
- Zhang, L.; Wang, P.; Li, H.; Li, Z.; Shen, C.; Zhang, Y. A robust attentional framework for license plate recognition in the wild. IEEE Trans. Intell. Transp. Syst. 2020, 22, 6967–6976. [Google Scholar] [CrossRef]
- Xu, Z.; Yang, W.; Meng, A.; Lu, N.; Huang, H.; Ying, C.; Huang, L. Towards end-to-end license plate detection and recognition: A large dataset and baseline. In Proceedings of the European conference on computer vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 255–271. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Jia, Z.; Xu, S.; Mu, S.; Tao, Y. Learning-Based Text Image Quality Assessment with Texture Feature and Embedding Robustness. Electronics 2022, 11, 1611. https://doi.org/10.3390/electronics11101611
Jia Z, Xu S, Mu S, Tao Y. Learning-Based Text Image Quality Assessment with Texture Feature and Embedding Robustness. Electronics. 2022; 11(10):1611. https://doi.org/10.3390/electronics11101611
Chicago/Turabian StyleJia, Zhiwei, Shugong Xu, Shiyi Mu, and Yue Tao. 2022. "Learning-Based Text Image Quality Assessment with Texture Feature and Embedding Robustness" Electronics 11, no. 10: 1611. https://doi.org/10.3390/electronics11101611
APA StyleJia, Z., Xu, S., Mu, S., & Tao, Y. (2022). Learning-Based Text Image Quality Assessment with Texture Feature and Embedding Robustness. Electronics, 11(10), 1611. https://doi.org/10.3390/electronics11101611