Scene Text Recognition That Eliminates Background and Character Noise Interference
Abstract
1. Introduction
1.1. A Single-Dimensional Attention-Based Method for Scene Text Recognition
1.2. A Two-Dimensional Attention-Based Method for Scene Text Recognition
2. Related Work
2.1. Text Recognition of Scenes Using Deep Learning
2.2. Scene Text Recognition Method Based on Image Segmentation
3. Proposed Framework
3.1. Pre-Processing
3.2. The ENBC System
3.2.1. Encoder
3.2.2. Analysis of Deep Convolutional Neural Network Models
3.2.3. Decoder
3.2.4. Loss of Model
3.3. Scene Text Recognition Model
4. Experimental Results and Analyses
4.1. Experimental Datasets
- (1) Synthetic scene dataset
- (2) Real scene dataset
4.2. Experimental Setup
4.3. Eliminate Background and Character Self-Noise Interference
4.4. Scene Text Recognition
- (1) Synthetic scene dataset experiment
- (2) Experiments on real scene datasets
- (3) Trade-off study
- (4) Model failure analysis
- (5) Adaptation strategy
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Dataset | P | R | F | IoU |
---|---|---|---|---|
Text-test | 93.91 | 98.69 | 96.24 | 92.76 |
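
The F value in the row above is consistent with the harmonic mean of P and R. The sketch below recomputes it and, as an assumption not stated in the table, also applies the single-mask identity IoU = F / (2 − F) that links a Dice-style F score to IoU. The function names are illustrative, and the identity need not hold exactly when the metrics are averaged over many images.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def iou_from_f(f_percent):
    """IoU implied by an F (Dice) score for a single binary mask, in percent."""
    f = f_percent / 100
    return 100 * f / (2 - f)


# Values copied from the Text-test row above.
P, R = 93.91, 98.69
F = f_measure(P, R)
IOU = iou_from_f(F)

print(f"F   = {F:.2f}")
print(f"IoU = {IOU:.2f}")
```

The recomputed F agrees with the reported 96.24, and the implied IoU lands within a few hundredths of the reported 92.76, which suggests the four columns were computed from the same pixel-level counts.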

Methods | Training Dataset | Text-test accuracy (%) |
---|---|---|
CRNN | Text-train | 82.91 |
RobustScanner | Text-train | 88.28 |
SRN | Text-train | 87.67 |
STAR-Net | Text-train | 85.75 |
PREN | Text-train | 86.04 |
ViTSTR-Small | Text-train | 81.72 |
ABINet | Text-train | 90.58 |
TrOCR | Text-train | 93.80 |
MASTER | Text-train | 91.57 |
Ours | Text-train | 98.00 |

Methods | Training Dataset | IIIT5K | IC15 | IC03-Char | IC03-Word | CUTE80 | Avg |
---|---|---|---|---|---|---|---|
CRNN | Text-train+MJ | 79.38 | 61.35 | 72.22 | 76.31 | 71.87 | 72.22 |
RobustScanner | Text-train+MJ | 83.16 | 72.10 | 77.96 | 81.46 | 80.49 | 79.03 |
SRN | Text-train+MJ | 83.52 | 72.82 | 83.52 | 81.52 | 82.37 | 80.75 |
STAR-Net | Text-train+MJ | 85.20 | 74.50 | 83.05 | 82.00 | 69.20 | 78.79 |
PREN | Text-train+MJ | 87.05 | 73.45 | 81.38 | 80.61 | 85.61 | 81.62 |
ViTSTR-Small | Text-train+MJ | 85.60 | 75.30 | 83.10 | 82.80 | 71.30 | 79.62 |
ABINet | Text-train+MJ | 86.45 | 74.69 | 82.33 | 83.73 | 84.05 | 82.25 |
TrOCR | Text-train+MJ | 84.35 | 74.22 | 84.66 | 84.73 | 89.67 | 83.53 |
MASTER | Text-train+MJ | 93.18 | 70.52 | 83.76 | 82.67 | 82.07 | 82.44 |
Ours | Text-train+MJ | 96.10 | 80.47 | 87.22 | 83.75 | 87.50 | 87.00 |
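
The Avg column above appears to be the unweighted mean of the five benchmark accuracies. The following sketch checks that reading against every row; the tolerance of 0.011 is an assumption chosen to absorb two-decimal rounding or truncation, and the names `ROWS` and `check_averages` are illustrative.

```python
# Per-benchmark accuracies (%) in the order IIIT5K, IC15, IC03-Char,
# IC03-Word, CUTE80, followed by the reported Avg, copied from the table above.
ROWS = {
    "CRNN":          ([79.38, 61.35, 72.22, 76.31, 71.87], 72.22),
    "RobustScanner": ([83.16, 72.10, 77.96, 81.46, 80.49], 79.03),
    "SRN":           ([83.52, 72.82, 83.52, 81.52, 82.37], 80.75),
    "STAR-Net":      ([85.20, 74.50, 83.05, 82.00, 69.20], 78.79),
    "PREN":          ([87.05, 73.45, 81.38, 80.61, 85.61], 81.62),
    "ViTSTR-Small":  ([85.60, 75.30, 83.10, 82.80, 71.30], 79.62),
    "ABINet":        ([86.45, 74.69, 82.33, 83.73, 84.05], 82.25),
    "TrOCR":         ([84.35, 74.22, 84.66, 84.73, 89.67], 83.53),
    "MASTER":        ([93.18, 70.52, 83.76, 82.67, 82.07], 82.44),
    "Ours":          ([96.10, 80.47, 87.22, 83.75, 87.50], 87.00),
}


def check_averages(rows, tol=0.011):
    """Return {method: (recomputed_mean, reported_avg, matches_within_tol)}."""
    report = {}
    for method, (scores, reported) in rows.items():
        recomputed = sum(scores) / len(scores)
        report[method] = (recomputed, reported, abs(recomputed - reported) <= tol)
    return report


REPORT = check_averages(ROWS)
for method, (recomputed, reported, ok) in REPORT.items():
    print(f"{method:14s} recomputed={recomputed:.3f} reported={reported:.2f} match={ok}")
```

Every row matches within the tolerance, so the Avg column can be read as a plain mean that gives each benchmark equal weight regardless of its size.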

Methods | Avg (%) | Parameters (×10^6) | FLOPs (×10^9) | Speed (ms/image) |
---|---|---|---|---|
CRNN | 72.22 | 8.5 | 1.4 | 3.7 |
RobustScanner | 79.03 | 20.6 | 5.0 | 16.8 |
SRN | 80.75 | 57.3 | 10.8 | 18.8 |
STAR-Net | 78.79 | 48.9 | 10.7 | 8.8 |
PREN | 81.62 | 20.0 | 3.45 | 29.5 |
ViTSTR-Small | 79.62 | 21.5 | 4.6 | 9.5 |
ABINet | 82.25 | 36.7 | 8.3 | 33.9 |
TrOCR | 83.53 | 558.0 | 15.2 | 318.0 |
MASTER | 82.44 | 28.0 | 13.2 | 6.45 |
Ours | 87.00 | 39.7 | 8.65 | 38.0 |
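
One way to read the accuracy and speed columns above together is to ask which methods are not dominated, that is, for which no other method is both at least as accurate and at least as fast. The sketch below computes that Pareto front from the Avg and Speed columns only; it is an illustrative analysis of the tabulated numbers, not a procedure taken from the paper, and the names are hypothetical.

```python
# (method, average accuracy in %, latency in ms/image), copied from the table above.
METHODS = [
    ("CRNN", 72.22, 3.7),
    ("RobustScanner", 79.03, 16.8),
    ("SRN", 80.75, 18.8),
    ("STAR-Net", 78.79, 8.8),
    ("PREN", 81.62, 29.5),
    ("ViTSTR-Small", 79.62, 9.5),
    ("ABINet", 82.25, 33.9),
    ("TrOCR", 83.53, 318.0),
    ("MASTER", 82.44, 6.45),
    ("Ours", 87.00, 38.0),
]


def pareto_front(methods):
    """Names of methods not dominated on (higher accuracy, lower latency)."""
    front = []
    for name, acc, ms in methods:
        dominated = any(
            o_acc >= acc and o_ms <= ms and (o_acc > acc or o_ms < ms)
            for o_name, o_acc, o_ms in methods
            if o_name != name
        )
        if not dominated:
            front.append(name)
    return front


FRONT = pareto_front(METHODS)
print(FRONT)
```

On these numbers the front consists of CRNN, MASTER, and Ours: CRNN is the fastest, Ours is the most accurate, and MASTER sits between them. Parameter count and FLOPs are left out of this comparison and could be added as further objectives.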

Strategy | Frozen Modules | Trainable Modules | Learning Rate | Training Data |
---|---|---|---|---|
Strategy A | Encoder | Decoder | | IC15 (1000) |
Strategy B | Encoder + Decoder | Recognition Module | | IC15 (1000) |
Strategy C | Encoder | Decoder + Recognition Module | | IC15 (1000) |

Strategy | IC15 Acc (%) | IIIT5K Acc (%) | CUTE80 Acc (%) |
---|---|---|---|
Baseline | 80.47 | 96.10 | 87.50 |
Strategy A | 83.20 (+2.73) | 95.85 (−0.25) | 87.30 (−0.20) |
Strategy B | 85.62 (+5.15) | 96.05 (−0.05) | 87.45 (−0.05) |
Strategy C | 86.10 (+5.63) | 96.15 (+0.05) | 87.60 (+0.10) |
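
The parenthesised values in the table above are differences from the Baseline row in percentage points. A minimal sketch that recomputes them, assuming simple subtraction and two-decimal rounding (the names `BASELINE`, `STRATEGIES`, and `deltas` are illustrative):

```python
# Accuracies (%) copied from the table above.
BASELINE = {"IC15": 80.47, "IIIT5K": 96.10, "CUTE80": 87.50}
STRATEGIES = {
    "Strategy A": {"IC15": 83.20, "IIIT5K": 95.85, "CUTE80": 87.30},
    "Strategy B": {"IC15": 85.62, "IIIT5K": 96.05, "CUTE80": 87.45},
    "Strategy C": {"IC15": 86.10, "IIIT5K": 96.15, "CUTE80": 87.60},
}


def deltas(baseline, strategies):
    """Percentage-point change of each strategy relative to the baseline."""
    return {
        name: {ds: round(acc - baseline[ds], 2) for ds, acc in scores.items()}
        for name, scores in strategies.items()
    }


DELTAS = deltas(BASELINE, STRATEGIES)
for name, per_dataset in DELTAS.items():
    print(name, ", ".join(f"{ds} {d:+.2f}" for ds, d in per_dataset.items()))
```

The recomputed differences agree with the table: all three strategies raise IC15 accuracy, and only Strategy C does so without a small drop on IIIT5K and CUTE80.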
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Tang, S.; Cao, Y.; Liang, S.; Jin, Z.; Lai, K. Scene Text Recognition That Eliminates Background and Character Noise Interference. Appl. Sci. 2025, 15, 3545. https://doi.org/10.3390/app15073545