Arabic Cursive Text Recognition from Natural Scene Images
Abstract
1. Introduction
- Each character can take more than one shape, which increases the complexity of character recognition.
- Diacritics are an essential feature of some Arabic characters; without them, it becomes difficult to read a word correctly and understand its meaning.
- Unlike Latin script, Arabic is written from right to left.
- The texture-based method relies on image properties such as intensity and hue values, wavelet transforms and the responses of different filters, all of which contribute to representing the image. Such properties may help to detect the text in an image, as explained in [9,10,11,12].
- The component-based method depends on specific region(s) of an image. A region is often marked by colour clustering and coordinate values. Different filtering techniques may be applied to separate text from non-text regions. Component-based methods produce good results when scene text images are taken in specific settings, but they are not suitable for text that varies in font size, rotation, etc. Several techniques based on this method are reported in [4,5,7,13,14].
- Work on cursive scene text recognition, especially in Arabic script, is scattered. The aim is to give researchers a detailed picture of the current status of the work as a basis for discussing future research possibilities in Arabic scene text analysis.
- State-of-the-art techniques are described that represent the research on cursive scene text recognition and may assist Arabic text recognition in natural images.
- Available Arabic scene text datasets are described, which may show researchers how far research on Arabic script has progressed and what hindrances arise during the process.
- This paper provides new insights to researchers and indicates where Arabic and Arabic-like languages (such as Farsi and Urdu) stand in the scene text recognition field.
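The component-based method described above marks candidate regions and then separates text from non-text. The following is a minimal sketch of that idea, assuming the image has already been binarized into a grid of 0/1 values. The function names `connected_components` and `filter_text_candidates`, the 4-connectivity choice and the thresholds are illustrative assumptions, not the method of any cited work.

```python
from collections import deque


def connected_components(binary):
    """Find 4-connected foreground regions in a binary grid (list of lists of 0/1).

    Returns one tuple per region: (top, left, bottom, right, pixel_count).
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right, count = r, c, r, c, 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right, count))
    return boxes


def filter_text_candidates(boxes, min_pixels=3, max_aspect=5.0):
    """Keep regions whose size and aspect ratio are plausible for text.

    Both thresholds are hypothetical values chosen for this toy example.
    """
    kept = []
    for top, left, bottom, right, count in boxes:
        height = bottom - top + 1
        width = right - left + 1
        aspect = max(height, width) / min(height, width)
        if count >= min_pixels and aspect <= max_aspect:
            kept.append((top, left, bottom, right, count))
    return kept


# Toy binary image: a 2x2 blob, one isolated pixel and a 3-pixel stroke.
image = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
]
components = connected_components(image)
candidates = filter_text_candidates(components)
```

In this toy input the filter keeps the blob and the stroke and drops the isolated pixel. For Arabic, a size filter of this kind would also discard diacritic dots, so a practical system would probably need to attach small components to a nearby base character instead of deleting them.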
2. Complexities Involved in the Scene Text Recognition Process and Its Solutions
2.1. Methods Designed for Text Localization
2.2. Methods Designed for Feature Extraction
2.3. Classification Techniques for Scene Text Recognition
2.4. Arabic Scene Text Datasets
3. Advances of Deep Learning Network in Text Analysis
3.1. Recurrent Neural Networks
3.2. Convolutional Neural Networks
3.3. Recently-Proposed Deep Learning Research
4. Discussion
5. Future Directions
- Lack of publicly-available Arabic scene text datasets: Capturing and compiling Arabic script scene images is itself a long-standing challenge. Text in natural images usually appears blurred, shaken and at low resolution, so taking an ideal image for research purposes is a demanding task. Although a few Arabic scene text datasets are available, as reported in [87], research on cursive scene text analysis needs a detailed dataset in which the text in natural images is captured under illumination variation and in different text sizes and font styles. Creating such a dataset may prove to be a starting point for addressing the intrinsic challenges associated with Arabic text in natural images.
- Localization/detection of text: Localizing text correctly with specific techniques or specialized tools is one of the difficult tasks in scene text images, and extracting text with high precision is still a challenge for researchers. Various approaches address this issue [4,10,30]. To obtain an accurate segmentation of natural text, most of the reported research localized Arabic text manually. Text localization for Latin script was reported with high precision in [2], but for cursive script it is still an open research problem.
- Text image preprocessing: A scene text image contains unwanted data, which ultimately need to be removed. Most reported work removed such noise manually, although image processing techniques can also remove it. In practical applications with large datasets, however, it is advisable to define methods that help to produce clean text images, as explained in [74]. In that case, an automatic layout analysis tool is preferred, which may detect and remove unwanted information from the given text.
- Implicit segmentation techniques: Arabic script is very complex because the same character has several representations and characters combine into ligatures. The association of diacritic marks with a base character is another very important issue. Because automated tools cannot accurately segment this intricate script, most of the reported work relied on implicit segmentation of Arabic text. In this regard, LSTM provides an implicit segmentation approach for text recognition, as reported in recent research on Urdu character recognition [4,5,6]. If further research on segmentation approaches yields an implicit method as its solution, it may open new horizons for researchers who want to pursue Arabic text recognition in natural images.
- Apply state-of-the-art deep learning techniques: As mentioned before, most of the proposed Arabic scene text recognition work has used unsupervised learning algorithms. Applying state-of-the-art supervised deep learning algorithms to estimate the parameters required for training on the given samples may open a new dimension for learning complex patterns like Arabic, as attempted in [30]. One possible future direction is therefore to apply supervised learning to Arabic scene text images, in particular deep learning classifiers that perform implicit segmentation of a given text image, such as RNNs [7], Convolutional Neural Networks (ConvNets) [13] and Bidirectional Long Short-Term Memory networks (BLSTM) [1].
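The implicit segmentation mentioned above means that the recognizer emits a label for every frame of the text line, and the character sequence is recovered without cutting the image into characters first. One common way to do this is Connectionist Temporal Classification (CTC), which the BLSTM system in [7] uses. The sketch below shows only the greedy decoding step and is an illustration under stated assumptions: the function name `ctc_greedy_decode`, the toy alphabet and the hand-written frame scores are hypothetical, and no network is trained here.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Collapse per-frame predictions into a character sequence.

    frame_scores: one list of class scores per frame (e.g. per image column).
    alphabet:     maps class index to character; alphabet[blank] is unused.
    blank:        index of the CTC blank class.
    """
    decoded = []
    previous = blank
    for scores in frame_scores:
        # Pick the highest-scoring class for this frame.
        best = max(range(len(scores)), key=scores.__getitem__)
        # Emit only when the class is not blank and differs from the last frame.
        if best != blank and best != previous:
            decoded.append(alphabet[best])
        previous = best
    return "".join(decoded)


# Toy alphabet: index 0 is the blank, 1 is ALEF (U+0627), 2 is BEH (U+0628).
alphabet = ["", "\u0627", "\u0628"]

# Hand-written scores for seven frames: beh, beh, blank, alef, alef, blank, beh.
frames = [
    [0.10, 0.10, 0.80],
    [0.20, 0.10, 0.70],
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.10, 0.70, 0.20],
    [0.80, 0.10, 0.10],
    [0.10, 0.20, 0.70],
]
decoded = ctc_greedy_decode(frames, alphabet)
```

The decoded string holds the characters in logical (reading) order, so the right-to-left direction of Arabic is a matter of display rather than of the label sequence. If the image columns are scanned left to right, the frame order would have to be reversed before decoding; that choice is left open here.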
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Graves, A. Teaching Computers to Read and Write: Recent Advances in Cursive Handwriting Recognition and Synthesis with Recurrent Neural Networks. In Proceedings of the CORIA 2014, Conférence en Recherche d’Informations et Applications, 11th French Information Retrieval Conference, CIFED 2014 Colloque International Francophone sur l’Ecrit et le Document, Nancy, France, 19–23 March 2014; Available online: http://dblp.uni-trier.de/db/conf/coria/coria2014.html#Graves14 (accessed on 8 April 2017).
- Shahab, A.; Shafait, F.; Dengel, A. ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images. In Proceedings of the International Conference on Document Analysis and Recognition, Beijing, China, 18–21 September 2011; pp. 1491–1496.
- Liwicki, M.; Bunke, H. Feature Selection for HMM and BLSTM Based Handwriting Recognition of Whiteboard Notes. Int. J. Pattern Recognit. Artif. Intell. 2009, 23, 907–923.
- Ahmed, S.B.; Naz, S.; Razzak, M.I.; Rashid, S.F.; Afzal, M.Z.; Breuel, T.M. Evaluation of cursive and non-cursive scripts using recurrent neural networks. Neural Comput. Appl. 2016, 27, 603–613.
- Naz, S.; Umar, A.I.; Ahmad, R.; Ahmed, S.B.; Shirazi, S.H.; Siddiqi, I.; Razzak, M.I. Offline cursive Urdu-Nastaliq script recognition using multidimensional recurrent neural networks. Neurocomputing 2016, 177, 228–241.
- Ahmed, S.B.; Naz, S.; Swati, S.; Razzak, M.I.; Umar, A.I.; Khan, A.A. UCOM offline dataset: An Urdu handwritten dataset generation. Int. Arab J. Inf. Technol. 2014, 14, 239–245.
- Graves, A.; Liwicki, M.; Fernández, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. A Novel Connectionist System for Unconstrained Handwriting Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 855–868.
- Ul-Hasan, A.; Ahmed, S.B.; Rashid, F.; Shafait, F.; Breuel, T.M. Offline Printed Urdu Nastaleeq Script Recognition with Bidirectional LSTM Networks. In Proceedings of the 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 25–28 August 2013; pp. 1061–1065.
- Fabrizio, J.; Marcotegui, B.; Cord, M. Text segmentation in natural scenes using Toggle-Mapping. In Proceedings of the 6th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–10 November 2009; pp. 2373–2376.
- Newell, A.J.; Griffin, L.D. Multiscale Histogram of Oriented Gradient Descriptors for Robust Character Recognition. In Proceedings of the International Conference on Document Analysis and Recognition, Beijing, China, 18–21 September 2011; pp. 1085–1089.
- Mao, J.; Li, H.; Zhou, W.; Yan, S.; Tian, Q. Scale based region growing for scene text detection. In Proceedings of the 21st ACM International Conference on Multimedia, Barcelona, Spain, 21–25 October 2013; pp. 1007–1016.
- Neuhaus, M. Learning Graph Edit Distance. Master’s Thesis, University of Bern, Bern, Switzerland, 2003.
- Ahmed, S.B.; Naz, S.; Razzak, M.I.; Yousaf, R. Deep Learning based Isolated Arabic Scene Character Recognition. In Proceedings of the 1st Workshop on Arabic Script Analysis and Recognition, Nancy, France, 3–5 April 2017.
- Neumann, L.; Matas, J. Scene Text Localization and Recognition with Oriented Stroke Detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 97–104.
- Simian, D.; Stoica, F. Evaluation of a Hybrid Method for Constructing Multiple SVM Kernels; WSEAS Press: Athens, Greece, 2009; pp. 619–623. ISBN 978-960-474-099-4.
- Tola, E.; Fossati, A.; Strecha, C.; Fua, P. Large occlusion completion using normal maps. In Proceedings of the Asian Conference on Computer Vision, Queenstown, New Zealand, 8–12 November 2010.
- Belongie, S.; Malik, J.; Puzicha, J. Shape Matching and Object Recognition Using Shape Contexts. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 509–522.
- Berg, A.C.; Berg, T.L.; Malik, J. Shape Matching and Object Recognition Using Low Distortion Correspondences. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; pp. 26–33.
- Alginahi, Y.M. A survey on Arabic character segmentation. Int. J. Doc. Anal. Recognit. 2013, 16, 105–126.
- Bissacco, A.; Cummins, M.; Netzer, Y.; Neven, H. PhotoOCR: Reading Text in Uncontrolled Conditions. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 785–792. Available online: http://dblp.uni-trier.de/db/conf/iccv/iccv2013.html#BissaccoCNN13 (accessed on 23 November 2016).
- Yu, C.; Song, Y.; Zhang, Y. Scene text localization using edge analysis and feature pool. Neurocomputing 2016, 175, 652–661.
- Neumann, L.; Matas, J. Efficient Scene text localization and recognition with local character refinement. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 746–750.
- Shekar, B.H. Skeleton Matching based approach for Text Localization in Scene Images. In Proceedings of the 8th International Conference on Image and Signal Processing; Elsevier: New York City, NY, USA, 2014; pp. 145–153. ISBN 9789351072522.
- Liu, X.; Wang, W. An effective graph-cut scene text localization with embedded text segmentation. Multimed. Tools Appl. 2015, 74, 4891–4906.
- Gomez, L.; Nicolaou, A.; Karatzas, D. Improving patch-based scene text script identification with ensembles of conjoined networks. Pattern Recognit. 2017, 67, 85–96.
- Raja, S.D.M.; Shanmugam, A. Wavelet Features Based War Scene Classification using Artificial Neural Networks. Scene Classification; Haar and Daubechies Wavelet. 2013. Available online: http://www.enggjournals.com/ijcse/doc/IJCSE10-02-09-104.pdf (accessed on 28 July 2016).
- Busta, M.; Neumann, L.; Matas, J. FASText: Efficient Unconstrained Scene Text Detector. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2016; pp. 1206–1214. ISBN 978-1-4673-8391-2.
- Veit, A.; Matera, T.; Neumann, L.; Matas, J.; Belongie, S. COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images. arXiv, 2016; arXiv:1601.07140.
- Naz, S.; Hayat, K.; Razzak, M.I.; Anwar, M.W.; Madani, S.A.; Khan, S.U. The optical character recognition of Urdu-like cursive scripts. Pattern Recognit. 2014, 47, 1229–1248.
- Parwej, F. An Empirical Evaluation of Off-line Arabic Handwriting And Printed Characters Recognition System. Int. J. Comput. Sci. Issues 2012, 9, 29–35.
- Tounsi, M.; Moalla, I.; Alimi, A.M.; Lebourgeois, F. Arabic characters recognition in natural scenes using sparse coding for feature representations. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 1036–1040.
- Ben Halima, M.; Karray, H.; Alimi, A.M. Arabic Text Recognition in Video Sequences. Int. J. Comput. Linguist. Res. 2013. Available online: https://arxiv.org/abs/1308.3243 (accessed on 20 July 2016).
- Sharma, N.; Mandal, R.; Sharma, R.; Pal, U.; Blumenstein, M. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR), Nancy, France, 23–26 August 2015; IEEE Computer Society: Washington, DC, USA, 2015; ISBN 978-1-4799-1805-8. Available online: http://dblp.uni-trier.de/db/conf/icdar/icdar2015.html#SharmaMSPB15 (accessed on 14 April 2016).
- Shivakumara, P.; Sreedhar, R.P.; Phan, T.Q.; Lu, S.; Tan, C.L. Multioriented Video Scene Text Detection Through Bayesian Classification and Boundary Growing. IEEE Trans. Circuits Syst. 2012, 22, 1227–1235.
- Yi, C.; Yang, X.; Tian, Y. Feature Representations for Scene Text Character Recognition: A Comparative Study. In Proceedings of the 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 25–28 August 2013; pp. 907–911.
- Jung, K.; Kim, K.I.; Jain, A.K. Text information extraction in images and video: a survey. Pattern Recognit. 2004, 37, 977–997.
- Pan, Y.F.; Hou, X.; Liu, C.L. Text Localization in Natural Scene Images Based on Conditional Random Field. In Proceedings of the 10th International Conference on Document Analysis and Recognition, Barcelona, Spain, 26–29 July 2009; pp. 6–10.
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Yao, C.; Bai, X.; Shi, B.; Liu, W. Strokelets: A Learned Multi-scale Representation for Scene Text Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 4042–4049.
- Shi, C.; Wang, C.; Xiao, B.; Zhang, Y.; Gao, S.; Zhang, Z. Scene Text Recognition Using Part-Based Tree-Structured Character Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 2961–2968.
- Liu, Q.; Jung, C.; Kim, S.; Moon, Y.; Kim, J.Y. Stroke Filter for Text Localization in Video Images. In Proceedings of the International Conference on Image Processing, Atlanta, GA, USA, 8–11 October 2006; pp. 1473–1476.
- Tian, S.; Bhattacharya, U.; Lu, S.; Su, B.; Wang, Q.; Wei, X.; Lu, Y.; Tan, C.L. Multilingual scene character recognition with co-occurrence of histogram of oriented gradients. Pattern Recognit. 2016, 51, 125–134.
- Pajdla, T.; Urban, M.; Chum, O.; Matas, J. Robust Wide Baseline Stereo from Maximally Stable Extremal Regions. Image Vis. Comput. 2004, 22, 761–767.
- Chen, H.; Tsai, S.S.; Schroth, G.; Chen, D.M.; Grzeszczuk, R.; Girod, B. Robust text detection in natural images with edge-enhanced Maximally Stable Extremal Regions. In Proceedings of the 18th IEEE International Conference on Image Processing, Brussels, Belgium, 11–14 September 2011; pp. 2609–2612.
- Gomez, L.; Karatzas, D. Multi-script Text Extraction from Natural Scenes. In Proceedings of the 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 25–28 August 2013; pp. 467–471.
- Yalniz, I.Z.; Gray, D.; Manmatha, R. Adaptive exploration of text regions in natural scene image. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR), Washington, DC, USA, 25–28 August 2013.
- Yin, X.C.; Yin, X.; Huang, K.; Hao, H.W. Robust Text Detection in Natural Scene Images. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 970–983.
- Zarechensky, M. Text detection in natural scenes with multilingual text. In Proceedings of the Tenth Spring Researcher’s Colloquium on Database and Information Systems, Veliky Novgorod, Russia, 30–31 May 2014.
- Serra, J. Toggle mappings. In From Pixels to Features; Elsevier: North Holland, The Netherlands, 1989; pp. 61–72.
- Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded Up Robust Features. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417.
- Calonder, M.; Lepetit, V.; Ozuysal, M.; Trzcinski, T.; Strecha, C.; Fua, P. BRIEF: Computing a Local Binary Descriptor Very Fast. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1281–1298.
- Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the IEEE International Conference on Computer Vision ICCV 2011, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
- Liu, L.; Wang, L.; Liu, X. In defense of soft-assignment coding. In Proceedings of the IEEE International Conference on Computer Vision ICCV 2011, Barcelona, Spain, 6–13 November 2011; pp. 2486–2493.
- De Campos, T.E.; Babu, B.R.; Varma, M. Character Recognition in Natural Images. In Proceedings of the Fourth International Conference on Computer Vision Theory and Applications, Lisbon, Portugal, 5–8 February 2009; Volume 2, pp. 273–280.
- Lucas, S.M.; Panaretos, A.; Sosa, L.; Tang, A.; Wong, S.; Young, R. ICDAR 2003 Robust Reading Competitions; IEEE: Piscataway, NJ, USA, 2003; pp. 682–687.
- Olshausen, B.A.; Field, D.J. Emergence of simple-cell receptive-field properties by learning a sparse code for natural images. Nature 1996, 381, 607–609.
- Lazebnik, S.; Schmid, C.; Ponce, J. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; pp. 2169–2178.
- Zhang, Z.; Jin, L.; Ding, K.; Gao, X. Character-SIFT: A Novel Feature for Offline Handwritten Chinese Character Recognition. In Proceedings of the 10th International Conference on Document Analysis and Recognition, Barcelona, Spain, 26–29 July 2009; pp. 763–767.
- Wu, T.; Ma, S. Feature extraction by hierarchical overlapped elastic meshing for handwritten Chinese character recognition. In Proceedings of the Seventh International Conference on Document Analysis and Recognition, Edinburgh, UK, 6 August 2003; pp. 529–533.
- Zheng, Q.; Chen, K.; Zhou, Y.; Gu, C.; Guan, H. Text Localization and Recognition in Complex Scenes Using Local Features. In Asian Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6494, pp. 121–132. ISBN 978-3-642-19317-0.
- Gomez, L.; Karatzas, D. A fine-grained approach to scene text script identification. In Proceedings of the 12th IAPR Workshop on Document Analysis Systems (DAS), Santorini, Greece, 11–14 April 2016.
- Coates, A.; Carpenter, B.; Case, C.; Satheesh, S.; Suresh, B.; Wang, T.; Wu, D.J.; Ng, A.Y. Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning. In Proceedings of the International Conference on Document Analysis and Recognition, Beijing, China, 18–21 September 2011; pp. 440–445.
- Muja, M.; Lowe, D.G. Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration. In Proceedings of the International Conference on Computer Vision Theory and Applications, Lisbon, Portugal, 5–8 February 2009; pp. 331–340.
- Wolf, C.; Jolion, J.M. Object count/area graphs for the evaluation of object detection and segmentation algorithms. Int. J. Doc. Anal. Recognit. 2006, 8, 280–296.
- Shi, B.; Bai, X.; Yao, C. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. 2015. Available online: http://dblp.uni-trier.de/db/journals/corr/corr1507.html#ShiBY15 (accessed on 23 December 2016).
- Boiman, O.; Shechtman, E.; Irani, M. In defense of Nearest-Neighbor based image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Lowe, D.G. Object Recognition from Local Scale-Invariant Features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; pp. 1150–1157.
- Lazebnik, S.; Schmid, C.; Ponce, J. A sparse texture representation using local affine regions. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1265–1278.
- Varma, M.; Zisserman, A. Classifying Images of Materials: Achieving Viewpoint and Illumination Independence. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2002.
- Varma, M.; Zisserman, A. Texture classification: are filter banks necessary? In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Madison, WI, USA, 18–20 June 2003; pp. 691–698.
- Vedaldi, A.; Zisserman, A. Efficient Additive Kernels via Explicit Feature Maps. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 480–492.
- Neumann, L.; Matas, J. A Method for Text Localization and Recognition in Real-World Images. In Asian Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2010; pp. 770–783. ISBN 978-3-642-19317-0.
- Eldar, Y.C.; Chan, A.M. An optimal whitening approach to linear multiuser detection. IEEE Trans. Inf. Theory 2003, 49, 2156–2171.
- Hua, X.S.; Wenyin, L.; Zhang, H.J. An automatic performance evaluation protocol for video text detection algorithms. IEEE Trans. Circuits Syst. Video Tech. 2004, 14, 498–507.
- Zhang, Z.; Zhang, C.; Shen, W.; Yao, C.; Liu, W.; Bai, X. Multi-Oriented Text Detection with Fully Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 July 2016; Available online: https://arxiv.org/abs/1604.04018 (accessed on 21 September 2017).
- Zayene, O.; Touj, S.M.; Hennebert, J.; Ingold, R.; Amara, N.E.B. Open Datasets and Tools for Arabic Text Detection and Recognition in News Video Frames. J. Imaging 2018, 4, 32.
- Yousfi, S.; Berrani, S.A.; Garcia, C. ALIF: A dataset for Arabic embedded text recognition in TV broadcast. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; pp. 1221–1225.
- Slimane, F.; Ingold, R.; Kanoun, S.; Alimi, A.M.; Hennebert, J. A New Arabic Printed Text Image Database and Evaluation Protocols. In Proceedings of the 10th International Conference on Document Analysis and Recognition, Barcelona, Spain, 26–29 July 2009; pp. 946–950.
- Ma, J.; Shao, W.; Ye, H.; Wang, L.; Wang, H.; Zheng, Y.; Xue, X. Arbitrary-Oriented Scene Text Detection via Rotation Proposals. IEEE Trans. Multimedia 2018.
- Tian, S.; Yin, X.C.; Su, Y.; Hao, H.W. A Unified Framework for Tracking Based Text Detection and Recognition from Web Videos. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 542–554.
- Liao, M.; Shi, B.; Bai, X. TextBoxes++: A Single-Shot Oriented Scene Text Detector. IEEE Trans. Image Process. 2018, 27, 3676–3690.
- Tang, Y.; Wu, X. Scene Text Detection and Segmentation Based on Cascaded Convolution Neural Networks. IEEE Trans. Image Process. 2017, 26, 1509–1520.
- Ren, X.; Zhou, Y.; Huang, Z.; Sun, J.; Yang, X.; Chen, K. A Novel Text Structure Feature Extractor for Chinese Scene Text Detection and Recognition. IEEE Access 2017, 5, 3193–3204.
- Ahmed, S.B.; Naz, S.; Swati, S.; Razzak, M.I. Handwritten Urdu Character Recognition using 1-Dimensional BLSTM Classifier. Neural Comput. Appl. 2017, 30, 1–9.
- Jiang, Y.; Zhu, X.; Wang, X.; Yang, S.; Li, W.; Wang, H.; Fu, P.; Luo, Z. R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection. arXiv, 2017; arXiv:1706.09579.
- Tang, Y.; Wu, X. Scene Text Detection Using Superpixel-Based Stroke Feature Transform and Deep Learning Based Region Classification. IEEE Trans. Multimedia 2018, 20, 2276–2288.
- Tounsi, M.; Moalla, I.; Alimi, A.M. ARASTI: A database for Arabic scene text recognition. In Proceedings of the 1st International Workshop on Arabic Script Analysis and Recognition (ASAR), Nancy, France, 3–5 April 2017; pp. 140–144.

Study | Feature Extraction Approach | Script | Dataset Used |
---|---|---|---|
Newell et al. [10] | Histogram of Oriented Gradients (HOG) | Latin | Char74k, ICDAR03-CH |
Neumann et al. [14] | Stroke orientation | Latin | ICDAR2011 |
Tounsi et al. [31] | SIFT | Arabic, Latin | ARASTEC |
Yi et al. [35] | Global and local HOG | Latin | Char74k, ICDAR2003 |
Tian et al. [42] | HOG | Chinese and Bengali | IIIT-5k word, Pan Chinese, ISI Bengali Characters |
Wu et al. [59] | Minimum Euclidean distance, SIFT | Chinese | ETL9B |
Zheng et al. [60] | SIFT | Chinese, Japanese, Korean | Datasets A, B, C, D, E (own compiled) |
Campos et al. [54] | Geometric blur, shape context, SIFT, patches, spin images, Maximum Response (MR8) | Latin, Kannada | Own compiled |
Gomez et al. [61] | Convolutional neural network and K-means | Multilingual | CVSI, MLe2e dataset |
Mao et al. [11] | SIFT | Latin, multilingual | ICDAR, SWG, MSRG |
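
HOG appears in several rows of the table above. The following is a simplified sketch of its core step, an unsigned gradient-orientation histogram over a single cell. It is an illustration under stated assumptions: the function name `orientation_histogram`, the 9-bin layout, hard bin assignment and per-cell L2 normalization are choices made here, whereas full HOG implementations typically interpolate between bins and normalize over overlapping blocks.

```python
import math


def orientation_histogram(cell, bins=9):
    """Unsigned gradient-orientation histogram for one grey-level cell.

    cell: list of rows of grey values. Border pixels are skipped so that
    central differences are always defined.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            # Central differences approximate the horizontal and vertical gradient.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    # L2-normalize so the descriptor is less sensitive to contrast.
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# A vertical edge: intensity changes only along x, so gradients point at 0 degrees.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
hist = orientation_histogram(vertical_edge)
```

For the vertical edge all gradient energy falls into the first bin. A descriptor for a whole character image would concatenate such histograms over a grid of cells.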

Study | Classifier Techniques | Script | Database |
---|---|---|---|
Newell et al. [10] | Not reported | Latin | Char74k, ICDAR2003-Characters |
Tounsi et al. and Yi et al. [31,35] | SVM | Arabic, Latin | ARASTEC, Char74k, ICDAR2003 |
Neumann et al. [14] | Nearest neighbour | Latin | ICDAR2011 |
Gomez et al. [61] | Convolutional neural network and K-means | Multilingual | CVSI, MLe2e dataset |
Campos et al. [54] | Nearest neighbour, SVM and kernel learning | Latin, Kannada | Own compiled |
Mao et al. [11] | Neural network | Latin, multilingual | ICDAR, SWT, MSRG |
Wu et al. [59] | Minimum Euclidean distance classifier | Chinese | ETL9B |
Zheng et al. [60] | Not reported | Chinese, Japanese, Korean | Datasets A, B, C, D, E (own compiled) |
Zhang et al. [75] | Fully-convolutional network | Chinese | MSRA-TD500, ICDAR2015 and ICDAR2013 |
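
The table above lists nearest-neighbour and minimum Euclidean distance classifiers among the techniques used. A minimal sketch of that decision rule follows, assuming each character image has already been reduced to a fixed-length feature vector. The function name `nearest_neighbour`, the two-dimensional toy features and the labels are hypothetical and do not come from any cited work.

```python
import math


def nearest_neighbour(sample, training_set):
    """Return the label of the training vector closest to `sample`.

    training_set: list of (feature_vector, label) pairs with vectors of
    the same length as `sample`.
    """
    best_label, best_distance = None, math.inf
    for features, label in training_set:
        distance = math.dist(sample, features)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Toy training data: one made-up feature vector per character class.
training_set = [
    ([1.0, 0.0], "alef"),
    ([0.0, 1.0], "beh"),
]
predicted = nearest_neighbour([0.9, 0.2], training_set)
```

The rule needs no training phase, but every query is compared against all stored vectors, which may become a limiting factor on large datasets.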
Language | Text Lines | Words | Characters |
---|---|---|---|
Arabic | 8915 | 2593 | 12,000 |
English | 2601 | 5172 | 7390 |

Study | Script | Phases | Database |
---|---|---|---|
Ma et al. [79] (2018) | Latin | Scene text detection | MSRA-TD500, ICDAR2013, ICDAR2015 |
Tang et al. [86] (2018) | Latin | Feature extraction | ICDAR2011, ICDAR2013, SVT |
Tian et al. [80] (2018) | Latin | Text detection from video images | USTB-VidTEXT |
Liao et al. [81] (2018) | Latin | Text detection | ICDAR2015, ICDAR2013, COCO-Text images, and SVT dataset |
Tang et al. [82] (2017) | Latin | Feature extraction | 7390 |
Ren et al. [83] (2017) | Chinese | Feature extraction | Own compiled dataset |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Ahmed, S.B.; Naz, S.; Razzak, M.I.; Yusof, R. Arabic Cursive Text Recognition from Natural Scene Images. Appl. Sci. 2019, 9, 236. https://doi.org/10.3390/app9020236