*2.1. Scene Text Detection*

Both traditional machine-learning and deep-learning methods have been used to detect text in natural images. In [1,3,23–25], scene text detection methods were presented to detect and bound text regions in a natural image, but the traditional approaches rely on hand-crafted features and costly manual computation. Lee et al. [25] presented sliding-window-based methods, which shift a window over the image and score each candidate region for text likelihood using local image features. In [26,27], connected-component analysis methods were presented to detect scene text using the Stroke Width Transform (SWT) and Maximally Stable Extremal Regions (MSER), respectively. However, these approaches struggle to detect text regions in distorted images.
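The sliding-window idea can be sketched as follows. This is a minimal illustrative toy, not the method of [25]: it scores fixed-size windows by the density of strong horizontal intensity transitions (a crude stand-in for the trained text/non-text classifier a real detector would use), and all names, window sizes, and thresholds below are illustrative assumptions.

```python
import numpy as np

def sliding_window_scores(image, win=16, stride=8, threshold=0.15):
    """Score fixed-size windows by local gradient density.

    Text regions tend to contain dense intensity transitions, so the
    fraction of strong horizontal gradients inside a window serves as
    a crude "text-ness" proxy (illustrative only; real sliding-window
    detectors apply a trained classifier at each window position).
    """
    img = image.astype(np.float32)
    grad = np.abs(np.diff(img, axis=1))   # horizontal gradient magnitude
    edges = grad > 30.0                   # strong transitions
    boxes = []
    h, w = edges.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            score = edges[y:y + win, x:x + win].mean()
            if score > threshold:
                boxes.append((x, y, win, win, float(score)))
    return boxes

# Synthetic image: flat background plus a high-contrast striped patch
# standing in for text strokes.
img = np.zeros((64, 64))
img[20:36, 8:40:4] = 255  # vertical stripes -> dense gradients
boxes = sliding_window_scores(img)
```

The quadratic cost of scanning every window position at every scale is exactly the manual-computation burden noted above, which motivated the move to learned, fully convolutional detectors.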

Recently, deep-learning techniques have improved results on several machine-learning problems, including scene text detection and recognition. Tian et al. [1] presented the Connectionist Text Proposal Network (CTPN), which uses a vertical anchor mechanism to jointly predict the location and text/non-text score of each fixed-width proposal. Shi et al. [14] introduced Segment Linking (SegLink), an oriented scene text detection method that first detects text segments and then links them into complete instances via link predictions. Ma et al. [28] presented a novel rotation-based framework for detecting arbitrarily oriented text in natural images, built on a rotation region proposal network (RPN) and rotation RoI pooling. A deep direct regression-based method for detecting multi-oriented scene text was presented in [29]. The Efficient and Accurate Scene Text detector (EAST) [5] was introduced to detect words or text lines effectively with a single neural network.
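CTPN's vertical anchor mechanism can be illustrated with a short sketch. Assuming a feature map at stride 16, every anchor shares a fixed width of one feature-map column while heights vary over a geometric series, so the network only regresses vertical position and height. The helper below is a hypothetical illustration of that anchor layout, not the reference implementation, and the exact height series is an assumption modeled on the paper's setup.

```python
def vertical_anchors(feat_cols, feat_rows, stride=16, k=10,
                     base_height=11.0, ratio=1.4):
    """Return (cx, cy, w, h) anchors for a feat_rows x feat_cols map.

    All anchors at a feature-map cell share the same fixed width
    (one column, i.e. `stride` pixels) and centre; only the height
    varies over a geometric series of k values.
    """
    heights = [base_height * ratio ** i for i in range(k)]
    anchors = []
    for r in range(feat_rows):
        for c in range(feat_cols):
            cx = c * stride + stride / 2  # anchor centre in image coords
            cy = r * stride + stride / 2
            for h in heights:
                anchors.append((cx, cy, float(stride), h))
    return anchors

# A 2 x 4 feature map yields 2 * 4 * 10 = 80 anchors, all 16 px wide.
anchors = vertical_anchors(feat_cols=4, feat_rows=2)
```

Fixing the width sidesteps the difficulty of regressing arbitrarily long horizontal text in one shot: long lines emerge by chaining many narrow, vertically localized proposals.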
