4.2.1. Synthetic Scene Text Dataset

To train the proposed model, we use a bilingual scene text dataset generated with a slightly modified version of the scene text generation technique presented in [12]. The generated images resemble real scene images. Synthetic generation is particularly important for scripts that lack prepared real scene text datasets; as far as we know, no such dataset exists for the Ethiopic script. Moreover, most text found in natural images is written in two languages (Amharic and English). We therefore generate 500,000 bilingual training images from 54,735 words (825,080 characters) collected from social, political, and governmental websites written in Amharic and English. During generation, we vary the font (72 freely available Ethiopic Unicode fonts), the background image, the font size, the rotation along the horizontal line, the skew, and the thickness. Sample generated scene images and the statistics of the generated dataset are presented in Figure 2 and Table 2, respectively.

**Figure 2.** Sample of synthetically generated scene text images.
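
The parameter variation described above can be illustrated with a minimal sketch. It is not the generator of [12] or our modified version of it; it only shows how one rendering configuration per image might be drawn at random. The names `RenderSpec` and `sample_specs`, the font and background file names, and all numeric ranges are hypothetical and chosen for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderSpec:
    """One rendering configuration for a single synthetic image (illustrative)."""
    word: str
    font: str
    background: str
    font_size: int
    rotation_deg: float
    skew_deg: float
    thickness: int


def sample_specs(words, fonts, backgrounds, n, seed=0):
    """Draw n random rendering configurations.

    All ranges below are assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    specs = []
    for _ in range(n):
        specs.append(
            RenderSpec(
                word=rng.choice(words),
                font=rng.choice(fonts),
                background=rng.choice(backgrounds),
                font_size=rng.randint(16, 64),         # assumed range, in pixels
                rotation_deg=rng.uniform(-15.0, 15.0),  # assumed rotation range
                skew_deg=rng.uniform(-10.0, 10.0),      # assumed skew range
                thickness=rng.randint(0, 2),            # assumed extra stroke width
            )
        )
    return specs


# Hypothetical inputs: a tiny bilingual vocabulary and placeholder file names.
words = ["ሰላም", "ኢትዮጵያ", "hello", "Ethiopia"]
fonts = [f"font_{i:02d}.ttf" for i in range(72)]
backgrounds = [f"background_{i:03d}.jpg" for i in range(10)]

specs = sample_specs(words, fonts, backgrounds, n=5)
for spec in specs:
    print(spec)
```

Each sampled configuration would then be passed to a text renderer that draws the word onto the chosen background; that rendering step is omitted here because it depends on the generator in [12].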
