*4.2. Dataset Collection*

In any machine learning technique, a dataset plays an important role in training and obtaining a better machine learning model. In particular, deep learning methods are more data-hungry than traditional machine learning algorithms. However, preparing a large dataset was a challenging task specifically for under-resourced languages. In this paper, we use a syntactically generated scene text dataset, and real scene text dataset for training and testing the proposed model, respectively. Following [12], a bilingual scene text dataset is prepared. A detailed description of synthetic dataset generation and real scene text dataset preparation is provided in the following sections.
