**3. Methodology**

This section presents the proposed bilingual scene text-reading model. The model, whose architecture is shown in Figure 1, is trained end-to-end to detect and recognize words in a natural image concurrently.
