4.1.2. Creating an Annotated Dataset for Numeral Spotting

To use the dhSegment toolbox [17] for page segmentation, we created an annotated dataset with two classes. The first class is the background, that is, every area outside the numeral regions; we marked it in black. The second class is the numeral region, which we marked in green. Using these labels, we annotated 50 register pages from the Nicaea district, which together contain approximately 5000 numerals. A sample original image and its annotated version are presented in Figure 4.

**Figure 4.** Left: the red-filtered register image. Right: the numerals marked and annotated for training the CNN model.
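The two-class labeling described above can be illustrated with a minimal sketch. It is not the annotation procedure used for the registers: the function names, the rectangular `numeral_boxes` input, and the exact green value (0, 255, 0) are assumptions made for illustration. The text only states that the background is black and the numeral regions are green. The helper that lists one RGB triple per class reflects our understanding of how color-coded label images are typically described to a segmentation toolbox and should be checked against the dhSegment documentation.

```python
import numpy as np

# Assumed RGB codes: the text specifies "black" and "green" only.
BACKGROUND = (0, 0, 0)
NUMERAL = (0, 255, 0)


def make_label_mask(height, width, numeral_boxes):
    """Build an RGB label image for one page.

    numeral_boxes is a hypothetical list of (x0, y0, x1, y1) pixel
    rectangles, with x1 and y1 exclusive, one per annotated numeral.
    """
    # Start with every pixel assigned to the background class (black).
    mask = np.zeros((height, width, 3), dtype=np.uint8)
    for x0, y0, x1, y1 in numeral_boxes:
        # Paint the numeral region in the second class color (green).
        mask[y0:y1, x0:x1] = NUMERAL
    return mask


def class_colors_text():
    """Return one 'R G B' line per class, background first."""
    lines = [" ".join(str(c) for c in color) for color in (BACKGROUND, NUMERAL)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    page_mask = make_label_mask(10, 20, [(2, 3, 5, 6)])
    print(page_mask.shape)
    print(class_colors_text(), end="")
```

In practice the numeral regions would come from manual annotation rather than a hard-coded list, and each mask would be saved with the same dimensions as its page image.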
