Offline Mongolian Handwriting Identification Based on Convolutional Neural Network

Sun, Yuxin; Fan, Daoerji; Wu, Huijuan; Wang, Zhixin; Tian, Jia

doi:10.3390/electronics13010111


by Yuxin Sun, Daoerji Fan *, Huijuan Wu, Zhixin Wang and Jia Tian

College of Electronic Information Engineering, Inner Mongolia University, Hohhot 010021, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(1), 111; https://doi.org/10.3390/electronics13010111

Submission received: 17 November 2023 / Revised: 24 December 2023 / Accepted: 24 December 2023 / Published: 27 December 2023


Abstract:

Handwriting is a behavioral biometric with evident individual distinctiveness. With the rise of deep learning and growing demand for forensic identification, handwriting identification has become a focal point of research in pattern recognition. Research on handwriting identification for major global languages has matured. In China, however, writer identification for minority languages such as Mongolian has received limited attention, which makes criminal cases involving handwriting difficult to resolve. This paper presents an initial exploration of Mongolian handwriting identification by constructing a structurally simple convolutional neural network. The network, which consists of 12 convolution operations and is designed for Mongolian handwriting identification, is referred to as MWInet-12. The model evaluation experiments use 156,372 samples contributed by 125 writers from the MOLHW dataset, divided into training, validation, and test sets in an 8:1:1 ratio. On the test set, the model achieves a top-1 accuracy of 89.60% and a top-5 accuracy of 97.53%. Furthermore, comparative experiments with the Resnet50, Fragnet, GRRNN, VGG16, and VGG19 models show that the proposed model yields the best results among them for Mongolian handwriting identification. The exploratory research in this paper contributes to increasing awareness of information processing for minority languages. It aids in advancing research on classifying writers of Mongolian historical texts and provides technical support for judicial authentication involving handwriting.

Keywords: offline handwriting identification; Mongolian; CNN; MOLHW

1. Introduction

Handwriting identification is a technique to determine who the writer of a given sample is by studying the specific characteristics of a person’s handwriting. Handwriting identification technology has widespread applications in various fields, including the judicial, financial, and security authentication sectors. This technology can address issues such as disputes related to handwritten promises or wills, verify document signatures, and conduct identity authentication through handwritten signatures [1].

Automatic handwriting identification for popular global languages such as Chinese, English, and Arabic has now reached a high level of maturity [2]. However, there is a notable lack of research in the field of handwriting identification for Mongolian, a language spoken by over 4 million people. In the Inner Mongolia Autonomous Region of China, communication among Mongolian people is primarily conducted in the Mongolian language. There have been several cases where disputes arising from handwritten Mongolian scripts proved challenging to resolve, as there were inadequate standards for Mongolian handwriting identification. Despite the introduction of an industry standard draft titled “Forensic Mongolian Handwriting Examination” in recent years, research in Mongolian handwriting identification continues to face significant challenges.

Different languages have their own handwriting characteristics and pose different research difficulties [2]. Mongolian is written vertically from top to bottom, and the characters within a word are closely connected; sometimes there is even a gap between the last character of a word and the one before it. A character takes different forms depending on its position within the word. The order of writing on each page is from top to bottom, left to right. Figure 1 shows a sample of Mongolian writing.

Manual handwriting identification for Mongolian has distinct limitations compared to other languages. Similar strokes and writer-specific stroke simplifications make the task challenging even for human experts with a strong foundation in the Mongolian language. The manual process is time-consuming, labor-intensive, and impractical for large-scale datasets, and it struggles with forgery and deception. Manual methods may therefore be impractical in certain scenarios, which motivates more feasible automated approaches based on computer vision and machine learning.

This paper presents preliminary experiments in Mongolian handwriting identification using deep learning, aiming to address the existing gap in this domain. The study focuses on offline handwriting identification, which is the more common setting because it allows existing handwritten image data to be analyzed. The application of this technology is expected to contribute significantly to the fight against criminal activities and the maintenance of social security and stability in the Inner Mongolia region.

In the realm of automated Mongolian handwriting identification, the model needs to be adapted to the linguistic features of the Mongolian language as well as to its different arrangements. The variations in writing style and the omission of strokes in Mongolian make recognizing and comparing different handwriting more complex than in other languages. The tightly connected nature of the characters makes it challenging to recognize the connections between letters. In addition, the similarity of the shapes of some letters makes it easy to misidentify them.

In response to the aforementioned issues, this paper explores the use of a convolutional neural network for intelligent Mongolian handwriting identification. Convolutional neural networks have achieved excellent performance in image classification. When performing handwriting identification, the convolutional layers can learn detailed features of the image, such as the shape and angle of the strokes, the connections between neighboring strokes, and variations between light and heavy strokes. As convolutional layers are added and the network becomes deeper, its receptive field grows, so it can capture richer handwriting style information. The features extracted by the convolutional layers are passed through max-pooling layers to retain the key features and reduce the amount of computation. Finally, the extracted features are mapped to classification labels through a fully connected layer, which allows the model to combine the relationships between different features in order to make a final classification decision.

This paper is organized as follows: Section 2 describes the work related to handwriting identification. Section 3 presents the research methodology used in this paper. Section 4 conducts specific experiments and analyzes the results. Finally, Section 5 summarizes the whole paper and presents the outlook.

2. Related Work

Commonly used datasets for offline handwriting identification research include the CEDAR [3] and IAM [4] datasets for English; KHATT [5], AHDB [6], and Al Isra [7] for Arabic; and WDB [8] and HCL2000 [9] for Chinese. In 2017, Gloria Jennis Tan and colleagues summarized research on handwriting identification in these three major languages [2]. Other datasets commonly used in the field of writer identification include the Firemaker dataset [10], ICDAR2013 [11], CVL [12], and QUWI [13].

For Mongolian script, the publicly available handwriting datasets include the MHW offline handwriting dataset [14] and the online handwriting datasets MOLHW [15] and MRG-OHMW [16]. The MHW offline dataset does not contain writer information, whereas the online dataset MOLHW does. On the MOLHW dataset, ref. [17] has carried out work on Mongolian handwriting recognition but not on writer identification. This paper converts the MOLHW data into corresponding offline images for offline research.

The emergence of deep neural networks has divided the evolution of handwriting identification research into two phases: those relying on manual feature extraction and those harnessing deep learning features. Methods based on manual feature extraction include texture, shape, and contour analysis. In [18], Behzad Helli and colleagues introduced a novel texture-based approach employing XGabor filter feature extraction and the LCS classifier for Persian writer identification. In [19], Djeddi Chawki and colleagues proposed a global texture analysis method, treating each writer’s handwriting as distinct textures, successfully identifying and verifying 130 handwritten images from 650 different Arabic writers. In [20], Fathi H utilized a discrete contour transformation approach with MLP neural network classification to distinguish 50 Arabic writers. In [21], Tayeb Bahram treated contours as textures, computing the joint probability distribution of binary patterns (MLBP) and ink width and shape letter (IWSL) at different pixel locations, achieving excellent results across eight prominent handwriting datasets.

Currently, the most popular approach for handwriting identification is the utilization of deep learning. In 2015, S. Fiel and R. Sablatnig introduced the concept of deep learning into the field by employing an eight-layer convolutional neural network (CNN). They generated a feature vector for each author based on the CNN’s activation features and compared it with precomputed feature vectors stored in a database [22]. Subsequently, deep-learning-based methodologies continued to evolve. In 2016, a deep multi-stream CNN was proposed, where handwritten patches were processed and softmax classification was used for writer-independent identification. It was discovered that different languages might share common features and could undergo joint training [23]. In 2016, Youbao Tang and colleagues combined features extracted by CNN with joint Bayesian techniques to achieve handwriting identification. The literature utilized the GMM super-vector encoding method in conjunction with SVMs for relatively robust offline writer identification [24]. To fully leverage the deep features learned by the network, Sheng He and others introduced an adaptive convolutional layer, improving the performance of author recognition based on single-character images [25]. In 2020, Parveen Kumar and team presented a segmentation-free and pretrained deep convolutional neural network for offline text-independent handwriting identification, achieving results of 92.79%, 99.35%, and 98.30% on popular datasets like IAM, CVL, and IFN/ENIT, respectively [26].

While handwriting data are typically obtained at the page level, writer identification processes often involve segmenting it into lines or words. Compared to processing at the line or page level, individual characters, though providing fewer overall stylistic cues, offer finer-grained features. In 2017, the Sheng He team used deep adaptive learning to achieve writer identification for single-character images through multi-task learning. They improved the performance of single-character image writer identification on benchmark datasets like CVL and IAM [25]. In 2020, the same team introduced the Fragnet network, which combines features extracted from complete word images and segmented feature maps to extract relevant author-specific information [27]. In 2021, they proposed the Gr-rnn network, which captures complementary information from both global context and local segments for author recognition [28]. In 2022, Vineet Kumar conducted research on offline text-independent author identification at the word level. The SIFT algorithm was used to extract feature points, which were then fed into a convolutional network to generate corresponding feature maps. Weight learning for these feature maps was achieved using an entropy-based method [29].

In 2022, Spurthi Bhat and colleagues conducted exploratory research on handwriting identification in Indian languages. Due to the lack of appropriate databases for Bengali handwriting, they used local binary patterns as texture descriptors and a support vector machine classifier, achieving a result of 93.34% on the Bengali dataset [30]. In 2023, Syed Tufael Nabi and colleagues focused on the less-explored Urdu language for offline handwriting identification. They obtained an impressive 99.11% result using an improved VGG16 model on a dataset of 318 Urdu script writers [31]. However, as of now, no relevant research has been discovered for handwriting identification in Mongolian script. Different languages possess their unique handwriting characteristics, leading to distinct research challenges [2]. There is currently no universally applicable model, and the absence of a standardized dataset makes it challenging to evaluate which model yields the best performance, thus lacking comparability. This paper presents an exploratory study that introduces a convolutional neural network for handwriting identification in Mongolian script.

3. Method

The methods for handwriting identification typically involve three standard steps: preprocessing, feature extraction, and writer identification (classification) [32], as illustrated in Figure 2.

This section introduces the process of offline Mongolian handwriting identification. Section 3.1 presents the original dataset and the preprocessing techniques. Section 3.2 discusses the proposed model and its implementation details. Section 3.3 describes the loss functions utilized in this study.

3.1. Dataset and Preprocessing

The MOLHW dataset encompasses a total of 164,631 Mongolian word samples written by 200 writers and is saved as a text file. Each line of the file holds the basic information for one word: the encoding of the word, the writer ID, the size of the screen, and the coordinate trajectory of the points of the word, with [−1, −1] marking a breakpoint. The number of samples per writer varies widely: the most prolific writer has more than 3000 samples and the least prolific has 1, as shown in Figure 3. Based on this distribution, we remove the writers with fewer than 500 samples, so that 125 writers and 156,372 samples remain; these constitute the dataset used in this paper.
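The writer-filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `(writer_id, payload)` pair layout and the function name `keep_prolific_writers` are assumptions introduced here, and the MOLHW text file would first have to be parsed into such pairs.

```python
from collections import Counter


def keep_prolific_writers(samples, min_samples=500):
    """Drop every sample whose writer contributed fewer than `min_samples` samples.

    `samples` is a list of (writer_id, payload) pairs. This pair layout is an
    assumption made for illustration, not the MOLHW file format itself.
    """
    counts = Counter(writer_id for writer_id, _ in samples)
    kept_writers = {w for w, n in counts.items() if n >= min_samples}
    return [(w, p) for w, p in samples if w in kept_writers]


# Toy data: writer "a" has 3 samples, writer "b" has 1.
toy = [("a", "w1"), ("a", "w2"), ("b", "w3"), ("a", "w4")]
filtered = keep_prolific_writers(toy, min_samples=2)
print(len(filtered), sorted({w for w, _ in filtered}))
```

With the paper's threshold, `min_samples=500` would be applied to the full 200-writer sample list.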

Because MOLHW is an online dataset, it must be transformed before it can be used for offline handwriting identification experiments. In this study, the provided coordinate data are rendered into a handwritten image for each word sample, and the images are saved in a picture format to form the offline dataset needed for this research. The original data show that variations in the length of Mongolian words and the diverse writing habits of individuals result in uneven positioning of the handwriting on the screen. For instance, a short word may occupy only a small portion of the upper-left corner of the screen, while a long word written with large characters might fill the entire screen. Such inconsistencies pose challenges for recognition. To address this, this paper restricts the canvas size to a ratio of 5:12, with units in inches. By adjusting the pixels per inch, the image resolution can be changed without distortion, ensuring a consistent and clear representation of the original handwriting. This method produces images of the same size with clear character traces; Figure 4 shows some successfully transformed image samples.
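One way to picture the online-to-offline conversion is to split the trajectory at the [−1, −1] breakpoints and then scale the strokes onto a fixed canvas with a single scale factor, so that the handwriting is not distorted. The sketch below is hypothetical: the function names, the reading of 5:12 as width:height, the `ppi` value, the margin, and the centring are all assumptions, and it stops short of drawing the image.

```python
def split_strokes(points, pen_up=(-1, -1)):
    """Split a flat list of (x, y) points into strokes at each pen-up marker."""
    strokes, current = [], []
    for p in points:
        if tuple(p) == pen_up:
            if current:
                strokes.append(current)
            current = []
        else:
            current.append(tuple(p))
    if current:
        strokes.append(current)
    return strokes


def fit_to_canvas(strokes, width_in=5, height_in=12, ppi=20, margin=0.05):
    """Scale and centre strokes on a width_in x height_in inch canvas.

    One scale factor is used for both axes, so the aspect ratio of the
    handwriting is preserved. Returns (canvas_w, canvas_h, scaled_strokes),
    all in pixels.
    """
    canvas_w, canvas_h = width_in * ppi, height_in * ppi
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    min_x, min_y = min(xs), min(ys)
    span_x = max(max(xs) - min_x, 1e-9)  # guard against zero-width input
    span_y = max(max(ys) - min_y, 1e-9)  # guard against zero-height input
    scale = min(canvas_w * (1 - 2 * margin) / span_x,
                canvas_h * (1 - 2 * margin) / span_y)
    off_x = (canvas_w - span_x * scale) / 2
    off_y = (canvas_h - span_y * scale) / 2
    scaled = [[((x - min_x) * scale + off_x, (y - min_y) * scale + off_y)
               for x, y in s] for s in strokes]
    return canvas_w, canvas_h, scaled


# Toy trajectory with one breakpoint, i.e. two strokes.
points = [(10, 10), (20, 50), (-1, -1), (30, 10), (30, 90)]
strokes = split_strokes(points)
canvas_w, canvas_h, scaled = fit_to_canvas(strokes)
print(len(strokes), canvas_w, canvas_h)
```

The scaled strokes could then be drawn with any imaging library; raising `ppi` increases resolution while keeping the 5:12 canvas shape.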

3.2. Method

The overall architecture of the model built in this paper is shown in Figure 5. The structure developed in this study to address handwritten writer identification in Mongolian script is abbreviated as MWInet. This architecture is flexible, allowing for the modification of the number of convolutional layers. Through experimentation, it has been found that using 12 convolutional layers yields better results, and this configuration is referred to as MWInet-12. In the case of 8 convolutional layers (by replacing 3Conv with 2Conv in the MWInet-12 architecture), it is labeled as MWInet-8. The following sections will use MWInet-12 as an example to provide an overview of the MWInet structure. The MWInet-12 network model in this paper undergoes a total of 12 convolutional operations, four maximal pooling processes, one global average pooling (GAP) process, and one fully connected layer operation. The detailed structure of 3Conv is shown in the red box on the right half of MWInet-12. In this part, batch normalization and ReLU non-linear activation are applied after each convolution operation, as illustrated in Figure 5 by adding Batch Normalization (BN) layers and a ReLU layer. This enhances the stability and non-linear discriminative capability of the model.

In this network, the convolutional layers are the core component. The shallow convolutional layers extract low-level features of handwriting, such as the shape, thickness, angle, and curvature of the strokes; these are captured by sliding a convolutional kernel over the input image, with the kernel weights adjusted gradually during training. As the number of convolutional layers increases, the model gradually learns more abstract features such as shapes, textures, and patterns. In this model, all convolutional operations use a small 3 × 3 kernel with a stride of 1 and a padding of 1. Three stacked 3 × 3 convolutions have the same receptive field as a single 7 × 7 convolution but require fewer computations and fewer parameters. At the same time, they can extract finer-grained features, with better training efficiency and generalization ability. At the network level, stacking multiple convolutional layers not only increases the depth but also gives the model the capacity to learn more complex features and more abstract representations [33]. All max-pooling layers in the model use a 2 × 2 window with a stride of 2. By reducing the size of the feature maps while retaining the most prominent features of the handwriting image, the pooling layers reduce the computational complexity of the model.
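The comparison between three stacked 3 × 3 convolutions and one 7 × 7 convolution can be checked with a short calculation. The helper names below are introduced here for illustration, and the weight count assumes the same number of input and output channels and ignores biases.

```python
def stacked_receptive_field(num_layers, kernel=3, stride=1):
    """Receptive field of `num_layers` stacked convolutions with equal kernel and stride."""
    rf, jump = 1, 1
    for _ in range(num_layers):
        rf += (kernel - 1) * jump  # each layer widens the field by (kernel - 1) input steps
        jump *= stride             # stride > 1 would spread later layers further apart
    return rf


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) for convolutions with `channels` inputs and outputs."""
    return num_layers * kernel * kernel * channels * channels


print(stacked_receptive_field(3))                    # 7
print(conv_weights(3, 64, 3), conv_weights(7, 64))   # 110592 200704
```

For 64 channels, the three stacked 3 × 3 layers use 27 × 64² weights against 49 × 64² for the single 7 × 7 layer, roughly 45% fewer.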

After the convolutional and pooling layers, the model is able to capture most detailed image features. However, for Mongolian handwriting identification, this is not sufficient. It is essential to grasp the overall style features of the characters comprehensively. For instance, some individuals might intentionally imitate the stroke features of another person’s handwriting but will often overlook their holistic style. Each person’s handwriting style is challenging to replicate. Adding a global average pooling layer can help to capture the global features of the entire image, which aids the model in better interpreting the handwriting information. Furthermore, the global average pooling layer serves to reduce data dimensions and the number of model parameters, thereby mitigating the risk of overfitting and enhancing the model’s robustness [34]. By first focusing on the detailed study of local handwriting features and then integrating the overall global style features, the model’s ability to identify a writer’s handwriting is significantly improved. Finally, the model utilizes a fully connected classification layer to classify and identify writers, ultimately producing predictions.
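The architecture described in this section can be sketched in PyTorch as below. This is an illustrative reconstruction from the text, not the authors' released implementation: the class name `MWInetSketch`, the single input channel, and the default of 125 output classes are assumptions, and the 128 × 256 input is passed in the axis order in which the paper writes it.

```python
import torch
from torch import nn


def conv_block(in_ch, out_ch, num_convs):
    """`num_convs` x (3x3 conv -> BN -> ReLU), followed by 2x2 max pooling."""
    layers = []
    for i in range(num_convs):
        layers += [
            nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                      kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        ]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)


class MWInetSketch(nn.Module):
    """Four conv blocks -> global average pooling -> fully connected classifier."""

    def __init__(self, num_writers=125, convs_per_block=3, in_channels=1):
        super().__init__()
        blocks, prev = [], in_channels  # in_channels=1 is an assumption
        for width in (64, 128, 256, 512):
            blocks.append(conv_block(prev, width, convs_per_block))
            prev = width
        self.features = nn.Sequential(*blocks)
        self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.classifier = nn.Linear(prev, num_writers)

    def forward(self, x):
        x = self.features(x)
        x = self.gap(x).flatten(1)  # (batch, 512)
        return self.classifier(x)   # (batch, num_writers) logits


model = MWInetSketch().eval()
with torch.no_grad():
    logits = model(torch.zeros(2, 1, 128, 256))
print(logits.shape)  # torch.Size([2, 125])
```

Setting `convs_per_block=2` gives the eight-convolution variant that the paper calls MWInet-8.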

3.3. Loss

During training, the model utilizes two loss functions: cross-entropy loss and label smoothing regularization loss.

3.3.1. Cross-Entropy Loss

The cross-entropy loss function is a loss function commonly used in classification tasks. It assesses the model’s performance by calculating the cross-entropy between the predicted probability distribution and the true labels. In the context of multi-class classification problems, the cross-entropy loss function can be expressed as

L = -\sum_{i=1}^{C} y_{i} \log(p_{i})    (1)

where C is the number of classes, y_{i} denotes the i-th element of the true labels (taking values of 0 or 1 to indicate the actual class of the sample), and p_{i} is the predicted probability by the model for the i-th class.
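Because the true label is one-hot, the sum in Equation (1) collapses to a single term. As a short worked step, with c denoting the index of the true class (a symbol introduced here for illustration) and the natural logarithm assumed:

```latex
L = -\sum_{i=1}^{C} y_{i} \log(p_{i}) = -\log(p_{c}),
\qquad \text{since } y_{c} = 1 \text{ and } y_{i} = 0 \text{ for } i \neq c .
```

For example, a confident correct prediction with p_{c} = 0.9 gives L ≈ 0.105, whereas p_{c} = 0.1 gives L ≈ 2.303.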

In some cases, the traditional cross-entropy loss function may have certain drawbacks. For example, in situations where the labels are not entirely reliable or contain noise, the model may make overly confident predictions for a particular class. To address this issue, label smoothing loss can be introduced.

3.3.2. Label Smoothing Loss

Label smoothing regularization introduces a smoothing term when calculating the cross-entropy loss. The introduction of this smoothing term prevents the model from being overly confident in its predictions, instead allocating probabilities more smoothly across different categories. The purpose of label smoothing regularization is to enhance the robustness of the model, reducing sensitivity to label noise in the training data and thereby improving generalization performance on unseen data.

The calculation of label smoothing loss is as follows:

l = -(1 - \epsilon) \log(p_{y}) - \frac{\epsilon}{N} \sum_{n=1}^{N} \log(p_{n})    (2)

where \epsilon is a smoothing parameter, and we take \epsilon = 0.1 [28]. In the formula, N represents the number of authors, p_{y} represents the predicted probability of the model for the true label, and p_{n} represents the predicted probability of the model for the n-th category. The first term in the formula is similar to the traditional cross-entropy loss and it is used to measure the difference between predicted values and true values. The second term introduces a uniform distribution to add some fuzziness to the true labels.
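Equation (2) can be evaluated directly for a single sample. The following sketch uses only the standard library; the function name and the toy probabilities are illustrative, and the natural logarithm is assumed.

```python
import math


def label_smoothing_loss(probs, true_index, eps=0.1):
    """Eq. (2): -(1 - eps) * log(p_y) - (eps / N) * sum_n log(p_n)."""
    n = len(probs)
    nll_true = -math.log(probs[true_index])               # cross-entropy term
    uniform_term = -sum(math.log(p) for p in probs) / n   # uniform-target term
    return (1 - eps) * nll_true + eps * uniform_term


probs = [0.7, 0.2, 0.1]  # toy predicted distribution over N = 3 writers
plain = label_smoothing_loss(probs, true_index=0, eps=0.0)
smooth = label_smoothing_loss(probs, true_index=0, eps=0.1)
print(round(plain, 4), round(smooth, 4))
```

With `eps=0.0` the expression reduces to the cross-entropy of Equation (1); with `eps=0.1` the loss also penalizes very small probabilities on the other classes, which discourages over-confident predictions.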

4. Experiment

To validate the Mongolian handwritten text identification model proposed in this paper, we conducted training and testing using the publicly available Mongolian handwritten dataset MOLHW. We evaluated the experimental model using top-k accuracy; specifically, we employed both top-1 and top-5 accuracy as evaluation metrics in our experiments.
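Top-k accuracy counts a prediction as correct when the true writer is among the k highest-scoring classes. A minimal pure-Python sketch, with an illustrative function name and toy scores:

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # class indices sorted from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


# Three toy samples over three classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

In the toy data the first sample is correct at k = 1, the second only becomes correct at k = 2, and the third needs k = 3, so accuracy can only stay equal or rise as k grows.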

4.1. Environment and Hyperparameter

The basic environment and configuration for this experiment are as follows: The runtime platform is a Linux server with a 64 GB NVIDIA RTX A6000 GPU. The experimental code is written in Python, with version 3.10.10, and is based on the PyTorch framework, with Torch version 1.12.0.

Regarding the dataset allocation during model training, 80% is used as the training set, 10% as the validation set, and 10% as the test set, ensuring that the training set contains data from all authors. To maximize server memory utilization, the batch size is set to 128, and eight threads run in parallel. The optimizer used is the commonly used Adam optimizer with a weight decay parameter of 1 \times 10^{-4}. The initial learning rate is set to 0.0001 [27], and a fixed interval learning rate adjustment mechanism is applied, with the learning rate reduced to half of the previous learning rate every 10 epochs.
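The learning-rate schedule and the 8:1:1 split can be sketched as follows. Splitting within each writer is one simple way to guarantee that the training set contains every writer; whether the authors split this way is an assumption, as are the function names, the fixed seed, and the 0-indexed epoch convention.

```python
import random
from collections import defaultdict


def step_lr(epoch, base_lr=1e-4, step=10, gamma=0.5):
    """Learning rate at a 0-indexed epoch: halved every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


def split_per_writer(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split (writer_id, item) pairs into train/val/test within each writer."""
    by_writer = defaultdict(list)
    for writer_id, item in samples:
        by_writer[writer_id].append((writer_id, item))
    rng = random.Random(seed)
    train, val, test = [], [], []
    for group in by_writer.values():
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Toy data: three writers with ten samples each.
toy = [(w, i) for w in "abc" for i in range(10)]
train, val, test = split_per_writer(toy)
print(len(train), len(val), len(test))
print([step_lr(e) for e in (0, 9, 10, 25)])
```

In PyTorch the same schedule is available as `torch.optim.lr_scheduler.StepLR` with `step_size=10` and `gamma=0.5`.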

4.2. Image Transformation

Prior to being input into the model, the images undergo binary thresholding and are resized to 128 × 256. The model is divided into four segments, each consisting of three convolutional operations and one max-pooling operation. The batch size is 128, and the number of channels in the four segments is 64, 128, 256, and 512, respectively. In the first segment, the first convolution produces 64 feature maps of size 128 × 256, and the remaining convolutions alter neither the spatial size nor the number of channels; the first max-pooling layer then reduces the feature maps to 64 × 128. This process continues, with the second segment producing feature maps of size 32 × 64, the third segment outputting feature maps of size 16 × 32, and the fourth segment resulting in an output of 8 × 16. The specific changes in each segment are outlined in Table 1. In the table, “Conv2d-m-n” represents the nth convolution in the mth three-layer convolution, where m can take values of 1, 2, 3, and 4, and n can take values of 1, 2, and 3. In addition, “k” represents the kernel size, “S” represents the stride, and “P” represents the padding.
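The feature-map sizes quoted above follow from the usual output-size formula, floor((size + 2P − k) / S) + 1. The sketch below traces them layer by layer; the function name and the "MaxPool-m" labels are hypothetical, while the "Conv2d-m-n" labels follow the naming used for Table 1.

```python
def trace_shapes(height=128, width=256, channels=(64, 128, 256, 512),
                 convs_per_segment=3):
    """Return (layer_name, out_channels, out_height, out_width) for every layer."""
    rows = []
    h, w = height, width
    for m, ch in enumerate(channels, start=1):
        for n in range(1, convs_per_segment + 1):
            # k=3, S=1, P=1: the spatial size is unchanged
            h = (h + 2 * 1 - 3) // 1 + 1
            w = (w + 2 * 1 - 3) // 1 + 1
            rows.append((f"Conv2d-{m}-{n}", ch, h, w))
        # 2x2 max pooling with stride 2 halves both dimensions
        h, w = (h - 2) // 2 + 1, (w - 2) // 2 + 1
        rows.append((f"MaxPool-{m}", ch, h, w))
    return rows


rows = trace_shapes()
for name, ch, h, w in rows:
    print(f"{name:<12} {ch:>4} x {h:>3} x {w:>3}")
```

The last row of the trace is 512 channels at 8 × 16, which global average pooling then reduces to a 512-dimensional vector.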

4.3. Experimental Results of Model

During the experimental process, we employed methods such as increasing the network depth, changing the loss function, and segmenting image fragments to classify authors of Mongolian text. We compared the results with relevant convolutional models. Throughout the entire experimental process, the time taken for each epoch during training was consistently recorded and analyzed. It was observed that the time required for each epoch remained relatively consistent. Therefore, for the purpose of data recording in the table, only the time taken for one epoch of training was selected and it is denoted as “Time-Train”, measured in seconds and abbreviated as “s”.

4.3.1. Increasing Network Depth

In this subsection, the chosen loss function is the cross-entropy loss function. Since the network model in this paper can be divided into four convolution-pooling parts with the same composition, the number of convolution layers in each part can be customized. In the experiments, we initially configured each part to contain two convolution layers, two batch normalization operations, two ReLU activation functions, and, finally, a max-pooling layer. The resulting model comprises eight convolution operations and four max-pooling operations and is referred to as MWInet-8. With this model, the top-1 accuracy was 83.28% and the top-5 accuracy was 96.56%. The duration of one epoch’s training was 246 s.

Increasing the network depth can help in extracting more abstract features. In the MWInet-12 network proposed in this paper, each part replaces the two convolution operations of MWInet-8 with three convolution operations. The experimental results indicate an improvement in accuracy for MWInet-12, with a top-1 accuracy of 88.67% and a top-5 accuracy of 97.54%. This represents an increase of about 5.4 percentage points in top-1 accuracy and about 1.0 percentage point in top-5 accuracy compared to MWInet-8. The increased depth of the network also extends the running time: the duration of one epoch of training in this experiment is 383 s.

However, blindly increasing the network depth does not necessarily yield ideal results. Experimental results show that with four convolution operations in each part, the recognition performance did not improve, stability decreased, and the number of model parameters increased, consuming more GPU memory. Therefore, MWInet-12 is taken as the optimal configuration of this model.

4.3.2. Adding Label Smoothing Regularization

In this subsection, in contrast to the previous one, we applied label smoothing to the cross-entropy loss function using the label-smoothed regularization loss described in Section 3.3.2. The running time for one epoch did not change significantly. The accuracy of both the MWInet-8 and MWInet-12 models improved to varying degrees with label smoothing regularization, as shown in Table 2. For MWInet-8, the top-1 accuracy was 86.43% and the top-5 accuracy was 96.66%. For MWInet-12, the top-1 accuracy reached 89.60% and the top-5 accuracy was 97.53%. Both networks thus showed a noticeable increase in top-1 accuracy, and the shallower MWInet-8 improved more (from 83.28% to 86.43%) than the deeper MWInet-12 (from 88.67% to 89.60%). The top-5 accuracy remained almost unchanged, indicating that the models had already reached a relatively stable level on this metric.

All the results are as shown in Table 2.

4.3.3. Experiments in Different Models

In order to verify the advantages of this paper’s model in Mongolian handwriting identification, the model proposed in this paper is compared with other models in the field of handwriting identification. The label-smoothing regularized loss function is used in all comparison experiments. The experimental results are shown in Table 3. The test results for Resnet50 are 81.44% for top-1 accuracy and 94.75% for top-5 accuracy, while in the Fragnet network [27] experiment, top-1 accuracy only reached 76.5% and top-5 accuracy was 91%. Testing on the GRRNN [28] model resulted in 82.71% for top-1 accuracy and 94.37% for top-5 accuracy. Additionally, as our architecture shares similarities with VGG, we conducted comparative experiments with VGG16 and VGG19. By applying transfer learning and fine-tuning the final classification layer, we achieved a top-1 accuracy of 83.69% and top-5 accuracy of 94.8% for VGG16, and a top-1 accuracy of 85.55% and top-5 accuracy of 96.54% for VGG19. Compared to the above five models, the MWInet-12 proposed in this paper has the highest accuracy for both top-1 accuracy and top-5 accuracy. The table not only summarizes the top-1 accuracy and top-5 accuracy for each model experiment but also provides the training time for one epoch in each experiment.

To provide a more comprehensive representation of the models’ performance and training process, in Figure 6, we present the loss–epoch curves for each model, including both training and validation losses. As shown in Figure 6, the proposed MWInet-12 model exhibits significantly lower loss compared to the other models.

4.4. Discussion

From the perspective of the entire process of handwriting identification for Mongolian text, the research results of this paper can be analyzed as follows. The experimental dataset in this paper consists of images generated directly from online data. Compared to offline images collected directly, the noise interference is relatively low. This means that the quality of the dataset itself is high. Additionally, during the preprocessing of the images, efforts were made to ensure that the content of the handwriting occupies a reasonable proportion of the entire image, which improved the model’s performance. MWInet-12 utilizes 12 layers of convolution operations, and deeper networks can learn the specific features of Mongolian text more thoroughly. Adding global average pooling to retain crucial global information and performing classification recognition through fully connected layers allows the model to consider the relationships between different features, resulting in better classification capability.

The experiments in this paper began with a shallow model and compared the results after increasing network depth, using label-smoothed regularization loss. In all these cases, the top-1 accuracy improved to varying degrees, while the top-5 accuracy remained relatively stable. Although the runtime of MWInet-12 has increased compared to MWInet-8, it has achieved a significant improvement in accuracy and has also demonstrated increased stability. In conclusion, the MWInet-12 model demonstrates feasibility and delivers promising results for handwriting identification in Mongolian text.

However, there are some limitations to this research. The lack of experiments on actual offline Mongolian handwriting due to dataset limitations, the absence of an in-depth exploration of hyperparameter optimization methods, and the relatively simple model structure provide areas for further investigation. The next steps will involve collecting and creating an offline Mongolian text dataset and continuing exploration in this domain. Additionally, the dataset used in this study may still be relatively small. In future experiments, optimization of the Mongolian handwriting identification method can be achieved through techniques such as transfer learning and data augmentation.

5. Conclusions

Due to the absence of research on automated handwriting identification for Mongolian script, real-world disputes arising from handwriting issues pose challenges for the judicial system, and manual identification presents difficulties of its own. To meet this practical need, this paper explores automated handwriting identification for traditional Mongolian script and achieves favorable results with a straightforward convolutional neural network, MWInet-12. First, the MOLHW online Mongolian script dataset was converted into offline images based on its coordinates. Following the distribution of samples per writer, 125 writers with a total of 156,372 samples were retained. These images were divided into training, validation, and test sets in an 8:1:1 ratio. The images are input into the MWInet-12 network, which applies a total of 12 convolution operations. Each convolution operation uses a small convolution kernel, and the network concludes with global average pooling. After 100 rounds of model training, the model achieves a single-character identification accuracy of 89.60% for Mongolian script. This study compared two networks of different depths, MWInet-8 and MWInet-12, and experimented with the label-smoothing regularized loss. The results indicated that the MWInet-12 model with label-smoothing regularization achieved the best performance. Additionally, the best result from this experiment was compared with five other recent network models, namely Resnet50, Fragnet, GRRNN, VGG16, and VGG19, which achieved top-1 accuracies of 81.44%, 72.94%, 82.71%, 83.69%, and 85.55%, respectively. In summary, the MWInet-12 network demonstrates promising potential for traditional Mongolian script handwriting identification.

Despite a certain degree of improvement in accuracy, we have identified some limitations in our research. Future work should further optimize the model and improve recognition efficiency. Strategies such as introducing attention mechanisms, utilizing transfer learning, and incorporating offline handwriting data samples should be explored to address the challenges of Mongolian script handwriting in the judicial domain. Our study provides a preliminary empirical investigation into Mongolian script handwriting identification. Moreover, it establishes a foundation for more in-depth and extensive future work on tasks such as the authentication of Mongolian calligraphy, the identification of writers of ancient manuscripts, and related classification efforts. Through this article, we hope to draw more attention to the study of Mongolian script, thereby advancing its information processing.

Author Contributions

Conceptualization, methodology, software, writing—original draft, Y.S.; software, validation, writing—review, D.F.; software, writing—review, H.W.; validation, writing—review and editing, Z.W.; validation, writing—review and editing, J.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China (Grant No. 61763034), the National Natural Science Foundation of the Inner Mongolia Autonomous Region (Grant No. 2020MS06005), the National Natural Science Foundation of Behavioral Recognition (Grant No. 62261041), and Independent scientific research project—research on the generation and identification of Mongolian handwriting (Grant No. 21700-5237002).

Data Availability Statement

Data were obtained from Kaggle and are available at http://www.kaggle.com/datasets/fandaoerji/molhw-ooo (accessed on 16 November 2023) with the permission of Kaggle.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Tan, G.J.; Kumoi, R.; Rahim, M.S.M.; Wee, T.C.; Sulong, G. A study of current trends writer identification in large-scale across three world major languages with retrieval approaches. In Proceedings of the 2020 6th International Conference on Interactive Digital Media (ICIDM), Bandung, Indonesia, 14–15 December 2020; pp. 1–6.
2. Tan, G.J.; Sulong, G.; Rahim, M.S.M. Writer identification: A comparative study across three world major languages. Forensic Sci. Int. 2017, 279, 41–52.
3. Hull, J.J. A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 550–554.
4. Marti, U.V.; Bunke, H. A full English sentence database for off-line handwriting recognition. In Proceedings of the Fifth International Conference on Document Analysis and Recognition, ICDAR’99 (Cat. No. PR00318), Bangalore, India, 20–22 September 1999; pp. 705–708.
5. Mahmoud, S.A.; Ahmad, I.; Al-Khatib, W.G.; Alshayeb, M.; Parvez, M.T.; Märgner, V.; Fink, G.A. KHATT: An open Arabic offline handwritten text database. Pattern Recognit. 2014, 47, 1096–1112.
6. Al-Ma’adeed, S.; Elliman, D.; Higgins, C.A. A data base for Arabic handwritten text recognition research. In Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition, Niagara-on-the-Lake, ON, Canada, 6–8 August 2002; pp. 485–489.
7. Kharma, N.; Ahmed, M.; Ward, R. A new comprehensive database of handwritten Arabic words, numbers, and signatures used for OCR testing. In Proceedings of the 1999 IEEE Canadian Conference on Electrical and Computer Engineering (Cat. No. 99TH8411), Edmonton, AB, Canada, 9–12 May 1999; Volume 2, pp. 766–768.
8. Liu, C.L.; Yin, F.; Wang, D.H.; Wang, Q.F. CASIA online and offline Chinese handwriting databases. In Proceedings of the 2011 International Conference on Document Analysis and Recognition, Beijing, China, 18–21 September 2011; pp. 37–41.
9. Zhang, H.G.; Guo, J.; Chen, G.; Li, C.G. CL2000-A large-scale handwritten Chinese character database for handwritten character recognition. In Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, Barcelona, Spain, 26–29 July 2009; pp. 286–290.
10. Schomaker, L.; Vuurpijl, L. Forensic Writer Identification: A Benchmark Data Set and a Comparison of Two Systems; NICI (NIjmegen Institute of Cognitive Information), Katholieke Universiteit Nijmegen: Nijmegen, The Netherlands, 2000.
11. Louloudis, G.; Gatos, B.; Stamatopoulos, N.; Papandreou, A. ICDAR 2013 competition on writer identification. In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 22–28 August 2013; pp. 1397–1401.
12. Kleber, F.; Fiel, S.; Diem, M.; Sablatnig, R. Cvl-database: An off-line database for writer retrieval, writer identification and word spotting. In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 22–28 August 2013; pp. 560–564.
13. Al Maadeed, S.; Ayouby, W.; Hassaine, A.; Aljaam, J.M. QUWI: An Arabic and English handwriting dataset for offline writer identification. In Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition, Bari, Italy, 18–20 September 2012; pp. 746–751.
14. Fan, D.; Gao, G.; Wu, H. MHW Mongolian offline handwritten dataset and its application. J. Chin. Inf. Process. 2018, 32, 89–95.
15. Pan, Y.; Fan, D.; Wu, H.; Teng, D. A new dataset for Mongolian online handwritten recognition. Sci. Rep. 2023, 13, 26.
16. Ma, L.L.; Liu, J.; Wu, J. A new database for online handwritten Mongolian word recognition. In Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 1131–1136.
17. Fan, D.; Sun, Y.; Wang, Z.; Peng, Y. Online Mongolian Handwriting Recognition Based on Encoder–Decoder Structure with Language Model. Electronics 2023, 12, 4194.
18. Helli, B.; Moghaddam, M.E. A text-independent Persian writer identification system using LCS based classifier. In Proceedings of the 2008 IEEE International Symposium on Signal Processing and Information Technology, Sarajevo, Bosnia and Herzegovina, 16–19 December 2008; pp. 203–206.
19. Chawki, D.; Labiba, S.M. A texture based approach for Arabic writer identification and verification. In Proceedings of the 2010 International Conference on Machine and Web Intelligence, Algiers, Algeria, 3–5 October 2010; pp. 115–120.
20. Abdullah, H.F.M.R.M.; Taha, R. Writer Identification of Arabic Handwriting Using Contourlet Transform and Neural Network. Available online: https://www.researchgate.net/publication/340436011 (accessed on 16 November 2023).
21. Bahram, T. A texture-based approach for offline writer identification. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 5204–5222.
22. Fiel, S.; Sablatnig, R. Writer identification and retrieval using a convolutional neural network. In Proceedings of the International Conference on Computer Analysis of Images and Patterns, Valletta, Malta, 2–4 September 2015; pp. 26–37.
23. Xing, L.; Qiao, Y. Deepwriter: A multi-stream deep CNN for text-independent writer identification. In Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 23–26 October 2016; pp. 584–589.
24. Tang, Y.; Wu, X. Text-independent writer identification via cnn features and joint bayesian. In Proceedings of the 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), Shenzhen, China, 23–26 October 2016; pp. 566–571.
25. He, S.; Schomaker, L. Deep adaptive learning for writer identification based on single handwritten word images. Pattern Recognit. 2019, 88, 64–74.
26. Kumar, P.; Sharma, A. Segmentation-free writer identification based on convolutional neural network. Comput. Electr. Eng. 2020, 85, 106707.
27. He, S.; Schomaker, L. Fragnet: Writer identification using deep fragment networks. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3013–3022.
28. He, S.; Schomaker, L. Gr-rnn: Global-context residual recurrent neural networks for writer identification. Pattern Recognit. 2021, 117, 107975.
29. Kumar, V.; Sundaram, S. Offline Text-Independent Writer Identification based on word level data. arXiv 2022, arXiv:2202.10207.
30. Bhat, S.; Bhokare, V.; Bhirud, R.; Joglekar, P. Writer Identification using Handwriting Samples. In Proceedings of the 2022 IEEE 4th International Conference on Cybernetics, Cognition and Machine Learning Applications (ICCCMLA), Goa, India, 8–9 October 2022; pp. 456–461.
31. Nabi, S.T.; Kumar, M.; Singh, P. DeepNet-WI: A deep-net model for offline Urdu writer identification. In Evolving Systems; Springer: Berlin/Heidelberg, Germany, 2023; pp. 1–11.
32. Purohit, N.; Panwar, S. State-of-the-Art: Offline writer identification methodologies. In Proceedings of the 2021 IEEE International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 27–29 January 2021; pp. 1–8.
33. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
34. Lin, M.; Chen, Q.; Yan, S. Network In Network. arXiv 2013, arXiv:1312.4400.

Figure 1. Mongolian handwritten samples.

Figure 2. The general process of handwriting identification.

Figure 3. Statistics of sample counts for the 200 authors in the MOLHW dataset.

Figure 4. Offline handwriting images.

Figure 5. Structure diagrams of MWInet.

Figure 6. Loss for each model in Table 3.

Table 1. The outputs of each layer in the model.

Layers	Specific Configuration	Output Shape
Conv2d-1-1	k:3×3, S = 1, P = 1	(128, 64, 256, 128)
Conv2d-1-2	k:3×3, S = 1, P = 1	(128, 64, 256, 128)
Conv2d-1-3	k:3×3, S = 1, P = 1	(128, 64, 256, 128)
Maxpooling2d-1	k:2×2, S = 2, P = 0	(128, 64, 128, 64)
Conv2d-2-1	k:3×3, S = 1, P = 1	(128, 128, 128, 64)
Conv2d-2-2	k:3×3, S = 1, P = 1	(128, 128, 128, 64)
Conv2d-2-3	k:3×3, S = 1, P = 1	(128, 128, 128, 64)
Maxpooling2d-2	k:2×2, S = 2, P = 0	(128, 128, 64, 32)
Conv2d-3-1	k:3×3, S = 1, P = 1	(128, 256, 64, 32)
Conv2d-3-2	k:3×3, S = 1, P = 1	(128, 256, 64, 32)
Conv2d-3-3	k:3×3, S = 1, P = 1	(128, 256, 64, 32)
Maxpooling2d-3	k:2×2, S = 2, P = 0	(128, 256, 32, 16)
Conv2d-4-1	k:3×3, S = 1, P = 1	(128, 512, 32, 16)
Conv2d-4-2	k:3×3, S = 1, P = 1	(128, 512, 32, 16)
Conv2d-4-3	k:3×3, S = 1, P = 1	(128, 512, 32, 16)
Maxpooling2d-4	k:2×2, S = 2, P = 0	(128, 512, 16, 8)
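The output shapes in Table 1 can be checked with the standard size formula for convolution and pooling layers, out = floor((in + 2P − k) / S) + 1. The sketch below is a hedged reconstruction, not the authors' code: it assumes a batch of 128 images of size 256 × 128 (inferred from the first row of the table), four blocks of three 3 × 3 convolutions with 64, 128, 256, and 512 channels, and a 2 × 2 max pooling with stride 2 after each block, which is what the listed output shapes imply.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(batch=128, height=256, width=128,
                 channels=(64, 128, 256, 512), convs_per_block=3):
    """Return (layer name, output shape) pairs for a VGG-style stack."""
    shapes = []
    h, w = height, width
    for block, c in enumerate(channels, start=1):
        for i in range(1, convs_per_block + 1):
            # 3x3 convolution, stride 1, padding 1: spatial size is preserved.
            h, w = out_size(h, 3, 1, 1), out_size(w, 3, 1, 1)
            shapes.append((f"Conv2d-{block}-{i}", (batch, c, h, w)))
        # 2x2 max pooling, stride 2, no padding: spatial size is halved.
        h, w = out_size(h, 2, 2, 0), out_size(w, 2, 2, 0)
        shapes.append((f"Maxpooling2d-{block}", (batch, c, h, w)))
    return shapes


for name, shape in trace_shapes():
    print(name, shape)
```

Under these assumptions the trace ends at (128, 512, 16, 8), and the four blocks of three convolutions account for the 12 convolution operations in MWInet-12.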

Table 2. Experimental results for MWInet.

Model	Top-1 (%)	Top-5 (%)	Training Time per Epoch (s)
MWInet-8 (cross-entropy)	83.28	96.56	246
MWInet-12 (cross-entropy)	88.67	97.54	383
MWInet-8 (label smooth)	86.43	96.66	246
MWInet-12 (label smooth)	89.60	97.53	383

Table 3. Experimental results in different models.

Model	Top-1 (%)	Top-5 (%)	Training Time per Epoch (s)
Resnet50	81.44	94.75	348
Fragnet	72.94	89.2	315
GRRNN	82.71	94.37	234
VGG16	83.69	94.8	277
VGG19	85.55	96.54	401
MWInet-12 (label smooth)	89.60	97.53	383

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sun, Y.; Fan, D.; Wu, H.; Wang, Z.; Tian, J. Offline Mongolian Handwriting Identification Based on Convolutional Neural Network. Electronics 2024, 13, 111. https://doi.org/10.3390/electronics13010111
