Article

Recognizing Digital Ink Chinese Characters Written by International Students Using a Residual Network with 1-Dimensional Dilated Convolution

1 College of Information Science, Beijing Language and Culture University, Beijing 100083, China
2 College of Computer, North China Institute of Science and Technology, Langfang 065201, China
* Authors to whom correspondence should be addressed.
Information 2024, 15(9), 531; https://doi.org/10.3390/info15090531
Submission received: 25 July 2024 / Revised: 23 August 2024 / Accepted: 26 August 2024 / Published: 2 September 2024

Abstract

Due to the complex nature of Chinese characters, junior international students often encounter writing problems related to strokes, components, and their combinations. Digital ink Chinese characters (DICCs) are obtained by sampling the writing trajectory of Chinese characters with a pen input device, and they contain rich temporal and spatial information about strokes and sampling points. Recognizing DICCs is crucial for evaluating and correcting writing errors and for enhancing the quality of Chinese character teaching for international students. This paper is the first to apply one-dimensional dilated convolution to digital ink Chinese character recognition (DICCR) and proposes a novel residual network with one-dimensional dilated convolution (1-D ResNetDC). The 1-D ResNetDC not only uses multi-scale convolution kernels but also applies different dilation rates to a single-scale convolution kernel to gather information over various ranges, while residual connections facilitate the training of deep one-dimensional convolutional neural networks. Moreover, the paper proposes a more expressive ten-dimensional feature representation that encodes spatial, temporal, and writing direction information for each sampling point, thereby improving classification accuracy. Because the DICC dataset of international students is small and unbalanced, the 1-D ResNetDC is pre-trained on a publicly available dataset. The experiments demonstrate that our approach is effective and outperforms comparable approaches. The model features a compact architecture, a reduced number of parameters, and excellent scalability.

1. Introduction

Chinese characters serve as the universal writing system in China and exhibit several distinct characteristics: (1) Large Variety: The basic collection defined by the GB2312-80 standard contains 6763 Chinese characters, while the GB18030-2000 encoding standard defines 27,533 Chinese characters. (2) Similarity Among Characters: Many Chinese characters are visually similar, such as “已-己”, “口-囗”, “雎-睢”, “睛-晴”, “海-诲”, “绞-纹”, and “莱-菜” [1]. (3) Complex Structures: Chinese characters have intricate structures, large numbers of strokes, specific stroke orders, and weak writing regularities [2]. These characteristics make teaching Chinese characters to international learners a challenging aspect of international Chinese language teaching [2].
The writing habits of international students differ significantly from those of native speakers. Influenced by their native languages and varying levels of Chinese proficiency, their DICCs exhibit various errors and non-standardizations. As illustrated in Figure 1, these are primarily reflected in three aspects: (1) Stroke Errors: extra, missing, connected, broken, and incomplete strokes; (2) Errors in Stroke Relations: stroke order, stroke direction, and geometric errors; (3) Non-standardizations: non-standard strokes and components, as well as the resulting overall structural imbalances. For characters with more strokes and more complex stroke combinations, the correctness and standardization of international students’ DICCs tend to decline. All of these pose great challenges for DICCR for international students.
With the assistance of pen input devices, such as handwriting pads and digital pens, the writing trajectories of Chinese characters produced by international students can be sampled regularly to generate DICCs. Compared with handwritten Chinese character images, which contain only spatial information, DICCs provide richer and more precise spatial and temporal information, such as the locations of sampling points, sampling times, pressure, and writing strokes. In contrast to video files that record the writing process, DICCs have a smaller file size, making them easier to store, transmit, and analyze. By recognizing DICCs, it is possible to evaluate their correctness (such as identifying extra or missing strokes) and to measure their standardization, achieving the objective of computer-aided Chinese character writing for international students and enhancing the quality of international Chinese character teaching.
For DICCR, popular approaches include two-dimensional convolutional neural networks (2-D CNNs) [3,4,5,6], one-dimensional convolutional neural networks (1-D CNNs) [7], and recurrent neural networks (RNNs) [8,9]. For example, Zhang et al. [5] combined directional maps with a 2-D CNN and achieved the state of the art at that time. Gan et al. [7] proposed using a 1-D CNN for DICCR. Zhang et al. [8] proposed an RNN-based approach that deals directly with handwriting trajectories. Although these approaches have achieved high recognition rates, they mainly focus on DICCs produced by native Chinese speakers. International students’ DICCs often contain various errors and non-standardizations, leading to unsatisfactory recognition performance. Few studies have addressed DICCs for non-native Chinese speakers. In 2016, Bai et al. [10] proposed an approach for international students’ DICCR based on hierarchical models with hidden Markov models (HMMs). However, the recognition performance was limited by the generative nature of the HMM. In 2020, they applied hidden conditional random fields (HCRFs) to improve the performance of the hierarchical models [11]. These approaches imitate the human process of writing Chinese characters, on the premise that characters are composed of different strokes, so recognition should start from the strokes and extract their structural features. In practice, however, it is challenging to stably extract the strokes of Chinese characters and their relationships, resulting in less-than-ideal recognition accuracy. Deep learning has brought a new breakthrough to DICCR, and employing it for international students’ DICCR is a natural choice.
Among the existing approaches for DICCR, 2-D CNN approaches convert DICCs into images or extract feature images for training [3,4,5,6]. They ignore the temporal information contained in DICCs and require strong domain knowledge to extract feature images. Moreover, representing one-dimensional trajectory information as two-dimensional image information increases computational and storage costs. One-dimensional CNN and RNN approaches can better utilize temporal information. However, RNN approaches [8,9] are hard to parallelize and slow when processing long sequences. In comparison, 1-D CNN approaches [7] offer faster computation.
The 1-D CNN recognition approach offers advantages such as a compact model size, rapid training, low computational demands, and ease of parallelization. In a traditional 1-D CNN [7], multi-scale context information is integrated by successive convolution strides or pooling operations, with the sequence length gradually reduced until a global prediction is reached. However, a traditional 1-D CNN lacks the capability to capture multi-scale correlations in the writing trajectories of international students’ DICCs. To address this, in addition to these traditional mechanisms, our model combines convolutional kernels with varying dilation rates to aggregate multi-scale contextual information, capturing both short-distance and long-distance correlations within sequences. The contributions are summarized as follows.
(1)
The paper proposes a more expressive ten-dimensional feature representation that includes spatial, temporal, and writing direction information for each sampled point, significantly improving classification accuracy.
(2)
The paper is the first to use one-dimensional dilated convolutional networks for DICCR. The proposed 1-D ResNetDC effectively captures multi-scale contextual information in DICCs, encompassing both short-distance and long-distance sequence correlations.
(3)
The model is compact, with a reduced number of parameters and excellent scalability.
The rest of the paper is organized as follows. Section 2 briefly reviews related work. Section 3 introduces the preprocessing of DICCs. Section 4 presents the proposed 1-D ResNetDC. Section 5 discusses recognition based on the pre-trained model. Section 6 reports the experimental results with detailed analysis, and Section 7 draws the conclusion.

2. Related Work

The complexity of Chinese characters and the diversity of writing styles make DICCR a challenging task for pattern recognition and machine learning.

2.1. Digital Ink Chinese Characters Recognition

Traditional DICCR typically involves three steps: preprocessing, feature extraction, and classification. Common preprocessing methods include shape normalization [12], adding imaginary strokes [13,14], and resampling. Feature extraction derives discriminative features through direction decomposition, such as eight-direction feature maps [15] and path signatures [4]. Common classifiers include the support vector machine (SVM), the modified quadratic discriminant function (MQDF) [16,17], and the hidden Markov model (HMM) [18]. Traditional DICCR approaches are limited by low-capacity features, and their recognition performance is difficult to improve.
In recent years, deep learning, supported by GPU parallel computing, has led to significant breakthroughs in DICCR, substantially improving recognition performance beyond traditional approaches and surpassing human-level accuracy. Existing deep learning approaches for DICCR can be categorized into three types based on network architecture.

2.1.1. RNN Approaches

RNN-based approaches process DICCs directly, effectively utilizing the temporal and spatial information inherent in sequence data. They can also exploit the long-distance dependencies of time sequence data, achieving notable success in DICCR. Zhang X.-Y. et al. [8] built RNNs with LSTM and GRU units and realized end-to-end DICCR using stacked bidirectional RNN models. Zhang J. et al. [9] proposed TRAN, which consists of an encoder and a decoder: the encoder uses a stacked bidirectional RNN to encode the raw DICC data into new representations, which the decoder then uses to recognize Chinese characters. Due to their recurrent computing mechanism, RNNs are difficult to parallelize and slow when processing long sequences.

2.1.2. Graph Neural Network Approaches

Graph neural network (GNN) approaches recognize Chinese characters by abstracting their inherent graph structures. Gan et al. [19] proposed a spatial graph convolutional network (SGCN) for DICCR. SGCN integrates local neighborhood information through spatial graph convolution and further learns global shape attributes through a hierarchical residual structure to realize the final classification. Gan et al. [20] proposed representing Chinese characters as skeleton graphs; their PyGT (pyramid graph transformer) architecture integrates a graph attention module and a graph convolution module to capture the global information and local topological structure of graphs, respectively. However, due to the high similarity between certain characters, GNNs struggle to accurately distinguish characters with similar structures [20]. Moreover, GNN approaches require converting DICCs into graphs, which adds complexity to the process.

2.1.3. Convolution Neural Network Approaches

A CNN is composed of alternating convolution and pooling layers, with each layer outputting a set of feature maps. Depending on the dimensionality of the input data, convolutional neural networks are divided into 2-D CNNs and 1-D CNNs.
(1) 2-D CNN approaches for DICCR
2-D CNN approaches for DICCR fall into two categories: end-to-end methods and domain knowledge-based methods. The former transform DICCs into digital images and feed them into a CNN for training, without any feature extraction or selection, and the CNN output is used directly as the final recognition result. A typical example is the MCDNN [3] proposed by Cireşan et al. It consists of several standard CNNs, each comprising four convolution layers, four pooling layers, and two fully connected layers; the final prediction is a simple average of all CNN outputs. This approach leverages only the spatial information of the writing trajectory while ignoring temporal information, which may reduce recognition performance. The latter extract feature images (such as eight-direction features and path signatures) from the handwriting trajectory using a directional decomposition strategy and then feed these feature images into 2-D CNNs for training. Graham [4] proposed a sparse CNN model that introduces path signature features into the input layer of the CNN. Zhong et al. [6] used traditional feature extraction methods to extract Gabor and gradient features of handwritten Chinese characters; these features, combined with character images, were input into the HCCR-GoogLeNet network, and recognition accuracy was further improved through an ensemble of 10 HCCR-GoogLeNet models. Zhang et al. [5] combined deep convolutional neural networks with domain knowledge, such as shape normalization and direction feature maps, and proposed the DirectMap + ConvNet + Adaptation approach. These approaches embed manually extracted feature images of Chinese characters into the CNN as prior knowledge, which helps the CNN learn auxiliary character features and effectively improves recognition performance. However, 2-D CNN approaches have several drawbacks: converting DICCs into images loses temporal information, extracting feature images demands extensive domain knowledge, and representing one-dimensional trajectory information as two-dimensional images increases computational and storage costs.
(2) 1-D CNN approaches for DICCR
A 1-D CNN deals directly with one-dimensional patterns, with the convolution kernel moving in only one direction; it is generally used for analyzing time sequence data. A DICC is input as the coordinates of its sampling points in temporal order, so it can be regarded as time sequence data composed of sampling points, and one-dimensional convolution can capture the contextual information between them. Gan et al. [21] used a 1-D CNN to recognize online handwritten English words, demonstrating the potential of 1-D CNNs for sequence prediction. They subsequently proposed a 1-D CNN for recognizing digital ink Chinese characters [7], obtaining good accuracy on the ICDAR 2013 dataset. Table 1 compares the 1-D CNN, 2-D CNN, RNN, and GNN approaches for DICCR.

2.2. Digital Ink Trajectory Representation

There are different trajectory representations of DICCs. Among them, X.-Y. Zhang and J. Zhang use a six-dimensional vector to represent a trajectory point [8,9]: $p_t = (x_t, y_t, \Delta x_t, \Delta y_t, \delta(s_t = s_{t+1}), \delta(s_t \neq s_{t+1}))$, where $\Delta x_t = x_t - x_{t-1}$, $\Delta y_t = y_t - y_{t-1}$, and $\delta(\cdot) = 1$ if the condition holds and $0$ otherwise. The last two items indicate the state of the pen: (1, 0) and (0, 1) indicate the pressing and lifting of the pen, respectively. J. Gan [7] uses an eight-dimensional vector to represent a trajectory point: $p_t = (\Delta x_t, \Delta y_t, \sin\alpha_t, \cos\alpha_t, \sin\beta_t, \cos\beta_t, \delta(s_t = s_{t+1}), \delta(s_t \neq s_{t+1}))$. Compared with the six-dimensional representation, this method eliminates the absolute coordinates of trajectory points and incorporates the writing direction features $(\sin\alpha_t, \cos\alpha_t, \sin\beta_t, \cos\beta_t)$.
The representation method of DICCs affects classification accuracy. Building on the six- and eight-dimensional representations, this paper puts forward a more discriminative ten-dimensional representation, which improves classification accuracy.

2.3. Dilated Convolution

Dilated convolution is an effective method for expanding the receptive field. The main idea is to insert “holes” between the elements of the convolution kernel, keeping the size of the feature map unchanged and adding no computational cost while the receptive field grows. It is widely used in semantic segmentation [22,23,24], object detection [25], depth estimation [26], and other tasks. Dilated convolution is essentially a convolution operation with a dilated kernel: the same kernel can be applied over different ranges by using different dilation rates, with small dilation rates capturing close-range information and large dilation rates capturing long-range information. This paper is the first to apply dilated convolution to recognizing international students’ DICCs, proposing a novel residual network with one-dimensional dilated convolution (1-D ResNetDC); it is the first exploration of dilated convolution in the DICC classification task.

3. Preprocessing of DICCs for International Students

Trajectory preprocessing is an important step in improving recognition accuracy. It mainly includes trajectory normalization and feature extraction.

3.1. Trajectory Normalization

There may be redundant points in the trajectory sequence. Removing them does not change the shape of the Chinese character but effectively reduces the size of the input data and the amount of computation. First, adjacent repeated points, except those at the start and end of strokes, are removed, as illustrated in Figure 2a. Next, the sampling points are horizontally flipped to correct the display of the character, as shown in Figure 2b. Finally, the x- and y-coordinates are normalized into a standard interval, as shown in Figure 2c. The steps of trajectory normalization are summarized in Table 2.
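To make the three steps concrete, the following is a minimal NumPy sketch, assuming each character arrives as a list of per-stroke arrays of (x, y) points; the aspect-ratio-preserving scale and the [0, 1] target interval are our assumptions, since the paper only specifies “a standard interval”.

```python
import numpy as np

def normalize_trajectory(strokes):
    """strokes: list of (N_i, 2) arrays of (x, y) sampling points, one per stroke."""
    # Step 1: remove adjacent repeated points, always keeping stroke endpoints.
    cleaned = []
    for pts in strokes:
        keep = [0]
        for i in range(1, len(pts)):
            if i == len(pts) - 1 or not np.array_equal(pts[i], pts[keep[-1]]):
                keep.append(i)
        cleaned.append(np.asarray(pts, dtype=np.float64)[keep])

    allp = np.concatenate(cleaned)
    (x_min, y_min), (x_max, y_max) = allp.min(axis=0), allp.max(axis=0)
    scale = max(x_max - x_min, y_max - y_min) or 1.0  # common scale keeps aspect ratio

    out = []
    for pts in cleaned:
        p = pts.copy()
        p[:, 0] = x_max + x_min - p[:, 0]      # Step 2: horizontal flip
        p[:, 0] = (p[:, 0] - x_min) / scale    # Step 3: normalize into [0, 1]
        p[:, 1] = (p[:, 1] - y_min) / scale
        out.append(p)
    return out
```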

3.2. Trajectory Representation

Based on the preprocessed data, a ten-dimensional feature vector is extracted for each sampling point $p_t$:
$p_t = (x_t, y_t, \Delta x_t, \Delta y_t, \sin\alpha_t, \cos\alpha_t, \sin\beta_t, \cos\beta_t, s_t, q_t)$
The first two elements are the absolute coordinates of the sampling point, while the third and fourth are the x- and y-coordinate offsets, indicating the pen’s movement direction; together they store both the spatial structure and the temporal sequence information of the original trajectory. Elements 5–8 describe the writing direction, providing a more robust representation of the trajectory. These direction features are difficult for a neural network to infer directly from the raw coordinates, so they are included explicitly in the feature representation of each sampling point $p_t$:
$\sin\alpha_t = \dfrac{y_{t+1} - y_{t-1}}{\sqrt{(x_{t+1} - x_{t-1})^2 + (y_{t+1} - y_{t-1})^2}}$
$\cos\alpha_t = \dfrac{x_{t+1} - x_{t-1}}{\sqrt{(x_{t+1} - x_{t-1})^2 + (y_{t+1} - y_{t-1})^2}}$
$\sin\beta_t = \cos\alpha_{t-1} \cdot \sin\alpha_{t+1} - \sin\alpha_{t-1} \cdot \cos\alpha_{t+1}$
$\cos\beta_t = \cos\alpha_{t-1} \cdot \cos\alpha_{t+1} + \sin\alpha_{t-1} \cdot \sin\alpha_{t+1}$
$s_t$ denotes the stroke index of the sampling point $p_t$, indicating which stroke of the Chinese character the point belongs to, starting from 1. $q_t$ denotes the index of the sampling point within the current stroke, also starting from 1. Compared with $(\delta(s_t = s_{t+1}), \delta(s_t \neq s_{t+1}))$ in the six- and eight-dimensional feature vectors, $(s_t, q_t)$ represents not only the state of the pen but also the stroke structure of the character and the position of each point within its stroke. In this way, the original trajectory is represented as a new sequence $X = (p_0, \ldots, p_t, \ldots, p_T)$. In the CASIA-OLHWDB1.0 dataset, the maximum number of sampling points per character is 435, and 99.99% of characters have fewer than 160 points. The input sequence length of the model is therefore set to 320; only 5 characters have more than 320 sampling points, and their point counts can be reduced by eliminating non-essential points.
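As a concrete illustration, the sketch below builds the ten-dimensional sequence from normalized strokes, following the formulas above; treating the offsets and direction features as 0 at stroke and sequence boundaries is our assumption, since the paper does not state the boundary handling.

```python
import numpy as np

def ten_dim_features(strokes):
    """strokes: list of (N_i, 2) arrays of normalized (x, y) points, one per stroke.
    Returns a (T, 10) array: (x, y, dx, dy, sin a, cos a, sin b, cos b, s, q)."""
    rows = [(x, y, s, q)
            for s, stroke in enumerate(strokes, start=1)
            for q, (x, y) in enumerate(stroke, start=1)]
    P = np.array([(x, y) for x, y, _, _ in rows])
    T = len(P)
    f = np.zeros((T, 10))
    f[:, 0:2] = P
    f[1:, 2:4] = P[1:] - P[:-1]                 # coordinate offsets (dx_t, dy_t)
    for t in range(1, T - 1):                   # writing direction alpha_t
        dx, dy = P[t + 1] - P[t - 1]
        n = np.hypot(dx, dy) or 1.0
        f[t, 4], f[t, 5] = dy / n, dx / n       # sin(alpha_t), cos(alpha_t)
    for t in range(1, T - 1):                   # direction change beta_t
        sa0, ca0 = f[t - 1, 4], f[t - 1, 5]
        sa1, ca1 = f[t + 1, 4], f[t + 1, 5]
        f[t, 6] = ca0 * sa1 - sa0 * ca1         # sin(beta_t)
        f[t, 7] = ca0 * ca1 + sa0 * sa1         # cos(beta_t)
    f[:, 8] = [s for _, _, s, _ in rows]        # stroke index s_t
    f[:, 9] = [q for _, _, _, q in rows]        # in-stroke point index q_t
    # The sequence is then padded or truncated to length 320 and transposed
    # to shape (10, 320) before being fed to the network.
    return f
```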

4. One-Dimensional Residual Networks with Dilation Convolution

A 1-D CNN uses convolutional operations to extract contextual information, and a larger receptive field captures longer-range dependencies in the sequence. Our model therefore expands the receptive field in the following four ways.

4.1. Deep Structure

A 1-D CNN needs to stack many convolution layers to fully extract the context of a sequence, so the 1-D ResNetDC is a deep network. After trajectory preprocessing, the input sequence is $X = (p_0, \ldots, p_t, \ldots, p_T)$, where T = 320 and each $p_t$ is a ten-dimensional vector. X is fed into the 1-D ResNetDC, and the network predicts the Chinese character category y. The 1-D ResNetDC, shown in Figure 3, directly processes Chinese character trajectory sequences. Its first layer is a convolutional layer with a kernel size of 7 and a stride of 2, followed by a second convolutional layer with a kernel size of 3 and a stride of 2. The third layer is a pooling layer with a window size of 3 and a stride of 2. This is followed by four Block modules, concluding with an average pooling layer and a fully connected layer.
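The PyTorch sketch below mirrors this layer sequence under stated assumptions: the stem’s channel width (64, matching Block 1’s first layer in Table 3), the “same”-style paddings, and the use of max pooling for the third layer are our choices where Figure 3 is not fully specified. The Block module is sketched in Section 4.2.

```python
import torch.nn as nn

class ResNetDC1D(nn.Module):
    """Skeleton of the 1-D ResNetDC (Figure 3); Block is sketched in Section 4.2."""
    def __init__(self, num_classes, in_channels=10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm1d(64), nn.ReLU(inplace=True),
            nn.Conv1d(64, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm1d(64), nn.ReLU(inplace=True),
            nn.MaxPool1d(kernel_size=3, stride=2, padding=1),   # length 320 -> 40
        )
        # Channel counts follow Table 3; each Block outputs 4x its inner width.
        self.blocks = nn.Sequential(
            Block(64, 64), Block(256, 128), Block(512, 256), Block(1024, 512),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(2048, num_classes),
        )

    def forward(self, x):          # x: (batch, 10, 320)
        return self.head(self.blocks(self.stem(x)))
```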

4.2. Residual Modules

To avoid vanishing and exploding gradients in deep networks, the 1-D ResNetDC employs residual connections. The Block modules are designed as residual modules; all four share the same structure, differing only in the number of channels in their convolutional layers. The structure of the Block 1 module is depicted in Figure 4. Each Block module consists of five convolutional layers and a shortcut connection, with batch normalization and ReLU activation applied after each convolution.
The number of channels in the convolutional layers of the four Block modules is presented in Table 3. The input Chinese character trajectory is represented as a 10 × 320 sequence. In the initial three layers of the network, the sequence is down-sampled to one-eighth of its original length. The input feature sequence length for Block 1 is 40, which is relatively small. Therefore, all convolution strides in the four Block modules are set to 1, meaning that no further down-sampling is performed. As a result, both the input and output feature sequence lengths for all four Block modules remain at 40.
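A sketch of the Block module consistent with Figure 4 and Tables 3 and 4 follows; the “same”-style padding (which keeps the sequence length at 40) and the placement of the final activation after the residual addition are our assumptions.

```python
import torch.nn as nn

def conv_bn(c_in, c_out, k, dilation=1):
    # "Same" padding keeps the feature sequence length at 40 (stride is always 1).
    pad = dilation * (k - 1) // 2
    return nn.Sequential(
        nn.Conv1d(c_in, c_out, kernel_size=k, padding=pad, dilation=dilation),
        nn.BatchNorm1d(c_out),
    )

class Block(nn.Module):
    """Five conv layers plus a shortcut (Figure 4). The three kernel-size-3
    layers use dilation rates 1, 2, and 3, as detailed in Section 4.3."""
    def __init__(self, c_in, width):
        super().__init__()
        c_out = 4 * width  # 5th-layer/shortcut channels, per Table 3
        self.body = nn.Sequential(
            conv_bn(c_in, width, k=1), nn.ReLU(inplace=True),
            conv_bn(width, width, k=3, dilation=1), nn.ReLU(inplace=True),
            conv_bn(width, width, k=3, dilation=2), nn.ReLU(inplace=True),
            conv_bn(width, width, k=3, dilation=3), nn.ReLU(inplace=True),
            conv_bn(width, c_out, k=1),
        )
        self.shortcut = conv_bn(c_in, c_out, k=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))
```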

4.3. Multi-Scale Dilated Convolution

For DICCR, capturing both the short-distance and long-distance correlations of sequences at multiple scales is necessary for successful prediction. Therefore, the convolution layers in the residual module adopt a series of different dilation rates to aggregate multi-scale context information. In addition, given the short feature sequence length, the convolution strides of the four Block modules are set to 1 and the pooling layer is removed, so multi-scale correlations are obtained purely through convolutional layers with different dilation rates.
A convolution kernel of size k with dilation rate r is dilated to a size $k_d = k + (k-1)(r-1)$, thus expanding the receptive field. Figure 5 illustrates the modeling process of a regular one-dimensional convolutional network with k = 3, s = 1, and r = 1: a neuron in the second convolutional layer has a receptive field of 5 on the input sequence. Figure 6 depicts a one-dimensional dilated convolutional network with k = 3, s = 1, and r = 2: a neuron in the second convolutional layer has a receptive field of 9 on the input sequence, demonstrating how dilated convolution effectively expands the receptive field. Since dilated convolution inserts zeros between adjacent elements of the kernel, only k of the sampling points within the $k_d$-sized window participate in the computation, so each neuron in a dilated convolution layer views information in a non-continuous manner, which can lead to significant information loss. This is known as sparse sampling, as illustrated in Figure 6.
In a two-dimensional dilation convolution network, when multiple dilation convolutions are stacked, certain pixels may not be utilized, resulting in the loss of information continuity and correlation, which hinders learning. This phenomenon is referred to as the gridding effect [27] of dilated convolution. As shown in Figure 6, this gridding effect is also present in a one-dimensional dilated convolution network.
As shown in Table 4, to effectively mitigate the gridding effect, a series of different dilation rates is used for the kernel-size-3 convolution layers in the four Block modules. The dilation rates are [1, 2, 3], corresponding to dilated kernel sizes of [3, 5, 7] and receptive fields of [3, 7, 13] on the Block module’s input feature sequence. This configuration covers all holes and significantly enlarges the receptive fields.
The length of the input feature sequence for each Block module is 40 and the convolution stride is 1, as explained in Section 4.2; there is no down-sampling layer. Taking the fourth convolutional layer of a Block module as an example, its kernel size is 3 and its dilation rate is 3. According to the formula $k_d = k + (k-1)(r-1)$, the dilated kernel size is 7. The receptive field on the Block module’s input feature sequence equals the receptive field of the previous layer plus the dilated kernel size minus 1, giving a receptive field of 13.
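This calculation can be checked mechanically; the short script below reproduces the dilated kernel sizes and receptive fields listed in Table 4.

```python
def dilated_kernel_size(k, r):
    return k + (k - 1) * (r - 1)

layers = [(1, 1), (3, 1), (3, 2), (3, 3), (1, 1)]  # (kernel, dilation) per Table 4
rf = 1
for i, (k, r) in enumerate(layers, start=1):
    kd = dilated_kernel_size(k, r)
    rf += kd - 1                 # stride is 1 throughout, so RF grows by kd - 1
    print(f"layer {i}: kd={kd}, receptive field={rf}")
# Prints kd = 1, 3, 5, 7, 1 and receptive fields 1, 3, 7, 13, 13, matching Table 4.
```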
Without reducing the feature sequence length, both local features and global features are captured, thus aggregating context information of different scales and improving the classification performance.

4.4. Multi-Scale Convolution Kernels

CNNs with large kernels have larger effective receptive fields and higher shape bias, aligning with the human cognitive system that primarily identifies objects through shape cues [28]. Therefore, the first layer of convolution adopts a relatively large convolution kernel of seven. Simultaneously, deeper convolutional networks contribute to higher classification accuracy, and the use of smaller convolutional kernels facilitates the design of deeper networks [29]. Consequently, 1-D ResNetDC stacks multiple Block modules to construct a deep convolutional network. Each Block module utilizes convolutional kernels with sizes of three and one. A kernel size of three is the minimum necessary to capture the spatial concepts of left, middle, and right, while a kernel size of one enables cross-channel information interaction. This multi-scale convolution kernel design enhances the expression ability of the 1-D ResNetDC.

5. Recognizing DICCs Based on the Pre-Trained Model

Pre-training divides the training task into two steps: commonality learning and characteristic learning. First, the model learns commonalities from large, easily available datasets that are cheap to collect. Next, these commonalities are transferred into the model for a specific task, which is then “fine-tuned” with a small amount of labeled data from the relevant field. The model only needs to start from the common knowledge and learn the special parts of the specific task.
DICCR for international students is such a specific task. Although the dataset contains relatively few samples, labeling them still takes significant manpower. Therefore, this paper pre-trains the proposed model on the public CASIA-OLHWDB1.0 dataset. The training loss and verification accuracy of the model on CASIA-OLHWDB1.0 are shown in Figure 7 and Figure 8.
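In code, the two-step procedure reduces to loading the pre-trained weights and swapping the classifier head. The checkpoint file name below is hypothetical, and the 4037-class output (3866 characters plus 171 symbols) is our reading of the dataset description.

```python
import torch

# Load the weights pre-trained on CASIA-OLHWDB1.0 (hypothetical checkpoint name).
model = ResNetDC1D(num_classes=3866 + 171)
model.load_state_dict(torch.load("resnetdc_casia_olhwdb10.pth"))

# Fine-tuning: replace the fully connected layer for the 525 character
# categories written by international students, then continue training the
# whole network on the (small) international-student DICC dataset.
model.head[-1] = torch.nn.Linear(2048, 525)
```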
Table 5 presents the recognition results of various approaches using the CASIA-OLHWDB1.0 dataset. Our model significantly outperforms the traditional MQDF [30] and MCDNN [3] in terms of accuracy. Although the accuracy of 1-D ResNetDC is slightly lower than that of DropSample-DCNN [31], it is important to note that 1-D ResNetDC is a 1-D CNN, whereas DropSample-DCNN is a 2-D CNN. Qualitative analysis indicates that 1-D ResNetDC has fewer parameters and a faster training speed, enabling end-to-end recognition without requiring any domain-specific knowledge.
The model parameters are further trained with the DICC dataset of international students, and the accuracy changes are shown in Figure 9. The results demonstrate that the model pre-trained on the CASIA-OLHWDB1.0 dataset achieves higher classification accuracy, reaching 97.2%, compared to 96.5% without pre-training.

6. Experiments

To evaluate the effectiveness and robustness of the 1-D ResNetDC, a comparative experiment was conducted based on the DICC dataset of international students and the CASIA-OLHWDB1.0 dataset.

6.1. Datasets

Our experiment uses two DICC datasets. The CASIA-OLHWDB1.0 dataset is primarily used for pre-training the model due to the relatively small size of the DICC dataset for international students.
The DICC dataset for international students was collected during classroom teaching sessions for “Zero Starting Point” international students at Beijing Language and Culture University. Students used an Anoto digital pen to write on dot matrix paper, capturing the DICC text, which was then segmented into individual Chinese characters [32]. The dataset includes 525 categories of Chinese characters, with the number of samples per category ranging from a few to over 300. The dataset comprises 31,734 samples. Figure 10 displays the number of samples per category in descending order. To address the imbalance in this dataset, we pre-train the model on the CASIA-OLHWDB1.0 dataset, thereby mitigating the adverse effects of data imbalance during training and avoiding complex processing techniques.
The CASIA-OLHWDB1.0 digital ink dataset from the Chinese Academy of Sciences [33] contains 171 alphanumeric characters and symbols and 3866 commonly used Chinese characters, of which 3740 are in the GB2312-80 level-1 set. CASIA-OLHWDB1.0 comprises sample sets from 420 writers, totaling 1,694,741 samples.

6.2. Model Training

The proposed 1-D ResNetDC is implemented on the PyTorch platform (version 2.0.1 + cu118). Throughout training, mini-batch gradient descent is used to minimize the loss. The mini-batch size is set to 100, the cross-entropy loss function is adopted, and the ReLU (rectified linear unit) activation function is used. The learning rate is initialized to 0.01 and adjusted adaptively during training: when the validation loss stops improving, the learning rate is multiplied by 0.35.
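A sketch of this training configuration follows; the use of plain SGD (for mini-batch gradient descent), the scheduler patience, and the helper names train_loader and evaluate are assumptions not specified by the paper.

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Multiply the learning rate by 0.35 when the validation loss plateaus.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.35, patience=3)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(32):
    model.train()
    for x, y in train_loader:        # mini-batches of 100 sequences, each (10, 320)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    val_loss = evaluate(model, val_loader)   # hypothetical validation helper
    scheduler.step(val_loss)
```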

6.3. Influence of Different Trajectory Representation on Classification Accuracy

Different trajectory representations carry different information and different discriminative power, which leads to large gaps in network accuracy. This section verifies the importance of trajectory representation through experiments.
The data in CASIA-OLHWDB1.0 are represented by six-dimensional feature vectors [8,9], eight-dimensional feature vectors [7], and ten-dimensional feature vectors proposed in this paper, and the 1-D ResNetDC is trained. The verification accuracies are shown in Figure 11.
After pre-training, the verification accuracy on the DICC dataset for international students is shown in Figure 12.
The eight-dimensional representation performs worst because it removes the absolute coordinates, thus losing the spatial structure of the original trajectory. The ten-dimensional representation achieves the highest verification accuracy. These findings demonstrate that the ten-dimensional feature representation proposed in this paper is more effective and discriminative.

6.4. Ablation Experiment

Based on digital ink Chinese characters for international students, experiments were conducted to compare the classification performance across three networks with varying numbers of Block modules, the performance of standard convolution versus dilated convolution, and the impact of different dimensional feature representations on classification.

6.4.1. Varying Numbers of Block Modules

In the 1-D ResNetDC depicted in Figure 3, the number of Block modules is configured as two, three, and four, respectively, forming three networks of varying depths. The training losses and verification accuracy of the three networks are presented in Figure 13 and Figure 14.
The network with three Block modules has the smallest training loss and the highest verification accuracy. Although the residual structure of the Block module effectively mitigates vanishing and exploding gradients, making deeper networks more conducive to feature learning, there is no strict positive correlation between the number of Block modules and recognition accuracy; a deeper network does not necessarily perform better. The experimental results suggest that three Block modules is the optimal configuration for the 1-D ResNetDC.

6.4.2. Varying Dilation Rates

To verify the role of dilated convolution in the 1-D ResNetDC, we compare the classification performance of an ordinary convolutional network and a dilated convolutional network. In the 1-D ResNetDC shown in Figure 3 and Figure 4, the number of Block modules is set to three. For the three kernel-size-3 convolution layers in each Block module, the ordinary network sets all dilation rates to 1, while the dilated network sets them to [1, 2, 3], respectively. Figure 15 presents the training loss for both networks, and Figure 16 illustrates their verification accuracies.
The dilated convolutional network has a smaller training loss and a higher verification accuracy than the ordinary convolutional network, peaking at 96.5%. This demonstrates that dilated convolution markedly improves the classification accuracy of digital ink Chinese characters written by international students.

6.4.3. Scalability of Feature Representation

Each trajectory point of the DICC is represented as a four-dimensional feature vector.
$p_t = (x_t, y_t, s_t, q_t)$
The four features are the absolute coordinates of the point, the stroke index to which the current point belongs, and the point index in the current stroke. Then, each trajectory point is represented as a six-dimensional feature vector.
$p_t = (x_t, y_t, \Delta x_t, \Delta y_t, s_t, q_t)$
Relative to the four-dimensional feature vector, the six-dimensional vector adds the x- and y-coordinate offsets. The writing direction features are then added to form a ten-dimensional feature vector.
$p_t = (x_t, y_t, \Delta x_t, \Delta y_t, \sin\alpha_t, \cos\alpha_t, \sin\beta_t, \cos\beta_t, s_t, q_t)$
All of the points constitute multi-channel 1-dimensional data, which are used to train the designed 1-D ResNetDC.
The training losses and verification accuracies of the network under the three representations are compared in Figure 17 and Figure 18. After 32 epochs, training is essentially complete: the training loss is close to 0 and the verification accuracy is stable. The four-dimensional representation has the worst convergence speed and verification accuracy. The six-dimensional representation converges fastest, but the ten-dimensional representation achieves the highest verification accuracy. Adding more discriminative features to the feature representation can thus further improve recognition accuracy.
The trajectory points of CASIA-OLHWDB1.0 are represented by six-dimensional and ten-dimensional features. After pre-training the model, the verification accuracies on the digital ink Chinese characters dataset of international students are shown in Figure 19. Experiments further prove the effectiveness of the ten-dimensional feature representation and the extensibility of the model.

6.5. Result Analysis

The 1-D ResNetDC not only performs well in recognizing standardized DICCs for international students, as shown in Figure 20, but also has good recognition performance for DICCs with various stroke errors, structural imbalances, and even alterations, as shown in Figure 21, Figure 22, Figure 23 and Figure 24.
The 1-D ResNetDC adopts a series of convolution operations with different dilation rates, which avoids the gridding effect of dilated convolution while learning multi-scale contextual information, helping it capture both the short-distance and long-distance correlations of DICCs.
Grad-CAM [34] is used to visualize the contribution of different trajectory points to the classification decision of the 1-D ResNetDC. Setting the dilation rate of the kernel-size-3 convolution layers in the Block modules to 1 yields what we call the ordinary convolutional network. Figure 25 and Figure 26 show the time steps that receive the most attention from the ordinary convolutional network and the 1-D ResNetDC, respectively, when classifying the digital ink Chinese character “典”. The six sub-graphs in each figure correspond to the six-dimensional representation $(x, y, \Delta x, \Delta y, s, q)$ of the trajectory sequence.
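For completeness, a minimal sketch of Grad-CAM adapted to 1-D feature maps is given below; the choice of target layer (e.g., the last convolution of the final Block) is an assumption, as the paper does not state which layer was visualized.

```python
import torch

def grad_cam_1d(model, target_layer, x, class_idx):
    """Return a per-time-step attention map for one input sequence x of
    shape (1, 10, 320), following Grad-CAM [34] on a 1-D conv layer."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(
        lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(g=go[0]))
    score = model(x)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    a, g = acts["a"][0], grads["g"][0]           # (channels, time_steps)
    weights = g.mean(dim=1, keepdim=True)        # channel importance weights
    cam = torch.relu((weights * a).sum(dim=0))   # weighted sum over channels
    return cam / (cam.max() + 1e-8)              # normalized per-step relevance
```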
By visualizing the trajectory sequence in two dimensions, the sampling points that the ordinary convolutional network and the 1-D ResNetDC attend to most when classifying the character “典” can be seen intuitively, as shown in Figure 27 and Figure 28. The ordinary convolutional network predicts the wrong category, “兴”. The 1-D ResNetDC attends to sequence correlations over a long range and classifies “典” correctly. This intuitively explains the higher accuracy of the 1-D ResNetDC.

7. Conclusions

This paper employs ten-dimensional feature vectors to represent the sampling points of DICCs, fully utilizing temporal, spatial, and writing direction information, as well as information about strokes and the sampling points themselves, resulting in enhanced expressive power. The sequence structure of writing trajectories is processed directly by the 1-D ResNetDC, which has a compact architecture and a reduced number of parameters. The model aggregates multi-scale contextual information through a series of different dilation rates, achieving high classification accuracy. The 1-D ResNetDC also demonstrates good scalability: other, more discriminative features can be added to the trajectory point representation to further improve classification accuracy. Experimental results confirm that the proposed approach is effective for DICCR for international students and outperforms comparable approaches. The compact model size, reduced parameter count, and good scalability make the model practical and effective for various applications. However, the 1-D ResNetDC is unsuitable for classifying optical character images that lack temporal information.
DICCR for international students provides a technical foundation for applications such as automatic grading systems, language learning apps, and cross-cultural communication platforms. It helps teachers efficiently assess international students’ Chinese character writing proficiency and provides tools for self-practice and immediate feedback. The paper not only contributes to the academic understanding of one-dimensional dilated convolutional neural networks in handwriting recognition but also offers practical insights that could inform the development of more advanced and efficient recognition systems.

Author Contributions

Conceptualization, X.Z. and H.X.; Methodology, X.Z. and H.X.; Software, H.X.; Data curation, H.X.; Writing—original draft, H.X.; Writing—review & editing, H.X.; Funding acquisition, X.Z. and H.X.; Resources, X.Z.; Supervision, X.Z.; Quality check, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Acknowledgments

I sincerely thank my mentor Xiwen Zhang for his careful guidance and selfless dedication. He not only gave me valuable advice academically, but also provided me with high-performance experimental equipment, ensuring the smooth progress of my research work. I am deeply grateful for this.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Jin, L.W.; Zhong, Z.Y.; Yang, Z.; Xie, Z.; Sun, J. Applications of deep learning for handwritten Chinese character recognition: A review. Acta Autom. Sin. 2016, 42, 1125–1141. [Google Scholar] [CrossRef]
  2. Sun, S.J.; Zhang, X.W. The evolution and development trend of computer-aided Chinese character writing teaching technology for international learners of Chinese. TCSOL Stud. 2022, 3, 68–76. [Google Scholar] [CrossRef]
  3. Ciresan, D.; Meier, U.; Schmidhuber, J. Multi-column deep neural networks for image classification. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 3642–3649. [Google Scholar] [CrossRef]
  4. Graham, B. Sparse arrays of signatures for online character recognition. arXiv 2013, arXiv:1308.0371. [Google Scholar]
  5. Zhang, X.-Y.; Bengio, Y.; Liu, C.-L. Online and offline handwritten Chinese character recognition: A comprehensive study and new benchmark. Pattern Recognit. 2017, 61, 348–360. [Google Scholar] [CrossRef]
  6. Zhong, Z.; Jin, L.; Xie, Z. High performance offline handwritten Chinese character recognition using GoogLeNet and directional feature maps. In Proceedings of the 2015 13th International Conference on Document Analysis and Recognition (ICDAR), Tunis, Tunisia, 23–26 August 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 846–850. [Google Scholar] [CrossRef]
  7. Gan, J.; Wang, W.; Lu, K. A new perspective: Recognizing online handwritten Chinese characters via 1-dimensional CNN. Inf. Sci. 2019, 478, 375–390. [Google Scholar] [CrossRef]
  8. Zhang, X.-Y.; Yin, F.; Zhang, Y.-M.; Liu, C.-L.; Bengio, Y. Drawing and Recognizing Chinese Characters with Recurrent Neural Network. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 849–862. [Google Scholar] [CrossRef]
  9. Zhang, J.; Zhu, Y.; Du, J.; Dai, L. Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018. [Google Scholar]
  10. Bai, H.; Zhang, X. Recognizing Chinese characters in digital ink from non-native language writers using hierarchical models. In Proceedings of the Second International Workshop on Pattern Recognition, Singapore, 19 June 2017; Jiang, X., Arai, M., Chen, G., Eds.; SPIE: Paris, France, 2017; p. 104430A. [Google Scholar] [CrossRef]
  11. Bai, H.; Zhang, X.-W. Improved hierarchical models for non-native Chinese handwriting recognition using hidden conditional random fields. In Proceedings of the Fifth International Workshop on Pattern Recognition, Chengdu, China, 24 June 2020; Jiang, X., Zhang, C., Song, Y., Eds.; SPIE: Paris, France, 2020; p. 9. [Google Scholar] [CrossRef]
  12. Liu, C.-L.; Marukawa, K. Pseudo two-dimensional shape normalization methods for handwritten Chinese character recognition. Pattern Recognit. 2005, 38, 2242–2255. [Google Scholar] [CrossRef]
  13. Ding, K.; Deng, G.; Jin, L. An Investigation of Imaginary Stroke Techinique for Cursive Online Handwriting Chinese Character Recognition. In Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, Barcelona, Spain, 26–29 July 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 531–535. [Google Scholar] [CrossRef]
  14. Okamoto, M.; Yamamoto, K. On-line handwriting character recognition using direction-change features that consider imaginary strokes. Pattern Recognit. 1999, 32, 1115–1128. [Google Scholar] [CrossRef]
  15. Bai, Z.-L.; Huo, Q. A study on the use of 8-directional features for online handwritten Chinese character recognition. In Proceedings of the Eighth International Conference on Document Analysis and Recognition (ICDAR’05), Seoul, Republic of Korea, 31 August–1 September 2005; IEEE: Piscataway, NJ, USA, 2005; Volume 1, pp. 262–266. [Google Scholar] [CrossRef]
  16. Long, T.; Jin, L. Building compact MQDF classifier for large character set recognition by subspace distribution sharing. Pattern Recognit. 2008, 41, 2916–2925. [Google Scholar] [CrossRef]
  17. Kimura, F.; Takashina, K.; Tsuruoka, S.; Miyake, Y. Modified quadratic discriminant functions and the application to chinese character recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1987, 9, 149–153. [Google Scholar] [CrossRef]
  18. Kim, H.J.; Kim, K.H.; Kim, S.K.; Lee, J.K. On-line recognition of handwritten chinese characters based on hidden markov models. Pattern Recognit. 1997, 30, 1489–1500. [Google Scholar] [CrossRef]
  19. Gan, J.; Wang, W.; Lu, K. Characters as Graphs: Recognizing Online Handwritten Chinese Characters via Spatial Graph Convolutional Network. arXiv 2020, arXiv:2004.09412. [Google Scholar]
  20. Gan, J.; Chen, Y.; Hu, B.; Leng, J.; Wang, W.; Gao, X. Characters as graphs: Interpretable handwritten Chinese character recognition via Pyramid Graph Transformer. Pattern Recognit. 2023, 137, 109317. [Google Scholar] [CrossRef]
  21. Gan, J.; Wang, W.; Lu, K. A Unified CNN-RNN Approach for in-Air Handwritten English Word Recognition. In Proceedings of the 2018 IEEE International Conference on Multimedia and Expo (ICME), San Diego, CA, USA, 23–27 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. [Google Scholar] [CrossRef]
  22. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. arXiv 2016, arXiv:1412.7062. [Google Scholar]
  23. Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv 2016, arXiv:1511.07122. [Google Scholar]
  24. Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
  25. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2016; Volume 9905, pp. 21–37. [Google Scholar] [CrossRef]
  26. Zhuang, C.; Lu, Z.; Wang, Y.; Xiao, J.; Wang, Y. ACDNet: Adaptively Combined Dilated Convolution for Monocular Panorama Depth Estimation. arXiv 2022, arXiv:2112.14440. [Google Scholar] [CrossRef]
  27. Wang, P.; Chen, P.; Yuan, Y.; Liu, D.; Huang, Z.; Hou, X.; Cottrell, G. Understanding Convolution for Semantic Segmentation. arXiv 2018, arXiv:1702.08502. [Google Scholar]
  28. Ding, X.; Zhang, X.; Zhou, Y.; Han, J.; Ding, G.; Sun, J. Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs. arXiv 2022, arXiv:2203.06717. [Google Scholar]
  29. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
  30. Liu, C.; Yin, F.; Wang, D.; Wang, Q. Online and offline handwritten Chinese character recognition: Benchmarking on new databases. Pattern Recognit. 2013, 46, 155–162. [Google Scholar] [CrossRef]
  31. Yang, W.; Jin, L.; Tao, D.; Xie, Z.; Feng, Z. DropSample: A new training method to enhance deep convolutional neural networks for large-scale unconstrained handwritten Chinese character recognition. Pattern Recognit. 2016, 58, 190–203. [Google Scholar] [CrossRef]
  32. Bai, H.; Zhang, X.W.; Fu, Y.G. Adaptive visualization of extracted digital ink characters in Chinese. Comput. Eng. Appl. 2012, 48, 153–158. [Google Scholar]
  33. Liu, C.-L.; Yin, F.; Wang, D.-H.; Wang, Q.-F. CASIA Online and Offline Chinese Handwriting Databases. In Proceedings of the 2011 International Conference on Document Analysis and Recognition, Beijing, China, 18–21 September 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 37–41. [Google Scholar] [CrossRef]
  34. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020, 128, 336–359. [Google Scholar] [CrossRef]
Figure 1. Examples of Chinese characters with writing errors and non-standardizations from international students.
Figure 2. The steps of trajectory normalization. (a) The character after removing redundant points. (b) The character after horizontal flipping. (c) The character after coordinate normalization.
Figure 3. One-dimensional ResNetDC network architecture. Note that k and s represent the convolution kernel size and convolution stride, respectively.
Figure 4. The structure of the Block 1 module. Note that k, s, and r represent the convolution kernel size, stride, and dilation rate, respectively.
Figure 5. The modeling process of regular one-dimensional convolutional networks.
Figure 6. The modeling process of one-dimensional dilated convolutional networks.
Figure 7. The training loss of 1-D ResNetDC on CASIA-OLHWDB1.0.
Figure 8. The verification accuracy of 1-D ResNetDC on CASIA-OLHWDB1.0.
Figure 9. Effect of pre-training.
Figure 10. Number of samples per category. Note that the black line indicates the number of samples per category in descending order.
Figure 11. Comparison of the validation accuracies of three trajectory representations on CASIA-OLHWDB1.0.
Figure 12. Comparison of the verification accuracies of three trajectory representations on DICCs for international students.
Figure 13. Comparison of training losses for three networks with different numbers of Block modules.
Figure 14. Comparison of the verification accuracies of three networks with different numbers of Block modules.
Figure 15. Comparison of training losses between ordinary convolution and dilated convolution.
Figure 16. Comparison of verification accuracy between ordinary convolution and dilated convolution.
Figure 17. Training loss comparison of different trajectory representations.
Figure 18. Comparison of the verification accuracies of different trajectory representations.
Figure 19. Comparison of the verification accuracy of six-dimensional and ten-dimensional feature representations after pre-training.
Figure 20. Recognition of standard Chinese characters. Note that P represents the predicted category, and T represents the true category.
Figure 21. Recognition of Chinese characters with extra strokes.
Figure 22. Recognition of Chinese characters with missing strokes.
Figure 23. Recognition of Chinese characters with non-standard strokes.
Figure 24. Recognition of altered Chinese characters.
Figure 25. The time steps of concern in the ordinary convolutional network when classifying the Chinese character “典”.
Figure 26. The time steps of concern in the 1-D ResNetDC when classifying the Chinese character “典”.
Figure 27. The points of concern in the ordinary convolutional network when classifying the Chinese character “典”.
Figure 28. The points of concern in the 1-D ResNetDC when classifying the Chinese character “典”.
Table 1. Comparison of 1-D CNN, 2-D CNN, RNN, and GNN approaches for DICCR.

Methods | Data Type | Computing Mechanism | Storage Cost | Data Type Conversion
1-D CNN [7] | Sequence | Parallel | Low | Not needed
2-D CNN [3,4,5,6] | Image | Parallel | High | Sequences to (feature) images
RNN [8,9] | Sequence | Serial | Low | Not needed
GNN [19,20] | Graph | Parallel | Middle | Sequences to graphs
Table 2. The main steps of trajectory normalization.

Steps | Aims
Removing redundant points | Reduce the size of the input data and the amount of computation
Flipping horizontally | Adjust the display of Chinese characters
Coordinate normalization | Normalize the x- and y-coordinates into a standard interval
Table 3. Number of channels in the convolutional layers of the Block modules.

Layer | Block 1 | Block 2 | Block 3 | Block 4
1st layer | 64 | 128 | 256 | 512
2nd layer | 64 | 128 | 256 | 512
3rd layer | 64 | 128 | 256 | 512
4th layer | 64 | 128 | 256 | 512
5th layer | 256 | 512 | 1024 | 2048
Shortcut | 256 | 512 | 1024 | 2048
Table 4. Information about the convolution layers of the Block modules.

Layer | Convolution Kernel Size | Dilation Rate | Dilated Kernel Size | Receptive Field
1 | 1 | 1 | 1 | 1
2 | 3 | 1 | 3 | 3
3 | 3 | 2 | 5 | 7
4 | 3 | 3 | 7 | 13
5 | 1 | 1 | 1 | 13
Table 5. Comparison of recognition results for various approaches on the CASIA-OLHWDB1.0 dataset.

Method | Accuracy on CASIA-OLHWDB1.0 (%)
Traditional approach of MQDF [30] | 95.28
MCDNN [3] | 94.39
DropSample-DCNN [31] | 96.96
1-D ResNetDC | 96.26