1. Introduction
In recent years, the field of artificial intelligence has made remarkable progress, and studies utilizing artificial intelligence are now conducted and applied in various fields [1,2,3]. In particular, owing to improvements in computing power, the range of applications in computer vision continues to widen, and previous studies have investigated image-text recognition [4,5,6,7]. Similarly, there are studies on optical character recognition (OCR), today's representative technique for detecting text in image data [8,9,10,11]. Text and handwriting recognition began with the recognition of digits and Latin text in the late 1950s and has since expanded to various languages, such as Chinese, Japanese, Arabic, and Persian; methods for recognizing new languages are becoming increasingly accurate [12,13]. Because each language differs in form and writing style, recognition accuracy has been improved through language-specific recognition models. Recently, demand for handwriting recognition, for example in automated mail sorting and electronic memo pads, has increased sharply across industry. In the image recognition field, methods using convolutional neural networks (CNNs), which show outstanding performance, have also been applied to handwriting recognition [14,15].
Ashiquzzaman et al. [16] proposed an efficient CNN-based text recognition method for Arabic handwriting. Their study used data augmentation to improve model accuracy; dropout layers were applied to mitigate overfitting, and the activation function was changed appropriately to overcome the vanishing-gradient problem. Sampath and Gomati [17] proposed a hybrid neural network training algorithm for English handwriting OCR. Their method removed noise from the input image and adjusted its size using a median filter; feature sets and positional and structural descriptors were then extracted from the input image. After feature extraction, the proposed FLM-based neural network recognized the handwritten text. FLM combines the firefly algorithm with the Levenberg–Marquardt (LM) algorithm for neural network training [18]. The FLM-based network was integrated into a feedforward neural network and, depending on the size of the training data, the number of hidden neurons, and the number of hidden layers, achieved 95% accuracy.
Shivakumara et al. [19] proposed a license plate recognition method based on a CNN and a recurrent neural network (RNN). Their study combined a CNN with bidirectional long short-term memory (BLSTM): the CNN, with its strong recognition power, was used for feature extraction, while the BLSTM extracted contextual information based on past information. In addition, dense cluster-based voting was proposed to separate the foreground from the background for license-plate classification. The methods in these studies achieve fast speed and high accuracy on the input data, can objectively and accurately judge complicated images, and can support various services. However, the recognition rate drops, and recognition is limited, when writing styles fall outside a certain standard or when the image is modified. A further disadvantage is that a large volume of labeled data is needed for classification [20,21]. Moreover, it is difficult to maintain spatial and structural consistency in image-segmentation results; these drawbacks cause problems such as unclear image outlines and incomprehensible small-area segmentation. To achieve a high recognition rate in handwriting recognition, deep CNN architectures have been used. However, handwriting recognition is often performed on terminals with limited resources, such as smartphones or tablet PCs, so the memory occupied by the model and its computation speed must be considered [22,23].
Thus, in this paper, we propose a line-segment feature analysis algorithm that reduces the input dimensionality for handwritten text recognition. The proposed method is a dimensionality reduction algorithm that compensates for the issues mentioned above. It extracts points, lines, and faces from the original data using filters obtained by modifying the parameters of a 3 × 3 Laplacian mask. The information extracted through these filters is called a line-segment map (LS-map), and its values are 0 or 1. The LS-map is then convolved with a 3 × 3 filter and a 5 × 5 filter whose parameters are the unique numbers 0, 1, 2, 4, 8, 16, 32, 64, and 128. Through this process, each active area (value 1) of the LS-map is replaced with a unique number, so that every line segment is described by a numerical pattern corresponding to its visual feature (vertical, horizontal, curved, and so on). By summing these patterns, we assign a unique value to each feature, and, by accumulating these values, we generate a one-dimensional vector. This vector has a maximum size of 256 elements owing to the filter parameters. The process is executed once for the 3 × 3 filter and once for the 5 × 5 filter, generating two vectors; merging them yields a one-dimensional vector of 512 elements, which is used as the input data for the learning model. Because it generates new data from the features of line segments, the proposed algorithm is called line-segment feature analysis (LFA).
This paper is organized as follows.
Section 2 describes the handwritten text recognition service using artificial intelligence (AI) and the feature selection algorithm for data dimensionality reduction.
Section 3 describes the line-segment feature analysis algorithm for input dimensionality reduction in machine learning, covering feature extraction for the LS-map, unique number matching with line-segment features (LS-features), and eigenvalue cumulative aggregation.
Section 4 describes the experiment with the proposed algorithm.
Section 5 concludes the paper.
3. Line-Segment Feature Analysis Algorithm for Input Dimensionality Reduction
This algorithm utilizes line-segment features to reduce the data dimensions: it treats points, lines, and faces as the fundamental elements constituting an image. In this study, the line-segment information of an image is passed through a series of operations to create one-dimensional cumulative line-segment data with a size of 512 elements. For this reason, the algorithm is called the line-segment feature analysis algorithm.
Figure 3 shows the flow of the line-segment feature analysis algorithm for input dimensionality reduction. The algorithm identifies and totals line-segment information and creates a series of features through three processes: (a) feature extraction from the handwritten text image, in which features such as points, lines, and faces are extracted through filters; (b) unique number matching with LS-features, in which a series of unique numbers is applied to the extracted LS-maps so that each line segment is assigned a numeric pattern; and (c) eigenvalue cumulative aggregation, in which unique values are generated by summing the assigned numeric patterns and used to build cumulative aggregate data.
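The three processes above can be sketched end-to-end for a single binary image as follows. This is a minimal illustration, not the paper's implementation: the exact layout of the unique-number filters, the binarization threshold, and the helper names are assumptions, while the filter values and the 512-element output follow the description in Sections 3.1–3.3.

```python
import numpy as np

LF = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])      # line (contour) filter
PF = np.array([[1, -1, 1], [-1, 1, -1], [1, -1, 1]])   # point (distortion) filter
K3 = np.array([[1, 2, 4], [8, 0, 16], [32, 64, 128]])  # unique-number filter
K5 = np.zeros((5, 5), dtype=int)
K5[::2, ::2] = K3                                      # zeros between unique numbers

def correlate(x, k):
    """Sliding-window filter, stride 1, valid positions only."""
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1), dtype=int)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = int(np.sum(x[i:i + kh, j:j + kw] * k))
    return out

def lfa(x, threshold=0.5):
    """(a) LS-map extraction, (b) unique-number matching, (c) aggregation."""
    xp = np.pad(x, 1)                                   # padding of 1 keeps size
    ls_map = [(correlate(xp, LF) > threshold).astype(int),   # line map
              (correlate(xp, PF) > threshold).astype(int),   # point map
              x.astype(int)]                                 # face map (input as is)
    vector = []
    for K in (K3, K5):                                  # one pass per filter
        counts = np.zeros(256, dtype=int)
        for fmap in ls_map:
            # each window's weighted sum is a 0-255 eigenvalue; count them
            counts += np.bincount(correlate(fmap, K).ravel(), minlength=256)
        vector.append(counts)
    return np.concatenate(vector)                       # 512-element LFA vector

img = np.zeros((28, 28), dtype=int)
img[4:24, 8] = 1                                        # a vertical stroke
vec = lfa(img)
assert vec.shape == (512,)
```

Each subsection below refines one stage of this sketch.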
3.1. Feature Extraction Process for LFA
Algorithm 1 shows the feature detection algorithm, and
Figure 4 shows the feature extraction process of the LFA algorithm, which is its first step. In this process, the point, line, and face features of the input data are extracted: the original data are decomposed into points, lines, and faces to detect various characteristics. A line refers to basic segment information, a face refers to the original information, and a point is information that distorts a segment; these characteristics are extracted by distorting segments in the subsequent process.
Algorithm 1: Feature Detection Algorithm
Input: [x1, x2, …, xN], Kw, Kh
def Extraction of Feature
  LF = [[0, 1, 0], [1, −4, 1], [0, 1, 0]]
  PF = [[1, −1, 1], [−1, 1, −1], [1, −1, 1]]
  for i from 0 to N do
    for w from 0 to W − Kw do
      for h from 0 to H − Kh do
        for m from 0 to Kw do
          for n from 0 to Kh do
            L(w, h) += xi(w + m, h + n) × LF(m, n)
            P(w, h) += xi(w + m, h + n) × PF(m, n)
    L[L > threshold] = 1; L[L ≤ threshold] = 0
    P[P > threshold] = 1; P[P ≤ threshold] = 0
    LSMap(i) = [L, P, xi]
Output: LSMap
Kw and Kh in Algorithm 1 are the sizes of the filter, and x is the handwritten text image. W and H are the dimensions of x, and ⨂ is a convolution operation. N is the number of images, and the data calculated through the filters are the LSMap.
The process of extracting the point, line, and face characteristics in the LFA is described in Equations (1)–(5). The algorithm uses the PF and LF filters in Equation (1) to extract the point, line, and face features of line segments; these filters extract the point segments, the contour, and the face areas of the original data. By convolving (⨂) each filter with x, the point, line, and face maps are generated, as shown in Equations (2) and (3). The parameters of the LF and PF filters in Equation (1) are modified from the Laplacian filter: the LF filter corresponds to the Laplacian filter {{0, −1, 0}, {−1, 4, −1}, {0, −1, 0}} with its sign inverted, and the PF filter is designed by changing some parameters of the Laplacian filter in order to generate distortion, such as points around the contour of an object.
The features extracted from an image of the uppercase letter 'L' are shown in Figure 5. Figure 5a shows the original data. When the PF filter is applied to (a), the contour of the area is distorted, and the point-segment map presented in Figure 5b is generated. The LF filter extracts the contour of x; the extracted contour is shown in Figure 5c. The face map is the original data applied as is, which is shown in Figure 5d (as shown in Equation (4)). The handwritten-character database used in this study is the Extended Modified National Institute of Standards and Technology (EMNIST) database [44], which consists of binary data, as presented in Figure 5a; therefore, the face data can be obtained without any additional work. The features of each segment extracted in this way are combined to generate the LSMap shown in Equation (5). When the convolution ⨂ in Equations (2) and (3) is executed, the stride and padding are both 1; therefore, each feature map has the same size as the original data, and its internal values are either 0 or 1.
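As a concrete illustration of Algorithm 1 and Figure 5, the sketch below applies the LF and PF filters to a small binary 'L'-shaped image and binarizes the results into an LS-map. The NumPy formulation and the threshold value are assumptions not fixed by the paper; the filter values follow Equation (1).

```python
import numpy as np

LF = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])     # line (contour) filter
PF = np.array([[1, -1, 1], [-1, 1, -1], [1, -1, 1]])  # point (distortion) filter

def correlate_same(x, k):
    """3x3 sliding-window filter with stride 1 and zero padding of 1 (same size)."""
    xp = np.pad(x, 1)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def extract_ls_map(x, threshold=0.5):
    """Build the LS-map [L, P, S] of a binary image x, as in Algorithm 1."""
    L = (correlate_same(x, LF) > threshold).astype(np.uint8)  # contour map
    P = (correlate_same(x, PF) > threshold).astype(np.uint8)  # point map
    S = x.astype(np.uint8)   # face map is the binary input itself (Figure 5d)
    return np.stack([L, P, S])

# A small binary uppercase 'L', as in Figure 5a.
img = np.zeros((9, 9), dtype=np.uint8)
img[1:8, 2] = 1      # vertical stroke
img[7, 2:7] = 1      # horizontal stroke
ls_map = extract_ls_map(img)
assert ls_map.shape == (3, 9, 9)
```

Because stride and padding are 1, each of the three maps keeps the size of the original image, and every value is 0 or 1, matching the description above.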
3.2. Unique Number Matching with LS-Features
In the second process of the LFA, unique numbers are matched to the line-segment information extracted into the LSMap, as shown in Figure 6. Figure 6a shows the characteristics map extracted in Section 3.1, and Figure 6b,c are filters with a series of parameters. These unique numbers are provided so that the information of all segments (e.g., curves, horizontal lines, and vertical lines) constituting the visual data can be expressed as a series of numerical information, which makes it easy to combine. Algorithm 2 shows the unique number matching algorithm. K is the filter (of size 3 × 3 or 5 × 5), and Kw and Kh are the sizes of K. N is the number of LSMaps (the number of images), and LS is the data with unique numbers matched. The unique numbers used are 0, 1, 2, 4, 8, 16, 32, 64, and 128. No two unique numbers are equal, and every combination of them yields a different total; they behave like the place values used to convert a binary number to a decimal number in the engineering field. The LFA algorithm uses a 3 × 3 filter (Figure 6b) with unique numbers for matching normal line-segment data, and a 5 × 5 filter (Figure 6c) with zeros between the unique numbers for extracting distorted line-segment data. The values of the LSMap are 0 or 1; when the filter with unique numbers is matched to the LSMap, only the areas where the LSMap is 1 react and are displayed. This again emulates the conversion of a binary number to a decimal number: the line-segment data are represented by 1s, and a unique number is assigned to each line-segment position.
Algorithm 2: Unique Number Matching Algorithm
Input: LSMap, K, Kw, Kh
def Unique Number Matching
  for i from 0 to N do
    for w from 0 to W − Kw do
      for h from 0 to H − Kh do
        for m from 0 to Kw do
          for n from 0 to Kh do
            L(w, h) += LSMap(i, 0)(w + m, h + n) × K(m, n)
            P(w, h) += LSMap(i, 1)(w + m, h + n) × K(m, n)
            S(w, h) += LSMap(i, 2)(w + m, h + n) × K(m, n)
    LS(i) = [L, P, S]
Output: LS
Table 1 shows some of the types of normal line-segment data that can be extracted by the 3 × 3 filter, that is, some of the values that can be matched and extracted from the LSMap. As shown in Table 1, there are basic line types (e.g., point, vertical, horizontal, curved, and diagonal) and two additional types, the activate area and the non-activate area. In an activate area, all positions in the filter window are filled with 1, so all unique numbers 0, 1, 2, 4, 8, 16, 32, 64, and 128 are matched. In a non-activate area, all positions are filled with 0, and no unique numbers are detected. The center value of the 3 × 3 filter is 0; in other words, even if the position under the center of the filter is active, it contributes nothing, so a window whose only active position is the center is judged to be a non-activate area.
The 5 × 5 filter is formed by inserting zeros between the unique numbers of the 3 × 3 filter. Since the unique numbers are placed in the outermost positions, information on distorted line segments can be obtained: this distorted line-segment technique analyzes the activate areas in the outer positions and extracts line segments without considering the center values of the 5 × 5 filter.
Figure 7 shows the process of extracting an activate area using the 5 × 5 filter: (a) is the input data, (b) shows the 5 × 5 filter, and (c) is the matched result. As shown in Figure 7, the center of the 5 × 5 filter is a black box (consisting of zeros). When input data come in, any 1 located under the black box is ignored; no matter what values lie in that area, they are excluded. As presented in Figure 7c, unique numbers are matched to the activated values around the black-box area, yielding the line segment {1, 16}. Comparing this with the data in Table 1 shows that the segment is diagonal. From Figure 7a alone, partial diagonal and face information would be obtained; with the 5 × 5 filter, only the diagonal information is detected.
Through this process, line-segment information in the input data is extracted in a distorted form. As described in this section, the LFA extracts the LSMap and matches a series of unique numbers to the line segments of the LSMap through the 3 × 3 and 5 × 5 filters. This makes it possible to give each line segment an eigenvalue, namely the total of the matched unique numbers. In the engineering field, a decimal value is calculated by matching place values to the digits of a binary number and adding them up; analogously, the LFA algorithm convolves (with stride and padding of 1) the LSMap with the 3 × 3 filter whose parameters are 0, 1, 2, 4, 8, 16, 32, 64, and 128, and adds up the filter parameters that fall on active areas (areas expressed as 1). In this way, each line segment obtains a particular value, and line segments with the same value can be judged to be of the same type. In other words, by totaling the matched unique numbers, it is possible to identify the types of line segments present in a particular image.
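The place-value analogy above can be checked with a small sketch (the layout of the unique numbers within the filter is an assumption): sliding the 3 × 3 filter over a binary map and summing the weights under active pixels yields one eigenvalue in the range 0–255 per window, exactly like reading a byte from eight bits.

```python
import numpy as np

K3 = np.array([[1, 2, 4], [8, 0, 16], [32, 64, 128]])  # unique-number filter

def eigenvalues(binary_map, K):
    """Total of the unique numbers matched at every valid filter position."""
    kh, kw = K.shape
    H, W = binary_map.shape
    ev = np.zeros((H - kh + 1, W - kw + 1), dtype=int)
    for i in range(ev.shape[0]):
        for j in range(ev.shape[1]):
            ev[i, j] = np.sum(binary_map[i:i + kh, j:j + kw] * K)
    return ev

# A vertical segment activates the weights 2, 0 (zero center), and 64.
vertical = np.array([[0, 1, 0],
                     [0, 1, 0],
                     [0, 1, 0]])
assert eigenvalues(vertical, K3)[0, 0] == 2 + 0 + 64   # eigenvalue 66

# A fully activate area matches every unique number: 1+2+4+...+128 = 255.
assert eigenvalues(np.ones((3, 3), dtype=int), K3)[0, 0] == 255
```

Because the weights are distinct powers of two, every distinct activation pattern of the eight outer positions maps to a distinct eigenvalue, which is what makes the totals usable as type identifiers.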
3.3. Eigenvalue Cumulative Aggregation
The eigenvalues of the line segments extracted by the processes described above differ depending on the type of line segment. In this study, the map of line-segment eigenvalues is called the eigenvalue map. An eigenvalue map includes information on all line segments distributed in the input data, and this section describes the process of aggregating that information (Figure 8). Algorithm 3 shows the cumulative aggregation algorithm. N is the number of LS maps, AD[256] is a one-dimensional array of 256 elements, and {0, …} denotes initializing the array to 0. W and H are the sizes of LS.
Algorithm 3: Cumulative Aggregation Algorithm
Input: LS
def Cumulative Aggregation
  for n from 0 to N do
    AD = {0, …}
    for w from 0 to W do
      for h from 0 to H do
        EV = LS(n)(w, h)
        AD(EV) = AD(EV) + 1
    ADSet(n) = AD
Output: ADSet
//Depending on the size of the filter, the result is aggregated into the 3 × 3 or the 5 × 5 aggregation data.
Up to 256 types of eigenvalues can be measured, because the totals of the matched unique numbers can take at most 256 distinct values (0 to 255); in other words, only a vector with a maximum size of 256 elements can be calculated. The aggregation process is presented in Equation (6).
In Equation (6), AD denotes the aggregation data, whose size is 256, and W and H indicate the size of the LS. As shown in Equation (6), a one-dimensional vector AD of size 256 is generated; the maximum representation range is fixed at 256 types owing to the range of the unique-number totals, as described in Section 3.2, so the size of AD is set to 256. The initial value of the generated AD is 0, and each index of AD is incremented using the internal values of the LS; in other words, AD accumulates counts using each eigenvalue as an index. The calculated AD gives the number of line segments of each type present in the data. The LFA extracts information on exact line segments with the 3 × 3 filter and information on distorted line segments with the 5 × 5 filter, and then aggregates each separately. By concatenating the cumulative aggregate data generated through the two filters, a 1-D vector with a size of 512 elements is obtained.
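The aggregation step can be sketched as a 256-bin histogram (a minimal sketch of Algorithm 3 and Equation (6); the toy eigenvalue maps and the variable names are illustrative, not from the paper):

```python
import numpy as np

def aggregate(ev_map):
    """AD[v] = number of filter positions whose eigenvalue equals v (0-255)."""
    return np.bincount(ev_map.ravel(), minlength=256)

# Toy eigenvalue maps, standing in for the 3x3 and 5x5 filter outputs.
ev3 = np.array([[66, 0], [66, 255]])
ev5 = np.array([[17, 17], [0, 0]])

# Concatenating the two 256-element counters yields the 512-element vector.
lfa_vector = np.concatenate([aggregate(ev3), aggregate(ev5)])
assert lfa_vector.shape == (512,)
assert lfa_vector[66] == 2          # two windows of the same segment type
assert lfa_vector[256 + 17] == 2    # the 5x5 counts fill the second half
```

Because the vector stores only integer counts indexed by eigenvalue, its size is fixed at 512 regardless of the input image, which is the property contrasted with PCA in Section 4.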
4. Experiment and Results
The experiment was performed on 64-bit Windows with an Intel® Core i7-6700 CPU, 16 GB RAM, and an NVIDIA GeForce GTX 1060 6 GB GPU. The data used in the experiments were the handwritten text data of the Extended MNIST (EMNIST) dataset, with a size of 28 × 28; the training set contains 124,800 data points, and the test set contains 20,800 data points. The PCA used in these experiments compressed the data while preserving 99% of the original information, yielding data with a size of 331 elements. This setting was designed to identify features more accurately by using the highest compression rate of PCA, and the test data were generated by adjusting the PCA parameters to match the size of the training data. In this study, the data processed using LFA and PCA were compared and analyzed with k-nearest neighbors (KNN) and support vector machine (SVM), which are representative machine learning techniques. In addition, to increase the reliability of the accuracy measurement, other dimensionality reduction techniques, namely linear discriminant analysis (LDA), independent component analysis (ICA), and t-distributed stochastic neighbor embedding (TSNE), were also evaluated.
Figure 9 shows the size of the resulting data according to the compression rate. The division was made by combining the training and testing data of the EMNIST database, randomly mixing them, and then dividing them into the predetermined training and test set sizes.
The results in Figure 9 show that the training and testing sizes differ for each compression rate. This is attributed to the linear combination, one of the shortcomings of PCA: PCA computes the principal components from the covariance matrix of all inputs and selects the directions with the largest variance as principal components, without considering the features those components represent. Therefore, PCA can yield outputs of different sizes depending on the number and class distribution of the data. One way to resolve this issue is to fix the parameters of PCA, so that it generates data of the same size regardless of the inputs; in this case, the selection of the PCA parameters is critical. In Figure 9, the output data show a large difference of approximately 90 elements between 99% and 98% compression. In contrast, LFA maintains a constant size by counting the number of line segments and generating cumulative aggregate data. In addition, because its internal elements are integers, the data have relatively small storage requirements.
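The size behaviour discussed above can be illustrated with a small NumPy sketch (synthetic data, not the EMNIST experiment): selecting components by a variance ratio makes the output size depend on the data, while a fixed component count does not.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))            # stand-in for flattened images
Xc = X - X.mean(axis=0)                   # center the data

# Principal axes from the SVD of the centered data matrix.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
ratio = (S ** 2) / np.sum(S ** 2)         # explained-variance ratios

# Variance-ratio criterion: the number of kept components depends on X...
k_99 = int(np.searchsorted(np.cumsum(ratio), 0.99) + 1)

# ...whereas a fixed component count gives a constant output size.
Z = Xc @ Vt[:32].T
assert Z.shape == (200, 32)
```

With a different dataset, `k_99` would change while the fixed-size projection would not, which is the trade-off the paragraph above describes.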
Figure 10 shows the capacities of the processed data.
Since LFA derives its outcome by cumulatively summing the unique values of all line segments, it takes only integer values, and its aggregate data are simple count data; therefore, little memory is required to store them. In this experiment, the accuracy of KNN and SVM on the LFA, PCA, LDA, ICA, and TSNE data was measured.
Figure 11 shows the accuracy measured using machine learning. KNN and SVM were used with all parameters set to their default values; evaluating every technique in the same environment allows the accuracy of each dimensionality reduction algorithm to be compared in the default setting. The analysis in Figure 11 shows that the accuracy of every dimensionality reduction algorithm was 90% or above. In terms of accuracy with KNN, LFA showed the highest accuracy (97.5%), followed by PCA (96.6%), ICA (96.2%), TSNE (93.1%), and LDA (92.3%), as shown in Figure 11a. With SVM, LFA again showed the highest accuracy (98.9%), followed by TSNE (96.4%), ICA (96.1%), PCA (93.8%), and LDA (92.1%), as shown in Figure 11b. In conclusion, LFA showed higher accuracy than the other dimensionality reduction algorithms; in particular, with SVM, LFA achieved the highest measured accuracy, 98.9%.
We propose the LFA algorithm, which reduces the size of images through the analysis of line types. On EMNIST in the KNN and SVM environments, the proposed algorithm reduced computation through image-size reduction while achieving higher accuracy than the existing models. We attribute these experimental results to strong classification of characteristics. The LFA algorithm converts the lines forming the shapes of objects in the input images into aggregated data; this preserves the shape characteristics while effectively emphasizing strong characteristics when reducing shape-sensitive data. Letters are shape-sensitive data: in letter recognition, color is an insignificant characteristic, and classification is based only on the shapes of the letters. If the letter "O" is converted through the LFA algorithm, the aggregated values for curved and diagonal forms are greatly enhanced, while other forms show low aggregates. This property is a large advantage in classifying shapes, since the distribution of aggregated values captures the different shape characteristics. For these reasons, when reducing the size of cursive letter images, the linear characteristics of the letters are highlighted, and the size is reduced effectively without loss of characteristics; consequently, the high computational cost of the existing learning model is decreased.
In addition, this study measured precision, recall, accuracy, and the receiver operating characteristic (ROC) curve to evaluate the classification performance [45,46]. Figure 12 and Figure 13 show precision and recall, and the ROC curves are shown in Figure 14. Each performance evaluation shows that LFA achieves high accuracy, precision, and recall, and its ROC curve indicates good performance. Figure 12 and Figure 13 show low performance for some classes (6, 8, and 16); however, the other algorithms also show low classification performance for those classes. Reviewing the ROC curves, LFA outperforms the comparison algorithms with KNN, whereas with SVM, LFA, LDA, and TSNE show similar performance, with high performance in some sections. Taken together, these results show that LFA performs similarly to, and in part better than, the comparison algorithms across both models. Although the algorithm cannot be claimed to outperform the existing dimensionality reduction algorithms outright, it can be evaluated as effective for reducing the dimensionality of image data.
5. Conclusions
In this paper, we proposed a line-segment feature analysis algorithm using input dimensionality reduction for handwritten text recognition. It is a dimensionality reduction algorithm designed to compensate for the problems caused by the linear combinations generated by PCA, which is commonly used for dimensionality reduction. Unlike PCA, which identifies the main features and orthogonally projects the remaining features onto them, the proposed algorithm uses the information of all line segments to generate new features. The algorithm extracts contours, lines, and faces from the given data, identifies the types of line segments, and sums them up, generating a 1-D vector with a size of 512 elements. LFA uses 3 × 3 and 5 × 5 filters, whose parameters are the serial numbers 0, 1, 2, 4, 8, 16, 32, 64, and 128, to extract features from line-segment information. The 3 × 3 filter derives information based on the type of line segment, such as point, vertical, horizontal, and diagonal, while the 5 × 5 filter derives information from distorted line-segment features. Each derived result is a 1-D vector with a size of 256 elements, and, by merging these vectors, LFA derives data with a size of 512 elements.
To evaluate the performance of the proposed algorithm, machine learning (KNN and SVM) with the EMNIST database was used, and LDA, ICA, and TSNE were also evaluated to increase the reliability of the experiment. The results show that LFA achieved 97.5% accuracy with KNN and 98.9% with SVM, whereas PCA achieved 96.6% and 93.8% with KNN and SVM, respectively. These experiments provide insight into the potential of LFA. In future work, we will design an optimal learning model for the LFA algorithm and improve and extend LFA using various data (emotion, face, etc.). In addition, we will continue to enhance LFA so that it can be combined with the various learning neural networks in current use.