Article

Principal Component Analysis-Based Logistic Regression for Rotated Handwritten Digit Recognition in Consumer Devices

Department of Aeronautics and Astronautics, National Cheng Kung University, Tainan 70101, Taiwan
* Author to whom correspondence should be addressed.
Electronics 2023, 12(18), 3809; https://doi.org/10.3390/electronics12183809
Submission received: 5 August 2023 / Revised: 1 September 2023 / Accepted: 5 September 2023 / Published: 8 September 2023
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)

Abstract

Handwritten digit recognition has been used in many consumer electronic devices for a long time. However, we found that the recognition systems used in current consumer electronics are sensitive to image or character rotations. To address this problem, this study builds a low-cost, computationally lightweight handwritten digit recognition system. A Principal Component Analysis (PCA)-based logistic regression classifier is presented, which provides a degree of robustness to digit rotations. To validate the effectiveness of the developed image recognition algorithm, the popular MNIST dataset is used for performance evaluation. Compared with other popular classifiers available in MATLAB, the proposed method achieves better prediction results with a smaller model size, performing 18.5% better than traditional logistic regression. Finally, real-time experiments are conducted to verify the efficiency of the presented method, showing that the proposed system can successfully classify rotated handwritten digits.

1. Introduction

Computer vision has been widely used in many fields to enable image recognition [1], with applications such as automatic car following, license plate recognition, and facial recognition [2,3]. The work in [3] developed a Convolutional Neural Network (CNN)-based recognition system to recognize Peruvian license plates; the resulting model achieved 100% accuracy, 0% failure, 100% sensitivity, and 100% specificity. This is a typical use of computer vision for image recognition. Among its many uses, one of the most common applications in consumer electronics is handwritten digit recognition, such as handwriting input on smartphones or tablets.
A common use of handwritten digit recognition on a typical consumer mobile phone is shown in Figure 1. Given a handwritten character “8”, as illustrated in Figure 1a, the typing system automatically recognizes the digit. However, this experiment shows that recognition succeeds only when the digit is vertically well aligned; it fails when the digit is noticeably rotated, as shown in Figure 1b.
Therefore, the novelty of this research is a handwritten digit recognition system that achieves high accuracy on rotated images while requiring far less computation and memory than methods such as the CNN-based approach in ref. [4]. Unlike CNN-based pattern recognition methods, the proposed method is suitable for implementation in low-cost embedded systems or consumer electronics with restricted storage capacity. To develop such a low-cost system, appropriate methods for image pose correction and image recognition must be chosen.
There are many studies on handwritten image recognition. For example, ref. [5] proposed a Deep Convolutional Self-Organizing Maps (DCSOM) network to learn from unlabeled visual data, and the experimental results showed that DCSOM achieved high accuracy when predicting noisy digits. Ref. [6] proposed an offline handwritten digit recognition system trained on the MNIST dataset; it is CNN-based, which is one of the most common types of computer vision model. Ref. [7] proposed a DeblurGAN-CNN model, a system composed of two networks, to recognize noisy character images; tested on several datasets, it was shown to recognize images with noise. These methods all achieve relatively high accuracy compared with basic methods. However, training such large models often takes a long time, which is one of the most significant disadvantages of using them for image recognition.
Because of these long training and recognition times, many studies have tried to speed up recognition. Ref. [8] focused on detection speed for Synthetic Aperture Radar (SAR) images, improving it with a faster region-based CNN (R-CNN); although its accuracy is roughly the same as that of a conventional R-CNN, its recognition speed is 8 times higher. Ref. [9] developed a CNN-based Range Partition (CRP) method that satisfies the requirements of fast packet classification and online rule updating. As these two studies show, developing recognition systems with shorter training times and faster recognition speeds is now one of the most important objectives.
Among the methods for solving recognition problems, logistic regression is one of the most common [10]. It uses probability to solve the multiple-class classification problem and can therefore be used for recognition. As a basic but powerful method, logistic regression has been used in several studies for different tasks. Ref. [11] used online logistic regression for the parameter estimation problem in place of Sequential Monte Carlo (SMC), while ref. [12] used logistic regression to model the binary dynamical process used to reconstruct a network. These diverse uses show that logistic regression is a basic but powerful modeling algorithm. In this article, it is used to perform digit image recognition.
Another important aspect of digit recognition is image straightening. Inspired by [13,14], the proposed method applies PCA to image rotation to increase the accuracy of handwritten digit identification. These two studies showed that PCA can be used to perform image alignment and that straightening rotated images with PCA improves recognition accuracy. However, PCA suffers from orientation uncertainty: an image aligned via PCA might end up upside down, because PCA determines the axis of the first principal component only up to its sign. To solve this problem, ref. [13] used a method to find the direction and the angle of the rotation. In this research, PCA is used not only to reduce the dimension of the data, but also to find appropriate axes for image rotation. Therefore, the principal components obtained via PCA have a physical meaning and are hard to replace with other feature extraction methods, such as Linear Discriminant Analysis (LDA). LDA can produce better data separation, since it maximizes the separation between classes while minimizing the spread within each class. However, the axes found via LDA are not guaranteed to be orthogonal, which may distort the digits after rotation.
Other methods can also find orthogonal axes and thus avoid distortion, such as Sparse Principal Component Analysis (SPCA) and Multilinear Principal Component Analysis (MPCA). However, they require more computation time and complexity than conventional PCA. Considering real-time applications in electronic devices with limited storage capacity, PCA is the most suitable candidate.
Unlike the paper in ref. [13], the proposed method also uses PCA to reduce the feature dimension for classification, as illustrated in refs. [15,16]. With a reduced data dimension, training can be much faster than with traditional logistic regression. The reduced dimension also means fewer model parameters, which lowers the storage required for implementation. Similarly, although the methods in refs. [17,18] may achieve better identification accuracy than the proposed method, they can be challenging to realize in embedded systems or consumer electronic devices. In contrast, the proposed method can be integrated seamlessly into real-time applications on storage-constrained devices.
Owing to this advantage, PCA has been used in several studies of high-dimensional data. Ref. [19] proposed a deep method based on PCA and DCNN to classify normal and abnormal traffic, while ref. [20] used PCA in many different cases to verify that PCA and random projection can be used to perform dimension reduction. In addition, a short recognition time allows a much higher real-time recognition frequency, which improves the performance of a trained model in real-time tasks.
In this article, an improved PCA that solves the orientation uncertainty of traditional PCA is combined with logistic regression. The model trained via this PCA-based logistic regression performs much better than models trained via traditional logistic regression and Neural Networks (NN).
Experiments show that the model trained via the proposed method can successfully recognize oblique handwritten digits that current mobile phones fail to recognize, such as the case shown in Figure 1. The main advantages of the method are summarized as follows:
  • Better robustness of the trained model. The model trained via the proposed method can recognize rotated images, so it continues to work well when the incoming data are affected by rotation.
  • Faster training and higher real-time recognition frequency. Because PCA reduces the data dimension, the model recognizes an image using far fewer features, which makes it easier to train and reduces the memory required for implementation.
  • Higher accuracy than many other classifiers. The PCA-based logistic regression presented in this article solves the problem of orientation uncertainty, which leads to higher accuracy on the testing data.
  • Suitability for embedded use. Owing to its low computational complexity, the developed handwritten digit recognition algorithm can be realized in an embedded system or integrated into current consumer 3C devices.
Using the proposed PCA-based logistic regression method, the accuracy on the rotated testing set reaches 77.5%. Compared with other algorithms trained via the MATLAB toolbox, the proposed method trains much faster and achieves higher testing accuracy. The experimental results indicate that these contributions make the proposed method a potential tool for real-time recognition.
Section 2.1 presents the research roadmap to give a clear picture of the structure of the proposed method, including the methodology used and the recognition process. Each method is then explained in detail in the rest of Section 2. The experimental platform and the associated results are presented in Section 3 and Section 4, respectively. Further comparison studies are discussed in Section 5. Finally, the conclusions of the article are given in Section 6.

2. Methodology

2.1. Description of Research Roadmap

The structure of the PCA-based logistic regression proposed in this article is shown in Figure 2. The proposed method was divided into two parts: data pre-processing and logistic regression. The first part used PCA to straighten each image, smoothed the image using a convolution filter, and reduced the dimension of the whole dataset. The second part was logistic regression, which efficiently identified the class of each image.
To straighten a rotated image, the coordinates of each pixel on the image were treated as data points. The first and second principal components of all data points in one image were found and used as a new basis. This basis spanned a subspace onto which the data points were mapped. Once the data points were mapped onto the subspace, the digit image composed of these points became straight.
However, there is a downside to using the mapped data points to redraw a digit. Because the new coordinate of a pixel may not be an integer after mapping, the data points deviate from the correct pixels after rounding. This leaves some noise on the straightened digit image. To remove this noise, a convolution filter was used in this study. How the noise type is identified and the kernel is chosen is described in Section 2.6.
With the digit images straightened, the dimension of the whole dataset was reduced via PCA. Each image in MNIST has 28 × 28 = 784 pixels, which is a large number of features for model training, and training a model on all 784 features of 60,000 images is highly inefficient. To make training more efficient, this study used PCA to find the principal components of the whole dataset, and the MNIST dataset was mapped onto the subspace spanned by a sufficient number of principal components.
After this pre-processing, the dataset was used to train a model via logistic regression. The resulting recognition model consisted of the data pre-processing steps and the weighting matrix from logistic regression. The original data were split into training, validation, and testing datasets. The trained model was first evaluated on the validation set to check the result and determine necessary adjustments. Once the validation accuracy was high enough, the model was examined on the testing dataset; the validation set was not reused for testing.
Finally, the model was used to perform real-time digit recognition. The experiment showed that the proposed method could recognize different digits, which makes it a potentially useful tool for consumer electronics.

2.2. Logistic Regression

Compared with linear regression, logistic regression adds a logistic function to the output. The structure of logistic regression is shown in Figure 3: the class with the highest probability is taken as the class to which the input belongs.
The input is multiplied by a weighting matrix, a bias is added, and the result is passed through the logistic (softmax) function to generate a probability-based output [21]. The mathematical expression can be written as follows (the bias can be absorbed into $w_i$ by appending a constant 1 to $x$):
$$ f_{w_i}(x) = s(w_i^T x) = \hat{y}_i, \qquad s(w_i^T x) = \frac{e^{w_i^T x}}{\sum_{k=1}^{m} e^{w_k^T x}} \tag{1} $$
To measure the difference between the output and the target, a loss function is required. The weights can be estimated by maximum likelihood; with $y_{ji} = 1$ if the $j$-th sample belongs to class $i$ and $0$ otherwise, the likelihood of $n$ training samples over $m$ classes is
$$ L(W) = \prod_{j=1}^{n} \prod_{i=1}^{m} s(w_i^T x_j)^{y_{ji}} \tag{2} $$
Taking the negative natural logarithm of the likelihood yields the cross-entropy [22]:
$$ -\ln L(W) = -\sum_{j=1}^{n}\sum_{i=1}^{m} y_{ji} \ln s(w_i^T x_j) = -\sum_{j=1}^{n}\sum_{i=1}^{m} y_{ji}\left( w_i^T x_j - \ln \sum_{k=1}^{m} e^{w_k^T x_j} \right) \tag{3} $$
In other words, maximizing the likelihood is equivalent to minimizing the cross-entropy in (3). Taking the first-order derivative with respect to $w_a$, the weight vector of the $a$-th class to which the $j$-th sample belongs, gives
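$$ \frac{\partial \ln L(W)}{\partial w_a} = \sum_{j=1}^{n}\sum_{i=1}^{m} y_{ji}\left( \frac{\partial\, w_i^T x_j}{\partial w_a} - \frac{\partial}{\partial w_a} \ln \sum_{k=1}^{m} e^{w_k^T x_j} \right) \tag{4} $$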
The two terms in (4) are differentiated separately as follows:
$$ \frac{\partial\, w_i^T x_j}{\partial w_a} = \begin{cases} x_j, & i = a \\ 0, & i \neq a \end{cases} \tag{5} $$
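$$ \frac{\partial}{\partial w_a} \ln \sum_{k=1}^{m} e^{w_k^T x_j} = \frac{1}{\sum_{k=1}^{m} e^{w_k^T x_j}} \frac{\partial}{\partial w_a} \sum_{k=1}^{m} e^{w_k^T x_j} = \frac{x_j\, e^{w_a^T x_j}}{\sum_{k=1}^{m} e^{w_k^T x_j}} = s(w_a^T x_j)\, x_j \tag{6} $$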
Note that $\partial (w_i^T x_j)/\partial w_a = x_j$ only for the $a$-th class; below, $[i = a]$ equals 1 if $i = a$ and 0 otherwise. Substituting both terms back into (4) gives
$$ \frac{\partial \ln L(W)}{\partial w_a} = \sum_{j=1}^{n}\sum_{i=1}^{m} y_{ji}\left( x_j\,[i = a] - s(w_a^T x_j)\, x_j \right) = \sum_{j=1}^{n} y_{ja}\left( 1 - s(w_a^T x_j) \right) x_j = \sum_{j=1}^{n} \left( y_{ja} - s(w_a^T x_j) \right) x_j \tag{7} $$
This is the result for the $a$-th class, to which the $j$-th sample belongs. Taking instead the first-order derivative with respect to $w_b$, the weight vector of a $b$-th class to which the $j$-th sample does not belong, gives
$$ \frac{\partial \ln L(W)}{\partial w_b} = \sum_{j=1}^{n}\sum_{i=1}^{m} y_{ji}\left( x_j\,[i = b] - s(w_b^T x_j)\, x_j \right) = \sum_{j=1}^{n} y_{ja}\left( 0 - s(w_b^T x_j) \right) x_j = \sum_{j=1}^{n} \left( y_{jb} - s(w_b^T x_j) \right) x_j \tag{8} $$
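where $y_{ja} = 1$ and $y_{jb} = 0$. The result has the same form regardless of which class's weight vector is differentiated. Thus, the weights of all classes can be updated together by gradient ascent on $\ln L(W)$ (equivalently, gradient descent on the cross-entropy):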
$$ W_{\text{new}} = W + \eta \sum_{j=1}^{n} x_j \left[ Y_j - s(W^T x_j) \right]^T \tag{9} $$
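where $W = [w_1\ w_2\ \cdots\ w_m]$, $Y_j$ is the one-hot target vector of the $j$-th sample, and $\eta$ is the learning rate. In other words, each weight vector is updated in proportion to the output error, scaled by the learning rate; iterating this update drives the weighting matrix toward one that yields high accuracy.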
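The update in (9) can be illustrated with a minimal pure-Python sketch of multinomial (softmax) logistic regression trained by batch gradient ascent on $\ln L(W)$. It is not the authors' MATLAB implementation: the function names, the learning rate, the epoch count, and the toy three-cluster dataset are assumptions chosen for illustration, and the bias is absorbed by appending a constant 1 to each input.

```python
import math


def softmax(z):
    """Eq. (1): s_i = exp(z_i) / sum_k exp(z_k), shifted by max(z) for numerical stability."""
    zmax = max(z)
    exps = [math.exp(v - zmax) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, x):
    """Return [w_1^T x, ..., w_m^T x]; W is a list of m weight vectors, one per class."""
    return [sum(wk * xk for wk, xk in zip(w, x)) for w in W]


def train(X, labels, m, eta, epochs):
    """Batch gradient ascent on ln L(W), following the update rule of Eq. (9)."""
    Xb = [list(x) + [1.0] for x in X]          # append 1 so the bias lives inside w_i
    d = len(Xb[0])
    W = [[0.0] * d for _ in range(m)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(m)]
        for x, label in zip(Xb, labels):
            s = softmax(class_scores(W, x))
            for i in range(m):
                err = (1.0 if i == label else 0.0) - s[i]   # y_ji - s(w_i^T x_j)
                for k in range(d):
                    grad[i][k] += err * x[k]
        for i in range(m):
            for k in range(d):
                W[i][k] += eta * grad[i][k]
    return W


def predict(W, x):
    """Pick the class with the highest softmax probability."""
    s = softmax(class_scores(W, list(x) + [1.0]))
    return max(range(len(s)), key=lambda i: s[i])


# Hypothetical toy data: three well-separated 2D clusters with labels 0, 1, 2.
X = [(0.0, 0.0), (0.2, 0.1), (3.0, 3.0), (3.1, 2.9), (0.0, 3.0), (0.1, 3.2)]
labels = [0, 0, 1, 1, 2, 2]
W = train(X, labels, m=3, eta=0.03, epochs=3000)
print([predict(W, x) for x in X])
```

On this small, linearly separable toy set the learned weights classify all six training points correctly. For MNIST-sized data, a vectorized numerical library would be the practical choice.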

2.3. Principal Component Analysis

In this article, image pre-processing focuses on applying PCA to each digit image to straighten it and to the whole dataset to reduce the data dimension. To explain how PCA performs image rotation, a two-dimensional dataset is first used as an illustrative example.
Let P be a two-dimensional dataset, with n data points having two features as follows:
$$ P = \begin{bmatrix} p_1 & p_2 & \cdots & p_n \end{bmatrix}_{2 \times n}, \qquad p_i = \begin{bmatrix} x_i \\ y_i \end{bmatrix} \tag{10} $$
where $x_i$ and $y_i$ are the pixel coordinates of a digit along the x-axis and y-axis, respectively. The idea of PCA is to find a new basis for a subspace of the original feature space such that the projected features have the largest variance.
This maximum-variance requirement can be written as
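$$ \arg\max_{\alpha_1 \in \mathbb{R}^{m \times 1}} \sum_{i=1}^{n} \left( \alpha_1^T (p_i - \mu) \right)^2 = \arg\max_{\alpha_1 \in \mathbb{R}^{m \times 1}} \sum_{i=1}^{n} \left( \alpha_1^T p_{c_i} \right)^2 = \arg\max_{\alpha_1 \in \mathbb{R}^{m \times 1}} \sum_{i=1}^{n} \alpha_1^T p_{c_i} p_{c_i}^T \alpha_1 \tag{11} $$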
where $\mu$ is the mean of the data points in (10), $p_{c_i} = p_i - \mu$ is the $i$-th centered data point, and $\alpha_1$ is the first principal axis (here $m = 2$).
Imposing the unit-vector constraint $\alpha_1^T \alpha_1 = 1$ with a Lagrange multiplier $\lambda_1$ gives
$$ \arg\max_{\alpha_1 \in \mathbb{R}^{m \times 1}} \sum_{i=1}^{n} \alpha_1^T p_{c_i} p_{c_i}^T \alpha_1 - \lambda_1 \left( \alpha_1^T \alpha_1 - 1 \right) \tag{12} $$
whose maximum is found by setting the partial derivatives with respect to $\alpha_1$ and $\lambda_1$ to zero:
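$$ \frac{\partial}{\partial \alpha_1} \left[ \sum_{i=1}^{n} \alpha_1^T p_{c_i} p_{c_i}^T \alpha_1 - \lambda_1 \left( \alpha_1^T \alpha_1 - 1 \right) \right] = 0 \;\Rightarrow\; \sum_{i=1}^{n} p_{c_i} p_{c_i}^T \alpha_1 - \lambda_1 \alpha_1 = 0 \tag{13} $$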
and
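$$ \frac{\partial}{\partial \lambda_1} \left[ \sum_{i=1}^{n} \alpha_1^T p_{c_i} p_{c_i}^T \alpha_1 - \lambda_1 \left( \alpha_1^T \alpha_1 - 1 \right) \right] = 0 \;\Rightarrow\; \alpha_1^T \alpha_1 - 1 = 0 \tag{14} $$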
From (13), it follows that
$$ M_c \alpha_1 - \lambda_1 \alpha_1 = 0 \tag{15} $$
where $M_c = \sum_{i=1}^{n} p_{c_i} p_{c_i}^T$ is the covariance matrix (up to a factor of $1/n$), which plays a key role in the PCA algorithm.
To calculate the second principal axis, consider the constraints
$$ \alpha_2^T \alpha_2 = 1, \qquad \alpha_1^T \alpha_2 = 0 \tag{16} $$
Equation (16) requires the principal axes to be mutually orthogonal, which guarantees that shapes and angles are preserved during the rotation.
Based on (16), consider another maximization problem with Lagrange multipliers $\lambda_2$ and $\varphi_1$:
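$$ \arg\max_{\alpha_2 \in \mathbb{R}^{m \times 1}} \alpha_2^T M_c \alpha_2 - \lambda_2 \left( \alpha_2^T \alpha_2 - 1 \right) + \varphi_1 \left( \alpha_1^T \alpha_2 - 0 \right) \tag{17} $$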
Taking the partial derivative of (17) with respect to $\alpha_2$ gives
$$ 2 M_c \alpha_2 - 2 \lambda_2 \alpha_2 + \varphi_1 \alpha_1 = 0 \tag{18} $$
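Left-multiplying both sides of (18) by $\alpha_1^T$, and noting that $\alpha_1^T M_c \alpha_2 = (M_c \alpha_1)^T \alpha_2 = \lambda_1 \alpha_1^T \alpha_2 = 0$ because $M_c$ is symmetric and (15) and (16) hold, gives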
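$$ 2 \alpha_1^T M_c \alpha_2 - 2 \lambda_2 \alpha_1^T \alpha_2 + \varphi_1 \alpha_1^T \alpha_1 = 0 \;\Rightarrow\; \varphi_1 \alpha_1^T \alpha_1 = 0 \;\Rightarrow\; \varphi_1 = 0 \tag{19} $$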
Therefore, from (15) and (18), we can conclude that
$$ M_c \alpha_1 = \lambda_1 \alpha_1 \tag{20} $$
$$ M_c \alpha_2 = \lambda_2 \alpha_2 \tag{21} $$
which is the standard eigenvalue and eigenvector problem.
Thus, the first and second principal axes are eigenvectors of $M_c$ and are orthogonal to each other, so they can be used for 2D image rotation without distorting the pixels. Setting the two eigenvectors with the two largest eigenvalues as the projection vectors therefore guarantees that the two vectors are orthogonal and that the variances after projection are the largest.
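For the two-dimensional case, the eigenvalue problem in (20)-(21) has a closed-form solution. The sketch below is a hypothetical helper, not code from the paper: it builds $M_c$ from a list of pixel coordinates and returns both principal axes, constructing the second axis perpendicular to the first so that the orthogonality constraint (16) holds by construction. The sample coordinates are invented for illustration.

```python
import math


def scatter_matrix(points):
    """Centre the (x, y) points and return (mu, M_c) with M_c = sum_i p_ci p_ci^T."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    a = sum((x - mx) ** 2 for x, _ in points)
    b = sum((x - mx) * (y - my) for x, y in points)
    c = sum((y - my) ** 2 for _, y in points)
    return (mx, my), [[a, b], [b, c]]


def principal_axes_2d(M):
    """Closed-form eigenpairs of a symmetric 2x2 matrix, largest eigenvalue first."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    phi = 0.5 * math.atan2(2.0 * b, a - c)       # angle of the first principal axis
    alpha1 = (math.cos(phi), math.sin(phi))
    alpha2 = (-math.sin(phi), math.cos(phi))     # perpendicular to alpha1, so Eq. (16) holds

    def rayleigh(v):                             # v^T M v equals the eigenvalue for a unit eigenvector
        return a * v[0] ** 2 + 2.0 * b * v[0] * v[1] + c * v[1] ** 2

    return (rayleigh(alpha1), alpha1), (rayleigh(alpha2), alpha2)


# Hypothetical pixel coordinates of a slanted stroke.
pts = [(0, 0), (2, 1), (4, 2.2), (6, 2.9), (8, 4.1)]
mu, M = scatter_matrix(pts)
(l1, a1), (l2, a2) = principal_axes_2d(M)
print(l1, a1)
print(l2, a2)
```

The angle formula $\phi = \tfrac{1}{2}\operatorname{atan2}(2b, a - c)$ always picks the axis of maximum variance, which avoids a general eigen-solver for the per-image 2 × 2 problem.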
For high-dimensional data, PCA can be used to perform dimension reduction. Extending the above derivation to all $m$ principal axes gives
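$$ \begin{bmatrix} M_c \alpha_1 & M_c \alpha_2 & \cdots & M_c \alpha_m \end{bmatrix} = \begin{bmatrix} \lambda_1 \alpha_1 & \lambda_2 \alpha_2 & \cdots & \lambda_m \alpha_m \end{bmatrix} \tag{22} $$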
where
$$ \alpha_i^T \alpha_j = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \tag{23} $$
Recall the singular value decomposition (SVD) of a centered dataset $P_c \in \mathbb{R}^{n \times m}$ whose rows are the centered data points, $P_c = U \Sigma V^T$. Then
$$ M_c = P_c^T P_c = \left( U \Sigma V^T \right)^T \left( U \Sigma V^T \right) = V \Sigma^T U^T U \Sigma V^T = V \Sigma^T \Sigma V^T \tag{24} $$
where $V := [v_1\ v_2\ \cdots\ v_m]$ and $\Sigma^T \Sigma := \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_m)$ contain the singular vectors and singular values of $M_c$, respectively.
Since $V^T V = I$, right-multiplying (24) by $V$ gives
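$$ \begin{bmatrix} M_c v_1 & M_c v_2 & \cdots & M_c v_m \end{bmatrix} = \begin{bmatrix} \sigma_1 v_1 & \sigma_2 v_2 & \cdots & \sigma_m v_m \end{bmatrix} \tag{25} $$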
Comparing (25) with (22) shows that the eigenvalues of the covariance matrix $M_c$ are also its singular values. Moreover, from (24), $M_c$ can be written as a sum of rank-one matrices:
$$ M_c = \sigma_1 v_1 v_1^T + \sigma_2 v_2 v_2^T + \cdots + \sigma_m v_m v_m^T = \sum_{i=1}^{m} \sigma_i v_i v_i^T \tag{26} $$
Equation (26) expresses how much each component contributes to $M_c$: terms with large $\sigma_i$ dominate, while terms with small $\sigma_i$ contribute little. This is the basis of the dimension reduction used later in this paper.
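Equation (26) suggests a simple way to extract the leading components one at a time: find the dominant eigenvector with power iteration, subtract its rank-one term $\sigma_i v_i v_i^T$ (deflation), and repeat. The sketch below is an illustration of (26) under stated assumptions, not the solver used in the paper: it assumes a small symmetric positive semi-definite $M_c$, a fixed iteration count, and a uniform starting vector, and the helper names are hypothetical. For a 784 × 784 matrix, a library SVD would be preferable.

```python
import math


def mat_vec(M, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in M]


def top_components(M, k, iters=500):
    """Return the k leading (sigma_i, v_i) pairs of a symmetric PSD matrix.

    Power iteration finds the dominant eigenvector; its rank-one term sigma_i v_i v_i^T
    from Eq. (26) is then subtracted (deflation) so the next pass finds the next term."""
    m = len(M)
    R = [row[:] for row in M]                 # residual matrix, starts as a copy of M
    comps = []
    for _ in range(k):
        v = [1.0 / math.sqrt(m)] * m          # assumed start vector (must not be orthogonal to v_i)
        for _ in range(iters):
            w = mat_vec(R, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        sigma = sum(a * b for a, b in zip(v, mat_vec(R, v)))   # Rayleigh quotient v^T R v
        comps.append((sigma, v))
        for i in range(m):
            for j in range(m):
                R[i][j] -= sigma * v[i] * v[j]                 # deflation step
    return comps


def project_onto(points, comps):
    """Reduced features z = V_k^T (p - mu) for each point p."""
    m = len(points[0])
    mu = [sum(p[d] for p in points) / len(points) for d in range(m)]
    return [[sum(v[d] * (p[d] - mu[d]) for d in range(m)) for _, v in comps] for p in points]


M = [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 1.0]]
comps = top_components(M, 3)
print([round(s, 4) for s, _ in comps])
```

Keeping only the first few pairs returned by `top_components` and projecting with `project_onto` corresponds to truncating the sum in (26) after the terms with the largest $\sigma_i$.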
Using the first two principal components is equivalent to using the first two eigenvectors as a new basis for a linear transformation. Applying PCA with the first two principal components to dataset $P$ gives the result shown in Figure 4.
The data points have been rotated after applying PCA, which is the key property used to straighten the MNIST images. Note that the second principal component had to be used with its sign reversed to prevent the digits from being mirrored.
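The straightening step can be sketched for a point cloud as follows. The mapping assumes that the first principal axis (the long axis of the digit) is sent to the vertical axis and that the second axis is negated, which makes the 2 × 2 transform a pure rotation (determinant +1) rather than a reflection. This ordering is an assumption consistent with the remark above, not a detail stated in the paper, and the tilted bar is a hypothetical input.

```python
import math


def straighten(points):
    """Rotate a 2D point cloud so that its first principal axis becomes vertical.

    Assumed mapping: x' = -alpha2 . p_c and y' = alpha1 . p_c. Negating alpha2 keeps
    the determinant of the transform at +1, i.e., a rotation rather than a mirror image."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    pc = [(x - mx, y - my) for x, y in points]           # centred points p_ci
    a = sum(x * x for x, _ in pc)
    b = sum(x * y for x, y in pc)
    c = sum(y * y for _, y in pc)
    phi = 0.5 * math.atan2(2.0 * b, a - c)               # first principal axis angle
    a1 = (math.cos(phi), math.sin(phi))
    a2 = (-math.sin(phi), math.cos(phi))
    return [(-(a2[0] * x + a2[1] * y), a1[0] * x + a1[1] * y) for x, y in pc]


# Hypothetical vertical bar, two pixels wide, tilted by 30 degrees.
theta = math.radians(30)
bar = [(dx, float(t)) for t in range(-5, 6) for dx in (-0.5, 0.5)]
tilted = [(x * math.cos(theta) - y * math.sin(theta), x * math.sin(theta) + y * math.cos(theta))
          for x, y in bar]
upright = straighten(tilted)
print(upright[:4])
```

The sign of the first axis is still ambiguous, so the output may come out upside down; this is the orientation-uncertainty problem addressed in Section 2.5.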

2.4. Image Straightening Using PCA

As shown in the previous section, a two-dimensional dataset can be straightened via PCA. Applying PCA to MNIST images gives the result shown in Figure 5. For image rotation in the 2D case, a computational trick that greatly improves computation speed, and is useful for real-time implementation, is presented in Appendix A.
This result shows that an image can be straightened via PCA if each pixel is treated as a data point instead of a feature. However, using PCA to straighten an image involves two problems. The first is orientation uncertainty. Using the principal components found via traditional PCA as the new basis straightens the digit along the direction of the components, but this direction may not be appropriate for some digits. An improved PCA, described in Section 2.5, is proposed to address this. The second problem is that the new data point locations determined via PCA are real-valued rather than integers. Since the image is composed of 28 × 28 pixels, the locations must be integers, and rounding them introduces noise. Therefore, an appropriate smoothing method is needed alongside PCA; image smoothing is discussed in Section 2.6.
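The rounding problem can be reproduced with a few lines of code. The sketch below treats the non-zero pixels of a binary image as $(x, y)$ data points, rotates them about a chosen center (an arbitrary 45° rotation stands in for the PCA mapping), and rounds them back to the 28 × 28 grid. The helper names and the solid test block are hypothetical.

```python
import math


def image_to_points(img):
    """Treat every non-zero pixel as a data point (x = column, y = row)."""
    return [(c, r) for r, row in enumerate(img) for c, v in enumerate(row) if v]


def rotate_points(points, theta, center):
    """Rotate points by theta (radians) about center; the results are real-valued."""
    cx, cy = center
    cs, sn = math.cos(theta), math.sin(theta)
    return [(cx + cs * (x - cx) - sn * (y - cy), cy + sn * (x - cx) + cs * (y - cy))
            for x, y in points]


def points_to_image(points, size=28):
    """Round each point to the nearest pixel; colliding points leave holes elsewhere."""
    img = [[0] * size for _ in range(size)]
    for x, y in points:
        c, r = math.floor(x + 0.5), math.floor(y + 0.5)
        if 0 <= r < size and 0 <= c < size:
            img[r][c] = 1
    return img


# Hypothetical 8 x 8 solid block inside a 28 x 28 image, rotated by 45 degrees.
block = [[1 if 10 <= r < 18 and 10 <= c < 18 else 0 for c in range(28)] for r in range(28)]
pts = image_to_points(block)
rotated = points_to_image(rotate_points(pts, math.radians(45), center=(10, 10)))
print(len(pts), sum(map(sum, rotated)))
```

Because two source pixels can round to the same target pixel, fewer pixels are lit after the rotation, and the missing ones show up as holes in the digit.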

2.5. Improved PCA

Although PCA can be used to straighten an image, traditional PCA still has some problems. Examples of the two types of problems are shown in Figure 6.
The first problem is orientation uncertainty: since the first principal component might not point upward, the straightened image might end up rotated, for example upside down. The second problem concerns horizontal digits: if a digit has a horizontally elongated shape, the image might be rotated by 90 degrees, either leading or lagging.
To solve these problems, the first step is to treat the principal component matrix as a rotation matrix:
$$ U = \begin{bmatrix} u_{1,1} & u_{1,2} \\ u_{2,1} & u_{2,2} \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \tag{27} $$
Since $U$ is the new basis of the projected subspace, it leads to the first issue if the component $u_{2,2}$ is negative. In that case, the matrix is multiplied by $-1$:
$$ u_{2,2} < 0 \;\Rightarrow\; U = \begin{bmatrix} -u_{1,1} & -u_{1,2} \\ -u_{2,1} & -u_{2,2} \end{bmatrix} \tag{28} $$
On the other hand, the pre-processing encounters the second issue if the image is rotated by more than 60 degrees. According to practical observations, handwritten digits are rarely rotated by more than 60 degrees. Furthermore, some digits, such as “6” and “9”, may become similar when the rotation angle is too large. Therefore, it is reasonable to impose a rotation-angle threshold on the proposed method.
Whether the matrix is leading or lagging by 90 degrees can be determined from the value of $\operatorname{atan2}(u_{1,2}, u_{1,1})$. The image is rotated clockwise if the value is positive, meaning that the angle of the rotation matrix should be positive, and vice versa.
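$$ \begin{aligned} \operatorname{atan2}(u_{1,2}, u_{1,1}) > 0 &\;\Rightarrow\; U = U_{CW} \\ \operatorname{atan2}(u_{1,2}, u_{1,1}) < 0 &\;\Rightarrow\; U = U_{CCW} \end{aligned} \tag{29} $$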
Equation (29) shows how the rotation matrix is corrected. Correcting matrices that encounter the second problem in this way allows the image to be straightened correctly. Together, these two matrix corrections improve the performance of PCA: the model using the improved PCA achieves 87.15% training accuracy, whereas the model using traditional PCA achieves only 81.22%.
The improved PCA prevents upright digits from being wrongly rotated during image straightening, which improves model performance compared with traditional PCA.
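The two corrections can be sketched as follows. The text does not define $U_{CW}$ and $U_{CCW}$, so this sketch assumes they are $U$ composed with a 90° rotation in the appropriate direction, assumes the parameterization of (27), and applies the 60° threshold to $\operatorname{atan2}(u_{1,2}, u_{1,1})$. All of these choices, and the function names, are illustrative assumptions rather than the authors' implementation.

```python
import math


def rot_eq27(theta_deg):
    """Rotation matrix in the form assumed for Eq. (27): [[cos, sin], [-sin, cos]]."""
    t = math.radians(theta_deg)
    return [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]


def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def orientation_deg(U):
    """atan2(u_12, u_11) in degrees, the quantity tested in Eq. (29)."""
    return math.degrees(math.atan2(U[0][1], U[0][0]))


def correct_orientation(U, threshold_deg=60.0):
    """Hypothetical sketch of the corrections in Eqs. (28) and (29).

    Assumptions: U_CW = U @ rot_eq27(-90) and U_CCW = U @ rot_eq27(+90), and the
    60-degree threshold is applied to atan2(u_12, u_11)."""
    if U[1][1] < 0:                        # Eq. (28): negate U when u_22 < 0
        U = [[-u for u in row] for row in U]
    angle = orientation_deg(U)
    if angle > threshold_deg:              # Eq. (29), positive branch -> U_CW (assumed)
        U = matmul2(U, rot_eq27(-90.0))
    elif angle < -threshold_deg:           # Eq. (29), negative branch -> U_CCW (assumed)
        U = matmul2(U, rot_eq27(90.0))
    return U


for start in (150.0, 75.0, -75.0, 20.0):
    print(start, round(orientation_deg(correct_orientation(rot_eq27(start))), 6))
```

For example, a matrix at 150° is first flipped by (28) to −30°, while a matrix at 75° exceeds the threshold and is brought back to −15°; a matrix at 20° is left unchanged.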

2.6. Image Smoothing

As mentioned in the previous section, the straightened images contain some noise, which degrades the performance of the model. Therefore, this noise must be removed.
The noise is caused by missing pixels after straightening, as the data points may not lie on integer coordinates after the linear transformation. Rounding the coordinates from real values to integers leaves holes in the digits, because the rounded results are not continuous. These holes appear as noise when the data points are drawn as an image.
Considering the cause and distribution of the noise, it closely resembles pepper noise, which is common in computer vision. Therefore, an ideal filter should be able to remove this kind of noise. Two common filters used for image denoising are the Gaussian filter and the median filter [23].
The way convolution works is shown in Figure 7. A kernel, also known as a filter, must first be chosen. The kernel is moved over the image in a fixed order, normally from left to right and top to bottom. At each position, the kernel and the 3 × 3 patch of the image beneath it are multiplied element by element, and the sum of the products is used as the corresponding pixel value of the new image. A 3 × 3 kernel is the smallest size that can perform image smoothing, which keeps the number of filter parameters small and is consistent with the goal of achieving high handwritten digit recognition accuracy with few parameters.
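The sliding-window operation described above can be written directly. This sketch assumes zero padding at the borders so that the output keeps the input size; because the kernel is not flipped, it is strictly cross-correlation, which coincides with convolution for the symmetric kernels considered here. The function name and the impulse test image are hypothetical.

```python
def filter3x3(img, kernel):
    """Slide a 3x3 kernel over the image (left to right, top to bottom), multiply it
    element-wise with the covered patch, and store the sum of the products.

    Pixels outside the image are treated as 0 (zero padding), so the output has the
    same size as the input."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[dr + 1][dc + 1] * img[rr][cc]
            out[r][c] = acc
    return out


# Hypothetical 5 x 5 image with a single bright pixel, smoothed by a 3 x 3 box kernel.
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 9.0
box = [[1.0 / 9.0] * 3 for _ in range(3)]
out = filter3x3(impulse, box)
for row in out:
    print(row)
```

The box kernel spreads the single bright pixel evenly over its 3 × 3 neighborhood, which is the smoothing effect that the Gaussian kernel below achieves with distance-dependent weights.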
For a 3 × 3 Gaussian filter, the kernel can be written as
$$ K_g = \frac{1}{\sum_{x,y} G(x,y)} \begin{bmatrix} G(-1,-1) & G(0,-1) & G(1,-1) \\ G(-1,0) & G(0,0) & G(1,0) \\ G(-1,1) & G(0,1) & G(1,1) \end{bmatrix} \tag{30} $$
where
$$ G(x,y) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{x^2 + y^2}{2\sigma^2} \right) \tag{31} $$
The kernel (30) is applied to an image by convolution. It acts as a low-pass filter, smoothing the image by giving each neighboring pixel a weight that decreases with its distance from the center.
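Equations (30) and (31) translate into a short function. The value of $\sigma$ is not given in the text, so it is left as a parameter here and $\sigma = 1$ in the usage line is an assumed value. Dividing by the sum makes the weights add up to 1, which is why the $1/(2\pi\sigma^2)$ factor ultimately cancels.

```python
import math


def gaussian_kernel3(sigma):
    """3x3 Gaussian kernel K_g from Eqs. (30)-(31), normalized to sum to 1.

    Rows correspond to y = -1, 0, 1 and columns to x = -1, 0, 1."""
    g = [[math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)
          for x in (-1, 0, 1)]
         for y in (-1, 0, 1)]
    total = sum(map(sum, g))             # sum_{x,y} G(x, y), the normalizer in Eq. (30)
    return [[v / total for v in row] for row in g]


K = gaussian_kernel3(sigma=1.0)          # sigma = 1.0 is an assumed value
for row in K:
    print([round(v, 4) for v in row])
```

The center weight is the largest and the corner weights the smallest, so the kernel blurs isolated dark or bright pixels into intermediate gray values rather than removing them.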
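For the median filter, the 3 × 3 window centered at pixel $(x, y)$ is
$$ K_m = \begin{bmatrix} I_{x-1,y-1} & I_{x,y-1} & I_{x+1,y-1} \\ I_{x-1,y} & I_{x,y} & I_{x+1,y} \\ I_{x-1,y+1} & I_{x,y+1} & I_{x+1,y+1} \end{bmatrix} \tag{32} $$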
where the filtered pixel value is $I_{x,y} = \operatorname{median}(K_m)$; that is, the median of the pixel values covered by the kernel replaces $I_{x,y}$. This avoids over-smoothing the image’s features while removing values that differ greatly from their neighbors. Owing to this property, the median filter is better suited to removing impulse noise.
The image’s appearance when filtered using each filter is shown in Figure 8. The image filtered using the median filter retains more features than that using the Gaussian filter. Therefore, this article chooses the median filter to work with PCA to perform image pre-processing.
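The median filter of (32) can be sketched in the same style. Treating pixels outside the image as 0 is an assumption; with a full 3 × 3 window the median is always the fifth of the nine sorted values. The toy image below, which is hypothetical, contains a one-pixel hole inside a solid stroke and an isolated stray pixel.

```python
def median3x3(img):
    """Replace each pixel I_{x,y} with the median of its 3x3 window K_m, Eq. (32).

    Pixels outside the image are treated as 0 (assumed zero padding), so every
    window has nine values and the median is the fifth after sorting."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [img[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = sorted(window)[4]
    return out


# Hypothetical 9 x 9 binary image: a solid 5 x 5 stroke with a one-pixel hole
# at (3, 3), plus an isolated stray pixel at (7, 7).
img = [[1 if 1 <= r <= 5 and 1 <= c <= 5 else 0 for c in range(9)] for r in range(9)]
img[3][3] = 0
img[7][7] = 1
clean = median3x3(img)
print(clean[3][3], clean[7][7])
```

The hole is filled and the stray pixel is removed, whereas a Gaussian kernel would instead blur both into intermediate gray values.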

3. Experiment Background

3.1. Data Image

Figure 9 shows sample images from MNIST. The MNIST dataset has been widely studied in image recognition, especially for handwritten digits. Therefore, the MNIST dataset is used to train and validate the proposed method in this paper. The training data are divided into a training set and a validation set containing 50,000 and 10,000 images, respectively.
The testing set is used only for the final test, to ensure the objectivity of the testing result. To verify whether the model can recognize rotated digits that differ from those in the training set, the testing set is manually rotated. Each image is composed of 28 × 28 pixels.

3.2. Hardware Specification

The hardware specifications used for the experiment and real-time recognition are shown in Table 1. All models are trained on the same computer for a fair comparison. Although the proposed model was trained on this computer, a much simpler computer or computation unit would be sufficient to train it.

3.3. Real-Time Recognition System

Finally, the model is used to build a real-time handwritten digit recognition system, as shown in Figure 10. The recognition system is built in MATLAB, which handles both the computation and the user interface, and a handwriting board is used for digit writing.
The system is connected as shown in Figure 11. The user writes a digit on the handwriting board, which serves as the input of the recognition system. Both the recognition and the display of the handwritten digit are computed on the same computer.

3.4. Comparison Study Platform

The results of the other comparison methods shown in Table 1 are obtained using the MATLAB Classification Learner toolbox, whose interface is shown in Figure 12. The toolbox can train different methods and optimize their recognition results.

4. Experimental Result

4.1. MNIST Pre-Processing

Before performing logistic regression, image pre-processing is applied to the MNIST images. The pre-processed testing data are shown in Figure 13. The pre-processed digits have a much more reasonable orientation, which makes them easier for the model to recognize.
After image pre-processing, PCA is applied to the whole dataset to perform dimension reduction. Recall from (26) that the data are composed of contributions from all principal components, weighted by their singular values. Therefore, the first few principal components with larger singular values are sufficient to perform recognition. The training results with different numbers of features are shown in Figure 14: training with only 100 features is as accurate as training with all 784 features.
Each principal component, a 784-dimensional vector, can be reshaped into a 28 × 28 image, as shown in Figure 15. The importance of a principal component decreases with its order; the 784th component shows no visible structure at all.
As mentioned, the logistic regression using 100 features is as accurate as that using 784 features. Figure 16 shows the input images along with their reconstructions from subspaces of different dimensions, where m denotes the dimension of the subspace onto which the original 784-dimensional feature space is projected.
The results show that PCA can reduce the data to a sufficiently small dimension, which reduces the training time.
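The reconstructions in Figures 15 and 16 can be written as x̂ = μ + U_m U_mᵀ(x − μ), where μ is the mean image and the columns of U_m are the first m principal components. The sketch below assumes that an orthonormal basis is already available (for example, from the PCA step) and implements this formula together with the reshaping of a 784-dimensional vector into a 28 × 28 image. The function names are illustrative only.

```python
import math


def reconstruct(sample, mean, basis):
    """Rebuild a sample from its projection onto orthonormal basis vectors.

    Implements x_hat = mean + sum_i (u_i . (x - mean)) u_i, i.e. mean + U_m U_m^T (x - mean).
    """
    centered = [x - m for x, m in zip(sample, mean)]
    x_hat = list(mean)
    for u in basis:
        coeff = sum(ui * ci for ui, ci in zip(u, centered))
        x_hat = [xh + coeff * ui for xh, ui in zip(x_hat, u)]
    return x_hat


def squared_error(a, b):
    """Sum of squared differences between two equally long vectors."""
    return math.fsum((x - y) ** 2 for x, y in zip(a, b))


def to_image(vec, side=28):
    """Reshape a flat side*side vector (e.g. a 784-dim component) into a list of rows."""
    if len(vec) != side * side:
        raise ValueError("vector length must equal side * side")
    return [vec[r * side:(r + 1) * side] for r in range(side)]
```

With the full basis (m = 784 for MNIST) the reconstruction is exact; keeping fewer components increases the squared reconstruction error, which is the trade-off visible in Figure 16.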

4.2. MNIST Recognition Result

The final model is trained on both the training and validation sets without rotation, while the testing set is rotated manually. The training set is kept unrotated so that all of the compared methods are trained and evaluated under the same conditions. The experimental results are shown in Figure 17.
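The text states only that the testing set is rotated manually; the rotation angles and the interpolation method are not specified. The following sketch assumes a rotation about the image center with nearest-neighbour sampling by inverse mapping, which is one common way to generate such a rotated test set. `rotate_image` is a hypothetical helper, not part of the authors' code.

```python
import math


def rotate_image(img, angle_deg, fill=0):
    """Rotate a square grayscale image about its center (nearest neighbour, inverse mapping).

    Positive angles rotate counter-clockwise when row 0 is drawn at the top.
    Output pixels whose source location falls outside the image are set to `fill`.
    """
    n = len(img)
    c = (n - 1) / 2.0
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[fill] * n for _ in range(n)]
    for r in range(n):
        for col in range(n):
            # Output pixel in Cartesian coordinates (x right, y up) relative to the center.
            x, y = col - c, c - r
            # Applying the inverse rotation gives the source location of this pixel.
            xs = cos_t * x + sin_t * y
            ys = -sin_t * x + cos_t * y
            src_col = int(round(xs + c))
            src_row = int(round(c - ys))
            if 0 <= src_row < n and 0 <= src_col < n:
                out[r][col] = img[src_row][src_col]
    return out
```

For a 28 × 28 MNIST digit, `rotate_image(digit, 30)` would produce a copy rotated by 30 degrees, with corners that map outside the original image filled by the background value 0. Inverse mapping is used so that every output pixel receives exactly one value and no holes appear.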
The training accuracy is 87.15%, while the testing accuracy is 73.50%. Accuracy is calculated by dividing the number of correctly recognized images by the total number of images in the corresponding set.
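Following the accuracy definition above, the confusion tables in Figure 17 and the reported accuracies can be computed as sketched below. The classes are the digits 0 to 9; rows index the true digit and columns the predicted digit, which is an assumed layout, and the function names are illustrative.

```python
def confusion_matrix(true_labels, predicted, num_classes=10):
    """Count matrix: entry [t][p] is the number of images of digit t predicted as p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    """Correctly recognized images (the diagonal) divided by all images in the set."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total
```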
The results show that even though the model is trained on non-rotated MNIST images, it can still recognize rotated MNIST digit images. This suggests that the model trained via the proposed method is suitable for consumer electronics, which often need to be robust to rotations that never appear in the training data.
The model trained via the proposed method can recognize digits that a mobile phone fails to recognize, as shown in Figure 18. The characters in red blocks are the recognition results of the smartphone, while the digit with the largest probability in the bar plot is the result of the proposed system. The presented method identifies the correct digit in these cases. Therefore, the developed classification scheme is a promising tool for recognizing handwritten digits subject to orientation uncertainties, and it can be further integrated into existing low-cost consumer electronic devices.
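The bar plot in Figure 18 reports one probability per digit, and the digit with the largest probability is taken as the result. Assuming the classifier is a multinomial (softmax) logistic regression on the PCA features, which is consistent with, but not confirmed by, this section, the decision rule can be sketched as follows. The weights and biases are hypothetical placeholders, not a trained model.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the maximum score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_digit(features, weights, biases):
    """One linear score per digit, softmax into probabilities, then take the argmax.

    `weights` holds one weight vector per class and `biases` one float per class
    (hypothetical placeholders for a trained model).
    """
    scores = [sum(w * x for w, x in zip(w_k, features)) + b_k
              for w_k, b_k in zip(weights, biases)]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```

The returned `probs` list corresponds to the bars in the plot, and `best` to the digit the system reports.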

5. Comparison Study

To quantify the improvement over traditional logistic regression, a logistic regression model without image straightening is also used to recognize the rotated images. Its testing accuracy is only 55%, whereas the proposed method reaches 73.5%, an improvement of 18.5 percentage points in rotated digit recognition.
To further validate the effectiveness of the proposed method, it is compared with a range of other classifiers, and the results are summarized in Table 2. All of the models are trained on the original MNIST training data and tested on the rotated MNIST testing data. The experiments show that the PCA-based logistic regression achieves one of the highest testing accuracies while requiring the shortest training time.
As Table 2 shows, most of the models cannot recognize the rotated digits as effectively as the proposed method. Apart from the proposed model, only two models (Subspace K Nearest Neighbor and Support Vector Machine Kernel) achieve testing accuracies above 70%, and their training times are more than 6000 times longer than that of the proposed method. The proposed method therefore achieves competitive recognition accuracy with far fewer parameters and much less training time than these two models, which gives it great potential for systems with limited storage capacity.
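The training-time comparison above can be checked directly against Table 2. The short script below divides the training times of the two other models with testing accuracies above 70% by that of the proposed method; the values are copied from the table, and the names are illustrative.

```python
# Training time (s) and testing accuracy (%) from Table 2, for the proposed method and
# the two other models whose testing accuracy exceeds 70%.
TABLE2 = {
    "PCA-Based Logistic Regression": (15.20, 73.5),
    "Subspace K Nearest Neighbor": (98_531.00, 80.7),
    "Support Vector Machine Kernel": (107_320.00, 75.6),
}
PROPOSED = "PCA-Based Logistic Regression"


def training_time_ratios(table, reference=PROPOSED):
    """Each model's training time divided by the reference model's training time."""
    ref_time = table[reference][0]
    return {name: t / ref_time for name, (t, _) in table.items() if name != reference}


if __name__ == "__main__":
    for name, ratio in training_time_ratios(TABLE2).items():
        print(f"{name}: {ratio:,.0f} times the training time of the proposed method")
```

The ratios come out at about 6482 and 7061, consistent with the statement that both models take more than 6000 times longer to train.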
Figure 19 shows the recognition results for handwritten digits at different rotations. The model successfully recognizes the rotated digits in a real-time application, since it classifies each digit with an extremely simple yet efficient computation.
Because the low-cost recognition system can recognize rotated digits that never appeared during training, users can write digits without being restricted to a fixed orientation.

6. Conclusions

Nowadays, handwritten digit recognition systems are widely used in many kinds of 3C consumer electronics, including mobile phones, laptops, and tablet computers. However, when a handwritten digit is rotated due to human factors, the recognition system may fail to identify it. Therefore, this article proposes a PCA-based logistic regression to solve the digit rotation problem. The proposed method has four main contributions. Firstly, PCA is shown to be able to perform image straightening. This ability improves the practicability of the trained model, as the model can still be used when the image is rotated. Secondly, PCA reduces the dimension of the whole dataset without affecting the accuracy. This leads to a shorter training time and a higher detection frequency, which benefits both logistic regression training and real-time detection. Thirdly, an improved PCA for image orientation correction is presented, which makes the model perform better than a model using traditional PCA. After pre-processing, the improved PCA uses the rotation matrix to decide whether a digit still has an orientation problem and revises the rotation matrix accordingly, so that the digits are straightened more appropriately. These first three contributions improve both the performance of the model trained via logistic regression and its training process. Fourthly, the proposed method yields a more robust model that has the highest accuracy and the shortest training time among the three compared models.
The main limitation of the proposed method is that its alignment process cannot handle digits rotated by more than 60 degrees. However, handwritten digits are rarely rotated by more than 60 degrees. Furthermore, the proposed recognition system achieves high accuracy on artificially rotated digits, as seen in Figure 10. Therefore, it is reasonable for the proposed method to have a rotation angle threshold.
Although some issues remain to be addressed in future work, such as further improving the accuracy, this study shows that PCA is a promising choice of data pre-processing for logistic regression, especially when handwritten digits are subject to rotation. Finally, owing to its low computational complexity, the proposed method can be integrated into current low-cost portable devices to achieve real-time recognition. The attached demonstration (Supplementary Video S1) verifies the effectiveness of the developed system.

Supplementary Materials

The following supporting information can be downloaded via the following link: https://www.mdpi.com/article/10.3390/electronics12183809/s1, Video S1: Real-Time Handwritten Digit Recognition.

Author Contributions

Conceptualization, C.-C.P.; Methodology, C.-C.P.; Software, C.-C.P. and C.-Y.H.; Validation, C.-Y.H.; Formal analysis, C.-C.P. and C.-Y.H.; Investigation, C.-C.P. and Y.-H.C.; Writing—review & editing, C.-C.P. and Y.-H.C.; Supervision, Y.-H.C.; Project administration, C.-C.P.; Funding acquisition, C.-C.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by the Ministry of Science and Technology under grant number MOST 111-2923-E-006-004-MY3.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author, upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Since PCA plays an important role in 2D image pose correction, the following derivation presents a computational shortcut, which is further used in the real-time implementation.
Recall a centered 2D image data set as follows:
$$P_c = \begin{bmatrix} p_{c1} & p_{c2} & \cdots & p_{cN} \end{bmatrix}^T = \begin{bmatrix} x_{c1} & x_{c2} & \cdots & x_{cN} \\ y_{c1} & y_{c2} & \cdots & y_{cN} \end{bmatrix}^T \in \mathbb{R}^{N \times 2} \tag{A1}$$
For (A1), define the 2D covariance matrix
$$S = \frac{1}{N-1} P_c^T P_c \in \mathbb{R}^{2 \times 2} \tag{A2}$$
which can be factorized using SVD as follows:
$$S = U_s \Sigma_s V_s^T \tag{A3}$$
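In this setting, the data set in (A1) consists of the coordinates of the digit's pixels. The sketch below shows one way to build $S$ in (A2) from a small grayscale image. Two choices are assumptions not stated in the paper: foreground pixels are selected with a simple intensity threshold, and the y-coordinate is taken as the negative row index so that the y-axis points upward. The helper names are hypothetical.

```python
def foreground_points(img, threshold=0):
    """(x, y) coordinates of pixels brighter than `threshold`.

    x is the column index and y is the negative row index, so the y-axis points up
    (an assumed convention).
    """
    return [(col, -row)
            for row, values in enumerate(img)
            for col, v in enumerate(values)
            if v > threshold]


def covariance_2x2(points):
    """Center the points (rows of P_c in (A1)) and return S = P_c^T P_c / (N - 1), as in (A2)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two foreground pixels")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    s11 = sum(x * x for x, _ in centered) / (n - 1)
    s22 = sum(y * y for _, y in centered) / (n - 1)
    s12 = sum(x * y for x, y in centered) / (n - 1)
    return [[s11, s12], [s12, s22]]
```

For a stroke running from the top-left to the bottom-right corner, $s_{1,2}$ is negative under this y-up convention, since x increases while y decreases.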
Since $S$ is symmetric and positive semidefinite, $S^T = S$, and, therefore, $U_s = V_s$, where
$$U_s = \begin{bmatrix} u_1 & u_2 \end{bmatrix} = \begin{bmatrix} u_{1,1} & u_{1,2} \\ u_{2,1} & u_{2,2} \end{bmatrix} = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \tag{A4}$$
Hence, there are only two unknown decision variables, since
$$U_s = \begin{bmatrix} u_1 & u_2 \end{bmatrix} = \begin{bmatrix} u_{1,1} & -u_{2,1} \\ u_{2,1} & u_{1,1} \end{bmatrix} \tag{A5}$$
where only $u_{1,1}$ and $u_{2,1}$ need to be solved for. Put simply, $u_2$ is available as soon as $u_1$ has been solved.
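Since $U_s^T = U_s^{-1}$, multiplying (A3) on the right by $U_s$ gives
$$S U_s = U_s \Sigma_s \;\Rightarrow\; S \begin{bmatrix} u_1 & u_2 \end{bmatrix} = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} \;\Rightarrow\; \begin{bmatrix} S u_1 & S u_2 \end{bmatrix} = \begin{bmatrix} \sigma_1 u_1 & \sigma_2 u_2 \end{bmatrix} \tag{A6}$$
Equation (A6) shows that the pairs $(\sigma_1, \sigma_2)$ and $(u_1, u_2)$ are the eigenvalues and eigenvectors of $S$, respectively.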
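Let $S = \begin{bmatrix} s_{1,1} & s_{1,2} \\ s_{1,2} & s_{2,2} \end{bmatrix}$. The eigenvalues (or singular values) can then be easily obtained by solving
$$\det\left(\begin{bmatrix} \sigma - s_{1,1} & -s_{1,2} \\ -s_{1,2} & \sigma - s_{2,2} \end{bmatrix}\right) = 0 \;\Rightarrow\; \sigma^2 - (s_{1,1} + s_{2,2})\,\sigma + s_{1,1} s_{2,2} - s_{1,2}^2 = 0 \tag{A7}$$
which gives
$$\sigma_{1,2} = \frac{\alpha \pm \beta}{2} \tag{A8}$$
where
$$\alpha = s_{1,1} + s_{2,2}, \qquad \beta = \sqrt{(s_{1,1} - s_{2,2})^2 + (2 s_{1,2})^2} \tag{A9}$$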
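The step from (A7) to (A8) and (A9) is the quadratic formula. Expanding its discriminant shows where β comes from and that it is always real, so both eigenvalues are real; because $S$ is a covariance matrix (positive semidefinite), they are also non-negative:

$$\sigma_{1,2} = \frac{(s_{1,1}+s_{2,2}) \pm \sqrt{(s_{1,1}+s_{2,2})^2 - 4\left(s_{1,1}s_{2,2} - s_{1,2}^2\right)}}{2}$$

$$(s_{1,1}+s_{2,2})^2 - 4\left(s_{1,1}s_{2,2} - s_{1,2}^2\right) = s_{1,1}^2 - 2s_{1,1}s_{2,2} + s_{2,2}^2 + 4s_{1,2}^2 = (s_{1,1}-s_{2,2})^2 + (2s_{1,2})^2 = \beta^2 \ge 0$$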
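So far, the eigenvalues (or singular values) are available. Next, the corresponding eigenvectors can be easily obtained from
$$(S - \sigma_1 I)\, u_1 = 0 \;\Rightarrow\; \begin{bmatrix} s_{1,1} - \sigma_1 & s_{1,2} \\ s_{1,2} & s_{2,2} - \sigma_1 \end{bmatrix} \begin{bmatrix} u_{1,1} \\ u_{2,1} \end{bmatrix} = 0 \tag{A10}$$
By inspection, the normalized solution
$$u_1 = \begin{bmatrix} u_{1,1} \\ u_{2,1} \end{bmatrix} = \frac{\begin{bmatrix} s_{1,2} \\ \sigma_1 - s_{1,1} \end{bmatrix}}{\left\| \begin{bmatrix} s_{1,2} \\ \sigma_1 - s_{1,1} \end{bmatrix} \right\|} \tag{A11}$$
satisfies (A10): the first row vanishes identically, and the second row vanishes by (A7). Thus, from (A5),
$$u_2 = \begin{bmatrix} -u_{2,1} \\ u_{1,1} \end{bmatrix} \tag{A12}$$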
Using this equivalent closed-form representation, the singular values and singular vectors can be obtained with simple algebraic computations instead of an SVD, which improves the real-time performance of the system.
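The closed-form procedure (A8) to (A12) can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' real-time MATLAB code. One safeguard is added here as an assumption, not taken from the appendix: because $[\sigma_1 - s_{2,2},\; s_{1,2}]^T$ also satisfies (A10), the sketch normalizes whichever of the two candidate vectors is longer, which avoids the 0/0 case of (A11) when $s_{1,2} = 0$. The angle θ of (A4) is then read off with `atan2`; since an eigenvector is defined only up to sign, θ is determined modulo 180 degrees.

```python
import math


def closed_form_pca_2x2(s11, s12, s22):
    """Eigen-decomposition of S = [[s11, s12], [s12, s22]] without an SVD call.

    Returns (sigma1, sigma2, u1, u2) with sigma1 >= sigma2, following (A8)-(A12).
    """
    alpha = s11 + s22                                      # (A9)
    beta = math.sqrt((s11 - s22) ** 2 + (2.0 * s12) ** 2)  # (A9)
    sigma1 = (alpha + beta) / 2.0                          # (A8), larger root
    sigma2 = (alpha - beta) / 2.0                          # (A8), smaller root
    # (A11): u1 is proportional to [s12, sigma1 - s11]^T.
    cand_a = (s12, sigma1 - s11)
    # Added safeguard (assumption): [sigma1 - s22, s12]^T also solves (A10);
    # normalizing the longer candidate avoids dividing by a near-zero norm.
    cand_b = (sigma1 - s22, s12)
    vx, vy = max(cand_a, cand_b, key=lambda v: math.hypot(*v))
    norm = math.hypot(vx, vy)
    if norm == 0.0:
        u1 = (1.0, 0.0)  # S is a multiple of I: every direction is principal
    else:
        u1 = (vx / norm, vy / norm)
    u2 = (-u1[1], u1[0])                                   # (A12)
    return sigma1, sigma2, u1, u2


def principal_angle_deg(s11, s12, s22):
    """Angle theta of (A4), in degrees, recovered from u1 = [cos(theta), sin(theta)]^T."""
    _, _, u1, _ = closed_form_pca_2x2(s11, s12, s22)
    return math.degrees(math.atan2(u1[1], u1[0]))
```

For instance, for points spread along a line at 30 degrees, $S$ is proportional to $[\cos^2 30°,\ \cos 30° \sin 30°;\ \cos 30° \sin 30°,\ \sin^2 30°]$, and `principal_angle_deg` returns 30 up to rounding, using only a square root, a few divisions, and one `atan2` call.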

References

  1. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4. [Google Scholar]
  2. Salah, A.A.; Alpaydin, E.; Akarun, L. A selective attention-based method for visual pattern recognition with application to handwritten digit recognition and face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 420–425. [Google Scholar] [CrossRef]
  3. Valdeos, M.; Velazco, A.S.V.; Paredes, M.G.P.; Velásquez, R.M.A. Methodology for an automatic license plate recognition system using Convolutional Neural Networks for a Peruvian case study. IEEE Lat. Am. Trans. 2022, 20, 1032–1039. [Google Scholar] [CrossRef]
  4. Ahmed, S.S.; Mehmood, Z.; Awan, I.A.; Yousaf, R.M. A novel technique for handwritten digit recognition using deep learning. J. Sens. 2023, 2023, 2753941. [Google Scholar] [CrossRef]
  5. Aly, S.; Almotairi, S. Deep Convolutional Self-Organizing Map Network for Robust Handwritten Digit Recognition. IEEE Access 2020, 8, 107035–107045. [Google Scholar] [CrossRef]
  6. Li, J.; Sun, G.; Yi, L.; Cao, Q.; Liang, F.; Sun, Y. Handwritten Digit Recognition System Based on Convolutional Neural Network. In Proceedings of the 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA), Shanghai, China, 25–27 August 2020; pp. 739–742. [Google Scholar]
  7. Gonwirat, S.; Surinta, O. DeblurGAN-CNN: Effective Image Denoising and Recognition for Noisy Handwritten Characters. IEEE Access 2022, 10, 90133–90148. [Google Scholar] [CrossRef]
  8. Li, Y.; Zhang, S.; Wang, W.Q. A Lightweight Faster R-CNN for Ship Detection in SAR Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  9. Zhang, X.; Xie, G.; Wang, X.; Zhang, P.; Li, Y.; Salamatian, K. Fast Online Packet Classification with Convolutional Neural Network. IEEE/ACM Trans. Netw. 2021, 29, 2765–2778. [Google Scholar] [CrossRef]
  10. Watt, J.; Borhani, R.; Katsaggelos, A.K. Machine Learning Refined: Foundations, Algorithms, and Applications; Cambridge University Press: Cambridge, UK, 2020. [Google Scholar]
  11. Leong, A.S.; Zamani, M.; Shames, I. A Logistic Regression Approach to Field Estimation Using Binary Measurements. IEEE Signal Process. Lett. 2022, 29, 1848–1852. [Google Scholar] [CrossRef]
  12. Liu, Q.M.; Ma, C.; Xiang, B.B.; Chen, H.S.; Zhang, H.F. Inferring Network Structure and Estimating Dynamical Process from Binary-State Data via Logistic Regression. IEEE Trans. Syst. Man Cybern. Syst. 2021, 51, 4639–4649. [Google Scholar] [CrossRef]
  13. Rehman, H.Z.U.; Lee, S. Automatic image alignment using principal component analysis. IEEE Access 2018, 6, 72063–72072. [Google Scholar] [CrossRef]
  14. Vretos, N.; Nikolaidis, N.; Pitas, I. A model-based facial expression recognition algorithm using Principal Components Analysis. In Proceedings of the 2009 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–10 November 2009; pp. 3301–3304. [Google Scholar]
  15. Garg, I.; Panda, P.; Roy, K. A low effort approach to structured CNN design using PCA. IEEE Access 2019, 8, 1347–1360. [Google Scholar] [CrossRef]
  16. Akbar, M.A.; Ali, A.A.S.; Amira, A.; Bensaali, F.; Benammar, M.; Hassan, M.; Bermak, A. An Empirical Study for PCA- and LDA-Based Feature Reduction for Gas Identification. IEEE Sens. J. 2016, 16, 5734–5746. [Google Scholar] [CrossRef]
  17. Zhong, Y.-W.; Jiang, Y.; Dong, S.; Wu, W.-J.; Wang, L.-X.; Zhang, J.; Huang, M.-W. Tumor radiomics signature for artificial neural network-assisted detection of neck metastasis in patient with tongue cancer. J. Neuroradiol. 2022, 49, 213–218. [Google Scholar] [CrossRef] [PubMed]
  18. Cohen, T.; Welling, M. Group equivariant convolutional networks. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2990–2999. [Google Scholar]
  19. Yao, C.; Yang, Y.; Yin, K.; Yang, J. Traffic Anomaly Detection in Wireless Sensor Networks Based on Principal Component Analysis and Deep Convolution Neural Network. IEEE Access 2022, 10, 103136–103149. [Google Scholar] [CrossRef]
  20. Yang, F.; Liu, S.; Dobriban, E.; Woodruff, D.P. How to Reduce Dimension With PCA and Random Projections? IEEE Trans. Inf. Theory 2021, 67, 8154–8189. [Google Scholar] [CrossRef] [PubMed]
  21. Michalis Titsias RC AUEB. One-vs-each approximation to softmax for scalable estimation of probabilities. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar] [CrossRef]
  22. Mannor, S.; Peleg, D.; Rubinstein, R. The cross entropy method for classification. In Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 7–11 August 2005; pp. 561–568. [Google Scholar]
  23. Brownrigg, D.R. The weighted median filter. Commun. ACM 1984, 27, 807–818. [Google Scholar] [CrossRef]
Figure 1. Handwritten digit recognition system in a mobile phone. The images in red blocks represent the recognition result of the system. (a) The system correctly recognizes the digit. (b) The system fails to recognize the digit subjected to rotations.
Figure 2. Research roadmap.
Figure 3. Structure of logistic regression.
Figure 4. (a) Raw dataset P; (b) dataset expressed in the new basis. The linear transformation is determined by setting the first two principal components (the two arrows) as the new basis.
Figure 5. Image straightening of MNIST using PCA.
Figure 6. Examples of two main types of data in the error set: (a) represents the orientation uncertainty, and (b) represents the horizontal digit problem.
Figure 7. The convolution process of an image.
Figure 8. Result of image smoothing with different filters.
Figure 9. The images of MNIST.
Figure 10. Real-time handwritten digit recognition system.
Figure 11. Real-time handwritten digit recognition system connection.
Figure 12. Snapshot of MATLAB Classification Learner.
Figure 13. Image pre-processing of MNIST. The images on the top are the raw images derived from MNIST, while those at the bottom are the pre-processed images.
Figure 14. Training accuracy with different data dimensions.
Figure 15. Reconstructed PCA image of different components.
Figure 16. Reconstructed images of different dimension subspaces.
Figure 17. Confusion tables: (a) training result and (b) testing result.
Figure 18. Handwritten digit recognition system comparisons.
Figure 19. Result of the model’s real-time application.
Table 1. Hardware specification for experiment.
Item | Information
System | Windows 10 22H2
Software | MATLAB 2023a
CPU | 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30 GHz
RAM | 16 GB
Handwriting Board | Bamboo CTH-470
Table 2. Accuracy comparison between different methods.
Model | Training Time (s) | Validation Acc. (%) | Testing Acc. (%) | Model Size (kB)
Fine Tree | 78.98 | 71.8 | 41.5 | 156
Medium Tree | 73.60 | 61.2 | 34.7 | 115
Coarse Tree | 58.84 | 36.0 | 21.8 | 107
Linear Discriminant | 401.89 | 81.8 | 49.8 | 1000
Quadratic Discriminant | Failed
Efficient Logistic Regression | 631.88 | 83.9 | 59.8 | 6000
Efficient Linear Support Vector Machine | 432.25 | 85.9 | 56.8 | 10,000
Gaussian Naïve Bayes | Failed
Linear Support Vector Machine | 7890.60 | 90.0 | 59.1 | 6000
Quadratic Support Vector Machine | 9528.80 | 94.3 | 66.2 | 220,000
Cubic Support Vector Machine | 12,029.00 | 95.0 | 65.1 | 218,000
Fine Gaussian Support Vector Machine | 94,873.00 | 56.3 | 10.4 | 3,000,000
Medium Gaussian Support Vector Machine | 25,055.00 | 91.7 | 37.7 | 493,000
Coarse Gaussian Support Vector Machine | 27,872.00 | 88.7 | 56.9 | 543,000
Fine K Nearest Neighbor | 7052.60 | 90.9 | 61.2 | 360,000
Medium K Nearest Neighbor | 21,222.00 | 91.2 | 62.3 | 360,000
Coarse K Nearest Neighbor | 25,057.00 | 86.6 | 59.3 | 360,000
Cosine K Nearest Neighbor | 28,134.00 | 90.8 | 57.5 | 360,000
Cubic K Nearest Neighbor | 68,775.00 | 87.6 | 43.3 | 360,000
Weighted K Nearest Neighbor | 36,165.00 | 91.5 | 62.5 | 360,000
Boosted Trees | 28,201.00 | 73.8 | 45.6 | 3000
Bagged Trees | 47,474.00 | 91.9 | 64.2 | 60,000
Subspace Discriminant | 32,226.00 | 32.3 | 51.9 | 75,000
Subspace K Nearest Neighbor | 98,531.00 | 95.2 | 80.7 | 5,000,000
RUSBoosted Trees | 31,726.00 | 65.3 | 35.4 | 3000
Narrow Neural Network | 38,601.00 | 85.8 | 45.5 | 181
Medium Neural Network | 34,970.00 | 89.6 | 49.5 | 274
Wide Neural Network | 36,564.00 | 94.2 | 66.7 | 740
Bilayered Neural Network | 43,035.00 | 85.8 | 46.1 | 183
Trilayered Neural Network | 43,716.00 | 86.1 | 48.2 | 184
Support Vector Machine Kernel | 107,320.00 | 95.3 | 75.6 | 1,600,000
Logistic Regression Kernel | 88,626.00 | 92.6 | 68.7 | 1,600,000
PCA-Based Logistic Regression | 15.20 | 87.2 | 73.5 | 4628
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Peng, C.-C.; Huang, C.-Y.; Chen, Y.-H. Principal Component Analysis-Based Logistic Regression for Rotated Handwritten Digit Recognition in Consumer Devices. Electronics 2023, 12, 3809. https://doi.org/10.3390/electronics12183809

AMA Style

Peng C-C, Huang C-Y, Chen Y-H. Principal Component Analysis-Based Logistic Regression for Rotated Handwritten Digit Recognition in Consumer Devices. Electronics. 2023; 12(18):3809. https://doi.org/10.3390/electronics12183809

Chicago/Turabian Style

Peng, Chao-Chung, Chao-Yang Huang, and Yi-Ho Chen. 2023. "Principal Component Analysis-Based Logistic Regression for Rotated Handwritten Digit Recognition in Consumer Devices" Electronics 12, no. 18: 3809. https://doi.org/10.3390/electronics12183809
