Article

Two-Stage Feature Generator for Handwritten Digit Classification

M. Altinay Gunler Pirim, Hakan Tora, Kasim Oztoprak and İsmail Butun

1 Vakifbank, 06200 Ankara, Turkey
2 Department of Avionics, Atilim University, 06830 Ankara, Turkey
3 Department of Computer Engineering, Konya Food and Agriculture University, 42080 Konya, Turkey
4 Department of Computer Engineering, KTH Royal Institute of Technology, SE-114 28 Stockholm, Sweden
5 Department of Computer Engineering, OSTIM Technical University, 06370 Ankara, Turkey
* Author to whom correspondence should be addressed.
Sensors 2023, 23(20), 8477; https://doi.org/10.3390/s23208477
Submission received: 8 June 2023 / Revised: 12 September 2023 / Accepted: 9 October 2023 / Published: 15 October 2023
(This article belongs to the Section Sensing and Imaging)

Abstract

In this paper, a novel feature generator framework is proposed for handwritten digit classification. The proposed framework includes a two-stage cascaded feature generator. The first stage is based on principal component analysis (PCA), which generates projected data on principal components as features. The second one is constructed by a partially trained neural network (PTNN), which uses projected data as inputs and generates hidden layer outputs as features. The features obtained from the PCA and PTNN-based feature generator are tested on the MNIST and USPS datasets designed for handwritten digit sets. Minimum distance classifier (MDC) and support vector machine (SVM) methods are exploited as classifiers for the obtained features in association with this framework. The performance evaluation results show that the proposed framework outperforms the state-of-the-art techniques and achieves accuracies of 99.9815% and 99.9863% on the MNIST and USPS datasets, respectively. The results also show that the proposed framework achieves almost perfect accuracies, even with significantly small training data sizes.

1. Introduction

Pattern recognition typically involves both feature generation and classification. In pattern recognition applications, such as face recognition and digit recognition, a feature extractor aims to find the characteristics of patterns that can discriminate and separate classes. However, variability in the features can make such approaches difficult. For example, even though small within-class variability is desirable in face recognition, varying lighting conditions can lead to differences in features. Similarly, digits written by different people in digit recognition systems can cause variability in features [1,2,3,4,5,6,7]. Hence, determining and using the most effective framework for feature generation and classification is crucial in pattern recognition.
Numerous studies on feature generation and classification have been conducted in the literature. For instance, linear transformation techniques, such as principal component analysis (PCA), singular value decomposition (SVD), independent component analysis, discrete Fourier transform, Hadamard and Haar transforms, and discrete time wavelet transform (DTWT), are used for feature generation [7]. Moreover, neural networks (NNs) are used for classification in numerous studies, such as [2,3,4,5,6,7]. It should be stressed that typical recognition architectures use a single feature extractor followed by a supervised classifier. However, as stated in [8,9], two successive stages of feature generation yield higher accuracies than a one-stage extractor. There are also studies in the literature in which two or more feature extractors are cascaded and the resulting features are used to train a supervised classifier. Even though such approaches are exploited for feature generation and classification, they do not handle within-class variability well, which makes it difficult for them to separate features into different classes.
In this paper, a new feature generation method is presented. The method uses two consecutive feature extractors. The first one generates the projected patterns on the principal components, i.e., the eigenvectors obtained from the covariance matrix of the data via PCA. The second provides the hidden layer outputs of a partially trained neural network (PTNN), where training of the neural network is stopped after a few epochs, i.e., the training is not fully completed. These two generators are cascaded; that is, the outputs of the PCA stage become the inputs of the PTNN. We show that the proposed feature generator reduces within-cluster variability, which makes it much easier to distinguish data from different classes. The original input data are first transformed into a new space referred to as the PCA feature space. Then, the feature space is transformed into another space through a PTNN with one hidden layer containing various numbers of hidden units. We show that a two-stage feature generator is advantageous in terms of the distribution of clusters in the feature space.
It should be stressed here that the framework proposed in this study reduces within-cluster variability compared to state-of-the-art studies. In this way, it becomes much easier to differentiate data from different classes using the framework. In other words, the proposed framework achieves low intra-class and high inter-class variation. In addition to these advantages, and more importantly, the proposed framework achieves the best performance on the MNIST and USPS handwritten digit datasets compared to all the studies in the literature. To assess the clusterability of the features generated using the proposed method, the minimum distance classifier (MDC) and support vector machine (SVM) are used as classifiers.
This paper is organized as follows. Section 2 presents state-of-the-art studies in the literature. The proposed framework is discussed in Section 3. Section 4 discusses the verification of intra-class and inter-class feature distributions. The experimental results are described in Section 5. Finally, Section 6 concludes the study.

2. State of the Art

Several studies on handwritten character recognition have been reported in the literature. For instance, Mellouli et al. [1] proposed a new convolutional neural network (CNN) architecture using morphological filters for digit recognition. The morphological configuration, called Morph-CNN, achieved a test accuracy of 99.66% on the MNIST dataset. Patel et al. proposed a multi-resolution technique using a discrete wavelet transform (DWT)-based approach for handwritten character recognition [10]. The authors used the DWT to extract features and the MDC to classify the output, and their technique achieved an overall success rate of 90.00%. Ayyaz et al. [11] proposed a hybrid feature extraction system based on the SVM. Their system was tested on both handwritten digits and uppercase letters and achieved higher efficiency compared to other methods. Shubhangi et al. [12] proposed a structural micro-feature system based on the SVM to recognize handwritten English characters and digits with a high recognition rate.
Liu et al. [13] proposed an NN-based system, which achieved improved accuracy by discriminative training and achieved a 98.45% recognition rate on the CENPARMI dataset. Suen et al. [14] developed a system to sort and identify cheques and financial documents on the CENPARMI dataset, which achieved a success rate of 98.85%. Lee et al. [15] proposed an offline handwritten digit recognition system for the CEDAR dataset, which achieved a recognition rate of 99.09%. Filatov et al. [16] designed a system based on an address script to identify handwritten postal addresses for US mail on the CEDAR dataset, which achieved a success rate of 99.54%.
In [17], a discriminative cascaded CNN model was used, which achieved an error rate of 0.18% on the MNIST dataset. Ganapathy et al. [18] studied a multiscale NN recognition system. In [19], a single-layer NN with PCA achieved a 98.39% accuracy on the MNIST dataset. In [20], four different techniques, i.e., the PCA, CNN, SVM, and multi-classifier systems, were used to develop a powerful system for handwritten character recognition, which achieved a success rate of 98.50% on the MNIST dataset. In [21], a cascaded PCA, binary hashing, and block-wise histograms were used with a very simple deep learning network for image classification, which achieved a 99.67% recognition rate. In [22], a system based on a multicolumn deep neural network (MCDNN) was developed using 35 pre-trained CNNs, which achieved an error rate of 0.23% on the MNIST dataset. Bruna et al. [9] used an invariant scattering convolution network, which achieved an error rate of 0.43% on the MNIST dataset. Goodfellow et al. [23] used a convolutional maxout network regularized with dropout, which achieved an error rate of 0.45% on the MNIST dataset. Zeiler et al. [24] proposed stochastic pooling on deep CNNs, which achieved an error rate of 0.47% on the MNIST dataset. In [25], a context-dependent deep NN/hidden Markov model was used for large-vocabulary speech recognition. This system was tested on both the MNIST and TIMIT datasets, and it achieved an error rate of 0.83% on the MNIST dataset. Jarrett et al. [8] used large CNNs and achieved an error rate of 0.53% on the MNIST dataset without distortions. Yu et al. [26] used a hierarchical two-layer sparse coding network on pixels, which achieved an error rate of 0.77% on the MNIST dataset. Keysers et al. [27] proposed an image distortion model based on local optimization, which achieved a low error rate of 0.54% on the MNIST dataset. In [28], a scalable generative model based on a convolutional deep belief network was used for unlabeled data from the MNIST dataset, which achieved an error rate of 0.82%.
In [29], pattern recognition using average patterns of categorical k-nearest neighbors was proposed, which achieved error rates of 1.27% and 3.44% on the MNIST and USPS datasets, respectively, using kernel classification on categorical average patterns. In [30], a discriminative-based supervised dictionary learning was developed, which achieved test error rates of 0.60% and 2.40% for the MNIST and USPS datasets, respectively. Error rates of 1.66% and 2.59% were achieved using SVM and KNN, respectively, on the USPS test set in [31]. In [32], perceptron learning of a modified quadratic discriminant function (MQDF) was used to achieve error rates of 1.49% and 2.19% on the MNIST and USPS datasets, respectively, which indicates that discriminative learning of MQDF can further improve MQDF’s performance. Xu et al. [33] presented a nonnegative representation-based classifier for pattern classification, which achieved accuracies of 99% and 95.1% on the MNIST and USPS datasets, respectively. Prasad et al. [34] presented novel features and cascaded KNN and SVM classifiers, resulting in an accuracy of 99.26% on the MNIST dataset.

3. Proposed Framework

Employing appropriate features to classify data can directly influence desired learning results. Therefore, selecting and generating features that are easily separable is vital for accurate classification [1,2,3]. Considering this motivation, a two-stage cascaded feature generator framework is proposed in this study.
In this section, first, a one-stage feature generator, which provides the basis for the proposed framework, is discussed, and then the proposed two-stage feature generator framework of this study is introduced.

3.1. Soft Sensor Implementation for the Feature Generation

The proposed method is implemented using both hardware sensors (cameras, scanners, etc.) and a soft sensor. The former capture the digits, while the latter provides features that no hardware sensor is able to measure. In this study, a soft sensor model was developed to generate features for handwritten digit classification. The soft sensor is realized by two cascaded modules, namely the PCA and the PTNN. The following sections present the details of each module.
Handwritten digit classification (HDC) has found practical applications such as postal automation, bank check automation, and human–computer interaction. Many studies have been conducted for the classification of digits, as mentioned in Section 2. The first and most vital step in the recognition cycle is the collection of handwritten digits from people. There exist various ways to acquire the digits depending on how the digits are generated; therefore, different sensors are utilized for capturing them. While digits written on paper can be recorded by handheld scanners or cameras, digits created in the air can be captured by Kinect cameras, wearable inertial measurement unit (IMU) sensors, and wearable smart gloves and armbands, which rely on capturing hand and finger movements. In addition, a smart pen that exploits inertial force sensors can record the digits [35,36,37,38].

3.2. One-Stage Feature Generator

Figure 1 depicts a one-stage feature generator framework that employs the PCA for feature extraction. As can also be seen from the figure, the one-stage classifier is implemented with either the MDC, which is a simple algorithm, or the SVM, which is a more sophisticated algorithm [11,12]. The MDC calculates the distance between an unknown sample and each class center and assigns the sample to the class whose center has the shortest Euclidean distance.
Algorithm 1, which is presented below, describes the steps to generate the features based on the PCA within this framework.
Algorithm 1: Obtaining principal component (PC)-based features
The data matrix D = (d_1 d_2 … d_N) of size M × N, where d_i represents the i-th sample of the data matrix, i = 1, …, N, and N is the number of examples in the data matrix.
  • S1: Scale the values of the data matrix into [0, 1]. The resulting matrix is called D_s.
  • S2: Calculate the principal components (PCs) of D_s.
  • S3: Select the PCs corresponding to the K highest eigenvalues.
  • S4: Construct the matrix whose columns are the selected principal component vectors (eigenvectors of the covariance matrix of D_s), C = (c_1 c_2 … c_K), of size M × K.
  • S5: Calculate the feature matrix
        F = D_s^T C, of size N × K, whose (i, j)-th entry is the product d_i · c_j.
    Although the algorithm ends in this step, the following step demonstrates the effectiveness of the generated features.
  • S6: Train a classifier (such as the SVM or MDC) with the rows of the matrix F.
It is easy to show that the elements of the matrix F are the projections of each data sample on the principal component vectors. In F, the product d_i · c_j denotes the inner product of the two vectors. Hence, we can express this product as:

⟨d_i, c_j⟩ = ||d_i|| ||c_j|| cos θ        (1)

where θ is the angle between d_i and c_j, i = 1, …, N and j = 1, …, K. Since ||c_j|| = 1, the inner product can be written as:

⟨d_i, c_j⟩ = ||d_i|| cos θ        (2)

Equation (2) represents the projection of d_i onto c_j, that is:

proj_{c_j}(d_i) = ||d_i|| cos θ        (3)
Consequently, the projected data are employed as features to train the selected classifier, which is based on the MDC or SVM.
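For readers who want to reproduce Algorithm 1, the following minimal sketch (in Python with NumPy; the function name, the centering step, and the variable names are our own illustrative choices, not prescribed by the paper) computes the projected-data features F from a data matrix whose columns are the vectorized digit images.

import numpy as np

def pca_features(D, K):
    """Algorithm 1 sketch: project the samples (columns of D, size M x N)
    onto the K leading principal components and return the N x K feature matrix F."""
    Ds = (D - D.min()) / (D.max() - D.min())       # S1: scale the values into [0, 1]
    Dc = Ds - Ds.mean(axis=1, keepdims=True)       # center each row before computing the covariance
    cov = Dc @ Dc.T / (Ds.shape[1] - 1)            # M x M covariance matrix of Ds
    eigvals, eigvecs = np.linalg.eigh(cov)         # S2: eigen-decomposition (ascending eigenvalues)
    C = eigvecs[:, np.argsort(eigvals)[::-1][:K]]  # S3-S4: M x K matrix of the K leading PCs
    F = Ds.T @ C                                   # S5: projected data, one K-dim feature row per sample
    return F, C

Here D could be, for example, a 256 × N matrix of vectorized 16 × 16 USPS digits; the rows of the returned F would then be used in S6 to train the MDC or SVM.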
The variance of the clusters obtained using only the PCA features is very large; hence, the MDC or SVM classifiers cannot reliably separate one cluster from another. This is because the features are sparsely scattered around the center of the cluster (i.e., the distances between samples within the same cluster are large), which degrades the classification performance and yields low success rates.

3.3. Two-Stage Feature Generator

To enhance the performance of the one-stage generator, we propose inserting another transformation operator between the PCA and MDC/SVM modules to form a two-stage feature generator framework. Figure 2 depicts the proposed framework, and Algorithm 2 explains the two-stage feature generator step by step.
The PTNN module in the framework is simply a multilayer perceptron (MLP) [2] with one hidden layer containing a configurable number of neurons. It is structured for the purpose of classification; thus, the outputs of the network correspond to the clusters to be identified, i.e., the number of digits in our test cases. The network is fed by the projected data features. However, the network is not fully trained, but partially trained: training is halted after a few epochs. The epoch errors are still high at that point, which indicates that training is far from complete, and the network cannot yet correctly identify the clusters. Nevertheless, we train the network for these few epochs because we are interested in the behavior of the PTNN at the early stage of training. In summary, the PTNN is simply an MLP without full training, i.e., its training is stopped after a predefined number of epochs.
Figure 3a,b illustrate the mean squared error (MSE) results obtained from the fully trained NN and the PTNN, respectively. The MSE is computed as the mean of the squared differences between the actual output and the estimated output. Figure 3b represents the performance of the neural network at the early stages of training. The results show that the MSE decreases rapidly at the beginning of the training phase and changes slowly until 2000 epochs are reached. Then, it remains almost constant, implying that the NN is fully trained. In total, 60,000 and 10,000 samples are employed during the training and testing phases, respectively, and a test accuracy of 98.58% is achieved [39]. Additionally, when an MLP NN is trained to classify the digits in the MNIST dataset without any feature extraction, the number of epochs required varies from 40 to 50 to achieve test accuracies between 87% and 98% using 60,000 samples for the training set and 10,000 samples for the test set [40,41].
Despite stopping the training at a significantly early stage, if the outputs of the hidden units of the partially trained network are used as features, we find that intra-cluster distances are reduced compared to those in the PCA feature space. On the other hand, the dimension of the feature vectors in the two-stage feature generator is larger than that in the one-stage feature generator (i.e., larger than K). That is, the feature space composed by the two-stage feature generator includes more features than that of the one-stage feature generator. Hence, the proposed approach does not reduce the number of features; however, it improves the accuracy of the classifier.
Algorithm 2 describes the transformation to generate features based on the PCA plus PTNN.
Algorithm 2: Obtaining neural network-based features from projected data on the PCs
The feature matrix F = (f_1 f_2 … f_K).
  • S1: Build an MLP network with one hidden layer and P hidden nodes (neurons).
  • S2: Start training the network to classify the examples represented by the rows of F.
  • S3: Halt training after a few early epochs.
  • S4: Calculate the outputs of the hidden layer
        h_i = sigmoid(F W + b),   i = 1, …, P
    where W is the weight matrix between the input and hidden layers and b is the bias vector of the hidden layer.
  • S5: Construct the hidden layer output matrix H = (h_1 h_2 … h_P), whose size is N × P. Although the algorithm ends in this step, the following step demonstrates the effectiveness of the generated features.
  • S6: Train a classifier (such as the SVM or MDC) with the rows of the matrix H.
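A minimal sketch of Algorithm 2 is given below, assuming scikit-learn is available (the paper itself does not name a library); the helper name ptnn_features is illustrative, and max_iter is used here to emulate halting the training after a few epochs.

import numpy as np
from sklearn.neural_network import MLPClassifier

def ptnn_features(F, y, hidden_nodes=50, epochs=15):
    """Algorithm 2 sketch: partially train a one-hidden-layer MLP on the PCA
    features F (N x K) and return the hidden-layer outputs H (N x P)."""
    net = MLPClassifier(hidden_layer_sizes=(hidden_nodes,),
                        activation='logistic',   # sigmoid hidden units, as in S4
                        solver='sgd',
                        learning_rate_init=0.5,  # learning rate reported in the paper
                        max_iter=epochs)         # S3: halt training after a few epochs
    net.fit(F, y)                                # S1-S3 (a ConvergenceWarning is expected here)
    W, b = net.coefs_[0], net.intercepts_[0]     # input-to-hidden weights and biases
    H = 1.0 / (1.0 + np.exp(-(F @ W + b)))       # S4-S5: H = sigmoid(F W + b)
    return H, net

At test time, the same W and b are reused to map unseen PCA features into the H space before the MDC or SVM is applied.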
The algorithms discussed above are tested on the MNIST and USPS digit datasets to analyze the distance distribution of each digit class in this study. For this purpose, the distances within each class and between classes are calculated, where the Euclidean distance is used as the distance metric. Let d_i and d_j be row vectors in R^N. Then, the Euclidean distance between these two vectors is defined as

L = ||d_j − d_i|| = sqrt[ (d_j1 − d_i1)^2 + … + (d_jN − d_iN)^2 ]        (4)
The within-cluster distances are calculated by Algorithm 3.
Algorithm 3: Calculating the distances among the feature vectors within a digit class
Assume that F_m = (f_1 f_2 … f_S) is the feature matrix for the m-th class, m = 1, 2, …, O, where S is the number of examples in the given class.
  • S1: Calculate the centroid of class m as:
        Fc_m = (1/S) Σ_{s=1}^{S} f_s
  • S2: Calculate the Euclidean distance between each example and the centroid vector as:
        L_mc = ||f_m − Fc_m||
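A short sketch of Algorithm 3 under the same assumptions (NumPy; illustrative names) is shown below; it returns, for each digit class, the centroid and the standard deviation of the sample-to-centroid distances used later in Equation (5).

import numpy as np

def within_class_stats(H, y):
    """Algorithm 3 sketch: for each digit class, compute the centroid of its feature
    vectors and the standard deviation of the sample-to-centroid Euclidean distances."""
    stats = {}
    for m in np.unique(y):
        Fm = H[y == m]                            # feature matrix of class m (S x P)
        Fc = Fm.mean(axis=0)                      # S1: centroid of class m
        dists = np.linalg.norm(Fm - Fc, axis=1)   # S2: distance of every sample to the centroid
        stats[m] = (Fc, dists.std())              # centroid and the sigma_m used in Eq. (5)
    return stats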
The proposed framework is expected to yield smaller intra-cluster distances and larger inter-class distances. This expectation is verified in the following section by considering both the one-stage and two-stage feature generators and the algorithms described above.

4. Verification of Inter and Intra Class Distributions

In this section, the intra-class and inter-class distance distributions are verified using the distance metric presented in Equation (5). To form the metric, first of all, the standard deviation, which indicates how sparsely or densely distributed the distances are within a class, is determined for each class. Then, to quantify the distance between the classes, the separation metric (SM) is formed:
SM = d_ij / ((σ_i + σ_j) / 2)        (5)

where d_ij is the distance between the centers of classes i and j, while σ_i and σ_j are the standard deviations for classes i and j, respectively. This metric represents the degree of separability. The inter-class distances are calculated by Algorithm 4.
Algorithm 4: Calculating the distances between the two-digit classes
Assume that F_m = (f_1 f_2 … f_S) is the feature matrix for the m-th class, m = 1, 2, …, O, where S is the number of examples in the given class.
  • S1: Calculate the centroids of each class in a given dataset.
  • S2: Calculate the distance between the centroids of two classes:
        L_m(m−1) = ||Fc_m − Fc_(m−1)||
In step 2 of Algorithm 4, L_m(m−1) represents the distance of the center of each class from the centers of all other classes. Suppose the following:
  • Case 1: if the distance remains constant and the standard deviations in Equation (5) have small values, then SM becomes higher. Note that a higher SM indicates better separation.
  • Case 2: if the standard deviations in Equation (5) are constant and the distance has high values, then SM becomes higher.
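As an illustration of how the standard deviations of Algorithm 3 and the SM values of Equation (5) could be tabulated (a sketch under the same NumPy assumptions; the function name is ours), the pairwise separation matrix can be computed as follows.

import numpy as np

def separation_matrix(H, y):
    """Sketch of Algorithm 4 and Eq. (5): separation metric SM between every pair
    of digit classes, from class centroids and within-class standard deviations."""
    labels = np.unique(y)
    cents = {m: H[y == m].mean(axis=0) for m in labels}
    sigma = {m: np.linalg.norm(H[y == m] - cents[m], axis=1).std() for m in labels}
    SM = np.zeros((len(labels), len(labels)))
    for a, i in enumerate(labels):
        for b, j in enumerate(labels):
            if i == j:
                continue
            d_ij = np.linalg.norm(cents[i] - cents[j])         # inter-class centroid distance
            SM[a, b] = d_ij / ((sigma[i] + sigma[j]) / 2.0)    # Eq. (5)
    return SM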
These two cases are illustrated in Figure 4. Table 1 and Table 2 show the standard deviations of the within-class distances for each digit class in the USPS dataset, obtained with the one-stage and two-stage feature generators for training sizes of 500 and 1000, respectively. The results show that the standard deviations (i.e., the σ values) using the two-stage feature generator are smaller than those using the one-stage feature generator. This is associated with the fact that samples in a given class are distributed close to the center of the class. A consequence of this is that the data are more separable in the feature space formed by the PCA plus PTNN. In other words, the boundary or volume of each cluster shrinks inward. On the other hand, in the one-stage case, the samples in each class are scattered away from the center of the class, so that large values of standard deviations are obtained. Consequently, the variation within a cluster without the PTNN is higher than that with the PTNN.
Table 3 and Table 4 show the separability values calculated by Equation (5) for the one-stage and two-stage feature generators, respectively. It can be seen that classes scattered in the feature space are more separable in the two-stage case (i.e., the separability increases). In pattern recognition, this is one of the desired requirements for a classifier to classify data accurately. Furthermore, we can obtain the SM ratio by dividing the value for a selected class pair in Table 4 by the corresponding value in Table 3. Once these ratios are calculated, it can be seen from Table 5 that they are mostly greater than 1. Thus, the classes in the feature space built from the PCA plus NN are more separable compared to those in the PCA space.
The same cluster behavior is also observed for the MNIST digit dataset. Table 6 and Table 7 show the standard deviations for the clusters formed with 5000 and 10,000 samples, respectively. It can be seen from the tables that the variation within a cluster with the two-stage extractor is lower than that with the one-stage generator. The separability values for the one-stage and two-stage generators and the SM ratios for the MNIST dataset are shown in Table 8, Table 9 and Table 10, respectively.

5. Results and Discussion

The performance of the proposed feature generator is tested on the MNIST and USPS digit datasets. The USPS handwritten digit dataset is derived from a project on recognizing handwritten digits on envelopes [42]. The digits have sizes of 16 × 16 pixels, and the dataset contains 7291 samples for the training set and 2007 samples for the test set. The standard MNIST dataset is derived from the NIST dataset and was created by LeCun et al. [43]. The digits have sizes of 28 × 28 pixels, and the dataset has 60,000 samples for the training set and 10,000 samples for the test set. Figure 5 and Figure 6 show some examples of the digits from the MNIST and USPS datasets, respectively. The MDC and SVM are utilized to identify the digits in these datasets. The MDC is a simple classifier: in the training phase, the training vectors are separated into classes and the mean vector of each class is computed; in the test phase, the class whose mean is closest to the test vector in terms of the Euclidean distance is predicted. The SVM is much more complex than the MDC. It is capable of forming not only linear but also curved decision boundaries. Thus, more accurate classification can be achieved by finding a maximum-margin separator among the sample points, where the margin is defined as the distance of the decision boundary to the closest sample.
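The MDC described above admits a very small implementation; the following sketch (NumPy, illustrative class name) follows exactly the train-on-class-means, predict-by-nearest-mean procedure.

import numpy as np

class MinimumDistanceClassifier:
    """Sketch of the MDC described above: store the mean of every class during
    training and predict the class whose mean is closest in Euclidean distance."""

    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.means_ = np.vstack([X[y == m].mean(axis=0) for m in self.labels_])
        return self

    def predict(self, X):
        # distances from every test vector to every stored class mean
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.labels_[np.argmin(d, axis=1)]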
In Table 11, the results of the MDC for the USPS digit classes are shown for both the one-stage and two-stage feature generators. We then determine the accuracies for different numbers of eigenvalues (i.e., different K) and different training sizes. The results show that the best recognition rate is achieved using 4000 samples for the training set and 5298 samples for the test set with K = 10. Note that the NN is partly trained for various epochs, i.e., the training is halted in the early stage of iterations. As an example, Table 12 presents the accuracies for K = 10 at different epochs and different training sizes. The table shows that the performance of the PCA plus PTNN (two-stage generator) is higher than that of the one-stage extractor. Moreover, as an example, the performance of the two-stage generator framework with a training size of 500 samples is improved by 2.386 points with reference to the one-stage extractor at an epoch of 15 for the USPS dataset. During the training for each scenario, the learning rate and the number of hidden nodes are set to 0.5 and 50, respectively. Then, hidden layer outputs are extracted from the NN. The mean values of these outputs are calculated for each digit class. For the unseen test data, the hidden layer output is calculated. Then, the Euclidean distances of the test data to the mean values of the digit classes are computed. The test data are classified according to the digit class with the minimum distance. For all the scenarios, two-stage features lead to higher performance than one-stage features. The average test recognition rates for 10 classes are 91.60% and 90.13% at the training size of 4000 for the two-stage and one-stage cases, respectively.
Table 13 presents the performance rates for the MNIST digit classes. As seen, the performance of a two-stage extractor is lower than that of the one-stage extractor for small training sizes. However, an improvement in the performance appears for the full training size of 60,000.
Table 14 and Table 15 show the test success rates of the SVM classifier for the USPS and MNIST datasets, respectively. The SVM experiments are conducted with the RBF kernel function. Although the best performance is obtained using 60,000 samples for the training set and 10,000 samples for the test set, it is clear that small training sizes also result in very high accuracies. The PTNN is trained with a learning rate of 0.50 and 50 hidden nodes.
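Putting the pieces together, the following end-to-end sketch reuses the pca_features and ptnn_features helpers sketched above together with scikit-learn's RBF-kernel SVC. The dataset loader fetch_openml('mnist_784'), the 5000-sample training subset, and K = 8 are illustrative assumptions, not the authors' exact experimental setup.

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Illustrative data loading; fetch_openml is one possible way to obtain MNIST.
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X_train, y_train = X[:5000] / 255.0, y[:5000]             # a small training subset
X_test, y_test = X[60000:] / 255.0, y[60000:]             # the standard 10,000-sample test split

F_train, C = pca_features(X_train.T, K=8)                 # stage 1 (Algorithm 1), samples as columns
H_train, net = ptnn_features(F_train, y_train,
                             hidden_nodes=50, epochs=10)  # stage 2 (Algorithm 2)

F_test = X_test @ C                                       # project test data on the same PCs
W, b = net.coefs_[0], net.intercepts_[0]
H_test = 1.0 / (1.0 + np.exp(-(F_test @ W + b)))          # same hidden-layer mapping as in training

svm = SVC(kernel='rbf').fit(H_train, y_train)             # RBF-kernel SVM on two-stage features
print('test accuracy:', accuracy_score(y_test, svm.predict(H_test)))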
Table 16 lists the accuracies with K = 8 at different epochs and training sizes. The improvements in the performance of the two-stage extractor are clear; for instance, accuracy is increased by 1.5869 points with respect to the one-stage extractor at an epoch of 10 for a training size of 5000.
Although the PTNN is trained on only 5% to 30% of the MNIST and USPS datasets, the proposed method achieves almost perfect performance with the SVM. Furthermore, the performance is acceptable even for a simple MDC. The results show that the proposed approach provides more relevant features for the data. Hence, the classifier achieves much better performance scores, i.e., 99.9863% and 99.9815%, for the USPS and MNIST datasets, respectively. To the best of our knowledge, these are the best performances in the current literature.
Table 17 shows the effectiveness of the two-stage feature extractor. Improvements in the accuracies with respect to the one-stage extractor are clear for each classifier. This shows that the proposed features give the classifiers considerably better generalization ability.
Table 18 and Table 19 show comparisons of the performances of our framework and some state-of-the-art methods on the MNIST and USPS datasets, respectively. The results show that the proposed method outperforms well-known techniques in the literature. Note that the SVM using two-stage features achieves error rates of 0.0185% and 0.0137% for the MNIST and USPS datasets, respectively, which are currently the best performances in the literature.

6. Conclusions and Future Work

In this paper, we proposed a novel framework based on a two-stage feature generator for handwritten digit classification. The first stage of this framework relies on the PCA, which generates the projected data on the eigenvectors corresponding to the K highest eigenvalues. The second stage has been constructed by a PTNN whose training has been halted at early epochs, i.e., it was not fully trained to recognize the input classes. This PTNN has been fed by the projected data on the principal components, and its hidden layer outputs have been selected as new features, which have been used to train the MDC and SVM classifiers.
We evaluated the performance of the proposed method on the MNIST and USPS datasets. For both datasets, the best results are obtained using an SVM classifier. We found that the two-stage feature extractor leads to noticeable improvements in terms of accuracy. Moreover, compared to current state-of-the-art methods, the proposed framework results in almost perfect performance even with small training sizes. In addition, our experiments have shown that the proposed method can achieve error rates of 0.0185% and 0.0137% for the MNIST and USPS datasets, respectively, which can currently be considered the best performances in the literature.
In future work, as an easier but meaningful expansion, sign recognition will be added to the study. As a more complex study, we will use face and texture datasets to further evaluate the usefulness of our proposed framework.

Author Contributions

Conceptualization, H.T., M.A.G.P. and K.O.; methodology, H.T., M.A.G.P., İ.B. and K.O.; software, H.T. and M.A.G.P.; validation, H.T., M.A.G.P. and K.O.; formal analysis, H.T. and K.O.; investigation, H.T., M.A.G.P., K.O. and İ.B.; resources, H.T. and İ.B.; data curation, H.T., M.A.G.P. and K.O.; writing—original draft preparation, M.A.G.P.; writing—review and editing, H.T., K.O. and İ.B.; visualization, H.T. and K.O.; supervision, H.T. and K.O.; project administration, H.T. and K.O.; funding acquisition, İ.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by KTH Royal Institute of Technology (via KTH Library’s Open Access Policy).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Correction Statement

Due to an error in article production, an incorrect author was listed as a corresponding author in the original publication. This manuscript has been updated and this change does not affect the scientific content of the article.

References

  1. Mellouli, D.; Hamdani, T.M.; Sanchez-Medina, J.J.; Ayed, M.B.; Alimi, A.M. Morphological Convolutional Neural Network Architecture for Digit Recognition. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2876–2885. [Google Scholar] [CrossRef] [PubMed]
  2. Bourlard, H.; Kamp, Y. Auto-Association by Multilayer Perceptrons and Singular Value Decomposition. Biol. Cybern. 1988, 59, 291–294. [Google Scholar] [CrossRef] [PubMed]
  3. Lowe, D.; Webb, A.R. Optimized Feature Extraction and the Bayes Decision and Feed-Forward Classifier. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 355–364. [Google Scholar] [CrossRef]
  4. Chatterjee, C.; Roychowdhury, V. On Self-Organizing Algorithms and Networks for Class-Separability Features. IEEE Trans. Neural Netw. 1997, 8, 663–678. [Google Scholar] [CrossRef] [PubMed]
  5. Mao, J.; Jain, A.K. Artificial Neural Networks for Feature Extraction and Multivariate Data Projection. IEEE Trans. Neural Netw. 1995, 6, 296–317. [Google Scholar]
  6. Lee, C.; Landgrebe, D.A. Decision Boundary Feature Extraction for Neural Networks. IEEE Trans. Neural Netw. 1997, 8, 75–83. [Google Scholar]
  7. Theodoridis, S.; Koutroumbas, K. Pattern Recognition; Academic Press: Cambridge, MA, USA, 2006. [Google Scholar]
  8. Jarrett, K.; Kavukcuoglu, K.; Ranzato, M.A.; LeCun, Y. What is the best multi-stage architecture for object recognition. In Proceedings of the IEEE 12th International Conference on Computer Vision, Kyoto, Japan, 29 September–2 October 2009; pp. 2146–2153. [Google Scholar]
  9. Bruna, J.; Mallat, S. Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1872–1886. [Google Scholar] [CrossRef]
  10. Patel, D.K.; Som, T.; Yadav, S.K.; Singh, M.K. Handwritten character recognition using multiresolution technique and Euclidean distance metric. J. Signal Inf. Process. 2012, 3, 208–214. [Google Scholar]
  11. Ayyaz, M.N.; Javed, I.; Mahmood, W. Handwritten character recognition using multiclass SVM classification with hybrid feature extraction. Pak. J. Eng. Appl. Sci. 2012, 10, 57–67. [Google Scholar]
  12. Shubhangi, D.C.; Hiremath, P.S. Handwritten English character and digit recognition using multiclass SVM classifier and using structural micro features. Int. J. Recent Trends Eng. 2009, 2, 193. [Google Scholar]
  13. Liu, C.L.; Nakagawa, M. Handwritten numeral recognition using neural networks: Improving the accuracy by discriminative training. In Proceedings of the Fifth International Conference on Document Analysis and Recognition, Bangalore, India, 20–22 September 1999; pp. 257–260. [Google Scholar]
  14. Suen, C.Y.; Liu, K.; Strathy, N.W. Sorting and recognizing cheques and financial documents. In Document Analysis Systems: Theory and Practice; Springer: Berlin/Heidelberg, Germany, 1999; pp. 173–187. [Google Scholar]
  15. Lee, D.S.; Srihari, S.N. Handprinted digit recognition: A comparison of algorithms. In Proceedings of the Third International Workshop on Frontiers of Handwriting Recognition, Buffalo, NY, USA, 25–27 May 1993; pp. 153–164. [Google Scholar]
  16. Filatov, A.; Nikitin, N.; Volgunin, A.; Zelinsky, P. The Address Script TM recognition system for handwritten envelopes. In Document Analysis Systems: Theory and Practice; Springer: Berlin/Heidelberg, Germany, 1999; pp. 157–171. [Google Scholar]
  17. Pan, S.; Wang, Y.; Liu, C.; Ding, X. A discriminative cascade CNN model for offline handwritten digit recognition. In Proceedings of the 14th IAPR International Conference on Machine Vision Applications (MVA), Tokyo, Japan, 18–22 May 2015; pp. 501–504. [Google Scholar]
  18. Ganapathy, V.; Liew, K.L. Handwritten character recognition using Multi scale neural network training technique. World Acad. Sci. Eng. Technol. 2008, 39, 32–37. [Google Scholar]
  19. Singh, V.; Lal, S. Digit recognition using single layer neural network with PCA. In Proceedings of the Asia-Pacific World Congress on Computer Science and Engineering (APWC on CSE), Nadi, Fiji, 4–5 November 2014; pp. 1–7. [Google Scholar]
  20. Soman, S.T.; Nandigam, A.; Chakravarthy, V.S. An efficient multi classifier system based on convolutional neural network for offline handwritten Telugu character recognition. In Proceedings of the National Conference on Communications (NCC), Delhi, India, 15–17 February 2013; pp. 1–5. [Google Scholar]
  21. Chan, T.H.; Jia, K.; Gao, S.; Lu, J.; Zeng, Z.; Ma, Y. PCANet: A simple deep learning baseline for image classification. IEEE Trans. Image Process. 2015, 24, 5017–5032. [Google Scholar] [CrossRef] [PubMed]
  22. Ciresan, D.; Meier, U.; Schmidhuber, J. Multi-column deep neural networks for image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3642–3649. [Google Scholar]
  23. Goodfellow, I.J.; Farley, D.W.; Mirza, M.; Courville, A.; Bengio, Y. Maxout networks. In Proceedings of the 30th ICML, Atlanta, GA, USA, 16–21 June 2013; pp. 1–9. [Google Scholar]
  24. Zeiler, M.D.; Fergus, R. Stochastic pooling for regularization of deep convolutional neural networks. In Proceedings of the ICLR, Scottsdale, AZ, USA, 2–4 May 2013. [Google Scholar]
  25. Deng, L.; Yu, D. Deep convex network: A scalable architecture for speech pattern classification. In Proceedings of the International Speech Communication Association, Florence, Italy, 27–31 August 2011; pp. 2285–2288. [Google Scholar]
  26. Yu, K.; Lin, Y.; Lafferty, J. Learning image representations from the pixel level via hierarchical sparse coding. In Proceedings of the IEEE Conference CVPR, Colorado Springs, CO, USA, 20–25 June 2011; pp. 1713–1720. [Google Scholar]
  27. Keysers, D.; Deselaers, T.; Gollan, C.; Ney, H. Deformation models for image recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1422–1435. [Google Scholar] [CrossRef] [PubMed]
  28. Lee, H.; Grosse, R.; Ranganath, R.; Ng, A.Y. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In Proceedings of the 26th Annual ICML, Montreal, QC, Canada, 14–18 June 2009; pp. 609–616. [Google Scholar]
  29. Hotta, S.; Kiyasu, S.; Miyahara, S. Pattern recognition using average patterns of categorical k-nearest neighbors. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR), Cambridge, UK, 26 August 2004; pp. 412–415. [Google Scholar]
  30. Mairal, J.; Bach, F.; Ponce, J.; Sapiro, G.; Zisserman, A. Supervised dictionary learning. In Proceedings of the Advances in Neural Information Processing Systems NIPS, Vancouver, BC, Canada, 8–11 December 2008. [Google Scholar]
  31. Zhang, H.; Berg, A.C.; Maire, M.; Malik, J. SVM-KNN: Discriminative nearest neighbor classification for visual category recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006; pp. 2126–2136. [Google Scholar]
  32. Su, T.H.; Liu, C.L.; Zhang, X.Y. Perceptron Learning of Modified Quadratic Discriminant Function. In Proceedings of the International Conference on Document Analysis and Recognition, Beijing, China, 18–21 September 2011; pp. 1007–1011. [Google Scholar]
  33. Xu, J.; An, W.; Zhang, L.; Zhang, D. Sparse, collaborative, or nonnegative representation: Which helps pattern classification. Pattern Recognit. 2019, 88, 679–688. [Google Scholar] [CrossRef]
  34. Prasad, B.K.; Sanyal, G. Novel features and a cascaded classifier based Arabic numerals recognition system. Multidimens. Syst. Signal Process. 2018, 29, 321–338. [Google Scholar] [CrossRef]
  35. Zhang, Y.; Li, Z.; Yang, Z.; Yuan, B.; Liu, X. Air-GR: An Over-the-Air Handwritten Character Recognition System Based on Coordinate Correction YOLOv5 Algorithm and LGR-CNN. Sensors 2023, 23, 1464. [Google Scholar] [CrossRef]
  36. Chen, M.; Lin, J.; Zou, Y.; Wu, K. Acoustic Sensing Based on Online Handwritten Signature Verification. Sensors 2022, 22, 9343. [Google Scholar] [CrossRef]
  37. Campos, C.; Sandak, J.; Kljun, M.; Čopič Pucihar, K. The Hybrid Stylus: A Multi-Surface Active Stylus for Interacting with and Handwriting on Paper, Tabletop Display or Both. Sensors 2022, 22, 7058. [Google Scholar] [CrossRef]
  38. Alemayoh, T.T.; Shintani, M.; Lee, J.H.; Okamoto, S. Deep-Learning-Based Character Recognition from Handwriting Motion Data Captured Using IMU and Force Sensors. Sensors 2022, 22, 7840. [Google Scholar] [CrossRef] [PubMed]
  39. Pirim, M.A.G. Neural Network Based Feature Extraction for Handwritten Digit Recognition. Ph.D. Thesis, Atilim University, Ankara, Turkey, 2017. [Google Scholar]
  40. Hou, Y.; Zhao, H. Handwritten Digit Recognition Based on Depth Neural Network. In Proceedings of the International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), Okinawa, Japan, 24–26 November 2017; pp. 35–38. [Google Scholar]
  41. Bettilyon, T.E. How to Classify MNIST Digits with Different Neural Network Architectures. 2018. Available online: https://medium.com/tebs-lab/how-to-classify-mnist-digits-with-different-neural-network-architectures-39c75a0f03e3 (accessed on 25 May 2022).
  42. Hull, J.J. A database for Handwritten Text Recognition Research. IEEE Trans. Pattern Anal. Mach. Intell. 1994, 16, 550–554. [Google Scholar] [CrossRef]
  43. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  44. Tax, D.M.J.; Laskov, P. Online SVM learning from classification to data description and back. In Proceedings of the IEEE 13th Workshop on Neural Networks for Signal Processing (IEEE Cat. No. 03TH8718), Toulouse, France, 17–19 September 2003. [Google Scholar]
Figure 1. Structure of one-stage feature generator with handwritten digit inputs.
Figure 2. Structure of two-stage feature generator with handwritten digit inputs.
Figure 3. MSE of fully (a) and partially (b) trained neural network.
Figure 4. Representative distribution of features in space with high and low standard deviations.
Figure 5. Samples of digits from the MNIST dataset.
Figure 6. Samples of digits from the USPS dataset.
Table 1. Standard deviations for each digit class in USPS with a training size of 500.
Stage/Cluster | Digit 0 | Digit 1 | Digit 2 | Digit 3 | Digit 4 | Digit 5 | Digit 6 | Digit 7 | Digit 8 | Digit 9
One stage | 0.7507 | 0.5342 | 0.6096 | 0.6058 | 0.8084 | 0.7281 | 0.7214 | 0.8974 | 0.5227 | 0.8824
Two stage (50 epoch) | 0.3003 | 0.3133 | 0.4270 | 0.2696 | 0.3627 | 0.4006 | 0.3733 | 0.5240 | 0.3243 | 0.3276
Two stage (30 epoch) | 0.3272 | 0.3248 | 0.3920 | 0.3023 | 0.3238 | 0.4337 | 0.3691 | 0.5604 | 0.2510 | 0.2918
Two stage (25 epoch) | 0.3439 | 0.2900 | 0.3940 | 0.2795 | 0.3075 | 0.3657 | 0.3254 | 0.5133 | 0.3002 | 0.3253
Two stage (20 epoch) | 0.3814 | 0.3038 | 0.4293 | 0.3132 | 0.3055 | 0.3650 | 0.3218 | 0.4948 | 0.2516 | 0.3534
Two stage (15 epoch) | 0.3727 | 0.2299 | 0.4077 | 0.3572 | 0.3299 | 0.4197 | 0.3126 | 0.4213 | 0.3078 | 0.3053
Two stage (10 epoch) | 0.3319 | 0.2372 | 0.3516 | 0.3315 | 0.3709 | 0.3883 | 0.3709 | 0.4483 | 0.2579 | 0.3074
Table 2. Standard deviations for each digit class in USPS with a training size of 1000.
Stage/Cluster | Digit 0 | Digit 1 | Digit 2 | Digit 3 | Digit 4 | Digit 5 | Digit 6 | Digit 7 | Digit 8 | Digit 9
One stage | 0.7689 | 0.5903 | 0.5389 | 0.6551 | 0.7750 | 0.7185 | 0.7386 | 0.9349 | 0.6892 | 0.8449
Two stage (50 epoch) | 0.3509 | 0.3345 | 0.3659 | 0.3366 | 0.3916 | 0.4667 | 0.3281 | 0.5737 | 0.3655 | 0.2778
Two stage (30 epoch) | 0.3194 | 0.3415 | 0.3356 | 0.3410 | 0.3763 | 0.3280 | 0.3356 | 0.4825 | 0.3035 | 0.2602
Two stage (25 epoch) | 0.3912 | 0.3582 | 0.3861 | 0.3695 | 0.3686 | 0.4229 | 0.3287 | 0.5040 | 0.3699 | 0.3368
Two stage (20 epoch) | 0.3706 | 0.3396 | 0.3503 | 0.3869 | 0.3784 | 0.4396 | 0.3620 | 0.5676 | 0.3500 | 0.3260
Two stage (15 epoch) | 0.3186 | 0.3245 | 0.3887 | 0.3901 | 0.4236 | 0.4273 | 0.3295 | 0.4660 | 0.2948 | 0.2926
Two stage (10 epoch) | 0.4081 | 0.3260 | 0.3418 | 0.3785 | 0.3152 | 0.3587 | 0.3313 | 0.4718 | 0.3157 | 0.3764
Table 3. Separability values for only PCA with a training size of 500.
Digits | Digit 0 | Digit 1 | Digit 2 | Digit 3 | Digit 4 | Digit 5 | Digit 6 | Digit 7 | Digit 8 | Digit 9
Digit 0 | 0 | 9.8748 | 5.8467 | 5.8775 | 5.9938 | 4.7561 | 5.3691 | 6.3096 | 6.8799 | 6.0279
Digit 1 | 9.8748 | 0 | 6.8908 | 8.2924 | 6.4615 | 7.1265 | 6.5954 | 5.9076 | 7.8357 | 5.5381
Digit 2 | 5.8467 | 6.8908 | 0 | 5.3733 | 4.8058 | 4.6786 | 4.2704 | 4.9557 | 5.0231 | 4.8123
Digit 3 | 5.8775 | 8.2924 | 5.3733 | 0 | 5.4038 | 3.7772 | 5.6887 | 4.9538 | 5.3137 | 4.5252
Digit 4 | 5.9938 | 6.4615 | 4.8058 | 5.4038 | 0 | 4.3211 | 4.7730 | 3.9199 | 4.7242 | 2.9392
Digit 5 | 4.7561 | 7.1265 | 4.6786 | 3.7772 | 4.3211 | 0 | 3.8773 | 4.5386 | 4.4424 | 3.9667
Digit 6 | 5.3691 | 6.5954 | 4.2704 | 5.6887 | 4.7730 | 3.8773 | 0 | 5.6352 | 5.5718 | 5.3602
Digit 7 | 6.3096 | 5.9076 | 4.9557 | 4.9538 | 3.9199 | 4.5386 | 5.6352 | 0 | 4.9600 | 2.3206
Digit 8 | 6.8799 | 7.8357 | 5.0231 | 5.3137 | 4.7242 | 4.4424 | 5.5718 | 4.9600 | 0 | 3.7385
Digit 9 | 6.0279 | 5.5381 | 4.8123 | 4.5252 | 2.9392 | 3.9667 | 5.3602 | 2.3206 | 3.7385 | 0
Table 4. Separability values for PCA plus NN with a training size of 500.
Digits | Digit 0 | Digit 1 | Digit 2 | Digit 3 | Digit 4 | Digit 5 | Digit 6 | Digit 7 | Digit 8 | Digit 9
Digit 0 | 0 | 11.3646 | 6.3528 | 9.2070 | 8.6145 | 6.5908 | 7.0915 | 8.1459 | 8.1696 | 9.9020
Digit 1 | 11.3646 | 0 | 7.7678 | 10.2732 | 8.4149 | 8.0777 | 9.9724 | 7.5722 | 8.6099 | 8.6258
Digit 2 | 6.3528 | 7.7678 | 0 | 7.0777 | 7.1212 | 6.2013 | 6.1736 | 5.9118 | 5.7625 | 7.6732
Digit 3 | 9.2070 | 10.2732 | 7.0777 | 0 | 8.4951 | 6.3814 | 9.6334 | 6.6971 | 6.9023 | 7.8166
Digit 4 | 8.6145 | 8.4149 | 7.1212 | 8.4951 | 0 | 6.2550 | 8.3657 | 6.1707 | 6.5198 | 5.7475
Digit 5 | 6.5908 | 8.0777 | 6.2013 | 6.3814 | 6.2550 | 0 | 5.9684 | 6.2361 | 5.5147 | 6.9847
Digit 6 | 7.0915 | 9.9724 | 6.1736 | 9.6334 | 8.3657 | 5.9684 | 0 | 8.5955 | 8.0326 | 10.4945
Digit 7 | 8.1459 | 7.5722 | 5.9118 | 6.6971 | 6.1707 | 6.2361 | 8.5955 | 0 | 5.9426 | 4.4272
Digit 8 | 8.1696 | 8.6099 | 5.7625 | 6.9023 | 6.5198 | 5.5147 | 8.0326 | 5.9426 | 0 | 6.5642
Digit 9 | 9.9020 | 8.6258 | 7.6732 | 7.8166 | 5.7475 | 6.9847 | 10.4945 | 4.4272 | 6.5642 | 0
Table 5. Separability ratios of PCA + NN to PCA with a training size of 500.
Digits | Digit 0 | Digit 1 | Digit 2 | Digit 3 | Digit 4 | Digit 5 | Digit 6 | Digit 7 | Digit 8 | Digit 9
Digit 0 | 0 | 1.1509 | 1.0866 | 1.5665 | 1.4372 | 1.3858 | 1.3208 | 1.2910 | 1.1875 | 1.6427
Digit 1 | 1.1509 | 0 | 1.1273 | 1.2389 | 1.3023 | 1.1335 | 1.5120 | 1.2818 | 1.0988 | 1.5575
Digit 2 | 1.0866 | 1.1273 | 0 | 1.3172 | 1.4818 | 1.3255 | 1.4457 | 1.1929 | 1.1472 | 1.5945
Digit 3 | 1.5665 | 1.2389 | 1.3172 | 0 | 1.5721 | 1.6894 | 1.6934 | 1.3519 | 1.2990 | 1.7273
Digit 4 | 1.4372 | 1.3023 | 1.4818 | 1.5721 | 0 | 1.4475 | 1.7527 | 1.5742 | 1.3801 | 1.9555
Digit 5 | 1.3858 | 1.1335 | 1.3255 | 1.6894 | 1.4475 | 0 | 1.5393 | 1.3740 | 1.2414 | 1.7608
Digit 6 | 1.3208 | 1.5120 | 1.4457 | 1.6934 | 1.7527 | 1.5393 | 0 | 1.5253 | 1.4416 | 1.9579
Digit 7 | 1.2910 | 1.2818 | 1.1929 | 1.3519 | 1.5742 | 1.3740 | 1.5253 | 0 | 1.1981 | 1.9078
Digit 8 | 1.1875 | 1.0988 | 1.1472 | 1.2990 | 1.3801 | 1.2414 | 1.4416 | 1.1981 | 0 | 1.7558
Digit 9 | 1.6427 | 1.5575 | 1.5945 | 1.7273 | 1.9555 | 1.7608 | 1.9579 | 1.9078 | 1.7558 | 0
Table 6. Standard deviations for each digit class in MNIST with a training size of 5000.
Stage/Cluster | Digit 0 | Digit 1 | Digit 2 | Digit 3 | Digit 4 | Digit 5 | Digit 6 | Digit 7 | Digit 8 | Digit 9
One stage | 0.8528 | 0.8807 | 0.7602 | 0.8453 | 0.8398 | 0.8349 | 0.9544 | 0.9431 | 0.8829 | 0.9686
Two stage (50 epoch) | 0.4402 | 0.3983 | 0.4066 | 0.4055 | 0.3644 | 0.3150 | 0.3390 | 0.4530 | 0.3436 | 0.4012
Two stage (30 epoch) | 0.4377 | 0.4302 | 0.3775 | 0.4428 | 0.3153 | 0.3705 | 0.3999 | 0.5035 | 0.3684 | 0.3494
Two stage (25 epoch) | 0.4074 | 0.4057 | 0.3848 | 0.4336 | 0.3475 | 0.3611 | 0.4249 | 0.4539 | 0.3450 | 0.4115
Two stage (20 epoch) | 0.4173 | 0.3745 | 0.3910 | 0.3960 | 0.3703 | 0.2929 | 0.3864 | 0.4346 | 0.3773 | 0.4278
Two stage (15 epoch) | 0.4234 | 0.3822 | 0.3867 | 0.3810 | 0.3570 | 0.3651 | 0.4407 | 0.3893 | 0.3582 | 0.4257
Two stage (10 epoch) | 0.3593 | 0.4270 | 0.3813 | 0.4151 | 0.3298 | 0.2729 | 0.4335 | 0.4621 | 0.3501 | 0.4343
Table 7. Standard deviations for each digit class in MNIST with a training size of 1000.
Stage/Cluster | Digit 0 | Digit 1 | Digit 2 | Digit 3 | Digit 4 | Digit 5 | Digit 6 | Digit 7 | Digit 8 | Digit 9
One stage | 0.8216 | 0.9208 | 0.7924 | 0.8668 | 0.8062 | 0.8346 | 0.9822 | 0.9548 | 0.8864 | 0.9897
Two stage (50 epoch) | 0.3576 | 0.4009 | 0.3868 | 0.4353 | 0.3202 | 0.3851 | 0.3788 | 0.5001 | 0.3651 | 0.4702
Two stage (30 epoch) | 0.4371 | 0.4241 | 0.4348 | 0.4427 | 0.3489 | 0.3826 | 0.4151 | 0.4505 | 0.3748 | 0.4831
Two stage (25 epoch) | 0.3796 | 0.3756 | 0.4182 | 0.3315 | 0.4091 | 0.2733 | 0.4364 | 0.4433 | 0.3914 | 0.4414
Two stage (20 epoch) | 0.4154 | 0.3689 | 0.4422 | 0.4776 | 0.4668 | 0.3932 | 0.3988 | 0.4533 | 0.4243 | 0.4175
Two stage (15 epoch) | 0.3735 | 0.3422 | 0.4489 | 0.4090 | 0.3930 | 0.4430 | 0.3760 | 0.4891 | 0.3823 | 0.4644
Two stage (10 epoch) | 0.8216 | 0.9208 | 0.7924 | 0.8668 | 0.8062 | 0.8346 | 0.9822 | 0.9548 | 0.8864 | 0.9897
Table 8. Separability values for only PCA with a training size of 5000.
Digits | Digit 0 | Digit 1 | Digit 2 | Digit 3 | Digit 4 | Digit 5 | Digit 6 | Digit 7 | Digit 8 | Digit 9
Digit 0 | 0 | 8.9490 | 7.1822 | 6.7094 | 7.5911 | 5.4839 | 6.6199 | 7.1983 | 6.6764 | 6.9278
Digit 1 | 8.9490 | 0 | 6.1465 | 5.9734 | 6.7315 | 5.6308 | 6.2385 | 5.9189 | 5.3469 | 5.7358
Digit 2 | 7.1822 | 6.1465 | 0 | 5.5477 | 5.8153 | 5.6525 | 4.6556 | 6.3348 | 4.6635 | 5.6461
Digit 3 | 6.7094 | 5.9734 | 5.5477 | 0 | 6.4290 | 3.8336 | 6.2177 | 5.9367 | 4.1482 | 5.3524
Digit 4 | 7.5911 | 6.7315 | 5.8153 | 6.4290 | 0 | 5.0809 | 4.9329 | 4.5674 | 5.2593 | 2.9364
Digit 5 | 5.4839 | 5.6308 | 5.6525 | 3.8336 | 5.0809 | 0 | 4.8664 | 5.0609 | 3.7842 | 4.2076
Digit 6 | 6.6199 | 6.2385 | 4.6556 | 6.2177 | 4.9329 | 4.8664 | 0 | 5.9810 | 5.1489 | 4.8942
Digit 7 | 7.1983 | 5.9189 | 6.3348 | 5.9367 | 4.5674 | 5.0609 | 5.9810 | 0 | 5.3735 | 3.0897
Digit 8 | 6.6764 | 5.3469 | 4.6635 | 4.1482 | 5.2593 | 3.7842 | 5.1489 | 5.3735 | 0 | 4.1881
Digit 9 | 6.9278 | 5.7358 | 5.6461 | 5.3524 | 2.9364 | 4.2076 | 4.8942 | 3.0897 | 4.1881 | 0
Table 9. Separability values for PCA + NN with a training size of 5000.
Digits | Digit 0 | Digit 1 | Digit 2 | Digit 3 | Digit 4 | Digit 5 | Digit 6 | Digit 7 | Digit 8 | Digit 9
Digit 0 | 0 | 8.4546 | 6.3110 | 6.2432 | 8.1580 | 5.4891 | 7.8137 | 7.2026 | 6.2925 | 7.3703
Digit 1 | 8.4546 | 0 | 6.2979 | 6.1388 | 8.9035 | 7.8212 | 8.1683 | 6.8096 | 6.5099 | 7.5594
Digit 2 | 6.3110 | 6.2979 | 0 | 5.4763 | 7.1528 | 6.9759 | 6.4080 | 6.7837 | 5.8246 | 6.8386
Digit 3 | 6.2432 | 6.1388 | 5.4763 | 0 | 8.0550 | 5.3969 | 8.2691 | 6.1786 | 5.2527 | 6.5728
Digit 4 | 8.1580 | 8.9035 | 7.1528 | 8.0550 | 0 | 7.7562 | 6.9311 | 6.2156 | 7.0799 | 4.2345
Digit 5 | 5.4891 | 7.8212 | 6.9759 | 5.3969 | 7.7562 | 0 | 7.7062 | 7.2702 | 5.3668 | 6.9289
Digit 6 | 7.8137 | 8.1683 | 6.4080 | 8.2691 | 6.9311 | 7.7062 | 0 | 8.3848 | 7.5591 | 7.5293
Digit 7 | 7.2026 | 6.8096 | 6.7837 | 6.1786 | 6.2156 | 7.2702 | 8.3848 | 0 | 6.8602 | 4.2942
Digit 8 | 6.2925 | 6.5099 | 5.8246 | 5.2527 | 7.0799 | 5.3668 | 7.5591 | 6.8602 | 0 | 6.1857
Digit 9 | 7.3703 | 7.5594 | 6.8386 | 6.5728 | 4.2345 | 6.9289 | 7.5293 | 4.2942 | 6.1857 | 0
Table 10. Separability ratios of PCA + NN to PCA with a training size of 5000.
Digits | Digit 0 | Digit 1 | Digit 2 | Digit 3 | Digit 4 | Digit 5 | Digit 6 | Digit 7 | Digit 8 | Digit 9
Digit 0 | 0 | 0.9448 | 0.8787 | 0.9305 | 1.0747 | 1.0009 | 1.1803 | 1.0006 | 0.9425 | 1.0639
Digit 1 | 0.9448 | 0 | 1.0246 | 1.0277 | 1.3227 | 1.3890 | 1.3093 | 1.1505 | 1.2175 | 1.3179
Digit 2 | 0.8787 | 1.0246 | 0 | 0.9871 | 1.2300 | 1.2341 | 1.3764 | 1.0709 | 1.2490 | 1.2112
Digit 3 | 0.9305 | 1.0277 | 0.9871 | 0 | 1.2529 | 1.4078 | 1.3299 | 1.0407 | 1.2662 | 1.2280
Digit 4 | 1.0747 | 1.3227 | 1.2300 | 1.2529 | 0 | 1.5265 | 1.4051 | 1.3609 | 1.3462 | 1.4421
Digit 5 | 1.0009 | 1.3890 | 1.2341 | 1.4078 | 1.5265 | 0 | 1.5836 | 1.4365 | 1.4182 | 1.6468
Digit 6 | 1.1803 | 1.3093 | 1.3764 | 1.3299 | 1.4051 | 1.5836 | 0 | 1.4019 | 1.4681 | 1.5384
Digit 7 | 1.0006 | 1.1505 | 1.0709 | 1.0407 | 1.3609 | 1.4365 | 1.4019 | 0 | 1.2767 | 1.3898
Digit 8 | 0.9425 | 1.2175 | 1.2490 | 1.2662 | 1.3462 | 1.4182 | 1.4681 | 1.2767 | 0 | 1.4770
Digit 9 | 1.0639 | 1.3179 | 1.2112 | 1.2280 | 1.4421 | 1.6468 | 1.5384 | 1.3898 | 1.4770 | 0
Table 11. Test recognition rates (%) of MDC for USPS.
Training Size | 500 | 1000 | 2000 | 4000 | 7291
Two-Stage | 88.29 | 90.17 | 90.68 | 91.60 | 90.8451
One-Stage | 84.89 | 86.89 | 88.80 | 90.13 | 89.2631
Table 12. Accuracies (%) with K = 10 and various sizes and epochs for USPS.
Epoch/Training Size | 500 | 1000 | 2000 | 4000
10 | 87.4890 | 89.4195 | 90.6508 | 91.3091
15 | 87.2824 | 89.4919 | 90.2459 | 91.5539
20 | 87.7258 | 89.7536 | 90.4792 | 91.4046
25 | 88.2981 | 89.3516 | 90.6851 | 91.5269
30 | 87.6763 | 89.3163 | 90.5018 | 91.5633
50 | 88.1997 | 90.1713 | 90.3087 | 91.6021
One-stage | 84.8964 | 86.8968 | 88.8011 | 90.1319
Table 13. Test recognition rates (%) of MDC for MNIST.
Training Size | 5000 | 10,000 | 60,000
Two-Stage | 93.4712 | 94.1145 | 97.2372
One-Stage | 94.2240 | 95.3311 | 97.1316
Table 14. Test recognition rates (%) of SVM for USPS.
Training Size | 500 | 1000 | 2000 | 4000 | 7291
Two-stage | 99.3362 | 99.72 | 99.79 | 99.922 | 99.9863
One-stage | 98.26 | 98.08 | 98.42 | 97.69 | 97.209
Table 15. Test recognition rates (%) of SVM for MNIST.
Training Size | 5000 | 10,000 | 20,000 | 60,000
Two-stage | 99.9074 | 99.9024 | 99.9376 | 99.9815
One-stage | 97.8223 | 97.9693 | 98.1475 | 96.6545
Table 16. Accuracies (%) with K = 8 and various sizes and epochs for MNIST.
Epoch/Training Size | 5000 | 10,000 | 20,000
10 | 99.3978 | 99.5357 | 99.6363
15 | 99.1285 | 99.5958 | 99.7012
20 | 99.2428 | 99.5187 | 99.7647
25 | 99.2418 | 99.6123 | 99.7198
30 | 99.3296 | 99.4849 | 99.6489
50 | 99.0874 | 99.6048 | 99.6929
One-stage | 97.8109 | 98.0697 | 98.1475
Table 17. Performance scores (%) with one-stage and two-stage feature extractors.
Features | MNIST SVM | MNIST MDC | USPS SVM | USPS MDC
One-stage | 98.1475 | 97.1316 | 98.42 | 90.13
Two-stage | 99.9815 | 97.2372 | 99.9863 | 91.60
Table 18. Comparison with state-of-the-art methods on MNIST.
Methods | Error Rate (%)
LDANet-2 (Chan et al. [21]) | 0.62
PCANet-1 (L1′ = 64, k1′ = k2′ = 3) (Chan et al. [21]) | 0.62
Scatnet-2 (SVM rbf) (Bruna et al. [9]) | 0.43
Conv. Maxout and Dropout (Goodfellow et al. [23]) | 0.45
Stochastic pooling ConvNet (Zeiler et al. [24]) | 0.47
ConvNet (Jarrett et al. [8]) | 0.53
HSC (Yu et al. [26]) | 0.77
K-NN-IDM (Keysers et al. [27]) | 0.54
CDBN (Lee et al. [28]) | 0.82
KNN-SVM (Prasad et al. [34]) | 0.74
Deep Morph-CNN (Mellouli et al. [1]) | 0.34
NRC (Xu et al. [33]) | 1.00
One-stage features on different classifiers
SVM | 1.8525
MDC | 2.8684
Two-stage features on different classifiers
SVM | 0.0185
MDC | 2.7628
Table 19. Comparison with state-of-the-art methods on USPS.
Methods | Error Rate (%)
NRC (Xu et al. [33]) | 4.90
Scatnet-2 (SVM rbf) (Bruna et al. [9]) | 2.30
IDM (Keysers et al. [27]) | 1.90
Online SVM learning (Tax et al. [44]) | 4.25
Discriminant-based supervised learning (Mairal et al. [30]) | 2.40
SVM KNN (Zhang et al. [31]) | 2.59
MQDF (Su et al. [32]) | 2.19
One-stage features on different classifiers
SVM | 1.58
MDC | 9.87
Two-stage features on different classifiers
SVM | 0.0137
MDC | 8.40
