4.1. Preparation of Database
At present, no standard benchmark database of handwritten
Indic scripts is freely available in the public domain. Hence, we have created our own database of handwritten documents in the laboratory. The document pages for the database were collected from different sources on request. The participants of this data collection drive were asked to write a few lines on A4-size pages; no other restrictions were imposed on the content of the textual material. The documents were written in 12 official scripts of India. The document pages were digitized at 300 dpi resolution and stored as grey-tone images. The scanned images may contain noisy pixels, which are removed by applying a Gaussian filter [33]. The text words are automatically extracted from the handwritten documents using the page-to-word segmentation algorithm described in [44]. A sample snapshot of word images written in the 12 different scripts is shown in Figure 7. Finally, a total of 7200 handwritten word images are prepared, with exactly 600 text words per script.
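The denoising step cited above can be illustrated with a minimal, self-contained sketch. The paper relies on the Gaussian filtering described in [33]; the separable NumPy implementation below is only a generic illustration of that operation, not the authors' code, and the `sigma` value is an assumed parameter.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """1-D Gaussian kernel, normalised to sum to 1."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def denoise(image, sigma=1.0):
    """Separable Gaussian smoothing of a 2-D grey-tone image array."""
    radius = int(3 * sigma)
    k = gaussian_kernel(sigma, radius)
    # A 2-D Gaussian is separable: convolve rows, then columns.
    smoothed = np.apply_along_axis(
        lambda r: np.convolve(r, k, mode="same"), 1, image)
    smoothed = np.apply_along_axis(
        lambda c: np.convolve(c, k, mode="same"), 0, smoothed)
    return smoothed
```

In practice a library routine such as `scipy.ndimage.gaussian_filter` would be used instead; the sketch only makes the operation explicit.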
Our database has been named CMATERdb8.4.1, where CMATER stands for 'Centre for Microprocessor Applications for Training Education and Research,' a research laboratory at the Computer Science and Engineering Department of Jadavpur University, India, where the database was prepared. Here, 'db' stands for database, the numeral 8 denotes the handwritten multi-script Indic image database, and 4 indicates the word level. In the present work, the first version of CMATERdb8.4 has been released as CMATERdb8.4.1. The database is made freely available at
https://code.google.com/p/cmaterdb/.
4.2. Performance Analysis
The classifier combination approaches described above are applied on a dataset of 7200 words divided into 12 classes with an equal number of instances in each. The 12 classes correspond to the 12 Indic scripts under study, for which MLP classifier results can be obtained with high accuracy. The classes, numbered A to L, are Devanagari, Bangla, Oriya, Gujarati, Gurumukhi, Tamil, Telugu, Kannada, Malayalam, Manipuri, Urdu and Roman, in that order.
First, the confusion matrix obtained from the MLP classifier on the dataset using the MLG feature set is presented, along with the overall accuracy. Then, the results generated by the same classifier on the HOG and Elliptical feature sets for the same dataset are also presented. The classifier parameter values have been cross-validated to obtain the optimal results for the dataset, and the chosen values are provided in the results section.
The MLG feature set, consisting of 60 feature values for every input image, is fed into the MLP classifier with 30 hidden-layer neurons and a learning rate of 0.8. Here, 500 iterations are allowed with an error tolerance of 0.1. The overall accuracy obtained is 91.42%, and the confusion matrix generated in this case is given in Table 1. The R column in the table denotes inputs rejected by the recognition module; the class confidences associated with these inputs are nevertheless accounted for during the combination process.
The HOG feature set, consisting of 80 feature values for every input image, is fed into the MLP classifier with 40 hidden-layer neurons and a learning rate of 0.8. The same error tolerance and number of iterations as used for the MLG features are applied here. A maximum recognition accuracy of 78.04% is noted. The confusion matrix is shown in Table 2.
The Elliptical feature set, containing 58 feature values derived from each image, forms the training input for the MLP classifier with 30 hidden neurons and a learning rate of 0.7. The error tolerance and number of iterations remain the same as in the previous cases. An accuracy of 79.2% is achieved, as represented in the confusion matrix given in Table 3.
Now, the confidence values assigned to the classes for every input by the classifiers on the three feature sets form the input to the classifier combination procedures. The confusion matrix resulting from the majority voting procedure is presented in Table 4. An overall accuracy of 95.6% is achieved on this dataset of 7200 samples divided equally among the 12 script classes. It is seen that the Devanagari script has the lowest accuracy and is most often confused with Telugu, whereas high accuracies are obtained for Manipuri, Odia and Bangla.
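The voting procedure can be sketched as follows. The paper does not specify how ties among the classifiers' top choices are resolved; the sketch below breaks ties in favour of the class with the highest summed confidence, which is an assumption.

```python
import numpy as np

def majority_vote(score_lists):
    """Combine classifiers by majority voting on their top-1 predictions.

    score_lists: list of (n_samples, n_classes) confidence arrays, one per
    classifier. Ties are broken by the highest summed confidence among the
    tied candidate classes (an assumed tie-breaking rule)."""
    preds = np.stack([s.argmax(axis=1) for s in score_lists])  # (n_clf, n_samples)
    total = np.sum(score_lists, axis=0)                        # (n_samples, n_classes)
    n_classes = score_lists[0].shape[1]
    out = np.empty(preds.shape[1], dtype=int)
    for i in range(preds.shape[1]):
        counts = np.bincount(preds[:, i], minlength=n_classes)
        tied = np.flatnonzero(counts == counts.max())
        out[i] = tied[np.argmax(total[i, tied])]
    return out
```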
The Borda count algorithm gives an accuracy of 93.5%, an increase of 2.1% over the best-performing individual classifier. It provides the highest recognition rate for Devanagari among all the combination schemes and good accuracies for other widely used scripts such as Bangla and Odia, and hence can be the preferred choice for wide usage. The trainable version of the algorithm, with weights based on the overall accuracy of the classifiers, improves the results further; the increase is 2.9%, with satisfactory results for scripts such as Telugu, Kannada and Urdu. The accuracy for the Gurumukhi script remains low irrespective of the weights. The results are presented in Table 5 and Table 6.
The simple rules at the measurement level for combining the decisions provide good results in the present work. The sum rule attains an accuracy of 97.76%, with almost perfect recognition for Urdu, Gurumukhi and Roman. The product rule and the max rule achieve accuracies of 95.73% and 94.60%, respectively. The highest accuracy is found for the Odia script, whereas the product rule suffers in the case of Gurumukhi and the max rule in the case of Devanagari. The results for the elementary combination rules are tabulated in Table 7, Table 8 and Table 9.
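These fixed rules all reduce to aggregating the per-class confidences across classifiers and picking the class that maximises the aggregate, as the following sketch shows:

```python
import numpy as np

def combine(score_lists, rule="sum"):
    """Fixed measurement-level combination rules (sum, product, max):
    aggregate the per-class confidences of all classifiers and return
    the class maximising the aggregate for each sample."""
    stack = np.stack(score_lists)   # (n_clf, n_samples, n_classes)
    if rule == "sum":
        agg = stack.sum(axis=0)
    elif rule == "product":
        agg = stack.prod(axis=0)
    elif rule == "max":
        agg = stack.max(axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return agg.argmax(axis=1)
```

The rules can disagree: a single very confident classifier dominates the max rule, while one near-zero confidence suppresses a class under the product rule, which is one reason the sum rule tends to be more robust to noisy scores.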
The sum rule outperforms all other rule-based combination approaches in this work, corroborating the results of Kittler et al. [38] in being less prone to noise and unclean data. The DS theory experiments combine the results two at a time and then all three together. The class-wise performance-based BPA, which outperforms the global performance-based BPA, has been implemented for the multi-classifier combination using the DS theory [45]. The rule applied in this process is quasi-associative, and hence the result of combining two sources cannot simply be combined with the third; the rule has to be extended to include all three sources together. Results for the combination of the classifier outputs on the HOG and Elliptical features, the MLG and Elliptical features, and the HOG and MLG features are presented in Table 10, Table 11 and Table 12, respectively. The combination result including all three sources of information is given in Table 13.
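For readers unfamiliar with DS combination, the classical two-source Dempster rule over singleton script hypotheses plus the frame Θ (ignorance) can be sketched as below. Note this is only the textbook rule for two simple BPAs; the class-wise performance-based BPA construction and the quasi-associative extension to three sources used in the paper follow [45] and are not reproduced here.

```python
import numpy as np

def dempster_combine(m1, m2):
    """Dempster's rule for two simple BPAs over the same frame.

    m1, m2: arrays of length n_classes + 1; entries 0..n-1 are masses on
    singleton classes, the last entry is the mass on the whole frame
    Theta (ignorance). Returns the combined BPA in the same layout."""
    s1, t1 = m1[:-1], m1[-1]
    s2, t2 = m2[:-1], m2[-1]
    # Conflict K: mass assigned to pairs of incompatible singletons.
    K = s1.sum() * s2.sum() - np.dot(s1, s2)
    # Singleton i survives when both sources agree on i, or one source
    # commits to i while the other remains ignorant (Theta).
    combined = s1 * s2 + s1 * t2 + t1 * s2
    theta = t1 * t2
    return np.append(combined, theta) / (1.0 - K)
```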
There is no improvement from the combination of the results on the MLG and HOG feature sets. But when the Elliptical feature set is involved in the combination, there is considerable improvement over the participating classifiers. Overall accuracies of 91.2% and 97.04% are achieved by combining sources having 78.1% and 79.4% accuracies, and 91.4% and 79.4% accuracies, respectively. Thus, improvements of 6% and 10% are found by applying the DS theory of evidence. Combining all three, an accuracy of 95.64% is obtained, more than 4% above the better-performing classifier. In both schemes, all the script classes have accuracies over 90%, with almost 100% accuracy for certain scripts such as Manipuri, Gujarati and Urdu, making this the model of choice where these scripts are widely used.
In order to understand why the results from the Elliptical feature set combine so well with the two other feature sets, correlation analysis is performed on the confidence score outputs. Spearman rank correlation is computed on the rank-level information provided by the classifiers to arrive at mean values of the correlation measure. HOG and MLG show an index of 0.619, which is almost double the scores obtained by comparing the Elliptical features with these two. With values of 0.32 and 0.27, the low correlation index indicates better possibilities for the combination processes. Thus, the output of the Elliptical feature set provides complementary information, which helps improve the overall combined accuracy.
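The correlation measure can be sketched as follows: Spearman's coefficient is computed between the class-score rankings of two classifiers for each sample, and the per-sample values are averaged. This is a generic implementation under the assumption of no tied scores; the paper's exact averaging procedure may differ.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation of two score vectors (no ties assumed)."""
    rx = np.argsort(np.argsort(x))   # rank of each entry in x
    ry = np.argsort(np.argsort(y))   # rank of each entry in y
    d = rx - ry
    n = len(x)
    return 1.0 - 6.0 * np.sum(d * d) / (n * (n ** 2 - 1))

def mean_rank_correlation(scores_a, scores_b):
    """Mean per-sample Spearman correlation between the class-score
    rankings of two classifiers (each array: n_samples x n_classes)."""
    return float(np.mean([spearman_rho(a, b)
                          for a, b in zip(scores_a, scores_b)]))
```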
Secondary classifiers are applied to learn patterns from the primary classifier outputs and thereby develop a way to combine them. The confidence scores from the three sources are concatenated, along with the correct label, to form a larger training set. This new feature set undergoes classification using well-known algorithms. Classifiers such as k-NN, Logistic Regression, MLP and Random Forest are applied, and the final results are tabulated in Table 14, Table 15, Table 16 and Table 17, respectively. The results are reported after 3-fold cross-validation and tuning of the parameters involved. This process is computationally costly, adding a processing step and considerably higher complexity, but is compensated by the high accuracies obtained. 3-NN provides an accuracy of 98.30%, Random Forest classifies 98.33% of the 7200 samples correctly, and Logistic Regression attains 98.48% accuracy. Using MLP again as the secondary classifier, 98.36% accuracy is obtained. Devanagari is the most confused script in all cases but still has an accuracy over 95%; the other scripts are predicted with near certainty.
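The stacking idea can be sketched with a minimal NumPy example: the confidence vectors from the three primary classifiers are concatenated into a meta feature vector, on which a secondary classifier is trained. The paper uses standard k-NN, Logistic Regression, MLP and Random Forest implementations; the toy k-NN below only illustrates the pipeline and is not the authors' code.

```python
import numpy as np

def stack_features(score_lists):
    """Concatenate per-classifier confidence vectors into meta features."""
    return np.concatenate(score_lists, axis=1)

def knn_predict(train_X, train_y, test_X, k=3):
    """Minimal k-NN secondary classifier on the stacked scores."""
    out = []
    for x in test_X:
        d = np.linalg.norm(train_X - x, axis=1)   # Euclidean distances
        nearest = train_y[np.argsort(d)[:k]]      # labels of k nearest
        out.append(np.bincount(nearest).argmax()) # majority label
    return np.array(out)
```

In this layout, three 12-class primary classifiers yield a 36-dimensional meta feature vector per word image, which is what the secondary classifiers operate on.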
Script recognition is a difficult task given the variation among the words of a particular script, but the results are encouraging for building a model that can identify the script with high confidence. The results obtained after combination exceed the previously reported accuracies for this task and hence set a new benchmark.
Table 18 provides the class-wise accuracy, along with the overall accuracy, achieved by each procedure used in the paper. It shows that the Logistic Regression classifier acting on the MLP classifier outputs provides the best result, with 98.45% accuracy, an improvement of 7.05%. Results are also obtained from feature-level combination: the natural combination, or concatenation, of two feature sets at a time and of all three together is performed. The new feature set formed in each case undergoes the same process of classification through the MLP classifier. The comprehensive results for comparison are given in Table 19.