#### *3.3. Alphanumeric Character Segmentation*

In the next stage, the alphanumeric characters that make up the license plate were extracted from the region obtained in the previous step. To this end, we first applied a gray level histogram equalization algorithm to increase the contrast of the license plate region. Next, we converted the resulting image into a binary image using a global threshold of 0.5 (in the range [0, 1]). Then, the image was smoothed using a 3 × 3 median filter and salt-and-pepper noise was removed from it. Any object that remained in the binary image after these operations was considered to represent a character. We segmented these characters using connected component analysis (CCA) and resized each segmented character to 56 × 56 pixels. The character segmentation process is illustrated in Figure 5 and Algorithm 1.

**Figure 5.** Alphanumeric character segmentation: (**A**) before gray level histogram equalization; (**B**) after gray level histogram equalization; (**C**) median filtered and noise removed image; (**D**) binary image; (**E**) character region segmentation; and (**F**) extracted and resized characters.

**Algorithm 1:** Alphanumeric character segmentation algorithm.

**Input:** LicensePlate (P)
**Result:** AlphanumericCharacters (AC)
HEI = histogramEqualizationImage(P);
BI = binaryImage(HEI);
NRI = noiseRemovedImage(BI);
**if** *hasObject*(NRI) **then**
CO = countObject(NRI);
BB = defineBoundingBoxUsingConnectedComponentAnalysis(CO);
CC = cropEachCharacter(CO, BB);
AC = resize(CC, [56 × 56]);
**end**
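For concreteness, the steps of Algorithm 1 can be mapped onto standard Image Processing Toolbox calls. The following is a minimal MATLAB sketch, not the authors' exact implementation; the file name and the 30-pixel `bwareaopen` speck-size threshold are illustrative assumptions.

```matlab
% Minimal sketch of Algorithm 1 (Image Processing Toolbox).
P = imread('plate_region.png');            % extracted license plate region (placeholder file)
if size(P, 3) == 3, P = rgb2gray(P); end   % work on the gray level image
HEI = histeq(P);                           % gray level histogram equalization
BI  = imbinarize(HEI, 0.5);                % global threshold of 0.5 in [0, 1]
NRI = medfilt2(BI, [3 3]);                 % 3 x 3 median filter
NRI = bwareaopen(NRI, 30);                 % remove residual specks (assumed minimum size)

CC = bwconncomp(NRI);                      % connected component analysis (CCA)
if CC.NumObjects > 0
    stats = regionprops(CC, 'BoundingBox');
    chars = cell(1, CC.NumObjects);
    for k = 1:CC.NumObjects
        crop     = imcrop(NRI, stats(k).BoundingBox);  % crop each character
        chars{k} = imresize(crop, [56 56]);            % resize to 56 x 56 pixels
    end
end
```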

#### *3.4. Feature Extraction*

To identify the characters of a license plate, we first needed to extract features that define their characteristics. For this purpose, we used HOG features [17], as they have been used successfully in many applications. To calculate the HOG features, the image was subdivided into smaller neighborhood regions (or "cells") [60]. Then, for each cell, at each pixel, the kernels $[-1, 0, +1]$ and $[-1, 0, +1]^{T}$ were applied to obtain the horizontal ($G_x$) and vertical ($G_y$) edge values, respectively. The magnitude and orientation of the gradient were calculated as $M(x, y) = \sqrt{G_x^2 + G_y^2}$ and $\theta(x, y) = \tan^{-1}\left(G_y / G_x\right)$, respectively. Histograms of the unsigned angle (0° to 180°), weighted by the gradient magnitude, were then generated for each cell. Cells were combined into blocks, and block normalization was performed on the concatenated histograms to account for variations in illumination. The length of the resulting feature vector depends on factors such as the image size, cell size, and number of histogram bins. For the proposed method, we used a cell size of 4 × 4 with 9 histogram bins, to achieve a balance between accuracy and efficiency. The resulting feature vector was of size 1 × 6084. Figure 6 gives a visualization of HOG features for different cell sizes.
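The descriptor length can be reproduced with `extractHOGFeatures` from the Computer Vision Toolbox. The sketch below assumes the toolbox's default 2 × 2-cell blocks with 50% overlap, which for a 56 × 56 image with 4 × 4 cells gives 13 × 13 blocks and hence 13 × 13 × 4 × 9 = 6084 features, matching the 1 × 6084 vector reported above; the file name is a placeholder.

```matlab
% HOG descriptor of a segmented 56 x 56 character (Computer Vision Toolbox).
ch = imread('char_56x56.png');                % segmented character image (placeholder)
[features, vis] = extractHOGFeatures(ch, 'CellSize', [4 4], 'NumBins', 9);
disp(size(features))                          % 1 x 6084
figure; plot(vis);                            % HOG visualization as in Figure 6
```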

**Figure 6.** HOG feature vector visualization with different cell sizes.

#### *3.5. Artificial Neural Network (ANN) Architecture*

Once the features were extracted from the segmented characters, we trained an artificial neural network to identify them. As each character was encoded using a 1 × 6084 feature vector, the ANN had 6084 input neurons. The hidden layer comprised 40 neurons and the output layer had 36, equal to the number of different alphanumeric characters under consideration. Figure 7 shows the proposed recognition process along with the architecture of the ANN.
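As a minimal sketch, this 6084-40-36 topology can be expressed with MATLAB's `patternnet` (Deep Learning Toolbox). The random dummy data below merely stands in for the real HOG features and one-hot labels, and the training options are left at their defaults.

```matlab
% Sketch of the 6084-40-36 network; X and T below are dummy placeholders.
X = rand(6084, 7200);                        % HOG feature vectors (one per column)
T = full(ind2vec(randi(36, 1, 7200), 36));   % one-hot targets for 36 classes

net = patternnet(40);                        % one hidden layer with 40 neurons
[net, tr] = train(net, X, T);                % 6084 input and 36 output neurons
                                             % are inferred from the data
scores = net(X(:, 1));                       % 36 x 1 class scores for one character
[~, label] = max(scores);                    % predicted class index
```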

**Figure 7.** Proposed recognition process along with the artificial neural network architecture.

## **4. Experimental Results**

Initially, we created a synthetic character image database to train and test the classification method. In addition, we created a database of real images, using the image acquisition method discussed above, to test the proposed method. We justified the selection of HOG as the feature extraction method and ANN as the classifier by comparing their performance with that of other feature extraction and classification methods. We also compared the performance of our method with that of similar existing methods. To conduct these experiments, an Intel® Core™ i5 computer with 8 GB of RAM, running the 64-bit Windows Professional operating system, was used. The MATLAB® framework (MathWorks, Natick, MA, USA) was used to implement the proposed method as well as the experiments.

#### *4.1. Generation of Synthetic Data for Training*

Using synthetic images is convenient as it enables the creation of a wide variety of training samples without having to collect them manually. The database was created by generating characters using random fonts and sizes (12–72 in steps of 2). In addition, one of four styles (normal, bold, italic, or bold italic) was used in the font generation process. To align with the types of characters found on license plates, we included the 10 numerals (0–9) as well as the 26 letters of the English alphabet (A–Z). We also rotated the characters in the database by a randomly chosen angle within ±15°. Each class contained 200 samples, consisting of an equal number of rotated and unrotated images. In total, 36 × 200 = 7200 images were used to train the ANN. Some characters from the synthetic training database are shown in Figure 8.
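A hedged sketch of how one such sample could be rendered is shown below; it uses `insertText` from the Computer Vision Toolbox, and the canvas size, anchor point, and omission of the font-style variations are our simplifications rather than the authors' exact procedure.

```matlab
% Sketch of generating a single synthetic character sample.
chars = ['0':'9', 'A':'Z'];                  % the 36 classes
sizes = 12:2:72;                             % font sizes, 12 to 72 in steps of 2
c  = chars(randi(numel(chars)));             % pick a random character
sz = sizes(randi(numel(sizes)));             % pick a random font size
canvas = 255 * ones(96, 96, 3, 'uint8');     % white background (assumed size)
img = insertText(canvas, [48 48], c, 'FontSize', sz, ...
    'AnchorPoint', 'Center', 'BoxOpacity', 0, 'TextColor', 'black');
img = imrotate(img, -15 + 30 * rand, 'bilinear', 'crop');  % rotate within +/-15 degrees
```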

**Figure 8.** Synthetic samples for training purposes.

#### *4.2. Performance on Synthetic Data*

The ANN, discussed in Section 3, was trained on 70% of the synthetic database (5040 randomly selected samples). The remaining images were split evenly between validation and testing (1080 samples each). To determine the ideal number of hidden neurons, ANNs with 10, 20, and 40 hidden neurons were each trained on these data five times (see Table 2). The network with the best performance (40 hidden neurons) was selected as our trained network. Increasing the number of hidden neurons further, to 60, increased the processing time without reducing the error. The overall classification accuracy was 99.90%, and the only misclassifications were between the classes **0** and **O**.
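The hidden-neuron sweep of Table 2 can be sketched as follows, reusing the feature matrix `X` and target matrix `T` from the previous sketch; the 70/15/15 split and five runs per size follow the text, while everything else is an assumption.

```matlab
% Sketch of the hidden layer size selection summarized in Table 2.
sizes = [10 20 40];
err = zeros(1, 5);
for h = sizes
    for run = 1:5
        net = patternnet(h);
        net.divideParam.trainRatio = 0.70;   % 5040 training samples
        net.divideParam.valRatio   = 0.15;   % 1080 validation samples
        net.divideParam.testRatio  = 0.15;   % 1080 test samples
        [net, tr] = train(net, X, T);
        Y = net(X(:, tr.testInd));           % scores on the held-out test set
        err(run) = mean(vec2ind(Y) ~= vec2ind(T(:, tr.testInd)));
    end
    fprintf('%d hidden neurons: mean test error %.2f%%\n', h, 100 * mean(err));
end
```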


**Table 2.** Training performance with respect to hidden neuron size. The best performance is highlighted in bold.

#### *4.3. Performance on Real Data*

To test the performance on real data, we acquired images at several locations in Kuala Lumpur, Malaysia, using the process discussed in Section 3.1. In total, 100 vehicle license plates were used in this experiment, where each plate contained 5–8 alphanumeric characters. Then, 671 characters were extracted from the license plate images using the process discussed above. Classes **I**, **O**, and **Z** were not used here as they are not present in the license plates, as discussed above. The number of characters in each class is shown in Table 3. An accuracy of 99.70% was achieved. The misclassifications in this experiment occurred among the classes **S**, **5**, and **9**, likely due to similarities between these characters. Real-time classification results for a sample license plate image are shown in Figure 9.


**Table 3.** The number of characters extracted from license plate images in each class.

**Figure 9.** Real-time experimental results for license plate character recognition.

#### *4.4. Comparison of Different Feature Extraction and Classification Methods*

We also compared the proposed method against other combinations of feature extraction and classification methods with respect to accuracy and processing time. Bag of features (BoF) [61], scale-invariant feature transform (SIFT) [62], and HOG were the feature extraction methods used. The classifiers used were stacked auto-encoders (SAE), k-nearest neighbors (KNN), support vector machines (SVM), and ANN [63]. Processing time was calculated as the average time taken for the complete procedure discussed in Section 3 (license plate extraction, character extraction, feature extraction, and classification). The same 100 images used in the previous experiment were used here. The results are shown in Table 4.
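A simple way to obtain such averages is to time the end-to-end pipeline with `tic`/`toc`, as in the sketch below; `runFullPipeline` is a hypothetical wrapper for the four stages listed above, and the file name pattern is a placeholder.

```matlab
% Sketch of measuring the average per-image processing time.
n = 100;
t = zeros(1, n);
for i = 1:n
    img = imread(sprintf('vehicle_%03d.jpg', i));  % placeholder file names
    tic;
    labels = runFullPipeline(img);   % hypothetical: extraction, segmentation,
    t(i) = toc;                      % HOG features, and ANN classification
end
fprintf('Average processing time: %.3f s per image\n', mean(t));
```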


**Table 4.** Comparison of performance for different combinations of feature extraction and classification methods. The best performance per metric is highlighted in bold.

#### *4.5. Performance Comparison with Other Similar Methods*

Table 5 compares the performance of the proposed algorithm with other similar AVLPR methods in the literature. We reimplemented these methods on our system and trained and tested them on the same datasets to ensure an unbiased comparison. The training and testing were performed on our synthetic and real image databases, respectively. Note that the processing time only reports the time taken for the feature extraction and classification stages of the process. Accuracy denotes the classification accuracy. Since the training dataset was balanced, we did not consider performance metrics that account for class imbalance, such as sensitivity and precision. As can be seen in Table 5, the proposed method outperformed the other compared methods.

**Table 5.** The classification performance for the proposed method compared to similar existing methods. The best performance per metric is highlighted in bold.


#### *4.6. Comparison of Methods with Respect to the Medialab Database*

We compared the performance of our method to the methods discussed above on a publicly available database (the Medialab LPR (License Plate Recognition) database [64]) to investigate the transferability of the results. This database contains still images and video sequences captured at various times of the day and under different weather conditions. We included all still images from this database except those that contained more than one vehicle. As our image capture system was specifically designed to capture only one vehicle at a time, in order to simplify the subsequent image processing steps, images with multiple vehicles were beyond the scope of this work.

The methods were compared with respect to the different stages of a typical AVLPR system (detection, character segmentation, and classification). Table 6 shows the comparison results; the detection, character segmentation, and classification accuracies denote the percentages of license plate regions correctly detected, characters correctly extracted, and characters correctly classified, respectively. As our pre-defined rectangular region selection (shown in Figure 2) was specifically designed for our image acquisition system, and different acquisition systems were used to obtain the images in this database, we did not consider this step in the comparison. Instead, the full image was used for detecting the license plate.


**Table 6.** Performance comparison on the Medialab LPR database. The stages that were not addressed in the original papers are denoted by "—". The best performance per metric is highlighted in bold.

Note that some of the methods in Table 6 do not address all three stages of the process. This is because some papers only performed the latter parts of the process (for example, using pre-segmented characters for classification) and others did not clearly describe the methods they used. All methods were trained on our synthetic database as discussed above (no retraining was performed on the Medialab LPR database).

As can be seen in Table 6, the proposed method outperformed the other compared methods. However, the overall classification performance of all methods was slightly lower than that on our database (Table 5). We hypothesize that this could be due to factors such as differences in the resolution and capture conditions (for example, weather and time of day) of the images in the two databases.

#### *4.7. Comparison of Methods with Respect to the UFPR-ALPR Database*

We further compared the performance of our method with the other existing methods (discussed above) on another publicly available database (UFPR-ALPR), which includes more challenging images [56]. This database contains multiple images of 150 vehicles (including motorcycles and cars) from real-world scenarios. The images were collected with different devices, such as mobile phone cameras (iPhone® 7 Plus and Huawei® P9 Lite) and a GoPro® HERO®4 Silver camera.

Since we used a digital camera to capture images in our method, we only considered the images in the UFPR-ALPR database that were captured by a similar device (GoPro® HERO®4 Silver). The database is split into three sets: 40%, 40%, and 20% for training, testing, and validation, respectively. We only considered the testing portion of the database in this comparison. In addition, we did not consider motorcycle images, as our method was developed for cars. First, we resized the original images from 1920 × 1080 to 1498 × 946 pixels (to keep them consistent with the images in our database). Then, we performed the detection and recognition processes on the resized images.
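The resizing step can be sketched in one line with `imresize`; note that MATLAB expects the target size as [rows columns], i.e., [946 1498] for a 1498 × 946 (width × height) result. The file name is a placeholder.

```matlab
% Sketch of the preprocessing applied to the UFPR-ALPR test frames.
frame = imread('ufpr_test_frame.png');   % original 1920 x 1080 frame (placeholder)
frame = imresize(frame, [946 1498]);     % resize to 1498 x 946 (width x height)
% detection and recognition then proceed as in Section 3
```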

As can be seen in Table 7, the proposed method outperformed the other methods with respect to detection, segmentation, and classification accuracy. However, the overall classification performance of all methods was lower than on our database and the Medialab LPR database (Tables 5 and 6). We hypothesize that this could be due to factors such as the uncontrolled capture conditions (e.g., vehicle speed, weather, and time of day) of the images in the three databases.


**Table 7.** Performance comparison on the UFPR-ALPR database. The stages that were not addressed in the original papers are denoted by "—". The best performance per metric is highlighted in bold.
