Article

Two-Step Algorithm for License Plate Identification Using Deep Neural Networks

by Mantas Kundrotas, Jūratė Janutėnaitė-Bogdanienė * and Dmitrij Šešok

Department of Information Technology, Vilnius Gediminas Technical University, LT-10223 Vilnius, Lithuania

* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(8), 4902; https://doi.org/10.3390/app13084902
Submission received: 17 February 2023 / Revised: 7 April 2023 / Accepted: 9 April 2023 / Published: 13 April 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract
License plate identification remains a crucial problem in computer vision, particularly in complex environments where license plates can be confused with road signs, billboards, and other objects. This paper proposes a solution that replaces the standard car–license plate–letter detection pipeline with four stages: preliminary license plate detection, precise detection of the four corners that bound the numbers, license plate correction, and letter identification. The first algorithm identifies all potential license plates and passes them as input to the next algorithm for more precise detection. The main difference between this approach and other algorithms is that the second stage operates on a relatively small image compared to the whole vehicle, so a small but robust network suffices to find the four corners and perform a perspective transformation. This simplifies the letter recognition task for the next algorithm, as no additional transformations are required, and it allows another compact but robust neural network to be applied, increasing the overall speed of the system. This solution could be useful for research focusing on this specific task. Publicly available datasets were used for training and validation. The CenterNet object detection algorithm was used as a basis, with a modified Hourglass-type network as the backbone. The size of the network was decreased by 40% and the average accuracy was 96.19%. Speed increased significantly, reaching 2.71 ms per image and 405 FPS on average.

1. Introduction

License plate (LP) identification is a critical process in many areas due to the growing number of vehicles brought on by urbanization and to related tasks such as toll collection, vehicle verification, and speeding detection. Collected data can help identify mobility patterns, travel demand, and congestion patterns in cities, support traffic management [1], and even estimate emissions [1,2]. Various LP recognition and identification algorithms have been proposed, but they can be divided into two main categories: those using deep learning and those using traditional algorithms.
Images are now collected not only by speed recorders from passing vehicles or people but also by drones, and they are often taken at oblique angles and can be blurry, distorted, etc. Therefore, it is important to improve the quality of the images. One solution based on the phase congruency model (PCM) uses a combination of DCT-PCM (discrete cosine transform PCM) for plate text detection. The proposed method presents a new clustering approach for eight neighboring pixels that aims to avoid background pixels being classified as candidate pixels [3]. This approach helps to extract the edges of the text, and false positives are further eliminated by applying a fully connected neural network. The method achieves competitive results on the Total-Text dataset when compared to other methods. However, it should be noted that it was specifically designed to work with both drone images and usual images taken in an orthogonal direction; other methods may perform better on specific types of datasets. Another solution used two generative adversarial networks (GANs) to remove noise in the image and increase image resolution [4]. Although its computational time is larger compared to similar methods, the final image quality is increased significantly.
Deep convolutional neural networks (CNNs) are commonly utilized for various image recognition tasks due to their ability to automatically extract features, making them a powerful tool in medicine, human pose estimation, and face recognition [5,6,7,8]. License plate matching is typically achieved through license plate character registration (LPCR) or license plate feature (LPF) extraction. A novel LPF-based LP recognition method was developed using a deep CNN. This method employs a multitask learning approach that first recognizes parallel letters and then classifies images [9]. The proposed CNN model was designed using a modified VGG (visual geometry group) architecture, which yielded better results compared to existing state-of-the-art methods. However, the VGG model is a large network and may take more time to train, which could be a drawback in situations where real-time license plate identification is required. Another approach, based on a deep neural network called the capsule network, improves processing time by integrating the segmentation process and extracting features from the segmented data within the CN framework [10]. Capsule networks are advantageous in preserving detailed information throughout the network. However, routing methods such as EM-routing or routing by agreement have a negative effect on the capsule network, preventing it from distinguishing inputs from their negative counterparts. A two-stage DNN detects license plates on the unedited raw image using YOLOv2 (You Only Look Once), and the cropped images are then used to recognize license plates.
While it has demonstrated excellent recognition results, its computational time could be improved [11]. A common problem during recognition is data loss as information passes from one layer to another. To address this issue, an adversarial network is used to restore lost input data, which is then passed to the following network (YOLOv3), responsible for finding symbols at three different feature map sizes: 31, 16, and 8 [12,13]. YOLOv3 employs scaled anchor boxes, while YOLOv2 uses boxes of the same size. Despite yielding better results, YOLOv3 still faces challenges in detecting small objects. A slower but more efficient three-step method uses a modified Mask R-CNN network without a segmentation step [14], as well as another Mask R-CNN network for LP number recognition. Every symbol is found and identified using additional small-symbol filtering and position-based clustering. This method has demonstrated promising results [15], with higher precision compared to most other methods; the license plates that failed to be detected typically had either very low or very high brightness levels. Existing ALPR (automatic license plate recognition) systems often require significant computing resources, and achieving a balance between high accuracy and speed remains an active research topic. With the emergence of the IoT, various solutions have been proposed in the form of smaller systems that require less computational power. Using the YOLOv5 light version on IoT devices, or YOLOv5 itself, for license plate detection results in faster computations compared to earlier versions of YOLO [16,17]. Slight improvements are still needed, as license plates with poor illumination, captured at nighttime or under adverse weather conditions, are not accurately detected and identified. Convolutional neural network architecture is a crucial factor in achieving speed and quality, and various architectural solutions have been proposed for specific problems. The hourglass-type network is commonly used for human pose estimation, saliency detection, facial landmark detection, and more [5,6,7]. However, hourglass networks are quite large, which reduces network speed.
Selmi et al. presented a system for license plate detection and recognition divided into three subtasks: LP detection, character segmentation, and character recognition. License plate detection is accomplished in the following steps: the RGB image is converted to HSV, small elements are extracted using contrast maximization, a Gaussian blur filter removes details and noise, adaptive thresholding eliminates insignificant regions, contours are found, geometric filtering is applied, and finally, a CNN detects the license plates. The CNN is composed of four layers: two convolutional layers for feature extraction and two fully connected layers [18]. Another license plate detection method for complex backgrounds detects the vehicle region using Fast R-CNN and generates candidate regions; license plates are then classified and all non-license plates are removed [19]. Both proposed solutions demonstrated high precision and recall. However, the first step involves identifying the vehicle itself, which requires more computational time and may be too slow for real-time license plate detection.
Zhang et al. proposed the V-LPDR framework to address license plate detection and character recognition in unconstrained scenarios. The framework uses a novel flow-guided spatiotemporal attention network, which is divided into three modules: a detection backbone, flow-guided feature warping, and a spatiotemporal attention block. Results show very high accuracy for the proposed solution [20]. The primary remaining drawbacks are the overall per-frame processing time, which is unsuitable for some real-time applications, and reduced performance on distorted images. Tung et al. proposed a license plate detection method using RetinaFace and MobileNet to predict the license plate in the original image and determine the coordinates of its four key points. The output of the detection module is then used as input for the character recognition module [21]. Improvements can still be made to the dataset: a novel dataset was used for training, and it lacked detailed parameters compared to established datasets, which resulted in subpar performance; updating the dataset would likely lead to improvement. Additionally, license plates in complex environments were not detected as effectively. In [22], a YOLO-based network was employed for object detection and recognition. However, the approach was conventional, detecting the car first and the license plate second, which increased computational time. To address the problem of small-object detection, annotations are automatically generated for larger areas [22].
Another modern approach to license plate detection uses a kernel density function, with pre-processing carried out using a binary technique [23]. The image is downsampled and converted to grayscale, and candidate regions are then extracted using a kernel density estimator that complies with the binary conversion technique. Nonetheless, the proposed solution requires improvement for vehicles in motion and for images containing multiple vehicles. In [24], a real-time detector called STELLA was used for license plate detection. The detection network is implemented on RetinaNet and utilizes a feature pyramid network (FPN), while recognition is carried out using a CRNN. Yu et al. used the CenterNet detection approach and a reduced Hourglass-104 architecture as the backbone for CenterNet [25]. The novelty of their approach was the use of bounding ellipses instead of bounding boxes; however, the method was tested for vehicle detection at a single site, and data labeling is expected to be challenging. Similarly, the approach proposed in this paper also uses the CenterNet detection method as a basis and an hourglass network as a backbone. However, instead of the standard car–license plate–letter detection pipeline, a preliminary license plate detection step is employed, followed by precise detection of the four corners that bound the numbers, license plate correction, and finally, letter identification.

2. Materials and Methods

License plates are easily confused with surrounding objects such as road signs, logos, commercials, road-marking symbols, etc. Therefore, it is essential that the data provided for license plate detection algorithms is as clear as possible, with minimal incorrect and potentially confusing information. Vehicle detection algorithms can be used as a solution, with the potential to provide better overall accuracy of the system. However, they may show poor results in situations where the license plate is very close to the camera, and such systems can be rather slow [26].
The solution proposed in this paper changes the identification process from the standard car–number–letter recognition pipeline to one that first finds candidate license plates, then detects the plate corners, corrects the plate position, and finally performs letter recognition. This approach differs from other algorithms in that the resulting image is relatively small compared to the vehicle, enabling the use of a compact and fast network for plate corner detection. The corner coordinates are then used to perform a perspective transformation that corrects any distortion or rotation and returns the license plate to its original position, making precise license plate recognition easier for the second algorithm. The proposed solution is illustrated in Figure 1 and Figure 2.
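As an illustration of the correction step, the sketch below warps a plate to a fronto-parallel view from its four detected corners using OpenCV. The function name, corner ordering, and output size are illustrative assumptions, not the exact implementation used in this work.

```python
import cv2
import numpy as np

def rectify_plate(image, corners, out_w=128, out_h=32):
    """Warp a license plate to a fronto-parallel view from its four corners.

    `corners` is a (4, 2) float32 array ordered top-left, top-right,
    bottom-right, bottom-left (an assumed convention)."""
    src = np.asarray(corners, dtype=np.float32)
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]], dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)  # 3x3 homography from 4 point pairs
    return cv2.warpPerspective(image, M, (out_w, out_h))
```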
All experiments were conducted using publicly available databases: AOLP [27], Caltech Cars [28], EnglishLP [29], OpenALPR [30], UFPR-ALPR [31], and the Platesmania data published in [32]. All these datasets differ, and their summary is presented in Table 1.

2.1. Training and Validation

Training data came from the Platesmania, UFPR-ALPR, and AOLP datasets; validation/testing data came from the AOLP, UFPR-ALPR, Caltech Cars, OpenALPR, and EnglishLP databases. Data distribution between training and validation followed the distribution given in [5]. The data used for training and testing had marked license plates, and all plates had their four corners marked to test rotation and the second detection stage. All data were stored in the TFRecord format, with the image stored as integers in the range [0, 255] using compression that did not affect image quality. Additionally, a 32 × 32 × 5 matrix with ground truth data, generated from the marked plates, was stored. For the precise license plate detection step, images were stored in the same format, except that they were already cropped to contain only candidate license plates, together with an additional vector of nine elements: the X and Y coordinates of the four license plate corners (4 × 2 values) plus a confidence score indicating whether the candidate is a license plate. Rotation, distortion, magnification, reduction, center modification, merging, and color processing augmentations were performed on all training data. To simulate smaller license plates in traffic scenarios, a multiple-image combination augmentation was applied; validation data were left unmodified.
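A minimal sketch of how such a sample could be serialized to TFRecord is shown below. The feature keys and the PNG compression choice are illustrative assumptions; the text specifies only lossless compression and the 32 × 32 × 5 target matrix.

```python
import numpy as np
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def serialize_sample(image_bytes, target):
    """Pack one training sample: losslessly compressed image bytes plus the
    32 x 32 x 5 ground-truth matrix (feature names are illustrative)."""
    features = tf.train.Features(feature={
        "image": _bytes_feature(image_bytes),  # uint8 pixels in [0, 255]
        "target": _bytes_feature(target.astype(np.float32).tobytes()),
    })
    return tf.train.Example(features=features).SerializeToString()

with tf.io.TFRecordWriter("train.tfrecord") as writer:
    image_bytes = tf.io.encode_png(np.zeros((256, 256, 3), np.uint8)).numpy()
    target = np.zeros((32, 32, 5), np.float32)  # heatmap + offset + size channels
    writer.write(serialize_sample(image_bytes, target))
```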
“Area” resizing interpolation was used for both training and validation data. To maintain the same conditions across all experiments, the same training strategy was applied: the first 50 epochs use the Adam optimizer with learning rate 1 × 10−4, β1 = 0.9, β2 = 0.999, ε = 1 × 10−5, and amsgrad = true. Then, the SGD optimizer with a cyclic learning rate (base = 1 × 10−3) is used. Training stops when the test loss does not improve for 10 consecutive epochs; the learning rate is then reduced by a factor of ten and the process is repeated twice. The accuracy of the license plate detection algorithm is evaluated using two metrics:
  • Object detection check. IoU (intersection over union) with a threshold of 0.4 is used to compare predicted object coordinates with the ground-truth coordinates.
  • Object coordinate error check. The Euclidean distance between every original point and its predicted point, normalized by the bounding box diagonal, as given in (1).
$err = \dfrac{\sqrt{(x_p - x)^2 + (y_p - y)^2}}{\sqrt{(x_{max} - x_{min})^2 + (y_{max} - y_{min})^2}}$ (1)
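In code, metric (1) reads as follows, assuming the normalizing box is the ground-truth plate's bounding box:

```python
import numpy as np

def corner_error(pred_pt, true_pt, box):
    """Normalized point error from Equation (1): Euclidean distance between a
    predicted and an original corner, divided by the bounding box diagonal."""
    (xp, yp), (x, y) = pred_pt, true_pt
    xmin, ymin, xmax, ymax = box
    return np.hypot(xp - x, yp - y) / np.hypot(xmax - xmin, ymax - ymin)
```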
The system’s speed was measured using TensorFlow 1.15 on an NVIDIA GTX 1080 Ti GPU. To simulate real-life scenarios, the neural network’s speed was tested with a batch size of 16, allowing for the simultaneous processing of 16 images. This process was repeated 100 times, the average batch processing time was calculated, and the processing time for a single image was then derived from all 16 images. The speed results are presented in frames per second or milliseconds.
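The timing protocol can be reproduced with a sketch along these lines; the predict_fn callable and input shape are placeholders:

```python
import time
import numpy as np

def measure_speed(predict_fn, batch=16, repeats=100, shape=(256, 256, 3)):
    """Average per-image latency over repeated batched runs, mirroring the
    protocol described above."""
    images = np.random.rand(batch, *shape).astype(np.float32)
    predict_fn(images)                          # warm-up run, excluded from timing
    start = time.perf_counter()
    for _ in range(repeats):
        predict_fn(images)
    elapsed = time.perf_counter() - start
    ms_per_image = elapsed / (repeats * batch) * 1000.0
    return ms_per_image, 1000.0 / ms_per_image  # (milliseconds, FPS)
```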

2.2. Model

The proposed two-step algorithm uses an Hourglass-type network structure in both steps. The standard structure was modified to increase calculation speed by replacing two standard Hourglass blocks [5] with a single block that has more convolutional layers and filters. This resulted in a 40% reduction in network size and a doubling of speed. The general network structure is presented in Figure 3.
For images that are less than 1000 × 1000, a universal size of 256 × 256 was used to balance model speed and precision. Additionally, the inner block structure of the Hourglass network was modified to increase speed by using Resnet blocks, as shown in Figure 4a. The network structure for precise license plate identification is the same, but the input data is 128 × 128 × 3 and the output data is 32 × 32 × 9 (Figure 4b).
The structures of the merge and residual blocks are given in Figure 5a and Figure 5b, respectively. The merge block proceeds as follows: an 8 × 8 × 16 input is bilinearly upsampled to 16 × 16 × 16; a 1 × 1 convolution with batch normalization (no activation, with the number of filters matching the previous output block at the same scale, for example, 32 filters) produces 16 × 16 × 32; this is concatenated with the previous 16 × 16 × 32 block to give 16 × 16 × 64; batch normalization is applied, and the output is 16 × 16 × 64. Residual blocks are then applied (e.g., three times), and the merge (also known as upscale) block is applied again, resulting in a 32 × 32 × 64 output. The residual block proceeds as follows: a 1 × 1 convolution with bias that halves the number of filters, followed by batch normalization and LeakyReLU (0.01); a 3 × 3 convolution with bias whose output filters match its input filters, followed by batch normalization and LeakyReLU (0.01); a 1 × 1 convolution that expands back to the original number of filters, with no batch normalization and no activation; and finally an additive skip connection with no activation.
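Read literally, the two blocks can be sketched in Keras as follows. This is an interpretation of the description above (e.g., that the 3 × 3 convolution keeps the halved filter count), not the authors' exact code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Bottleneck residual block: 1x1 squeeze to half the filters, 3x3 at the
    reduced width, 1x1 expansion with no BN/activation, then an additive skip."""
    y = layers.Conv2D(filters // 2, 1, padding="same", use_bias=True)(x)
    y = layers.BatchNormalization()(y)
    y = layers.LeakyReLU(0.01)(y)
    y = layers.Conv2D(filters // 2, 3, padding="same", use_bias=True)(y)
    y = layers.BatchNormalization()(y)
    y = layers.LeakyReLU(0.01)(y)
    y = layers.Conv2D(filters, 1, padding="same")(y)  # expand; no BN, no activation
    return layers.Add()([x, y])                       # no activation after the add

def merge_block(low_res, skip):
    """Upscale-and-merge block: bilinear 2x upsample, 1x1 conv + BN matched to
    the skip branch's filter count, concatenation, then batch normalization."""
    y = layers.UpSampling2D(2, interpolation="bilinear")(low_res)
    y = layers.Conv2D(skip.shape[-1], 1, padding="same")(y)  # no activation
    y = layers.BatchNormalization()(y)
    y = layers.Concatenate()([y, skip])
    return layers.BatchNormalization()(y)
```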
This paper is based on the CenterNet object detection method [33], in which each type of object is considered a separate class N, and each class has five elements: the center point, the x-axis difference coordinates, the y-axis difference coordinates, the object height, and the object width. In this approach, a Gaussian distribution whose spread depends on the object's size is applied to the feature map. The main formulas are given in Equations (2)–(5):
$c_x = \dfrac{x_{min} + x_{max}}{2}$, (2)

$c_y = \dfrac{y_{min} + y_{max}}{2}$, (3)

$Y_{xy} = \exp\!\left(-\dfrac{(x - c_x)^2 + (y - c_y)^2}{2\sigma^2}\right)$, (4)

$\mathit{Off}x_{xy} = c_x - x$, $\mathit{Off}y_{xy} = c_y - y$, $W_{xy} = x_{max} - x_{min}$, $H_{xy} = y_{max} - y_{min}$, when $Y_{xy} > 0$, (5)
where c_x and c_y are the center of each object, σ is the standard deviation, and Y_xy is a Gaussian distribution. The coordinates of the upper left point (x_min, y_min) and lower right point (x_max, y_max) are used to calculate the center coordinates of each object (c_x, c_y), as given in (2) and (3). Off x_xy and Off y_xy are the signed distances from the current pixel to the object's center along the x and y axes; W_xy and H_xy are the width and height, respectively, as given in (5). The data are decoded, and the predicted height and width are then used to obtain the upper left and lower right points of the license plate. The loop iterates through each element of the matrix of center points returned by the network. If the value of an element is greater than the specified threshold, the other values produced by the network are considered: predictionSizeW, predictionSizeH, predictionOffx, and predictionOffy. Since the X and Y values of predictionOff represent the distance and direction from each existing pixel to the original center of the object, adding these distances to the existing X and Y coordinates yields the center point. Subtracting and adding half the estimated width and height then produces two points: (x_min, y_min), the upper left point, and (x_max, y_max), the lower right point. The decoding algorithm is shown in (6):
for y_axis in range(0, mapSizeY):
    for x_axis in range(0, mapSizeX):
        if predictionPoint[y_axis, x_axis] > threshold:
            half_w = predictionSizeW[y_axis, x_axis] / 2
            half_h = predictionSizeH[y_axis, x_axis] / 2
            C_x = x_axis + predictionOffx[y_axis, x_axis]
            C_y = y_axis + predictionOffy[y_axis, x_axis]
            x_min = C_x - half_w, x_max = C_x + half_w
            y_min = C_y - half_h, y_max = C_y + half_h        (6)
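The nested loop in (6) can equivalently be vectorized. The sketch below performs the same decoding over all above-threshold cells at once; array names follow the pseudocode:

```python
import numpy as np

def decode_boxes(prediction_point, size_w, size_h, off_x, off_y, threshold=0.4):
    """Vectorized form of decoding algorithm (6): every heatmap cell above the
    threshold becomes a box around its offset-corrected center."""
    ys, xs = np.nonzero(prediction_point > threshold)
    cx = xs + off_x[ys, xs]
    cy = ys + off_y[ys, xs]
    half_w = size_w[ys, xs] / 2.0
    half_h = size_h[ys, xs] / 2.0
    # Columns: x_min, y_min, x_max, y_max
    return np.stack([cx - half_w, cy - half_h, cx + half_w, cy + half_h], axis=1)
```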
In order to detect small objects more easily and minimize the number of false negatives, the input size of the CenterNet method would need to be increased. However, the goal was to achieve accurate results without increasing the amount of input data. While CenterNet can efficiently determine the center point of an object, along with the X and Y distances to the original center point, even when the object is poorly visible, its predictions for the height and width of poorly visible objects are not precise. To prevent the network from having to learn these distances, a modification of the data encoding was implemented, as given in (7). The parameters Off x, Off y, H, and W were changed to off_xmin, off_ymin, off_xmax, and off_ymax, while the center points of the object were left unchanged. Figure 6 illustrates the encoded license plate coordinates.
for (x_min, y_min, x_max, y_max) in Boxes:
    for y_axis in range(min, max):
        for x_axis in range(min, max):
            off_xmin = x_min - x_axis
            off_ymin = y_min - y_axis
            off_xmax = x_max - x_axis
            off_ymax = y_max - y_axis        (7)
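A NumPy sketch of the full modified target follows: the Gaussian center channel implements Equations (2)–(4), and the four offset channels implement (7), under the assumption that the loop ranges cover the cells inside each box (the positive region of the heatmap). The fixed σ is for brevity only, since in the method it depends on the object size.

```python
import numpy as np

def encode_plate(boxes, map_size=32, sigma=2.0):
    """Build the 32 x 32 x 5 target: Gaussian center heatmap (Equations (2)-(4))
    plus signed distances to the box corners (encoding (7))."""
    target = np.zeros((map_size, map_size, 5), dtype=np.float32)
    ys, xs = np.mgrid[0:map_size, 0:map_size]
    for xmin, ymin, xmax, ymax in boxes:
        cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0                  # (2), (3)
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))  # (4)
        target[..., 0] = np.maximum(target[..., 0], g)
        inside = (xs >= xmin) & (xs <= xmax) & (ys >= ymin) & (ys <= ymax)
        target[inside, 1] = (xmin - xs)[inside]                            # off_xmin
        target[inside, 2] = (ymin - ys)[inside]                            # off_ymin
        target[inside, 3] = (xmax - xs)[inside]                            # off_xmax
        target[inside, 4] = (ymax - ys)[inside]                            # off_ymax
    return target
```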
Rotation, distortion, enlargement, reduction, centering, merging, and color processing augmentations were applied to the training data, while the testing data were kept original (a minimal sketch of such a pipeline follows this list):
  • Rotation tolerance of [−45, 45] degrees.
  • Distortion (shear factor) of [−0.1, 0.1] horizontally and vertically, applied both separately and jointly.
  • Shift in a random direction from the center by a randomly selected value in the range [0, 0.05 × image diagonal].
  • Connection: making a 2 × 2 or 3 × 3 grid by cropping and joining several pictures, which simulates very small plates in a large image.
  • Trimming: after choosing the plate around which cropping is performed, the algorithm crops sections with a license-plate-to-image-area ratio in [0.001, 0.95].
  • Linear lightening and dimming: randomly increasing or decreasing brightness by [0, 50%]. Additionally, distorted Gaussian kernels were applied to simulate shadows of varying intensity or overexposed image patches, together with RGB, BGR, and GRAYSCALE color channel shifts.
  • Each augmentation is applied with a 30% probability.
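A minimal sketch of a few of these augmentations (rotation, shear, brightness, channel shift) is given below. In a real pipeline the geometric transforms would also have to be applied to the corner labels, which is omitted here.

```python
import random
import cv2
import numpy as np

def augment(image, p=0.3):
    """Apply a subset of the augmentations above, each with 30% probability."""
    h, w = image.shape[:2]
    if random.random() < p:   # rotation in [-45, 45] degrees
        M = cv2.getRotationMatrix2D((w / 2, h / 2), random.uniform(-45, 45), 1.0)
        image = cv2.warpAffine(image, M, (w, h))
    if random.random() < p:   # horizontal shear in [-0.1, 0.1]
        S = np.float32([[1, random.uniform(-0.1, 0.1), 0], [0, 1, 0]])
        image = cv2.warpAffine(image, S, (w, h))
    if random.random() < p:   # brightness change of up to 50%
        image = np.clip(image * random.uniform(0.5, 1.5), 0, 255).astype(np.uint8)
    if random.random() < p:   # color channel shift (e.g., RGB -> BGR)
        image = image[..., ::-1]
    return image
```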

3. Results and Discussion

3.1. Comparing to CenterNet

In this section, the results of the two-step license plate identification algorithm are presented. In the first step, we performed a CenterNet test training run with a small amount of data to check whether all the elements related to training were correct. For this, 500 training images and 500 testing images in TFRecord form were generated without any augmentation (see Figure 7, labeled “No_Augment1”). During training, the Adam optimizer with standard parameters was used. The run showed that the original model converged, although the results were not satisfactory. As a second step, training and validation sets were generated from all available training and validation data to determine the starting point. The trained network demonstrated better results (see Figure 7, labeled “No_Augment_2”).
Next, data was generated following the augmentation and training protocols described in the previous section. The resulting scores, labeled “Augment_1” in Figure 7, show a clear improvement in the model’s performance. “Augment_2” shows even better results with a larger amount of data. The model was now capable of detecting both very small and very large objects, and the number of true positives had increased. However, the precision of the model still needed improvement, as it was detecting other numbers not necessarily related to license plates. Furthermore, the low recall indicates that the model was still unable to find many small objects, which may be due to issues with the model’s structure, input size, or encoding principles. To achieve high accuracy with this specific model, modifications to the encoding and learning methods were necessary.
The proposed algorithm read and processed all the generated data in TFRecord form. The last four output matrices were modified to the new format. Then, the proposed network’s weights were reinitialized, and the network was retrained following the CenterNet training protocol. The error function for calculating corner distances was changed from L2 to Wing loss. Training the network on all databases resulted in better performance compared to the CenterNet method, as shown in Figure 7, labeled “2Point_wing”. The proposed model demonstrated improved detection of smaller objects and lower point errors. Therefore, the Hourglass-type network with modified inner blocks, as shown previously, proved superior to the CenterNet method.
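For reference, Wing loss (introduced for facial landmark localization) is logarithmic for small errors and linear for large ones, so small and medium corner errors contribute more strongly to the gradient than with L2. A NumPy sketch with commonly used default parameters follows; the exact w and ε values used in this work are not stated in the text.

```python
import numpy as np

def wing_loss(error, w=10.0, eps=2.0):
    """Wing loss: w * ln(1 + |x|/eps) for |x| < w, |x| - C otherwise,
    where C makes the two pieces continuous at |x| = w."""
    c = w - w * np.log(1.0 + w / eps)
    abs_err = np.abs(error)
    return np.where(abs_err < w,
                    w * np.log(1.0 + abs_err / eps),
                    abs_err - c)
```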

3.2. Detecting Small Objects

Although more small objects were detected, the results were still not satisfactory and the issue persisted. Therefore, additional modifications were made to the training process:
  • Weights were added to the error function. This way, the error for all negative pixels decreases and the error for all positive pixels increases, resulting in more false positives, but also more true positives.
  • The number of training samples per TFRecord was increased to 50, which helps ensure that the model does not overfit.
  • The cropping augmentation was modified so that the images would be normally distributed.
Before the changes, the most suitable option was “2Point_wing”; after the changes, it became “2Point_wing_2”. As shown in Figure 7, the modifications increased both false positives and true positives, and the number of correctly identified smaller objects also increased. Since the aim was to improve the accuracy of the proposed system without changing the model structure, an additional detection step was added using the same structure described previously:
  • The coordinates obtained from a predicted object are used to crop a potential license plate. Since the preliminary detection may be biased, the box coordinates are scaled by a factor of two before cropping (see the sketch after the next paragraph).
  • The cropped images are processed by the second network, which returns nine values for each image: the coordinates of the four corners (x, y) and whether a license plate is present.
The cropped images are relatively small, so it was possible to further reduce the network structure by shrinking the input to 128 × 128 × 3 and removing the parts of the network that use the “Up-sampling2D” layer, as shown in Figure 8.
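A sketch of the cropping step described above: the 2× enlargement factor follows the text, while the clamping to image bounds and the output size handling are illustrative assumptions.

```python
import cv2

def crop_candidate(image, box, scale=2.0, out_size=128):
    """Crop a candidate plate for the verification network, enlarging the
    preliminary box so that a biased detection still contains the plate."""
    h, w = image.shape[:2]
    xmin, ymin, xmax, ymax = box
    cx, cy = (xmin + xmax) / 2.0, (ymin + ymax) / 2.0
    half_w = (xmax - xmin) * scale / 2.0
    half_h = (ymax - ymin) * scale / 2.0
    x0, x1 = int(max(0, cx - half_w)), int(min(w, cx + half_w))
    y0, y1 = int(max(0, cy - half_h)), int(min(h, cy + half_h))
    crop = image[y0:y1, x0:x1]
    return cv2.resize(crop, (out_size, out_size), interpolation=cv2.INTER_AREA)
```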
Once the model was trained, it was combined with the predicted number detection algorithm. The results are shown in Figure 9 for the AOLP database and labeled as “combined”. The biggest improvement is observed with the UFPR_ALPR database, which contains many small objects. The precision of object point detection was doubled.
There was also a significant decrease in the number of incorrectly detected false positive objects at the lower threshold. However, there was also a decrease in the number of correctly detected objects, which occurred because the verification model returned low scores for some positive objects; consequently, the number of positively detected objects decreased significantly at the higher threshold. To solve this problem, the results of the preliminary detection and verification models were combined using Formula (8):
$score = stage1Score \times 0.5 + stage2Score \times 0.5$ (8)
The proposed modification suggests that using a threshold of 0.5 would achieve the highest possible result for these combinations. The results of this modification are shown in Figure 9, labeled as “combined_mergedScores”. Moreover, an additional advantage of the proposed verification model is that it provides the object coordinates of all corners, which can be utilized to perform perspective transformations, enabling the restoration of an image to its normal position, irrespective of whether it was rotated or bent.

3.3. License Plate Detection Comparison with Other Methods

In this section, a comparison is made with existing license plate detection algorithms. The efficiency of the two-step license plate identification algorithm is presented and analyzed on different datasets. The single-image processing speed and the neural network speed were calculated and compared with other methods. The license plate detection methods in [31,32] require a vehicle detection step, making the overall license plate detection process slower. Table 2 shows their speed as the sum of the time required to detect the vehicle (YOLOv2) and the time required to detect the license plate (Fast-YOLOv2). Experiments were also performed with YOLOv3 and Fast-YOLOv3, whose performance is better on small objects. Zhang et al.’s research proposes a flow-guided spatiotemporal attention detection network to detect and track license plates in complex situations [20]. The UFPR-ALPR dataset, consisting of 4500 images of 150 vehicles in real-world environments, was used during the experiments. Although precision and recall were better in all the discussed papers, they were significantly slower than the proposed license plate detection algorithm.
Results from the AOLP dataset showed that the proposed license plate detection algorithm’s precision and recall are better than those of [21] or [18], but worse than those of [32]. The AOLP dataset can be divided into three subsets (AC, LE, and RP), and results vary for each, so the average precision is considered. In [21], the results show a precision of 97.70% with a recognition speed of more than 21 FPS. In [18], the average precision was 95.50% (AC—92.6%, LE—93.5%, RP—92.9%). It is worth noting that the network structures used can affect precision: both [18] and [21] are older solutions, which partly explains their worse performance.
Results from the Caltech Cars dataset showed that the proposed algorithm has better recall (99.19%) compared to [18] and [19], whose recall and precision are 91.3% and 93.8%, and 96.83% and 98.39%, respectively. However, recall and precision were worse compared to [32], although the proposed algorithm was significantly faster.
A two-step license plate detection method presented in [23] takes a similar approach, where the candidate license plate is extracted from the original image and then verified. The results show a high accuracy of 98.1% at 2.71 ms (452 FPS). Study [24] also used a different approach and achieved high precision. However, the current detection algorithm performs faster, although such comparisons should be made on the same datasets.
The proposed neural network performs poorly on very small objects. When the input is downscaled to 256 × 256, these objects are barely visible even to the human eye, which is likely why the model has difficulty detecting them. Larger images, such as 2500 × 2500 × 3, are downscaled to fit the input size; any objects that are approximately 50 pixels in diameter will therefore disappear or appear as “clouds”. Increasing the input size to 512 × 512 would improve the detection of these small objects at the cost of decreased overall system speed, and would likely still result in a faster and more precise system than those analyzed in existing papers.

4. Conclusions

This study aimed to propose a solution for efficient license plate detection and recognition. A combination of two neural networks was developed that achieves an average accuracy of 96.19% at a speed of 405 frames per second. When artificially augmented data was used to train any modification of the neural network, the results were 1.5–2 times better. Using two Hourglass-type networks with a modified inner block structure allowed faster calculations: the speed comes from preliminary license plate detection, whose output is passed to a second network for precise plate identification. An even wider variety of unique data would likely improve the accuracy of the neural network further. The next step would be to add another light network for character recognition, whose input is the license plates detected by the proposed two-step identification algorithm. Because LPR and ALPR systems must often run on a CPU-integrated camera or embedded device with low energy consumption (5–10 W), the proposed algorithm is a good fit: it does not require high computational resources and produces satisfactory results.

Author Contributions

Conceptualization and methodology, M.K. and D.Š.; software, M.K.; writing—original draft preparation, J.J.-B.; visualization, investigation, and editing, M.K. and J.J.-B.; writing—review, J.J.-B.; supervision, project administration, and funding acquisition, D.Š. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tang, J.; Wan, L.; Schooling, J.; Zhao, P.; Chen, J.; Wei, S. Automatic number plate recognition (ANPR) in smart cities: A systematic review on technological advancements and application cases. Cities 2022, 129, 103833. [Google Scholar] [CrossRef]
  2. Li, R.; Yang, F.; Liu, Z.; Shang, P.; Wang, H. Effect of taxis on emissions and fuel consumption in a city based on license plate recognition data: A case study in Nanning, China. J. Clean. Prod. 2019, 215, 913–925. [Google Scholar] [CrossRef]
  3. Mokayed, H.; Shivakumara, P.; Woon, H.H.; Kankanhalli, M.; Lu, T.; Pal, U. A new DCT-PCM method for license plate number detection in drone images. Pattern Recognit. Lett. 2021, 148, 45–53. [Google Scholar] [CrossRef]
  4. Hamdi, A.; Chan, Y.K.; Koo, V.C. A New Image Enhancement and Super Resolution technique for license plate recognition. Heliyon 2021, 7, e08341. [Google Scholar] [CrossRef] [PubMed]
  5. Newell, A.; Yang, K.; Deng, J. Stacked Hourglass Networks for Human Pose Estimation. In Computer Vision—ECCV 2016. ECCV 2016; Lecture Notes in Computer Science; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016; p. 9912. [Google Scholar]
  6. Luo, H.; Han, G.; Wu, X.; Liu, P.; Yang, H.; Zhang, X. Cascaded hourglass feature fusing network for saliency detection. Neurocomputing 2021, 428, 206–217. [Google Scholar] [CrossRef]
  7. Huang, Y.; Huang, H. Stacked attention hourglass network based robust facial landmark detection. Neural Netw. 2023, 157, 323–335. [Google Scholar] [CrossRef]
  8. Arowolo, M.O.; Adebiyi, M.O.; Michael, E.P.; Aigbogun, H.E.; Abdulsalam, S.O.; Adebiyi, A.A. Detection of COVID-19 from Chest X-Ray Images using CNN and ANN Approach. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 754–759. [Google Scholar] [CrossRef]
  9. Hasnat, A.; Nakib, A. Robust license plate signatures matching based on multi-task learning approach. Neurocomputing 2021, 440, 58–71. [Google Scholar] [CrossRef]
  10. Sathya, K.B.; Vasuhi, S.; Vaidehi, V. Perspective Vehicle License Plate Transformation using Deep Neural Network on Genesis of CPNet. Procedia Comput. Sci. 2020, 171, 1858–1867. [Google Scholar] [CrossRef]
  11. Kessentini, Y.; Besbes, M.D.; Ammar, S.; Chabbouh, A. A two-stage deep neural network for multi-norm license plate detection and recognition. Expert Syst. Appl. 2019, 136, 159–170. [Google Scholar] [CrossRef]
  12. Lee, Y.; Jun, J.; Hong, Y.; Jeon, M. Practical License Plate Recognition in Unconstrained Surveillance Systems with Adversarial Super-Resolution. In Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Prague, Czech Republic, 25–27 February 2019; Volume 5, pp. 68–76. [Google Scholar]
  13. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  14. He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  15. Selmi, Z.; Halima, M.B.; Pal, U.; Alimi, M.A. DELP-DAR System for License Plate Detection and Recognition. Pattern Recognit. Lett. 2020, 129, 213–223. [Google Scholar] [CrossRef]
  16. Batra, P.; Hussain, I.; Ahad, M.A.; Casalino, G.; Alam, M.A.; Khalique, A.; Hassan, S.I. A Novel Memory and Time-Efficient ALPR System Based on YOLOv5. Sensors 2022, 22, 5283. [Google Scholar] [CrossRef] [PubMed]
  17. Khan, I.R.; Ali, S.T.A.; Siddiq, A.; Khan, M.M.; Ilyas, M.U.; Alshomrani, S.; Rahardja, S. Automatic License Plate Recognition in Real-World Traffic Videos Captured in Unconstrained Environment by a Mobile Camera. Electronics 2022, 11, 1408. [Google Scholar] [CrossRef]
  18. Selmi, Z.; Halima, M.B.; Alimi, A.M. Deep Learning System for Automatic License Plate Detection and Recognition. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; pp. 1132–1138. [Google Scholar]
  19. Kim, S.G.; Jeon, H.G.; Koo, H.I. Deep-learning-based license plate detection method using vehicle region extraction. Electron. Lett. 2017, 53, 1034–1036. [Google Scholar] [CrossRef]
  20. Zhang, C.; Wang, Q.; Li, X. V-LPDR: Towards a unified framework for license plate detection, tracking, and recognition in real-world traffic videos. Neurocomputing 2021, 449, 189–206. [Google Scholar] [CrossRef]
  21. Tung, C.L.; Wang, C.H.; Peng, B.S. A Deep Learning Model of Dual-Stage License Plate Recognition Applicable to the Data Processing Industry. Math. Probl. Eng. 2021, 2021, 3723715. [Google Scholar] [CrossRef]
  22. Silva, S.M.; Jung, C.R. Real-time license plate detection and recognition using deep convolutional neural network. J. Vis. Commun. Image Represent. 2020, 71, 102773. [Google Scholar] [CrossRef]
  23. Davix, C.A.; Christopher, C.S.; Judson, D. Detection of the vehicle license plate using a kernel density with default search radius algorithm filter. Optik 2020, 218, 164689. [Google Scholar] [CrossRef]
  24. Gong, Y.; Deng, L.; Tao, S.; Lu, X.; Wu, P.; Xie, Z.; Ma, Z.; Xie, M. Unified Chinese License Plate detection and recognition with high efficiency. J. Vis. Commun. Image Represent. 2022, 86, 103541. [Google Scholar] [CrossRef]
  25. Yu, B.; Shin, J.; Kim, G.; Roh, S.; Sohn, K. Non-Anchor-Based Vehicle Detection for Traffic Surveillance Using Bounding Ellipses. IEEE Access 2021, 9, 123061–123074. [Google Scholar] [CrossRef]
  26. Bugeja, M.; Dingli, A.; Attard, M.; Seychell, D. Comparison of Vehicle Detection Techniques applied to IP Camera Video Feeds for use in Intelligent Transport Systems. Transp. Res. Procedia 2020, 45, 971–978. [Google Scholar] [CrossRef]
  27. Hsu, G.S.; Chen, J.C.; Chung, Y.Z. Application-oriented-license-plate-recognition. IEEE Trans. Veh. Technol. 2013, 62, 552–561. [Google Scholar] [CrossRef]
  28. Weber, M.; Perona, P. Caltech Cars 1999 (1.0) [Data set]. CaltechDATA. 2022. Available online: https://data.caltech.edu/records/fmbpr-ezq86 (accessed on 31 March 2023).
  29. Srebrić, V. EnglishLP database. In Information Technology Application Project—Ministry of Science and Technology; University of Zagreb: Zagreb, Croatia, 2003. [Google Scholar]
  30. OpenALPR Inc. OpenALPR-EU Dataset. 2016. Available online: https://github.com/openalpr/benchmarks/tree/master/endtoend/eu (accessed on 16 February 2023).
  31. Laroca, R.; Severo, E.; Zanlorensi, L.A.; Oliveira, L.S.; Gonçalves, G.R.; Schwartz, W.R.; Menotti, D. A robust real-time automatic license plate recognition based on the YOLO detector. arXiv 2018, arXiv:1802.09567v6. [Google Scholar]
  32. Laroca, R.; Zanlorensi, L.A.; Gonçalves, G.R.; Todt, E.; Schwartz, W.R.; Menotti, D. An efficient and layout-independent automatic license plate recognition system based on the YOLO detector. IET Intell. Transp. Syst. 2021, 15, 483–503. [Google Scholar] [CrossRef]
  33. Zhou, X.; Wang, D.; Krahenbuhl, P. Objects as Points. arXiv 2019, arXiv:1904.07850. [Google Scholar]
Figure 1. Proposed two-step license plate recognition algorithm: the first network filters out candidate license plates, while the second network detects precise license plates. The final output is a clear image of only the license plate, which can then be used for the next step: character recognition. The goal of this work is to improve the license plate detection algorithm.
Figure 2. Potential number detection and corners identification: (a) original picture with a car; (b) the original coordinates are increased and the number is cut out; (c) the four corners of the number are found.
Figure 3. Hourglass structure used in candidate license plate detection and precise license plate detection.
Figure 4. CNN structure for license plate number detection. The structure is identical, but the input and output differ: (a) structure of the first network for preliminary number detection; (b) structure of the second network for precise number detection.
Figure 5. Used structures of inner blocks: (a) merge block; (b) residual block.
Figure 6. License plate encoding: (a) the original image is converted to a 256 × 256 image; (b) the object center point is encoded using a Gaussian distribution; (c) distance matrix from each pixel to the object’s upper left corner along x; (d) distance matrix from each pixel to the object’s upper left corner along y; (e) distance matrix from each pixel to the object’s lower right corner along x; (f) distance matrix from each pixel to the object’s lower right corner along y.
Figure 7. Results of two-step identification algorithm for the EnglishLP database: (a) precision scores; (b) recall scores; (c) F1 scores; (d) Pt1 point error scores; (e) Pt2 point error scores.
Figure 8. Improved CNN network structure for precise license plate identification.
Figure 9. Results of two-step identification algorithm for the AOLP database: (a) precision scores; (b) recall scores; (c) F1 scores; (d) Pt1 point error scores; (e) Pt2 point error scores.
Table 1. Used datasets.

Title          Size        Number of Records   Number of Records for Testing   Country
AOLP           352 × 240   2049                1268                            Taiwan
Caltech Cars   896 × 592   124                 124                             USA
EnglishLP      640 × 480   509                 509                             Europe
OpenALPR       various     445                 445                             Europe
UFPR-ALPR      352 × 240   2049                1268                            Taiwan
Platesmania    various     722                 446                             USA, China, Europe
Table 2. Detection performance comparisons for various datasets. Precision is the average precision over all datasets, recall is given per dataset, and speed is the average over all datasets.

Method                     Precision (%)   Recall (%)   Speed (ms|fps)

UFPR-ALPR dataset
Laroca et al. [31]         98.33           97.33        11.16|90 + 4.07|246
Laroca et al. [32]         98.67           98.67        8.54|117 + 3.09|324
Zhang et al. [20]          99.11           99.39        122|-
Proposed                   96.19           85.61        2.2468|444 + 0.4686|2133

AOLP dataset
Laroca et al. [32]         99.85           99.85        8.54|117 + 3.09|324
Tung et al. [21]           95.92           97.60        -|62.93
Selmi et al. [18]          95.50           95.43        -|-
Proposed                   96.19           100          2.2468|444 + 0.4686|2133

OpenALPR dataset
Laroca et al. [32]         98.52           98.52        8.54|117 + 3.09|324
Silva and Jung [22]        90.94           -            -|-
Proposed                   96.19           96.17        -|-

CaltechCars dataset
Laroca et al. [32]         99.13           99.13        8.54|117 + 3.09|324
Selmi et al. [18]          93.80           91.30        -|-
Kim et al. [19]            98.39           96.83        -|-
Proposed                   96.19           99.19        2.2468|444 + 0.4686|2133

EnglishLP dataset
Laroca et al. [32]         100             100          8.54|117 + 3.09|324
Proposed                   96.19           100          2.2468|444 + 0.4686|2133

Other
Kernel density [23]        98.1            -            2.71|452
STELLA [24]                97.9            96.3         -|30
Bounding ellipses [25]     -               -            -|38
