Article

Fast Localization and High Accuracy Recognition of Tire Surface Embossed Characters Based on CNN

Liaoning Provincial Key Laboratory of Intelligent Manufacturing and Industrial Robots, Shenyang University of Technology, Shenyang 110870, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(11), 6560; https://doi.org/10.3390/app13116560
Submission received: 20 April 2023 / Revised: 25 May 2023 / Accepted: 25 May 2023 / Published: 28 May 2023

Abstract

To address the low efficiency and high labor intensity of manually reading the embossed characters on tire sidewalls, we propose a CNN-based method for tire surface character recognition. In image pre-processing, the SSR algorithm is improved to enhance character contrast, and a Normalized Cross Correlation (NCC) template matching algorithm accelerated by an image pyramid is proposed to quickly locate and segment the “DOT” characters. An improved LeNet-5 network structure is used to recognize the characters, and a self-built digit sample library, randomly split in the ratio 8:2, is used for digit recognition experiments. The experimental results show that the recognition accuracy reaches 95.9% on the training set and 99.5% on the validation set. The accuracy on the testing set is 95.6%, which meets practical application requirements. Moreover, the whole algorithm runs on a commonly configured CPU, reducing equipment costs.

1. Introduction

The sidewall embossing of a car tire provides information such as the manufacturer, tire specification, trademark, and production date. During tire storage, installation, testing, and other tasks, this character information must be read. At present, the identification and recording of tire sidewall embossed characters are mostly performed manually, which is inefficient, labor intensive, and prone to omissions and transcription errors. With the rise of machine vision, visual inspection methods are widely used in industry. However, machine-vision recognition of the DOT embossed characters on automotive tires remains difficult: the contrast between the characters and the tire background is low, and the tire surface is prone to stains, wear, and other interference, so manual detection is hard and traditional OCR methods cannot identify the characters accurately. With the rapid development of deep learning technology, a solution to this problem has become feasible [1,2,3,4].
Reference [5] processes images of embossed characters with a guided image filtering algorithm, segments the characters with a drip (drop-fall) segmentation algorithm, and classifies them with a HOG+SVM method, comparing recognition results across identification categories. Reference [6] applies deep learning to tire embossed character recognition: a combination of the Hough transform and a partitioning scheme detects the concentric circular region, the Faster R-CNN algorithm locates the DOT identifier, and a recognition network then reads the characters. Reference [7] proposed a multi-step deep learning method for character information recognition and, to improve running speed, combined an end-to-end deep learning recognition framework with a region localization algorithm. Reference [8] developed an online machine-vision recognition and classification system that obtains segmented tire character images through template matching and edge detection algorithms and classifies the characters with a least-squares SVM. Reference [9] proposed a scale-invariant convolutional neural network to handle changes in image scale and realize character localization and recognition. Reference [10] presented a deep convolutional network of its own design, which improves character recognition through the non-linear operations of convolution kernels, downsampling, and deep convolutional layers. Reference [11] proposed two lightweight neural networks with a hybrid residual and dense connection structure to improve super-resolution performance. Reference [12] developed a lightweight convolutional neural network for automatic scratch detection on components in sliding contact, such as those in metal forming.
This article aims to meet actual inspection needs by identifying the embossed characters on the tire sidewall. Combining traditional machine vision with a neural network, we first pre-process the collected images, segment them into single-character images, and recognize the digits with an improved LeNet-5, which improves character recognition accuracy. The whole pipeline runs on a CPU without a high-performance graphics card, which greatly reduces hardware cost while maintaining fast training and recognition speed.

2. Tire Character Recognition System Design Scheme

2.1. Overall Plan

Based on the inspection requirements, the overall system solution is determined. The tire to be inspected is placed on a pallet, which carries it into the visual inspection area, where the embossed characters on the tire sidewall are detected and the character information is extracted. Based on the detected information, the tire status is determined, and the tires are classified and stored. The workflow is shown in Figure 1:
The overall structural design scheme is as follows (Figure 2):

2.2. Hardware Selection

The tire diameter is d = 600 mm. To capture the entire tire in one image, the field of view is set to 800 mm. The height of a single character is about 6 mm. To ensure accuracy, assuming that each millimeter of the scene must be covered by 5 pixels, the camera resolution must be at least 4000 × 4000, so a 25-megapixel camera is chosen. After determining the working distance, a matching lens is selected. Because the characters on the tire are the same color as the non-character area and are raised, side lighting is used to highlight the character region as much as possible. As shown in Figure 3, the raised characters reflect the light differently from the flat surface, so the light entering the lens, and hence the gray level, differs between character and non-character areas. The lighting system uses LED light sources placed around the tire to illuminate it evenly; the experimental platform is set up as shown in Figure 4.
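For concreteness, the required resolution follows directly from the field of view and the assumed pixel density:

$$800\ \text{mm} \times 5\ \text{px/mm} = 4000\ \text{px per side},$$

i.e., at least a 4000 × 4000 ≈ 16-megapixel sensor; the selected 25-megapixel camera provides a comfortable margin.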

3. Machine Visual Processing

3.1. Image De-Backgrounding

The image collected with the industrial camera is shown in Figure 5. The color of the tire differs markedly from the background, so the gray histogram of the image is analyzed and the tire is separated from the background by the difference in grayscale values. The grayscale distribution of the overall image is shown in Figure 6. An adaptive threshold method [13] extracts the tire region (Figure 7), which is then used to mask the original image; the tire image with the background removed is shown in Figure 8.
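The de-backgrounding step can be sketched with OpenCV as follows. The paper does not give the threshold parameters, so the block size and offset below are illustrative assumptions:

```python
import cv2

img = cv2.imread("tire.png", cv2.IMREAD_GRAYSCALE)

# Adaptive (local) threshold separates the tire from the background.
# THRESH_BINARY_INV assumes a dark tire on a lighter background;
# use THRESH_BINARY if the polarity is reversed.
mask = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                             cv2.THRESH_BINARY_INV, 51, 5)

# Keep only the tire pixels; everything outside the mask becomes black.
tire_only = cv2.bitwise_and(img, img, mask=mask)
cv2.imwrite("tire_no_background.png", tire_only)
```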

3.2. Image Enhancement

3.2.1. Grayscale Transformation

A common method of image enhancement is grayscale transformation. The principle can be understood as scanning each pixel of the original image with a template, determining the maximum and minimum grayscale values of the pixels within the template, and assigning their difference to the pixel at the template center [14,15]. Its main function is to improve image contrast and to compensate for under- or over-exposure in the imaging. Suppose the original image $f(x,y)$ has the grayscale range $[a,b]$, and the transformed image $g(x,y)$ should have its gray range stretched to $[c,d]$; then the transformation is given by Equation (1) [16]:

$$g(x,y)=\begin{cases} c, & 0 \le f(x,y) < a \\ \dfrac{d-c}{b-a}\left[f(x,y)-a\right]+c, & a \le f(x,y) < b \\ d, & b \le f(x,y) \le M_f \end{cases} \tag{1}$$

where $M_f$ represents the maximum value of $f(x,y)$.
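A direct implementation of Equation (1) might look as follows; the stretch limits a and b would in practice be read off the image histogram:

```python
import numpy as np

def gray_stretch(f, a, b, c=0, d=255):
    """Piecewise linear grayscale stretch of Equation (1): values below a
    map to c, values in [a, b) are stretched linearly, and values of b
    and above map to d."""
    f = f.astype(np.float64)
    g = np.empty_like(f)
    g[f < a] = c
    mid = (f >= a) & (f < b)
    g[mid] = (d - c) / (b - a) * (f[mid] - a) + c
    g[f >= b] = d
    return g.astype(np.uint8)
```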
Tire images treated using grayscale transformation are shown in Figure 9:
As the characters differ little in color from the tire and the tire is large, the character area may be under- or unevenly illuminated, in which case the characters cannot be segmented well; Retinex theory [17] is therefore invoked to solve the illumination problem and enhance the image.

3.2.2. Retinex Theory

Retinex was proposed by Edwin H. Land in 1963; its name is a portmanteau of Retina and Cortex. Retinex theory states that the incident light determines the dynamic range of the pixels in an image, while the invariant reflectance of the object itself determines the image’s intrinsic properties. In other words, the image we see is formed by the incident light reflected according to the object’s reflectance. As shown in Figure 10, this can be expressed as:

$$S(x,y) = L(x,y)\cdot R(x,y) \tag{2}$$

where $L(x,y)$ is the illumination image, $R(x,y)$ is the reflectance image of the object, and $S(x,y)$ is the reflected-light image received by the human eye.
Clearly, if an image is considered to be the product of illumination and reflectance, the basic idea of Retinex image enhancement is to remove the influence of the incident light and retain the reflectance properties of the object itself.

3.2.3. Single Scale Retinex (SSR)

The SSR algorithm computes the enhanced gray value of each target pixel from a weighted average of the pixel values in the region centered on that pixel, with the weights determined by a surround function (a Gaussian function). If the original image is $S(x,y)$, the reflectance image is $R(x,y)$, and the illumination image is $L(x,y)$, the SSR is calculated as follows [18]:

$$r(x,y) = \log R(x,y) = \log S(x,y) - \log L(x,y) = \log S(x,y) - \log\left[G(x,y) * S(x,y)\right] \tag{3}$$

where $*$ denotes convolution of the Gaussian function with the original image. The Gaussian function is:

$$G(x,y) = \lambda \exp\left(-\frac{x^2+y^2}{2\sigma^2}\right) \tag{4}$$

Furthermore, $\lambda$ is chosen such that:

$$\iint G(x,y)\,dx\,dy = 1 \tag{5}$$
The algorithm flowchart is shown in Figure 11:
L(x, y) represents the low-frequency part of the image, and R(x, y) the high-frequency part. The SSR algorithm aims to retain the high-frequency component and filter out the low-frequency component.
The SSR algorithm steps are as follows (a minimal implementation sketch appears after the steps):
(1) Read the original image data S(x, y) and convert the integer data to double precision.
(2) Determine the scale parameter σ and the value of λ satisfying condition (5).
(3) Calculate r(x, y) according to Formula (3).
(4) Convert r(x, y) from the logarithmic domain back to the real domain R(x, y).
(5) Linearly stretch R(x, y) and output the image.
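A minimal Python sketch of these steps, assuming OpenCV’s Gaussian blur as the surround function; the σ value below is illustrative, not the authors’ setting:

```python
import cv2
import numpy as np

def ssr(image, sigma=80):
    """Single Scale Retinex following steps (1)-(5) above."""
    s = image.astype(np.float64) + 1.0          # step 1: integer -> double (offset avoids log 0)
    blur = cv2.GaussianBlur(s, (0, 0), sigma)   # L(x,y) = G(x,y) * S(x,y)
    r = np.log(s) - np.log(blur)                # step 3: Equation (3) in the log domain
    # steps 4-5: map back to the displayable real range by linear stretching
    r = (r - r.min()) / (r.max() - r.min()) * 255.0
    return r.astype(np.uint8)
```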

3.2.4. Improved SSR Algorithm

Although the SSR algorithm makes the characters somewhat clearer, problems remain: the character areas are blurred and differ little from the tire background. This is because the characters and the tire background are unevenly lit, so detail degrades when the same filtering is applied everywhere. To make the characters clearer, an improved SSR algorithm is proposed.
On the basis of the SSR algorithm, a Gamma transformation is added, which enhances image detail and increases the gray-level difference between characters and tire, so that after filtering the difference remains large and subsequent character processing is easy. The Gamma transformation, also called the power-law transformation, corrects images whose gray levels are too high or too low and enhances image contrast [19]; it is defined as:

$$s = c\,r^{\gamma} \tag{6}$$

where r ranges over [0, 1] and c and γ are both greater than 0.
The improved SSR algorithm process is shown in Figure 12:
The comparison results of images processed with the SSR algorithm and improved SSR algorithm are as follows (Figure 13 and Figure 14):
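A sketch of the added Gamma step (Equation (6)); the constants c and γ below are illustrative, and placing the Gamma step before SSR is one plausible reading of the pipeline in Figure 12:

```python
import numpy as np

def gamma_correct(image, c=1.0, gamma=0.6):
    """Power-law transform s = c * r**gamma of Equation (6).
    r is first normalized to [0, 1]; gamma = 0.6 brightens dark regions."""
    r = image.astype(np.float64) / 255.0
    s = c * np.power(r, gamma)
    return np.clip(s * 255.0, 0, 255).astype(np.uint8)

# Improved SSR as described above: Gamma correction combined with SSR, e.g.
# enhanced = ssr(gamma_correct(raw_image))
```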

3.3. Polar Coordinate Conversion

The characters on the tire are distributed along its circumference in an annular region, where they are difficult to extract and identify directly. Therefore, the annular tire image is unwrapped into a rectangular image: a polar-coordinate transformation with bilinear interpolation converts the ring-shaped character distribution into a straight one [20], as shown in Figure 15. Since the distances from the character region to the inner and outer edges of the tire are fixed, the center of the tire ring is used as the origin of the coordinate transformation, and from the radius of the tire’s inner edge, the radius D of the circle on which the character region lies can be determined. The height of the rectangle equals the radial width of the annular character band, and its length equals the circumference of that circle. The unwrapped tire image is shown in Figure 16.
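The unwrapping can be sketched with OpenCV’s polar remapping, which performs the bilinear interpolation described above; the center and radius below are placeholders that would come from the detected tire circle:

```python
import cv2
import numpy as np

img = cv2.imread("tire_enhanced.png", cv2.IMREAD_GRAYSCALE)
center = (img.shape[1] // 2, img.shape[0] // 2)   # tire center (placeholder)
max_radius = 400.0                                # outer radius in pixels (placeholder)

# dsize = (radial samples, angular samples); angular axis sized to the circumference.
unwrapped = cv2.warpPolar(
    img,
    (int(max_radius), int(2 * np.pi * max_radius)),
    center, max_radius,
    cv2.INTER_LINEAR + cv2.WARP_POLAR_LINEAR)

# Rows run over angle and columns over radius; rotate so the character
# band lies along the horizontal axis, as in Figure 16.
strip = cv2.rotate(unwrapped, cv2.ROTATE_90_COUNTERCLOCKWISE)
```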

3.4. Template Matching

Locating the identifier is an important prerequisite for successfully reading the character information: once the identifier characters are located, accurate position information for the characters to be recognized is available, which guarantees their subsequent extraction. According to the layout rules of tire characters, the production date lies at a fixed relative distance from the “DOT” logo. Because the “DOT” characters are present on tires of every batch, the DOT logo can be used to locate the production date, which can then be recognized.

3.4.1. Template Acquisition

Before template matching starts, a template is required: the ROI (template image area) is drawn in the collected image through a human–computer interaction interface, the character contour in the ROI is extracted, and it is stored as the template. This is shown in Figure 17.

3.4.2. NCC-Based Template Matching

Normalized Cross Correlation (NCC) is a typical grayscale-correlation-based matching method, with the advantages of being insensitive to illumination variations and unaffected by scale-factor errors [21]. The matching steps are as follows:
Step 1. As shown in Figure 18, suppose the search image S has size N × N and the template image T has size M × M, with N > M. The template T is overlaid on the search image; the portion of S under the template is the sub-image $S^{i,j}$, where (i, j) are the coordinates of the top-left pixel of the sub-image in S, called the reference point.
Step 2. The search image is scanned from left to right and top to bottom, and the similarity measure R(i, j) between each sub-image $S^{i,j}$ and the template image T is calculated [22]:

$$R(i,j)=\frac{\displaystyle\sum_{m=1}^{M}\sum_{n=1}^{M} S^{i,j}(m,n)\,T(m,n)}{\sqrt{\displaystyle\sum_{m=1}^{M}\sum_{n=1}^{M}\left[S^{i,j}(m,n)\right]^{2}}\ \sqrt{\displaystyle\sum_{m=1}^{M}\sum_{n=1}^{M}\left[T(m,n)\right]^{2}}} \tag{7}$$

Step 3. The larger the NCC coefficient, the greater the correlation between the template and the sub-image. R(i, j) ranges from 0 to 1; its maximum is found and the corresponding (i, j) recorded, giving the position of greatest similarity, from which the matching region is extracted.
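A direct (unaccelerated) NumPy sketch of Equation (7) and the Step 2 scan:

```python
import numpy as np

def ncc(subimage, template):
    """NCC coefficient R(i, j) of Equation (7) for one position;
    both arrays have the template's size."""
    s = subimage.astype(np.float64)
    t = template.astype(np.float64)
    num = np.sum(s * t)
    den = np.sqrt(np.sum(s ** 2)) * np.sqrt(np.sum(t ** 2))
    return num / den

def match(search, template):
    """Exhaustive left-to-right, top-to-bottom search (Step 2),
    returning the reference point (i, j) with the largest R."""
    M = template.shape[0]
    best, best_ij = -1.0, (0, 0)
    for i in range(search.shape[0] - M + 1):
        for j in range(search.shape[1] - M + 1):
            r = ncc(search[i:i + M, j:j + M], template)
            if r > best:
                best, best_ij = r, (i, j)
    return best_ij, best
```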

3.4.3. Gaussian Pyramid-Based Template Matching Acceleration

The NCC algorithm matches with high accuracy but is computationally intensive. To improve its processing speed, NCC is combined with hierarchical matching on an image Gaussian pyramid.
A Gaussian pyramid is built by repeated Gaussian smoothing and subsampling [23,24]; that is, the (i+1)-th layer is obtained by smoothing and subsampling the i-th layer. The pyramid is constructed by performing the following steps (a code sketch appears after the steps):
Step 1. The bottom layer of the image pyramid is the original image, denoted G0.
Step 2. Convolve G0 with a 5 × 5 Gaussian kernel:

$$w = \frac{1}{256}\begin{bmatrix} 1 & 4 & 6 & 4 & 1 \\ 4 & 16 & 24 & 16 & 4 \\ 6 & 24 & 36 & 24 & 6 \\ 4 & 16 & 24 & 16 & 4 \\ 1 & 4 & 6 & 4 & 1 \end{bmatrix} \tag{8}$$

Step 3. The second-layer image G1 is obtained by removing all even rows and columns of the Gaussian low-pass filtered image. In general, the Gaussian pyramid image of layer $G_{i+1}$ at point (x, y) is given by Equation (9):

$$G_{i+1}(x,y)=\sum_{m=-2}^{2}\sum_{n=-2}^{2} w(m,n)\,G_{i}(2x+m,\ 2y+n) \tag{9}$$

Step 4. Repeat Step 3 on each new layer $G_i$ to obtain the whole image pyramid. Each iterated image $G_{i+1}$ is 1/4 the size of $G_i$, with 1/2 its resolution, which significantly reduces the computational effort of the analysis. The Gaussian pyramid is illustrated in Figure 19.
Step 5. The pyramid is built to five layers; the NCC algorithm is applied to the top-level G4 image to find the position of greatest similarity, which is then propagated back to the bottom (original) image to find the best matching position. The matching result is shown in Figure 20.
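A sketch of the coarse-to-fine matching, using OpenCV’s pyrDown and normalized cross-correlation; the refinement window size is an illustrative choice, and boundary handling is omitted:

```python
import cv2

def pyramid_match(search, template, levels=5):
    """Coarse match on the top pyramid level, then refine the position
    back down to the full-resolution image. The template must remain
    larger than a few pixels at the top level."""
    s_pyr, t_pyr = [search], [template]
    for _ in range(levels - 1):
        s_pyr.append(cv2.pyrDown(s_pyr[-1]))     # Gaussian smooth + 2x subsample
        t_pyr.append(cv2.pyrDown(t_pyr[-1]))

    # Coarse match on the top level (G4 for five layers).
    res = cv2.matchTemplate(s_pyr[-1], t_pyr[-1], cv2.TM_CCORR_NORMED)
    _, _, _, (x, y) = cv2.minMaxLoc(res)

    # Back-project through the levels, refining in a small window each time.
    for lvl in range(levels - 2, -1, -1):
        x, y = 2 * x, 2 * y
        h, w = t_pyr[lvl].shape[:2]
        y0, x0 = max(y - 4, 0), max(x - 4, 0)
        window = s_pyr[lvl][y0:y0 + h + 8, x0:x0 + w + 8]
        res = cv2.matchTemplate(window, t_pyr[lvl], cv2.TM_CCORR_NORMED)
        _, _, _, (dx, dy) = cv2.minMaxLoc(res)
        x, y = x0 + dx, y0 + dy
    return x, y
```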

3.4.4. Template Matching Verification

To verify the accuracy of pyramid-accelerated NCC template matching, the unwrapped images are spliced together, and random pixels and noise interference are added. The self-built data set contains 300 samples for the matching verification experiment. Some data samples and template-matching results are shown in Figure 21, and the matching results are summarized in Table 1.

4. Character Recognition

After the conventional machine-vision processing, the production-date image to be detected is segmented and decomposed into individual characters, with the results shown in Figure 22. The segmented characters are then recognized with a neural network that reads the digit information.

4.1. Data Set Preparation

Because tire samples were limited, the tires were imaged at different rotation angles to increase the number of samples; the characters were then located and segmented, the character images saved, and the data augmented by adding random pixels and proportional scaling. The data set consisted of ten classes of digit image samples, from zero to nine. A total of 80% of the images were used as the training set and 20% as the validation set.

4.2. Classic LeNet Network Structure

The classical LeNet network structure was proposed in 1994 by Yann LeCun et al. As one of the first convolutional networks [25], it gave great impetus to the advancement of deep learning and went through years of research and many successful iterations until the final variant was completed in 1998. This variant, named LeNet-5, achieves very good results in handwritten character recognition and is one of the most representative early convolutional neural network models. The model structure is shown in Figure 23.
As can be seen in the figure, the traditional LeNet-5 model has seven layers in total: two convolutional layers, two pooling layers, two fully connected layers, and one output layer, each with its own trainable parameters. The image first passes through two rounds of convolution and pooling and then enters the fully connected layers. The input is a 32 × 32 single-channel grayscale image, and the output layer consists of 10 neurons that perform the final classification.
Convolution and pooling are the most important parts of a convolutional neural network. The convolutional layer, also called the feature extraction layer, extracts the feature values of the various kinds of information contained in the picture; it can also be viewed as enhancing the input image’s features with a filter that slides over the input to produce the feature map. The convolution is computed as:

$$x_j^{l} = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right) \tag{10}$$

where $x_i^{l-1}$ are the input feature maps selected by $M_j$, $k_{ij}^{l}$ are the convolution kernels, $b_j^{l}$ is the bias, and $f$ is the activation function.
The pooling layer, also called the downsampling layer, reduces the data through the pooling operation: the feature map is cut according to the pixel size of the pooling window and the stride, compressing the map to reduce the amount of computation and extract the main features of the image. The parameters of each LeNet-5 layer are listed in Table 2. A minimal sketch of these two operations follows.
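A minimal NumPy sketch of the two operations for a single channel; as in CNN practice, the kernel is applied without flipping, and bias and activation are omitted:

```python
import numpy as np

def conv2d_valid(x, k):
    """'Valid' 2-D convolution: the sliding product-and-sum underlying
    Equation (10)."""
    H, W = x.shape[0] - k.shape[0] + 1, x.shape[1] - k.shape[1] + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(x[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def max_pool2x2(x):
    """2 x 2 max pooling with stride 2: keeps the strongest response in
    each window, halving the feature map along each axis."""
    H, W = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * H, :2 * W].reshape(H, 2, W, 2).max(axis=(1, 3))
```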

4.3. Improved LeNet-5 Network

The LeNet-5 network has a simple structure, is computationally cheap, and has low hardware requirements, so as a lightweight network it can recognize targets quickly. However, the traditional LeNet-5 network still falls short in recognition accuracy. Therefore, an improved LeNet-5 network is proposed to raise recognition accuracy while keeping computation as low as possible.

4.3.1. Change the Input Image Size

The input image size is normalized to 32 × 16 pixels. According to actual measurements, the height-to-width ratio of the characters embossed on the tire sidewall is 2:1. Compared with the 32 × 32 input of the original LeNet-5 network, a 32 × 16 input better matches the actual character aspect ratio, avoiding information loss during size normalization; the smaller input also improves the model’s computational speed.

4.3.2. Relu Activation Function

We propose using Relu as the activation function, defined as:

$$\mathrm{Relu}(x)=\begin{cases} x, & x > 0 \\ 0, & x \le 0 \end{cases} \tag{11}$$

The classical LeNet-5 network uses Sigmoid as the activation function, which saturates softly and is prone to vanishing gradients, so the activation function of the convolutional layers is changed to Relu. Relu saturates hard for x < 0, while for x ≥ 0 its gradient does not decay, which alleviates the vanishing-gradient problem and improves convergence. As a result, the network’s neurons exhibit a certain sparse activation, effectively suppressing the drawbacks of the Sigmoid function and improving recognition speed. Moreover, Relu needs only a threshold comparison to obtain the activation value, making it simple and cheap to compute.

4.3.3. Asymmetric Convolution

The classic LeNet-5 network uses single layers of large convolution kernels, which increases the amount of computation and takes a long time. Two layers of asymmetric convolutions have fewer parameters than a single larger kernel, which speeds up the operation and increases the depth of the network. At the same time, double-layer asymmetric convolution involves more non-linear transformations than a single-layer large convolution, giving the network stronger feature extraction capability.
Taking a 5 × 5 input image as an example, a traditional 3 × 3 convolution kernel requires 81 operations to obtain the feature map, while the asymmetric pair of 3 × 1 and 1 × 3 kernels requires a total of 72 operations (45 for the 3 × 1 pass plus 27 for the 1 × 3 pass), as verified below. In this way, the number of parameters and the risk of overfitting are reduced, and the expressive ability of the model is enhanced.
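The operation counts quoted above can be checked directly:

```python
# Multiplication counts for a 5 x 5 input, as in the text:
ops_3x3 = (3 * 3) * (3 * 3)        # 3 x 3 output positions, 9 multiplies each = 81
ops_3x1 = (3 * 5) * 3              # 3 x 5 intermediate map, 3 multiplies each = 45
ops_1x3 = (3 * 3) * 3              # 3 x 3 final map, 3 multiplies each = 27
print(ops_3x3, ops_3x1 + ops_1x3)  # 81 vs. 72
```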

4.3.4. Batch Normalization

The classic LeNet-5 network does not process the data between its intermediate layers, so the intermediate data become scattered, which hinders network training; therefore, a batch normalization layer is added after each convolutional layer. During training, the data in each Batch_size are normalized to zero mean and unit variance. This accelerates the convergence of the model network, speeds up training, and improves model accuracy.

4.3.5. GAP Replaces the Flatten Layer

The traditional LeNet-5 network flattens the feature maps and feeds them through fully connected layers into SoftMax, which leads to an excessive number of fully connected parameters, slow training, and easy overfitting. To reduce the parameters and the training time, global average pooling (GAP) is used instead of the Flatten layer: averaging over the whole of each feature map means each map yields only a single output, which greatly reduces the number of network parameters.

4.3.6. SVM Classifier Instead of SoftMax

To further improve the comprehensive recognition ability, an SVM classifier replaces the SoftMax classifier in the output layer of LeNet-5. The SVM uses structural risk minimization to construct the optimal separating hyperplane in feature space and achieve a global optimum, whereas the SoftMax regression model updates its parameters by gradient descent, seeking the optimal probability combination by minimizing a cost function. The SVM does not iterate over a cost function to update parameters, so it converges faster. The parameters of each layer of the improved LeNet-5 are listed in Table 3, and a sketch of the architecture follows.
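A sketch of the improved architecture in Keras, following Table 3. The authors implemented the network in MATLAB; the SVM output stage is approximated here in the common way, with a linear final layer trained under a squared hinge loss:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def improved_lenet5(num_classes=10):
    return models.Sequential([
        layers.Input(shape=(32, 16, 1)),                     # 32 x 16 grayscale input
        layers.Conv2D(32, (3, 3), activation="relu"),        # Conv1_1 -> (30, 14, 32)
        layers.BatchNormalization(),
        layers.Conv2D(32, (3, 1), padding="same", activation="relu"),  # Conv1_2 (asymmetric pair)
        layers.Conv2D(32, (1, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),                         # -> (15, 7, 32)
        layers.Conv2D(32, (3, 3), activation="relu"),        # Conv2_1 -> (13, 5, 32)
        layers.BatchNormalization(),
        layers.Conv2D(32, (3, 1), activation="relu"),        # Conv3_2 (asymmetric pair)
        layers.Conv2D(32, (1, 3), activation="relu"),        # -> (11, 3, 32)
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),                         # -> (5, 1, 32)
        layers.GlobalAveragePooling2D(),                     # GAP replaces Flatten -> 32
        layers.Dense(500, activation="relu"),                # Dense_1
        layers.Dense(num_classes, activation="linear"),      # SVM-style margin output
    ])
```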

5. Experimental Verification

5.1. Related Environment Configuration

The experimental hardware was an AMD® Ryzen 7 3700X 8-core CPU (3.6 GHz) with 16 GB of RAM. The software environment was a Windows 11 64-bit host operating system with MATLAB R2022b as the software platform.

5.2. Network Training

For training the classic LeNet-5 network and the improved LeNet-5 network, the digit samples were randomly divided into a training set and a validation set in the ratio 8:2: 13,360 images for training and 3340 for validation. We randomly selected another 1670 images from the image library as a testing set. Since the Batch_size parameter affects training speed and results, we tested different Batch_size values (32, 64, 128) and learning rates to find the most suitable combination. Finally, a Batch_size of 128 was used for training, the Adam function was selected as the optimizer, the learning rate was set to 0.001, and training was run for 100 epochs (a configuration sketch follows).
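Continuing the Keras sketch above, the training configuration would look as follows; x_train, y_train, x_val, and y_val stand for the prepared character images and one-hot labels (placeholders, not the authors’ data loader):

```python
import tensorflow as tf

model = improved_lenet5()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="squared_hinge",   # hinge-style stand-in for the SVM stage
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=100,
          validation_data=(x_val, y_val))
```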

5.3. Simulation Experiments

The network parameters are updated by the cross-validation method, and the accuracy and cross-entropy loss functions of the training and validation sets are shown in Figure 24.
It can be seen that the training-set accuracy of the improved LeNet-5 model increases with the number of iterations and stabilizes after about 40 iterations, reaching 95.9% at the end of training, while the validation-set accuracy reaches 99.5%. The loss curves show that the validation loss stabilizes at about 0.3 after 30 iterations.
Figure 25 compares the training-set and validation-set accuracies of the improved and classical LeNet-5 network structures; the improved model achieves higher accuracy in fewer iterations. At 20 iterations, the improved LeNet-5 reaches 90% accuracy, while the classical LeNet-5 reaches only about 75%. After about 70 iterations both curves flatten, and after 100 iterations the improved LeNet-5 reaches a training accuracy of 95.9% versus 89.6% for the classical network, an improvement of 6.9 percentage points. On the validation set, the accuracies are 99.5% and 95.1%, respectively; the improved network is clearly more accurate. In terms of time, the classical LeNet-5 trains in 40 min 46 s, while the improved LeNet-5, having fewer parameters, trains in 39 min 12 s. The average time to recognize one image is 0.0657 s for the classical network and 0.0594 s for the improved one.

6. Conclusions

(1)
A combination of traditional machine vision and a convolutional neural network is used for tire character detection, and an improved SSR algorithm is proposed to improve the contrast between the character and the tire, which facilitates character extraction.
(2)
The NCC template matching method based on Gaussian pyramid acceleration is proposed to locate the location of the tire “DOT” character, and self-built image samples are used to verify the feasibility of template matching.
(3)
The classical LeNet-5 network is improved by replacing the traditional convolutional layers with asymmetric convolutions to strengthen the network’s non-linear representation, adding BN layers to speed up training, replacing SoftMax with an SVM classifier to accelerate convergence, and replacing the Flatten layer with a global average pooling layer to reduce parameters and overfitting. The experimental results show that the improved LeNet-5 network converges faster, is 6.9 percentage points more accurate, and trains 1 min 34 s faster.
In summary, the algorithm proposed in this paper improves performance and runs entirely on a CPU, greatly reducing equipment cost. We still need to optimize the network further to make it lighter, reduce its configuration requirements, and test it on more images. The final goal is to recognize the characters directly with the neural network, without image pre-processing.

Author Contributions

Z.G.: Conception of the study, proposition of the theory and method, supervision; J.Y.: literature search, figures, manuscript preparation, and writing; X.Q.: programming, testing of existing code components; Y.L.: data collection. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Liaoning Provincial Education Department Project (Grant No. LJKZ0114).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kazmi, W.; Nabney, I.; Vogiatzis, G.; Rose, P.; Codd, A. An Efficient Industrial System for Vehicle Tyre (Tire) Detection and Text Recognition Using Deep Learning. IEEE Trans. Intell. Transp. Syst. 2021, 22, 1264–1275. [Google Scholar] [CrossRef]
  2. Zhou, S.; Chen, Q.; Wang, X. HIT-OR3C: An opening recognition corpus for Chinese characters. In Proceedings of the International Workshop on Document Analysis Systems, Boston, MA, USA, 9–11 June 2010. [Google Scholar]
  3. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  4. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  5. Wang, Q. Study on Segmentation and Recognition Technology of Low Quality Pressed Characters. Master’s Thesis, Shandong University, Jinan, China, 2015. [Google Scholar]
  6. Zhang, C. Tyre Imprint Character Recognition Based on Deep Learning. Master’s Thesis, Nanjing University of Posts and Telecommunications, Nanjing, China, 2021. [Google Scholar]
  7. Li, J. Tire DOT Information Recognition Based on Deep Learning. Master’s Thesis, Guangdong University of Technology, Guangzhou, China, 2022. [Google Scholar]
  8. Li, Y. Study on the Embossed Character Recognition System Based on Machine Vision. Master’s Thesis, Guangdong University of Technology, Guangzhou, China, 2017. [Google Scholar]
  9. Han, J.; Yao, J.; Zhao, J.; Tu, J.; Liu, Y. Multi-Oriented and Scale-Invariant License Plate Detection Based on Convolutional Neural Networks. Sensors 2019, 19, 1175. [Google Scholar] [CrossRef] [PubMed]
  10. Liu, Z.; Cai, Y.; Chen, L.; Wang, H.; He, Y. Vehicle license plate recognition method based on deep convolution network in complex road scene. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2019, 233, 2284–2292. [Google Scholar] [CrossRef]
  11. Kim, S.; Jun, D.; Kim, B.G.; Lee, H.; Rhee, E. Single Image Super-Resolution Method Using CNN-Based Lightweight Neural Networks. Appl. Sci. 2021, 11, 1092. [Google Scholar] [CrossRef]
  12. Li, W.; Zhang, L.; Wu, C.; Cui, Z.; Niu, C. A new lightweight deep neural network for surface scratch detection. Int. J. Adv. Manuf. Technol. 2022, 123, 1999–2015. [Google Scholar] [CrossRef] [PubMed]
  13. Yang, Y. Improvement of Differential Evolution Algorithm and Its Application on Multi-threshold Image Segmentation. Master’s Thesis, Guangxi University, Nanning, China, 2022. [Google Scholar]
  14. Bi, X.; Li, M.; Zha, F.; Guo, W.; Wang, P. A non-uniform illumination image enhancement method based on fusion of events and frames. Optik 2023, 272, 170329. [Google Scholar] [CrossRef]
  15. Cheng, J.; Xie, Y.; Zhou, S.; Lu, A.; Peng, X.; Liu, W. Improved Weighted Non-Local Mean Filtering Algorithm for Laser Image Speckle Suppression. Micromachines 2022, 14, 98. [Google Scholar] [CrossRef] [PubMed]
  16. Yu, Z.; Zhang, Y.; Lian, F.; Chen, D.; Wang, H. A Research of Gray-scale Transformation Based on Digital Image Enhancement. Electron. Qual. 2009, 267, 18–20. [Google Scholar]
  17. Land, E.H. The Retinex Theory of Color Vision. Sci. Am. 1978, 237, 108–128. [Google Scholar] [CrossRef] [PubMed]
  18. Li, X.; Zhu, J.; Liu, J.; Bi, L. An Adaptive SSR Method for Foggy Low Illumination Image Enhancement. Comput. Appl. Softw. 2022, 39, 233–239. [Google Scholar]
  19. Bhandari, A.K.; Kumar, A.; Singh, G.K.; Soni, V. Dark satellite image enhancement using knee transfer function and gamma correction based on DWT–SVD. Multidimens. Syst. Signal Process. 2016, 27, 453–476. [Google Scholar] [CrossRef]
  20. Dufour, V.; Wiest, L.; Slaby, S.; Auger, L.; Cardoso, O.; Curtet, L.; Pasquini, L.; Dauchy, X.; Vulliet, E.; Banas, D. Miniaturization of an extraction protocol for the monitoring of pesticides and polar transformation products in biotic matrices. Chemosphere 2021, 284, 131292. [Google Scholar] [CrossRef] [PubMed]
  21. Fang, X. Research on Template Matching Algorithm for Deformation Image. Master’s Thesis, Huazhong University of Science and Technology, Wuhan, China, 2019. [Google Scholar]
  22. Wu, P.; Xu, H.; Son, W. A fast NCC image matching algorithm based on wavelet pyramid search strategy. J. Harbin Eng. Univ. 2017, 38, 791–796. [Google Scholar]
  23. Brunelli, R.; Poggiot, T. Template matching: Matched spatial filters and beyond. Pattern Recognit. 1997, 30, 751–768. [Google Scholar] [CrossRef]
  24. Ryan, M.; Hanafiah, N. An examination of character recognition on ID card using template matching approach. Procedia Comput. Sci. 2015, 59, 520–529. [Google Scholar] [CrossRef]
  25. Zhou, Q. Research on Methods of IIoT Intelligent Intrusion Detection Based on Deep Learning. Master’s Thesis, Chongqing University of Posts and Telecommunications, Chongqing, China, 2020. [Google Scholar]
Figure 1. Working flow chart.
Figure 2. The overall structural design scheme: (a) overall system modeling design; (b) visual system model.
Figure 3. Side lighting diagram.
Figure 4. Experimental system platform.
Figure 5. Collected image.
Figure 6. Grayscale histogram.
Figure 7. Adaptive threshold binarization.
Figure 8. Tire image after background removal.
Figure 9. Grayscale transformation result.
Figure 10. Retinex theory.
Figure 11. SSR algorithm flow chart.
Figure 12. Improved SSR algorithm flow chart.
Figure 13. Image processed with the SSR algorithm.
Figure 14. Image processed with the improved SSR algorithm.
Figure 15. Tire unwrapping schematic.
Figure 16. Tire expansion map.
Figure 17. Making a matching template: (a) character image; (b) created template edge image.
Figure 18. NCC algorithm matching illustration.
Figure 19. Gaussian pyramid illustration.
Figure 20. Matching result.
Figure 21. Template matching verification, where (a–d) are the original images and (e–h) the corresponding matching results.
Figure 22. The result of character decomposition.
Figure 23. Classic LeNet-5 model structure diagram.
Figure 24. Accuracy and loss of the improved LeNet-5.
Figure 25. Comparison of accuracy: (a) training set; (b) validation set.
Table 1. Template matching results.

Total Number of Samples (Sheets) | Number of Successful Matches (Sheets) | Matching Success Rate (%) | Average Matching Time (s)
300 | 287 | 95.667 | 1.2058
Table 2. LeNet-5 parameters of each layer.

Layer (Type) | Filters | Kernel Size | Output Shape
Conv1_1 | 32 | (3,3) | (30,14) 32
Max_pooling_1 | | | (15,7) 32
Conv2_1 | 32 | (3,3) | (13,5) 32
Max_pooling_2 | | | (13,5) 32
Conv3_1 | 32 | (3,3) | (7,3) 32
Max_pooling_3 | | | (5,1) 32
Flatten | | | 9248
Dense_1 | | | 500
Dense_2 | | | 10
Table 3. Improved LeNet-5 parameters of each layer.

Layer (Type) | Filters | Kernel Size | Output Shape
Conv1_1 | 32 | (3,3) | (30,14) 32
BN | | | (30,14) 32
Conv1_2 | 32 | {(3,1), (1,3)} | (30,14) 32
BN | | | (30,14) 32
Max_pooling_1 | | | (15,7) 32
Conv2_1 | 32 | (3,3) | (13,5) 32
BN | | | (13,5) 32
Conv3_2 | 32 | {(3,1), (1,3)} | (11,3) 32
BN | | | (11,1) 32
Max_pooling_3 | | | (5,1) 32
GlobalAveragePooling | | | 32
Dense_1 | | | 500
Dense_2 | | | 10
