1. Introduction
Tires are a car's only point of contact with the ground. According to the World Health Organization, tires contribute to 40% of all traffic accidents [1]. The quality of tires is therefore important for driving safety, and tire quality inspection is a critical step in tire production. In the tire industry, many factories still inspect tires by manual visual observation. This method is inefficient, subjective, and labor-intensive, has a high missed-inspection rate, and cannot meet the automation requirements of tire manufacturing. Some existing automated defect detection methods rely on hand-designed algorithms, which means designing a separate algorithm for each type of defect; such methods are complex and not robust [2,3]. In recent years, automatic inspection based on deep learning has been developed for industrial applications such as steel [4], fabrics [5], and solar cells [6]. Many researchers have also applied neural networks to road vehicles [7,8,9]. In tire production specifically, a growing number of tire defect detection methods are based on deep learning [10,11,12]. These techniques are more effective than manual inspection. Compared with traditional visual inspection techniques, they require less prior knowledge from the designer and do not need a separate inspection algorithm for each defect.
Since the team led by Hinton proposed the convolutional neural network (CNN) AlexNet and won the ImageNet Large Scale Visual Recognition Challenge in 2012, object detection based on neural networks has developed rapidly. In [13], the authors propose Faster regions with CNN features (Faster RCNN), a high-precision two-stage detection network. They introduce the region proposal network (RPN), which uses a convolutional neural network to generate candidate regions and distinguish foreground from background, replacing the traditional Selective Search [14]. This improves both detection speed and precision. In [15], the authors propose the feature pyramid network (FPN), which fuses the high-resolution low-level features and the semantically rich high-level features output by the backbone network to improve object detection. Since then, many researchers have continued to explore FPN. In [16], the authors propose PAFPN, which adds a bottom-up fusion path to FPN. This path enhances the entire feature hierarchy with accurate localization signals from the lower layers and shortens the information path between the lowest and topmost features, improving both object detection and instance segmentation. In the COCO 2017 challenge, this method won first place in instance segmentation and second place in object detection. In [17], the authors propose the balanced feature pyramid (BFP), which rescales the feature maps to a uniform size, averages them, and refines the averaged features with a non-local neural network. The refined features are then rescaled back to the sizes of the four original feature levels and added to the original feature maps to strengthen them. In [18], within the FPN of the YOLOv3 network, the authors fuse the feature map of each stage with the feature maps of the other three stages and use attention weights to control how features from different stages are fused. This method achieves a balance between speed and precision. In [19], Google uses reinforcement learning to search for the NAS-FPN architecture on the RetinaNet network. NAS-FPN shows strong object detection performance on COCO, but the architecture search consumes a large amount of TPU time. In [20], the authors propose the feature pyramid grid (FPG), which fuses feature maps horizontally and vertically multiple times to form a unified feature pyramid grid; it achieves higher precision than FPN in detection and segmentation. These networks improve object detection, but they were evaluated only on general-purpose datasets such as COCO and ImageNet, not in a specific industrial scene. When a neural network is applied to a specific industrial scene, it should be optimized for that scene.
2. Background and Related Work
Tires are rubber products composed of composite materials. There are many types of tire defects, and bubbles are among the most common. In the tire crown, bubbles mainly occur between the tread and the belt layer, between different belt layers, between the belt layer and the carcass, at the end of the belt layer, and at the joint between the belt layer and the sidewall. Bubbles seriously threaten the safety performance of tires and can easily lead to shoulder damage, delamination, and even puncture. Therefore, tire bubble defect detection is critical in tire production.
The objects in the COCO and ImageNet datasets differ from those in a tire defect dataset. COCO and ImageNet contain common natural objects such as people, cars, cats, and dogs, whereas a tire defect dataset consists of grayscale images in which the object resembles the background, the contrast is low, and the objects are small. Therefore, classical neural networks that perform well on COCO or ImageNet cannot be used directly for tire bubble defect detection; the network must be adapted to the characteristics of the objects to detect tire crown bubble defects better. In tire crown bubble defect detection, the object and the background are aliased and difficult to distinguish, and the small size of the objects degrades detection. The neural network therefore needs to be adjusted to address these difficulties so that it serves tire production better, which is important for improving tire safety.
Over the past few years, many researchers have applied neural networks to nondestructive testing of tires. In [10], the authors propose a tire defect detection method based on a concise semantic segmentation network; their segmentation network and compact convolutional neural network yield a smaller model and faster detection. In [11], the authors propose a tire image defect detection method based on a fully convolutional neural network. They replace the fully connected layers of VGG16 with convolutional layers and upsample and sum the feature maps, producing an output of the same size as the original image that is used to locate and segment defects. In [21], the authors introduce a variable convolutional neural network into Faster RCNN, adopt a multi-scale RPN, and use background features to rerank the candidate boxes, improving the precision of tire defect detection. In [22], the authors propose an RPN-based algorithm for tire defect detection and classification. To handle the large span of tire defect scales, they use different layers of the convolutional neural network to detect defects of different scales hierarchically. This method improves tire defect detection.
The above neural networks improve tire defect detection, but some problems remain. First, most researchers study X-ray images of tire defects, and there are fewer studies on speckle interference images of tire defects, which have lower contrast and in which the object is more similar to the background. Second, some networks include image segmentation. Industrial production does not need pixel-level segmentation of defects, only accurate detection; a segmentation module increases the complexity and computation of the network, which hinders deployment of the model in practice.
In this paper, to address the difficulty of detecting tire crown bubble defects, we design a multi-directional fusion feature pyramid network called tyre-FPN (TYFPN). We evaluate it by cross-validation on a tire crown bubble defect dataset, using mAP (mean average precision) [0.5:0.95] and AP (average precision) 0.5 as evaluation metrics. For AP0.5, a prediction box is counted as a correct detection when its intersection over union (IOU) with the ground-truth box is greater than 0.5; under this condition, the average precision is computed over all detected images. mAP [0.5:0.95] is the mean of the AP values computed at 10 IOU thresholds from 0.5 to 0.95 in steps of 0.05. The higher the mAP [0.5:0.95] and AP0.5, the higher the detection precision and the lower the missed detection rate. Because mAP [0.5:0.95] also rewards predictions at strict IOU thresholds, a higher mAP [0.5:0.95] additionally means that the location and size of the defects are predicted more accurately. The experimental results show that, compared with FPN, TYFPN increases mAP [0.5:0.95] and AP0.5 by 2.08% and 2.4%, respectively, so its detection performance is significantly better than that of FPN.
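The IOU criterion and the two metrics described above can be sketched in plain Python. This is a simplified, hypothetical illustration for a single image and a single defect class, not the evaluator used in this work: it follows the text in counting a prediction as correct when its IOU exceeds the threshold, and it uses all-point (VOC-style) interpolation of the precision-recall curve rather than the 101-point interpolation of the official COCO evaluator, so its numbers can differ slightly from standard tools. All function names are our own.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[float, Box]            # (confidence score, box)

# IOU thresholds for mAP [0.5:0.95]: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Detection], gts: Sequence[Box], thr: float) -> float:
    """AP for one image and one class at a single IOU threshold."""
    matched = [False] * len(gts)
    hits: List[int] = []  # 1 = true positive, 0 = false positive, in score order
    for _, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[j] and overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou > thr:  # correct only if IOU exceeds the threshold
            matched[best_j] = True
            hits.append(1)
        else:
            hits.append(0)

    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(gts) if gts else 0.0)

    # All-point interpolation: precision at recall r is the max precision at recall >= r.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def map_50_95(dets: Sequence[Detection], gts: Sequence[Box]) -> float:
    """mAP [0.5:0.95]: mean of the AP values at the 10 IOU thresholds."""
    return sum(average_precision(dets, gts, t) for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


gt_boxes = [(0.0, 0.0, 10.0, 10.0)]
loose = [(0.9, (0.0, 0.0, 10.0, 6.0))]  # IOU with the ground truth = 0.6
print(average_precision(loose, gt_boxes, 0.5), map_50_95(loose, gt_boxes))
```

In this example the loosely fitting box (IOU = 0.6) counts as correct only at thresholds 0.5 and 0.55, so AP0.5 is 1 while mAP [0.5:0.95] is only 0.2, which is why the latter rewards accurate localization.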
3. Improved Network Algorithm Based on Faster RCNN-FPN
Our work builds on Faster RCNN-FPN; we improve the FPN to make it better suited to tire bubble defect detection.
3.1. Detection Process and Image Characteristics of Tire Crown Speckle Interference Bubble Defect Image
Speckle interferometry illuminates the surface of a rough object with coherent light; the speckle pattern formed in space can then be used to measure the displacement and deformation of the object's surface. This technology has been widely used in tire crown bubble defect detection [23]. Speckle interference tire bubble defect detection proceeds as follows. First, the detection equipment is installed in a chamber that applies different pressures to the tire, so that the tire can be inspected segment by segment. Then, a CCD camera records the surface illuminated by the coherent light, and the image is transferred to a computer. Finally, the computer preprocesses the image and detects bubble defects.
Figure 1 shows tire crown speckle interference bubble defect images. Each image is only 67 × 67 pixels, the bubbles occupy less than 50% of the image, and the bubbles are smaller than 32 × 32 pixels, which falls within the small-object detection range. Figure 1 shows the following characteristics of tire speckle interferometry bubble images:
Tire speckle interference images have low contrast and low brightness;
Tire crown bubble defects vary widely in appearance;
As shown in Figure 1a, some bubble defects are very similar to the background and, compared with those in Figure 1b,c, are difficult to distinguish with the naked eye;
As shown in Figure 1f, when a bubble defect is fully developed, decorrelation clearly destroys the conditions for speckle interference, so the phase values appear as chaotic fringes;
Overall, tire bubble defects are usually small objects, and because of their small scale, their feature pixels are easily weakened or even lost after repeated pooling in the neural network.
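The effect of repeated downsampling on small bubbles can be illustrated with simple arithmetic. The sketch below assumes the usual output strides of 4, 8, 16, and 32 for the four ResNet50 feature maps (named C1 to C4 as in this paper); these strides are an assumption rather than values stated in the text, and the object sizes are illustrative.

```python
from typing import Dict, List

# Assumed output strides of the four ResNet50 feature maps, named C1-C4 as in
# this paper. These are typical ResNet50 values, not values stated in the text.
STRIDES: Dict[str, int] = {"C1": 4, "C2": 8, "C3": 16, "C4": 32}


def footprint(object_side: float, stride: int) -> float:
    """Approximate side length of an object, in feature-map cells, after downsampling."""
    return object_side / stride


def covered_levels(object_side: float) -> List[str]:
    """Feature levels on which the object still spans at least one full cell."""
    return [name for name, s in STRIDES.items() if footprint(object_side, s) >= 1.0]


for side in (8, 16, 32):
    cells = {name: footprint(side, s) for name, s in STRIDES.items()}
    print(side, cells, covered_levels(side))
```

Under these assumptions, an 8 × 8 bubble spans less than one cell on the two coarsest feature maps, which is consistent with the observation that small-object features are weakened or lost after several pooling steps.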
3.2. Improved Network Algorithm
FPN fuses information from the high-level feature maps top-down into the low-level feature maps and builds a feature pyramid. It obtains more feature information and produces outputs at different feature levels, which improves object detection. However, FPN cannot be applied directly to tire crown bubble defect detection; it needs to be adjusted to the characteristics of bubble defects and the difficulties of detecting them, and there is still room for improvement:
The high-level feature maps of FPN are only added downward into the adjacent feature maps, without upward or cross-scale fusion. As a result, the high-level feature maps do not make full use of the location information of small objects in the low-level feature maps, leaving room for improvement in small-object detection;
In the tire crown bubble defect dataset, the defects fall within the small-object range, so we do not need to consider the detection of large objects. More low-level feature map information can therefore be fused into the high-level feature maps.
Based on these ideas, we design a multi-directional fusion feature pyramid network named TYFPN. It fuses not only the semantic information of the high-level feature maps into the low-level feature maps but also the location information of the low-level feature maps into the high-level feature maps. This improves small-object detection on the high-level feature maps and thus the overall detection performance. The improved bubble defect detection algorithm is shown in Figure 2. The backbone network is ResNet50 [24]. The backbone outputs four feature maps of different sizes, which are fed into TYFPN for feature fusion and enhancement. TYFPN outputs five feature maps, which enter the RPN to generate region proposals. The RoiHead then produces the prediction boxes, the detection results are filtered by non-maximum suppression (NMS), and finally the loss function is used to compute the loss.
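The NMS step in this pipeline can be illustrated with the standard greedy algorithm: keep the highest-scoring box, discard the remaining boxes whose IOU with it exceeds a threshold, and repeat. This is a minimal plain-Python sketch rather than the implementation used in this work; the threshold of 0.5 is an assumed value, since the text does not state the NMS threshold.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thr: float = 0.5) -> List[int]:
    """Greedy NMS: returns the indices of the kept boxes, highest score first.
    iou_thr = 0.5 is an assumed value, not taken from the paper."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)  # highest-scoring remaining box
        keep.append(best)
        # drop every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

Here the second box overlaps the first with an IOU of 0.81 and is suppressed, while the distant third box is kept.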
3.3. TYFPN: Multi-Directional Fusion Feature Pyramid Network
The FPN is shown in Figure 3. It applies a 1 × 1 convolution to each of the four feature maps output by ResNet50 and then fuses features from top to bottom. When a high-level feature map is fused into a low-level feature map, its rich semantic information is passed down, so the low-level feature map obtains more useful information about the objects.
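The top-down pathway of FPN can be sketched on toy single-channel feature maps stored as nested lists. This is a simplified illustration, not the implementation used in this work: with a single channel, each lateral 1 × 1 convolution reduces to a per-pixel scale and shift (the weights are hypothetical), adjacent levels are assumed to differ by a factor of 2 in resolution, and the 3 × 3 convolution that FPN applies after each fusion is omitted.

```python
from typing import List

FeatureMap = List[List[float]]  # toy single-channel H x W feature map


def lateral_1x1(fmap: FeatureMap, weight: float, bias: float = 0.0) -> FeatureMap:
    """With one input and one output channel, a 1 x 1 convolution is a per-pixel affine map."""
    return [[weight * v + bias for v in row] for row in fmap]


def upsample2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling: each pixel is copied into a 2 x 2 block."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two maps of the same shape."""
    assert len(a) == len(b) and len(a[0]) == len(b[0]), "shape mismatch"
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(features: List[FeatureMap], weights: List[float]) -> List[FeatureMap]:
    """features run from the highest-resolution level to the lowest, each level
    half the size of the previous one. Returns the fused maps in the same order."""
    laterals = [lateral_1x1(f, w) for f, w in zip(features, weights)]
    outputs = [laterals[-1]]  # the top level has nothing above it to fuse
    for lat in reversed(laterals[:-1]):
        # upsample the level above and add it to this level's lateral output
        outputs.insert(0, add(lat, upsample2x(outputs[0])))
    return outputs


c1 = [[1.0] * 4 for _ in range(4)]
c2 = [[2.0] * 2 for _ in range(2)]
c3 = [[3.0]]
p1, p2, p3 = fpn_top_down([c1, c2, c3], [1.0, 1.0, 1.0])
print(p3, p2, p1[0])  # [[3.0]] [[5.0, 5.0], [5.0, 5.0]] [6.0, 6.0, 6.0, 6.0]
```

Each output level accumulates the information of every level above it, but no information flows upward, which is the limitation of FPN discussed next.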
The high-level feature maps of FPN are effective for detecting large objects, but bubble defects in the tire crown are usually small objects. In addition, FPN performs only top-down fusion between adjacent feature maps: the low-level feature maps receive the semantic information of the high-level feature maps, but the high-level feature maps do not receive the location information of the low-level feature maps. Object detection on the high-level feature maps therefore still has room for improvement. Moreover, FPN focuses on fusing adjacent layers; when low-level and high-level feature maps that are several levels apart are fused, the location information is not necessarily accurate, and the features are weakened during fusion [25].
Based on these ideas, we fuse the rich location information of the low-level feature maps into the high-level feature maps to make localization more accurate and improve small-object detection. The improved feature pyramid network is shown in Figure 4. First, a 1 × 1 convolution is applied to the four feature maps [C1 C2 C3 C4] output by ResNet50, unifying the number of channels to 256. Then, [C2 C3] are upsampled by nearest-neighbor interpolation to the same resolution as [C1], and the output feature layers are [S1 S2 S3 S4]. Nearest-neighbor interpolation is given by Equation (1):
$$ f'(x, y) = f\left( \left\lfloor \frac{w}{w'}\, x \right\rfloor,\ \left\lfloor \frac{h}{h'}\, y \right\rfloor \right) \qquad (1) $$
where w' and w are the width after and before enlargement, h' and h are the length after and before enlargement, f is the original image, and f' is the enlarged image. That is, the pixel value at point (x, y) of the enlarged image equals the pixel value at point (w/w' · x, h/h' · y) of the original image, with the decimal part discarded. Next, [S1] is added to [S2] to obtain [M1]; [S2] is added to [S3] to obtain [M2]; [M1] is added to [M2] to obtain [M3]; [S4] is added to [S1] to obtain [M4]; and [S4] alone becomes [M5]. A 3 × 3 convolution is applied to [M1 M2 M3 M4 M5], and [M2 M3 M4] are downsampled to 1/2, 1/4, and 1/8 of their sizes, respectively. After this processing, [M1 M2 M3 M4 M5] become [L1 L2 L3 L4 L5]. Finally, [L1 L2 L3 L4] are upsampled from bottom to top and added to their adjacent layers, and max pooling is applied to [L5]. This yields the feature layers [P1 P2 P3 P4 P5], which are passed to the next module.
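Equation (1) can be implemented directly for a single-channel map. This is a minimal sketch that assumes the fractional part of the source coordinate is discarded (floor) and clamps indices to the image border; the function name is our own.

```python
import math
from typing import List

Grid = List[List[float]]


def nearest_resize(src: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize following Equation (1):
    f'(x, y) = f(floor(w / w' * x), floor(h / h' * y)).
    Discarding the fractional part (floor) is an assumption about the text."""
    h, w = len(src), len(src[0])
    out: Grid = []
    for y in range(out_h):
        sy = min(math.floor(y * h / out_h), h - 1)      # source row
        row = []
        for x in range(out_w):
            sx = min(math.floor(x * w / out_w), w - 1)  # source column
            row.append(src[sy][sx])
        out.append(row)
    return out


print(nearest_resize([[1, 2], [3, 4]], 4, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Assuming the usual factor-of-2 spacing between pyramid levels, bringing [C2] and [C3] to the resolution of [C1] corresponds to enlarging them by factors of 2 and 4, respectively.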
TYFPN fuses features not only between adjacent layers but also across non-adjacent layers, and not only from high-level to low-level feature maps but also from low-level to high-level feature maps. It therefore makes full use of the small-object information in the lowest feature layer. On the tire crown bubble defect dataset, TYFPN performs better than FPN and significantly improves the detection of tire crown bubble defects. The experiments and results are described in detail in Section 4.
3.4. Anchor Box Setting and Sample Balance
In Faster RCNN, the anchor boxes provide region proposals for the ROI (region of interest) stage, and their size and aspect ratio strongly influence detection. The anchor boxes should therefore be set according to the size and aspect-ratio distribution of the objects in the dataset. We analyzed the tire crown bubble defect dataset; the results are shown in Figure 5. As Figure 5 shows, the objects are very small, and the aspect ratios of their ground-truth boxes are mostly around 1, although a small number of objects have other ratios. Therefore, we set the base size of the anchor boxes to 8 and their aspect ratios to 0.5, 1, and 2. The many anchor boxes generated by the RPN are processed as follows:
- Step 1
Set each anchor’s mask to −1, indicating that the anchor is neither a positive sample (objects) nor a negative sample (background);
- Step 2
Set an anchor's mask to 0 if its maximum IoU with all ground truths is less than 0.3, indicating a negative sample;
- Step 3
Set an anchor's mask to 1 if its maximum IoU with all ground truths is greater than 0.7, indicating a positive sample;
- Step 4
Some ground truths may not be assigned to any anchor; for each ground truth, find the anchor with the largest IoU, and if this IoU is greater than 0.3, set that anchor as a positive sample;
- Step 5
Limit the number of training samples and balance positive and negative samples: the ratio of positive to negative samples is set to 1:1, for a total of 256 samples;
- Step 6
If there are fewer than 128 positive samples, the remaining slots are filled with negative samples.
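The anchor settings and Steps 1 to 6 can be sketched as follows. This is a hypothetical plain-Python illustration, not the code used in this work. It assumes that the aspect ratio is height/width and that each anchor keeps the area of an 8 × 8 base box; in Step 4 it gives every ground truth its best-matching anchor, which also covers the ground truths left unassigned by Step 3. Random sampling uses a fixed seed for reproducibility, and all function names are our own.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def make_anchors(cx: float, cy: float, base_size: float = 8.0,
                 ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Anchors centred at (cx, cy). Assumes ratio = height / width and that every
    anchor keeps the area of a base_size x base_size box."""
    anchors = []
    for r in ratios:
        w = base_size / math.sqrt(r)
        h = base_size * math.sqrt(r)
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors: Sequence[Box], gts: Sequence[Box], neg_thr: float = 0.3,
                  pos_thr: float = 0.7, min_pos_iou: float = 0.3) -> List[int]:
    """Returns one mask value per anchor: 1 positive, 0 negative, -1 ignored."""
    labels = [-1] * len(anchors)                          # Step 1: ignored by default
    overlaps = [[iou(a, g) for g in gts] for a in anchors]
    for i, row in enumerate(overlaps):
        best = max(row, default=0.0)
        if best < neg_thr:
            labels[i] = 0                                 # Step 2: negative
        elif best > pos_thr:
            labels[i] = 1                                 # Step 3: positive
    for j in range(len(gts)):                             # Step 4: best anchor per ground truth
        column = [row[j] for row in overlaps]
        if column:
            i_best = max(range(len(column)), key=column.__getitem__)
            if column[i_best] > min_pos_iou:
                labels[i_best] = 1
    return labels


def sample(labels: Sequence[int], total: int = 256, pos_fraction: float = 0.5,
           rng: Optional[random.Random] = None) -> Tuple[List[int], List[int]]:
    """Steps 5-6: at most total * pos_fraction positives; negatives fill the rest."""
    rng = rng or random.Random(0)
    pos = [i for i, v in enumerate(labels) if v == 1]
    neg = [i for i, v in enumerate(labels) if v == 0]
    n_pos = min(len(pos), int(total * pos_fraction))
    n_neg = min(len(neg), total - n_pos)
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)


anchors = make_anchors(4.0, 4.0) + [make_anchors(40.0, 40.0)[1]]
labels = label_anchors(anchors, [(0.0, 0.0, 8.0, 8.0)])
print(labels)          # [-1, 1, -1, 0]
print(sample(labels))  # ([1], [3])
```

In this small example the anchor of ratio 1 matches the ground truth exactly and becomes positive, the two elongated anchors (IoU of about 0.55) fall between the thresholds and are ignored, and the distant anchor is negative.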
3.5. Loss Function
As shown in Equation (2), the total loss consists of the RPN loss and the ROI loss, each of which includes a classification loss and a bounding box loss:
$$ L = L_{cls}^{RPN} + L_{reg}^{RPN} + L_{cls}^{ROI} + L_{reg}^{ROI} \qquad (2) $$
The RPN and ROI stages use the same loss functions. The classification loss is the cross-entropy loss in Equation (3), and the bounding box loss is the l1 loss in Equation (4):
$$ L_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\left[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\right] \qquad (3) $$
$$ L_{reg} = \frac{1}{N}\sum_{i=1}^{N}\left| t_i - t_i^{*} \right| \qquad (4) $$
In Equation (3), p_i is the predicted probability that the i-th anchor box is a positive sample, so the bracketed term is the log-probability assigned to its true label y_i, which is 1 for a positive sample and 0 for a negative sample. In Equation (4), t_i is the predicted value, t_i^{*} is the true value, and |t_i - t_i^{*}| is the absolute value of the difference between them. N is the number of samples over which each loss is averaged.
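Equations (2) to (4) can be written out directly. This plain-Python sketch assumes that each loss is averaged over the sampled anchors or ROIs; the clamping constant eps is our own addition to avoid log(0), and in practice a deep-learning framework's built-in losses would be used instead. The function names are hypothetical.

```python
import math
from typing import Sequence


def bce_loss(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Equation (3): mean binary cross-entropy; labels are 1 (positive) or 0 (negative)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so that log() never sees 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Equation (4): mean absolute difference between predicted and true values."""
    return sum(abs(t - t_star) for t, t_star in zip(pred, target)) / len(pred)


def total_loss(rpn_cls: float, rpn_reg: float, roi_cls: float, roi_reg: float) -> float:
    """Equation (2): sum of the RPN and ROI classification and box losses."""
    return rpn_cls + rpn_reg + roi_cls + roi_reg


cls = bce_loss([0.9, 0.2], [1, 0])
reg = l1_loss([1.0, 2.0], [0.0, 2.0])  # (|1 - 0| + |2 - 2|) / 2 = 0.5
print(total_loss(cls, reg, cls, reg))
```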
5. Discussion
In this study, we present a neural network with a multi-directional fusion feature pyramid for tire defect detection. The method achieves 51.86% mAP [0.5:0.95] and 92% AP0.5, outperforming the other neural networks with feature pyramids that we compared.
Our purpose in using a neural network is to replace manual labor and improve the precision of tire defect detection. In the previous section, we demonstrated that our neural network is effective. However, neural networks are complex structures that mimic the cognitive abilities of the human brain [31]; it is not entirely clear how they use their training data to reach specific conclusions, and it is hard to determine how or why the system behaves as it does [32]. Tire defect detection concerns not only the interests of the industry but also driving safety. We cannot directly determine whether the network is biased against some types of bubble defects and tends to ignore them when detecting tire bubble defects. However, we can find and address such problems by analyzing the results.
Earlier in this article, when we used Faster RCNN for bubble defect detection, some bubble defects were missed, and in the fifth set of experiments in Table 2 and Table 3, all neural networks are slightly less precise. Some types of bubbles appear infrequently in real production and make up a small proportion of the dataset, so neural networks with lower generalization ability are not sensitive to them. In the fifth set of experiments, most of these low-frequency bubble defects fell into the test set, so the networks did not detect them well. Therefore, to ensure the practical effectiveness of neural networks in industrial defect detection, these risks can be reduced in the following ways:
Increase the proportion of low-frequency defect types in the dataset to increase the network's sensitivity to them and reduce the missed detection rate;
Keep the equipment and environment consistent, and use the same image sources for training and detection, such as the same camera and the same tire types, to improve the detection stability of the neural network;
Continuously tune and improve neural networks to improve precision and generalization ability.