CPAM: Cross Patch Attention Module for Complex Texture Tile Block Defect Detection

Zhu, Wenbo; Wang, Quan; Luo, Lufeng; Zhang, Yunzhi; Lu, Qinghua; Yeh, Wei-Chang; Liang, Jiancheng

doi:10.3390/app122311959

by Wenbo Zhu ¹,*, Quan Wang ¹, Lufeng Luo ¹, Yunzhi Zhang ¹, Qinghua Lu ¹, Wei-Chang Yeh ² and Jiancheng Liang ¹

¹ School of Mechatronic Engineering and Automation, Foshan University, Foshan 528225, China

² Department of Industrial Engineering and Engineering Management, National Tsing Hua University, Hsinchu 30013, Taiwan

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(23), 11959; https://doi.org/10.3390/app122311959

Submission received: 23 October 2022 / Revised: 16 November 2022 / Accepted: 19 November 2022 / Published: 23 November 2022

(This article belongs to the Section Applied Industrial Technologies)

Abstract

Because defect points vary little, tile block defect detection typically involves finding subtle defects in large-format images, where defective characteristics are displayed regionally. Traditional convolutional neural network architectures that extract regional features model the connection between regional features only in a simple way, which introduces region-specific bias and keeps tile block defect detection a challenging task. To address this challenge, this paper divides feature information into patches that represent different regional features. The relationship between different patches and tile block defects is also studied, and, as a result, this paper proposes a new attention mechanism called the Cross Patch Attention Module (CPAM). Since the regional performance of patches is consistent with the tile block defect characteristics, CPAM can distinguish various regional features by patches. Then, to create reliable one-dimensional patch information, CPAM connects patches linearly in two spatial directions, which takes into account the correlation of adjacent patches in each direction. Finally, by extracting the regional characteristics of patches, CPAM helps the model distinguish the importance of different patches. The experimental results demonstrate that CPAM performs well for tile block defect detection, and that plugging CPAM into different end-to-end models yields a good gain, helping the models complete the tile block defect detection task effectively and stably.

Keywords: attention mechanism; deep learning; tile; defect detection

1. Introduction

Production and inspection are two important parts of the tile industry. Although numerous facilities can now automate the production of tiles, the assessment of the tile’s surface quality is still conducted manually [1]. Defects can occur anywhere on the tile surface, and detecting subtle defects in large-format tile images is a tough task, especially on complex textured tiles. Complex textured tiles have intrusive background information, some of which closely resembles the defects, and some defects also resemble each other, so they are easily confused during defect detection. Therefore, providing a stable, accurate, and efficient method for surface defect detection is crucial to the development of the tile industry.

Vision-based inspection methods have developed rapidly in recent years, and many scholars have applied deep learning models to industrial defect detection, such as fabric defect detection [2], electrical insulator defect detection [3], and bamboo surface sliver detection [4]. Tile defect detection remains difficult because of the special characteristics of tile datasets, the high resolution of tile images, and the many types and tiny sizes of defects. Many scholars have conducted related investigations using machine vision inspection methods. Rahaman et al. [5] use a first-order derivative edge detector (Sobel) for the tile defect detection task, but this method is only applied to solid-colored tiles, and no tests are performed on complex textured tiles. Karimi et al. [6] examine a variety of defect detection methods, such as histograms, neural networks, morphological techniques, Gabor filters, and wavelet transforms. Each of these methods has its own pros and cons, and none of them is comprehensive. Hanzaei et al. [7] integrate defect classification using a multi-class support vector machine with defect edge detection using a rotation invariant measure of local variance (RIMLV) operator. This approach can detect tile defects more accurately, but only for solid-colored tiles, and it has not been extended to complex textured tiles. Deep learning models perform better in defect detection tasks in complex scenes [8], but traditional deep learning models still face unresolved challenges in complex texture tile defect detection [9]. The problem is that traditional deep learning models do not easily take into account the connection between regional features, which makes it difficult for the model to capture the differences between regions and exclude interfering information; as a result, some tile block defects are hard to identify.

In the field of tile block defect detection, this paper summarizes the characteristics of different methods, as shown in Table 1. As observed from the table, only traditional deep learning models have been applied to complex textured tiles, and they are limited in how they connect regional features. To address the challenges of traditional deep learning models for tile block defect detection, this paper, inspired by Gessert et al. [10], relates tile block defect characteristics to regional features. Then, considering the specificity of the tile dataset, this study applies this idea to the attention mechanism, which can effectively extract the regional features of tile block defects. As a result, this paper proposes a new attention module, CPAM, which focuses on the different patches. This attention module can be easily added to an end-to-end model and effectively helps the model accomplish the tile block defect detection task. In this paper, tiles with three different backgrounds and four types of defects are studied, and a series of experiments is conducted on a dataset built from them. The contributions of this paper can be summarized as follows:

This paper combines the characteristics of tile block defects with the regional performance of the patch and coalesces the feature information into simple one-dimensional information, in which each element represents one patch and different elements carry the information of different patches, thus realizing the association between patches and tile block defects;
Two linear information alignment methods in different spatial directions are used to fully correlate adjacent patches; combining the two establishes stable linear 1D information, enhances 1D information expansion, and reduces specificity bias in spatial location;
The above two methods are combined and applied to the attention mechanism, yielding a new attention mechanism, CPAM, which helps the model effectively complete tile block defect detection by highlighting the importance of different patches;
CPAM is plugged into several end-to-end models and a series of experiments is conducted on the constructed dataset; the results illustrate the gain that CPAM brings to tile block defect detection. In addition, CPAM is compared with the attention mechanisms commonly used in tile block defect detection, and the results demonstrate that extracting patches with CPAM gives a better gain for detecting tile block defects.

The rest of this paper is organized as follows. Section 2 reviews deep learning and attention mechanisms in the field of tile block defect detection. Section 3 describes the proposed method. Section 4 presents the experimental results. Finally, Sections 5 and 6 discuss and summarize the paper.

2. Related Work

2.1. Deep Learning for Tile Block Defect Detection

In recent years, deep learning models have demonstrated great applicability in the field of industrial defect detection. Rather than using traditional deep learning models unchanged, many scholars now adapt them to defect characteristics to help the models better perform their tasks. Xie et al. [11] propose an end-to-end CNN architecture (FFCNN) that combines a feature extraction module, a feature fusion module, and a decision module, which overcomes the drawbacks of traditional deep learning models for surface defect detection of magnetic tiles. In addition, they add an attention mechanism to the new architecture, which makes it effective and efficient for magnetic tile surface defect detection, with strong applicability and feature extraction capability.

However, only a few scholars have explored the potential of deep learning models in the field of tile block defect detection. Chaiyasarn et al. [12] use a simple CNN architecture for tile surface defect detection in temples, which has greater stability and higher accuracy than artificial neural networks (ANN). However, this method does not give a detailed analysis of specific defect types, so its scalability needs to be further enhanced. Stephen et al. [13] complete the task of crack defect detection for tiles using seven-layer CNNs, but this study does not go further and covers only crack defects. Both Chaiyasarn et al. [12] and Stephen et al. [13] have studied tile surface defects extensively, but their methods are traditional deep learning models and have not been applied to tile datasets with more complex textures and more types of defects. For a tile dataset with complex textures and multi-category defects, Wan et al. [9] improve YOLOv5s by adding the attention mechanism CBAM [14] and an extra output prediction layer, which makes the improved model better at multi-category defect detection tasks. This new deep learning model has strong applicability and achieves good results on a multi-category tile surface defect dataset. These scholars have successfully applied deep learning models to tile defect detection, and some have used novel deep learning models, but none of them has combined defect characteristics with attention mechanisms in the analysis to solve the corresponding challenges more efficiently.

2.2. Attention Mechanism for Tile Block Defect Detection

To help models better focus on defective regions, many scholars have enhanced the visual perception capabilities of models by adding attention mechanisms. For instance, Li et al. [15] create a Multiscale Residual Attention Unit (MRAU) and apply it to micro-motor armature surface defect detection, Cheng et al. [16] present DE-block for metal surface defect detection, and Guo et al. [4] utilize CBAM for bamboo surface sliver defect detection. In the field of tile block defect detection, Wan et al. [9] use CBAM to help the model focus further on the defect region. CBAM helps the model perform better by changing the importance of different locations in each space and thus highlighting the defective parts. However, since the importance assigned to each location differs, the locations within a defective area also receive different importance, which may lead to an imbalance in importance. Therefore, CBAM does not help the model focus on larger defects, and it cannot establish the connection between different regions to better distinguish defective regions from non-defective regions.

To address the balance problem of highlighting the importance of the tile block defect region, and inspired by [10,14,17,18,19,20], this study attempts to solve the information imbalance within regions by means of regionalization. Gessert et al. [10] segment the input image into nine patches and randomize two of them to construct a CNN architecture for skin damage classification that combines local and global information. In addition, following the idea of processing patch information [20], Dosovitskiy et al. [19] segment an image into multiple patches of fixed size, linearly embed them with their corresponding positions, and then feed the resulting vector sequence into a standard Transformer encoder to efficiently implement the image classification task. Tolstikhin et al. [21] continue the idea of patch processing by randomly arranging and linking patches and then using a simple MLP for information extraction to achieve good image classification results. Inspired by the above approaches and the existing excellent attention mechanisms [10,14,17], this study applies the patch-processing idea to the attention mechanism, operating not on the image input but on feature information. In this study, the feature information is viewed as a combination of multiple patches, and the defective patches are highlighted by distinguishing the importance of different patches; thus, CPAM is proposed. Considering that information may or may not be correlated along a spatial direction, CPAM only arranges spatially adjacent information linearly in order to enhance the relevance of information. In addition, to be efficient, CPAM uses 1D convolution to establish the connection between different patches [17], so as to obtain weights that can effectively distinguish the importance of different patches. Finally, CPAM helps the model distinguish defective from non-defective regions, and it can be adapted to defects of different sizes by adjusting the patch size, the embedding position, etc. Compared with CBAM, CPAM can better distinguish between defective and non-defective regions, and its efficient information interaction also makes CPAM outperform CBAM, so CPAM can help the model better perform the tile defect detection task.

3. The Proposed Method

In this paper, a new attention module, CPAM, is proposed, which can be easily plugged into a model. The structure of CPAM is shown in Figure 1; it is divided into a pooling part, a convolution part, and a weighting part. First, the input feature mapping F is globally pooled to obtain the compressed feature mapping $f \in R^{H \times W \times 1}$, which can be expressed as

f_{(i, j)} = Pool (X_{(i, j)})

(1)

Here, f is the input feature map compressed into a single-channel feature map, X is the tensor of the input feature map, and H and W are the height and width of the feature mapping, respectively. Then, the feature mapping f is pooled into patches, and each element of the pooled feature mapping $f^{'} \in R^{P \times P \times 1}$ represents one pooled patch, which can be expressed as

f_{(i, j)}^{'} = Pool_{patch} (f_{(p \times i, p \times j)})

(2)

where p is the square root of the patch value, that is, $p^{2} = patch$.

The information in the feature mapping F has different spatial location information, and the arrangement of spatial location information affects the patch correlation. Therefore, three arrangement strategies with different directionality are set up in this study, as shown in Figure 2. Considering two spatial orientations simultaneously can increase stability, which is discussed specifically in the experimental section. Strategy (c) performs best in the experiments, so it is chosen: $f^{'}$ is transposed to obtain $f^{''} \in R^{P \times P \times 1}$, which can be expressed as

f_{(i, j)}^{''} = Transpose (f_{(i, j)}^{'})

(3)

This gives the arrangement of the patches in the X and Y directions. The feature mappings $f^{'}$ and $f^{''}$ are each joined into a one-dimensional vector, $f_{1} \in R^{P^{2} \times 1}$ and $f_{2} \in R^{P^{2} \times 1}$, respectively, and the two vectors are then concatenated to obtain $f_{concat} \in R^{P^{2} \times 2}$. A one-dimensional convolution is applied to $f_{concat}$:

f_{conv} = Conv (f_{concat})

(4)

The obtained $f_{conv}$ represents the feature mapping that distinguishes the importance of different patches. After the convolution part, $f_{conv}$ needs to be weighted into the input feature map, so it must be converted into a weightable size. In the weighting part, $f_{conv}$ is split into a two-dimensional feature mapping $f_{split} \in R^{P \times P \times 1}$, which is converted into $f_{pool} \in R^{H \times W \times 1}$ by the pooling operation, and then $g \in R^{H \times W \times 1}$ is obtained by the sigmoid function. The weighting part can be expressed as

g = σ (Pool (Split (f_{conv})))

(5)

where σ is the sigmoid function. Finally, the weighted feature mapping $F_{CPAM}$ is obtained, and the output is expressed as

y_{(i, j)} = x_{(i, j)} \otimes g_{(i, j)}

(6)

where $x_{(i, j)}$ and $y_{(i, j)}$ denote the input feature map and $F_{CPAM}$ at position (i, j), respectively.

Functionally, the most important role of CPAM is to coalesce feature information into a combination of multiple patches, which yields weights that contain a large amount of information. CPAM then employs two alignment methods with different spatial orientations to construct stable one-dimensional information, and uses one-dimensional convolution to quickly help the model distinguish the importance of different patches. Structurally, CPAM is very simple: it is divided into a pooling part, a convolution part, and a weighting part, and there is only one convolutional layer in the whole structure, so the operation is efficient. Additionally, unlike other attention mechanisms, CPAM does not distinguish between the importance of different channels and does not focus on the importance of each individual spatial location. Ultimately, CPAM combines the tile block defect characteristics with the regional representation of the patch and applies patch processing to the attention mechanism, which helps the model complete tile block defect detection efficiently.
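The data flow of Equations (1)–(6) can be traced with a small plain-Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, both pooling steps use averaging, the feature map is square with a side divisible by p, the "Pool" step of Equation (5) is taken to copy each patch weight back over its p × p region, and the fixed kernel values stand in for weights that would normally be learned.

```python
import math

def channel_pool(x):
    """Eq. (1): average over channels, [C][H][W] -> [H][W]."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    return [[sum(x[k][i][j] for k in range(c)) / c for j in range(w)]
            for i in range(h)]

def patch_pool(f, p):
    """Eq. (2): average each non-overlapping p x p patch, [H][W] -> [P][P]."""
    big_p = len(f) // p
    return [[sum(f[p * i + a][p * j + b] for a in range(p) for b in range(p)) / (p * p)
             for j in range(big_p)] for i in range(big_p)]

def conv1d_two_channel(f1, f2, w1, w2):
    """Eq. (4): zero-padded 1D convolution, two input channels, one output."""
    k, n = len(w1), len(f1)
    pad = k // 2
    out = []
    for t in range(n):
        s = 0.0
        for d in range(k):
            idx = t + d - pad
            if 0 <= idx < n:
                s += w1[d] * f1[idx] + w2[d] * f2[idx]
        out.append(s)
    return out

def cpam(x, p=3, w1=(0.25, 0.5, 0.25), w2=(0.25, 0.5, 0.25)):
    """Weight a [C][H][W] feature map by patch importance (illustrative only)."""
    h, w = len(x[0]), len(x[0][0])
    if h != w or h % p:
        raise ValueError("sketch assumes a square map with side divisible by p")
    f = channel_pool(x)
    f_p = patch_pool(f, p)                       # f'
    f_pp = [list(col) for col in zip(*f_p)]      # f'' = transpose of f', Eq. (3)
    f1 = [v for row in f_p for v in row]         # row-wise arrangement
    f2 = [v for row in f_pp for v in row]        # column-wise arrangement
    f_conv = conv1d_two_channel(f1, f2, w1, w2)
    big_p = len(f_p)
    f_split = [f_conv[r * big_p:(r + 1) * big_p] for r in range(big_p)]
    # Eq. (5): spread each patch weight over its region, then apply the sigmoid.
    g = [[1.0 / (1.0 + math.exp(-f_split[i // p][j // p])) for j in range(w)]
         for i in range(h)]
    # Eq. (6): element-wise weighting, the same g for every channel.
    return [[[x[k][i][j] * g[i][j] for j in range(w)] for i in range(h)]
            for k in range(len(x))]
```

Calling `cpam` on a 2 × 6 × 6 input with p = 3 gives a 2 × 2 grid of patch weights, and every position inside one patch is scaled by the same factor, which is the patch-level (rather than pixel-level or channel-level) weighting described above.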

4. Experiments and Results

4.1. Dataset Construction

4.1.1. Image Acquisition

In this paper, the dataset is collected in a real factory environment with a Dalsa line scan camera LA-CM-16K05A-00-R, with a resolution of 16,384 × 16,384. The format of the dataset is JPG, containing 3 types of tiles and 4 types of tile surface defects, i.e., soiling, raw material impurities, poor glaze, and white spot defects. The images are shown in Figure 3.

4.1.2. Preprocessing of Images

In order to effectively improve the model training efficiency and reduce unnecessary computational overhead, this paper performs offline sliding slicing and data augmentation [22] on the original image dataset, i.e., the original input images are sliding sliced sequentially from left to right and from top to bottom. In the slicing process, the overlap rate is 0.5 and the image size is 416 × 416 pixels. After pre-processing, the dataset contains 1320 images. In the experiment, the dataset is divided into training set, test set and validation set in the ratio of 7:1:2.
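The slicing scheme can be illustrated with a short Python sketch that computes the crop boxes for a 416 × 416 window with an overlap rate of 0.5 (a stride of 208 pixels), visiting the image from left to right and from top to bottom. The function names are hypothetical, and the handling of the image border (adding one extra window aligned to the far edge) is an assumption, since the text does not specify it. The last helper only does the arithmetic of a 7:1:2 split and does not reproduce the authors' actual partition.

```python
def slice_origins(length, window=416, overlap=0.5):
    """Start offsets of the sliding windows along one image axis."""
    if length < window:
        raise ValueError("image side is smaller than the window")
    stride = int(window * (1 - overlap))  # 208 px for the stated settings
    if stride <= 0:
        raise ValueError("overlap must be smaller than 1")
    origins = list(range(0, length - window + 1, stride))
    # Assumption: add one border-aligned window so the far edge is covered.
    if origins[-1] != length - window:
        origins.append(length - window)
    return origins

def slice_boxes(width, height, window=416, overlap=0.5):
    """Crop boxes (left, top, right, bottom), left to right then top to bottom."""
    return [(x, y, x + window, y + window)
            for y in slice_origins(height, window, overlap)
            for x in slice_origins(width, window, overlap)]

def split_counts(total, ratio=(7, 1, 2)):
    """Image counts for a train/test/validation split in the given ratio."""
    parts = sum(ratio)
    train = total * ratio[0] // parts
    test = total * ratio[1] // parts
    return train, test, total - train - test
```

For the 1320 images mentioned above, `split_counts(1320)` gives 924 training, 132 test, and 264 validation images.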

4.2. Implementation Details

All experiments in this paper are conducted on an NVIDIA RTX3000 GPU with 6 GB of video memory and an Intel Core i9-9980HK CPU. The software environment is Python 3.8 and CUDA 11.0. The optimizer is SGD with a momentum of 0.937 and an initial learning rate of 0.01, and training runs for 300 epochs. The test images are randomly selected from the test set. The CPAM parameters are chosen as follows: the 1D convolutional kernel size is 3, the global pooling strategy is average pooling, and the patch size is 9.

4.3. Model Evaluation Indicator

In the field of industrial defect detection, following [2,3,4,9,11,15,16], this study uses Precision (P), Recall (R), F1-score (F1), Average Precision of a single category (AP), and mean Average Precision over multiple categories (mAP) to evaluate the proposed method. The calculation methods of Precision, Recall, F1-score, AP, and mAP are shown in Equations (7)–(11).

Precision = \frac{TP}{TP + FP}

(7)

Recall = \frac{TP}{TP + FN}

(8)

F1\text{-}score = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}

(9)

In Equations (7)–(9), TP, FP, TN, and FN represent true positives, false positives, true negatives, and false negatives, respectively.

AP = \int_{0}^{1} p(r) \, dr

(10)

mAP = \frac{\sum_{n = 1}^{N} (AP (n))}{N}

(11)

The PR curve is a graph plotted with Recall as the horizontal axis and Precision as the vertical axis. AP, as a single category metric, is the integral of the PR curve.

In Equation (11), n is the category index and N is the total number of categories. The mAP value is the mean of the AP values over all categories, and it is one of the important indicators for evaluating the whole model.
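Equations (7)–(11) translate directly into a few lines of Python. The sketch below is only illustrative: the function names are hypothetical, and the integral in Equation (10) is approximated with the trapezoidal rule over sampled points of the PR curve, which is an assumption, since detection toolkits often use interpolated variants of AP instead.

```python
def precision(tp, fp):
    """Eq. (7): fraction of predicted defects that are real defects."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Eq. (8): fraction of real defects that are found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Eq. (9): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

def average_precision(recalls, precisions):
    """Eq. (10): area under the PR curve, trapezoidal approximation."""
    points = sorted(zip(recalls, precisions))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area

def mean_average_precision(aps):
    """Eq. (11): mean of the per-category AP values."""
    return sum(aps) / len(aps)
```

As a usage example, 8 true positives with 2 false positives and 8 false negatives give a Precision of 0.8 and a Recall of 0.5, so the F1-score is 2 · 0.4 / 1.3 ≈ 0.615.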

4.4. Analysis of the Proposed Method

In this section, experiments related to the proposed method are conducted and evaluated using the evaluation metrics described above. This section is divided into three subsections for detailed description.

4.4.1. Analysis of Object Detection Module

In the field of industrial defect detection, the real-time performance and accuracy of the models are particularly important for accomplishing the detection task; thus, a series of comparative experiments on object detection algorithms is conducted in this study. The models selected for the experiments are SSD-VGG16 [23], SSD-MobileNetv2 [24], YOLOv4 [25], YOLOv4-MobileNetv2, YOLOv5-s [26], YOLOv5-m [26], YOLOX-tiny [27], YOLOX-s [27], YOLOX-m [27], and YOLOv7 [28].

From the comparison results in Table 2, it can be observed that SSD is surprisingly effective on this dataset, but its overall performance is extremely unstable in terms of Recall and F1-score. YOLOv4 is more stable overall: the fluctuations in Precision, Recall, F1-score, and mAP are not very large. Compared to the other models, YOLOv7 has the highest mAP and balanced performance, although there is still room for improvement in Recall and F1-score. From Table 3, it can be found that the AP of YOLOv7 differs across categories. The detection of the two highly similar categories, dirty and impurity, is very effective, whereas the detection of the pour glaze bad (poor glaze) defect is poor, and other models achieve better results on this defect. YOLOv7 combines the advantageous features of many models, and the use of Extended-ELAN in its network structure is its most distinctive feature, which brings a significant performance improvement. In addition, YOLOv7 uses several improvement strategies, such as model scaling for concatenation-based models, planned re-parameterized convolution, and coarse-for-auxiliary and fine-for-lead loss. Therefore, YOLOv7 is still the best model for tile block defect detection, despite this weakness.

4.4.2. Analysis of CPAM

CPAM can be embedded into a model, and in this paper experiments are conducted by embedding CPAM into YOLOv7, whose structure is shown in Figure 4. To preserve the integrity of the model backbone network, this study conducts embedding location experiments by adding CPAM in front of the Yolo heads and comparing three locations. The experimental results are shown in Table 4, where CPAM is embedded in front of P3 (Yolo head1), P4 (Yolo head2), and P5 (Yolo head3). From the results, it can be observed that CPAM improves the performance of the model regardless of the embedding position, with the highest mAP achieved when CPAM is embedded in front of P5. Moreover, the results also show that adding CPAM at multiple locations does not further improve the performance of the model. Therefore, the subsequent experiments in this study only add CPAM in front of P5.

In this study, the arrangement of patches in the spatial direction is considered in the design of CPAM, so comparison experiments on different arrangements in the spatial direction are conducted. The experimental results are shown in Table 5, and they show that the overall performance of the model decreases when only one spatial orientation is considered. The overall performance of the model is more stable when both the horizontal and vertical spatial directions are considered. As a result, CPAM considers the alignment of patches in both spatial directions at the same time.
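Why two directions help can be seen from a toy example. The Python sketch below, with hypothetical function names and a 3 × 3 grid of patch indices chosen purely for illustration, flattens the grid row by row and column by column, and then lists which patches sit next to the center patch in each one-dimensional sequence. It assumes that the two-direction strategy corresponds to flattening the patch grid and its transpose, as in Equation (3).

```python
def flatten_both(grid):
    """Return the row-wise and column-wise 1D arrangements of a patch grid."""
    row_wise = [v for row in grid for v in row]
    col_wise = [v for col in zip(*grid) for v in col]  # flatten the transpose
    return row_wise, col_wise

def neighbours(seq, value):
    """Left and right neighbours of `value` in a 1D arrangement (None at ends)."""
    i = seq.index(value)
    left = seq[i - 1] if i > 0 else None
    right = seq[i + 1] if i + 1 < len(seq) else None
    return left, right

# Patch indices of a 3 x 3 grid:
#   0 1 2
#   3 4 5
#   6 7 8
grid = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
f1, f2 = flatten_both(grid)
```

In the row-wise sequence the center patch 4 is adjacent to patches 3 and 5 (its horizontal neighbors), while in the column-wise sequence it is adjacent to patches 1 and 7 (its vertical neighbors). A 1D convolution over a single arrangement therefore only mixes neighbors along one axis, whereas using both arrangements exposes both kinds of adjacency.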

In addition, comparison experiments on the convolutional kernel size and the patch size in CPAM are given in Table 6 and Table 7. Changing the size of the 1D convolutional kernel has little influence on the outcome, and the overall performance stays within a narrow range, i.e., 93.8–94.8% for mAP, 0.90–0.92 for F1, 90.9–94.6% for P, and 88.0–90.0% for R. When k is 3, the model performs better than in the other cases. Changing the patch size also has little effect on tile defect detection; as observed in Table 7, the overall performance fluctuation is small. The performance fluctuates because the defect size varies in the selected tile defect dataset, and some patch sizes work better for defects of matching size. However, the variation in defect size is not large, so changing the patch size has a small impact on the model in this defect detection task.

The ablation experiments show that CPAM brings a significant gain for tile block defect detection. This study also conducts comparison experiments with other attention mechanisms commonly used in industry, namely SE [29], CBAM [14], and ECA [17]; the results are shown in Table 8. SE and ECA visibly improve the overall performance of the model, but CBAM is more effective in the tile block defect detection task. CBAM fits tile block defect characteristics better because it can extract the regional characteristics of tile block defects, which allows it to perform better. Compared with these attention mechanisms, CPAM has a more outstanding and stable overall performance. Because CPAM can not only extract regional characteristics of tile block defects but also establish connections between different regions, it extracts tile block defect information more efficiently. In addition, this paper provides the heat maps of different attention mechanisms generated with Grad-CAM [30]. From Figure 5, it can be found that the model incorporating CPAM focuses better on the defective region, thus effectively distinguishing defects from non-defects.

4.4.3. Analysis of Object Detection Module with CPAM

To verify that CPAM can also improve the performance of tile block defect detection when used in different models, relevant experiments are conducted on different end-to-end models in this study, and the experimental results are shown in Table 9.

As can be observed from Table 9, CPAM has a gain effect on most of the models, with the largest gain for YOLOv7, where it improves mAP by 2.9%. CPAM performs well in helping the model distinguish between the highly similar dirty and impurity defects, and adding it to different models improves the overall detection of these two types of defects. Additionally, CPAM has a good gain effect for pour glaze bad and white dot, especially when plugged into YOLOX-m and YOLOv7. Overall, CPAM provides a certain gain for all kinds of defects, which demonstrates that CPAM is suitable for tile block defect detection. Among the models, CPAM adapts best to YOLOv7 and gives it the largest performance improvement. As a result, YOLOv7-CPAM is the most suitable for tile block defect detection tasks.

5. Discussion

Tile defect detection has developed rapidly in recent years, and vision-based detection methods have become a research hotspot, whether they are image processing methods, machine vision methods, or deep learning methods, all of which have achieved good results. In the early stage of tile defect detection, the research object is mainly solid-colored tiles, and with the advancement of technology, the research object gradually shifts to complex texture tiles with high detection difficulty. Based on previous research, this paper provides a relevant exploration of complex textured tile block defect detection and provides an improved method regarding the traditional deep learning model.

5.1. Discussions of Complex Texture Tile Block Defect Detection

Complex texture tiles have a more complex background and texture than solid-colored tiles, which makes it much more difficult to detect defects in them. Many scholars have conducted studies on solid-colored tiles [5,6,7], but they have not extended them to complex textured tiles. For complex texture tile defect detection, Wan et al. [9] provided a deep learning method that can be applied with wide applicability. However, they did not notice the correlation of tile block defects with regional characteristics and therefore did not conduct further studies on tile block defects.

5.2. Discussions of CPAM with Tile Block Defect Detection

Some defects are difficult to detect in complex texture tile block defect detection: block defects that closely resemble the tile background, such as the pour glaze bad defect, and pairs of defects that closely resemble each other, such as the dirty and impurity defects. Based on previous studies and the applicability of object detection models to complex textured tile defect detection, YOLOv7 is chosen as the base model for this study. To address the challenge that some block defects are difficult to detect, this paper combines tile block defects with regional features and proposes a new attention mechanism, CPAM. CPAM differs from other attention mechanisms in that it does not only extract channel information or spatial information, but extracts spatial information from different regions to obtain a complete regional feature. The patch information extracted by CPAM contains the regional characteristics of tile block defects, because tile block defect features are expressed regionally. In addition, CPAM employs two linear information alignment methods with different spatial directions and an efficient connection method. Finally, CPAM establishes the connection between different patches and helps the model accomplish complex texture tile block defect detection by identifying the differences between defective and non-defective patches, as well as among defective patches. After CPAM is added to different end-to-end deep learning models, the experimental results demonstrate the excellent gain effect of CPAM for complex texture tile block defect detection, which indicates the strong applicability of CPAM in this task.

5.3. Discussions of Limitations with CPAM

CPAM has high applicability for complex textured tile block defect detection, but there are some limitations: (1) the CPAM method of establishing associations between patches is very simple and efficient, but there is still room for improvement; (2) CPAM has so far been shown to work well only for block defect detection, and no experiments have been conducted for line defects or defects of more complex shape; (3) the transferability of CPAM may be limited, and no tests have been conducted on other defect datasets.

6. Conclusions

In this paper, a new attention module, CPAM, is proposed. It associates the regional representation of tile block defects with patches and condenses the feature information of each patch into simple one-dimensional information. CPAM uses two spatially oriented linear alignment methods to reduce region-specific bias. By highlighting the importance of different patches, CPAM helps the model distinguish defective patches from non-defective patches. CPAM can be easily plugged into end-to-end models, and the experiments conducted in this paper demonstrate that it yields a good gain for tile block defect detection.

Future work will pursue the following goals: (1) the methods by which CPAM establishes associations between patches will be further explored to make them more efficient; (2) the characteristics of CPAM do not yet correlate well with those of tile linear defects, so the potential of CPAM for tile linear defects will be further explored; (3) other defects differ considerably from tile block defects, so the applicability of CPAM to other defects will be further verified.

Author Contributions

Conceptualization, W.Z. and Q.W.; methodology, W.Z. and Q.W.; software, Q.W. and J.L.; validation, L.L., Y.Z. and Q.L.; formal analysis, L.L., Y.Z. and Q.L.; investigation, W.Z. and Q.L.; resources, L.L.; data curation, J.L.; writing—original draft preparation, Q.W.; writing—review and editing, W.Z. and Q.W.; visualization, Q.W.; project administration, Q.L. and W.-C.Y.; funding acquisition, W.Z. and Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

Research supported by the Guangdong Province Key Field R&D Program Project: Grant Nos. 2021B0101410002, 2020B0404030001; National Natural Science Foundation of China: Grant No. 62106048; Foshan City Key Field Science and Technology Research Project: Grant No. 2020001006297; Shunde District Core Technology Research Project: Grant No. 2030218000174.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. The structure of CPAM.

Figure 2. The connection strategies with different directionality.

Figure 3. Different tile defect pictures.

Figure 4. The structure of YOLOv7.

Figure 5. Heatmaps of different attention modules.

Table 1. The characteristics of different methods in tile block defect detection.

Method	Application Object	Calculation Time	Performance	The Connection between Regional Features
First-order derivative edge detector	Solid-colored tiles	Low	Good	Bad
Histograms	Solid-colored tiles	Low	Good	Bad
Neural networks	Solid-colored tiles	High	Good	Not good
Morphological techniques	Solid-colored tiles	Low	Good	Bad
Gabor filters	Solid-colored tiles	Low	Bad	Bad
Wavelet transforms	Solid-colored tiles	High	Good	Bad
Multi-class support vector machine and rotation invariant local variance metric (RIMLV) operator	Solid-colored tiles	High	Good	Bad
Traditional deep learning models	Solid-colored tiles	High	Good	Not good

Table 2. The results of different object detection algorithms.

Model	Precision (%)	Recall (%)	F1-Score	mAP (%)
SSD-VGG16	96.4	39.8	0.45	86.0
SSD-MobileNetv2	100	44.0	0.51	84.0
YOLOv4	93.6	83.9	0.87	88.7
YOLOv4-MobileNetv2	93.7	82.7	0.87	82.6
YOLOv5-s	93.9	63.7	0.69	90.0
YOLOv5-m	89.5	70.4	0.77	90.5
YOLOX-tiny	88.8	82.6	0.84	83.0
YOLOX-s	89.1	80.5	0.85	83.0
YOLOX-m	90.4	84.1	0.86	83.5
YOLOv7	93.3	76.4	0.82	91.9

Table 3. The specific AP values for different categories of defects.

Model	AP$_{\text{Dirty}}$ (%)	AP$_{\text{Impurity}}$ (%)	AP$_{\text{Pour glaze bad}}$ (%)	AP$_{\text{White dot}}$ (%)	mAP (%)
SSD-VGG16	65.4	81.4	100	97.1	86.0
YOLOv4	71.0	84.7	100	98.9	88.7
YOLOv5-m	76.7	89.0	100	96.1	90.5
YOLOX-m	67.8	79.0	94.6	92.7	83.5
YOLOv7	87.0	97.3	83.5	99.9	91.9
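As a quick consistency check on Table 3, the short script below assumes that mAP is the unweighted mean of the four per-class AP values and compares that mean with the reported mAP column. The dictionary and function names are illustrative only; the numbers are copied from the table.

```python
# Per-class AP values (%) from Table 3, in the order
# Dirty, Impurity, Pour glaze bad, White dot.
table3_ap = {
    "SSD-VGG16": [65.4, 81.4, 100.0, 97.1],
    "YOLOv4": [71.0, 84.7, 100.0, 98.9],
    "YOLOv5-m": [76.7, 89.0, 100.0, 96.1],
    "YOLOX-m": [67.8, 79.0, 94.6, 92.7],
    "YOLOv7": [87.0, 97.3, 83.5, 99.9],
}

# mAP column (%) as reported in Table 3.
table3_map = {
    "SSD-VGG16": 86.0,
    "YOLOv4": 88.7,
    "YOLOv5-m": 90.5,
    "YOLOX-m": 83.5,
    "YOLOv7": 91.9,
}

def mean_ap(ap_values):
    """Unweighted mean of per-class AP values (assumed definition of mAP)."""
    return sum(ap_values) / len(ap_values)

for model, aps in table3_ap.items():
    print(f"{model}: mean of per-class AP = {mean_ap(aps):.3f}, "
          f"reported mAP = {table3_map[model]}")
```

Under this assumption the computed means agree with the reported column to within rounding at one decimal place.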

Table 4. The results of CPAM with different embedding positions.

Model	Precision (%)	Recall (%)	F1-Score	mAP (%)
YOLOv7-CPAM (P3)	94.0	87.4	0.91	94.8
YOLOv7-CPAM (P4)	94.2	84.7	0.89	94.5
YOLOv7-CPAM (P5)	94.6	90.0	0.92	94.8
YOLOv7-CPAM (P3 + P4 + P5)	92.7	88.6	0.91	94.3

Table 5. The results of CPAM with arranging patches in different spatial orientations.

Model	Precision (%)	Recall (%)	F1-Score	mAP (%)
YOLOv7-CPAM-(a)	93.8	88.5	0.91	94.4
YOLOv7-CPAM-(b)	93.2	83.9	0.88	93.4
YOLOv7-CPAM-(c)	94.6	90.0	0.92	94.8

Table 6. The results of CPAM with different convolution kernel sizes.

Model	Precision (%)	Recall (%)	F1-Score	mAP (%)
YOLOv7-CPAM (k = 3)	94.6	90.0	0.92	94.8
YOLOv7-CPAM (k = 5)	93.0	89.4	0.91	94.1
YOLOv7-CPAM (k = 7)	90.9	89.5	0.90	93.8
YOLOv7-CPAM (k = 9)	94.4	88.0	0.91	93.9

Table 7. The results of CPAM with different patch sizes.

Model	Precision (%)	Recall (%)	F1-Score	mAP (%)
YOLOv7-CPAM (patch = 9)	94.6	90.0	0.92	94.8
YOLOv7-CPAM (patch = 16)	94.4	90.0	0.92	94.9
YOLOv7-CPAM (patch = 25)	94.5	88.3	0.91	94.5
YOLOv7-CPAM (patch = 36)	94.8	88.3	0.91	94.7
YOLOv7-CPAM (patch = 49)	94.0	88.2	0.91	94.6

Table 8. The results of YOLOv7 with different attention modules.

Model	Precision (%)	Recall (%)	F1-Score	mAP (%)
YOLOv7	93.3	76.4	0.82	91.9
YOLOv7-SE	94.5	85.9	0.90	94.2 (+2.3)
YOLOv7-CBAM	94.9	85.6	0.90	94.3 (+2.4)
YOLOv7-ECA	93.2	87.4	0.90	94.0 (+2.1)
YOLOv7-CPAM	94.6	90.0	0.92	94.8 (+2.9)

Table 9. The results of different models with CPAM.

Model	AP$_{\text{Dirty}}$ (%)	AP$_{\text{Impurity}}$ (%)	AP$_{\text{Pour glaze bad}}$ (%)	AP$_{\text{White dot}}$ (%)	mAP (%)	Param
SSD-VGG16-CPAM	65.7 (+0.3)	82.5 (+1.1)	100 (+0)	99.4 (+2.3)	86.9 (+0.9)	24.0M
YOLOv4-CPAM	70.9 (−0.1)	85.8 (+1.1)	100 (+0)	99.6 (+0.7)	89.1 (+0.4)	64.0M
YOLOv5-m-CPAM	76.6 (−0.1)	98.1 (+9.1)	96.2 (−3.8)	98.6 (+2.5)	92.4 (+1.9)	21.1M
YOLOX-m-CPAM	71.4 (+3.6)	75.8 (−3.2)	99.4 (+4.8)	93.6 (+0.9)	85.0 (+1.5)	25.3M
YOLOv7-CPAM	86.9 (−0.1)	98.6 (+1.3)	93.8 (+10.3)	99.9 (+0)	94.8 (+2.9)	37.2M
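The per-class changes in Table 9 are measured against the baselines in Table 3. The sketch below recomputes them directly from the tabulated AP values as (AP with CPAM) minus (baseline AP); the variable and function names are illustrative only.

```python
CLASSES = ["Dirty", "Impurity", "Pour glaze bad", "White dot"]

# Baseline per-class AP (%) from Table 3.
baseline = {
    "SSD-VGG16": [65.4, 81.4, 100.0, 97.1],
    "YOLOv4": [71.0, 84.7, 100.0, 98.9],
    "YOLOv5-m": [76.7, 89.0, 100.0, 96.1],
    "YOLOX-m": [67.8, 79.0, 94.6, 92.7],
    "YOLOv7": [87.0, 97.3, 83.5, 99.9],
}

# Per-class AP (%) with CPAM from Table 9.
with_cpam = {
    "SSD-VGG16": [65.7, 82.5, 100.0, 99.4],
    "YOLOv4": [70.9, 85.8, 100.0, 99.6],
    "YOLOv5-m": [76.6, 98.1, 96.2, 98.6],
    "YOLOX-m": [71.4, 75.8, 99.4, 93.6],
    "YOLOv7": [86.9, 98.6, 93.8, 99.9],
}

def ap_changes(before, after):
    """Signed change in AP per class, rounded to one decimal place."""
    return [round(a - b, 1) for b, a in zip(before, after)]

for model in baseline:
    changes = ap_changes(baseline[model], with_cpam[model])
    summary = ", ".join(f"{name}: {delta:+.1f}"
                        for name, delta in zip(CLASSES, changes))
    print(f"{model}-CPAM -> {summary}")
```

A negative value means that the class AP dropped after CPAM was added, even where the overall mAP improved.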

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhu, W.; Wang, Q.; Luo, L.; Zhang, Y.; Lu, Q.; Yeh, W.-C.; Liang, J. CPAM: Cross Patch Attention Module for Complex Texture Tile Block Defect Detection. Appl. Sci. 2022, 12, 11959. https://doi.org/10.3390/app122311959

