The improved model proposed in this study is based on the YOLOv5 algorithm, as depicted in
Figure 1. YOLOv5 is released in four versions: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, which differ in network depth, width, and computational complexity. YOLOv5s is the smallest and most lightweight, making it suitable for resource-constrained scenarios, while YOLOv5x is the deepest and most accurate, rendering it suitable for applications requiring high precision.
In response to the lightweight and real-time requirements of industrial defect detection, YOLOv5s was improved in this study while maintaining detection accuracy. The improvements focus on optimizing the backbone, neck, and head layers to enhance detection speed and accuracy. In particular, for defect detection on small targets and in complex environments, the model structure and parameters were refined, yielding improved performance relative to the original model.
2.1. Para_CBAM Attention Mechanism Design
CBAM (Convolutional Block Attention Module) is a structurally compact and efficient attention module that attends to both the channel and spatial information in images. It achieves this by sequentially integrating two sub-modules: channel attention and spatial attention. This allows the network to first identify important feature channels and then focus on important spatial regions within those channels. This fine-grained attention adjustment strategy enables CBAM to achieve significant performance improvements across various visual tasks.
The overall structure of CBAM is illustrated in
Figure 2. Initially, global average pooling and global max pooling are utilized to extract global spatial information, and these pooled descriptors are used to learn an importance weight for each channel. This step ensures that the model can identify and emphasize the feature channels most critical for the current task. CBAM then introduces the spatial attention module, which further highlights important spatial regions by considering the information at each spatial position of the feature map: channel-wise global max and average pooling are performed on the feature map, followed by a convolutional layer, which ultimately generates a spatial attention map. This map guides the model’s focus toward the crucial parts of the image.
The feature $F_{avg}^{c}$ obtained from global average pooling and the feature $F_{max}^{c}$ obtained from global max pooling are each processed through a fully connected layer with shared weights. Afterward, they are summed and passed through a Sigmoid activation function to generate the final channel attention map, denoted as $M_{c}(F)$, as shown in Equation (1):

$$M_{c}(F) = \sigma\left(\mathrm{MLP}\left(\mathrm{AvgPool}(F)\right) + \mathrm{MLP}\left(\mathrm{MaxPool}(F)\right)\right) \tag{1}$$

In the above equation:
$\sigma$—represents the Sigmoid activation function;
$\mathrm{MLP}$—denotes a multilayer perceptron (MLP) that includes a ReLU activation function;
$\mathrm{MaxPool}$—signifies the max pooling operation;
$\mathrm{AvgPool}$—indicates the average pooling operation.
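For reference, the channel attention branch of Equation (1) can be written as a minimal PyTorch sketch. Two details are assumptions not stated in this section: the shared MLP is realized with 1 × 1 convolutions, and the reduction ratio of 16 is the value commonly used in the original CBAM paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Channel attention branch of CBAM, following Equation (1)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP: two 1x1 convolutions with a ReLU in between; the same
        # weights process both the avg-pooled and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global average and max pooling yield two C x 1 x 1 descriptors.
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        # Sum the two branches and squash to (0, 1): M_c(F).
        return self.sigmoid(avg + mx)
```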
After the channel attention module is executed, the spatial attention mechanism provides additional focus on key spatial locations. Initially, max pooling and average pooling operations are applied to the input feature map $F$, this time along the channel dimension, resulting in two two-dimensional feature maps $F_{max}^{s}$ and $F_{avg}^{s}$. Subsequently, these two feature maps are stacked along the channel dimension and passed through a 7 × 7 convolutional layer, followed by the application of the Sigmoid function, to obtain the spatial attention map $M_{s}(F)$, as shown in Equation (2):

$$M_{s}(F) = \sigma\left(f^{7\times 7}\left(\left[\mathrm{AvgPool}(F);\, \mathrm{MaxPool}(F)\right]\right)\right) \tag{2}$$

In the equation:
$\sigma$—represents the Sigmoid activation function;
$f^{7\times 7}$—signifies a convolution layer with a 7 × 7 kernel size;
$[\,\cdot\,;\,\cdot\,]$—denotes the stacking of feature maps along the channel dimension.
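The spatial attention branch of Equation (2) admits a similarly compact sketch. The 7 × 7 kernel follows the text; padding of 3 is assumed so that the attention map keeps the input’s spatial size.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention branch of CBAM, following Equation (2)."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # 7x7 convolution over the two stacked pooled maps; padding
        # preserves the H x W resolution of the input.
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel-wise average and max pooling give two 1 x H x W maps.
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        # Stack along the channel dimension, convolve, squash: M_s(F).
        return self.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
```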
The final output of the CBAM module is obtained by first adjusting the input feature map $F$ with the channel attention weights $M_{c}(F)$, and then further refining the result with the spatial attention map $M_{s}(F')$, as shown in Equations (3) and (4):

$$F' = M_{c}(F) \otimes F \tag{3}$$

$$F'' = M_{s}(F') \otimes F' \tag{4}$$

In the equations:
$\otimes$—denotes element-wise multiplication.
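Putting the two branches together, the serial composition of Equations (3) and (4) can be sketched as follows, reusing the ChannelAttention and SpatialAttention classes from the sketches above:

```python
import torch.nn as nn

class CBAM(nn.Module):
    """Serial CBAM: spatial attention operates on a feature map that has
    already been re-weighted by channel attention (Equations (3)-(4))."""

    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = self.ca(x) * x       # F'  = M_c(F)  (x) F
        return self.sa(x) * x    # F'' = M_s(F') (x) F'
```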
The CBAM attention mechanism fuses channel and spatial attention serially, which means that the weights of the second attention module are influenced by the output of the first, regardless of whether channel attention or spatial attention is applied first. Specifically, the preceding module adjusts the input feature map to a certain extent, thereby affecting the weight allocation of the subsequent module. While this design performs well in many applications, its inherent sequential dependency limits the model’s ability to capture features comprehensively, since each type of attention is computed from a feature map that has already been “adjusted” by the other.
To overcome this limitation and further enhance the proposed model’s performance, an improved CBAM structure is proposed in this paper in which the original serial fusion of attention is replaced with parallel fusion. In this improved structure, the channel attention and spatial attention modules independently process the original input feature map at the same time rather than sequentially, so each attention module directly adjusts the original feature map rather than one already modified by the other. This parallel processing eliminates the sequential dependency between the attention modules, allowing the model to capture the information in the feature map more flexibly and comprehensively. The channel attention weights $M_{c}(F)$ and the spatial attention weights $M_{s}(F)$ are applied simultaneously to the input feature map $F$. The improved formula is shown in Equation (5):

$$F'' = M_{c}(F) \otimes M_{s}(F) \otimes F \tag{5}$$
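Under the same assumptions as the sketches above, the parallel fusion of Equation (5) changes only the forward pass: both attention maps are computed from the untouched input and applied jointly.

```python
import torch.nn as nn

class ParaCBAM(nn.Module):
    """Parallel-fusion CBAM (Equation (5)): both attention branches read
    the original input F, removing the serial dependency. A sketch that
    reuses the branch modules defined above."""

    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        # Both branches see the same, unmodified input in parallel:
        # F'' = M_c(F) (x) M_s(F) (x) F
        return self.ca(x) * self.sa(x) * x
```

Since the channel weights have shape (B, C, 1, 1) and the spatial weights (B, 1, H, W), the two broadcast over the input without imposing any order of application, which is precisely the order-independence the parallel design targets.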
Through this parallel fusion design, the model no longer depends on the processing order of channel and spatial attention, enabling more efficient and effective use of both types of attention to enhance feature representation. This improvement not only holds promise for better performance across various visual tasks, but also offers a new perspective on the design of attention mechanisms. In practice, the parallel-fusion CBAM can be readily integrated into existing convolutional neural networks, opening up new possibilities for enhancing a network’s expressiveness and adaptability. The improved CBAM attention module in this paper is named Para_CBAM, and its overall structure is depicted in
Figure 3.
2.2. Optimizing the Feature Fusion Network
During object detection, large objects span many pixels, so their feature points are rarely lost in convolution operations, whereas small objects with fewer pixels lose feature points more easily as the network deepens. To further improve the proposed model’s ability to recognize complex scenes and subtle defects, optimizing the feature fusion network is crucial. Although the original YOLOv5 utilizes FPN + PANet to enhance feature fusion efficiency in multiple aspects, there is still room for improvement when dealing with finer targets and more complex backgrounds. BiFPN, with its efficient bidirectional fusion paths and cross-scale connections, not only enhances the detection of objects at different scales, but also improves the model’s ability to capture details while maintaining computational efficiency, which significantly benefits aluminum profile defect detection. Therefore, BiFPN was adopted as the core feature fusion network in this study.
The network structure of BiFPN is depicted in
Figure 4. Through carefully designed bidirectional fusion paths and cross-scale connections, the BiFPN network architecture achieves efficient and flexible feature fusion. Initially, it receives multi-scale feature maps from the base convolutional network, which exhibit varying resolutions and represent different semantic levels from shallow to deep layers. In the top-down path, BiFPN gradually conveys rich semantic information from higher layers to lower-level feature maps through upsampling and weighted fusion operations, thus enhancing their semantic representation capability. Simultaneously, in the bottom-up path, it transfers detailed information from lower layers to higher-level feature maps through downsampling and fusion operations, thus improving their spatial detail representation. Additionally, BiFPN establishes cross-scale connections to directly transmit information between the feature maps of different scales, significantly enhancing the efficiency of information flow. At feature fusion points, BiFPN employs learned weights for weighted fusion, adaptively adjusting the contribution of features from different sources to generate multi-scale feature maps fused with rich contextual information. These feature maps, processed by BiFPN, not only contain abundant semantic information, but also retain fine spatial details, thus providing robust feature support for downstream tasks such as object detection or semantic segmentation.
During feature fusion, BiFPN employs learnable weights to perform a weighted summation of features from different sources. If there are multiple feature layers $P_{1}, P_{2}, \ldots, P_{n}$ to be fused, each feature layer $P_{i}$ is associated with a corresponding weight $w_{i}$. The fused feature $P_{out}$ can be computed using Formula (6):

$$P_{out} = \sum_{i=1}^{n} \frac{w_{i}}{\varepsilon + \sum_{j=1}^{n} w_{j}} \cdot P_{i} \tag{6}$$

where each $w_{i} \geq 0$ is enforced by a ReLU and $\varepsilon$ is a small constant that keeps the normalization numerically stable, so the contribution of each source is adaptively and stably weighted.
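A minimal sketch of this fusion node follows, assuming the fast normalized fusion used by BiFPN (ReLU-constrained weights with ε = 10⁻⁴) and that the incoming feature maps have already been resized and projected to a common shape by the surrounding up- and downsampling paths. This node is the building block repeated at every fusion point of the top-down and bottom-up paths.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Learnable weighted fusion of n feature layers (Formula (6))."""

    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        # One learnable scalar weight per incoming feature layer P_i.
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        # ReLU keeps each w_i non-negative; dividing by their sum plus a
        # small epsilon normalizes the contributions as in Formula (6).
        w = torch.relu(self.w)
        w = w / (w.sum() + self.eps)
        return sum(wi * f for wi, f in zip(w, feats))
```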
2.3. Improved Target Detection Layer
In YOLOv5, three detection layers are utilized to detect objects of different sizes. These layers are positioned at different levels of the network to predict targets within various scale ranges. In the original YOLOv5 model, the core of the network output consists of three detectors that perform grid-based, anchor-based detection on feature maps of varying sizes. After feature fusion, the detection layers output feature maps at three scales: 80 × 80, 40 × 40, and 20 × 20 (for a 640 × 640 input). In detail-sensitive applications such as aluminum profile defect detection, the model must be capable of identifying and locating defects of various sizes, including very small ones.
The 160 × 160 feature map provides the model with finer spatial information, which is crucial for detecting small or subtle defects. In applications like aluminum profile defect detection, defects often occupy only a small portion of the image and have inconspicuous characteristics; high-resolution feature maps carry rich detail, enabling the model to identify and locate small defects with higher precision. This feature map size offers an appropriate balance: it preserves detailed information at sufficient resolution without becoming so large that the model is difficult to handle, allowing detection accuracy and robustness to improve while real-time performance is maintained. Therefore, the addition of a non-fused 160 × 160 high-resolution feature map enriches the spatial information available to the model, helping it better understand image details and improving the accuracy of small-target detection.
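As a quick check of these scales, assuming YOLOv5’s common 640 × 640 input resolution, the standard strides of 8, 16, and 32 plus the added stride-4 branch yield exactly the four grid sizes discussed above:

```python
# Detection-layer grid sizes for a 640 x 640 input. Stride 4 is the
# added high-resolution branch for small defects; 8/16/32 are the
# standard YOLOv5 detection strides.
input_size = 640
for stride in (4, 8, 16, 32):
    grid = input_size // stride
    print(f"stride {stride:2d} -> {grid} x {grid} feature map")
# stride  4 -> 160 x 160   (new layer)
# stride  8 ->  80 x 80
# stride 16 ->  40 x 40
# stride 32 ->  20 x 20
```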