Detection of Surface Defects of Barrel Media Based on PaE-VGG Model

Peng, Hongli; Cheng, Long; Tian, Jianyan

doi:10.3390/math13071104


by Hongli Peng, Long Cheng and Jianyan Tian *

College of Electrical and Power Engineering, Taiyuan University of Technology, Taiyuan 030024, China

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(7), 1104; https://doi.org/10.3390/math13071104

Submission received: 22 November 2024 / Revised: 5 March 2025 / Accepted: 19 March 2025 / Published: 27 March 2025


Abstract: To address the shortage of defect samples and the low detection accuracy for barrel media, we propose a method for detecting surface defects of barrel media based on a PaE-VGG model. The PaE-VGG model modifies the VGG convolutional neural network by incorporating position-aware circular convolution, which facilitates location-sensitive global feature extraction. An Efficient Channel Attention mechanism is computed over the feature extraction channels and adaptively weights the feature vector. Experiments show that the proposed PaE-VGG model achieves an accuracy of 94.37%, an improvement of 4.76% over the previous VGG model. Compared with convolutional neural networks that have been highly successful in defect detection, namely AlexNet, GoogleNet, and ResNet18, the optimized model is more accurate by 4.20%, 1.51%, and 0.72%, respectively. The improved PaE-VGG therefore achieves good precision and performance in the detection of barrel media defects.

Keywords: barrel media; defect detection; VGG network; attention mechanism

MSC: 68T07; 68T45

1. Introduction

Barrel polishing is a widely used machining technology for improving the surface quality and integrity of parts. Components, abrasive media, liquid reagents, and water are loaded into the machine, and the process removes burrs and scratches from the surfaces of the components [1].

The barrel media, as the processing medium in barrel polishing, play a crucial role in determining the quality and performance of this process. However, uneven mixing of raw materials and unstable sintering-temperature control cause defects in barrel media, such as concave defects, surface black spots, sintering marks, damage, abnormal shape, and adhesion. An efficient, economical, and reliable solution is to capture an image of the surface of the barrel media and analyze it to detect defects. Surface defect detection of barrel media has therefore become an urgent problem.

Conventional visual methods employ image enhancement, erosion, dilation, and opening and closing operations to extract texture, geometric shape, and color features. The extracted feature vectors are then used to distinguish different types of defects [2]. These methods are widely used in industrial defect detection. Konovalenko et al. [3] established an effective steel surface defect recognition system, using Support Vector Machine (SVM) multi-class classifiers to make final judgments. Jia et al. [4] proposed a gray-scale algorithm based on component ratio, extracted the barrel media region and the black-heart defect region by morphological operations, and realized machine-vision defect detection of barrel polishing media. However, these methods are difficult to debug, give unstable detection results, require repeated parameter adjustment, and are prone to misdetection and poor compatibility.

Due to the strong feature extraction capability of deep learning algorithms, methods based on deep learning have gained widespread adoption for defect detection. Park et al. [5] used Convolutional Neural Networks (CNNs) to visually detect the surface images of parts and identify defects such as dirt, abrasions, burrs, and wear. Zhang et al. [6] used a CNN to classify pavement image degradation and detect potholes, patches, cracks, and other categories. Rana Ehtisham et al. [7] employed a CNN to classify the defects of the wood structure into three categories: crack, joint, and undamaged. The ResNet-V2 model was used for fine tuning and validation on 9000 wood defect images to achieve automated defect assessment of wood structures.

Recently, transformers have started to show impressive results that significantly outperform large convolution-based models [8]. The Vision Transformer (ViT), leveraging self-attention mechanisms, can capture global information in images, which is crucial for detecting defects on the surfaces of industrial products [9]. Zhou et al. [10] employed a lightweight ViT to extract global features, thereby significantly enhancing the accuracy of defect detection under similar backgrounds. ViTs generally perform well, yet they are known for their high computational demands and complex training procedures [11]. When the dataset for identifying defects in barrel media is small, the accuracy of a ViT may decrease.

Therefore, when studying subjects with small datasets, it is crucial to integrate the learning capabilities of ViTs with traditional CNNs to improve algorithms [12]. This study explores the application of attention mechanisms in modifying the structures of deep learning models. We propose a surface defect detection model that uses the VGG convolutional neural network as the underlying architecture [13,14,15] and combines it with position-aware cyclic convolution (PaCC) [16]. The resulting PaE-VGG network is a lightweight convolutional network that exhibits global location sensitivity.

Our work has made significant contributions in three aspects:

(i): CNNs are favored for their focus on local relationships, better hardware support, and ease of training. Therefore, the backbone network for identification of surface defects of barrel media is proposed as a VGG-based model.
(ii): To overcome the limitation that traditional convolutional neural networks have in terms of limited receptive fields, we propose the PaCC technique, which enhances the model’s ability to comprehend defect positions. This is achieved by leveraging circular convolution to capture features that are dependent on the position. Here, base instance kernels and position-embedding strategies are employed to manage variations in input size and to incorporate location information into the output feature maps, respectively.
(iii): To ascertain the significance of each feature map channel and enhance the channels that benefit the current task, we employ an Efficient Channel Attention (ECA) mechanism [17]. This mechanism evaluates the contribution of each channel and optimizes the feature representation by adjusting the channel weights accordingly, thereby enhancing the performance of the PaE-VGG network.

2. Related Work

Compared with traditional image classifiers, deep learning methods can extract deep semantic features through self-learning and strengthen the connection between features and classifiers. At present, the most successful convolutional neural networks for defect detection are the AlexNet [18], GoogleNet [19], VGG [20], and ResNet [21] series. This study found that a CNN structure that is too deep leads to model overfitting and training degradation, while a structure that is too shallow extracts too few features to express the deep information of the image. After comparing these classical architectures on the experimental data, we selected the VGG network as the baseline structure of this study.

The main characteristic of the VGG network is that its convolutional layers consist of multiple stacked 3 × 3 convolutional kernels, while its pooling layers use small 2 × 2 windows [22]. For the same receptive field, stacked smaller convolutional kernels provide more non-linearity at a lower parameter cost.
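The trade-off between kernel size, receptive field, and parameter cost can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes stride-1 convolutions with equal input and output channel counts and ignores biases, and the channel width of 64 is an arbitrary example value rather than a figure from this paper.

```python
def stacked_receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(num_layers: int, kernel_size: int, channels: int) -> int:
    """Weight count (biases ignored) when every layer maps `channels` -> `channels`."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 64  # illustrative width, not a value taken from the paper

# Two stacked 3x3 layers cover the same window as a single 5x5 layer.
stacked = (stacked_receptive_field(2, 3), conv_weights(2, 3, channels))
single = (stacked_receptive_field(1, 5), conv_weights(1, 5, channels))

print("two 3x3 layers: receptive field, weights =", stacked)
print("one 5x5 layer:  receptive field, weights =", single)
```

Under these assumptions the two configurations see the same 5 × 5 window, while the stacked 3 × 3 version uses 18·C² weights instead of 25·C² and adds one more non-linearity between the layers.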

The VGG-16 network architecture is illustrated in Figure 1. It consists of thirteen convolutional layers, three fully connected layers, five max-pooling layers, and a Softmax output layer. In the conventional VGG-16 architecture, the Softmax layer for classification consists of 1000 nodes, while the fully connected layers are structured as (4096, 4096, 1000).

3. Proposed PaE-VGG Model for Surface Defect Recognition

Defect detection is treated here as a classification problem on surface images of the barrel media. In this section, we develop and analyze the proposed PaE-VGG model. As shown in Figure 2, the PaCC block helps build global feature dependencies, and ECA helps capture cross-channel interactions efficiently.

3.1. VGG-Based Model

The VGG is a typical CNN architecture that has gained significant popularity and has been widely adopted in various networks [23,24].

The VGG network necessitates a substantial amount of data samples and convolutional layers, resulting in extensive computational requirements and slow convergence of the loss function. The presence of three fully connected layers further contributes to a significant proportion of parameter weight within the entire network, demanding additional memory for training purposes. Given the limited dataset available for detecting defects in barrel media, it is essential to improve the model’s applicability by reducing computation and storage space while meeting the training requirements for small-scale samples.
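To make the weight of the fully connected layers concrete, the following sketch counts the parameters of the conventional VGG-16 layout. It assumes the standard configuration from Simonyan and Zisserman (3 × 3 convolutions, 2 × 2 max pooling, fully connected layers of 4096, 4096, and 1000 units) and a 224 × 224 × 3 input; it describes the original VGG-16 only, not the PaE-VGG network proposed here.

```python
# Standard VGG-16 layout: integers are conv output channels, "M" is 2x2 max pooling.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]


def count_vgg16_params(input_size: int = 224, in_channels: int = 3):
    """Return (conv_params, fc_params) for the conventional VGG-16."""
    conv_params = 0
    channels, size = in_channels, input_size
    for item in VGG16_CONV:
        if item == "M":
            size //= 2  # each pooling layer halves the spatial size
        else:
            conv_params += channels * item * 3 * 3 + item  # weights + biases
            channels = item

    fc_params = 0
    features = channels * size * size  # flattened feature vector length
    for out_features in VGG16_FC:
        fc_params += features * out_features + out_features
        features = out_features
    return conv_params, fc_params


conv_params, fc_params = count_vgg16_params()
total = conv_params + fc_params
print(f"conv: {conv_params:,}  fc: {fc_params:,}  total: {total:,}")
print(f"share of parameters in the fully connected layers: {fc_params / total:.1%}")
```

Under these assumptions, most of the parameters sit in the three fully connected layers, which is the motivation for shrinking the classifier head when only a small dataset is available.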

After conducting an in-depth investigation into the recent advancements of VGG networks, we propose a novel network model, tailored specifically for our research task, based on the VGG architecture.

As shown in Figure 2, the entire network comprises four components: a feature extraction module made up of four blocks, an ECA channel attention mechanism, a fully connected layer, and a softmax function. For a given image of barrel media, the input size is H × W × C, where C represents the number of input channels.

First, we construct the VGG network structure and establish the block feature extraction module. Next, we replace the 2D convolutions in block3 and block4 with ParC (the PaCC operator) and add residual connections that fuse global and local features. The ECA mechanism is then employed to capture cross-channel interactions within the model. Finally, we reduce the fully connected layers from the original three to one. Since this study classifies barrel media images into seven categories, we change the number of output nodes from 1000 to 7.

3.2. ParC Residual Block

The structure of the ParC Residual Block (PRB) is shown in Figure 2. It serves two main purposes: (1) PaCC extracts the interaction information between global features and local areas from the global space; (2) incorporating residual blocks into the learning process significantly enhances the propagation of feature information and streamlines the network architecture.

PaCC is a mechanism designed to enhance the global feature capture capabilities of CNNs. PaCC simulates certain characteristics of the self-attention mechanism by incorporating awareness of pixel position information into the convolution operation while retaining the local characteristics of the convolution operation. As shown in Figure 3, there are two types of PaCC [16]:

(1): Horizontal PaCC (PaCC-H): this type of convolution processes the horizontal rows of the image; that is, it focuses on the pixels in the same row.
(2): Vertical PaCC (PaCC-V): this type of convolution processes the vertical columns of the image; that is, it focuses on the pixels in the same column.

Processing the input feature map in the PaCC-H and PaCC-V directions yields $y^{H}$ and $y^{V}$, respectively. Concatenating $y^{H}$ and $y^{V}$ along the channel direction then expands the dimensionality of the output.

This allows the network to cover all pixels of the image and thus extract global features from all input pixels. This is a significant difference from traditional convolution, which can only gather information from a local receptive field and capture local features.

Figure 4 provides a visual representation of how PaCC-H operates, highlighting its unique approach to convolution in handling one-dimensional input data.

To simplify notation, let us assume the input x has a height of 1, giving it a shape of $c \times 1 \times w$. For PaCC-H, the output at position $(i, j)$ can be computed as follows:

$\begin{aligned} pe^{H} &= [pe_{0}^{H}, pe_{1}^{H}, \dots, pe_{w-1}^{H}] \\ pe_{e}^{H} &= EV(pe^{H}, h) \\ k^{H} &= [k_{0}^{H}, k_{1}^{H}, \dots, k_{w-1}^{H}] \\ x^{p} &= x + pe_{e}^{H} \\ y_{i,j} &= \sum_{t=0}^{w-1} k_{t}^{H} \, x^{p}_{i,\,(j+t) \bmod w} \end{aligned}$

(1)

where $pe^{H}$ is the instance position embedding (PE), $pe_{e}^{H}$ is its expanded form, and $k^{H}$ is the instance kernel. $EV(\cdot)$ is the expand function: to generate an $h \times w$ PE matrix, the PE vector is copied multiple times and the copies are concatenated. The column index $(j + t) \bmod w$ wraps around the end of the row, which is what makes the convolution circular.

Figure 4 illustrates the computational process when the input is a one-dimensional vector [16]. Specifically, it shows how PaCC-H performs convolution along a circular path created by connecting the beginning and the end of the input vector. The final result $y_{i,j}$ is obtained by convolving the instance kernel $k^{H}$ with the position-embedded input $x^{p}$.
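The circular convolution described above can be sketched in plain Python for a single-channel feature map. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`expand`, `pacc_h`, `pacc_v`) are hypothetical, the kernel spans the full row or column, the position embedding is simply copied across rows, and the kernel and embedding values are arbitrary rather than learned.

```python
def expand(pe, copies):
    """Copy a position-embedding vector `copies` times to form a 2D matrix."""
    return [list(pe) for _ in range(copies)]


def pacc_h(x, kernel, pe):
    """Horizontal circular convolution on an h x w map.

    `kernel` and `pe` both have length w (assumption: global kernel size).
    """
    h, w = len(x), len(x[0])
    pe_e = expand(pe, h)
    # Add the expanded position embedding to make the result position-aware.
    xp = [[x[i][j] + pe_e[i][j] for j in range(w)] for i in range(h)]
    # The index (j + t) % w wraps around the end of each row.
    return [[sum(kernel[t] * xp[i][(j + t) % w] for t in range(w))
             for j in range(w)] for i in range(h)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def pacc_v(x, kernel, pe):
    """Vertical variant: apply the horizontal operator to the transposed map.

    Here `kernel` and `pe` have length h.
    """
    return transpose(pacc_h(transpose(x), kernel, pe))


x = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

y_h = pacc_h(x, kernel=[0.0, 1.0, 0.0, 0.0], pe=[0.0] * 4)  # shifts each row by one
y_v = pacc_v(x, kernel=[1.0, 1.0], pe=[0.0] * 2)            # sums each column
y = [y_h, y_v]  # stacking the two results along a new channel axis

print(y_h)
print(y_v)
```

Because every output position sums over the whole row (or column), each output value depends on all pixels along that direction, which is the global receptive field that the text attributes to PaCC.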

3.3. Efficient Channel Attention

ECA is a form of channel attention mechanism that determines the importance of each channel by calculating its significance, establishing weight relationships among different channels [13]. ECA can effectively enhance network performance, thus finding extensive applications in computer vision. By boosting the weights of key channels and suppressing less impactful ones, ECA determines the weight relationships between different pixels in spatial neighborhoods, elevating the weights of crucial region pixels while reducing those of non-essential areas. As shown in Figure 5, ECA avoids the adverse effects of feature dimension reduction, effectively capturing inter-channel interaction information.

ECA replaces the fully connected layer with a one-dimensional (1D) convolutional layer. ECA uses global average pooling (GAP) for spatial compression. After the GAP layer, a 1D convolution with an adaptive kernel size learns the channel weights, which is a key aspect of the ECA module. The ECA module adjusts the convolution kernel size t dynamically according to the number of channels C. The expression is shown in Equation (2) [17]:

$t = \left| \frac{\log_{2} C}{\gamma} + \frac{b}{\gamma} \right|_{odd}$

(2)

where $|x|_{odd}$ denotes the odd number nearest to x; t denotes the size of the convolution kernel; and C signifies the channel dimension. b and $\gamma$ are used to change the ratio of the channel dimension to the convolution kernel size; usually $b = 1$ and $\gamma = 2$.

$w = \sigma(\mathrm{C1D}_{t}(y))$

(3)

where $\sigma$ is the Sigmoid function; $\mathrm{C1D}_{t}$ indicates 1D convolution with a kernel size of t; y is the channel descriptor produced by GAP; and w is the resulting vector of channel weights. The 1D convolution takes the place of a full C × C parameter matrix W.
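The adaptive kernel size and the channel weighting can be sketched as follows. This is a hedged illustration: the rounding rule (truncate, then move an even result up to the next odd number) follows the convention commonly used in public ECA implementations and is an assumption here, the function names are hypothetical, and the 1D kernel weights are placeholders rather than learned values.

```python
import math


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Odd 1D kernel size derived from the channel count."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1  # assumption: even values are bumped to the next odd


def eca_weights(feature_maps, kernel):
    """Channel weights: GAP -> 1D convolution across channels -> sigmoid.

    `feature_maps` is a list of C maps (each h x w); `kernel` has odd length.
    """
    # Global average pooling gives one descriptor per channel.
    y = [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
         for fmap in feature_maps]
    pad = len(kernel) // 2
    padded = [0.0] * pad + y + [0.0] * pad  # zero padding keeps C outputs
    conv = [sum(kernel[k] * padded[c + k] for k in range(len(kernel)))
            for c in range(len(y))]
    return [1.0 / (1.0 + math.exp(-v)) for v in conv]


def eca(feature_maps, kernel):
    """Rescale every channel by its attention weight."""
    weights = eca_weights(feature_maps, kernel)
    return [[[weights[c] * v for v in row] for row in fmap]
            for c, fmap in enumerate(feature_maps)]


for c in (64, 128, 256, 512):
    print(c, "channels -> kernel size", eca_kernel_size(c))
```

With b = 1 and γ = 2, the kernel grows slowly with the channel count, so each channel weight depends only on a few neighbouring channels and no dimensionality reduction is needed.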

4. Dataset and Model Parameters for PaE-VGG Model

The analyses were conducted using Python 3.7.6 and PyTorch 1.12 on a standard PC, and a Heng Yuan Cloud server with a GeForce RTX 3090 graphics card was used for training and testing.

The dataset is partitioned with 80% of the images allocated for training and the remaining 20% reserved for validation. This section describes the metrics used to analyze the performance of the proposed model and to compare it with other defect detection models. Ablation experiments are also carried out to demonstrate the efficacy of the proposed method.

4.1. Dataset and Data Augmentation

Based on industry standards for consolidated barrel media and extensive corporate research, we have classified the defect detection results of barrel media into seven distinct categories. The dataset includes images of barrel media without defects, as well as images with concave defects, surface black spots, sintering marks, damage, abnormal shape, and adhesion, as shown in Figure 6. The 6615 images were categorized into seven groups for training the image-processing model, as listed in Table 1.

4.2. Metrics for Model Analyses

To evaluate the model’s performance, four separate categories are computed for labeling: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP represents correct classification into the positive class, while TN indicates accurate classification into the negative class. FP denotes instances when a negative observation is mistakenly identified as positive. FN indicates cases where the model erroneously labels an observation as negative instead of positive. Utilizing these classification categories allows for computation of the following scores.

Accuracy is a commonly utilized measure for evaluating the efficacy of models, measuring the proportion of correct detections out of the total detections made [25].

$Accuracy = \frac{TP + TN}{TP + FN + FP + TN}$

(4)

Recall evaluates how completely the positive cases are detected, which matters when the classes are imbalanced. It is calculated as the number of correctly identified positive cases divided by the sum of true positives (TP) and false negatives (FN) [26]:

$Recall = \frac{TP}{TP + FN}$

(5)

F1-Score is the harmonic mean of the precision and recall scores [27], where $Precision = \frac{TP}{TP + FP}$:

$F1\text{-}Score = \frac{2 \times Precision \times Recall}{Precision + Recall}$

(6)
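For a multi-class problem such as this one, these scores are usually computed per class in a one-vs-rest fashion from the confusion matrix. The sketch below shows one way to do that; the function name is hypothetical, the 3 × 3 matrix is made-up toy data (not results from this paper), and precision uses the standard definition TP / (TP + FP) with F1 taken as the harmonic mean of precision and recall.

```python
def per_class_metrics(confusion):
    """One-vs-rest metrics; confusion[i][j] = samples of true class i predicted as j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # true class c, predicted as another
        fp = sum(confusion[r][c] for r in range(n)) - tp   # other classes predicted as c
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({
            "accuracy": (tp + tn) / total,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        })
    return results


# Toy confusion matrix for three classes (illustrative values only).
toy = [[8, 2, 0],
       [1, 9, 0],
       [0, 1, 9]]

for index, scores in enumerate(per_class_metrics(toy)):
    print(index, {name: round(value, 3) for name, value in scores.items()})
```

Reporting precision and recall per class, rather than a single overall accuracy, makes it visible when one category is recognised much less reliably than the others.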

4.3. Hyperparameters for Considered PaE-VGG Models

In the experiments, 80% of the 6615 images were allocated to the training set, giving 5292 images (6615 × 0.8). The remaining 20%, amounting to 1323 images (6615 × 0.2), formed the test set. The Adam optimizer [28] was used. Training ran for 70 epochs with a batch size of 32. The initial learning rate was set to 0.001 and halved every 10 epochs by a step-decay scheduler. A dropout rate of 0.25 was used to reduce overfitting. Details of all the hyperparameters can be found in Table 2.
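The split sizes and the learning-rate schedule stated above can be reproduced with a short calculation. The sketch assumes zero-based epoch numbering, so the rate is 0.001 for epochs 0 to 9 and is halved at epochs 10, 20, and so on; the constant and function names are illustrative and do not come from the authors' code.

```python
import math

TOTAL_IMAGES = 6615
TRAIN_FRACTION = 0.8
BATCH_SIZE = 32
EPOCHS = 70
BASE_LR = 0.001

train_count = round(TOTAL_IMAGES * TRAIN_FRACTION)
test_count = TOTAL_IMAGES - train_count
steps_per_epoch = math.ceil(train_count / BATCH_SIZE)  # last batch may be partial


def step_decay_lr(epoch: int, base_lr: float = BASE_LR,
                  step: int = 10, factor: float = 0.5) -> float:
    """Learning rate at a zero-based epoch when it is halved every `step` epochs."""
    return base_lr * factor ** (epoch // step)


print("train / test images:", train_count, test_count)
print("optimizer steps per epoch:", steps_per_epoch)
for epoch in range(0, EPOCHS, 10):
    print(f"epoch {epoch:2d}: lr = {step_decay_lr(epoch):.6g}")
```

Under this assumption the learning rate is halved six times over the 70 epochs, so the final ten epochs run at 1/64 of the initial rate.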

5. Results and Discussion for PaE-VGG Model

5.1. Ablation Experiment

Here, we present ablation experiments to analyze the contribution of each component of our model. Four model structures were tested on barrel media defect detection:

(1): VGG16-BN: VGG16-BN is used as the feature extraction network, with the standard four blocks and three fully connected layers.
(2): ParC-VGG: the two-dimensional convolution structure in block3 and block4 of the VGG network is replaced by ParC to extract network features, a residual structure is adopted, and three fully connected layers are kept.
(3): ECA-VGG: VGG-BN is used as the backbone feature extraction network, ECA is added, and three fully connected layers are kept.
(4): PaE-VGG: the model proposed in this paper, in which ParC replaces the two-dimensional convolution structure in block3 and block4 of the VGG network, ECA is added, and the number of fully connected layers is reduced to one.

Figure 7 compares the accuracy of each trained model on the test set. For a clearer view of the ablation study and its outcomes, the comparative data are also organized in Table 3.

As indicated in Figure 7 and Table 3, the accuracy of the four models stabilizes at around 70 iterations. Compared with the basic VGG-BN architecture: (1) ECA-VGG is more accurate, indicating that after adding ECA the network pays more attention to useful channel features, which improves the performance of the model. (2) The ParC-VGG model converges faster and reaches higher accuracy, indicating that position-sensitive feature extraction is more conducive to extracting the defect features of barrel media. (3) The PaE-VGG network has the highest accuracy, indicating that combining PaCC, ECA, and VGG reduces redundancy in the extracted features and gives better performance. The experimental results show that the proposed model has a strong ability to extract features.

In order to assess the dataset using the trained model, a confusion matrix is generated to measure TP, FP, TN, and FN as illustrated in Figure 8 [29]. Analysis of the confusion matrix reveals that both FP and FN occurrences are minimal compared to TN and TP, indicating accurate classification of the images.

Furthermore, to assess the effectiveness of defect identification and classification, a bar plot (Figure 9) [25] presents the scores for all classes in the barrel media dataset: normal, concave, surface black spots, sintering marks, damage, abnormal shape, and adhesion. Most of these classes exhibit high precision and recall values.

As illustrated in Figure 9, the model largely accomplishes the detection of surface defects of barrel media. It performs well across five primary categories (damage, dark spots, normal, sintering, and adhesion), with accuracy, recall, and F1-scores consistently at approximately 95%. The near-ceiling accuracy values in core categories (e.g., 98.2% for damage) confirm strong agreement between model predictions and ground-truth labels. High recall ensures defect coverage: with recall rates exceeding 96.5% in critical defect categories such as dark spots, the model is consistent with the industrial quality inspection principle of “zero tolerance for missed defects”. The uniformly high F1-scores (94.7 ± 0.8%), as the harmonic mean of precision and recall, indicate a good trade-off between keeping the false positive rate low (FP < 3.2%) and maximizing the defect detection rate.

However, the performance analysis reveals two limitations: (1) Detection bottleneck in the concave category: the F1-score drops to 78.3% for concave defects, which is attributed to their distinct geometric characteristics. (2) Precision–recall imbalance in the abnormal shape category: although accuracy remains relatively high (84.1%), the lower recall rate (82.6%) suggests overly conservative criteria for identifying morphologically ambiguous anomalies, leading to missed detections.

5.2. Comparison with Other Models

The improved model is evaluated against several general classification networks, namely AlexNet, GoogleNet, and ResNet18, on the same dataset. The initial learning rate is set to 0.001, the batch size is set to 32, and the Adam optimizer is used. The number of prior boxes for the model is left at the default value, and training continues until the model converges. The test results are shown in Figure 10 and indicate a significant performance improvement. The accuracy of the defect detection results obtained with the different methods is compared in Table 4.

The experimental results show that the proposed PaE-VGG model achieves an accuracy of 94.37%, an improvement of 4.76% over the previous VGG model. Compared with several widely used classification networks, AlexNet (90.17%), GoogleNet (92.89%), and ResNet18 (93.65%), the optimized model is more accurate by 4.20%, 1.51%, and 0.72%, respectively. The selection of VGG16 as the backbone architecture is supported by the ablation studies (Section 5.1). The decision stems from its architectural transparency and computational robustness, characteristics that align with the hierarchical feature extraction demands of industrial defect detection. Specifically, under low-data conditions (N = 6615 images in total), VGG16 achieves a good performance balance. As a result, the PaE-VGG model effectively extracts features from barrel media defect images, reducing channel redundancy and enhancing its feature extraction capabilities.

Furthermore, Figure 11 provides a visual representation of the confusion matrices for the models under consideration, allowing us to assess their performance in defect classification tasks. Notably, for the “damage” and “dark spots” categories of defects, there is a significant difference in classification, likely due to their high sensitivity to location; the position of the defects within the image may affect the classification outcome. Our model, by integrating PaCC, effectively enhances the ability to extract image features. Consequently, the improved PaE-VGG model has not only demonstrated superior accuracy but also proven its excellent performance in the detection of barrel media defects.

6. Conclusions

To address the characteristics of barrel media defect detection, we have incorporated a position-sensitive attention mechanism into the VGG model, designing a PaE-VGG network to learn features from the surface images of barrel media defects. The results indicate that the addition of PaCC not only enables the model to perform local feature extraction through convolution but also facilitates position-sensitive global feature extraction. Furthermore, the integration of ECA enhances the interaction between image channels, thereby significantly improving classification accuracy.

In this paper, we propose the PaE-VGG network, a convolutional neural network that inherits the advantages of ConvNets and integrates merits of ViTs, which can improve accuracy in tasks with limited computation. However, the current PaE-VGG network still fails to achieve satisfactory recognition accuracy for some key defect categories (such as concave defects). This performance gap indicates limitations in both hierarchical feature abstraction and parameter efficiency. To address these two challenges, future work will carry out systematic architecture optimization through a differentiable pruning algorithm.

Author Contributions

Conceptualization, L.C.; Methodology, H.P.; Software, L.C.; Formal analysis, H.P. and J.T.; Data curation, H.P.; Writing—original draft, H.P.; Writing—review & editing, H.P. and J.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

PaCC	Position-Aware Cyclic Convolution
ECA	Efficient Channel Attention
CNN	Convolutional Neural Networks
ViT	Vision Transformer
PRB	PaCC Residual Block
PaCC-H	Horizontal PaCC
PaCC-V	Vertical PaCC
PE	Instance Position Encoding
GAP	Global Average Pooling

References

1. Yang, S.; Li, W.; Chen, H. Surface Finishing Theory and New Technology; Springer: Berlin/Heidelberg, Germany, 2018; pp. 65–224.
2. Tang, B.; Kong, J.; Wu, S. Review of surface defect detection based on machine vision. J. Image Graph. 2017, 22, 1640–1663.
3. Konovalenko, I.; Maruschak, P.; Brezinová, J. Steel Surface Defect Classification Using Deep Residual Neural Network. Metals 2020, 10, 846.
4. Jia, P.; Tian, J.; Yang, Y. Defect Detection Method of Abrasive Block based on Machine Vision. Diam. Abrasives Eng. 2021, 41, 76–82.
5. Park, J.; Kwon, B.; Park, J. Machine learning-based imaging system for surface defect inspection. Manufact 2016, 3, 303–310.
6. Zhang, C.; Nateghinia, E.; Miranda-Moreno, L.; Sun, L. Pavement distress detection using convolutional neural network (CNN): A case study in Montreal, Canada. Int. J. Transp. Sci. Technol. 2022, 11, 298–309.
7. Ehtisham, R.; Qayyum, W.; Camp, C.; Plevris, V.; Mir, J.; Khan, Q.; Ahmad, A. Computing the characteristics of defects in wooden structures using image processing and CNN. Autom. Constr. 2024, 158, 105211.
8. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
9. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers and distillation through attention. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10347–10357.
10. Zhou, H.; Yang, R.; Hu, R.; Shu, C.; Tang, X.; Li, X. ETDNet: Efficient Transformer-Based Detection Network for Surface Defect Detection. IEEE Trans. Instrum. Meas. 2023, 72, 2525014.
11. Singh, S.; Kumar, A.; Desai, K. Comparative assessment of common pre-trained CNNs for vision-based surface defect detection of machined components. Expert Syst. Appl. 2023, 218, 119623.
12. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. An Algorithm for Detecting Surface Defects in Industrial Strip Steel based on Receptive Field and Feature Information Supplementation. In Proceedings of the 2024 27th International Conference on Computer Supported Cooperative Work in Design, Tianjin, China, 8–10 May 2024; pp. 3078–3085.
13. Tammina, S. Transfer learning using VGG-16 with Deep Convolutional Neural Network for Classifying Images. Int. J. Sci. Res. Publ. 2019, 9, 143–150.
14. Yang, H.; Ni, J.; Gao, J.; Han, Z.; Luan, T. A novel method for peanut variety identification and classification by Improved VGG16. Sci. Rep. 2021, 11, 15756.
15. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
16. Zhang, H.; Hu, W.; Wang, X. ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer. In Proceedings of the ECCV, Tel Aviv, Israel, 23–27 October 2022.
17. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
18. Şeker, A. Evaluation of fabric defect detection based on transfer learning with pre-trained AlexNet. In Proceedings of the 2018 International Conference on Artificial Intelligence and Data Processing (IDAP), Malatya, Turkey, 28–30 September 2018; pp. 1–4.
19. Xue, Y.; Wang, L.; Zhang, Y.; Shen, Q. Defect detection method of apples based on GoogLeNet deep transfer learning. Nongye Jixie Xuebao/Trans. Chin. Soc. Agric. Mach. 2020, 51.
20. Sabeenian, R.; Paul, E.; Prakash, C. Fabric defect detection and classification using modified VGG network. J. Text. Inst. 2023, 114, 1032–1040.
21. Gao, M.; Song, P.; Wang, F.; Liu, J.; Mandelis, A.; Qi, D. A novel deep convolutional neural network based on ResNet-18 and transfer learning for detection of wood knot defects. J. Sens. 2021, 1–16.
22. Tridgell, S.; Kumm, M.; Hardieck, M.; Boland, D.; Moss, D.; Zipf, P.; Leong, P. Unrolling ternary neural networks. ACM Trans. Reconfigurable Technol. Syst. (TRETS) 2019, 12, 1–23.
23. Sengupta, A.; Ye, Y.; Wang, R.; Liu, C.; Roy, K. Going deeper in spiking neural networks: VGG and residual architectures. Front. Neurosci. 2019, 13, 95.
24. Wang, S.; Fernandes, S.; Zhu, Z.; Zhang, Y. AVNC: Attention-based VGG-style network for COVID-19 diagnosis by CBAM. IEEE Sens. J. 2021, 22, 17431–17438.
25. Targ, S.; Almeida, D.; Lyman, K. Resnet in resnet: Generalizing residual architectures. arXiv 2016, arXiv:1603.08029.
26. Chen, C.; Azman, A. Improved Deep Learning Model for Workpieces of Rectangular Pipeline Surface Defect Detection. Computers 2024, 13, 30.
27. Dwivedi, D.; Babu, K.; Yemula, P.; Chakraborty, P.; Pal, M. Identification of surface defects on solar pv panels and wind turbine blades using attention based deep learning model. Eng. Appl. Artif. Intell. 2024, 131, 107836.
28. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
29. Dučinskas, K.; Karaliutė, M.; Šaltytė Vaisiauskė, L. Spatially Weighted Bayesian Classification of Spatio-Temporal Areal Data Based on Gaussian-Hidden Markov Models. Mathematics 2023, 11, 347.

Figure 1. Illustration of VGG-16 architecture.

Figure 2. Illustration of PaE-VGG architecture.

Figure 3. Illustration of the position-aware circular convolution.

Figure 4. Illustration of global circular convolution on PaCC-H.

Figure 5. Illustration of efficient channel attention.

Figure 6. Surface images of barrel media: normal and defective samples.

Figure 7. Comparison of the accuracy of each trained model.

Figure 8. A confusion matrix of the PaE-VGG model.

Figure 9. Bar plot for all classes.

Figure 10. Comparison of accuracy with other models.

Figure 11. Confusion matrices of several models.

Table 1. Description of the barrel media image dataset.

Type of Image	Number of Images
Normal	2037
Concave	417
Dark spots	1473
Sintering marks	93
Damage	1638
Abnormal shape	183
Adhesion	774
Total	6615
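
The class counts in Table 1 are far from uniform, so it can help to view each class as a share of the whole dataset. The following is a minimal sketch that only re-derives numbers from Table 1; the variable names are illustrative and are not taken from the paper.

```python
# Class counts copied from Table 1; the names used here are illustrative only.
counts = {
    "Normal": 2037,
    "Concave": 417,
    "Dark spots": 1473,
    "Sintering marks": 93,
    "Damage": 1638,
    "Abnormal shape": 183,
    "Adhesion": 774,
}

total = sum(counts.values())  # should match the "Total" row of Table 1

# Share of each class in the dataset, as a percentage.
shares = {name: 100.0 * n / total for name, n in counts.items()}

# Ratio between the largest and smallest class, a rough imbalance measure.
imbalance_ratio = max(counts.values()) / min(counts.values())

# Print classes from largest to smallest share.
for name, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {counts[name]:5d}  {pct:5.1f}%")
print(f"total = {total}, largest/smallest = {imbalance_ratio:.1f}")
```

The smallest class (sintering marks) accounts for roughly 1.4% of the images, which is worth keeping in mind when reading the per-class results in the confusion matrices.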

Table 2. Hyperparameters for PaE-VGG model.

Hyperparameters	PaE-VGG Model
Batch Size	32
Number of Epochs	70
Optimizer	Adam
Initial Learning Rate	0.001
Loss Function	Cross Entropy
Dropout	0.25
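
One way to keep the settings of Table 2 in a single place is a small immutable configuration object. This is a sketch under stated assumptions: the class name `TrainConfig` and its field names are hypothetical, and only the values come from Table 2.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Values from Table 2; the class and field names are hypothetical."""
    batch_size: int = 32
    num_epochs: int = 70
    optimizer: str = "Adam"
    initial_lr: float = 1e-3
    loss: str = "cross_entropy"
    dropout: float = 0.25


cfg = TrainConfig()

# Dump the configuration, e.g. for logging alongside a training run.
print(asdict(cfg))
```

If the model were trained with PyTorch (an assumption; this section does not name a framework), these fields would map naturally onto `torch.optim.Adam(model.parameters(), lr=cfg.initial_lr)` and `torch.nn.CrossEntropyLoss()`.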

Table 3. Comparison of different configurations.

Configuration/Variant	ECA	ParC	Accuracy (%)
VGG16_BN	No	No	89.61
ECA-VGG	Yes	No	91.88
ParC-VGG	No	Yes	93.41
PaE-VGG	Yes	Yes	94.37
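
The ablation in Table 3 is easiest to read as gains over the VGG16_BN baseline. The sketch below only performs that subtraction on the reported accuracies; the variable names are illustrative.

```python
# Accuracies (%) copied from Table 3; variable names are illustrative only.
accuracy = {
    "VGG16_BN": 89.61,
    "ECA-VGG": 91.88,
    "ParC-VGG": 93.41,
    "PaE-VGG": 94.37,
}

baseline = accuracy["VGG16_BN"]

# Gain of each variant over the baseline, in percentage points.
gains = {name: round(acc - baseline, 2) for name, acc in accuracy.items()}

for name, gain in gains.items():
    print(f"{name:9s} {accuracy[name]:6.2f}  +{gain:.2f}")
```

The separate gains from ECA (2.27 points) and ParC (3.80 points) add up to more than the 4.76 points obtained with both, so on this dataset the two modules' contributions do not appear to be simply additive.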

Table 4. The accuracy comparison of defect detection results using different methods.

Models	Accuracy (%)
AlexNet [18]	90.12
GoogLeNet [19]	92.86
ResNet18 [21]	93.60
PaE-VGG	94.37

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
