1. Introduction
In the context of the “Dual Carbon” goals, renewable energy is becoming an increasingly dominant component of the energy mix. Natural gas, as a clean and efficient fossil fuel, is essential for improving the energy mix and supporting the shift to cleaner energy sources. Pipeline transportation, one of the five major transportation modes, is the primary means of natural gas delivery [1,2,3]. Among the various pipeline materials, PE pipelines offer advantages such as cost-effectiveness, wear and corrosion resistance, and long service life. As a result, they are gradually replacing traditional steel pipelines and are widely used in the construction of gas pipeline networks [4]. PE pipelines are typically buried underground and are subject to uncontrollable factors such as changes in underground temperature, foundation settlement, termite infestation, and stress effects. Under these conditions, defects such as structural aging, deformation, cracks, and dislocation at pipe joints may develop, which can lead to serious incidents such as gas leaks, fires, and explosions, posing a significant threat to public safety [5,6,7,8]. Therefore, regular inspection and maintenance of gas polyethylene pipelines are essential to mitigate safety risks.
In gas pipeline defect detection, conventional non-destructive testing methods include ultrasonic testing, X-ray testing, infrared thermal imaging, and machine vision [9]. Owing to factors such as pipe material and inspection cost, ultrasonic testing, X-ray testing, and infrared thermal imaging are predominantly used for inspecting metal pipes [10,11,12]. The machine vision method, by contrast, is fast, reduces inspection cost and workload, is unaffected by pipe material, and can comprehensively inspect the pipe's internal surface for structural deformations and other defects [13,14].
In the field of machine vision, deep learning has been extensively applied to tasks such as image classification and object detection. However, data are crucial for training models, and the high cost and privacy concerns associated with collecting images of defects in gas polyethylene pipes make such data difficult to obtain, while the lack of public datasets leaves researchers with little to reference. This further complicates the training of deep learning models. Therefore, using data augmentation to increase the number of defect samples is important for improving the accuracy of internal defect detection in gas polyethylene pipelines. Data augmentation offers an effective solution to the problem of limited samples by processing and expanding the existing data, enabling a limited dataset to provide value comparable to that of a larger one and thereby enhancing the model's learning ability. Data augmentation methods are primarily categorized into two types: supervised and unsupervised [15].
Common supervised data augmentation methods include geometric transformations, color transformations, CutMix, Mixup, Mosaic, and SMOTE [16]. Geometric transformations involve operations such as rotation, scaling, translation, flipping, and cropping. Color transformations modify the color attributes of samples, including brightness, contrast, saturation, and hue, as well as noise addition. CutMix generates a new image by combining two images: it cuts both images at a specific ratio and exchanges the corresponding regions [17]. Mixup creates a new sample by linearly blending two images at the pixel level and assigning class labels proportionally [18]. The Mosaic method, similar to CutMix, simulates complex scenarios by combining multiple images in a defined layout to generate a new image [19]. The SMOTE method addresses class imbalance by synthesizing new samples for the minority class, thereby improving a classification model's ability to recognize that class [20]. Although these methods can effectively expand a dataset, they have certain limitations: they do not substantially alter the target features, so the image information added to the dataset is limited, which results in inadequate generalization ability of the model.
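As a concrete illustration of the mixing operation used by Mixup [18], a new training pair is formed from two labeled samples $(x_i, y_i)$ and $(x_j, y_j)$ by a convex combination:

$$\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda)\, y_j, \qquad \lambda \sim \mathrm{Beta}(\alpha, \alpha),$$

where $\alpha$ controls the strength of the interpolation.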
Unsupervised data augmentation can substantially increase data diversity by learning the feature distribution of the data and transforming and expanding the original data to generate new samples. The main unsupervised data augmentation methods currently include reinforcement learning-based augmentation and GANs. Reinforcement learning-based data augmentation is represented by AutoAugment, an automated augmentation method proposed by Google (Mountain View, CA, USA) in 2018 [21]. However, reinforcement learning is a complex training paradigm that is slow to stabilize and converge, requiring significant computational resources and time. GANs learn the distribution of the data through adversarial training between a generator and a discriminator, and they are widely used in image generation tasks because of their ability to produce high-quality, realistic samples [22,23]. Given that the performance of the generator is typically weaker than that of the discriminator, Tan et al. [24] modified the generator of DCGAN by incorporating a two-branch structure, which improves the training stability of the generator network while keeping the output image's size and channel count unchanged. However, this improvement significantly increases the complexity of the model structure, affecting training cost and efficiency. Min et al. [25] improved the diversity of generated rail defect images by incorporating both the SAM and the CAM into DCGAN, but the quality of the generated images was low. Dewi et al. [26] applied DCGAN to augment a traffic sign dataset, combining original and generated images for training, which resulted in a detection accuracy of 92%; however, the diversity and quality of the generated images were limited. Woldesellasse et al. [27] utilized CGAN to generate new samples to address the class imbalance in a corrosion dataset of oil and gas metallic pipelines; after training on the CGAN-augmented dataset, the test accuracy of an artificial neural network model improved by 9%. However, like other GAN models, CGAN is susceptible to mode collapse.
Despite the widespread use of GANs and their variants in data augmentation, with notable successes, there is still room for improvement in preventing mode collapse and increasing the diversity of generated images [28]. To address these issues, an improved DCGAN is proposed to generate defect images of gas polyethylene pipelines. The key improvements are as follows:
Minibatch Discrimination, a Self-Attention Mechanism, and Spectral Normalization are integrated into the DCGAN model to address the limitations of the original DCGAN, namely training instability, mode collapse, and low-quality, less diverse generated samples. These enhancements also help alleviate the scarcity of samples in machine vision tasks.
The network training is fine-tuned using the Two-Timescale Update Rule (TTUR) to ensure stability during the early stages of model training.
A comprehensive evaluation was conducted to verify the validity of the generated defect images of gas PE pipes. The improved DCGAN notably improves classification accuracy, particularly for small sample sizes.
The rest of this paper is structured as follows.
Section 2 introduces the relevant theories of the applied methods in this paper.
Section 3 proposes the improved DCGAN network and specifies the algorithmic flow and the improved network structure.
Section 4 demonstrates the superior performance of the improved DCGAN through a series of experiments. Finally,
Section 5 provides a summary of the key findings and an outlook on future research.
3. Method
The flow of the improved DCGAN (Algorithm 1) is presented below.
Algorithm 1 Improved DCGAN Algorithm
Require: Training dataset, batch size m, noise dimension, total epochs E, generator G, discriminator D, optimizer
Ensure: Trained generator and discriminator
1: Initialize weights for G and D
2: Set the Adam optimizer for G with its initial learning rate
3: Set the Adam optimizer for D with its initial learning rate
4: Define the label smoothing parameter
5: Define the learning rate schedulers
6: for epoch ← 1 to E do
7:     Shuffle the training data and create mini-batches
8:     for each mini-batch do
9:         Discriminator training:
10:            Generate noise vectors
11:            Generate fake images
12:            Compute the real image loss ▹ Label smoothing
13:            Compute the fake image loss ▹ Label smoothing
14:            Backpropagate and update D
15:        Generator training:
16:            Generate new noise
17:            Generate refined images
18:            Compute the generator loss
19:            Backpropagate and update G
20:        Apply improvements:
21:            Insert self-attention after the deconvolution layers in G ▹ Self-Attention
22:            Apply minibatch discrimination before D's output ▹ Minibatch Discrimination
23:            Update the learning rate of G
24:            Update the learning rate of D
25:    end for
26:    if epoch mod save_interval = 0 then
27:        Visualize intermediate generated samples
28:        Save the model checkpoints
29:    end if
30: end for
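For concreteness, the sketch below shows how one training iteration of Algorithm 1 could be implemented in PyTorch. The generator and discriminator here are trivial stand-ins for the architectures in Tables 1 and 2, and the learning rates, label-smoothing value, and 64 × 64 image size are illustrative assumptions rather than values taken from the paper.

```python
import torch
import torch.nn as nn

# Trivial stand-ins for G and D; the actual architectures are given in Tables 1 and 2.
G = nn.Sequential(nn.Linear(100, 3 * 64 * 64), nn.Tanh())
D = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1), nn.Sigmoid())

# TTUR: separate Adam optimizers with different learning rates for G and D
# (values here are placeholders, as is the label-smoothing factor below).
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=3e-4, betas=(0.5, 0.999))
bce = nn.BCELoss()
smooth = 0.9  # smoothed target for real labels


def train_step(real_imgs):
    """One mini-batch update of the discriminator followed by one update of the generator."""
    m = real_imgs.size(0)

    # ---- Discriminator training ----
    z = torch.randn(m, 100)
    fake_imgs = G(z).detach()  # detach so the generator is not updated here
    loss_real = bce(D(real_imgs), torch.full((m, 1), smooth))  # label smoothing on real labels
    loss_fake = bce(D(fake_imgs), torch.zeros(m, 1))
    loss_d = loss_real + loss_fake
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # ---- Generator training ----
    z = torch.randn(m, 100)
    loss_g = bce(D(G(z)), torch.ones(m, 1))  # G tries to make D output "real"
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```

In the full model, G and D would be replaced by the improved architectures with self-attention, spectral normalization, and minibatch discrimination, and the learning rate schedulers would be stepped after each epoch.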
The structure of the improved DCGAN is depicted in Figure 5. Incorporating the MD network after the discriminator's final convolutional layer enhances its ability to differentiate between generated samples, mitigates mode collapse, and promotes the diversity of generated samples. Applying SN to each convolutional layer of the discriminator constrains the spectral norm of the weight matrices, reduces the gradient explosion and vanishing problems, and makes the discriminator more accurate and stable in distinguishing between real and generated images. It is worth noting that introducing SN does not eliminate the need for BN in the model: SN constrains the weights of each layer, while BN normalizes the activations of each layer. Introducing the SAM into the generator helps it better understand global–local relationships and capture long-range dependencies between different regions of the image, enabling the generator to learn the structure and content of the image more accurately and thus generate more realistic images.
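A minimal sketch of one such discriminator block is shown below, assuming a stride-2 DCGAN-style convolution; the 4 × 4 kernel and padding are assumptions, not values taken from Table 1. `spectral_norm` constrains the convolution's weight matrix, while `BatchNorm2d` normalizes the activations, so the two operate on different quantities and can be used together.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm


def disc_block(in_ch, out_ch):
    """One discriminator block combining SN (on the weights) and BN (on the activations)."""
    return nn.Sequential(
        spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1)),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )
```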
3.1. Discriminator Networks
The detailed structure of the improved discriminator network is presented in
Table 1, with its number of convolutional layers matching that of the original DCGAN.
The network takes an RGB image as input and uses a convolution kernel with a stride of 2 and a Leaky ReLU activation function with a slope of 0.2. Both BN and SN are introduced in the convolutional layers to make training more stable. After four convolutional layers, a (4, 4, 1024) tensor is obtained. MD is then introduced after the last convolutional layer: the tensor it computes represents, for each sample in the current batch, the exponential sum of its differences from the other samples; this tensor is expanded in the last dimension to match the shape of the input information and is concatenated with it, so the final result is a 32,768-dimensional vector.
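The sketch below shows one plausible PyTorch implementation of such a minibatch discrimination layer (in the style of Salimans et al.); the projection sizes `out_features` and `kernel_dims` are illustrative hyperparameters, not values reported in the paper.

```python
import torch
import torch.nn as nn


class MinibatchDiscrimination(nn.Module):
    """Appends cross-sample similarity statistics to the flattened discriminator features.

    in_features:  flattened feature size (e.g., 4 * 4 * 1024 = 16,384 here)
    out_features: number of similarity statistics appended per sample
    kernel_dims:  dimensionality of each learned projection
    """

    def __init__(self, in_features, out_features, kernel_dims):
        super().__init__()
        self.T = nn.Parameter(torch.randn(in_features, out_features, kernel_dims) * 0.1)

    def forward(self, x):                                   # x: (N, in_features)
        M = x @ self.T.flatten(1)                           # (N, out_features * kernel_dims)
        M = M.view(-1, self.T.size(1), self.T.size(2))      # (N, B, C)
        diff = M.unsqueeze(0) - M.unsqueeze(1)              # pairwise differences: (N, N, B, C)
        l1 = diff.abs().sum(dim=3)                          # L1 distance per projection: (N, N, B)
        o = torch.exp(-l1).sum(dim=1)                       # exponential sum over the batch: (N, B)
        return torch.cat([x, o], dim=1)                     # splice statistics onto the input features
```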
3.2. Generator Networks
Table 2 presents the detailed architecture of the improved generator network, which consists of four transposed convolutional layers and three SA modules.
The generator first converts the 100-dimensional noise signal into a (4, 4, 1024) three-dimensional tensor by means of a fully connected layer and then performs layer-by-layer transposed convolution, with each layer corresponding to the dimensions in the discriminator; the SAM is added after the transposed convolutional layers, and the network ultimately outputs an image. A Tanh function is applied in the output layer, while ReLU activation functions are used in the other convolutional layers. The Tanh function compresses output values into the range (−1, 1), aligning with the standard format for image data preprocessing; this allows the pixel values of the generated image to be easily mapped to the desired range via a simple linear transformation. When incorporating the SAM in the generator, the extracted feature maps are converted into Query, Key, and Value representations by three convolutional layers with a kernel size of 1. Key and Query are used together to compute the attention weights, and the Values are weighted and combined according to these attention weights to obtain a richer and more expressive feature representation.
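The following SAGAN-style self-attention block is a sketch consistent with this description; the channel reduction factor of 8 for the Query and Key projections is a common choice and an assumption here, not a value stated in the paper.

```python
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    """Self-attention block: Query/Key/Value from 1x1 convolutions, plus a residual connection."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))            # learnable weight of the attention branch

    def forward(self, x):
        n, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)          # (N, HW, C/8)
        k = self.key(x).flatten(2)                            # (N, C/8, HW)
        attn = torch.softmax(q @ k, dim=-1)                   # attention weights over positions: (N, HW, HW)
        v = self.value(x).flatten(2)                          # (N, C, HW)
        out = (v @ attn.transpose(1, 2)).view(n, c, h, w)     # weighted combination of the Values
        return self.gamma * out + x                           # residual connection to the input features
```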
4. Results
The experimental setup used in this paper consists of an Intel(R) Core(TM) i9-13950HX 2.20 GHz CPU (Intel Corporation, Santa Clara, CA, USA), an NVIDIA RTX 3500 Ada Generation Laptop GPU (NVIDIA Corporation, Santa Clara, CA, USA), and a PyTorch (Python 3.9, CUDA 12.1, TorchVision 0.17.1, TorchAudio 2.2.1) environment. Since the generator's performance is generally weaker than the discriminator's, this imbalance may lead to mode collapse in DCGAN during the early training phase. Therefore, the single learning rate of DCGAN was replaced with the TTUR, which assigns separate learning rates to the discriminator and generator to ensure early training stability. The experiments used the Adam optimizer, with the generator's learning rate set to 2 and the discriminator's set to 3, for a total of 3000 training epochs.
Table 3 presents a comparison of the model’s complexity before and after the improvement. The results show that the total number of parameters in the generator and discriminator has increased only slightly, suggesting that the optimization of the model structure has not introduced significant parameter redundancy. While the memory footprint for the generator’s forward/backward propagation has increased, this trade-off is essential for enhancing the quality of generated outputs.
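Parameter totals such as those compared in Table 3 can be obtained with a short helper; this is a generic sketch rather than the authors' script, and the module passed in below is only a toy example.

```python
import torch.nn as nn


def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Example with a toy module; in practice the improved G and D would be passed in.
print(count_parameters(nn.Linear(100, 4 * 4 * 1024)))
```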
4.1. Preprocessing of Data Sets
The experiments in this study were conducted on a gas PE pipeline defect dataset consisting of 150 defect samples categorized as cracks, fractures, and holes. During the training of a GAN, a small dataset can lead to overfitting, resulting in generated images that lack diversity and realism. To avoid this problem, the dataset was first pre-augmented: geometric transformations such as mirroring, rotation, and translation were applied, along with noise addition, expanding the dataset to 1000 defect images. To facilitate comparison of the results, all images in the dataset were uniformly resized to the same dimensions, and Figure 6 shows some images from the pipeline defect dataset.
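A possible pre-augmentation pipeline covering the mirroring, rotation, translation, and noise-addition operations mentioned above is sketched below using TorchVision; all parameter values are placeholders, since the exact settings are not reported.

```python
import torch
from torchvision import transforms

pre_augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                    # mirroring
    transforms.RandomRotation(degrees=15),                     # rotation
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # translation
    transforms.ToTensor(),
    transforms.Lambda(lambda x: (x + 0.02 * torch.randn_like(x)).clamp(0.0, 1.0)),  # mild Gaussian noise
])
```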
4.2. Visual Comparisons
Generated Image Comparison
Firstly, the improved DCGAN generation results are compared with the real defective images, as illustrated in
Figure 7.
It can be seen that the images generated by the improved DCGAN have clear contours and closely resemble the real images, indicating superior generation quality.
Figure 8 illustrates the images generated by each sub-model and the improved DCGAN to further validate the effectiveness of the algorithm improvements.
The comparison reveals that the original DCGAN suffered from mode collapse and generated images with low clarity. In contrast, the TTUR model demonstrated improved stability compared with the single learning rate used in the original DCGAN, effectively mitigating the mode collapse issue. Building on this, the SA-TTUR and SN-TTUR models further enhanced generator performance, although some noise remains. MD-TTUR improves the diversity of the generated images compared with the other sub-models. The improved DCGAN model generates images that better match the features of real images, with clarity close to that of real images, and produces shapes that are not present in the original images; both diversity and training stability are significantly improved.
4.3. Convergence Analysis
The ability of TTUR and SN modules to enhance training stability requires further validation. To assess the effectiveness of these improvements, we compared the generator and discriminator losses of the Original DCGAN, TTUR, and SN-TTUR. The resulting training losses are presented in
Figure 9. The figure demonstrates that, as the number of training epochs increases, the generator loss of the original DCGAN continues to rise while the discriminator loss approaches zero, indicating that the generator's performance is weaker than that of the discriminator. In contrast, with TTUR the generator loss decreases more smoothly and is significantly lower than that of the original DCGAN. The SN-TTUR generator loss is reduced further and its fluctuations stabilize, indicating that combining these two modules enhances the training stability of the DCGAN model, with their simultaneous application yielding the most favorable results.
4.4. Quantitative Evaluation of Generated Images
4.4.1. Image Evaluation Indicators
Since subjective qualitative assessment alone is sometimes insufficient to evaluate image quality, quantitative evaluation methods are also needed for a more comprehensive analysis. First, two image evaluation metrics, FID and SSIM, are used to compare the generated and real images for each defect type, and the average value is then taken to measure the clarity and variety of the generated images. FID evaluates image quality and diversity and is often used to compare the performance of GANs and other generative models; it measures the feature difference between generated and real samples, and a lower FID score indicates better image quality and diversity, with the ideal value being 0. SSIM is a metric that compares the luminance, contrast, and structure of two images. Unlike traditional pixel-difference measures, SSIM aligns more closely with human visual perception. SSIM takes values between 0 and 1, and values closer to 1 indicate better generated image quality.
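For reference, the standard definitions of these two metrics are given below, where $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of the Inception features of real and generated images, and $C_1, C_2$ are small constants that stabilize the division:

$$\mathrm{FID} = \left\lVert \mu_r - \mu_g \right\rVert_2^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)$$

$$\mathrm{SSIM}(x, y) = \frac{\left(2\mu_x \mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}{\left(\mu_x^2 + \mu_y^2 + C_1\right)\left(\sigma_x^2 + \sigma_y^2 + C_2\right)}$$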
Table 4 presents the FID and SSIM evaluation scores for each model. The results in
Table 4 indicate that the improved DCGAN model achieves superior performance across both metrics.
The FID and SSIM scores of SA-TTUR, SN-TTUR, and MD-TTUR each have their own strengths: because FID partly reflects the diversity of the generated images, SA-TTUR and MD-TTUR perform better in terms of diversity, while SN-TTUR produces better image quality. This also shows that the improved DCGAN model makes full use of the capabilities of each module to enhance overall performance.
4.4.2. Classified Evaluation Indicators
In addition to the image evaluation metrics, classification metrics offer an alternative quantitative perspective for assessing the enhancements introduced by the improved DCGAN model. The VGGNet, AlexNet, and ResNet classifiers were each trained on the three data augmentation methods (T1: traditional data augmentation; T2: improved DCGAN; T3: traditional data augmentation + improved DCGAN), and
Table 5 presents the training results for the three classifiers.
The T1 dataset exhibited the lowest performance, with accuracies of 78.23%, 72.90%, and 71.14% for the three classifiers. The corresponding G-mean values were 53.81%, 55.62%, and 66.25%, and the F-scores were 70.12%, 66.86%, and 68.80%. In comparison, the T2 dataset improved accuracy by 3.03%, 4.09%, and 3.05% over T1, G-mean by 14.29%, 12.78%, and 1.64%, and F-score by 5.8%, 5.55%, and 2.14%. Notably, the T3 dataset achieved the highest accuracy across all three classifiers, with increases of 2.07%, 6.09%, and 2.63% over T2, reaching 83.33%, 83.08%, and 76.82%, respectively. G-mean improved by 8.71%, 5.79%, and 5.74% over T2, reaching 76.81%, 74.19%, and 73.63%, respectively, and F-score increased by 4.42%, 5.74%, and 3.59% over T2, reaching 80.34%, 78.15%, and 74.53%, respectively. Bar charts are plotted according to
Table 5 for a more intuitive view of the performance of the improved DCGAN, as shown in
Figure 10.
As shown in the figure, after data augmentation using the improved DCGAN, all metrics improved significantly, with T3 achieving the highest accuracy. This demonstrates that the hybrid dataset substantially enhances classifier performance, further validating the effectiveness of the generated gas PE pipe defect images.
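As a hedged illustration of how the classification metrics above can be computed, the sketch below uses scikit-learn; the paper does not state the averaging scheme, so macro-averaged F-score and a G-mean taken as the geometric mean of per-class recalls are assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score


def evaluate(y_true, y_pred):
    """Accuracy, G-mean (geometric mean of per-class recalls), and macro F-score."""
    acc = accuracy_score(y_true, y_pred)
    recalls = recall_score(y_true, y_pred, average=None)   # recall for each defect class
    g_mean = float(np.prod(recalls) ** (1.0 / len(recalls)))
    f_score = f1_score(y_true, y_pred, average="macro")
    return acc, g_mean, f_score
```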
4.4.3. Effect of the Number of Expanded Images on Accuracy
To determine the impact of the number of expanded images on classification accuracy, the T3 data augmentation method, which performed best in
Section 4.4.2, was adopted to progressively expand the original dataset; that is, images generated by the improved DCGAN and images produced by traditional data augmentation were added to the original images simultaneously. The number of augmented images was gradually increased in steps of 100, 300, 500, 700, 1000, 1500, 2000, 2500, 3000, and 3500 to determine the optimal augmentation level.
Figure 11 shows the experimental results.
With the gradual increase in the number of expanded images, the accuracy of each classifier improved to varying extents. The accuracy of ResNet reached its highest value of 78.79% when the number of expanded images was 2000. When 2500 images were added, VGGNet and AlexNet achieved accuracies of 94.79% and 87.1%, respectively. As the number of added images continued to increase, the accuracy of each classifier no longer improved and instead decreased. This decline can be attributed to the limited size of the original dataset, which constrains the diversity of features the improved DCGAN can learn; as the proportion of generated data in the classifier's training set increases, overfitting occurs, resulting in a decline in accuracy.