Article

Research on Intelligent Recognition Method of Ground Penetrating Radar Images Based on SAHI

Key Laboratory of Earth Exploration and Information Techniques of Ministry of Education, Chengdu University of Technology, Chengdu 610059, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(18), 8470; https://doi.org/10.3390/app14188470
Submission received: 30 July 2024 / Revised: 12 September 2024 / Accepted: 18 September 2024 / Published: 20 September 2024
(This article belongs to the Special Issue Ground Penetrating Radar (GPR): Theory, Methods and Applications)

Abstract
Deep learning techniques have flourished in recent years and have shown great potential in ground-penetrating radar (GPR) data interpretation. However, obtaining sufficient training data is a great challenge. This paper proposes an intelligent recognition method for GPR images based on slicing-aided hyper inference (SAHI). First, to address the insufficient number of GPR images with structural loose distresses, data augmentation is carried out based on a deep convolutional generative adversarial network (DCGAN). Since distress features occupy few pixels in the original images, the original images are cropped around the distress labeling boxes before being input into the model for training, allowing the model to pay greater attention to the distress features. The YOLOv5 model is then used for distress detection, with the SAHI framework applied in both the training and inference stages. The experimental results show that detection accuracy improves by 5.3% after adding the DCGAN-generated images, verifying their effectiveness. Detection accuracy improves by 10.8% after using the SAHI framework in the training and inference stages, indicating that SAHI is a key part of improving detection performance, as it significantly improves the ability to recognize distress.

1. Introduction

Ground-penetrating radar (GPR) is a non-destructive detection method for the subsurface environment widely used in the fields of tunnel health conditions [1], underground pipelines [2], and highway structural distress detection [3]. The use of GPR in highway structural distress detection generates a huge amount of data. Even experienced practitioners spend significant time and effort interpreting GPR data. Manual interpretation is costly, inefficient, and lacks standardized reading criteria, making accuracy hard to guarantee, and, thus, greatly limiting its application in the field [4].
Deep learning [5] is a subclass of machine learning (ML) that uses techniques such as convolutional neural networks (CNNs) to learn features from images. CNNs, with multiple hidden layers, leverage learning from previous layers during model training [6], exhibiting a strong capability in recognizing computer vision features and high robustness in detection tasks. In recent years, deep learning technology has developed vigorously, and many scholars have applied it to the interpretation of GPR data. Li et al. [7] used the YOLO deep learning model to detect hidden cracks in GPR images, realizing the automatic identification and localization of hidden cracks in asphalt pavement; three versions (YOLOv3, YOLOv4, and YOLOv5) were trained and compared, and YOLOv4 was concluded to be the most balanced deep learning model in terms of speed and actual performance in detecting hidden cracks. Qiu et al. [8] reduced false and missed detections in real-time GPR detection by improving the YOLOv5 network structure, adding an attention mechanism and data augmentation, and establishing a regression equation from the GPR position information, which realized accurate location of foreign objects in underground soil. Liu et al. [9] denoised original GPR road detection images and then tested the accuracy of the original and denoised images using YOLOv3; the final detection accuracy improved by 30%. Liu et al. [10] established a dataset of asphalt pavement distresses including settlement, hidden cracks, and loosening, and proposed a two-way object detection model for targets of different scales based on the distribution of hidden distresses in GPR image data; after the components of the two-way model were optimized over the base model, the final model's average precision (AP) for small targets improved by 17.9%, and the combined index AP (0.5:0.95) improved by 9.9%. Gao et al. [11] proposed a Faster R-ConvNets object detection model enabling intelligent detection of water damage, cracks, and uneven settlement damage in GPR images, with model precision and recall reaching 0.89.
However, training a reliable deep learning model requires a large amount of GPR data labeled with underground targets, which is often difficult to obtain due to the high cost of data collection and field validation. Sample scarcity often leads to low recognition accuracy during model training. Therefore, data augmentation of GPR data remains an urgent problem. Generative adversarial networks (GANs) [12] are widely used to enrich the number and diversity of datasets, providing a new avenue for GPR data augmentation. Yue et al. [13] proposed an improved least squares generative adversarial network (LSGAN) model, which can generate high-precision GPR data and alleviate the scarcity of labeled GPR data. Zhao et al. [14] proposed WAEGAN, a GAN-based data augmentation method for GPR data, and verified that it is effective in simultaneously generating multiple target classes and producing realistic GPR data.
Slicing-aided hyper inference (SAHI) is a recently developed technique for small object detection in high-resolution images that can be integrated with various object detection methods. By automatically slicing the original image into overlapping patches during inference, it significantly improves small object detection capabilities [15]. Wang et al. [16] proposed an improved object detection algorithm, YOLOX_w, based on YOLOX-X, developed for UAV aerial images with complex backgrounds and many small targets. They preprocessed images with SAHI, slicing them according to set overlap rates so that small targets occupy larger pixel areas in the slices; combining SAHI with data augmentation strategies effectively enhances small object detection performance in UAV aerial images. Duan et al. [17] proposed a deep learning-based automatic detection method for submarine pipelines and introduced SAHI to solve the challenge of detecting small targets with inconspicuous features in large, low-resolution images. Muzammul et al. [18] combined the real-time detection and recognition model RT-DETR-X with the SAHI methodology on the VisDrone-DET dataset, significantly improving detection accuracy, especially for small targets. To date, SAHI has not been applied to GPR image recognition. In a GPR image, a structural loose distress occupies only a small portion of the pixels. With SAHI, the original image is sliced according to a set size and overlap rate, so the structural loose distress occupies more pixels in a slice and is more visible than in the original image, which largely avoids missed detections.
In this paper, we first expand the GPR images of structural loose distresses based on DCGAN. We then use YOLOv5 to detect the structural loose distresses in the GPR images. In the training stage, SAHI is used for data preprocessing, slicing the original images to the model's input size so that target details are not lost to compression; this allows the model to better learn distress features. In the inference stage, detection is combined with SAHI. Experiments verify that adding DCGAN-generated images and using SAHI in the training and inference stages significantly improve the recognition accuracy of structural loose distress.

2. Methodology

2.1. DCGAN for Data Augmentation

2.1.1. GPR Image Dataset

The measured GPR images are acquired on a highway in China using a vehicle-mounted, air-coupled MALA GX750 GPR instrument with a center frequency of 750 MHz, manufactured by GuidelineGEO (Stockholm, Sweden), with the acquisition vehicle traveling at 55 km/h. The processed images are then labeled with the open-source dataset annotation tool labelImg to mark the structural loose distresses, producing a sample dataset. Finally, 624 GPR images containing structural loose distresses are obtained. One image may contain multiple distresses, so the 624 images contain a total of 670 structural loose distresses.
The object detection task typically requires a large amount of labeled data to train a model, and the 624 images containing structural loose distresses are insufficient, so this paper uses DCGAN to expand the dataset.

2.1.2. GAN

GAN is a framework proposed by Goodfellow in 2014 consisting of two simultaneously trained networks: the generator and the discriminator (Figure 1) [12]. The generator produces fake samples from the provided noise variables. The discriminator is responsible for distinguishing whether an input sample is real or fake. The goal of the generator is to produce samples indistinguishable from real ones, while the goal of the discriminator is to reliably separate real from fake samples. The competition between the generator and the discriminator drives both models to improve during training until an equilibrium state is reached. The loss function of GAN is defined as follows:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim P_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
where $x$ is the real data with distribution $P_{\mathrm{data}}(x)$, $z$ is random noise with distribution $P_z(z)$, and $G(z)$ is the fake data generated by the generator.
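For concreteness, the following is a minimal PyTorch sketch of one alternating training step under this minimax loss. It assumes `G` and `D` are generator and discriminator modules and that `D` ends in a sigmoid, so its output is a probability of shape (batch, 1); the function name and hyperparameters are illustrative, not the paper's exact code.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def gan_train_step(G, D, opt_G, opt_D, real, noise_dim=128):
    """One alternating update of D and G under the minimax loss above."""
    n = real.size(0)
    ones = torch.ones(n, 1, device=real.device)    # labels for real samples
    zeros = torch.zeros(n, 1, device=real.device)  # labels for fake samples

    # Discriminator step: maximize log D(x) + log(1 - D(G(z)))
    z = torch.randn(n, noise_dim, 1, 1, device=real.device)
    fake = G(z)
    loss_D = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator step (non-saturating form): maximize log D(G(z))
    loss_G = bce(D(fake), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```

In the setup described in Section 3.1.2, `opt_G` and `opt_D` would be Adam optimizers with a learning rate of 0.0001.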

2.1.3. DCGAN

DCGAN was proposed by Alec Radford in 2015; compared to the original GAN, the changes are as follows: (1) no pooling layer is used, and the fully connected layers in the generator and discriminator of the original GAN are replaced with convolution and transposed convolution layers; (2) batch normalization is used in each layer of the generator and discriminator; (3) the generator's output layer uses the Tanh activation function, while its other layers use the rectified linear unit (ReLU) activation function; and (4) the leaky rectified linear unit (LeakyReLU) activation function is used in all layers of the discriminator [19]. These changes improve the stability of the GAN and the quality of the generated results. The network structure of the DCGAN used in this paper is shown in Figure 2.
Table 1 shows the detailed structure of the generator, which uses several up-sampling layers to reshape random noise into a 256 × 256 output. First, the 128-dimensional noise vector is converted into a (64, 64, 64) tensor through five transposed convolution blocks, each consisting of a transposed convolution layer, a batch normalization layer, and a ReLU activation function. Each block uses a 4 × 4 transposed convolution kernel with a stride of 2, except the first block, whose stride is 1. Finally, a transposed convolution layer with a 4 × 4 kernel and a stride of 4, followed by Tanh, resizes the output to 256 × 256.
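As an illustration, a PyTorch sketch of the generator in Table 1 follows. The kernel sizes and strides come from the text, while the padding values are not given in the table and are inferred here so that the listed output shapes are reproduced; treat this as an assumption-laden reconstruction rather than the authors' code.

```python
import torch.nn as nn

class Generator(nn.Module):
    """Follows Table 1: five ConvTranspose2d + BatchNorm + ReLU blocks,
    then a stride-4 transposed convolution with Tanh giving 256 x 256 x 1."""
    def __init__(self, nz=128):
        super().__init__()
        def block(c_in, c_out, k, s, p):
            return [nn.ConvTranspose2d(c_in, c_out, k, s, p, bias=False),
                    nn.BatchNorm2d(c_out), nn.ReLU(True)]
        self.net = nn.Sequential(
            *block(nz, 512, 4, 1, 0),    # -> 4 x 4 x 512
            *block(512, 512, 4, 2, 1),   # -> 8 x 8 x 512
            *block(512, 256, 4, 2, 1),   # -> 16 x 16 x 256
            *block(256, 128, 4, 2, 1),   # -> 32 x 32 x 128
            *block(128, 64, 4, 2, 1),    # -> 64 x 64 x 64
            nn.ConvTranspose2d(64, 1, 4, 4, 0, bias=False),  # -> 256 x 256 x 1
            nn.Tanh(),
        )

    def forward(self, z):  # z: (N, nz, 1, 1)
        return self.net(z)
```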
Table 2 shows the detailed structure of the discriminator, which mirrors the generator; its inputs are real data and the fake images produced by the generator. LeakyReLU activation functions are used throughout to help avoid mode collapse. The final output is used to adjust the generator so that the generated images move closer to the real images.
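A matching sketch of the discriminator in Table 2, under the same caveats: the first convolution's stride of 4 and the paddings are inferred from the listed output shapes, the 0.2 LeakyReLU slope is the conventional DCGAN choice, and the final Sigmoid (not listed in the table) is appended so the output pairs with the BCE-style loss sketched earlier.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Follows Table 2: strided convolutions with LeakyReLU (BatchNorm on
    the middle blocks), mapping a 256 x 256 x 1 image to one score."""
    def __init__(self):
        super().__init__()
        def block(c_in, c_out, k, s, p, bn=True):
            layers = [nn.Conv2d(c_in, c_out, k, s, p, bias=False)]
            if bn:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers
        self.net = nn.Sequential(
            *block(1, 64, 4, 4, 0, bn=False),  # 256 -> 64
            *block(64, 128, 4, 2, 1),          # 64 -> 32
            *block(128, 256, 4, 2, 1),         # 32 -> 16
            *block(256, 512, 4, 2, 1),         # 16 -> 8
            *block(512, 512, 4, 2, 1),         # 8 -> 4
            nn.Conv2d(512, 1, 4, 1, 0),        # 4 -> 1
            nn.Sigmoid(),                      # probability for BCE loss
        )

    def forward(self, x):
        return self.net(x).view(x.size(0), 1)
```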

2.2. Improvement of YOLOv5 Object Detection Method

2.2.1. YOLOv5 Model

YOLOv5 contains four network models: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, among which the YOLOv5s network has the smallest model depth and feature map width in the YOLOv5 series. In this paper, YOLOv5s is used for detection, and its overall structure consists of four parts: input, backbone, neck, and head. The schematic diagram of the network structure is shown in Figure 3.

2.2.2. SAHI

The pre-trained model used in YOLOv5 is trained on the COCO dataset. The resolution of COCO images is mainly 640 × 480, so the input size of YOLOv5 is set to 640 × 640, whereas most of the images used in this paper have a resolution of 1024 × 256. If an original image is input directly, the YOLO model compresses it to the input size, causing pixel loss and degrading the detail information of detection targets. Therefore, the open-source slicing-aided hyper inference (SAHI) framework proposed by Akyon et al. [15] in 2022 is adopted at the inference stage; Figure 4 shows a schematic diagram of SAHI.
SAHI can be divided into two parts: full inference and aided inference. Full inference feeds the original image into the model to detect large targets. The aided-inference part slices the original image $I$ into $l$ overlapping $M \times N$ patches $P_1^I, P_2^I, \ldots, P_l^I$, resizes the slices while maintaining the aspect ratio, and feeds each overlapping slice into the inference model for prediction separately and independently. Finally, the results of full inference and the slice predictions are jointly input into NMS to remove low-confidence detection boxes of the same target, and the detections are converted back to the original image size. Compared to directly inputting the original image, this method gives the target object a larger pixel area in the image, retaining the target's features as much as possible and avoiding the feature loss caused by too few target pixels in the original image.
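As a concrete illustration of this pipeline, the following hedged sketch uses the open-source sahi package with a YOLOv5 model; the weight path, image name, and confidence threshold are placeholders. By default, `get_sliced_prediction` also runs a full-image prediction and merges it with the slice predictions, matching the full-plus-aided-inference scheme described above.

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Load a trained YOLOv5 model into SAHI's wrapper (path is hypothetical).
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov5",
    model_path="runs/train/exp/weights/best.pt",
    confidence_threshold=0.25,
    device="cuda:0",
)

# Sliced inference on a 1024 x 256 GPR profile: 640-wide slices at the full
# image height, with 40% overlap along the width (per Section 2.2.2).
result = get_sliced_prediction(
    "gpr_profile.png",
    detection_model,
    slice_height=256,
    slice_width=640,
    overlap_height_ratio=0.0,
    overlap_width_ratio=0.4,
)
result.export_visuals(export_dir="outputs/")  # boxes mapped back to full size
```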
In this paper, SAHI is applied not only in the inference stage of YOLOv5 but also in the training stage. Unlike the inference stage, the training stage only requires the original images to be sliced into multiple overlapping slices and fed into the model for training, without merging the slices back to the original size. The overlapping slices are obtained by slicing the original images with SAHI according to the set slice size and the overlap rate between slices, yielding a sliced dataset for model training. This allows the model to focus more on the target and can improve target recognition.
In this paper, the slice size is set to 640 × the original height, and the overlap rate between slices is set to 0.4, which ensures that each target exists completely in at least one slice. The resulting dataset is used for training; an original image from the dataset and its slicing results are shown in Figure 5 and Figure 6.
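One possible way to produce such a sliced training set is sahi's COCO slicing utility, sketched below. The sketch assumes COCO-format annotations (all file names are placeholders), and the sliced annotations would still need conversion to YOLO format before YOLOv5 training.

```python
from sahi.slicing import slice_coco

# Slice the training images and their annotations in one pass. slice_height
# equals the assumed original profile height (256), so slicing happens only
# along the width, matching the "640 x original height" setting.
coco_dict, coco_path = slice_coco(
    coco_annotation_file_path="annotations/gpr_train.json",
    image_dir="images/train/",
    output_coco_annotation_file_name="gpr_train_sliced",
    output_dir="images/train_sliced/",
    slice_height=256,
    slice_width=640,
    overlap_height_ratio=0.0,
    overlap_width_ratio=0.4,
)
print(f"{len(coco_dict['images'])} slices written, annotations at {coco_path}")
```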

3. Results

3.1. DCGAN-Based Data Augmentation Results

3.1.1. Data Preprocessing

DCGAN was applied to augment the data on structural loose distresses. The 624 existing GPR images range from 948 to 1705 pixels in width and 136 to 278 pixels in height, and the labeling boxes range from 22 to 232 pixels in width and 18 to 80 pixels in height. The background of the original images is cluttered, and the structural loose distress of interest occupies only a small fraction of the pixels; placing whole GPR images directly into the model produces poor generated images that learn only the background information rather than the structural loose distress features. Therefore, each original image is first cropped to 256 × 256 around its labeling box before being input into DCGAN for training. As mentioned earlier, many original images are less than 256 pixels high, so these images are cropped at the original image height (Figure 7). Finally, 670 images of structural loose distress are obtained for DCGAN training.
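The cropping rule can be sketched as follows with Pillow, assuming the labeling box center is known; the helper name and the clamping behavior at image borders are illustrative.

```python
from PIL import Image

def crop_around_box(img_path, box_cx, box_cy, size=256):
    """Crop a size x size patch centered on a labeled distress box.

    The crop window is clamped to the image borders, and images shorter
    than `size` keep their full height, as described above.
    """
    img = Image.open(img_path)
    w, h = img.size
    half = size // 2
    left = min(max(box_cx - half, 0), max(w - size, 0))
    top = min(max(box_cy - half, 0), max(h - size, 0))
    right = min(left + size, w)
    bottom = min(top + size, h)  # shorter images keep their original height
    return img.crop((left, top, right, bottom))
```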

3.1.2. Experimental Environment and Generated Image Results

The obtained 670 images were input into DCGAN for data augmentation. Training was performed on a computer with an Intel i7-13700K CPU, an NVIDIA RTX 4090 graphics card, 128 GB of DDR4 3200 MHz memory, CUDA 11.6.1, PyTorch 1.13.1, and Python 3.9.16. Training ran for 50,000 epochs using the Adam optimizer with a learning rate of 0.0001 and a batch size of 670. Finally, 555 images containing structural loose distresses were generated with DCGAN to expand the dataset, and the distresses were annotated using the labelImg tool. The resulting images are shown in Figure 8; DCGAN not only learns the highway pavement structure (usually divided into the surface layer, base layer, and roadbed) but also generates clear contours of the structural loose distress, indicating that the detailed distress features are learned well. The generated images closely resemble real structural loose distress images and can be used to expand the samples and enhance the diversity of the dataset.

3.2. Experimental Results of the Improved YOLOv5 Model

3.2.1. Dataset and Experimental Environment

The 624 GPR images were divided into training, validation, and test sets in an 8:1:1 ratio: the training set contains 499 images, the validation set 62 images, and the test set 63 images. When SAHI is used in the training stage, the training and validation sets are preprocessed with SAHI, slicing each original image to a width of 640 at the original image height, with an overlap rate of 0.4 between slices. The training and validation sets were thereby sliced into 958 and 118 slices, respectively.
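For reference, an 8:1:1 split like the one above can be reproduced with a short sketch of this kind (the seed and helper name are illustrative); with 624 items it yields 499/62/63, matching the stated counts.

```python
import random

def split_8_1_1(items, seed=0):
    """Shuffle and split a list 8:1:1 into train/val/test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * 0.8)   # 624 -> 499
    n_val = int(len(items) * 0.1)     # 624 -> 62
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])  # remainder -> 63 test images
```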
To verify the effect of GPR images generated using DCGAN in the training dataset and slicing the dataset using SAHI in the training stage on target recognition, four training datasets were created, as shown in Table 3.
The four datasets were fed into the model separately for training. The computer used to train YOLOv5 was the same as the one used to train DCGAN, and all experiments used the same training hyperparameters: 500 iterations, the SGD optimizer, an initial learning rate of 0.01, a batch size of 16, and the YOLOv5 defaults for all other settings.
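A hedged sketch of the corresponding YOLOv5 training launch follows (run from the YOLOv5 repository root; the dataset yaml is a placeholder). SGD with an initial learning rate of 0.01 is the repository default, so it needs no extra flag.

```python
import subprocess

# Launch YOLOv5 training with the settings stated above.
subprocess.run([
    "python", "train.py",
    "--img", "640",
    "--batch", "16",
    "--epochs", "500",
    "--data", "gpr_loose.yaml",  # hypothetical dataset config
    "--weights", "yolov5s.pt",   # YOLOv5s pre-trained weights
], check=True)
```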

3.2.2. Evaluation Metrics

In this paper, we use precision (P), recall (R), average precision (AP), and mean average precision (mAP) to evaluate the experimental results, and the formulas for these four metrics are as follows:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$AP = \int_0^1 P(r)\,dr$$
$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$
The overall model performance is also evaluated using the ROC curve and AUC, which are important indicators of classification performance; the horizontal coordinate is the false positive rate (FPR) and the vertical coordinate is the true positive rate (TPR). The AUC is the area under the ROC curve, and, usually, the larger the AUC, the better the classification performance of the model. The FPR and TPR are calculated as follows:
$$FPR = \frac{FP}{FP + TN}$$
$$TPR = \frac{TP}{TP + FN}$$
where TP denotes samples that are actually positive and predicted positive, TN denotes samples that are actually negative and predicted negative, FN denotes samples that are actually positive but predicted negative, FP denotes samples that are actually negative but predicted positive, and N is the number of detection categories.
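These definitions can be checked numerically with a short sketch; the counts and score/label arrays below are made up for illustration, and the ROC/AUC computation uses scikit-learn.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP), R = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

print(precision_recall(tp=42, fp=6, fn=12))  # illustrative counts

# ROC/AUC from detection confidences and whether each detection matched
# a ground-truth box (both arrays are illustrative).
scores = np.array([0.92, 0.85, 0.71, 0.64, 0.55, 0.40])
labels = np.array([1, 1, 0, 1, 0, 1])
fpr, tpr, _ = roc_curve(labels, scores)
print("AUC:", auc(fpr, tpr))
```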

3.2.3. Test Results

After training on the four datasets, the models show good detection performance; the detection results on the test set are shown in Table 4. As Table 4 shows, the mAP50 of dataset I, which contains only the original images, is only 63.3%, while the mAP50 of dataset II, which adds the DCGAN-generated images to dataset I, is 68.6%, an increase of 5.3%, indicating that DCGAN can effectively expand the dataset and improve detection accuracy. Dataset III slices the training and validation sets with SAHI in the training stage, and its mAP50 of 66.3% is 3% higher than that of dataset I, indicating that slicing the images benefits model training and thus improves detection accuracy. Dataset IV, which includes both the slices and the DCGAN-generated images, has the highest mAP50 of 70.9%, an improvement of 7.6%.
Based on precision (P) and recall (R), the PR curve can be plotted. The comparison of the PR curves is plotted in Figure 9, where both datasets II and III show a small increase in recall compared to dataset I, while dataset IV shows a much higher recall and precision.
Table 5 shows the experimental results after using SAHI in the inference stage. After using SAHI at inference, the mAP50 of dataset I reaches 66.9%, an increase of 3.6%, indicating that SAHI is very effective in the inference stage: slicing the image increases the relative scale of the target and enhances its feature expression, leading to higher detection accuracy. For dataset IV with SAHI in the inference stage, compared with dataset I without SAHI, the mAP50 increased by 10.8%, to 74.1%.
Figure 10 shows the PR curves of the experiment after using SAHI in the inference stage, and it can be seen that the recall rate increases substantially after using SAHI in the inference stage.
To objectively evaluate the performance of the model and further illustrate the benefit of expanding the dataset with DCGAN-generated images and using SAHI in the training and inference stages, the model is evaluated using the ROC curve and AUC. Figure 11 shows the ROC curves for dataset I and for dataset IV with SAHI used in the inference stage. The AUC of dataset IV with SAHI in the inference stage is 0.86, while the AUC of dataset I is 0.77, verifying the superiority of the improved model.
Figure 12 visualizes the detection results on the test set. In terms of detection accuracy, the comparisons in Figure 12a–e show that the model trained on the augmented dataset achieves higher detection accuracy, and that accuracy improves even more markedly after combining SAHI in the training and inference stages. In terms of missed detections, Figure 12b,d,e show that the model trained on dataset I fails to recognize the structural loose distresses; the model trained on the augmented dataset recognizes the distress in Figure 12b, but with lower confidence, and still misses the distresses in Figure 12d,e. After combining the SAHI framework in both the training and inference stages, all the previously missed structural loose distresses are recognized, with confidences above 70%, greatly reducing the missed detection rate. As seen in Figure 12c,d, some false detections occur after using SAHI in the training and inference stages, mainly because image slicing also enhances the feature expression of non-targets that resemble the target. Nevertheless, considering the improvement in detection accuracy and the reduction in missed detections, expanding the dataset with DCGAN-generated images and using SAHI in the training and inference stages significantly improve detection performance; in particular, missed detections are greatly reduced after using SAHI.
To estimate and discuss the variability of the classification results, the training, validation, and test sets were re-divided to obtain ten different dataset splits, each of which was fed into the model for training. Table 6 shows the mAP50 of the model trained on each split and applied to the corresponding test set: YOLOv5 denotes training and testing directly with the base YOLOv5, and Ours denotes using the DCGAN-expanded dataset with SAHI in the training and inference stages. For every individual experiment, the mAP50 of our method is higher than that of the base YOLOv5, and the average mAP50 over the 10 experiments reaches 0.669, higher than the base YOLOv5 average of 0.580, demonstrating the validity of the proposed method.

4. Conclusions

In this paper, to solve the problem of insufficient structural loose distress data, we generate images based on DCGAN for data augmentation, which addresses the poor training performance caused by insufficient data, and input the generated images together with the original images into YOLOv5 for training; the recognition accuracy then improves by 5.3%. SAHI is introduced in the training stage, slicing the original images according to the set slice size and overlap rate before they are fed into the YOLOv5 model for training. SAHI is also used in the inference stage, and the final recognition accuracy reaches 74.1%, an improvement of 10.8%. The experimental results show that expanding the data improves detection accuracy, and that introducing the SAHI framework in the training and inference stages improves it more markedly, effectively detecting most of the previously missed structural loose distresses and thus reducing the missed detection rate.

Author Contributions

Conceptualization, R.C. and L.C.; Data curation, L.L.; Formal analysis, R.C. and C.L.; Funding acquisition, L.C. and C.L.; Investigation, R.C.; Methodology, R.C. and L.C.; Project administration, R.C. and L.C.; Resources, L.C. and C.L.; Software, R.C.; Supervision, L.C.; Validation, L.C., C.L. and L.L.; Visualization, L.C.; Writing—original draft, R.C.; Writing—review and editing, R.C. and L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the key project of the National Natural Science Foundation of China (Grant No. 41930112).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets generated and analyzed during the current study are not publicly available because they are owned by other companies, but they are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Zan, Y.W.; Su, G.F.; Li, Z.L.; Zhang, X.Y. A train-mounted GPR system for fast and efficient monitoring of tunnel health conditions. In Proceedings of the 2016 16th International Conference on Ground Penetrating Radar (GPR), Hong Kong, China, 13–16 June 2016; pp. 1–5. [Google Scholar]
  2. Luo, X.; He, T.; Wen, L.; Li, W.; Tang, L.; Cai, Z. Hyperbolic Feature Detection and Radius Calculation of Underground Pipeline. In Proceedings of the 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 12–14 October 2018; pp. 1949–1954. [Google Scholar]
  3. Xu, X.; Peng, S.; Xia, Y.; Ji, W. The development of a multi-channel GPR system for roadbed damage detection. Microelectron. J. 2014, 45, 1542–1555. [Google Scholar] [CrossRef]
  4. Liu, C.; Du, Y.; Yue, G.; Li, Y.; Wu, D.; Li, F. Advances in automatic identification of road subsurface distress using ground penetrating radar: State of the art and future trends. Autom. Constr. 2024, 158, 105185. [Google Scholar] [CrossRef]
  5. Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  6. Dargan, S.; Kumar, M.; Ayyagari, R.M.; Kumar, G. A Survey of Deep Learning and Its Applications: A New Paradigm to Machine Learning. Arch. Comput. Methods Eng. 2019, 27, 1071–1092. [Google Scholar] [CrossRef]
  7. Li, S.; Gu, X.; Xu, X.; Xu, D.; Zhang, T.; Liu, Z.; Dong, Q. Detection of concealed cracks from ground penetrating radar images based on deep learning algorithm. Constr. Build. Mater. 2021, 273, 121949. [Google Scholar] [CrossRef]
  8. Qiu, Z.; Zhao, Z.; Chen, S.; Zeng, J.; Huang, Y.; Xiang, B. Application of an Improved YOLOv5 Algorithm in Real-Time Detection of Foreign Objects by Ground Penetrating Radar. Remote Sens. 2022, 14, 1895. [Google Scholar] [CrossRef]
  9. Liu, L.; Cao, L.; Lu, C.; Yang, X.; Wei, T.; Li, X.; Jiang, H.; Yang, L. A denoising method based on cyclegan with attention mechanisms for improving the hidden distress features of pavement. Sci. Rep. 2023, 13, 13910. [Google Scholar] [CrossRef] [PubMed]
  10. Liu, W.; Luo, R.; Xiao, M.; Chen, Y. Intelligent detection of hidden distresses in asphalt pavement based on GPR and deep learning algorithm. Constr. Build. Mater. 2024, 416, 135089. [Google Scholar] [CrossRef]
  11. Gao, J.; Yuan, D.; Tong, Z.; Yang, J.; Yu, D. Autonomous pavement distress detection using ground penetrating radar and region-based deep learning. Measurement 2020, 164, 108077. [Google Scholar] [CrossRef]
  12. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661. [Google Scholar] [CrossRef]
  13. Yue, Y.; Liu, H.; Meng, X.; Li, Y.; Du, Y. Generation of High-Precision Ground Penetrating Radar Images Using Improved Least Square Generative Adversarial Networks. Remote Sens. 2021, 13, 4590. [Google Scholar] [CrossRef]
  14. Zhao, D.; Guo, G.; Ni, Z.-K.; Pan, J.; Yan, K.; Fang, G. WAEGAN: A GANs-Based Data Augmentation Method for GPR Data. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
  15. Akyon, F.C.; Onur Altinuc, S.; Temizel, A. Slicing Aided Hyper Inference and Fine-Tuning for Small Object Detection. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 966–970. [Google Scholar]
  16. Wang, X.; He, N.; Hong, C.; Wang, Q.; Chen, M. Improved YOLOX-X based UAV aerial photography object detection algorithm. Image Vision Comput. 2023, 135, 104697. [Google Scholar] [CrossRef]
  17. Duan, B.; Wang, S.; Luo, C.; Chen, Z. Multi-Module Fusion Model for Submarine Pipeline Identification Based on YOLOv5. J. Mar. Sci. Eng. 2024, 12, 451. [Google Scholar] [CrossRef]
  18. Muzammul, M.; Algarni, A.; Ghadi, Y.Y.; Assam, M. Enhancing UAV Aerial Image Analysis: Integrating Advanced SAHI Techniques With Real-Time Detection Models on the VisDrone Dataset. IEEE Access 2024, 12, 21621–21633. [Google Scholar] [CrossRef]
  19. Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
Figure 1. Diagram of the GAN structure.
Figure 2. DCGAN network architecture.
Figure 3. YOLOv5 network architecture.
Figure 4. The principle of SAHI.
Figure 5. Original image.
Figure 6. Slicing results.
Figure 7. Cropping method.
Figure 8. Images generated by DCGAN.
Figure 9. PR curve diagram.
Figure 10. PR curve after using SAHI in the inference stage.
Figure 11. ROC curves.
Figure 12. Visualization of the detection results on the test set. (In each sub-figure, top: detection results on the test set of the model trained using dataset I, middle: detection results on the test set of the model trained using dataset II, bottom: detection results on the test set of the model trained using dataset IV and after SAHI was used in the inference stage, where the green part represents the real label box).
Table 1. Structure of the generator.

Type             Layer              Output Shape
ConvTranspose1   ConvTranspose2d    [4, 4, 512]
                 BatchNorm2d        [4, 4, 512]
                 ReLU               [4, 4, 512]
ConvTranspose2   ConvTranspose2d    [8, 8, 512]
                 BatchNorm2d        [8, 8, 512]
                 ReLU               [8, 8, 512]
ConvTranspose3   ConvTranspose2d    [16, 16, 256]
                 BatchNorm2d        [16, 16, 256]
                 ReLU               [16, 16, 256]
ConvTranspose4   ConvTranspose2d    [32, 32, 128]
                 BatchNorm2d        [32, 32, 128]
                 ReLU               [32, 32, 128]
ConvTranspose5   ConvTranspose2d    [64, 64, 64]
                 BatchNorm2d        [64, 64, 64]
                 ReLU               [64, 64, 64]
ConvTranspose6   ConvTranspose2d    [256, 256, 1]
                 Tanh               [256, 256, 1]
Table 2. Structure of the discriminator.

Type    Layer          Output Shape
Conv1   Conv2d         [64, 64, 64]
        LeakyReLU      [64, 64, 64]
Conv2   Conv2d         [32, 32, 128]
        BatchNorm2d    [32, 32, 128]
        LeakyReLU      [32, 32, 128]
Conv3   Conv2d         [16, 16, 256]
        BatchNorm2d    [16, 16, 256]
        LeakyReLU      [16, 16, 256]
Conv4   Conv2d         [8, 8, 512]
        BatchNorm2d    [8, 8, 512]
        LeakyReLU      [8, 8, 512]
Conv5   Conv2d         [4, 4, 512]
        BatchNorm2d    [4, 4, 512]
        LeakyReLU      [4, 4, 512]
Conv6   Conv2d         [1, 1, 1]
Table 3. Dataset settings.

Training Dataset    I    II    III    IV
original images     ✓    ✓     ×      ×
generated images    ×    ✓     ×      ✓
slices              ×    ×     ✓      ✓
Table 4. Experimental results.

Dataset    mAP50
I          0.633
II         0.686
III        0.663
IV         0.709
Table 5. Experimental results after using SAHI in the inference stage.

Dataset    mAP50
I          0.669
IV         0.741
Table 6. Comparison of experimental results for 10 different datasets.

Method    1      2      3      4      5      6      7      8      9      10     Average
YOLOv5    0.591  0.519  0.583  0.490  0.606  0.626  0.596  0.693  0.627  0.466  0.580
Ours      0.784  0.569  0.673  0.528  0.676  0.692  0.721  0.724  0.746  0.578  0.669
