Article

Efficient Detection of Apparent Defects in Subway Tunnel Linings Based on Deep Learning Methods

1 National Key Laboratory of Green and Long-Life Road Engineering in Extreme Environment, Shenzhen 518060, China
2 School of Civil and Transportation Engineering, Shenzhen University, Shenzhen 518060, China
3 School of Transportation and Civil Engineering, Fujian Agriculture and Forestry University, Fuzhou 350108, China
4 Shenzhen Technology Institute of Urban Public Safety, Shenzhen 518000, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(17), 7824; https://doi.org/10.3390/app14177824
Submission received: 26 July 2024 / Revised: 30 August 2024 / Accepted: 2 September 2024 / Published: 3 September 2024
(This article belongs to the Section Civil Engineering)

Abstract

High-precision and rapid detection of apparent defects in subway tunnel linings is crucial for ensuring the structural integrity of tunnels and the safety of train operations. However, current methods often do not adequately account for the spatial characteristics of these defects and perform poorly in detecting and extracting small-scale defects, which limits the accuracy of detection and geometric parameter extraction. To address these challenges, this paper proposes an efficient algorithm for detecting and extracting apparent defects in subway tunnels. Firstly, YOLOv8 was selected as the foundational architecture due to its comprehensive performance. The coordinate attention module and Bottleneck Transformer 3 were then integrated into the model’s backbone to enhance the focus on defect-prone areas and improve the learning of feature relationships between defects and other infrastructure. Subsequently, a high-resolution detection layer was added to the model’s head to further improve sensitivity to subtle defects. Additionally, a low-quality crack dataset was created using an open-access dataset, and transfer learning combined with Real-ESRGAN was employed to enhance the detail and resolution of fine cracks. The results of the field experiments demonstrate that the proposed model significantly improves detection accuracy in high-incidence areas and for small-scale defects, achieving a mean average precision (mAP) of 87% in detecting cracks, leakage, exfoliation, and related infrastructure defects. Furthermore, the crack enhancement techniques substantially improve the representation of fine-crack details, increasing the spatial resolution of the extracted crack regions by a factor of four. The findings of this paper could provide crucial technical support for the automated operation and maintenance of metro tunnels.

1. Introduction

With the continuous increase in urban metro operation time, early-constructed metro tunnels, under the coupled effects of geological conditions and operating environments, generally suffer from structural aging problems. These issues manifest as lining cracks, water leakage, and exfoliation [1]. The interaction between the apparent defects of the lining and its structure poses significant safety hazards for tunnel operations. When faced with extreme weather conditions and emergencies, such as heavy rain, changes in groundwater flow, damage to underground municipal facilities, and other underground engineering activities, metro tunnels can be impacted to varying degrees. The safety risks associated with lining defects are consequently amplified. These issues can exacerbate the degree of lining defects, increase the affected areas, and cause structural damage such as misalignment and voids. In severe cases, these problems can lead to catastrophic consequences such as tunnel flooding, collapse, and surface subsidence [2]. Therefore, timely detection of apparent defects in metro tunnels is crucial for metro operation and public safety.
In engineering practice, the detection of apparent defects in metro tunnels primarily relies on manual inspections and the use of mobile measurement equipment for data collection and manual identification. However, as the pace of urban life accelerates and metro operation times increase, the maintenance window period for tunnels is increasingly compressed. The inefficiency, low precision, and over-reliance on subjective judgment of these methods are becoming more apparent [3]. These issues have led to growing research interest in developing accurate and efficient automatic detection technologies for metro tunnel lining defects in the field of tunnel maintenance. Over the past few decades, the use of image-based techniques for tunnel damage detection has seen significant advancements. The introduction of image processing technologies in tunnel inspection has revolutionized the field by offering non-contact, accurate, and efficient means of detecting structural and apparent defects. In related research on the automated detection of lining defects, visible light cameras or LiDAR (Light Detection and Ranging) are typically used to collect and process data, resulting in panoramic grayscale images of the tunnel lining as the raw data [4,5,6,7,8]. Depending on the detection environment and target features, methods such as digital image processing and deep learning are employed for defect detection and extraction.
The detection of apparent tunnel defects using digital image processing technology primarily relies on low-level features of the defects, such as pixel grayscale values, gradient changes, grayscale histograms, and frequency-domain characteristics, to make the defects detectable in the image. Algorithms such as image enhancement, noise reduction filtering, and threshold segmentation are used to extract image regions corresponding to different types of defects [9]. Murakami et al. [10] used frequency-shifted feedback (FSF) laser technology to achieve high-resolution imaging for the detection of apparent cracks in subway tunnel linings, capable of detecting cracks as narrow as 0.2 mm. Ai et al. [11] used a differential minimization algorithm to perform image differencing on surface images of metro tunnel linings collected at different times from the same location. They employed a neighborhood-connected region search algorithm and classified apparent defects, such as leakage, exfoliation, and cracks, based on the geometric parameters of the connected regions and the characteristics of the defects. In other related studies, targeted filtering algorithms and threshold segmentation algorithms have received widespread attention. Ba et al. [12] introduced an adaptive median–Gaussian filtering algorithm to further improve noise filtering in tunnel lining crack detection. Their algorithm applied Gaussian filtering and adaptive median filtering to remove detected Gaussian noise and salt-and-pepper noise, respectively, which effectively addressed the issue of poor noise filtering. Gong et al. [13] used full cross-sectional images of tunnel surfaces collected by a multiline scanning camera. They applied a multistage fusion filtering algorithm with local threshold binarization to filter out background noise and interference from images of tunnel-surface water stains and various facilities. Moreover, they employed an improved threshold segmentation algorithm based on edge gradient information to segment real cracks from noisy backgrounds and applied image differencing and region search algorithms to defect detection. Lei et al. [14] proposed a metro tunnel lining crack segmentation algorithm that combines adaptive threshold segmentation, edge detection, and dual-threshold Otsu methods. The algorithm merges the segmented cracks with the background regions and sequentially stitches the sub-blocks together to obtain the overall crack image. In summary, the use of digital image processing methods to extract apparent defects in tunnel linings involves calculations based on the pixel matrix of the original image. By designing filtering algorithms to remove known types of background noise and performing calculations based on the geometric or pixel characteristics of the target defects, good detection results can be achieved for tunnel image data with low complexity or linear-feature background noise. However, in practice, metro tunnels contain numerous facilities and similar textures, and adverse factors such as uneven lighting, varying scales, and occlusions introduce a large amount of nonlinear noise. For complex scenes and low-quality image data, the detection accuracy, efficiency, and robustness of these methods are limited. Additionally, these methods are typically limited to extracting a single type of defect: when the original image contains multiple types of defects, the segmentation thresholds and algorithm design must be adjusted accordingly, and the robustness and generalizability of such algorithms remain to be improved.
In recent years, with the rapid development of deep learning in computer vision [15], many scholars have constructed deep learning-based object detection and semantic segmentation models to extract and integrate the multiscale features of defects, achieving the accurate detection and extraction of various apparent defects in metro tunnel linings [16]. Xue et al. [17] built a fully convolutional network (FCN) to quickly classify apparent defects, such as cracks and leakage, in images of metro tunnel linings and used a region-based FCN model to extract and classify the locations of defects in images containing them. Gao et al. [18] proposed an algorithm combining a faster region-based convolutional neural network (Faster R-CNN), adaptive region-of-interest layers, and FCNs for detecting tunnel defects, such as cracks, leakage, and exfoliation, in metro tunnel linings. Zhou et al. [19] proposed a deep convolutional neural network (DCNN)-based model to achieve multichannel feature extraction of spalling defects in subway tunnel linings, with the depth information helping to quantify the volume of the spalling defects. Li et al. [20] enhanced the resolution of metro tunnel images via image enhancement operations, such as contrast enhancement, and used the Faster R-CNN detection model to achieve high-precision automatic detection of tunnel surface defects. Chen et al. [21] also proposed a metro shield tunnel leakage detection method integrating cylindrical voxels and Mask R-CNN, achieving leakage detection and 3D visualization in different tunnel scenarios. Constructing deep learning networks evidently allows for targeted extraction of the deep features of apparent tunnel defects. This approach is more beneficial for the automatic detection, extraction, and classification of multiple types of defects. Compared with digital image processing techniques, deep learning networks offer better performance in large-scale, noise-complex metro tunnel defect detection tasks.
However, most current research focuses on optimizing model performance itself to improve defect detection accuracy; model designs that integrate the distribution characteristics of tunnel defects with real-world detection scenarios are generally lacking. For practical engineering requirements, the extraction accuracy for specific defects still requires further improvement. In particular, current research faces the following challenges: (1) In actual metro tunnel scenarios, defects such as cracks, exfoliation, and leakage are mainly distributed around relatively weak areas such as lining joints, circumferential seams, and areas around bolt holes and grouting holes (Figure 1). Current research insufficiently focuses on these critical features, and fully leveraging the spatial location characteristics of defects is crucial for improving detection accuracy. (2) Small target defects on metro tunnel lining surfaces are not prominent and are difficult to detect, which limits detection accuracy. Enhancing the detection of small target defects remains a pressing research challenge. (3) Apparent cracks in metro tunnel linings, as early indicators of various defects, have received extensive attention in engineering practice. The current standard for detection accuracy has reached the millimeter level, highlighting the importance of high-precision extraction. However, given the indistinct features of cracks in their early stages of development and the limitations of sensor performance, defect areas often suffer from blurred details and low image quality. These factors pose significant challenges for the accurate, high-precision extraction of cracks.
In this study, an algorithm for detecting apparent defects in tunnel linings and associated facilities in metro tunnel scenarios is proposed to address the aforementioned issues. Additionally, image super-resolution models and transfer learning techniques are used to enhance the resolution and clarity of the detected crack areas. In particular, the main contributions of this study are as follows:
(1) By analyzing the characteristics of metro tunnel detection scenarios and the lining appearance data, YOLOv8, a single-stage object detection model, was selected as the network architecture. Then, the coordinate attention (CA) module was introduced into the backbone of the model to fully utilize the spatial location features of defects and associated facilities. This approach embeds positional information into channel attention to capture the target structure, enabling the network model to focus efficiently on defect-prone areas and assign higher weights to hotspots. Furthermore, a transformer computation module was incorporated into the model by using Bottleneck Transformer 3 (BoT3) to extract global contextual information from defect images. This module models the global feature dependency relationships of the feature map, learning the feature associations between defect-prone areas, their background information, and distant targets. This approach helps the model better understand the global scene and defect occurrence features, thereby enhancing the extraction of associative information between apparent defect locations, their background areas, and related facilities.
(2) A shallow, high-resolution detection layer was added to the model to address the issues of small-scale, undistinguishable features of minor defects in metro tunnel linings. This layer uses clearer positional and structural information about the defects to precisely locate small target defects and their structures. The detection accuracy of apparent defects in metro tunnel linings was improved by the abovementioned integrated optimization strategies.
(3) Low-quality images were generated from a high-resolution concrete crack dataset to address the issues of limited detail, unclear texture structure, and restricted high-precision segmentation in the detected crack regions of metro tunnel linings, from which the Real-ESRGAN image super-resolution model was trained. This process transferred high-quality, high-resolution concrete crack features to the detection results of metro tunnel lining cracks, enhancing the spatial resolution of the detected crack image regions by four times. Additionally, the texture representation and detailed information of the cracks were enhanced, providing technical support for the high-precision and accurate extraction of apparent cracks in metro tunnel linings.
The primary objective of this research was to develop a comprehensive algorithm for detecting apparent defects in metro tunnel linings and enhancing defect feature extraction. The specific objectives of this study were as follows: (i) To thoroughly analyze the spatial characteristics of defect occurrence and integrate the coordinate attention (CA) module and Bottleneck Transformer 3 into the backbone of YOLOv8, thereby improving detection accuracy in defect-prone areas. (ii) To incorporate a high-resolution detection layer into the head of YOLOv8, thus enhancing the model’s sensitivity to small-scale defects and improving detection accuracy. (iii) To validate the effectiveness of the proposed model in detecting defects within metro tunnel scenarios through ablation studies and comparative experiments. (iv) To construct a low-quality concrete crack dataset and develop a crack enhancement algorithm by combining transfer learning with the Real-ESRGAN image super-resolution model. (v) To evaluate the efficacy of the proposed enhancement algorithm for high-precision crack extraction using both subjective and objective assessments.

2. Literature Background

The study of the automated detection of apparent defects in metro tunnel linings has evolved significantly over the years, with digital image processing and deep learning methods being successively developed and refined. The sensors or data used for data collection, the research content, methodologies, and key contributions of the studies mentioned in the Introduction are listed in Table 1.

3. Methods

The technical framework adopted in this study is shown in Figure 2. First, a dataset of apparent defects in metro tunnel linings was created via field data collection and preprocessing. Then, the basic framework of the defect detection model was determined by analyzing metro tunnel detection scenarios. Model optimization design strategies were formulated based on the characteristics of the lining appearance image data to extract and integrate the spatial region features of apparent defects, improve the detection of small target defects, and develop a defect detection model tailored for metro tunnel scenarios. In subsequent research, the proposed detection model was used to obtain the coordinates and images of defect areas. For apparent cracks in metro tunnel linings, transfer learning was applied to the image super-resolution model Real-ESRGAN to migrate high-resolution concrete crack features. This approach enhanced the resolution and texture details of the detected crack images, enabling higher-precision segmentation and extraction. The output of the defect detection model serves as input for the subsequent processes, establishing strong interconnections and providing technical support for the entire cycle of defect detection and extraction. The details of the research methods used in this study are described in the following sections.

3.1. Detection Model for Apparent Defects in Metro Tunnel Linings

3.1.1. Basic Framework of the Defect Detection Model

In an environment for detecting apparent defects in metro tunnel linings, the enclosed tunnel space limits the data transmission and reception rates. Additionally, the fixed tunnel space restricts the types of edge detection equipment that can be used. For metro tunnels in operation or under construction, the allowed maintenance window periods for inspection are also limited [22]. Therefore, the spatial environment and time conditions in actual detection scenarios impose high requirements on the selection of detection models. With the aim of adapting to the metro tunnel detection environment and conditions, the designed model should include the following features: (1) high defect-detection accuracy, with the ability to assign higher weights to high-incidence areas; (2) rapid model inference speed to meet real-time detection requirements while maintaining detection accuracy; and (3) a small model size that can be easily deployed on lightweight and portable edge devices. The fundamental model for detecting apparent defects in metro tunnel linings was selected on the basis of these three design principles and further optimized in subsequent research.
In deep learning-based object detection models, algorithms can be categorized into two-stage and single-stage approaches based on the process of generating prediction boxes. The two-stage algorithms (such as Faster RCNN [23] and Cascade RCNN [24]) first generate a sparse set of candidate boxes, followed by regression on these candidate boxes and classification using a classifier. These methods typically offer higher detection accuracy but come with a larger number of model parameters and higher deployment costs. On the other hand, single-stage algorithms directly utilize convolutional layers to perform regression and classification predictions on pre-defined anchor boxes of different scales. While these methods are computationally more efficient and require fewer model parameters, their detection accuracy can still be further enhanced [25,26]. The representative single-stage algorithms include SSD [27] and YOLO (You Only Look Once) [28,29,30]. Among these models, the YOLO series of algorithms were designed for fast and efficient target detection. Unlike traditional two-stage detection models, YOLO processes the entire image in a single pass through the network, directly performing both bounding box regression and classification tasks. This approach eliminates the need for complex region proposal steps, streamlining the detection process. The core principle of YOLO is to treat object detection as a regression problem. Specifically, YOLO divides the input image into a fixed number of grid cells, with each cell predicting one or more bounding boxes and their associated class labels. By combining data labels and utilizing the backpropagation algorithm, YOLO adjusts the model’s weights during training, enabling the network to directly output the positions of bounding boxes and the categories of objects within those boxes at the final layer. Because the YOLO model simultaneously performs detection and classification in a single forward pass, it significantly increases detection speed. Additionally, YOLO’s compact network structure maintains a smaller model size, which allows it to strike an excellent balance between accuracy, speed, and model size. This makes the YOLO series of algorithms particularly well-suited for scenarios requiring real-time detection.
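To make the grid-based regression idea concrete, the following minimal PyTorch sketch decodes one scale’s per-cell predictions into boxes. It assumes a generic anchor-free (offset, log-size) parameterization with an objectness score; it illustrates the principle rather than the layout of any specific YOLO release.

```python
import torch

def decode_grid_predictions(pred, stride, conf_thres=0.25):
    # pred: (H, W, 5 + num_classes) per-cell predictions:
    # (tx, ty, tw, th, objectness, class scores...).
    H, W, _ = pred.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    # Sigmoid keeps the predicted box centre inside its grid cell.
    cx = (xs + pred[..., 0].sigmoid()) * stride
    cy = (ys + pred[..., 1].sigmoid()) * stride
    # Width/height are predicted in log space and scaled by the stride.
    w = pred[..., 2].exp() * stride
    h = pred[..., 3].exp() * stride
    conf = pred[..., 4].sigmoid()
    cls = pred[..., 5:].argmax(dim=-1)
    keep = conf > conf_thres
    boxes = torch.stack([cx - w / 2, cy - h / 2,
                         cx + w / 2, cy + h / 2], dim=-1)
    return boxes[keep], conf[keep], cls[keep]
```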
Among the YOLO algorithm versions, YOLOv8 [31,32] is a state-of-the-art (SOTA) model in object detection, which advances and refines the capabilities of previous YOLO versions. As shown in Figure 3, the YOLOv8 model builds upon its predecessors by incorporating significant enhancements to its architecture, which is composed of three primary components: the feature extraction backbone, the feature enhancement module (neck), and the detection head. Upon receiving an input image, the YOLOv8 model processes it through the feature extraction backbone. Here, the cross-stage partial bottleneck with two convolutions (C2f) module is employed to extract essential features such as edges and textures. This stage efficiently captures both local and global features while optimizing gradient flow for effective training. Next, the extracted features are passed to the neck, where cross-stage feature transmission occurs. This step integrates features from different layers, allowing the model to handle objects of various scales by combining detailed high-resolution features with deeper, more abstract representations. Finally, the refined features are directed to the detection head, where the decoupled-head module separates the tasks of classification and localization. This separation enhances the precision of both tasks by reducing interference, enabling the model to output accurate bounding boxes and class labels for objects within the image.
As an advanced single-stage object detection model, YOLOv8 demonstrates superior performance in terms of detection accuracy, speed, and model parameters. Considering the inherent challenges presented by the metro tunnel environment, such as restricted spatial constraints, limited data transmission bandwidth, and the complex visual conditions marked by low illumination and intricate background structures, YOLOv8’s compact architecture proves particularly advantageous. Its design facilitates precise and efficient detection while ensuring seamless deployment on lightweight, portable edge devices, which is essential in the spatially confined and operationally demanding context of metro tunnels. Therefore, YOLOv8 was selected as the basic framework in this study, and a detection model tailored to apparent-defect detection scenarios in metro tunnel linings was designed based on this framework.

3.1.2. Coordinate Attention Module

Poor ventilation and inadequate lighting in the enclosed spaces of metro tunnels can lead to noisy images with high background similarity due to increased dust particles and uneven illumination. The information loss during the down-sampling process of obtaining low-resolution feature layers in the network model exacerbates these issues. Additionally, field experiments have shown that apparent defects in metro tunnel linings often occur at circumferential joints and seams between linings, with specific spatial location characteristics. Enhancing the weight calculation for these hotspot areas during convolution computations can improve defect detection rates and accuracy. Therefore, the coordinate attention (CA) module [33] is introduced into the feature extraction part of the YOLOv8 model to obtain more detailed spatial information on target detection and suppress irrelevant information from different channels. The principle of the coordinate attention module involves decomposing channel attention encoding into two one-dimensional feature encodings. These encodings are then subjected to average pooling along the X and Y dimensions, respectively, to mitigate the positional information loss that typically occurs when transforming a feature tensor into a one-dimensional vector via two-dimensional global pooling. By embedding positional information in this way, the channel attention module can simultaneously capture target structures while considering both channel and spatial information. It enhances the model’s ability to capture features in hotspot regions and long-range dependencies, allowing the network to quickly and accurately focus on defect-prone key areas while maintaining computational efficiency.
The structure of the coordinate attention module is shown in Figure 4. The computation of this model primarily consists of two steps: coordinated relationship embedding and coordinate attention generation.
Specifically, in the coordinated relationship embedding step, the input feature map undergoes average pooling along the horizontal and vertical coordinates within the spatial ranges of (H, 1) and (1, W) for each channel. This feature aggregation method generates a pair of direction-aware feature maps that capture long-range dependencies in one direction while preserving precise positional information in the other. The output of the c-th channel at height h and width w can be expressed as follows:
$$z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i) \tag{1}$$
$$z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w) \tag{2}$$
where $z_c^h(h)$ denotes the pooled output of the c-th channel at the vertical coordinate h; $z_c^w(w)$ denotes the pooled output of the c-th channel at the horizontal coordinate w; W is the horizontal dimension; H is the vertical dimension; and $x_c(i, j)$ is the feature value of the c-th channel at the corresponding position.
In the coordinate attention generation step, the pooled results are first concatenated to capture bidirectional features, resulting in a pair of direction-aware feature maps. Convolution is then applied for dimensional transformation, followed by a batch normalization (BN) layer to standardize data distribution. Subsequently, the results are passed through a sigmoid function to obtain an intermediate feature map f that encodes spatial information. This approach enables the establishment of long-range dependencies in one spatial direction while preserving precise positional information in the other. The equations are as follows:
$$f = \delta\left(F_1\left(\left[z^h, z^w\right]\right)\right) \tag{3}$$
where $[z^h, z^w]$ denotes the stacking of the pooled feature maps along the spatial dimension; $F_1$ is the 1 × 1 convolutional transform function; δ is the sigmoid activation function; and f is the intermediate feature map that encodes spatial information in the horizontal and vertical directions.
Then, f is split along the spatial dimension into two independent tensors, $f^h$ and $f^w$. Two 1 × 1 convolutional transform functions, $F_h$ and $F_w$, transform $f^h$ and $f^w$ into tensors with the same number of channels as the input, and the sigmoid activation function is applied; the results are finally combined with the input x. The output Y of the coordinate attention module is computed as follows:
$$g^h = \sigma\left(F_h\left(f^h\right)\right) \tag{4}$$
$$g^w = \sigma\left(F_w\left(f^w\right)\right) \tag{5}$$
$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j) \tag{6}$$
where $g^h$ and $g^w$ are the attention weights along the two spatial dimensions; σ is the sigmoid function; $F_h$ and $F_w$ are the 1 × 1 convolutional transform functions; and $f^h$ and $f^w$ are the tensors obtained by splitting f along the spatial dimensions.
In this study, the CA module was added at layers 6, 9, and 12 of the base model’s feature extraction network so that the model attends to the areas where defects occur in feature maps at different scales, improving the accuracy of defect detection.
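A minimal PyTorch sketch of the coordinate attention computation in Equations (1)–(6) is shown below, following Hou et al. [33]; the reduction ratio and layer sizes are illustrative choices rather than the exact configuration used in this study.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)   # F1
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Sigmoid()                                # produces f
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)  # Fh
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)  # Fw

    def forward(self, x):
        n, c, h, w = x.shape
        # Coordinate information embedding: 1D average pooling per direction.
        z_h = x.mean(dim=3, keepdim=True)                      # (n, c, h, 1)
        z_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (n, c, w, 1)
        # Concatenate along the spatial axis, transform, then re-split.
        f = self.act(self.bn(self.conv1(torch.cat([z_h, z_w], dim=2))))
        f_h, f_w = f.split([h, w], dim=2)
        g_h = torch.sigmoid(self.conv_h(f_h))                      # (n, c, h, 1)
        g_w = torch.sigmoid(self.conv_w(f_w.permute(0, 1, 3, 2)))  # (n, c, 1, w)
        # y_c(i, j) = x_c(i, j) * g_h(i) * g_w(j), broadcast over H and W.
        return x * g_h * g_w
```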

3.1.3. Bottleneck Transformer 3

When convolutional neural networks are used to extract features from images of apparent defects in metro tunnel linings, the receptive field of the convolution and pooling operations on high-dimensional feature maps is large, but these operations reduce the resolution of the feature maps, limiting the number of pixels available for detecting minor defects and thus restricting detection accuracy. In the context of metro tunnel linings, the locations of apparent defects and their background areas still have connections with distant bolt holes, grouting holes, and auxiliary facilities. Integrating the transformer algorithm [34] during the feature extraction stage can effectively help the model extract global contextual information from images. This approach allows the model to capture global feature dependencies, learn the feature associations between defect-prone areas, their background information, and distant targets, and better understand the overall scene and defect occurrence characteristics.
In this study, a global contextual feature extraction module named BoT3 (Bottleneck Transformer 3) [35] is incorporated into the high-dimensional feature extraction part of the model. As shown in Figure 5, the input feature map generates half the target feature layers by using two Conv modules, with one Conv output fed into the BoT (Bottleneck Transformer) module for self-attention mechanism computation. The feature maps are then stacked, and a Conv calculation is performed to output the target feature map. In the BoT calculation, the dimensional input feature layers are first reduced using a contraction operation and input into the MHSA (Multi-Head Self-Attention) [35,36] computation module. After obtaining the output, the feature dimensions are expanded back to their original size and summed with the original input feature layers to produce the final output.
The computational flow of the MHSA is shown in Figure 6. The dimensions of the input feature map X are H × W × d, denoting the height and width of the input feature matrix and the dimension of individual tokens, respectively. Pointwise convolutions with weight matrices $W_Q$, $W_K$, and $W_V$ are applied to X to obtain the queries q, keys k, and values v. The relative position encodings for the height and width of the input feature map are $R_h$ and $R_w$, respectively, which are initialized as trainable parameters and summed via the broadcasting mechanism, i.e., the two-dimensional position (i, j) is encoded as $R_{h_i} + R_{w_j}$ (a sum of d-dimensional vectors); the result r represents the position encoding. Matrix multiplication of q and r yields the content–position term $qr^T$, and matrix multiplication of q and k yields the content–content term $qk^T$. The two terms are added element-wise, softmax normalization is applied to the resulting matrix, and the processed output is a matrix of size HW × HW. Finally, this matrix is multiplied by the values v to obtain the output Z.
In Figure 6, q, k, and v represent the queries, keys, and values, respectively; + and × represent element-by-element summation and matrix multiplication, respectively; and 1 × 1 represents pointwise convolution. The blue parts represent the position encodings and the value projection.
The BoT3 module can capture information in feature maps via global attention. In this study, the BoT3 module is embedded after the spatial pyramid pooling fast (SPPF) operation to avoid large-scale computation and memory usage. This module can increase the connections between defect-prone areas and other target areas in the high-dimensional feature layers’ global features, thereby uncovering the correlations between apparent defects in the linings and other targets and obtaining more feature information.
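The following single-head PyTorch sketch illustrates the MHSA computation in Figure 6, including the content–content term qkᵀ, the content–position term qrᵀ, and the broadcast sum of the trainable encodings Rh and Rw. The real module uses multiple heads, and the tensor layout here is an illustrative simplification.

```python
import torch
import torch.nn as nn

class MHSA2D(nn.Module):
    def __init__(self, dim, height, width):
        super().__init__()
        self.q = nn.Conv2d(dim, dim, 1)  # W_Q as pointwise convolution
        self.k = nn.Conv2d(dim, dim, 1)  # W_K
        self.v = nn.Conv2d(dim, dim, 1)  # W_V
        # Trainable relative position encodings R_h and R_w.
        self.r_h = nn.Parameter(torch.randn(1, dim, height, 1))
        self.r_w = nn.Parameter(torch.randn(1, dim, 1, width))

    def forward(self, x):
        # x: (n, d, h, w); h and w must match the sizes given at init.
        n, d, h, w = x.shape
        q = self.q(x).flatten(2)                    # (n, d, hw)
        k = self.k(x).flatten(2)
        v = self.v(x).flatten(2)
        # Broadcast sum encodes position (i, j) as R_h[i] + R_w[j].
        r = (self.r_h + self.r_w).flatten(2)        # (1, d, hw)
        content_content = torch.bmm(q.transpose(1, 2), k)              # qk^T
        content_position = torch.bmm(q.transpose(1, 2),
                                     r.expand(n, -1, -1))              # qr^T
        attn = torch.softmax(content_content + content_position, dim=-1)
        z = torch.bmm(v, attn.transpose(1, 2))      # weighted sum of values
        return z.view(n, d, h, w)
```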

3.1.4. The Proposed Defect Detection Model for Subway Tunnel Linings

The proposed model framework for detecting apparent defects in metro tunnel linings is shown in Figure 7. The model consists of three parts: the feature extraction network (backbone), the feature fusion network (neck), and the prediction network (head). The CA attention mechanism is embedded in the 6th, 9th, and 12th layers of the feature extraction network, allowing the model to focus on and highlight defect-prone areas in feature maps of different scales, thereby improving defect detection accuracy. Additionally, the BoT3 global feature extraction module is added after the SPPF module to increase the connections between defect-prone areas and other target areas in the global features of the high-dimensional feature layers. This approach helps to uncover the correlations between apparent defects in the linings and other targets, obtaining more feature information. In actual metro tunnel scenarios, the target sizes of auxiliary facilities (e.g., cables, pipes, and fixtures) are relatively large, while the sizes and positions of bolt holes and grouting holes are relatively fixed and moderate in scale. Among the categories of apparent defects, leakage areas are relatively large, but exfoliation and cracks vary in size. In particular, for small-scale fine cracks, the detection effectiveness is limited, indicating that the detection images contain complex targets of different scales. The original YOLOv8 model includes three output detection layers. When the input detection image size is 640 × 640, the corresponding detection layer feature map sizes are 80 × 80, 40 × 40, and 20 × 20. Given the small size of fine defects, detecting small target defects requires higher resolution and more accurate positioning information, with a greater emphasis on the ability to express image details. In model calculations, shallow feature maps have relatively high resolution and contain clearer defect locations and structural information. Therefore, fully integrating and utilizing shallow, high-resolution feature maps can improve the detection of small target defects.
The loss function calculation of the model includes class loss and position loss. Class loss measures the difference between the predicted class of the bounding box and the true class, calculated using binary cross entropy (BCE). Position loss measures the distance between the predicted box and the true box, which consists of bounding box regression loss (bbox) and confidence loss (dfl). In this experiment, bounding box regression loss uses CIoU loss, while confidence loss uses distribution focal loss (DFL). The final loss function calculation is the weighted sum of these three components, as follows:
$$\mathrm{Total\ Loss} = \alpha \, Loss(cls) + \beta \, Loss(bbox) + \gamma \, Loss(dfl) \tag{7}$$
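As an illustration of the weighted three-part objective in Equation (7), the sketch below combines a BCE classification loss, a CIoU-based box loss, and a distribution focal loss. The weights α, β, γ and the tensor layouts are illustrative assumptions, not the exact settings used in this study.

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_cls, true_cls, pred_dist, true_ltrb, ciou,
                   alpha=0.5, beta=7.5, gamma=1.5):
    # pred_cls: (N, num_classes) logits; true_cls: (N, num_classes) 0/1 floats.
    # pred_dist: (N, 4, reg_max + 1) logits over discretized edge distances.
    # true_ltrb: (N, 4) continuous target distances in [0, reg_max].
    # ciou: (N,) CIoU between matched predicted and ground-truth boxes.
    reg_max = pred_dist.shape[-1] - 1
    loss_cls = F.binary_cross_entropy_with_logits(pred_cls, true_cls)
    loss_bbox = (1.0 - ciou).mean()                      # CIoU regression loss
    # Distribution focal loss: spread each continuous target over the two
    # integer bins that bracket it, weighted by proximity.
    tl = true_ltrb.floor().long().clamp(0, reg_max - 1)  # left bin
    tr = tl + 1                                          # right bin
    wl = tr.float() - true_ltrb                          # weight of left bin
    wr = 1.0 - wl
    logp = F.log_softmax(pred_dist, dim=-1)
    loss_dfl = -(wl * logp.gather(-1, tl.unsqueeze(-1)).squeeze(-1)
                 + wr * logp.gather(-1, tr.unsqueeze(-1)).squeeze(-1)).mean()
    return alpha * loss_cls + beta * loss_bbox + gamma * loss_dfl
```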

3.2. Subway Tunnel Lining Fine-Crack Enhancement Model Based on Real-ESRGAN and Transfer Learning

The metro tunnel lining surface contains numerous facilities, such as cables, pipes, fixtures, bolt holes, and grouting holes, whose textures and gray levels exhibit varying degrees of similarity to apparent defects. Directly extracting the geometric parameters of defects on a global scale is therefore subject to significant noise interference. The improved detection algorithm for apparent defects in metro tunnel linings provides the relative coordinates and types of defect areas. When these areas are cropped from the original image, sub-images with much simpler backgrounds are obtained. Matrix calculations can then be applied directly to these sub-images, overcoming the impact of complex facilities on defect feature extraction. However, compared with exfoliation and leakage, the detailed information in small-scale crack defect areas is relatively unclear. Although the background interference is low, the crack structure itself is not well defined, leading to some degree of error in direct geometric region extraction and calculation. This issue is addressed in this research by using image super-resolution and transfer learning methods to enhance small-crack areas, providing technical support for subsequent fine segmentation and extraction work.
Image super-resolution reconstruction involves extrapolating the spectrum beyond the cutoff frequency of an image, essentially restoring a low-resolution image to a high-resolution image [37,38]. The algorithms for this process are based on techniques such as interpolation [39], reconstruction [40], and deep learning [41]. Deep learning-based image super-resolution reconstruction methods use convolutional neural networks to learn the nonlinear mapping relationship between low-resolution and high-resolution images from a large number of high-quality image pairs. The learned mapping function is then applied to the input low-resolution image to obtain the corresponding high-resolution image. Compared with the other two methods, deep learning approaches offer better image detail restoration capabilities and adaptability [42,43].
The crack regions in metro tunnel linings have many texture structures and detailed information. Therefore, Real-ESRGAN [44], which has strong texture detail feature extraction and restoration capabilities, is used in this study for image super-resolution operations. The network structure is shown in Figure 8. The generator in Real-ESRGAN adopts the same residual-in-residual dense blocks (RRDB) [45] structure as ESRGAN [46], which can better enhance the texture details and realism of the images. In the discriminator network, the U-Net [47] network structure replaces the VGG-style [48] structure. The spatial size of the feature maps output by U-Net is the same as that of the input images, and each pixel is compared with the real value to compute the difference and backpropagate the error, achieving more precise discrimination of local textures.
In the calculation of the loss function, the total loss of the model is the sum of the L1 loss, perception loss, and GAN loss. The L1 loss is the mean absolute difference between the predicted and target pixel values, as shown in Equation (8):
$$L_1 = \frac{1}{n} \sum_{i=1}^{n} \left| f_i - y_i \right| \tag{8}$$
where fi represents the pixel value of the image generated by the generator, yi represents the pixel value of the real image, and n is the total number of pixels.
The perception loss is computed by feeding the SR image generated by the generator and the actual HR image into a pre-trained VGG19 network for feature extraction. The mean square error is then computed on the extracted feature maps, as shown in Equation (9):
$$L_{perception} = \frac{1}{W_{i,j} H_{i,j}} \sum_{w=1}^{W_{i,j}} \sum_{h=1}^{H_{i,j}} \left( \varphi_{i,j}\left(I^{HR}\right)_{w,h} - \varphi_{i,j}\left(G_{\theta}\left(I^{LR}\right)\right)_{w,h} \right)^2 \tag{9}$$
where θ is a network parameter, $I^{HR}$ is the actual HR image, $G_{\theta}(I^{LR})$ is the reconstructed SR image, $W_{i,j}$ and $H_{i,j}$ are the dimensions of the feature map, and $\varphi_{i,j}$ is the feature map of the j-th convolution before the i-th max-pooling layer.
The GAN loss feeds the output of the generator into the discriminator (U-Net) and computes the binary cross-entropy loss (BCELoss) on the result, as shown in Equation (10):
$$L_{GAN} = -y_i^r \log D(x_i) - \left(1 - y_i^r\right) \log\left(1 - D(x_i)\right) \tag{10}$$
where $y_i^r$ is the true label and $D(x_i)$ is the discriminator output.
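A compact sketch of the combined generator objective, summing Equations (8)–(10), is given below. It assumes a `generator` output and a U-Net `discriminator` producing raw logits, uses torchvision’s VGG19 features through conv5_4 for the perceptual term, and uses illustrative loss weights.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen VGG19 feature extractor for the perception loss (through conv5_4).
vgg = vgg19(weights="IMAGENET1K_V1").features[:35].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def generator_loss(sr, hr, discriminator, w_l1=1.0, w_percep=1.0, w_gan=0.1):
    # sr, hr: generated and real images as (N, 3, H, W) tensors; grayscale
    # crack images would be repeated to three channels before this call.
    loss_l1 = F.l1_loss(sr, hr)                                  # Eq. (8)
    loss_percep = F.mse_loss(vgg(sr), vgg(hr))                   # Eq. (9)
    d_out = discriminator(sr)                                    # U-Net logits
    loss_gan = F.binary_cross_entropy_with_logits(               # Eq. (10)
        d_out, torch.ones_like(d_out))
    return w_l1 * loss_l1 + w_percep * loss_percep + w_gan * loss_gan
```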
The detected crack areas have relatively blurred details and indistinct morphological features. The general Real-ESRGAN model is therefore combined with transfer learning to perform image super-resolution on the crack area images output by the defect detection model. This enhances the crack structures and details, providing technical support for subsequent defect extraction, analysis, and related tasks.

4. Data Preparation

4.1. Tunnel Lining Appearance Data Collection

A metro tunnel inspection vehicle independently developed by the research team was used to collect images of metro tunnel linings with apparent defects. This equipment consists of a FARO 3D laser scanner (Figure 9a, Artec 3D, Luxembourg), an MDC mobile detection rail car (MDC Aluminium Foot Rest Railway Components, Phagwara, Punjab) (with dimensions of 1620 mm × 565 mm × 540 mm, suitable for a track gauge range of 1425–1445 mm, weighing approximately 30 kg, and cruising at 2 km/h in this experiment), a speed sensor, an industrial computer, a power supply, and other components. During operation, the rotating lens built into the laser scanner emits laser pulses at the target object while rotating at high speed. The laser receiver captures the reflected information, including the spatial coordinates of the point cloud and RGB information. The apparent orthophoto grayscale map of the metro tunnel lining is then obtained via operations such as point cloud projection, interpolation calculations, and grayscale conversion. Additionally, the pulse information sent by the speed sensor is converted into speed and mileage information by the mileage recording module and synchronously transmitted to the industrial computer to associate the image data with positional information.
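A simplified sketch of the cylindrical unrolling that produces such an orthophoto grayscale map is given below. It assumes the tunnel axis is aligned with the y (mileage) axis, uses the 2.7 m radius of the surveyed tunnels, and omits the interpolation and filtering applied in practice; the pixel density is an illustrative value.

```python
import numpy as np

def unroll_to_grayscale(points, intensity, centre, radius=2.7, px_per_m=200):
    # points: (N, 3) array of x, y (mileage), z coordinates.
    # intensity: (N,) reflectance values in [0, 1].
    # centre: (cx, cz) of the circular tunnel cross-section.
    cx, cz = centre
    theta = np.arctan2(points[:, 2] - cz, points[:, 0] - cx)  # angle on ring
    u = (points[:, 1] - points[:, 1].min()) * px_per_m        # along mileage
    v = (theta + np.pi) * radius * px_per_m                   # unrolled arc
    w = int(u.max()) + 1
    h = int(v.max()) + 1
    img = np.zeros((h, w), dtype=np.float32)
    cnt = np.zeros((h, w), dtype=np.float32)
    # Accumulate intensities per pixel, then average.
    np.add.at(img, (v.astype(int), u.astype(int)), intensity)
    np.add.at(cnt, (v.astype(int), u.astype(int)), 1.0)
    img /= np.maximum(cnt, 1.0)
    return (img * 255).astype(np.uint8)
```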
The data depicting apparent defects were collected by the research team from two sections of metro tunnels in Shenzhen, Guangdong Province, China. Both sections were constructed using the shield tunneling method, with the tunnel structures designed as single-track circular tunnels, each with a diameter of 5.4 m. The tunnel linings were made of reinforced concrete segments, and the tunnels were buried at depths ranging from 25 to 35 m. The surrounding geological environment includes residual cohesive soil and fully weathered tuffaceous sandstone, with overall poor stability of the surrounding rock. The strata in the tunnel areas are highly permeable with good water-bearing properties. The field experiment is shown in Figure 9b. The total length of the collected data segment is 2.4 km, and apparent orthophoto grayscale images (Figure 9c) of the metro tunnel linings were obtained. These images were used as raw data for subsequent defect detection and extraction.

4.2. The Subway Tunnel Lining Apparent Defects Dataset

Apparent defects in metro tunnel linings often occur around the joints and circumferential seams of the lining segments. These areas mostly include waterproof materials, which are less rigid than the lining material and more prone to deformation under external forces, resulting in cracks, exfoliation, and leakage. Additionally, under external forces, the areas around bolt holes and grouting holes are prone to defects such as water seepage, mud leakage, and slurry leakage. Thus, the occurrence of apparent defects in metro tunnel linings has certain spatial location characteristics. To preserve and enhance this feature, the apparent orthophoto grayscale maps of the linings were cut along the joints and circumferential seams of the lining segments, and the dataset was then created from the preserved single segments. During the dataset creation process, LabelImg was used to annotate relevant facilities and defects on the lining surface, including apparent auxiliary facilities (e.g., cables, pipes, and fixtures), bolt holes and grouting holes, joints and circumferential seams, cracks, exfoliation, and leakage, as detection targets. Compared with the previous work of the research team, the present study focused more on the detailed annotation of minor apparent defects, increasing the proportion of small target defects. Horizontal, vertical, and diagonal mirroring were used for data augmentation to create the experimental dataset while maximizing data utilization. The number of instances for each target category is shown in Table 2. During the experiments, the dataset was divided into training, validation, and testing sets at a ratio of 7:1:2.
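The mirroring augmentation and the 7:1:2 split can be sketched as follows with OpenCV; the YOLO-style normalized label format and the helper names are assumptions for illustration.

```python
import cv2
import random

def mirror_augment(image, labels):
    # labels: list of normalized YOLO boxes (cls, cx, cy, w, h).
    # cv2.flip codes: 1 = horizontal, 0 = vertical, -1 = both (diagonal).
    out = []
    for flip in (1, 0, -1):
        img = cv2.flip(image, flip)
        boxes = []
        for cls, cx, cy, w, h in labels:
            if flip in (1, -1):
                cx = 1.0 - cx   # mirror across the vertical axis
            if flip in (0, -1):
                cy = 1.0 - cy   # mirror across the horizontal axis
            boxes.append((cls, cx, cy, w, h))
        out.append((img, boxes))
    return out

def split_dataset(items, seed=0):
    # 7:1:2 train/validation/test split over the augmented image list.
    random.Random(seed).shuffle(items)
    n = len(items)
    return (items[: int(0.7 * n)],
            items[int(0.7 * n): int(0.8 * n)],
            items[int(0.8 * n):])
```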

4.3. Super-Resolution Enhancement Dataset for Lining Surface Cracks

In the study of super-resolution enhancement for the detected crack image regions, the open-source image super-resolution dataset DIV2K [49] was first used to pretrain the Real-ESRGAN model. This dataset contains 800 high-resolution images from various scenes and subjects, with resolutions up to 2048 × 1080. The dataset also provides corresponding low-resolution images. When the pretraining method is used, the performances of the generator and discriminator models can be improved, enabling them to generate high-quality, high-resolution images from low-resolution images. The pretrained model is then used for transfer learning to further enhance the super-resolution restoration effect of the crack image regions.
Since the apparent surface of metro tunnel linings is composed of concrete, an open-source concrete-crack dataset was used to train the super-resolution model. The dataset used is Aft_Original_Crack_DataSet_Second [50], which contains 1625 high-resolution images of cracks on concrete backgrounds. After the image dataset was converted to grayscale, second-order degradation operations (e.g., blurring, downsampling, noise, and JPEG compression) were applied to obtain an equal number of low-resolution images (Figure 10). These image pairs were used together as the model training dataset to learn the texture features of cracks. The crack morphology and detailed features were thereby transferred to the image super-resolution model for detailed restoration of the crack areas.
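The degradation used to synthesize the low-resolution crack images can be sketched as follows. Real-ESRGAN randomizes the blur kernels, noise levels, and JPEG quality of each pass; these are fixed here at illustrative values.

```python
import cv2
import numpy as np

def degrade(hr_gray, scale=2, jpeg_q=40, seed=None):
    # One degradation pass: blur -> downsample -> Gaussian noise -> JPEG.
    rng = np.random.default_rng(seed)
    img = cv2.GaussianBlur(hr_gray, (7, 7), sigmaX=1.5)
    h, w = img.shape[:2]
    img = cv2.resize(img, (w // scale, h // scale),
                     interpolation=cv2.INTER_AREA)
    noise = rng.normal(0, 5, img.shape)
    img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, jpeg_q])
    return cv2.imdecode(buf, cv2.IMREAD_GRAYSCALE)

def second_order_degradation(hr_gray):
    # Two passes, each downsampling by 2, for a total 4x reduction.
    return degrade(degrade(hr_gray))
```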

4.4. Experimental Environment

Data processing, code debugging, and model testing for this experiment were conducted on a computer with the Windows 11 operating system, an Intel Core i9-9900 CPU (Intel, Santa Clara, CA, USA), and an RTX 3060 GPU (NVIDIA, Santa Clara, CA, USA). The software was written using the Python 3.8 programming language. Data augmentation was performed using dependencies such as OpenCV 4.6.0 and NumPy 1.23.0, and the network model was built using the PyTorch 1.11 deep learning framework. During the model training phase, experiments were conducted by connecting to a remote server. The operating system of the remote server was Ubuntu 20.04, with an NVIDIA V100 GPU and 32 GB of memory. The relevant GPU dependencies were CUDA 11.3 and cuDNN 8.9.5. The other software dependencies used were consistent with those used on the original experimental computer.

5. Experiments and Results

5.1. Training Results of the Subway Tunnel Lining Apparent-Defect Detection Model

After setting up the experimental environment, the optimized model was trained and validated. The hyperparameters for the network model training phase were set as follows: stochastic gradient descent was used as the optimizer with an initial learning rate of 0.01, momentum of 0.937, a decay coefficient of 0.0005, a warmup period of 3 epochs, a batch size of 16, and a data loader worker count of 8. The training ran for 850 epochs. The training and validation losses are shown in Figure 11a. During the first 50 epochs of model training, both the training loss and validation loss decreased rapidly, with the validation loss showing larger fluctuations. Subsequently, the rate of loss reduction slowed, and after 600 epochs, the training and validation losses stabilized, with the validation loss lower than the training loss, indicating that the model did not overfit. Additionally, the mean average precision (mAP) values were recorded during each validation pass (Figure 11b). The model’s validation accuracy increased rapidly during the first 50 epochs; the rate of increase then slowed, maintaining low growth after 300 epochs and gradually converging after 600 epochs. The final accuracy on the validation set stabilized at a mAP@50 of 0.879 and a mAP@50–95 of 0.652.
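With the Ultralytics training interface, the stated hyperparameters correspond to a call of the following form; the model and dataset YAML file names are placeholders, not the files used in this study.

```python
from ultralytics import YOLO

model = YOLO("yolov8s-ca-bot3-p2.yaml")   # hypothetical improved-model config
model.train(
    data="tunnel_defects.yaml",           # hypothetical dataset config
    epochs=850,
    batch=16,
    workers=8,
    optimizer="SGD",
    lr0=0.01,                             # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,
    warmup_epochs=3,
)
```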

5.2. Ablation Experiments on Detection Models of Apparent Defects in Subway Tunnel Linings and Related Facilities

Ablation experiments were conducted to validate the effectiveness of the improved model and to investigate the impact of each introduced module on the model’s detection performance. The experiments were divided into eight groups trained under the same experimental environment, training parameters, and dataset. Using YOLOv8s as the base model, CA mechanisms, BoT3 feature enhancement, and high-resolution detection layers were introduced at the corresponding positions in the model, and their combinations were tested to obtain experimental results.
The accuracy changes during model validation for each experimental group are shown in Figure 12. In the initial stage of model validation, as no pretrained model was used, the initial inference produced few predicted boxes, resulting in a low recall value. However, the precision was high, indicating that a large proportion of the few predicted boxes were correct. As the number of epochs increased, the number of predicted boxes gradually increased, recall gradually increased, and precision decreased, indicating a lower proportion of correct predictions. Subsequently, both values gradually increased and eventually converged. The final improved model achieved a recall value of 0.831, the highest among the experimental groups, and a precision value of 0.915, the third highest among the experimental groups, representing a 1.9% improvement over the original model. Using mAP values to evaluate model accuracy, without any improvement strategies the model achieved a mAP@50 of 0.831 and a mAP@50–95 of 0.604 for the detection of tunnel lining facilities and defects. During the first 100 epochs of training, its validation accuracy was consistently greater than that of the other experimental groups. As the number of epochs increased, the model parameters further improved, and the validation accuracy of the optimized experimental groups gradually exceeded that of the original YOLOv8s model, with the applied optimization strategies showing progressive accuracy improvements. The final optimized model achieved a mAP@50 of 0.879 and a mAP@50–95 of 0.652 for all categories during validation, representing a 4.8% improvement over the original model. This indicates that the optimized model can better detect tunnel lining facilities and defects during validation.
The performance of each experimental group on the test set is shown in Table 3. The test results for apparent defects include mAP@50 values for cracks, leakage, exfoliation, and all other categories to evaluate model accuracy. Finally, weight parameter heatmaps were generated via model weight visualization for auxiliary analysis and validation to enhance the interpretability of the model optimization process.
As shown in Figure 13, combining the results of the ablation experiments with the weight heatmap analysis, it can be concluded that adding high-resolution detection layers significantly improves defect detection accuracy. The weight distribution results show that it can better utilize shallow texture features to enhance the focus on small target defects and more precisely fit the defect morphology, but the results also show some sensitivity to other detection targets and noise. Applying CA can improve the detection of defects and facilities in key areas by increasing the weight distribution for defect regions at the current feature layer scale, but it does not significantly enhance small target defect detection. Incorporating BoT3 enhances the weight distribution for medium- and large-scale defects, facility targets, and their background areas and is more sensitive to defect areas adjacent to tunnel surface facilities and other targets. Among the pairwise combinations of experimental groups, adding both high-resolution detection layers and CA simultaneously improves the parameter weights for small target defect areas and morphologies and frequently occurring defect areas (edges), providing better detection performance than other pairwise combinations. Ultimately, applying all three optimization strategies achieves finer granularity in the parameter weight distribution, better fits small target defect morphologies, and balances greater attention to edge regions where defects frequently occur and adjacent areas of tunnel surface facilities. This aligns more closely with the spatial characteristics of apparent defects in tunnels. Compared with the YOLOv8s model, the optimized model achieved a mAP of 0.838 for the detection of three types of apparent defects, an improvement of 7%, and a mAP of 0.87 for all target categories in metro tunnels, an improvement of 4.3%. Thus, the optimized model is better suited for metro tunnel detection scenarios and is more sensitive to apparent defects in metro tunnels, validating the effectiveness of the optimization strategies.

5.3. Comparison of Test Results with the Original Model

The effectiveness of detecting tunnel lining defects and other targets before and after optimization can be assessed more intuitively by comparing the performance of the optimized and original models on the test dataset. Parameter-weight heatmaps at the detection layers were likewise visualized before and after optimization, including the small-target detection layer, to compare the models' focus on small target defects; regions with marked improvement are labeled in the heatmaps. As shown in Figure 14a,b, the optimized model focuses on the small cracks around the ring joints in the small-target detection layer and outputs the corresponding predictions. The optimized model achieves higher detection accuracy and confidence for small- and medium-scale cracks at ring-joint boundaries and near the apparent appurtenances of the lining. It pays more attention to these boundary areas and assigns higher parameter weights to the defect areas, yielding better detection results (Figure 14c). For leakage, which occurs around bolt holes, grouting holes, and joints (Figure 14d), the optimized model assigns more weight to leakage at bolt holes, which improves the detection rate, and it achieves higher detection confidence and better prediction-box fitting for large-scale defects (Figure 14e).

5.4. Comparison with Other Target Detection Models

In related studies on deep learning-based detection of apparent defects and appurtenances in subway tunnel linings, models are commonly divided by type. Accordingly, the mainstream two-stage object detection models Faster RCNN [23] and Cascade RCNN [24] and the single-stage detection models SSD [27], YOLOv3, and YOLOv5 were selected for comparison. Furthermore, in earlier work, the research team optimized a YOLOv5-based model by incorporating CA and a BiFPN feature fusion network [51]. Comparison experiments were therefore added that apply the same optimization strategy to the YOLOv5s and YOLOv8s base frameworks on the dataset of this study; these models are denoted YOLOv5s-Improve and YOLOv8-Improve and are compared with the algorithm proposed in this study. Comprehensive performance is judged by the mAP, the size of the model weight file, and the detection time per image (ms).
The experimental results are shown in Table 4. On this dataset, the two-stage detection models Faster RCNN and Cascade RCNN are more accurate than the single-stage SSD and YOLOv3 models, but their weight files require significantly more memory and their detection is inefficient. The detection speed and memory requirements of the YOLOv5 model are better than those of the other models, although its detection accuracy for apparent defects and other targets is slightly lower than that of YOLOv8. Compared with the optimization strategies of similar studies, the detection model proposed in this study achieves higher accuracy for apparent defects, with a mAP greater than that of both the improved YOLOv5 and YOLOv8 variants. Although the imposed optimization strategy adds computation, making the detection speed and memory occupation slightly worse than those of the smaller YOLOv5 and YOLOv8 base frameworks, the difference is minimal, and the model still supports real-time detection and lightweight deployment. Overall, the proposed model offers better comprehensive performance and is more suitable for detecting apparent defects in subway tunnel scenarios.

5.5. Experimental Results of the Super-Resolution Enhancement of Apparent Cracks in Subway Tunnel Linings

During the model pretraining phase, the DIV2K dataset was used to train Real-ESRGAN, with an RRDB generator containing 23 blocks, a scale factor of 4, an initial learning rate of 0.0001, and the Adam optimizer, trained for 100K iterations. The generator training loss of the pretrained model (Figure 15a) converges, indicating a good ability to generate high-resolution images, so the model can serve as a pretrained model for transfer learning. Subsequently, the generator and discriminator were trained on the created concrete-crack image super-resolution dataset by combining the L1 loss, perceptual loss, and adversarial loss. The generator remained a 23-block RRDB network, and the discriminator was a U-Net. During training, the batch size was 16, the initial learning rates for the generator and discriminator were both 0.0001, Adam was the optimizer, and training ran for 150K iterations. The training loss curves of the generator and discriminator are shown in Figure 15b,c. The two networks reached a balance during continuous adversarial training, effectively performing image super-resolution reconstruction, and the losses of both fluctuated within a reasonable range, indicating good training results.
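As a minimal sketch of the fine-tuning objective described above, the generator update can be written as a weighted sum of the three losses. The module handles (`generator`, `discriminator`, `vgg_features`) stand in for the RRDB generator, the U-Net discriminator, and a pretrained VGG feature extractor, and the loss weights shown are illustrative assumptions, not the values used in this study.

```python
# Sketch of the combined generator loss: L1 + perceptual + adversarial.
import torch
import torch.nn.functional as F

def generator_loss(generator, discriminator, vgg_features, lr_img, hr_img,
                   w_l1=1.0, w_percep=1.0, w_gan=0.1):
    sr = generator(lr_img)                         # 4x super-resolved output
    loss_l1 = F.l1_loss(sr, hr_img)                # pixel-wise fidelity
    loss_percep = F.l1_loss(vgg_features(sr),      # distance in VGG feature space
                            vgg_features(hr_img))
    logits = discriminator(sr)                     # U-Net discriminator logits
    loss_gan = F.binary_cross_entropy_with_logits( # "real" target for generator
        logits, torch.ones_like(logits))
    return w_l1 * loss_l1 + w_percep * loss_percep + w_gan * loss_gan
```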
When the proposed metro tunnel defect detection model was applied, crack images could be cropped from the detected fine-crack areas according to their relative coordinates (Figure 16). These images were used as test data, and the trained Real-ESRGAN model performed super-resolution restoration on the acquired low-resolution crack images.
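This cropping step can be expressed compactly with a detection API such as the Ultralytics YOLO interface, as sketched below; the weight path, image path, and crack class index are hypothetical placeholders that would follow from the trained model and its label map.

```python
# Sketch: crop detected crack regions for subsequent super-resolution.
import cv2
from ultralytics import YOLO

CRACK_CLASS = 3                                  # hypothetical "crack" class id

detector = YOLO("tunnel_defects.pt")             # assumed fine-tuned weights
image = cv2.imread("lining_scan.png")            # assumed lining image path

crack_patches = []
for box in detector(image)[0].boxes:
    if int(box.cls) == CRACK_CLASS:
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        crack_patches.append(image[y1:y2, x1:x2])  # low-resolution crack patch

# Each patch would then be restored with the fine-tuned Real-ESRGAN model,
# e.g. sr_patch, _ = upsampler.enhance(patch, outscale=4), where `upsampler`
# is a RealESRGANer helper from the `realesrgan` package.
```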
Additionally, traditional bicubic interpolation and earlier deep learning-based super-resolution models, SRGAN and ESRGAN, were included in the experiments for comparison. These baselines used the official pretrained models and parameters trained on the DIV2K dataset. The comparative test results are shown in Figure 17.
From a subjective visual perspective, traditional interpolation has significant limitations in enhancing detail and removing blur from crack images. In contrast, the deep learning methods enhance image clarity to varying degrees by learning the morphology and texture features of concrete cracks. SRGAN provides some texture enhancement and detail restoration in the main crack areas but is limited in the branch areas. ESRGAN restores both the main and branch structures of cracks better but still shows noticeable artifacts in the background and fails to recover some fine details. Real-ESRGAN delivers strong detail enhancement in the crack areas and a better representation of the concrete background, achieving good results in both local detail enhancement and artifact suppression.
Given that the experimental data did not include high-resolution crack-region images as ground-truth labels, two no-reference image quality metrics were used for objective quantitative evaluation: the perception-based image quality evaluator (PIQE) [52] and the entropy-based no-reference image quality assessment (ENIQA) [53]. PIQE calculates a no-reference quality score via block-wise distortion estimation. ENIQA extracts mutual information and two-dimensional entropy from the spatial and frequency domains of an image and uses a support vector classifier and support vector regression to predict the quality score. For both metrics, lower values indicate higher image quality. The comparative experimental results are shown in Table 5. Compared with the other models, the crack images generated by Real-ESRGAN have the highest quality, with more detail and background information. The PIQE and ENIQA values for the generated crack-region images are 32.7 and 0.092, respectively, which are 0.1 and 0.017 lower than those of ESRGAN, indicating better reconstruction. These results can provide technical support for analyzing the formation mechanisms of apparent cracks in metro tunnel linings and for maintenance and repair work.
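For intuition about the entropy side of this evaluation, the sketch below computes the Shannon entropy of the grayscale histogram of each reconstruction; this is only a crude proxy for the full PIQE and ENIQA evaluators used in Table 5, and the file names are hypothetical.

```python
# Crude no-reference proxy: Shannon entropy of the grayscale histogram.
import cv2
import numpy as np

def grayscale_entropy(path):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                  # histogram -> probability distribution
    p = p[p > 0]                           # drop empty bins (log undefined)
    return float(-(p * np.log2(p)).sum())  # entropy in bits per pixel

for name in ["bicubic.png", "srgan.png", "esrgan.png", "real_esrgan.png"]:
    print(name, round(grayscale_entropy(name), 3))
```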
A comparative experiment on crack segmentation and extraction was conducted to demonstrate intuitively the effectiveness and necessity of crack image super-resolution. A publicly available crack segmentation model [54] was selected as an objective evaluation model; it is based on the U-Net framework with a VGG16 backbone, and its training data combine 12 crack segmentation datasets containing a total of 11,200 crack images. Using the provided pretrained model, the original crack images and the super-resolution images were used as input to obtain crack segmentation probability maps and binary maps (Figure 18). The results show that the image super-resolution algorithm increased the resolution of the crack areas by a factor of four, which is highly significant for improving the accuracy of crack geometric parameter extraction. In addition, the same segmentation model achieved better results on the super-resolution crack images: the enhanced overall shape and detail of the cracks yielded more accurate and complete extraction. These findings verify the importance of the image super-resolution model used in this study for crack segmentation and extraction tasks.
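Once a binary crack map is available, geometric parameters follow from standard morphological operations. The sketch below, which assumes scikit-image and SciPy, estimates crack length from the skeleton and local width from the distance transform; the pixel-to-millimetre scaling set by the acquisition geometry is omitted.

```python
# Sketch: crack length and width estimates from a binary segmentation map.
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def crack_geometry(binary_mask):
    """binary_mask: 2D bool array, True on crack pixels."""
    skeleton = skeletonize(binary_mask)          # one-pixel-wide centerline
    dist = distance_transform_edt(binary_mask)   # distance to background
    widths = 2.0 * dist[skeleton]                # local diameter along centerline
    return {
        "length_px": int(skeleton.sum()),        # rough crack length in pixels
        "mean_width_px": float(widths.mean()) if widths.size else 0.0,
        "max_width_px": float(widths.max()) if widths.size else 0.0,
    }
```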

6. Discussion

6.1. Discussion on Engineering Practice Applications in Metro Tunnel Maintenance

The YOLOv8-based detection model for apparent defects in metro tunnel linings proposed in this study holds significant practical value for metro tunnel engineering. Its high defect-detection accuracy, coupled with its overall performance in detection speed and model size, makes it particularly well-suited for metro tunnel scenarios. The enhanced detection accuracy, particularly in identifying small defects such as fine cracks and minor exfoliation, directly improves the safety and reliability of metro tunnel operations. By detecting defects at an earlier stage, this model enables timely maintenance interventions, thereby reducing the risk of catastrophic failures, such as tunnel collapse or severe water leakage, which could have devastating consequences. Early detection not only enhances the safety of tunnel operations but also prolongs the lifespan of infrastructure by preventing minor issues from escalating into major structural problems. Moreover, the application of super-resolution technology significantly improves the clarity and detail of crack images, which is crucial for accurate crack segmentation and geometric feature extraction. By enhancing image quality, super-resolution not only increases the accuracy of subsequent analyses but also provides more reliable data support for high-precision engineering assessments. This is of great importance for decision-making in tunnel maintenance, enabling more effective guidance for actual repairs and preventive maintenance work.
However, despite the model’s excellent performance in controlled experimental environments, its application in more diverse and complex real-world tunnel environments may reveal certain limitations. For instance, the model’s effectiveness could be challenged by varying lighting conditions, different tunnel materials, or more complex defect patterns that may not have been fully represented in the training dataset. Additionally, the processing time and computational resource demands of super-resolution technology could pose a bottleneck in practical deployment, particularly on resource-constrained mobile devices.

6.2. Future Work

In future work, the research team will focus on improving the deployment and generalization capabilities of the metro tunnel lining defect detection model on mobile devices. Building on this current research, we will continue to focus on segmentation and extraction algorithms for minor apparent defects in metro tunnel linings. By enhancing image detail information, we aim to improve feature extraction accuracy and reduce errors by developing an integrated algorithm for the efficient detection of apparent defects and the high-precision extraction of geometric features. This scheme can provide technical support for the refined and intelligent operation and maintenance of metro tunnels.

7. Conclusions

To address the challenges of insufficient utilization of spatial location characteristics in detecting apparent defects in metro tunnel linings, limited detection accuracy (especially for small-scale defects), and poor detail clarity and geometric-feature extraction accuracy for fine defects, this study proposes a comprehensive deep learning-based algorithm for detecting apparent defects in subway tunnel linings. The algorithm consists of two main components: (1) A detection model for apparent defects in metro tunnel linings. A YOLOv8-based framework was employed, incorporating the coordinate attention (CA) module, Bottleneck Transformer 3 (BoT3), and a high-resolution detection layer. These enhancements significantly improve the model's focus on defect-prone areas in metro tunnel linings and enhance detection accuracy, particularly for small-scale defects. The experimental results demonstrate that the improved detection model achieves a mean average precision (mAP) of 87% in identifying cracks, exfoliation, water leakage, and other apparent facilities, representing a 4.3% increase over the original YOLOv8 model. Moreover, its overall performance is better suited to metro tunnel detection scenarios than that of the other object detection models compared. (2) A fine-crack enhancement model for subway tunnel linings. In this component, a dataset of low-quality concrete cracks was constructed from an open-access dataset, and the Real-ESRGAN model, combined with transfer learning, was applied to enhance the resolution and clarity of fine-crack images, addressing the limited detail and unclear texture of detected crack regions. The super-resolution technique increased the spatial resolution of the crack images by a factor of four, with PIQE and ENIQA values reaching 32.7 and 0.092, respectively, indicating a significant improvement in image quality. This enhancement provides robust data support for precise crack segmentation and geometric-feature extraction tasks.
The above results indicate that the proposed algorithm can provide technical support for metro tunnel defect detection and maintenance tasks. Its robust performance in detection accuracy and image enhancement makes it particularly well suited for application in the metro tunnel environment. The proposed algorithm could contribute to the advancement of intelligent metro tunnel maintenance systems.

Author Contributions

Conceptualization, A.Z. and J.Z.; methodology, A.Z., J.Z. and Y.C.; software, A.Z., S.Q. and D.W.; validation, S.Q., D.W. and Y.C.; formal analysis, J.Z.; investigation, A.Z. and S.Q.; resources, A.Z. and D.W.; data curation, A.Z. and S.Q.; writing—original draft preparation, A.Z. and Y.C.; writing—review and editing, A.Z., J.Z. and Y.C.; visualization, A.Z. and D.W.; supervision, J.Z.; project administration and funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shenzhen Science and Technology Innovation Commission (grant Nos. 20220810173255002 and JCYJ20230808105103006).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Feng, S.J.; Feng, Y.; Zhang, X.L.; Chen, Y.H. Deep learning with visual explanations for leakage defect segmentation of metro shield tunnel. Tunn. Undergr. Space Technol. 2023, 136, 105107.
2. Tan, K.; Cheng, X.J.; Ju, Q.Q.; Wu, S.B. Correction of Mobile TLS Intensity Data for Water Leakage Spots Detection in Metro Tunnels. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1711–1715.
3. Wu, H.B.; Ao, X.R.; Chen, Z.; Liu, C.; Xu, Z.R.; Yu, P.F. Concrete Spalling Detection for Metro Tunnel from Point Cloud Based on Roughness Descriptor. J. Sens. 2019, 2019, 1–12.
4. Yu, A.B.; Mei, W.S.; Han, M.L. Deep learning based method of longitudinal dislocation detection for metro shield tunnel segment. Tunn. Undergr. Space Technol. 2021, 113, 103949.
5. Huang, Z.; Fu, H.L.; Fan, X.D.; Meng, J.H.; Chen, W.; Zheng, X.J.; Wang, F.; Zhang, J.B. Rapid Surface Damage Detection Equipment for Subway Tunnels Based on Machine Vision System. J. Infrastruct. Syst. 2021, 27, 04020047.
6. Liu, X.F.; Hong, Z.L.; Shi, W.; Guo, X.D. Image-Processing-Based Subway Tunnel Crack Detection System. Sensors 2023, 23, 6070.
7. Liu, X.R.; Zhu, L.Q.; Wang, Y.D.; Yu, Z.J. A crack detection system of subway tunnel based on image processing. Meas. Control 2022, 55, 164–177.
8. Wang, A.; Togo, R.; Ogawa, T.; Haseyama, M. Defect Detection of Subway Tunnels Using Advanced U-Net Network. Sensors 2022, 22, 2330.
9. Attard, L.; Debono, C.J.; Valentino, G.; Di Castro, M. Tunnel inspection using photogrammetric techniques and image processing: A review. ISPRS J. Photogramm. Remote Sens. 2018, 144, 180–188.
10. Murakami, T.; Saito, N.; Komachi, Y.; Okamura, K.; Michikawa, T.; Sakashita, M.; Kogure, S.; Kase, K.; Wada, S.; Midorikawa, K. High Spatial Resolution Survey Using Frequency-Shifted Feedback Laser for Transport Infrastructure Maintenance. J. Disaster Res. 2017, 12, 546–556.
11. Ai, Q.; Yuan, Y. Rapid Acquisition and Identification of Structural Defects of Metro Tunnel. Sensors 2019, 19, 4278.
12. Ba, Y.L.; Zuo, J.; Jia, Z.M. Image Filtering Algorithms for Tunnel Lining Surface Cracks Based on Adaptive Median-Gaussian. In Proceedings of the 6th International Conference on Transportation Engineering (ICTE), Chengdu, China, 20–22 September 2019; pp. 849–853.
13. Gong, Q.M.; Zhu, L.Q.; Wang, Y.D.; Yu, Z.J. Automatic subway tunnel crack detection system based on line scan camera. Struct. Control Health Monit. 2021, 28, e2776.
14. Lei, M.F.; Liu, L.H.; Shi, C.H.; Tan, Y.; Lin, Y.X.; Wang, W.D. A novel tunnel-lining crack recognition system based on digital image technology. Tunn. Undergr. Space Technol. 2021, 108, 103724.
15. Gu, J.X.; Wang, Z.H.; Kuen, J.; Ma, L.Y.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.X.; Wang, G.; Cai, J.F.; et al. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377.
16. Liu, L.; Ouyang, W.L.; Wang, X.G.; Fieguth, P.; Chen, J.; Liu, X.W.; Pietikainen, M. Deep Learning for Generic Object Detection: A Survey. Int. J. Comput. Vis. 2020, 128, 261–318.
17. Xue, Y.D.; Li, Y.C. A Fast Detection Method via Region-Based Fully Convolutional Neural Networks for Shield Tunnel Lining Defects. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 638–654.
18. Gao, X.W.; Jian, M.; Hu, M.; Tanniru, M.; Li, S.Q. Faster multi-defect detection system in shield tunnel using combination of FCN and faster RCNN. Adv. Struct. Eng. 2019, 22, 2907–2921.
19. Zhou, M.; Cheng, W.; Huang, H.; Chen, J. A Novel Approach to Automated 3D Spalling Defects Inspection in Railway Tunnel Linings Using Laser Intensity and Depth Information. Sensors 2021, 21, 5725.
20. Li, D.W.; Xie, Q.; Gong, X.X.; Yu, Z.H.; Xu, J.X.; Sun, Y.X.; Wang, J. Automatic defect detection of metro tunnel surfaces using a vision-based inspection system. Adv. Eng. Inform. 2021, 47, 101206.
21. Chen, Q.; Kang, Z.Z.; Cao, Z.; Xie, X.W.; Guan, B.W.; Pan, Y.X.; Chang, J. Combining Cylindrical Voxel and Mask R-CNN for Automatic Detection of Water Leakages in Shield Tunnel Point Clouds. Remote Sens. 2024, 16, 896.
22. Man, K.; Liu, R.L.; Liu, X.L.; Song, Z.F.; Liu, Z.X.; Cao, Z.X.; Wu, L.W. Water Leakage and Crack Identification in Tunnels Based on Transfer-Learning and Convolutional Neural Networks. Water 2022, 14, 1462.
23. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
24. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into High Quality Object Detection. In Proceedings of the 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 17–21 June 2018.
25. Zou, Z.X.; Chen, K.Y.; Shi, Z.W.; Guo, Y.H.; Ye, J.P. Object Detection in 20 Years: A Survey. Proc. IEEE 2023, 111, 257–276.
26. Jiao, L.C.; Zhang, F.; Liu, F.; Yang, S.Y.; Li, L.L.; Feng, Z.X.; Qu, R. A Survey of Deep Learning-Based Object Detection. IEEE Access 2019, 7, 128837–128868.
27. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016.
28. Ge, Z.; Liu, S.T.; Wang, F.; Li, Z.M.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430.
29. Li, C.Y.; Li, L.L.; Jiang, H.L.; Weng, K.H.; Geng, Y.F.; Li, L.; Ke, Z.D.; Li, Q.Y.; Cheng, M.; Nie, W.Q.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976.
30. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
31. Talaat, F.M.; ZainEldin, H. An improved fire detection approach based on YOLO-v8 for smart cities. Neural Comput. Appl. 2023, 35, 20939–20954.
32. Li, Y.T.; Fan, Q.S.; Huang, H.S.; Han, Z.G.; Gu, Q. A Modified YOLOv8 Detection Network for UAV Aerial Image Recognition. Drones 2023, 7, 304.
33. Hou, Q.B.; Zhou, D.Q.; Feng, J.S. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 13708–13717.
34. Liu, Y.; Zhang, Y.; Wang, Y.X.; Hou, F.; Yuan, J.; Tian, J.; Zhang, Y.; Shi, Z.C.; Fan, J.P.; He, Z.Q. A Survey of Visual Transformers. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 7478–7498.
35. Srinivas, A.; Lin, T.Y.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck Transformers for Visual Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 16514–16524.
36. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017.
37. Chen, H.; He, X.; Qing, L.; Wu, Y.; Ren, C.; Sheriff, R.E.; Zhu, C. Real-world single image super-resolution: A brief review. Inf. Fusion 2022, 79, 124–145.
38. Li, W.; Schierle, G.S.K.; Lei, B.; Liu, Y.; Kaminski, C.F. Fluorescent Nanoparticles for Super-Resolution Imaging. Chem. Rev. 2022, 122, 12495–12543.
39. Freeman, W.T.; Adelson, E.H. The design and use of steerable filters. IEEE Trans. Pattern Anal. Mach. Intell. 1991, 13, 891–906.
40. Yang, Y.-q. Research on the single image super-resolution method based on sparse Bayesian estimation. Clust. Comput.-J. Netw. Softw. Tools Appl. 2019, 22, 1505–1513.
41. Chen, Y.; Xia, R.; Yang, K.; Zou, K. MFFN: Image super-resolution via multi-level features fusion network. Vis. Comput. 2024, 40, 489–504.
42. Gao, G.; Xu, Z.; Li, J.; Yang, J.; Zeng, T.; Qi, G.-J. CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution. IEEE Trans. Image Process. 2023, 32, 1978–1991.
43. Li, Y.; Sixou, B.; Peyrin, F. A Review of the Deep Learning Methods for Medical Images Super Resolution Problems. IRBM 2021, 42, 120–133.
44. Wang, X.T.; Xie, L.B.; Dong, C.; Shan, Y. Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data. In Proceedings of the 18th IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 1905–1914.
45. Zhang, Y.L.; Tian, Y.P.; Kong, Y.; Zhong, B.N.; Fu, Y. Residual Dense Network for Image Restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 2480–2495.
46. Wang, X.T.; Yu, K.; Wu, S.X.; Gu, J.J.; Liu, Y.H.; Dong, C.; Qiao, Y.; Loy, C.C. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 63–79.
47. Falk, T.; Mai, D.; Bensch, R.; Çiçek, Ö.; Abdulkadir, A.; Marrakchi, Y.; Böhm, A.; Deubner, J.; Jäckel, Z.; Seiwald, K.; et al. U-Net: Deep learning for cell counting, detection, and morphometry. Nat. Methods 2019, 16, 67–70.
48. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
49. Timofte, R.; Agustsson, E.; Van Gool, L.; Yang, M.-H.; Zhang, L.; Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M.; et al. NTIRE 2017 Challenge on Single Image Super-Resolution: Methods and Results. In Proceedings of the 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 22–25 July 2017.
50. Li, L.; Park, P.; Yang, S.-B. The role of public-private partnership in constructing the smart transportation city: A case of the bike sharing platform. Asia Pac. J. Tour. Res. 2021, 26, 428–439.
51. Yin, Z.-R.; Lei, Z.; Zheng, A.; Zhu, J.; Liu, X.-Z. Automatic Detection and Association Analysis of Multiple Surface Defects for Shield Subway Tunnel. Sensors 2023, 23, 7106.
52. Venkatanath, N.; Praneeth, D.; Chandrasekhar, B.H.M.; Channappayya, S.S.; Medasani, S.S. Blind image quality evaluation using perception based features. In Proceedings of the 2015 Twenty First National Conference on Communications (NCC), Mumbai, India, 27 February–1 March 2015.
53. Chen, X.Q.; Zhang, Q.Y.; Lin, M.H.; Yang, G.Y.; He, C. No-reference color image quality assessment: From entropy to perceptual quality. EURASIP J. Image Video Process. 2019, 2019, 77.
54. Crack_Segmentation. Available online: https://github.com/khanhha/crack_segmentation (accessed on 19 March 2020).
Figure 1. The prone areas of apparent tunnel lining defects.
Figure 2. The technical framework of this paper.
Figure 3. YOLOv8 framework.
Figure 4. Coordinate attention module.
Figure 5. BoT3 module.
Figure 6. The calculation process of the MHSA [35].
Figure 7. The proposed defect detection model.
Figure 8. Real-ESRGAN network.
Figure 9. (a) Data acquisition equipment, (b) field experiments, (c) apparent grayscale image of metro tunnel lining.
Figure 10. Concrete crack image data and its degradation process.
Figure 11. (a) Train loss and validation loss of the proposed model. (b) The validation mAP of the proposed model.
Figure 12. Train and validation results of the ablation experiment. (a) The mAP@50 result, (b) the mAP@50–95 result, (c) the precision result, (d) the recall result.
Figure 13. Grad-CAM analyzing results for ablation test results.
Figure 14. Comparison of the test results. The images shown from left to right are the input image, the weight assignment of the small target detection layer of the original model, the weight assignment of the small target detection layer of the proposed model, the weight assignment of the full detection layer of the original model, the weight assignment of the full detection layer of the proposed model, the detection result of the original model, and the detection result of the proposed model. (a) Facilities and crack detected result. (b) Facilities, crack and bolt hole detected result. (c) Facilities, crack, bolt hole, joint and exfoliation detected result. (d,e) Facilities, bolt hole and leakage detected result.
Figure 15. Training results of Real-ESRGAN based on the transfer learning method. (a) The first training loss of the generator by using DIV2K dataset, (b) the second training loss of the generator by using concrete crack image super-resolution dataset, (c) the second training loss of the discriminator by using concrete crack image super-resolution dataset.
Figure 16. Extraction of apparent crack areas in subway tunnel lining using the proposed detection model.
Figure 17. The comparison of super-resolution results of apparent cracks in subway tunnel linings of different methods.
Figure 18. The comparison of segmentation results before and after using the proposed super-resolution model of crack images (based on the open access VGG19-Unet network).
Table 1. The related research of tunnel lining defect detection and extraction.

| Year | Author | Sensors/Data | Research Content | Research Method | Contribution |
|---|---|---|---|---|---|
| 2017 | Takeharu et al. [10] | High-spatial-resolution LIDAR | Crack detection on tunnel surfaces | Reflectance imaging; 3D measurement; spectroscopy | Report on a LIDAR system for detecting 200 μm wide cracks at a distance of 5 m from the surface of a tunnel and detecting a 100 μm difference in level |
| 2019 | Ai et al. [11] | Charge-coupled device (CCD) cameras; laser emitter | Leakage and cross-sectional deformation identification in metro tunnels | Image differencing algorithm; transmissive projection | Rapidly acquiring and identifying surface defects and cross-sectional deformation in metro tunnels |
| 2019 | Ba et al. [12] | Images of tunnel lining surface cracks | Tunnel lining surface crack image reduction and enhancement | Adaptive median–Gaussian filtering algorithm | Effectively filters out Gaussian and salt-and-pepper noise in tunnel lining surface crack images while protecting crack edges and other image details |
| 2021 | Gong et al. [13] | Linear array camera | Crack detection on subway tunnel surfaces | Frequency-domain enhancing algorithm; multistage fusion filtering algorithm; improved seed growth algorithm | Acquires crack feature information and extracts cracks in large images with uneven light and complex backgrounds |
| 2021 | Lei et al. [14] | CCD cameras | Tunnel lining crack recognition and geometric feature extraction | Differentiated noise filtering; improved segmenting method combining adaptive partitioning, edge detection, and threshold methods | Overcomes the problems of uneven light, noise, and spots in tunnel lining images |
| 2018 | Xue et al. [17] | CCD cameras | Classify defect-free and defect images, then detect cracks and leakages in the defect images | Fully convolutional network (FCN); region-based fully convolutional networks (R-FCN) | Adopts image classification and target detection algorithms to realize fast processing and defect detection of massive subway tunnel lining images |
| 2019 | Gao et al. [18] | HD motion cameras; laser sensor | Crack detection and leakage semantic segmentation in subway tunnels | Faster RCNN; FCN algorithm | Avoids interference from patching seams, pipeline smearing, and obscuring; exploits the FCN-RCNN multi-defect detection network to improve the detection rate |
| 2021 | Zhou et al. [19] | PROFILER 9012 laser scanner | Automated segmentation and quantification of spalling defects in tunnel linings | Spalling intensity depurator network | Efficient detection and volumetric calculation of apparent spalling in subway tunnel linings |
| 2021 | Li et al. [20] | Line-scan cameras | Crack, leakage, and falling-block detection in metro tunnel linings | Image stitching algorithm; image contrast enhancement; Faster RCNN | Improves image quality, avoids repeated detection in overlapped regions of captured tunnel images, and achieves automatic tunnel surface defect detection with high precision |
| 2024 | Chen et al. [21] | Faro S350 ground laser scanner | Leakage detection in metro tunnel linings | Mask region-based convolutional neural network (Mask RCNN) | Achieves detection and 3D spatial visualization of water leakage in curved shield tunnel point clouds |
Table 2. Division of the dataset.

| Dataset | Facilities | Bolt Hole | Joint | Crack | Exfoliation | Leakage | Images |
|---|---|---|---|---|---|---|---|
| train | 5403 | 4676 | 1279 | 1367 | 891 | 463 | 1784 |
| valid | 786 | 624 | 248 | 233 | 155 | 96 | 254 |
| test | 1476 | 1335 | 404 | 407 | 256 | 121 | 509 |
| all | 7665 | 6635 | 1931 | 2007 | 1302 | 680 | 2547 |
Table 3. Ablation test results. The first row, with all modules disabled, corresponds to the YOLOv8s baseline.

| CA | HR Head | BoT3 | Facilities | Bolt Hole | Joint | Crack | Exfoliation | Leakage | mAP@0.5 (Defects) | mAP@0.5 (All) |
|---|---|---|---|---|---|---|---|---|---|---|
| × | × | × | 0.92 | 0.982 | 0.756 | 0.708 | 0.768 | 0.829 | 0.768 | 0.827 |
| ✓ | × | × | 0.932 | 0.986 | 0.782 | 0.715 | 0.773 | 0.85 | 0.779 | 0.84 |
| × | ✓ | × | 0.921 | 0.984 | 0.772 | 0.736 | 0.801 | 0.901 | 0.813 | 0.853 |
| × | × | ✓ | 0.93 | 0.989 | 0.813 | 0.73 | 0.798 | 0.854 | 0.794 | 0.852 |
| ✓ | ✓ | × | 0.926 | 0.987 | 0.786 | 0.758 | 0.818 | 0.919 | 0.832 | 0.866 |
| ✓ | × | ✓ | 0.932 | 0.988 | 0.813 | 0.754 | 0.824 | 0.887 | 0.822 | 0.866 |
| × | ✓ | ✓ | 0.925 | 0.987 | 0.79 | 0.728 | 0.797 | 0.912 | 0.812 | 0.857 |
| ✓ | ✓ | ✓ | 0.929 | 0.987 | 0.789 | 0.768 | 0.824 | 0.923 | 0.838 | 0.87 |
Table 4. Comparison results with other models.

| Model | mAP@50 (Defects) | mAP@50 (All) | File Size/MB | Time per Image/ms |
|---|---|---|---|---|
| Faster RCNN-res50 | 0.739 | 0.78 | 327.31 | 29.1 |
| Cascade RCNN-res50 | 0.754 | 0.805 | 543.4 | 38.9 |
| SSD | 0.72 | 0.754 | 190.36 | 15.7 |
| Yolov3-d53 | 0.723 | 0.79 | 476.23 | 18.4 |
| Yolov5s | 0.74 | 0.811 | 13.95 | 6.8 |
| YOLOv5s-Improve | 0.763 | 0.825 | 14.33 | 8.5 |
| Yolov8s | 0.768 | 0.827 | 22.04 | 8 |
| YOLOv8s-Improve | 0.798 | 0.848 | 22.25 | 9.2 |
| Proposed model | 0.838 | 0.87 | 23.37 | 10.7 |
Table 5. No-reference evaluation results of super-resolution enhancement methods for apparent-crack images of subway tunnel linings.

| Metric | Bicubic | SRGAN | ESRGAN | Real-ESRGAN |
|---|---|---|---|---|
| PIQE | 63.12 | 41.7 | 32.8 | 32.7 |
| ENIQA | 0.357 | 0.338 | 0.109 | 0.092 |