#### **Effectiveness of ASPP module (CM4):**

Compared with the baseline (CM7), adding the ASPP module (CM4) increases the F-measure from 0.428 to 0.444 and the IoU from 0.272 to 0.286, as shown in Table 3. In addition, Table 4 suggests that adding the ASPP module significantly improved the detection of small-, medium-, and large-scale defects. These results demonstrate the effectiveness of the ASPP module.
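As a reminder of what the two reported metrics measure, the following minimal sketch computes a pixel-wise F-measure and IoU from a pair of binary masks. It assumes the standard definitions based on true positives, false positives, and false negatives; the function names and the toy masks are illustrative only, and the evaluation protocol actually used in the experiments (for example, how scores are aggregated over images) may differ.

```python
def confusion_counts(pred, truth):
    """Count pixel-wise TP, FP, and FN for two binary masks (nested lists of 0/1)."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def f_measure(tp, fp, fn):
    """F-measure (harmonic mean of precision and recall), written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iou(tp, fp, fn):
    """Intersection over union of the predicted and ground-truth defect pixels."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


# Toy masks (hypothetical data, not taken from the dataset).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 0, 1, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]

tp, fp, fn = confusion_counts(pred, truth)
print(f"TP={tp} FP={fp} FN={fn}")
print(f"F-measure={f_measure(tp, fp, fn):.3f} IoU={iou(tp, fp, fn):.3f}")
```

When both metrics are computed from the same pixel counts, they are tied by F = 2·IoU / (1 + IoU). The values quoted above (for example, an IoU of 0.272 with an F-measure of 0.428) agree with this relation to within rounding.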

#### **Effectiveness of layer extend operation (CM5):**

Compared with the baseline (CM7), the layer extend operation (CM5) increases the F-measure from 0.428 to 0.495 and the IoU from 0.272 to 0.329, as shown in Table 3. Table 4 further suggests that CM5 is superior to CM4, CM6, and the baseline (CM7). These results suggest that a deeper network improves the detection of defects at all scales. However, this operation could not be applied to networks with the ASPP module because of patch size limitations in the experimental setting.
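The interaction between network depth and patch size can be illustrated with a small calculation. The sketch below is only one plausible reading of the limitation, not the stated reason: it assumes that each additional encoder level halves the feature map with 2×2 pooling and that an ASPP branch is a 3×3 atrous convolution whose taps should fit inside the bottleneck feature map. The patch size of 256 and the dilation rates (6, 12, 18) are hypothetical values chosen for illustration; they are not taken from the experimental setting.

```python
def bottleneck_size(patch_size, num_poolings):
    """Side length of the feature map after repeated 2x2 pooling (stride 2)."""
    size = patch_size
    for _ in range(num_poolings):
        size //= 2
    return size


def atrous_span(kernel_size, rate):
    """Number of pixels spanned by one side of a dilated (atrous) kernel."""
    return (kernel_size - 1) * rate + 1


PATCH_SIZE = 256      # hypothetical patch size
RATES = (6, 12, 18)   # hypothetical dilation rates

for depth in (4, 5):
    size = bottleneck_size(PATCH_SIZE, depth)
    fitting = [r for r in RATES if atrous_span(3, r) <= size]
    print(f"depth={depth}: bottleneck {size}x{size}, rates that fit: {fitting}")
```

Under these assumptions, adding one more pooling level shrinks the bottleneck from 16×16 to 8×8, at which point none of the assumed dilation rates fits inside the feature map, so the atrous branches would mostly sample padding.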

#### **Effectiveness of Inception module (CM6):**

In CM6, the only change was to replace all convolution blocks with the inception module. This increased the F-measure from 0.428 to 0.443 and the IoU from 0.272 to 0.285 compared with the baseline (CM7) in Table 3. Additionally, Table 4 shows that the detection rate at each scale improved significantly compared with the baseline. This indicates that the inception module contributes to the ability to represent both low- and high-level information.
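To compare the three ablations on a common footing, the absolute scores quoted above can be converted into relative gains over the baseline (CM7). The short script below does only this arithmetic, using the values reported from Table 3; the dictionary layout and function name are illustrative.

```python
# Scores quoted in the text from Table 3.
BASELINE = {"f_measure": 0.428, "iou": 0.272}  # CM7
ABLATIONS = {
    "CM4": {"f_measure": 0.444, "iou": 0.286},  # ASPP module
    "CM5": {"f_measure": 0.495, "iou": 0.329},  # layer extend operation
    "CM6": {"f_measure": 0.443, "iou": 0.285},  # inception module
}


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


gains = {
    name: {metric: relative_gain(scores[metric], BASELINE[metric]) for metric in BASELINE}
    for name, scores in ABLATIONS.items()
}

for name, g in gains.items():
    print(f"{name}: F-measure +{g['f_measure']:.1f}%, IoU +{g['iou']:.1f}%")
```

The relative gains are roughly 4% (CM4), 16% (CM5), and 4% (CM6) in F-measure, and roughly 5%, 21%, and 5% in IoU, which makes the larger effect of the layer extend operation explicit.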

#### **Analysis of the proposed method:**

As shown in Table 3, PM outperformed all other methods. Furthermore, Table 4 shows that PM achieves better accuracy in detecting large-scale defects but has some limitations in detecting small-scale defects. This limitation may affect performance in the inspection task; thus, a qualitative analysis is also required.

#### 5.2.2. Qualitative Analysis

In this part, we discuss the visual quality of the results. The estimation results are shown in Figures 6–9. Figure 6 shows sample detection results for all regions of a test image. Figures 7 and 8 show the detection results for peeling and cracks, respectively. From Figures 6–8, we can see that PM achieves higher detection quality than the CMs for various defects. On the other hand, Figure 9 shows a sample of over-fitting results; in some cases, we observed that our model tends to over-fit on vertical cracks. Although the quantitative analysis shows that the proposed method has some limitations in detecting small-scale defects, Figure 8 suggests that these limitations may not affect actual inspection work. Compared with all CMs, PM produces fewer false detections, which would reduce unnecessary work for inspectors.

**Figure 6.** Results of the proposed method and comparative methods. From left to right: (**a**) original image; (**b**) ground truth; (**c**) results obtained by the proposed method; (**d**–**j**) results obtained by the comparative methods.

**Figure 7.** Example of peeling detection results. (**a**) Original image. (**b**) Ground truth. (**c**) PM. (**d**) CM1. (**e**) CM2. (**f**) CM3. (**g**) CM4. (**h**) CM5. (**i**) CM6. (**j**) CM7.

**Figure 8.** Example of crack detection results. (**a**) Original image. (**b**) Ground truth. (**c**) PM. (**d**) CM1. (**e**) CM2. (**f**) CM3. (**g**) CM4. (**h**) CM5. (**i**) CM6. (**j**) CM7.

#### *5.3. Discussion*

In the field of image recognition, new models have been proposed continually owing to the AI boom. For general object recognition, the recognition accuracy of such models now exceeds that of humans, and research is beginning to turn toward more advanced tasks. Applications of AI are being explored in all areas, one of which is infrastructure maintenance. In this paper, we have proposed a method for detecting defects in subway tunnel images. By constructing a model that takes into account the characteristics of the data, the proposed method achieved higher accuracy in detecting defects than conventional methods.

The question to consider here is what level of accuracy the system must reach to be applicable in the real world. The quantitative evaluation in this experiment yielded an IoU of around 0.3–0.4. This value may not seem sufficient when compared with the accuracy of general image recognition. However, as the qualitative evaluation shows, cracks and other defects in the image can be detected even if there is some deviation from the ground truth. For practical applications such as supporting the registration of defects in CAD systems or identifying regions where defects are dense, we consider that the proposed method has reached a level that can be applied in practice.
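A toy calculation helps explain why a moderate IoU can coexist with visually acceptable detections of thin defects. The sketch below builds a synthetic vertical crack and a prediction that is identical except for a one-pixel horizontal shift, then computes the IoU for several crack thicknesses. The mask size, positions, and function names are hypothetical; the example is not derived from the tunnel dataset.

```python
def crack_mask(width, height, x0, thickness):
    """Binary mask containing a vertical stripe of the given thickness starting at column x0."""
    return [[1 if x0 <= x < x0 + thickness else 0 for x in range(width)]
            for _ in range(height)]


def mask_iou(a, b):
    """Intersection over union of two binary masks of equal shape."""
    inter = sum(1 for ra, rb in zip(a, b) for p, q in zip(ra, rb) if p and q)
    union = sum(1 for ra, rb in zip(a, b) for p, q in zip(ra, rb) if p or q)
    return inter / union if union else 0.0


for thickness in (1, 2, 3):
    truth = crack_mask(16, 16, 6, thickness)
    pred = crack_mask(16, 16, 7, thickness)  # same crack, shifted right by one pixel
    print(f"thickness={thickness}: IoU={mask_iou(truth, pred):.2f}")
```

A one-pixel shift yields an IoU of 0, 1/3, and 1/2 for cracks that are 1, 2, and 3 pixels thick, even though the crack is located correctly in every case. Pixel-overlap metrics are therefore harsh on thin structures, which is consistent with the observation that an IoU of 0.3–0.4 can still correspond to usable detections.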

**Figure 9.** Example of over-fitting results. (**a**) Original image. (**b**) Ground truth. (**c**) PM. (**d**) CM1. (**e**) CM2. (**f**) CM3. (**g**) CM4. (**h**) CM5. (**i**) CM6. (**j**) CM7.

This study has some limitations. It was conducted using data from a single subway line in Japan, so its general applicability to a wide variety of data remains to be studied. In this study, 47 high-resolution subway tunnel images were divided into patches to enable network training; however, a larger number of images would be desirable to verify the robustness of our method. In addition, the accuracy is expected to vary with the year of construction of a tunnel: the condition of the wall depends on the construction method, and newer construction methods may differ completely from conventional ones. Verifying the versatility of the model on a wide variety of data is therefore necessary.

#### **6. Conclusions**

In this study, we presented a new version of the U-Net architecture to improve the detection of defects in subway tunnel images. By introducing ASPP and inception modules into the U-Net-based network architecture, we improved the capacity of the network for defect detection. The experimental results on a real-world subway tunnel image dataset showed that our method outperformed other segmentation methods quantitatively and qualitatively. Unlike conventional crack detection methods, our approach can detect various types of defects with a single model, which enhances its practicality for supporting tunnel inspections. In future work, we will investigate a new strategy for enhancing detection accuracy and discuss its application to other real-world tasks.

**Author Contributions:** Conceptualization, A.W., R.T., T.O. and M.H.; methodology, A.W., R.T., T.O. and M.H.; software, A.W.; validation, A.W., R.T., T.O. and M.H.; data creation, A.W.; writing – original draft preparation, A.W.; writing – review and editing, R.T., T.O. and M.H.; visualization, A.W.; funding acquisition, T.O. and M.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by KAKENHI Grant Number JP17H01744.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors thank Tokyo Metro Co., Ltd., for providing the research data used in this study.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

