#### **Limitation of Deeplab-v3+ (CM1):**

Deeplab-v3+ uses atrous convolution, an ASPP module, and a simplified decoder branch, and it improves substantially on the baseline. Its detection accuracy differs only slightly across the various kinds of defects. However, although Deeplab-v3+ combines several modules to improve detection of multi-scale defects, its accuracy on large-scale defects remains insufficient, as shown in Table 3.
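To make the role of atrous convolution concrete, the following is a minimal one-dimensional sketch in plain Python. It is not the Deeplab-v3+ implementation: the function names `atrous_conv1d` and `effective_kernel_size`, the toy signal, the kernel, and the dilation rates are all illustrative assumptions. It only shows how a larger rate widens the receptive field of a fixed-size kernel, which is the mechanism ASPP relies on when it runs several rates in parallel.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid-mode 1-D atrous (dilated) convolution, cross-correlation form.

    Kernel taps are spaced `rate` samples apart; rate=1 is an ordinary
    convolution. Hypothetical helper for illustration only.
    """
    span = (len(kernel) - 1) * rate + 1  # samples covered by one kernel placement
    return [
        sum(w * signal[start + i * rate] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def effective_kernel_size(kernel_size, rate):
    """Receptive field of a dilated kernel: k + (k - 1)(r - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


signal = list(range(10))   # toy ramp signal 0..9 (assumed data)
kernel = [1, 0, -1]        # simple difference kernel (assumed)

dense = atrous_conv1d(signal, kernel, rate=1)    # compares samples 2 apart
dilated = atrous_conv1d(signal, kernel, rate=3)  # compares samples 6 apart

print(dense)                        # [-2, -2, -2, -2, -2, -2, -2, -2]
print(dilated)                      # [-6, -6, -6, -6]
print(effective_kernel_size(3, 3))  # 7
```

With the same three weights, the rate-3 kernel spans seven samples instead of three, so it responds to structure at a coarser scale without adding parameters.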

#### **FCN and SegNet (CM2, CM3):**

FCN and SegNet, both classic segmentation networks, are poorly suited to our subway tunnel dataset: they yield not only low accuracy but also a large number of false detections, as shown in Table 3. SegNet performs especially poorly. Although it maintains detection accuracy for small targets such as cracks, it almost entirely fails to detect large defects, as shown in Table 4. These failures account for the network's low overall detection accuracy and precision. SegNet is a typical symmetric encoder–decoder architecture, but unlike U-Net, its decoder uses the max-pooling indices received from the corresponding encoder to perform nonlinear upsampling of its input feature map. We consider that this upsampling mechanism did not work well on the subway tunnel dataset.
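The index-based upsampling described above can be sketched in a few lines of plain Python. This is a simplified illustration rather than the SegNet implementation: the helper names `max_pool_with_indices` and `max_unpool`, the 4×4 toy feature map, and the 2×2 window are assumptions, and the input dimensions are assumed to be divisible by the window size.

```python
def max_pool_with_indices(fmap, size=2):
    """Non-overlapping max pooling that also records where each maximum was.

    Hypothetical helper; assumes both dimensions are divisible by `size`.
    """
    pooled, indices = [], []
    for r in range(0, len(fmap), size):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]), size):
            window = [
                (fmap[r + dr][c + dc], (r + dr, c + dc))
                for dr in range(size)
                for dc in range(size)
            ]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, shape):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


fmap = [            # toy encoder feature map (assumed data)
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]

pooled, indices = max_pool_with_indices(fmap)   # encoder side
unpooled = max_unpool(pooled, indices, (4, 4))  # decoder side

print(pooled)    # [[4, 5], [6, 7]]
print(unpooled)  # [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 7, 0], [6, 0, 0, 0]]
```

The unpooled map is sparse: only one position per window is restored and the rest are zeros, which the decoder's subsequent convolution layers must fill in. U-Net, by contrast, passes the full encoder feature maps to the decoder through skip connections.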
