*4.2. Defect Localization Results*

The model trained with learning scheme (c) was used to apply the interpretation techniques presented in Section 2.

Images from the testing subset were used to visualize the results; Figure 8 shows representative examples.

As intended, the resulting heatmaps highlight the discriminative image regions that contributed to the classification decision. These heatmaps indicate the relative importance of each pixel for the target class. The qualitative results in Figure 8 show that the active regions are largely consistent with the defect areas. For the crack and efflorescence examples, Grad-CAM++ produced better visualizations than Grad-CAM.
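As an illustration of how such a heatmap can be computed, the following minimal sketch applies the standard Grad-CAM weighting to a set of feature maps and their gradients. The function name `grad_cam_heatmap`, the array layout (channels, height, width), and the max-normalization to [0, 1] are assumptions made for this example and are not taken from the implementation used in this study; upsampling to the input image size is omitted.

```python
import numpy as np


def grad_cam_heatmap(activations, gradients):
    """Compute a coarse Grad-CAM heatmap (hypothetical helper).

    activations: feature maps of the last convolutional layer, shape (K, H, W).
    gradients:   gradients of the target class score with respect to those
                 feature maps, same shape (K, H, W).
    Returns an (H, W) array scaled to [0, 1].
    """
    activations = np.asarray(activations, dtype=float)
    gradients = np.asarray(gradients, dtype=float)

    # Channel weights: global average of the gradients over the spatial axes.
    weights = gradients.mean(axis=(1, 2))  # shape (K,)

    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)  # shape (H, W)

    # Keep only regions with a positive influence on the target class.
    cam = np.maximum(cam, 0.0)

    # Assumed normalization: divide by the peak so values lie in [0, 1].
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return cam
```

Grad-CAM++ differs in how the channel weights are derived from the gradients; that variant is not shown here.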

The pixel-level maps generated by applying a threshold of 0.5 provide a coarse localization of the concrete defects and offer semantically meaningful pixel-level discrimination between defects and background. This suggests that, in the context of weakly supervised semantic segmentation, interpretation methods can produce relevant pixel-level maps using only image-level annotations as supervision. The proposed method captured a reasonable coarse localization of defects while avoiding the annotation workload of fully supervised semantic segmentation frameworks.
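The thresholding step can be sketched as follows. This is a minimal example, assuming the heatmap is already scaled to [0, 1] and that pixels at or above the threshold are labeled as defect; the function names `heatmap_to_mask` and `defect_fraction` are hypothetical, and the inclusive comparison is an assumption.

```python
import numpy as np


def heatmap_to_mask(heatmap, threshold=0.5):
    """Binarize a [0, 1] heatmap into a defect/background mask.

    Pixels with a value >= threshold are marked True (defect); the
    inclusive comparison is an assumption of this sketch.
    """
    heatmap = np.asarray(heatmap, dtype=float)
    return heatmap >= threshold


def defect_fraction(mask):
    """Return the fraction of pixels labeled as defect in a boolean mask."""
    return float(np.mean(mask))


# Toy 2x2 heatmap used only to show the call pattern.
toy_heatmap = np.array([[0.1, 0.6],
                        [0.5, 0.9]])
toy_mask = heatmap_to_mask(toy_heatmap, threshold=0.5)
toy_coverage = defect_fraction(toy_mask)
```

The resulting mask is only as precise as the underlying heatmap, so the coverage value should be read as a coarse indicator rather than a measurement of defect extent.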

However, because the visualizations produced by these interpretation techniques depend on the feature space learned by the classifier, some highlighted areas in the test images do not correspond to the target classes, while some damaged regions are not captured.

As a result, it would be challenging to localize and quantify the damage precisely (e.g., crack path and density). These limitations can be attributed to the underlying complexity of the training dataset, its limited size, and the limited learning capability of the pre-trained network, which stems from the difference between the source domain (the ImageNet dataset) and the target domain (the proposed concrete damage dataset). Thus, to further examine the potential of interpretation techniques in weakly supervised semantic segmentation, more customized networks tailored to the damage classification task and trained on more comprehensive datasets should be explored.

**Figure 8.** Sample results of the interpretation techniques implementation (rows 1–2: cracks, rows 3–4: concrete spalling, rows 5–6: efflorescence).
