**5. Conclusions and Perspectives**

Structural Health Monitoring (SHM) is increasingly important in assessing bridge conditions because it allows damage to be identified and localized and its severity to be evaluated. SHM is therefore part of an economic strategy: it informs the definition of maintenance actions and helps optimize the resources allocated to them.

This paper presented a benchmark image dataset featuring three common defects of concrete bridges (i.e., cracks, efflorescence, and spalling). The dataset covers the varied appearances of these defects and of the concrete surface encountered in real-world bridge inspection. A VGG16 network was trained on the proposed dataset following three Transfer Learning schemes that differ in the number of retrained layers. In each learning configuration, the performance of the model was evaluated based on classification metrics, computational time, and generalization ability. Experiments showed a significant gain in classification measures when retraining the classification layers and the last two convolutional layers of the VGG16 network. The trained model yielded a high testing accuracy of 97.13%, combined with high F1-scores of 97.38%, 95.01%, and 97.35% for cracks, efflorescence, and spalling, respectively. In addition, a slight tendency toward overfitting was observed in this learning scheme, which suggests that increasing the number of retrained layers further would degrade the model's generalization performance. These experimental results show the robustness of the proposed learning setting, as it ensures a balance between classification metrics, computational time, and generalization ability.
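To give a sense of scale for the retraining configuration discussed above, the following minimal sketch enumerates the 13 convolutional layers of the standard VGG16 architecture and counts how many convolutional parameters are updated when only the last few layers are unfrozen. It is an illustration under stated assumptions, not the training code used in this work: the function names are hypothetical, the classification layers are left out because their sizes are not restated here, and the values of `k` other than 2 are arbitrary examples rather than the three schemes that were compared.

```python
# Output channels of the 13 convolutional layers in the standard VGG16 layout
# (3x3 kernels, RGB input). The classification head is deliberately omitted.
VGG16_CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256,
                       512, 512, 512, 512, 512, 512]


def conv_param_counts(channels=VGG16_CONV_CHANNELS, in_channels=3, kernel=3):
    """Return the parameter count (weights + biases) of each conv layer."""
    counts = []
    for out_channels in channels:
        counts.append(kernel * kernel * in_channels * out_channels + out_channels)
        in_channels = out_channels  # next layer consumes this layer's output
    return counts


def trainable_conv_params(n_unfrozen):
    """Parameters updated when only the last `n_unfrozen` conv layers are retrained."""
    counts = conv_param_counts()
    if not 0 <= n_unfrozen <= len(counts):
        raise ValueError("n_unfrozen must be between 0 and 13")
    return sum(counts[len(counts) - n_unfrozen:])


if __name__ == "__main__":
    total = sum(conv_param_counts())
    for k in (0, 2, 13):  # illustrative values only
        n = trainable_conv_params(k)
        print(f"last {k:2d} conv layers unfrozen: {n:>10,d} params "
              f"({n / total:.1%} of the conv stack)")
```

Under these assumptions, unfreezing the last two convolutional layers updates roughly a third of the convolutional parameters, which is consistent with the trade-off between classification gains and computational time noted above.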

This work also explored the potential of interpretation techniques to localize the three defects in the context of weakly supervised semantic segmentation. To this end, two gradient-based back-propagation methods were used to generate pixel-level heatmaps of test images, using the model trained with the learning setting discussed above. The resulting maps highlight the regions contributing to the classification result and thus provide relevant pixel-level maps to localize defects using a model trained only on image-level annotations. However, since these techniques rely on the feature space learned by the model, their results are limited by the representativeness of the target classes in the training dataset, the complexity of the concrete surface texture and condition in bridge inspection images, and the learning capability of the model.
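As an illustration of how a pixel-level heatmap could be turned into a coarse localization, the sketch below normalizes a relevance map, thresholds it into a binary mask, and extracts a bounding box. This post-processing is an assumption made for the example and is not described in this work: the min-max normalization, the fixed threshold of 0.5, the helper names, and the toy values are all hypothetical.

```python
def normalize(heatmap):
    """Rescale a 2-D heatmap (list of rows) to the [0, 1] range."""
    flat = [v for row in heatmap for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant map: no region stands out
        return [[0.0 for _ in row] for row in heatmap]
    return [[(v - lo) / (hi - lo) for v in row] for row in heatmap]


def to_mask(heatmap, threshold=0.5):
    """Binary mask of pixels whose normalized relevance meets the threshold."""
    return [[1 if v >= threshold else 0 for v in row]
            for row in normalize(heatmap)]


def bounding_box(mask):
    """Smallest (row_min, col_min, row_max, col_max) box around mask pixels, or None."""
    hits = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


if __name__ == "__main__":
    # Toy 4x4 relevance map, made up for illustration only.
    toy = [[0.0, 0.1, 0.0, 0.0],
           [0.1, 0.8, 0.9, 0.0],
           [0.0, 0.7, 1.0, 0.1],
           [0.0, 0.0, 0.2, 0.0]]
    mask = to_mask(toy)
    print(mask)
    print(bounding_box(mask))
```

A fixed threshold is the simplest possible choice. Because the heatmaps inherit the limitations listed above, any such mask should be read as a rough indication of the defect region rather than a segmentation result.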

Therefore, to further address the damage localization task, future work will investigate more advanced object detection models and datasets with different annotation levels.

**Author Contributions:** Conceptualization, H.Z. and M.R.; methodology, H.Z., M.R. and M.E.A.; software, H.Z.; validation, H.Z., M.R., M.E.A., A.C. and R.S.; formal analysis, H.Z., M.R. and M.E.A.; investigation, H.Z. and M.R.; resources, H.Z.; data curation, H.Z.; writing—original draft preparation, H.Z., M.R., M.E.A., A.C. and R.S.; writing—review and editing, A.C. and G.J.; visualization, H.Z., M.R., M.E.A., A.C. and R.S.; supervision, M.R.; project administration, M.R.; funding acquisition, A.C. and G.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work received financial support from the NSERC Discovery Grant program.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.
