**1. Introduction**

Bridges are key elements of a road network and play a critical role in the functional operation of the transportation system. During their service life, they are subjected to multiple deterioration mechanisms induced by material aging, variable loading, aggressive environmental actions, and extreme weather conditions. As a result, various types of damage (e.g., cracks and corrosion [1,2]) develop over time and alter the structural behavior of bridges. Therefore, it is essential to detect and evaluate damage accurately and in a timely manner to prevent failure and maintain structural safety and serviceability.

Structural Health Monitoring (SHM) has attracted much attention and has been the subject of numerous studies in recent decades. Many techniques based on sensors (e.g., accelerometers, velocimeters, and displacement sensors) [3], non-destructive testing (e.g., ground-penetrating radar, infrared and ultrasonic techniques) [4], and visual inspection [5,6] have been deployed to identify, localize, and quantify damage in bridges. Nevertheless, visual inspection has remained the predominant practice for bridge condition assessment [7–9]. Trained inspectors conduct an in situ examination of bridge elements based on established guidelines and evaluate the condition of the entire bridge. However, the conventional framework of this practice is time-consuming, labor-intensive, and error-prone due to the subjective judgment of inspectors. Moreover, it requires access equipment and vehicles to reach poorly accessible areas of the bridge, which adds to the cost of the monitoring operation [8].

**Citation:** Zoubir, H.; Rguig, M.; El Aroussi, M.; Chehri, A.; Saadane, R.; Jeon, G. Concrete Bridge Defects Identification and Localization Based on Classification Deep Convolutional Neural Networks and Transfer Learning. *Remote Sens.* **2022**, *14*, 4882. https://doi.org/10.3390/rs14194882

Academic Editor: Lefei Zhang

Received: 31 May 2022; Accepted: 26 September 2022; Published: 30 September 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In recent years, technological advances in civil engineering and related disciplines have promoted the emergence of innovative tools to manage civil infrastructures. Within this context, bridge owners and managers have shown increasing interest in Unmanned Aerial Vehicles (UAVs) as an assistive, efficient, and cost-effective means offering great potential for inspection automation [8,10]. However, one of the major challenges associated with this inspection scheme lies in deploying an efficient method to process the large amount of image data collected by the UAVs' sensors. To this end, several vision-based techniques have been extensively explored to automate defect detection in different civil engineering structures. These methods include traditional Image Processing Techniques (IPTs) [11], Machine Learning algorithms [12], and Deep Convolutional Neural Networks (DCNNs) [13].

In the particular context of concrete damage detection, cracks are the type of damage most investigated by researchers. IPTs are used to extract representative properties of cracks from input images by applying various filters and morphological operations (e.g., Edge Detectors [14], Thresholding [15], Percolation [16,17], and Principal Component Analysis [18]). The extracted features are then fed to Machine Learning models, such as Support Vector Machines [19,20] and Nearest Neighbor Classifiers [20], to perform the classification task. However, IPTs yield hand-crafted features for training [21] whose limited representational capacity cannot capture the complexity of the concrete texture or the challenging conditions of image acquisition, such as lighting, shading, and camera movements [21,22].
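The classical pipeline described above (thresholding and edge detection to extract hand-crafted features, followed by a conventional classifier) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the method of any cited work: the two features (a dark-pixel ratio and the mean gradient magnitude), the 0.6 × median threshold, the function names, and the synthetic patches are all hypothetical choices.

```python
import numpy as np

def handcrafted_features(gray: np.ndarray) -> np.ndarray:
    """Two hand-crafted features of a grayscale patch (hypothetical choice)."""
    img = gray.astype(float)
    # Thresholding: pixels much darker than the background are crack candidates.
    dark_ratio = np.mean(img < 0.6 * np.median(img))
    # Edge detection: mean gradient magnitude (central differences), scaled by 255.
    gy, gx = np.gradient(img)
    edge_strength = np.mean(np.hypot(gx, gy)) / 255.0
    return np.array([dark_ratio, edge_strength])

def nearest_neighbor_predict(train_X, train_y, x):
    """1-nearest-neighbor classification in the hand-crafted feature space."""
    distances = np.linalg.norm(train_X - x, axis=1)
    return train_y[int(np.argmin(distances))]

def synthetic_patch(seed: int, cracked: bool, size: int = 64) -> np.ndarray:
    """Hypothetical test data: noisy gray concrete, optionally with a dark line."""
    rng = np.random.default_rng(seed)
    img = np.clip(rng.normal(180, 5, (size, size)), 0, 255)
    if cracked:
        img[:, size // 2 : size // 2 + 2] = 40   # a dark vertical "crack"
    return img.astype(np.uint8)

train_imgs = [synthetic_patch(s, c) for s, c in [(0, False), (1, False), (2, True), (3, True)]]
train_X = np.stack([handcrafted_features(im) for im in train_imgs])
train_y = ["no_crack", "no_crack", "crack", "crack"]

query = handcrafted_features(synthetic_patch(42, cracked=True))
print(nearest_neighbor_predict(train_X, train_y, query))  # expected: crack
```

Because the features are fixed by hand rather than learned, a stain or shadow that darkens part of the surface shifts them in much the same way a crack does, which is the limitation noted in the paragraph above.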

In contrast, DCNNs learn features from a set of training images through the convolution operation and classify them within a single learning framework. Owing to their robust feature extraction and learning capabilities, DCNNs have been widely examined in concrete damage classification studies.
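The building blocks mentioned here (convolution, a nonlinearity, and pooling) can be illustrated on a toy image. In a DCNN the kernel weights are learned during training; in this sketch a single 3 × 3 kernel is fixed by hand purely for illustration, and all names are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the operation computed by CNN convolution layers."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    """Non-overlapping 2x2 max pooling (odd trailing rows/columns are dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.full((8, 8), 1.0)
image[:, 4] = 0.0                               # a thin dark vertical line
kernel = np.array([[1.0, -2.0, 1.0]] * 3)       # responds to thin dark vertical lines
feature_map = max_pool2x2(relu(conv2d_valid(image, kernel)))
print(feature_map)  # each row is [0. 6. 0.]: the response is localized at the line
```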

For example, Dorafshan et al. [23] demonstrated the superiority of the AlexNet network [24] over six standard edge detectors in classifying concrete crack images of the SDNET dataset [25].

Kim et al. [26] trained and optimized the LeNet-5 network [27] to detect cracks in concrete surfaces using a dataset of 40,000 images. The proposed model achieved an accuracy of 99.8% and could be implemented on low-power computational devices.

Yu et al. [28] developed a method based on DCNNs to detect cracks in image patches of damaged concrete. The authors proposed an architecture consisting of six convolutional layers, two pooling layers, and three fully connected layers and employed the enhanced chicken swarm algorithm to optimize the meta-parameters of the DCNN model.

Mundt et al. [29] proposed the CODEBRIM dataset, which features five non-exclusive damage classes in bridges (i.e., crack, spallation, exposed reinforcement bar, efflorescence, and corrosion). In addition, they investigated reinforcement learning approaches to build a DCNN model for the multi-target classification task, and their best meta-learned models yielded a testing accuracy of 72%.
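Because the five CODEBRIM classes are non-exclusive, an image can carry several labels at once, so a multi-target classifier typically makes an independent sigmoid decision per class instead of a single softmax choice. The NumPy sketch below illustrates this with made-up logits; the 0.5 threshold and the exact-match accuracy (an image counts as correct only if all five labels are right) are assumptions for illustration and are not necessarily the definitions used in [29].

```python
import numpy as np

CLASSES = ["crack", "spallation", "exposed_rebar", "efflorescence", "corrosion"]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_multitarget(logits, threshold=0.5):
    """Independent per-class decisions: an image may get zero, one, or several labels."""
    return (sigmoid(logits) >= threshold).astype(int)

def exact_match_accuracy(pred, target):
    """Fraction of images whose whole label vector is predicted correctly."""
    return float(np.mean(np.all(pred == target, axis=1)))

# Hypothetical network outputs for two images (one column per class in CLASSES).
logits = np.array([[ 2.1, -1.0,  0.3, -3.0,  1.5],
                   [-2.0,  1.2, -0.5, -1.0, -0.2]])
target = np.array([[1, 0, 1, 0, 1],
                   [0, 1, 1, 0, 0]])

pred = predict_multitarget(logits)
label_accuracy = float(np.mean(pred == target))  # per-label agreement
print(exact_match_accuracy(pred, target), label_accuracy)  # 0.5 0.9
```

The gap between the two numbers shows why the choice of metric matters when comparing multi-target results: missing one label out of five costs a whole image under exact matching.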

Because training DCNNs requires a significant amount of image data and concrete damage datasets are limited in size, researchers have explored Transfer Learning techniques to train deep learning networks for concrete damage classification [30–33]. DCNNs pre-trained on large benchmark datasets (e.g., AlexNet [24], VGG [34], ResNet [35], and Inception [36] trained on ImageNet [37], MNIST [27], or CIFAR100 [38]) are used to transfer knowledge from a source domain (e.g., the ImageNet dataset) to a target domain (e.g., a small-scale concrete damage dataset) through different settings and learning approaches [39].

Yang et al. [21] developed a low-cost automated inspection approach based on UAVs and deep learning. They constructed the CSSC database and used a fine-tuned VGG16 model to classify cracks and spalling in concrete bridge elements, achieving a mean accuracy of 93.36% on the CSSC dataset.

Hüthwohl et al. [40] used a pre-trained Inception-V3 network to define a hierarchical multi-classifier for reinforced concrete bridge defects (i.e., cracks, efflorescence, spalling, exposed reinforcement, and rust staining). Experimental results showed that the multi-classifier could assign class labels with an average F1-score of 83.5%.

Yang et al. [33] proposed an end-to-end Transfer Learning method for crack detection using three knowledge transfer approaches (i.e., sample, model, and parameter knowledge transfer), a fine-tuned VGG16 model, and three crack datasets. Their experiments showed that training the 13 convolutional layers and two fully connected layers of the pre-trained VGG16 model on the three datasets improved crack detection, achieving a testing accuracy of 97.07% on the SDNET dataset.

Bukhsh et al. [31] investigated cross-domain and in-domain Transfer Learning approaches. They compared the performance of the VGG16, InceptionV3, and ResNet50 models under different Transfer Learning strategies to detect damage in six binary and multi-label concrete damage datasets. Their experiments demonstrated that combining in-domain and cross-domain representations provides considerable performance gains, particularly on very small datasets.

Zhu et al. [41] built a robust classifier to detect four defects, namely cracks, pockmarks, spalling, and exposed rebar. They used the pre-trained InceptionV3 model to extract features from input images and a fully connected network to classify defects. The proposed model was trained on 1180 images of arbitrary sizes and resolutions for 374.1 s and recorded a testing accuracy of 97.8%.

Gao and Mosalam [32] proposed the concept of a structural ImageNet and manually labeled 2000 images for four recognition tasks: component type identification (binary), spalling condition check (binary), damage level evaluation (three classes), and damage type determination (four classes). They applied two different Transfer Learning strategies based on the pre-trained VGG16 model. For the multi-class damage type task, retraining the last two convolutional blocks of the network yielded an accuracy of 68.8% with 23% overfitting.

In the aforementioned works, the performance of the proposed methods varied according to the size and complexity of the datasets and the adopted Transfer Learning approach. Most studies retrained more than two convolutional layers, or all of them, updating a large number of network parameters to achieve higher detection accuracy. However, this approach is computationally expensive, requires longer training, and is prone to overfitting when heavily parameterized networks are trained on small datasets.
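To make the argument about parameter counts concrete, the short script below tallies the trainable parameters of VGG16 [34] block by block from its standard layer configuration (3 × 3 convolutions with biases, five max-pooled blocks, and the original 4096-4096-1000 ImageNet classifier head). It is an illustrative calculation, not part of any cited study; a damage classifier would replace the 1000-way output layer, which changes only the head count.

```python
# VGG16 layer configuration: integers are output channels of 3x3 convolutions,
# "M" is the 2x2 max-pooling layer that closes each of the five blocks.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def conv_block_params(cfg, in_channels=3, k=3):
    """Parameters (weights + biases) of each convolutional block."""
    blocks, current, c_in = [], 0, in_channels
    for v in cfg:
        if v == "M":
            blocks.append(current)
            current = 0
        else:
            current += k * k * c_in * v + v
            c_in = v
    return blocks

def fc_params(sizes):
    """Parameters of a stack of fully connected layers with biases."""
    return sum(a * b + b for a, b in zip(sizes[:-1], sizes[1:]))

blocks = conv_block_params(VGG16_CFG)
conv_total = sum(blocks)
fc_total = fc_params([7 * 7 * 512, 4096, 4096, 1000])  # original ImageNet head
print(blocks)                                # [38720, 221440, 1475328, 5899776, 7079424]
print(conv_total, fc_total, conv_total + fc_total)  # 14714688 123642856 138357544
print(f"last two conv blocks: {sum(blocks[-2:]):,}")  # last two conv blocks: 12,979,200
```

Under these assumptions, the last two convolutional blocks alone hold about 13.0 million of the 14.7 million convolutional parameters (roughly 88%), so retraining even the last two blocks updates most of the feature extractor, while the fully connected head dominates the 138 million total.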

In a bridge condition assessment framework, defect localization is crucial to evaluate the impact of damage on the structural integrity of the bridge. For this purpose, deep learning-based semantic segmentation algorithms have been deployed to provide pixel-level classification results and improve damage detection accuracy.

Zhang et al. [42] designed a fully convolutional model to detect and group image pixels for three types of concrete surface defects (i.e., crack, spalling, and exposed rebar). The authors prepared a dataset of 1443 mask-labeled images to train and test the model. Their proposed method achieved a semantic segmentation accuracy of 75.5%.

Fu et al. [43] introduced a crack detection method based on an improved DeepLabv3+ semantic segmentation algorithm. They established a concrete bridge crack segmentation dataset to train and test the proposed model. The experimental results demonstrated the effectiveness of the trained algorithm, which reached an average intersection over union (IoU) of 82.37%.
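Intersection over union, the metric quoted for several of these segmentation studies, compares a predicted mask with the ground truth as |prediction ∩ truth| / |prediction ∪ truth|. The sketch below computes it per class and averages over the classes that appear in either map; whether a given study averages over classes, images, or both is not specified here, so the averaging scheme and the function names are assumptions.

```python
import numpy as np

def iou(pred, target):
    """IoU of two binary masks; defined as 1.0 when both masks are empty."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0
    return np.logical_and(pred, target).sum() / union

def mean_iou(pred_labels, true_labels, num_classes):
    """Average of per-class IoU over classes present in either label map."""
    scores = []
    for c in range(num_classes):
        p, t = pred_labels == c, true_labels == c
        if np.logical_or(p, t).any():
            scores.append(iou(p, t))
    return float(np.mean(scores))

# Toy 4x4 label maps: 0 = background, 1 = crack.
true = np.zeros((4, 4), dtype=int)
true[:, 1] = 1                      # ground-truth crack: column 1
pred = np.zeros((4, 4), dtype=int)
pred[:3, 1] = 1                     # misses one crack pixel...
pred[0, 2] = 1                      # ...and adds one false positive

print(iou(pred == 1, true == 1))    # crack IoU = 3 / 5 = 0.6
print(mean_iou(pred, true, 2))      # (0.6 + 11/13) / 2, about 0.723
```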

Wang et al. [44] constructed a crack dataset of 2446 manually labeled images to train and evaluate the performance of five deep networks for semantic segmentation. The best model achieved an F1-score of 77.32% and an intersection over union ratio of 62.98%. The authors also discussed the influence of dataset choice and image noise on the detection performance.
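For a single binary mask with true positives $TP$, false positives $FP$, and false negatives $FN$, the two metrics reported above are tied by a fixed relation, which helps when comparing studies that report only one of them:

$$
\mathrm{F1} = \frac{2\,TP}{2\,TP + FP + FN}, \qquad
\mathrm{IoU} = \frac{TP}{TP + FP + FN}
\quad\Longrightarrow\quad
\mathrm{IoU} = \frac{\mathrm{F1}}{2 - \mathrm{F1}}, \qquad
\mathrm{F1} = \frac{2\,\mathrm{IoU}}{1 + \mathrm{IoU}}.
$$

For instance, $\mathrm{F1} = 0.7732$ would correspond to $\mathrm{IoU} = 0.7732 / 1.2268 \approx 0.630$ on one pooled mask, close to the 62.98% reported in [44]. An exact match should not be expected when scores are averaged over images or classes, because the relation is nonlinear.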

Dung and Anh [45] developed a fully convolutional network-based method and annotated 600 crack-labeled images for semantic segmentation. The proposed model achieved an average precision of approximately 90%. The authors demonstrated their method's effectiveness by accurately identifying and capturing the crack path and density variation in a crack opening video.

The above studies have shown very promising results in detecting damage. However, fully supervised deep semantic segmentation networks are complex and face a common major challenge: data scarcity. These models must be trained on images with pixel-level annotations, which are expensive to produce and require the empirical knowledge of field experts. Furthermore, most publicly available concrete damage datasets only provide image-level annotations.

To alleviate the heavy workload associated with data annotation in a fully supervised learning framework, weakly supervised segmentation methods use weaker forms of annotation (e.g., image-level labels and bounding boxes) as supervision [46]. Within the context of damage detection, Dong et al. [47] designed a patch-based weakly supervised semantic segmentation network to detect cracks in construction materials. In their proposed method, an input image is cropped into patches, which are annotated at the image level. Class activation maps of cracks are obtained for each patch and fed to a fully connected conditional random field to generate the corresponding synthetic labels, which are then used to train a segmentation network.
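Class activation maps, used in [47] as the starting point for synthetic labels, are defined in the original CAM formulation for networks whose last convolutional feature maps $f_k$ are globally average-pooled and passed to a linear layer with weights $w_k^c$: the map for class $c$ is $M_c = \sum_k w_k^c f_k$. The NumPy sketch below assumes exactly that architecture and takes the feature maps and weights as given. The function name, the ReLU-and-normalize step, and the nearest-neighbor upsampling (bilinear interpolation is more usual) are illustrative choices; the sketch does not reproduce the patch cropping or CRF refinement of [47].

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx, out_size):
    """CAM for a GAP + linear classifier.

    features:   (K, h, w) feature maps from the last convolutional layer
    fc_weights: (num_classes, K) weights of the layer after global average pooling
    out_size:   (H, W) resolution of the input image
    """
    cam = np.tensordot(fc_weights[class_idx], features, axes=1)  # sum_k w_k^c * f_k
    cam = np.maximum(cam, 0.0)            # keep positive evidence only (common practice)
    if cam.max() > 0:
        cam = cam / cam.max()             # normalize to [0, 1]
    # Nearest-neighbor upsampling to the image resolution.
    H, W = out_size
    rows = np.arange(H) * cam.shape[0] // H
    cols = np.arange(W) * cam.shape[1] // W
    return cam[np.ix_(rows, cols)]

# Toy example: two 2x2 feature maps, two classes.
features = np.array([[[1.0, 0.0], [0.0, 0.0]],
                     [[0.0, 0.0], [0.0, 1.0]]])
fc_weights = np.array([[0.5, 0.5],
                       [1.0, 0.0]])
cam = class_activation_map(features, fc_weights, class_idx=1, out_size=(4, 4))
print(cam)  # class 1 activates only the top-left quadrant of the 4x4 image
```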

König et al. [48] presented a weakly supervised segmentation approach leveraging classification labels to detect surface cracks. To obtain pixel-level segmentation pseudo labels, the authors utilized a patch-threshold segmentation combined with coarse localization maps generated by a Convolutional Neural Network trained on images with classification annotations. The generated pseudo labels were used to train a standard semantic segmentation network to perform crack segmentation.
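The sketch below shows one simple way to turn a coarse localization map into pixel-level pseudo-labels by combining it with a per-patch intensity threshold, in the spirit of the approach described above. The rule itself (keep patches whose mean activation exceeds 0.5, then label pixels darker than the patch mean minus one standard deviation), the patch size, and the function name are hypothetical; the exact procedure of [48] is not reproduced here.

```python
import numpy as np

def pseudo_label(gray, loc_map, patch=8, loc_thresh=0.5, k=1.0):
    """Hypothetical rule: mark dark pixels inside patches the classifier attends to.

    gray:    (H, W) grayscale image
    loc_map: (H, W) coarse localization map in [0, 1], e.g., an upsampled CAM
    """
    H, W = gray.shape
    label = np.zeros((H, W), dtype=np.uint8)
    img = gray.astype(float)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            if loc_map[i:i + patch, j:j + patch].mean() < loc_thresh:
                continue  # patch not attributed to the crack class
            p = img[i:i + patch, j:j + patch]
            # Patch-wise threshold: pixels clearly darker than their surroundings.
            label[i:i + patch, j:j + patch] = p < p.mean() - k * p.std()
    return label

gray = np.full((16, 16), 200, dtype=np.uint8)
gray[:, 3] = 50        # crack inside the activated region
gray[12, 12] = 50      # dark spot outside it (e.g., a pore), should be ignored
loc_map = np.zeros((16, 16))
loc_map[:, :8] = 1.0   # the classifier's coarse map covers the left half

labels = pseudo_label(gray, loc_map)
print(labels.sum())    # 16: only the crack column is labeled
```

The localization map restricts where labels can appear, while the intensity threshold sharpens the coarse map to pixel resolution; the resulting masks would then serve as training targets for a standard segmentation network.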

Zhu and Song [49] developed a weakly supervised network for crack segmentation in asphalt concrete bridge decks. An autoencoder applied to the original data generates a weakly supervised starting point for convergence, after which image feature extraction and segmentation are performed under weak supervision.

This paper investigates a weakly supervised framework based on interpretation techniques that leverages image-level annotations to generate pixel-level maps. The goal is to provide a coarse localization of three distinct types of damage in concrete bridge images.
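Interpretation techniques of this kind derive a heatmap from a trained classifier, so only image-level labels are needed. As one widely used example, not necessarily among those evaluated in this paper, Gradient-weighted Class Activation Mapping (Grad-CAM) weights each feature map $A_k$ by the spatially averaged gradient of the class score with respect to it and keeps the positive part of the weighted sum. The sketch assumes the activations and gradients have already been obtained from a deep learning framework; the final check illustrates that, for a global-average-pooling plus linear head, Grad-CAM reduces to CAM after normalization.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap.

    activations: (K, h, w) feature maps A_k of the chosen convolutional layer
    gradients:   (K, h, w) d(score_c)/d(A_k) for the target class c
    """
    alphas = gradients.mean(axis=(1, 2))                        # GAP of gradients
    cam = np.maximum(np.tensordot(alphas, activations, axes=1), 0.0)
    return cam / cam.max() if cam.max() > 0 else cam            # normalize to [0, 1]

# Sanity check: with score_c = sum_k w_k * mean(A_k), the gradient is w_k / (h*w)
# at every position, so Grad-CAM equals the normalized CAM sum_k w_k A_k.
rng = np.random.default_rng(0)
A = rng.random((4, 7, 7))
w = np.array([0.8, -0.3, 0.5, 0.1])
grads = np.broadcast_to((w / 49.0)[:, None, None], A.shape)
cam = np.maximum(np.tensordot(w, A, axes=1), 0.0)
print(np.allclose(grad_cam(A, grads), cam / cam.max()))  # True
```

Unlike CAM, Grad-CAM does not require a specific head architecture, which is why gradient-based maps are commonly applied to pre-trained classifiers such as VGG16.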

The main contributions of this work are the following:


The rest of the paper is organized as follows: Section 2 presents an overview of the methodology followed in this paper, the proposed dataset, the VGG16 model, and the interpretation techniques studied in this work. Section 3 details the experimental setup, and the experimental results are presented and discussed in Section 4. Conclusions are provided in the final section of the paper.
