1. Introduction
Cracks are the first signs of civil structure degradation, and they can arise for various reasons, including structural foundation displacement, shrinkage and expansion, uneven mixing, swelling soil, overloading, and environmental and manufacturing disasters. Crack detection and recognition can be conducted automatically or manually; manual approaches rely on human experts visually analyzing and evaluating the structure [
1]. Manual inspection involves sketching a schematic of the crack and recording the circumstances of the abnormality. Manual inspection methods usually take a long time, depend on the observer, are sensitive to the inspector’s insight, and lack a descriptive methodology [
2]. Automatic inspection methods provide a coherent solution that reduces subjectivity and replaces manual observation by the human eye [
3]. Automatic crack identification has been developed to replace slow, biased, and outdated human inspection processes with fast and efficient surface defect assessment [
4]. Computer vision systems have been introduced to address this shortcoming, with the goal of instantly and reliably transforming image or video input into meaningful intelligence. Such a system identifies structural components, recognizes modifications relative to a reference image, and quantifies local and global visual damage [
5]. Automating the process may significantly reduce costs and allow regular inspection intervals. The detection of cracks and disintegration of a bridge has been examined as part of automation using the fast Haar transform [
6]. Crack detection filters of various sizes were developed to locate cracking spots in inspection imagery. A semi-automatic strategy incorporating Sobel and Laplacian operators was used to identify crack edges, and a graph search algorithm was used to obtain cracks depending on the user input [
7]. A principal-component-analysis-(PCA)-based approach combined with linear structural analysis has been reported to identify linear structural fractures in concrete bridge decks with the highest classification accuracy [
8]. One of the most critical structural inspection and maintenance methods is vision-based technology, which utilizes essential diagnostic imaging devices, such as sensors [
9]. Deflection assessment, steel corrosion recognition, and spalling diagnosis are among the most recent advancements in vision-based inspection and testing [
10]. However, this technique has some constraints in real-world applications, as it is difficult to create an automated system that encompasses all unanticipated conditions for fast, perceptible damage remediation in the physical world [
11]. Deep learning (DL) has recently been recognized as one of the most potent remedies to this challenge [
12]. In addition, DL models based on neural networks containing numerous hidden units have evolved from ML strategies and offer numerous benefits for a better solution [
13].
High-precision, nonintrusive structural inspections over long distances have been performed using advances in optics and computer vision. The fundamental shortcoming of image-processing-based approaches is the inconsistency and variation among the fracture pixels [
14]. DeepLab V2 was used to detect multiple cracks in images. Automated inspection has improved markedly owing to the rapid evolution of ML. ML algorithms can learn feature representations and produce predictions with confidence estimates without the manual feature engineering required by conventional techniques [
15]. Data collection, feature extraction, and categorization are the typical stages of conventional machine learning pipelines for pavement crack detection. Shallow convolutional-neural-network (CNN)-based architectures have been implemented to identify surface cracks and achieve greater accuracy with efficient computational costs [
16,
17]. Deep CNN systems use a multilayer neural net to retrieve significant characteristics from the input data [
18]. Numerous analyses of ML-based crack detection methods have revealed that the classifier may not produce reliable results if the derived features do not identify actual cracks [
19]. The Hessian matrix has been used both to accentuate cracks over blobs or staining and to compensate for the thickness fluctuation of cracks during image pre-processing [
20]. Probabilistic relaxation has been employed to coarsely detect cracks, eliminate noise, and perform adaptive thresholding [
21]. Transfer learning methods allow CNNs to be used without incurring high computing costs or needing a prior understanding of the working functionality of CNN layers. Visual Geometry Group’s VGGNet [
22], Microsoft’s ResNet [
23], and Google’s Inception-V3 [
24] are widely used transfer learning architectures that take photographic data as input. Models built by hybridizing ML algorithms have been found to outperform traditional vision-based algorithms. Hybridizing support vector machines (SVM) with techniques such as fuzzy logic, k-nearest neighbours, artificial neural networks (ANN), and evolutionary algorithms has led to substantial improvements in recognition accuracy [
25]. Pavement Crack Detection Net (PCDNet) eliminates more localized distortion, detects smaller cracks, and interprets information at a significantly faster rate than other methods [
26]. Optimization-based intelligence techniques have been used to examine wall quality parameters with respect to many components of the dynamic conditions of a retaining wall [
27]. In addition, applying ant colony optimization (ACO) to these criteria produced the best design solution. A hybrid ANN-ACO algorithm utilizes various parameters under different structural conditions, where dynamic loads remarkably impact the structural models [
16]. Ensemble models built by stacking the best-performing ML algorithms have proven to produce efficient results in predictive models [
28]. A pre-processed CNN classifier combining VGG-16 with ResNet-50 detects fractures in images, and Inception models for object localization incorporate class activation mapping (CAM) for optimal detection [
29]. Optimizing the hyperparameters of CNN architectures, such as VGG16 and ResNet, has been shown to achieve greater accuracy in object identification and classification [
30]. Unmanned aerial vehicles (UAVs) and camera systems have been used to identify cracks, with U-Net applied for pixel-wise classification of features and flaws using various feature sets. Texture-based video processing methods handle local binary patterns (LBP) using SVM and Bayesian decision theory. For noisy and complicated bridge photographs, wavelet features were retrieved from the scene using a sliding-window texture analysis technique [
31]. A deep CNN-based damage locating method used DenseNet to identify the damaged and undamaged steel frames from the images provided as inputs. The model outperformed the MobileNet and ResNet architectures [
32]. Cracks are detected by trained ConvNet, SVM, and boosting methods over a sliding window, where the SVM sometimes fails to distinguish the crack from the background. CNNs combined with traditional Canny and Sobel edge detection methods can scan larger images and exhibit excellent robustness during training. The AdaBoost classifier was used for pre-processing the crack image, and DL techniques were used for crack detection in the image data [
33]. An EfficientNet-based transfer learning model was developed to detect and classify surface cracks in high-rise buildings using UAVs. Microcrack detection has been achieved by solving the binary classification problem of cracks using autoencoders and softmax regression [
34]. A CNN-based order-forensics framework for detecting image operator chains has been described. The two-stream framework captures both local noise residuals and evidence of manipulation artifacts. The model can automatically learn detection features directly from image data and is proposed explicitly for forensically recognizing a chain of two image operators [
35]. To extract features directly from the images, a dual-filtering CNN was designed. It treats each resampling parameter as a separate class, reformulating resampling parameter estimation as a multi-class classification problem [
36]. A reliable blind watermarking system based on 3D convolutional neural networks that can embed watermark images into animated GIF files and extract them was proposed [
37].
The literature review revealed that manual inspection of structures is challenging and time-consuming and yields biased results. Several research studies have implemented image-based methods and machine learning and deep learning algorithms to enable automatic monitoring of building structures. The performance of classical ML algorithms on image data remains limited because it depends on many handcrafted features and entails complex feature engineering. DCNN-based models have been employed for damage detection and classification, but not all models performed efficiently because of insufficient data, overfitting, and vanishing gradient problems. Model competency was enhanced by customising convolutional layers and by hybridising, ensembling, and transfer learning techniques. The ResNet, DenseNet, VGG, and Xception models were observed to perform efficiently in detecting structural damage such as concrete cracks and steel bar damage.
To overcome the issues discussed in previous studies, this study proposes an automatic vision-based crack-identification system based on DL to identify crack regions from a large dataset of images acquired in the field. The main contribution is the establishment of neural-network-based classification models for various structural environments. Using various cameras and visual equipment, such as drones, this technology is intended to ease routine inspections of concrete buildings and speed up the diagnosis of precise crack distribution while retaining accuracy. First, a series of images captured under a combination of structural, meteorological, and photographic conditions was gathered, allowing the images to be classified easily using search keywords. A transfer learning approach was adopted to minimize the time and cost of constructing a DL model.
2. Research Methodology
The Conv2D ResNet exponential model was fitted with a dataset of images for each wall defect, such as cracks, holes, efflorescence, damp patches, and spalls. The dataset was collected from publicly available repositories, such as Kaggle and the Structural Defects Network (SDNET) 2018 [
38]. The model was trained with 80% of the 5000 images and tested with 20% of the images. The research methodology used in this study is shown in
Figure 1. In Stage 1, existing crack detection methods for building walls were explored and analysed. In Stage 2, a novel Conv2D ResNet exponential model was designed to detect the damage class of a building wall. The dataset consisted of a collection of wall images with different defects, such as cracks, holes, efflorescence, damp patches, and spalls. Training and testing of the proposed model were performed using an 80:20 split of the wall quality dataset. In Stage 3, the proposed model was evaluated on the test data, and Conv2D was compared with existing models, such as DenseNet, VGG19, and Xception, across several activation layers, such as softplus, softsign, tanh, selu, elu, and exponential. In Stage 4, the performance of the proposed model was analysed using metrics such as precision, recall, F-score, and accuracy. The architecture of the DL activation layer is illustrated in
Figure 2.
The workflow of the Conv2D ResNet exponential model is illustrated in
Figure 3. An open-access dataset (SDNET2018) published by Utah State University was used for implementation. In this research, the Snip&Sketch annotation tool was used to extract the region of interest from the images in the dataset. The Conv2D ResNet exponential model was trained with 4000 images and tested with 1000 images. The base model was initialized with ImageNet pre-trained weights, and Conv2D architectures were designed from CNN models such as Xception, VGG19, DenseNet, and ResNet to extract general high-level features. Custom layers were added to the base model using a transfer-learning-based DL approach, and the performance on wall quality prediction was analysed. The Xception, VGG19, DenseNet, and ResNet models were fitted with different activation layers, such as softplus, softsign, tanh, selu, elu, and exponential, and were evaluated using model loss, precision, accuracy, recall, and F-score measures.
As an overview of the novelty, the Conv2D ResNet exponential model was built on the Conv2D ResNet base model together with transfer-learned custom activation layers that classify the wall defects more effectively with high accuracy. The wall quality dataset was fitted to the Conv2D ResNet model, which acted as the base model, thereby learning the types of wall defects from 90% of the training data. The knowledge acquired by the Conv2D ResNet base was transferred to refit the model by integrating a layer with the exponential activation function, which identified the wall defects in a single image, thereby validating the transfer learning by transferring knowledge from the base model to the custom layer and enhancing the accuracy.
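To make the workflow concrete, a minimal sketch of this transfer learning setup is given below, assuming a Keras/TensorFlow implementation, a frozen ResNet50 backbone with ImageNet weights, a 224 × 224 input size, and the exponential activation placed in a custom dense head before a five-way softmax output; the head width and dropout rate are illustrative assumptions, not the exact configuration used in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5  # cracks, holes, efflorescence, damp patches, spalls

# Base model: ResNet pre-trained on ImageNet, used as a frozen feature extractor.
base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False  # transfer learning: keep the ImageNet features fixed

# Custom head with an exponential activation layer (assumed placement).
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="exponential"),  # activation layer under study
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```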
3. Implementation Setup
A dataset with 1000 images per defect class was collected for defects such as cracks, holes, efflorescence, damp patches, and spalls. Wall cracks signify trouble with the living area’s foundation. Once wall cracks are detected in a residence, it generally means that the foundation is shifting. Cracks in walls are caused by the contraction and expansion of construction materials due to temperature and water content oscillations. A hole in a wall is also a significant defect that affects the quality of a building structure over time. Efflorescence is the deposition of salts on the surface of the aggregate, which is usually white; once dry, it forms a white coating on the outside of the concrete wall. When absorbed water and salts evaporate, they appear on the walls as crystallized patches or a layer of white powder. Damp patches form through condensation, which occurs when warm, humid air within a room meets a cold interior wall or surface; the moisture then condenses on the interior wall surface, causing damp patches. A spall in a wall refers to the discoloration, chipping, fading, crumbling, or flaking of concrete or brickwork, especially where the surface components have been destroyed. A spall can occur because of moisture absorption, combustion, or mechanical processes. The wall quality dataset contains 1000 images for each defect, namely holes, cracks, efflorescence, damp patches, and spalls, and is represented as follows in Equations (1)–(6):
where the symbols in Equations (1)–(5) denote the wall images with holes, cracks, efflorescence, damp patches, and spalls, respectively. The wall quality dataset images used for implementation are shown in
Figure 4.
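As an illustration of how the 80:20 split of the 5000-image wall quality dataset could be prepared, the sketch below assumes a hypothetical directory named wall_quality_dataset with one subfolder per defect class and a 224 × 224 input size; the folder name and seed are placeholders rather than details taken from this study.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)
BATCH = 32

# Assumed layout: wall_quality_dataset/{crack,hole,efflorescence,damp_patch,spall}/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "wall_quality_dataset", validation_split=0.2, subset="training",
    seed=42, image_size=IMG_SIZE, batch_size=BATCH, label_mode="categorical")
test_ds = tf.keras.utils.image_dataset_from_directory(
    "wall_quality_dataset", validation_split=0.2, subset="validation",
    seed=42, image_size=IMG_SIZE, batch_size=BATCH, label_mode="categorical",
    shuffle=False)  # fixed order so labels and predictions align during evaluation
```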
The models were initialized with ImageNet pre-trained weights and then trained on the wall quality dataset with convolutional neural network models, such as Conv2D Xception, DenseNet, VGG19, and ResNet, to extract the essential features from the images. Equation (7) represents the Gaussian function applied for feature extraction, where the parameter “r” denotes the variance of the Gaussian function.
The Gaussian orientation function used for image filtering is shown in Equation (8),
where the constituent terms denote the second derivatives of the Gaussian function, as represented in Equations (9)–(11).
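Since Equations (7)–(11) are not reproduced here, the following is a hedged reconstruction based on the conventional 2D Gaussian with scale parameter r and its second derivatives; the oriented second-derivative form is one common choice for a Gaussian orientation filter and is only assumed to match Equation (8).

```latex
% Assumed conventional forms of Eqs. (7)-(11)
G(x, y; r) = \frac{1}{2\pi r^{2}} \exp\!\left(-\frac{x^{2} + y^{2}}{2 r^{2}}\right)

G_{\theta}(x, y) = \cos^{2}\!\theta \, G_{xx} + 2\sin\theta\cos\theta \, G_{xy} + \sin^{2}\!\theta \, G_{yy}

G_{xx} = \frac{x^{2} - r^{2}}{r^{4}}\, G, \qquad
G_{xy} = \frac{x y}{r^{4}}\, G, \qquad
G_{yy} = \frac{y^{2} - r^{2}}{r^{4}}\, G
```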
The input images in the wall quality dataset were processed with four Conv2D stages, namely convolution filtering, a sigmoid filter, a linear transformation, and a linear sigmoid, to generate the final output, as denoted in Equations (12)–(14),
where “R” and “C” represent the rows and columns of the input image matrix, respectively.
The input images in the wall quality dataset were trained with Conv2D architectures designed from CNN models, such as Xception, VGG19, DenseNet, and ResNet, to extract general high-level features, as represented in Equations (15)–(18),
where CP, TL, CL, DB, FCL, SC, MP, and PL represent the convolution and pooling layer, transition layer, classification layer, dense block layer, convolution ReLU max pooling layer, fully connected layer, separable convolution layer, max pooling layer, and normal pooling layer, respectively.
The sigmoid function is applied to the above equation, as shown in Equations (19) and (20).
A linear transformation was applied to the above layer to process the third layer of the CNN, as shown in Equation (21).
The final output is obtained by applying a linear sigmoid, as given in Equation (22).
The proposed Conv2D ResNet exponential model is built with an exponential activation layer, as denoted in Equation (23).
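For completeness, the following are the standard forms assumed for Equations (19)–(23): the sigmoid, a linear transformation with weights W and bias b, the linear-sigmoid output, and the exponential activation used in the proposed model.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}      % sigmoid filter, Eqs. (19)-(20)
y = W z + b                            % linear transformation, Eq. (21)
o = \sigma(W z + b)                    % linear sigmoid output, Eq. (22)
f_{\mathrm{exp}}(z) = e^{z}            % exponential activation, Eq. (23)
```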
4. Prescriptive and Predictive Data Analysis of Wall Quality Defects
Pre-training of the models was performed by initializing them with ImageNet weights, and they were then trained on the wall quality dataset with CNN models, such as Conv2D Xception, DenseNet, VGG19, and ResNet, to extract the features. The designed base model was fitted with custom layers using various activation layers, such as softplus, softsign, tanh, selu, elu, and exponential, to analyse the performance in identifying wall quality defects. Activation functions play an essential role in developing neural networks. The activation function determines how quickly the network structure learns from the training dataset, and the activation function at the output layer determines the model’s prediction. An activation function is a unit placed at the end or middle of a neural network that determines whether a neuron is activated; it applies a nonlinear transformation to an input signal, and the transformed signal is then provided as input to the next layer of neurons. Softsign is one such activation function, and Equation (24) provides its mathematical notation.
The softplus function is a smooth approximation of the ReLU activation function and is occasionally used in its place in neural networks. Softplus is related to the sigmoid function (its derivative is the sigmoid) and is represented by Equation (25).
The tanh activation function is the hyperbolic tangent, a rescaled version of the sigmoid activation function. It takes any real value as input and outputs values between −1 and 1, as represented by Equation (26).
The scaled exponential linear unit (selu) activation function has implicitly induced normalization properties. When the input x is greater than zero, the output is the product of λ (lambda) and x; when x is less than or equal to zero, the output is λ multiplied by α (alpha) times the exponential of x minus α, that is, λα(e^x − 1). Equation (27) represents the selu activation function.
The exponential linear unit (elu) activation function returns the input directly for positive values and α(e^x − 1) for negative values, where the alpha value is typically selected from 0.1 to 0.3, as represented in Equation (28).
The exponential activation function returns e raised to the power of the input, a positive-valued function of a real input variable, and is represented by Equation (29).
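The standard definitions of these six activation functions, assumed to correspond to Equations (24)–(29), are:

```latex
\mathrm{softsign}(x) = \frac{x}{1 + |x|}
\qquad
\mathrm{softplus}(x) = \ln\!\left(1 + e^{x}\right)
\qquad
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}

\mathrm{selu}(x) = \lambda \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases}
\qquad
\mathrm{elu}(x) = \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases}
\qquad
\mathrm{exponential}(x) = e^{x}
```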
The models were examined with the activation functions discussed above and compiled to analyse performance indices such as step loss, accuracy, validation loss, and validation accuracy. The loss function computes the difference between the actual output of the model and the target outcome. Step loss indicates the loss incurred at each iteration. The validation loss specifies the loss on the validation dataset, which is held out from the training dataset. Wall quality assessment is a classification problem, and the loss function is given by Equation (30).
Accuracy is the ratio of the number of correct predictions to the total number of predictions and is denoted by Equation (31).
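Assuming the usual categorical cross-entropy for multi-class classification, Equations (30) and (31) take the standard forms:

```latex
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} y_{i,c} \, \log \hat{y}_{i,c}
\qquad
\mathrm{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}}
```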
The models were trained for ten epochs, and prescriptive data analysis was performed by analysing the performance indices for each epoch, as shown in
Table 1,
Table 2,
Table 3 and
Table 4.
The models were initialized with ImageNet pre-trained weights and then trained on the wall quality dataset with convolutional neural network models such as Conv2D Xception, DenseNet, VGG19, and ResNet to extract the essential features from the images. The designed base model was fitted with custom layers using various activation layers, such as softplus, softsign, tanh, selu, elu, and exponential, to analyse the performance of defect identification in the wall quality. The performance was analysed in terms of model loss, model accuracy, precision, recall, and F-score, as shown in
Table 5,
Table 6,
Table 7 and
Table 8 and
Figure 5,
Figure 6 and
Figure 7. It was observed that the performance of ResNet was comparatively better, with greater accuracy and F-score values, followed by DenseNet, Xception, and VGG19 models. The model performance across different activation layers was studied. It was noted that the models’ performance metrics were better with the exponential activation layer than with the other layers. Thus, the Conv2D ResNet model implemented with the exponential activation layer provided an enhanced model for wall quality detection.
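A sketch of how this comparison could be organized is shown below, reusing the train_ds and test_ds objects from the earlier loading sketch; DenseNet121 is assumed for the DenseNet variant, the head mirrors the model-building sketch, and macro-averaged metrics are computed with scikit-learn. The epoch count follows the ten-epoch schedule reported above.

```python
import numpy as np
import tensorflow as tf
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

BASES = {
    "Xception": tf.keras.applications.Xception,
    "VGG19": tf.keras.applications.VGG19,
    "DenseNet": tf.keras.applications.DenseNet121,   # assumed DenseNet variant
    "ResNet": tf.keras.applications.ResNet50,        # assumed ResNet variant
}
ACTIVATIONS = ["softplus", "softsign", "tanh", "selu", "elu", "exponential"]

def build_model(base_fn, activation, num_classes=5):
    """Frozen ImageNet backbone plus a custom head with the given activation."""
    base = base_fn(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
    base.trainable = False
    return tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(256, activation=activation),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

results = {}
for base_name, base_fn in BASES.items():
    for act in ACTIVATIONS:
        model = build_model(base_fn, act)
        model.compile(optimizer="adam", loss="categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(train_ds, validation_data=test_ds, epochs=10, verbose=0)

        # Collect labels and predictions in a single pass over the test set.
        y_true, y_pred = [], []
        for batch_x, batch_y in test_ds:
            y_true.append(np.argmax(batch_y.numpy(), axis=1))
            y_pred.append(np.argmax(model.predict(batch_x, verbose=0), axis=1))
        y_true, y_pred = np.concatenate(y_true), np.concatenate(y_pred)

        prec, rec, f1, _ = precision_recall_fscore_support(
            y_true, y_pred, average="macro", zero_division=0)
        results[(base_name, act)] = {
            "accuracy": accuracy_score(y_true, y_pred),
            "precision": prec, "recall": rec, "f_score": f1,
        }
```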
The performance of the Conv2D Xception, DenseNet, VGG19, and ResNet models using the exponential activation layer was studied using learning curves during the training and validation phases.
Figure 5 shows the accuracy and loss curves of all models over each epoch during the training phase. The results revealed that accuracy gradually increased over the epochs and stabilized after a threshold. The learning loss of the Xception, DenseNet, and ResNet models was higher during the early epochs and dropped as training progressed.
Figure 6 and
Figure 7 showcase the learning curves of the models during the validation phase. It is noted that the ResNet exponential model showed a consistent increase in accuracy and decrease in loss at each epoch, whereas the other models exhibited declining accuracy at certain epochs.
The wall quality dataset was fitted to the Conv2D ResNet model, which acted as the base model, thereby learning the types of wall defects from 90% of the training data. The model thus learnt the exact regions of interest through which it categorized the wall defect class as cracks, holes, efflorescence, damp patches, or spalls from the larger set of images. The knowledge acquired by the Conv2D ResNet base was transferred to refit the model by integrating a layer with the exponential activation function, which identified the wall defects of a single image, thereby validating the transfer learning. When a test image was fed to the Conv2D ResNet exponential model, the image was validated against the actual wall defect class and the class predicted by the model, termed the True and Guess labels, respectively, as shown in
Figure 8,
Figure 9,
Figure 10 and
Figure 11.
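The True/Guess comparison in Figures 8–11 can be reproduced with a helper along the following lines; the class-name ordering is assumed to follow the alphabetical ordering produced by image_dataset_from_directory, and the image path is a placeholder.

```python
import numpy as np
import tensorflow as tf

# Assumed alphabetical class ordering from image_dataset_from_directory.
CLASS_NAMES = ["crack", "damp_patch", "efflorescence", "hole", "spall"]

def guess_label(model, image_path, img_size=(224, 224)):
    """Return the model's predicted ("Guess") defect class and its confidence."""
    img = tf.keras.utils.load_img(image_path, target_size=img_size)
    x = tf.keras.utils.img_to_array(img)[np.newaxis, ...]
    probs = model.predict(x, verbose=0)[0]
    return CLASS_NAMES[int(np.argmax(probs))], float(np.max(probs))

# Example usage (placeholder path): compare against the known "True" label.
# guess, confidence = guess_label(model, "wall_quality_dataset/crack/img_0001.jpg")
# print(f"True: crack  Guess: {guess} ({confidence:.2%})")
```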
The implementation results showed that the proposed Conv2D ResNet model with the exponential activation layer outperformed the other Conv2D models, such as Xception, VGG19, and DenseNet, with an accuracy of 0.9147, precision of 0.9212, recall of 0.9134, and F-score of 0.9978. The comparative study of learning accuracy during the training and validation phases of the implemented models is represented in
Figure 12a,b. The results illustrated the accuracy of each model at every epoch, and it is noticeable that the proposed Conv2D ResNet exponential model steadily gained accuracy at every epoch, whereas the VGG19 model struggled for accuracy during both phases. The DenseNet and Xception models showed instabilities with epochs. The evaluation loss attained by the models is depicted in
Figure 12c, which shows that the Conv2D ResNet exponential model had minimal losses at each epoch compared with the DenseNet, Xception and VGG19 models in that order. The Conv2D ResNet exponential model performed efficiently, producing a greater accuracy with minimal loss in wall quality detection.
5. Conclusions
This study analysed the performance of DL models for evaluating the quality of wall structures. The main contribution of this work is the design of a Conv2D ResNet exponential model-based architecture that classifies wall defects, such as cracks, holes, efflorescence, damp patches, and spalls. A dataset with 5000 images was used to train the proposed model, which met the requirements of this research work and outperformed the other Conv2D models. The Conv2D ResNet model with 48 convolution layers, one max pooling layer, and an average pooling layer was implemented in this study. This model served as the base and was integrated with the exponential activation layer, improving the classifier’s performance in detecting wall defects. The proposed Conv2D ResNet exponential model was further investigated using the performance metrics precision, recall, F-score, and accuracy. The Conv2D ResNet exponential model classified the wall defect type through transfer learning and was also used to analyse the performance of the other CNN models with several activation layers. The wall quality dataset was fitted to the Conv2D ResNet model, which acted as the base model, thereby learning the types of wall defects from 90% of the training data. The knowledge acquired by the Conv2D ResNet base was transferred to refit the model by integrating a layer with the exponential activation function, which identified the wall defects of a single image, thereby validating the transfer learning by transferring knowledge from the base model to the custom layer and enhancing the accuracy.
This research provides a proper fitting of residual networks to reduce the loss and thereby improve the accuracy of wall classification compared with other Conv2D models. The Xception, VGG19, DenseNet, and ResNet models were fitted with different activation layers, such as softplus, softsign, tanh, selu, elu, and exponential, along with transfer learning, and were analysed using performance evaluation metrics. The dataset used for the proposed Conv2D ResNet exponential model can be used for classifying the defect type in a wall; the same dataset could also be used to identify the defect’s depth through object detection methods. However, categorizing the class of defects, such as cracks, holes, efflorescence, damp patches, and spalls, is directly related to the characteristics of the wall quality represented in the dataset. Once the wall defect class is identified, the respective maintenance procedure can easily be conducted. The implementation results showed that the proposed Conv2D ResNet model with the exponential activation layer outperformed the other Conv2D models, such as Xception, VGG19, and DenseNet, with an F-score of 0.997826. A key finding of the proposed Conv2D ResNet exponential model is the identification of the activation layer function that provides the highest accuracy in predicting the type of wall defect. The proposed Conv2D ResNet exponential model improved the overall effectiveness of classifying wall defects compared with the other deep learning techniques. As an overview of the novelty, the Conv2D ResNet exponential model was built on the Conv2D ResNet base model and extended transfer learning using custom activation layers that classify wall defects effectively with high accuracy. Despite the Conv2D ResNet exponential model’s impressive performance, fine-tuning the base model hyper-parameters by integrating them with various optimizers remains challenging. Future enhancements of this research will focus on validating the accuracy of wall defect prediction for various combinations of convolutional layers and probabilistic loss functions.