Limited Field Images Concrete Crack Identification Framework Using PCA and Optimized Deep Learning Model

Pan, Yuan; Zhou, Shuangxi; Guan, Jingyuan; Wang, Qing; Ding, Yang

doi:10.3390/buildings14072054


by Yuan Pan ¹, Shuangxi Zhou ¹,², Jingyuan Guan ³, Qing Wang ⁴ and Yang Ding ⁵,*

¹ School of Civil Engineering and Architecture, East China Jiaotong University, Nanchang 330013, China
² School of Civil and Engineering Management, Guangzhou Maritime University, Guangzhou 510725, China
³ Guangdong Tobacco Jieyang City Co., Ltd., Jieyang 522000, China
⁴ China Nerin Engineering Co., Ltd., Nanchang 330013, China
⁵ Department of Civil Engineering, Hangzhou City University, Hangzhou 310058, China
* Author to whom correspondence should be addressed.

Buildings 2024, 14(7), 2054; https://doi.org/10.3390/buildings14072054

Submission received: 21 May 2024 / Revised: 21 June 2024 / Accepted: 3 July 2024 / Published: 5 July 2024

(This article belongs to the Special Issue Research on the Construction Mechanical Behavior and Deformation Characteristics of Lining Structure—2nd Edition)


Abstract:

Concrete crack identification methods based on machine learning can greatly improve extraction efficiency and precision. However, in many cases, model training requires a large amount of sample data, and insufficient data makes it difficult to effectively obtain model parameters. This study introduces a deep learning framework that integrates filters, principal component analysis, and attention mechanisms suitable for small sample sizes. Firstly, the histogram equalization method is used for the raw images, which can effectively enhance image contrast. Then, to acquire effective images of the crack, different methods are employed for crack detection, which are subsequently handled by principal component analysis (PCA) for optimal feature choice. Att-Unet and Att-Mask R-cnn segmentation models are used to design the detection for concrete cracks. To raise the learning ability of the segmentation models, an attention mechanism is applied to each feature layer of the decoder, and the loss function is evaluated using a combination of the Focal function and Cross Entropy. To verify the effectiveness of the proposed method, Deep Crack datasets and 76 sets of concrete crack data were collected for testing. Experimental results have shown that the method proposed can significantly reduce the model’s demand for data volume and improve training speed, which provides a new direction for small-sample crack extraction.

Keywords:

concrete crack; limit images; deep learning; principal component analysis; attention mechanism

1. Introduction

Concrete structures, which are widely utilized in infrastructure projects, often exhibit diverse defects resulting from various factors such as dynamic loading, environmental impacts, and initial material imperfections [1,2]. For safety reasons, structures are inspected regularly to maintain their integrity. However, the commonly used inspection methods are labor-intensive and time-consuming, and rely heavily on the expertise and experience of professionals. Computer vision (CV) technology has therefore attracted considerable research interest and is widely used in concrete crack extraction techniques based on convolutional neural networks (CNNs). In CV-based inspection, a camera system or unmanned aerial vehicle (UAV) is used to monitor infrastructure, so it is crucial to incorporate image processing techniques that extract the significant characteristics of concrete cracks from the images.

Recently, significant research efforts have been made to identify concrete cracks and segment them accurately. Yu et al. [3] integrated machine learning, image processing, and information fusion techniques to inspect reinforced concrete for defects thoroughly and from multiple perspectives. Various filters were introduced to extract edge features, and principal component analysis was utilized to determine the optimal set of feature bands. A total of 1287 images were utilized in the experiments, and the results indicated that the approach significantly enhanced detection precision. K C et al. [4] proposed an identification framework based on a CNN model, which uses approximately 40,000 images to predict the crack depth. Li et al. [5] compared five network approaches; a dataset including 500 images was used to predict the results of trained segmentation models used in ensemble learning models. Shim et al. [6] developed a deep network that mixed the ratio of training data and used a stereo-vision-based triangulation technique to measure crack size. In that study, 1522 labeled images and 6000 unlabeled images were involved in testing. Lee and Kim [7] introduced a crack extraction method based on stacking ensemble learning, which addresses the low prediction accuracy of individual models. Ali et al. [8] compared four classical models in terms of training data size and network complexity. Their experimental results show that training data size significantly affects model performance.

Many classical network models such as lightweight Deeplabv3+ [9], Mask R-cnn [10], Unet [11], and PSPnet [12] are used for identifying concrete cracks. Xun et al. [13] introduced an improved DeepLabV3+ model, which is based on a combined three-line attention module, original pyramid pooling module, MobileNetV2 network, and Focal loss, to process the raw images for concrete cracks (4160 images). He et al. [14] employed D-LinkNet as a backbone to determine the crack detection algorithm. The segmentation branches allow the model to focus more on the distance between the line and the well site. In order to solve the problem of data imbalance, in a study by Cao et al. [15], the MSLPG was added to the Mask DenseNet+network framework. Compared with the ablation experiment, this method demonstrates enhanced accuracy, sensitivity, and AUC. Yu et al. [16] presented an improved U-Net model (RLAU-Net) based on mirror padding and strip pooling kernels. The proposed model demonstrates that it has high detection efficiency and accuracy. Rao et al. [17] used attention and Random Forest to predict crack width. Compared with four other segmentation models, the proposed model has superior performance. Wei et al. [18] employed shift pooling PSPNet to replace the original pooling module, allowing the pixels of the grid to gain the entirety of the local features. The improved PSPNet obtains the best result and is closest to the label image.

Even though the aforementioned methods effectively extract crack features, the extraction accuracy is largely influenced by the size of the sample data. In practice, due to the work environment and conditions, gathering a large collection of images of concrete cracks is quite challenging, and existing methods cannot guarantee accuracy when the sample size is small. To address this issue, image rotation, mirroring, and cropping are employed to expand the training data sample size. These techniques have also been applied to the identification and detection of concrete cracks [19,20,21]. Hou et al. [22] utilized a data augmentation method to expand the sample data and then converted the three-band color images into binary images. The proposed method can effectively enlarge the data volume and increase the test accuracy. Wang et al. [23] used nine traditional data augmentation methods in model training, together with a greedy algorithm to find the best combination of data. The results show that the rotation method has the best performance in crack detection. Xu et al. [24] used a generative adversarial network (GAN) to generate new images, which proved to be a significant improvement. Although data augmentation methods can relieve the problem of small data samples, further research is needed in this field, mainly because simply expanding the data volume is not enough to increase the accuracy.

To address the challenges of small-sized sample data, a crack detection framework is proposed that fuses data space and feature space [25], principal component analysis (PCA) [26], and deep learning [27]. To start with, under small sample conditions, the images are improved using the histogram equalization method. Then, various filters, based on the GLCM (gray-level co-occurrence matrix) [28] and edge extraction [29], extract features from the equalized image, and the features that are sensitive to concrete cracks are selected by PCA. The classic segmentation models, Unet and Mask R-cnn, are selected to train the model on concrete crack images. To enhance the model’s learning capability, an attention mechanism module is incorporated during the decoding process, allowing the model parameters to allocate more weight toward target recognition. Finally, the Dice function and Focal algorithm are employed to balance uneven sample data and to improve the model’s behavior.

2. Materials and Methods

2.1. Establishment of Dataset

For network model learning, 816 images and label datasets were used, as shown in Figure 1. Among them, 739 images came from the open Deep Crack dataset [30]. To further evaluate the effectiveness of this study, we collected 76 images of concrete cracks from the civil materials laboratory.

Deep Crack: In the deep crack datasets, there were 739 RGB images and manually labeled images. Images were cropped to a size of 512 × 512; among them, 100 images were used for this proposed method.

Self-Made Crack (SMC): We collected 76 RGB concrete crack images with an Apple phone, cropped them to a resolution of 512 × 512, and stored them in TIFF format with an average file size of 40 KB. Image mirroring and rotation were then applied to expand the amount of data.
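The mirroring and rotation used to expand the SMC data can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' code: the function name `mirror_and_rotate`, the use of both horizontal and vertical mirrors, and the 90°, 180°, and 270° rotation angles are assumptions, since the text does not specify them. The label mask is transformed together with the image so that the two stay aligned.

```python
import numpy as np


def mirror_and_rotate(image, mask):
    """Return a list of (image, mask) pairs: the original, two mirrors, three rotations.

    image: array of shape (H, W, C); mask: array of shape (H, W).
    The set of transforms is an assumption made for illustration.
    """
    pairs = [(image, mask)]
    # Horizontal and vertical mirroring.
    pairs.append((np.fliplr(image), np.fliplr(mask)))
    pairs.append((np.flipud(image), np.flipud(mask)))
    # Rotations by 90, 180, and 270 degrees in the image plane.
    for k in (1, 2, 3):
        pairs.append((np.rot90(image, k, axes=(0, 1)),
                      np.rot90(mask, k, axes=(0, 1))))
    return pairs
```

For square 512 × 512 crops, every transformed pair keeps the original shape, so the augmented samples can be fed to the network without resizing.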

2.2. Original Image Pre-Processing

2.2.1. Histogram Equalization

Because concrete is made by mixing building materials such as sand, gravel, steel bars, and cement, the difficulty of identifying concrete cracks lies in the high similarity between the cracks and the background (especially for tiny cracks) and in the influence of noise. Therefore, to enhance the crack extraction accuracy, it is necessary to use image enhancement methods to reduce the influence of background textures on concrete surfaces.

Histogram equalization [31] is an enhancement technique that adjusts contrast in images where the concrete crack and background have similar pixel values. It increases the dynamic range of grayscale differences between pixels and thereby enhances the overall contrast of the image. Before the image data are fed to the network model, histogram equalization is applied to highlight the local features of the cracks. Histogram equalization creates a large difference between the cracks and the background, as seen in Figure 2: the cracks are black while the background is brighter, showing that this pre-processing improves crack identification.
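As an illustration of the remapping described above, the following NumPy sketch equalizes a single 8-bit grayscale channel using the cumulative distribution of its gray levels. It is a generic textbook formulation, not the authors' implementation; the function name `equalize_histogram` is hypothetical, and how the RGB channels are handled in the paper is not stated, so a single channel is assumed here.

```python
import numpy as np


def equalize_histogram(gray):
    """Equalize an 8-bit grayscale image (dtype uint8) via its cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]      # CDF value at the darkest occupied gray level
    total = gray.size
    if total == cdf_min:           # constant image: nothing to stretch
        return gray.copy()
    # Map each gray level so that the occupied range spreads over 0..255.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]


# A low-contrast patch whose values span only 100..120 is stretched to 0..255.
patch = np.linspace(100, 120, 64).astype(np.uint8).reshape(8, 8)
stretched = equalize_histogram(patch)
```

Libraries such as OpenCV provide an equivalent routine; the explicit version is shown only to make the gray-level remapping visible.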

2.2.2. Image Augmentation

Datasets of high quality and sufficient quantity are the main factors in achieving high accuracy when training a network model. Traditional data augmentation methods usually only change the orientation or size of the image, without altering its basic features. Therefore, after histogram equalization, the pre-processed images are further processed for crack feature detection. Because the pixel values of the crack and the background differ significantly, texture filters can extract more crack features and enhance their differences from the background. In this study, seven different texture filters are utilized to transform the image.

Contrast: Contrast refers to the systematic comparison of grayscale values between two corresponding pixels. A high contrast indicates substantial pixel differences, resulting in a visually striking contrast; a low contrast, on the other hand, denotes minor pixel differences, which may be less noticeable visually. The pixel difference between concrete cracks and the background is significant with high contrast, whereas the differences between cracks and the background are minor with low contrast. Consequently, employing contrast for extracting crack features is effective.

\mathrm{Con} = \sum_{i} \sum_{j} (i - j)^{2} P(i, j)

(1)

Here, i and j are the row and column indices of the gray-level co-occurrence matrix (GLCM), and P(i, j) is the value of the element in row i and column j of the GLCM.

ASM: Angular Second Moment (ASM) is an algorithm for evaluating global images, which primarily computes the sum of the squared values of all elements in the gray-level co-occurrence matrix. This algorithm provides an overview of the entire grayscale value distribution in the image. In the concrete crack area, the pixel variations are minimal, resulting in low energy. However, when the pixel lies on the boundary between the concrete crack and the background, that is, at the edge position, the energy value is higher.

\mathrm{ASM} = \sum_{i} \sum_{j} P(i, j)^{2}

(2)

Correlation: Correlation characterizes the local features of an image by computing the differences in grayscale values along the horizontal and vertical axes of the crack image. A higher correlation value signifies a stronger correlation in that direction, suggesting the presence of texture features; conversely, a lower correlation value indicates the absence of texture features in that direction.

\mathrm{Corr} = \frac{\sum_{i} \sum_{j} (i j) P(i, j) - \mu_{x} \mu_{y}}{\sigma_{x} \sigma_{y}}

(3)

Entropy: Entropy is an evaluation algorithm for the global information complexity of crack images. It primarily measures the level of information content within the image. The highest value, reflecting maximum randomness in pixel values, indicates a more complex crack image; lower values reflect simpler crack images.

\mathrm{Ent} = - \sum_{i} \sum_{j} P(i, j) \log P(i, j)

(4)

Homogeneity: Homogeneity is an algorithm that computes the patterns of feature value variations in local crack images. A very low value suggests minimal texture changes in the crack image within the area, whereas a high value indicates significant texture changes, suggesting the presence of cracks in that area.

H = \sum_{i} \sum_{j} \frac{P(i, j \mid d, \theta)}{1 + (i - j)^{2}}

(5)
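The five GLCM statistics in Equations (1)–(5) can be computed from one co-occurrence matrix as sketched below. This is a hedged illustration, not the authors' pipeline: the helper names `glcm` and `glcm_features`, the quantization to 8 gray levels, and the single offset of one pixel to the right (d = 1, θ = 0°) are assumptions. The sketch returns one value per statistic for a whole patch; producing a feature image, as in the paper, would require evaluating it in a sliding window, which is omitted here.

```python
import numpy as np


def glcm(gray, levels=8, dx=1, dy=0):
    """Normalized co-occurrence matrix P(i, j) for one non-negative offset (dx, dy)."""
    q = (gray.astype(np.int64) * levels) // 256   # quantize 0..255 to 0..levels-1
    h, w = q.shape
    src = q[0:h - dy, 0:w - dx]                   # reference pixels
    dst = q[dy:h, dx:w]                           # neighbours at the given offset
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (src.ravel(), dst.ravel()), 1.0)
    return counts / counts.sum()


def glcm_features(P):
    """Contrast, ASM, correlation, entropy, and homogeneity of a normalized GLCM."""
    i, j = np.indices(P.shape)
    mu_x, mu_y = (i * P).sum(), (j * P).sum()
    sigma_x = np.sqrt((((i - mu_x) ** 2) * P).sum())
    sigma_y = np.sqrt((((j - mu_y) ** 2) * P).sum())
    nonzero = P[P > 0]                            # skip empty cells: 0 * log 0 := 0
    denom = sigma_x * sigma_y
    return {
        "contrast": (((i - j) ** 2) * P).sum(),
        "asm": (P ** 2).sum(),
        # Correlation is undefined for a perfectly uniform patch.
        "correlation": ((i * j * P).sum() - mu_x * mu_y) / denom if denom > 0 else float("nan"),
        "entropy": -(nonzero * np.log(nonzero)).sum(),
        "homogeneity": (P / (1.0 + (i - j) ** 2)).sum(),
    }
```

For a perfectly uniform patch, the contrast and entropy are 0 and the ASM and homogeneity are 1, whereas a patch of alternating dark and bright columns gives a large contrast.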

Laplace filter: Laplace filter is a method that can achieve data sharpening, which can enhance the areas where cracks and background grayscale suddenly change in concrete cracks, and can weaken the areas where grayscale slowly changes.

g(x, y) = f(x, y) + c \left[ \nabla^{2} f(x, y) \right]

(6)

Here, f(x, y) is the original image and g(x, y) is the sharpened image; c = −1 when the center of the convolution kernel is negative, and c = 1 when the center of the convolution kernel is positive.

Sobel filter: The Sobel operator is a combination of Gaussian smoothing and first-order discrete differential operators, mainly used for edge extraction of images.

G_{x} = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} \ast A

G_{y} = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \ast A

G = \sqrt{G_{x}^{2} + G_{y}^{2}}

(7)

Here, the two 3 × 3 matrices are the convolution kernels in the horizontal (x) and vertical (y) directions, \ast denotes two-dimensional convolution, A is the original image, G_{x} and G_{y} are the horizontal and vertical gradient approximations, and G is the approximate gradient magnitude for each pixel in the image.
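Equations (6) and (7) can be tried out with a few lines of NumPy. The sketch below is illustrative only: the helper `filter2d`, the 4-neighbour Laplacian kernel with a negative center (hence c = −1), and the edge-replicating border are assumptions, since the text does not state which kernel or border handling was used. The helper applies the kernel without flipping it, which leaves the symmetric Laplacian and the Sobel magnitude unchanged.

```python
import numpy as np

LAPLACE = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=np.float64)


def filter2d(image, kernel):
    """Apply a 3x3 kernel to a 2-D image (no kernel flip), replicating the border."""
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    for r in range(3):
        for c in range(3):
            out += kernel[r, c] * padded[r:r + h, c:c + w]
    return out


def laplace_sharpen(image):
    """g = f + c * Laplacian(f), with c = -1 because the kernel center is negative."""
    return image.astype(np.float64) - filter2d(image, LAPLACE)


def sobel_magnitude(image):
    """G = sqrt(Gx^2 + Gy^2) from the horizontal and vertical Sobel responses."""
    gx = filter2d(image, SOBEL_X)
    gy = filter2d(image, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)


# A vertical step edge: dark on the left, brighter on the right.
step = np.zeros((5, 6))
step[:, 3:] = 10.0
edges = sobel_magnitude(step)      # non-zero only in the two columns beside the step
sharp = laplace_sharpen(step)      # undershoot and overshoot on either side of the step
```

On the step image, the Sobel magnitude is non-zero only next to the jump, and the Laplace-sharpened image overshoots on both sides of it, which is the edge-enhancing behavior described in the text.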

Figure 3 shows comparative examples of the various filters used in the processing of crack images. As the figure illustrates, different texture filters produce distinct results when applied to the original image, and certain filters are adept at distinguishing cracks from the background. Overall, the gray images produced by the filters can be used as feature inputs for the network model.

2.2.3. Image Data Fusion

The augmented feature images are fused with the original RGB images into high band images (HBI). This provides more crack information in the images; the richer the crack information, the easier it is for the network model to learn and the higher the extraction accuracy. However, fusion greatly increases the dimensionality of the image data, resulting in data redundancy and longer network training time, and therefore lower efficiency. Consequently, finding a methodology to reduce data complexity while maintaining data integrity is crucial. The principal component analysis algorithm can convert multi-band images into images with a smaller number of bands while preserving as much of the important information in the original data as possible. Its effect on high-dimensional datasets is pronounced because it reduces the computational resources required and improves algorithm efficiency. This study combines multiple feature images to obtain image data with high-dimensional features. The PCA algorithm is then used to reduce the dimension of the concrete crack images while retaining the crack feature information, which decreases the computational load and time taken by the network model.

Using principal component analysis (PCA), the image can be represented by its principal component bands, arranged in descending order of their individual contribution to the total variance. Figure 4 shows the PCA results for Deep Crack image number 0009 (after fusion of all features, the image dimension is 24), displaying the individual contributions of the components. The first two PCs have the highest eigenvalues, at 22,966.3229 and 22,543.8473, respectively, as listed in Table 1. From the third PC onward, the eigenvalue falls below 11,251.2018, while the cumulative contribution continues to rise. The first 15 PCs contribute over 95% of the total. Although approximately 5% of the feature information is lost, the significant reduction in feature dimension offers a considerable advantage during the training of the classifier model. Adhering to the selection criterion of retaining over 95% of the total contribution, the first 15 principal components are selected, referred to as PC15 (Table 1).
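The band-reduction step can be sketched as follows: each pixel of the fused image is treated as a sample with one value per band, the band covariance matrix is decomposed, and the smallest number of leading components whose cumulative eigenvalue share reaches the threshold is kept. This is a generic PCA written with NumPy under stated assumptions; the function name `pca_bands` and the synthetic 24-band input are hypothetical, and the authors' software and any band scaling they applied are not described in the text.

```python
import numpy as np


def pca_bands(cube, keep=0.95):
    """Reduce an (H, W, B) band stack (B >= 2) to its leading principal components.

    Returns the (H, W, k) component image, all eigenvalues in descending
    order, and k, the number of components needed to reach `keep`.
    """
    h, w, b = cube.shape
    X = cube.reshape(-1, b).astype(np.float64)   # one row per pixel
    X -= X.mean(axis=0)                          # center every band
    cov = np.cov(X, rowvar=False)                # (B, B) band covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]            # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    share = np.cumsum(eigvals) / eigvals.sum()   # cumulative contribution
    k = min(int(np.searchsorted(share, keep)) + 1, b)
    scores = X @ eigvecs[:, :k]                  # project pixels onto the k leading PCs
    return scores.reshape(h, w, k), eigvals, k


# Synthetic 24-band stack driven by only three underlying signals.
rng = np.random.default_rng(0)
latent = rng.normal(size=(32 * 32, 3))
cube = (latent @ rng.normal(size=(3, 24))).reshape(32, 32, 24)
reduced, eigvals, k = pca_bands(cube, keep=0.95)
```

With the 95% criterion used in the text, the same routine applied to a real 24-band fused image would return however many components that image needs; the paper reports 15 for image 0009.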

3. Deep Learning Network Framework

3.1. Backbone Network

The role of the backbone network is to improve the learning efficiency of concrete crack recognition models. He et al. [32] proposed the residual network in 2015 to address the vanishing gradient problem, which typically arises as network depth increases. By increasing the depth of the network and using skip connections within its internal structure, the residual network extracts deep feature information, generalizes well, and prevents vanishing gradients. As a result, the residual network is selected as the backbone network.

3.2. Semantic Segmentation Models

A semantic segmentation model is an end-to-end, pixel-to-pixel detection method. Among such methods, the encoder-decoder model allows the entire model to be trained end-to-end without the need for manual feature or rule design. It is a commonly used neural network model, in which the encoder is responsible for converting the input data into a fixed-length vector representation that contains semantic information about the input data. The role of the decoder is to combine the semantic information from the encoder with the previous prediction results, gradually forming the target result.

Two well-established models, Mask R-cnn and U-Net, were employed to validate the efficacy of our proposed methodology. Figure 5 and Figure 6 show the two semantic segmentation models. Figure 5 shows the Attention-Mask R-cnn (Att-Mask) architecture used in this study. Att-Mask primarily comprises three components: an input layer, a feature extraction layer, and a fully connected layer. The main function of the feature extraction layer, which includes the backbone network and the feature pyramid network, is to acquire effective features from the original data. As a network deepens, it often suffers from problems such as over-fitting and vanishing gradients, which the deep residual module can effectively address. Therefore, we chose the residual network ResNet as the backbone network. The fully connected layer compares the obtained effective features with the labels and further modifies the model parameters. A softmax activation function is added in the fully connected layer so that the output values sum to 1. The loss function is used to quantify the performance of the model.

Figure 6 shows the Attention-Unet (Att-Unet) architecture used for concrete crack segmentation. It has an encoder-decoder structure, in which the encoder is a contracting path and the decoder is an expansive path. In the contracting path, we use VGG16 [33] to extract the backbone features (Conv1, Conv2, Conv3, Conv4, Conv5). With an input image size of 512 × 512 × 3, the Conv1 path performs two 3 × 3, 64-channel convolutions to obtain an effective feature layer of [512, 512, 64], and then performs 2 × 2 max pooling to obtain a layer of [256, 256, 64]. The other feature layers are obtained in the same way as Conv1, except that the number of channels gradually increases to 128, 256, 512, and 1024, respectively. The contracting path thus yields five preliminary effective feature layers. In this paper, before up-sampling, attention mechanism operations are applied to the five feature layers to enhance the network’s sensitivity to targets and increase target weights. We then use the five preliminary effective features to further extract deep features: the preliminary effective features are up-sampled to obtain a broader range of feature layers and fused with the previous effective feature. During the up-sampling process, employing a 2 × 2 convolution kernel can reduce the number of feature channels.
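The sizes of the five encoder feature layers follow from the description above and can be traced with a short helper. This is only bookkeeping under two assumptions that the text implies but does not state outright: the 3 × 3 convolutions preserve the spatial size, and a 2 × 2 max pooling halves it between consecutive stages. The function name `encoder_shapes` is hypothetical.

```python
def encoder_shapes(size=512, channels=(64, 128, 256, 512, 1024)):
    """Return the (height, width, channels) of each encoder feature layer."""
    shapes = []
    for stage, ch in enumerate(channels):
        # Size-preserving 3x3 convolutions set the channel count of this stage.
        shapes.append((size, size, ch))
        if stage < len(channels) - 1:
            size //= 2          # 2x2 max pooling before the next stage
    return shapes


for name, shape in zip(("Conv1", "Conv2", "Conv3", "Conv4", "Conv5"), encoder_shapes()):
    print(name, shape)
```

Under these assumptions the deepest layer is 32 × 32 with 1024 channels, which is the tensor the decoder starts up-sampling from.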

3.3. Attention Mechanism

The SENet [34] (Figure 7) module takes an input feature layer and learns a weight for each of its channels. To start with, global average pooling is performed on the feature map F, and then two fully connected layers are applied. After the two fully connected layers, the Sigmoid function is applied to constrain the values to between 0 and 1, which gives the weight (between 0 and 1) for each channel. This weight is then applied to the original input feature layer by channel-wise multiplication.
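The forward pass of such a squeeze-and-excitation block can be written compactly in NumPy. The sketch below is an assumption-laden illustration rather than the network code used in the paper: the function name `se_block`, the ReLU between the two fully connected layers (as in the original SENet design), the reduction from 16 to 4 hidden units, and the random weights are all chosen for demonstration only.

```python
import numpy as np


def se_block(feature, w1, b1, w2, b2):
    """Reweight the channels of an (H, W, C) feature map.

    w1: (C, C_reduced), b1: (C_reduced,), w2: (C_reduced, C), b2: (C,).
    Returns the reweighted feature map and the per-channel weights.
    """
    squeezed = feature.mean(axis=(0, 1))                  # global average pooling -> (C,)
    hidden = np.maximum(0.0, squeezed @ w1 + b1)          # first fully connected layer + ReLU
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # second layer + Sigmoid, in (0, 1)
    return feature * weights, weights                     # channel-wise multiplication


# Demonstration with random parameters (in a real network these are learned).
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 16))
w1, b1 = rng.normal(size=(16, 4)), np.zeros(4)
w2, b2 = rng.normal(size=(4, 16)), np.zeros(16)
out, channel_weights = se_block(feat, w1, b1, w2, b2)
```

Because the weights lie between 0 and 1, the block can only attenuate channels relative to one another, which is how it shifts emphasis toward the channels that respond to cracks.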

4. Feature-Level-Based Crack Using Segmentation Model

In this section, an experimental model for identifying concrete cracks based on the above methods is introduced. To improve the capacity of the segmentation model, hyperparameter selection and an attention mechanism are applied during segmentation model training. Because this experiment only concerns the selection of segmentation model hyperparameters, 100 images from the Deep Crack dataset were selected for testing.

4.1. Selection of Hyperparameters

The setting of segmentation model parameters has a significant impact on the training results, so it is particularly important to configure them carefully. In this experiment, the computer configuration is an AMD Ryzen 7 3800X processor and an NVIDIA GeForce RTX 2060 graphics card. The code environment configuration is Python 3.6, CUDA 10.1, and TensorFlow 2.2.0 from Google, Mountain View, CA, USA. The hyperparameters to be selected are batch size, optimizer, momentum, epoch, and learning rate. Some hyperparameters, such as the optimizer and momentum, are set based on experience. Based on the computer configuration, the batch size is set to 4.

The learning rate is the most important hyperparameter; it has a significant impact on network model training. Experience from training many network models shows that a learning rate that is too high may prevent the model from converging, whereas a learning rate that is too low leads to long training times and low efficiency. Therefore, the range of the learning rate is set to 0.0001–0.001. Based on the convergence of the loss values, the number of epochs is set to 100.

The training loss curve and validation loss curve are plotted against the epoch for the two segmentation models (Figure 8). For Unet, the loss values are almost the same at learning rates of 0.001 and 0.0005, and the model performs best at learning rates of 0.0001 and 0.0002. Considering both training time and accuracy, a learning rate of 0.0002 is chosen.

For the Mask R-cnn model, when the learning rate is 0.001, the training loss decreases more slowly than at the other learning rates, and the final loss value is relatively higher, indicating that a learning rate of 0.001 cannot fully mobilize the model’s learning ability. The loss curves of the other three learning rates are similar, with lower loss values. Considering training time, 0.0005 is chosen as the optimal learning rate for this model.
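The settings reported in this subsection can be collected in a small configuration object. This is a hypothetical structure for illustration, not taken from the authors' code: the class name `TrainConfig` and its field names are assumptions, and the optimizer and momentum are left unset because the text does not give their values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    model: str
    learning_rate: float
    batch_size: int = 4                 # limited by the RTX 2060 used in the experiments
    epochs: int = 100                   # chosen from the convergence of the loss
    optimizer: Optional[str] = None     # set from experience; value not stated in the text
    momentum: Optional[float] = None    # set from experience; value not stated in the text


# Learning rates compared in Figure 8.
CANDIDATE_LEARNING_RATES = (0.001, 0.0005, 0.0002, 0.0001)

# Learning rates selected for the two segmentation models.
UNET_CONFIG = TrainConfig(model="Unet", learning_rate=0.0002)
MASK_RCNN_CONFIG = TrainConfig(model="Mask R-cnn", learning_rate=0.0005)
```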

4.2. Attention Mechanism Network

The attention mechanism is usually achieved through a series of trainable weights, which are learned during the training process of the model. Through this approach, the model can dynamically adjust its focus on input data, thereby achieving better performance in identifying concrete cracks.

Figure 9 compares the loss values of the two segmentation models with the attention mechanism against those of the original segmentation models. Although the loss value of Att-Unet (Figure 9a) in the initial epochs is higher than that of the Unet network, it decreases faster as the number of epochs increases, and the final training loss value is also lower. On the validation data, the final loss value of Att-Unet is also lower, indicating that the segmentation model with an added attention mechanism has a stronger ability to extract targets.

Figure 9b shows the training loss curve and validation loss curve of the Mask network model with an added attention mechanism compared with the Mask R-cnn. From the trend of the loss curve, it can be seen that the Att-Mask network model with the added attention mechanism tends to stabilize faster and has lower loss values; the addition of the attention mechanism has a positive impact on the recognition ability of the network.

4.3. Predicted Images of Two Segmentation Models

Figure 10 shows the results for the two datasets with the Unet and Att-Unet networks, respectively. The predicted images show that, for both datasets, the Att-Unet model performs better than the Unet model. The Unet predictions contain large areas of misclassification and omission, whereas the Att-Unet predictions differ little from the labeled images and show no misclassification. However, the ability to detect small cracks still needs to be improved.

To verify the adaptability of the added attention mechanism, the Mask R-cnn and Att-Mask segmentation models were trained on the SMC dataset to obtain prediction results. The Mask R-cnn network model has three outputs: mask, class, and bounding-box regression.

The prediction results in Figure 11 show that the Att-Mask model with the added SENet module produces predictions very similar to the label image, detecting most targets with a very high confidence level of essentially 0.9 or above. The Mask R-cnn model not only misses some targets but also has lower confidence than Att-Mask. The tests combining the two models and the two datasets show that adding SENet is very effective.

5. Results and Discussion

5.1. Precision Evaluation Indicators

For semantic segmentation tasks, IoU, Recall, and Precision are three indicators used to evaluate the segmentation results. IoU is the ratio of the overlap between the predicted crack pixels and the labeled crack pixels to their union. Recall is the proportion of pixels that are actually cracks and are correctly predicted as cracks. Precision is the proportion of pixels predicted as cracks that are actually cracks. P_T (true positive) refers to target pixels that are detected as the target; P_F (false positive) refers to background pixels that are detected as the target; N_F (false negative) refers to target pixels that are not detected as the target; and N_T (true negative) refers to background pixels that are predicted as the background. The calculation formulas are shown in Formulas (8)–(10).

\mathrm{Precision} = \frac{P_{T}}{P_{T} + P_{F}}

(8)

\mathrm{IoU} = \frac{P_{T}}{P_{T} + P_{F} + N_{F}}

(9)

\mathrm{Recall} = \frac{P_{T}}{P_{T} + N_{F}}

(10)
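The three indicators can be computed directly from a predicted mask and its label by counting pixels, as in the sketch below. The function name `segmentation_scores` is hypothetical, and IoU is evaluated in its usual pixel-count form, P_T / (P_T + P_F + N_F), which is an assumption about how Formula (9) is meant to be read. Returning 0 when a denominator is empty is also a convention chosen here, not one stated in the text.

```python
import numpy as np


def segmentation_scores(pred, label):
    """Precision, Recall, and IoU for binary crack masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    label = np.asarray(label).astype(bool)
    tp = int(np.sum(pred & label))      # crack pixels predicted as crack (P_T)
    fp = int(np.sum(pred & ~label))     # background predicted as crack (P_F)
    fn = int(np.sum(~pred & label))     # crack pixels that were missed (N_F)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall, "iou": iou}


# One correct crack pixel, one false alarm, and one missed crack pixel.
pred = np.array([[1, 1, 0], [0, 0, 0]])
label = np.array([[1, 0, 1], [0, 0, 0]])
scores = segmentation_scores(pred, label)
```

In this toy case Precision and Recall are both 0.5, while IoU is 1/3, because IoU penalizes false alarms and misses at the same time.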

5.2. Scores of Two Segmentation Models

Two sets of data were input into two segmentation models for comparative experiments. The method name consists of two parts: the name of the model and the input data type. For example, Att-Unet-deepcrack100-PC15 indicates that the method uses the Att-Unet segmentation model and that the input data consists of 100 Deep Crack images that were filtered, enhanced, and then reduced by PCA to 15 principal components. Table 2 compares the three accuracy indicators and the training time obtained after 100 epochs of each method. From Table 2, it can be seen that, regardless of the type of data, the accuracy indicators obtained with the method in this article are much higher than those of the unprocessed data. For example, the Precision of Att-Unet-deepcrack100-PC15 is 85%, Recall is 91%, and IoU is 92%, which are 11%, 4%, and 9% higher than those of Att-Unet-deepcrack100, respectively. The experimental results demonstrate that, in the case of limited data, the proposed method can greatly improve the accuracy of crack detection. The cost is training time: because the dimensionality of the input data is significantly higher, Att-Unet-deepcrack100-PC15 required 773 min for 100 training epochs, whereas Att-Unet-deepcrack100 needed only 319 min, which is 454 min less. Given the accuracy gains, this still represents a significant improvement. Therefore, when the sample size is small and there is ample time, this method is highly suitable for identifying cracks with high accuracy.

5.3. Predicted Images of Two Segmentation Models

This section aims to validate the effectiveness of augmented data in target detection. Figure 12 and Figure 13 show the predicted images and labeled images of the two segmentation models.

The input data for the Att-Unet model is the Deep Crack data. Figure 12 shows that the prediction obtained after data expansion is closer to the label image than the prediction obtained from the original data. The predictions from the original data miss large parts of the target, especially where the cracks are narrow, and the result is not satisfactory. However, when the augmented data are used, the segmentation model can still effectively extract crack targets, even for tiny cracks. Although there is still a small gap between the predicted results and the labeled image, the augmented data effectively improve the detection ability of the model.

To verify the adaptability between data and models, the Att-Mask model was used to conduct experiments on the SMC data. The overall prediction results of the Att-Mask segmentation model are weaker than those of the Att-Unet segmentation model. However, the results show that when the dataset obtained through the method in this article is used as input data, the overall accuracy of the prediction results is much higher than with the original small amount of data. The confidence of the output results is also correspondingly improved, which further verifies the effectiveness of the proposed method.

5.4. Experimental Comparison Results

Applying classic feature extraction methods to a small number of images increases the dimensionality of the input, which can effectively improve the training of deep learning model parameters. The Deep Crack dataset is divided into six sets of 200, 300, 400, 500, 600, and 739 images for comparative experiments using the Att-Unet segmentation model. The comparison covers three aspects: loss curves, prediction accuracy, and predicted images. Figure 14 compares the loss curve of each group with that of the proposed method. The loss curve of the 100PC15 dataset stabilizes quickly and its validation loss is lower, indicating that the high-dimensional data have stronger expressive power.

As shown in Figure 15, the accuracy of the predictions first improves as the number of training samples increases: the predictions from 500 training images are much more accurate than those from 100 training images. As the number of samples increases further, particularly to 600 or more, more misclassification appears in the predictions. This may be due to overfitting of the segmentation model as the sample size increases, which leads to unsatisfactory predictions. More samples therefore do not always yield better results.

Figure 15g shows the prediction results of the proposed method. Although they are slightly worse than those obtained with 500 images and contain a small number of errors, they are better than those of the other groups. When data are insufficient, this method can significantly improve prediction accuracy.

Figure 16 shows the three accuracy curves for each data group over epochs 1 to 100. The three curves perform best when the data consist of 500 images, with the highest IoU reaching 86.23%, the highest Precision 94.23%, and the highest Recall 93.93%. The three accuracies improve as the amount of data increases up to 500 images and decrease slightly beyond that. This may be due to the poor quality of the later images in Deep Crack, which degrades the model parameters obtained from training. In subsequent work, we will search for clearer images for further experiments. The prediction accuracy of the deepcrack100-PC15 dataset lies in the upper-middle range: higher than that of the 100–300 image datasets and lower than that of the 400–500 image datasets. This indicates that using multiple filters to extract features for training can greatly improve prediction accuracy, further supporting the effectiveness of the method.
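For readers who want to reproduce curves like these, the following is a minimal sketch of how Precision, Recall, and IoU can be computed from a predicted mask and a labeled mask. It assumes the usual pixel-level definitions based on true positives, false positives, and false negatives. The function name `crack_metrics` and the tiny example masks are hypothetical and are not part of the original experiments.

```python
import numpy as np

def crack_metrics(pred, label):
    """Pixel-level Precision, Recall and IoU for binary masks (1 = crack)."""
    pred = np.asarray(pred, dtype=bool)
    label = np.asarray(label, dtype=bool)
    tp = np.logical_and(pred, label).sum()    # crack pixels found
    fp = np.logical_and(pred, ~label).sum()   # background marked as crack
    fn = np.logical_and(~pred, label).sum()   # crack pixels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return float(precision), float(recall), float(iou)

# Hypothetical 3 x 4 masks: one false positive and one missed crack pixel.
label = np.array([[0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 1, 0]])
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 1, 1, 0]])

precision, recall, iou = crack_metrics(pred, label)
print(precision, recall, iou)
```

IoU penalizes both kinds of error at once, which is why it is typically the lowest of the three values for the same prediction.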

6. Conclusions

In this research, when the sample size was small, various filters were employed to expand the sample dataset and enhance sample diversity, and the filtered outputs were fused with the original data into multi-band, high-dimensional sample data. Because multi-band samples easily introduce redundant information, principal component analysis was used to reduce the sample dimensions while preserving over 95% of the data information. Reducing the dimensionality of the sample data can improve the efficiency of the segmentation models. Two models, Att-Unet and Att-Mask, were then established for training and prediction, and the SENet attention mechanism was chosen to optimize the model parameters. The results indicate that the highest prediction accuracy based on the Att-Unet model is 96.14% (Precision of Att-Unet-SMC-PC15 in Table 2). In addition, comparative experiments show that the proposed method is better suited to identifying concrete cracks from small datasets. Therefore, the proposed method can be regarded as a promising method for concrete crack identification. In the future, more concrete images with fewer cracks will be collected to evaluate its effectiveness.
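The band fusion and PCA reduction summarized above can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation: the input is a synthetic (H, W, B) stack in which half of the bands are exact copies of the others, standing in for redundant filter outputs, and the function name `pca_reduce_bands` and the 0.95 threshold parameter `keep` are chosen here for the example.

```python
import numpy as np

def pca_reduce_bands(stack, keep=0.95):
    """Project an (H, W, B) band stack onto its leading principal components.

    Returns the reduced (H, W, k) stack and the cumulative explained-variance
    ratios, where k is the smallest count whose cumulative ratio reaches `keep`.
    """
    h, w, b = stack.shape
    pixels = stack.reshape(-1, b).astype(np.float64)  # one row per pixel (copy)
    pixels -= pixels.mean(axis=0)                     # center each band
    cov = np.cov(pixels, rowvar=False)                # (B, B) band covariance
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]                 # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigvals = np.clip(eigvals, 0.0, None)             # guard against tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, keep) + 1)    # first index reaching `keep`
    reduced = pixels @ eigvecs[:, :k]
    return reduced.reshape(h, w, k), cumulative

# Synthetic stand-in for fused filter outputs: three independent bands
# plus three exact copies, i.e. pure redundancy between bands.
rng = np.random.default_rng(0)
base = rng.normal(size=(32, 32, 3))
stack = np.concatenate([base, base], axis=2)

reduced, cumulative = pca_reduce_bands(stack)
print(stack.shape, "->", reduced.shape)
```

Because the copied bands carry no new information, the redundant directions receive near-zero eigenvalues and are dropped, which mirrors the role PCA plays for the multi-band crack samples.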

Author Contributions

Funding acquisition, S.Z. and Y.D.; Investigation, Y.P. and Q.W.; Methodology, Y.P. and Q.W.; Software, Y.P. and J.G.; Supervision, Q.W. and Y.P.; Writing—review and editing, Y.P., J.G. and Q.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a Training plan for academic and technical leaders of major disciplines in Jiangxi Province (grant no. 20213BCJL22039), Natural Science Foundation of China (grant no. 52163034), and Scientific and technological innovation activity plan for college students in Zhejiang Province (grant no. 2019R401212).

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

Author Jingyuan Guan was employed by the company Guangdong Tobacco Jieyang City Co., Ltd. Author Qing Wang was employed by the company China Nerin Engineering Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. Crack datasets.

Figure 2. Image processing results: (a) Deep crack; (b) Deep crack histogram equalization; (c) Self-Made crack; (d) Self-Made crack histogram equalization.

Figure 3. Feature extraction results of images with only cracks by different filters. (a) Original image; (b) Contrast; (c) ASM; (d) Correlation; (e) Entropy; (f) Homo; (g) Laplace; (h) Sobel.

Figure 4. PCA results of images processed.

Figure 5. Att-Mask network structure.

Figure 6. Att-Unet network structure.

Figure 7. SENet attention mechanism.

Figure 8. Loss curves of two segmentation models.

Figure 9. Training loss and validation loss versus epochs: (a) Unet; (b) Mask R-cnn.

Figure 10. Comparison images of predicted results. (a) Original images; (b) labeled images; (c) Unet model predicted images; (d) Att-Unet model predicted images.

Figure 11. SMC data images of predicted results. (a) Original images; (b) labeled images; (c) Mask R-cnn model predicted images; (d) Att-Mask model predicted images.

Figure 12. The prediction results and labeled images of the Att-Unet model. (a) Original image; (b) labeled image; (c) original image predicted result; (d) PC15 image predicted result.

Figure 13. The prediction results and labeled images of the Att-Mask model. (a) Original image; (b) labeled image; (c) original image predicted result; (d) PC15 image predicted result.

Figure 14. Training loss and validation loss versus epochs of Att-Unet.

Figure 15. Performance of test set on Att-Unet models. (a) Original image; (b) labeled image; (c) 100 images predicted result; (d) 300 images predicted result; (e) 500 images predicted result; (f) 739 images predicted result; (g) PC15 image predicted result.

Figure 16. Comparison curve of accuracy for each data group.

Table 1. PCA results of images processed.

PC	Eigenvalue	Cumulative Percent	PC	Eigenvalue	Cumulative Percent
1	22,966.3229	18.10%	13	4057.7283	89.90%
2	22,543.8473	35.88%	14	4034.6898	93.08%
3	11,251.2018	44.75%	15	3423.6056	95.78%
4	11,025.9367	53.44%	16	2469.8018	97.73%
5	9314.8470	60.78%	17	1285.1394	98.74%
6	5549.7150	65.16%	18	1240.3846	99.72%
7	5287.4921	69.32%	19	188.9968	99.87%
8	5259.7388	73.47%	20	166.0394	100.00%
9	4388.7498	76.93%	21	0.0000	100.00%
10	4197.7833	80.24%	22	0.0000	100.00%
11	4124.0324	83.49%	23	0.0000	100.00%
12	4077.8208	86.70%	24	0.0000	100.00%
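The percentages in Table 1 are cumulative shares of the total eigenvalue sum. The short sketch below recomputes them from the listed eigenvalues and finds the smallest number of components that reaches 95%. The eigenvalues are copied from Table 1, and the 95% threshold is taken from the Conclusions; the variable names are illustrative only.

```python
# Eigenvalues copied from Table 1 (PC1 to PC24).
eigenvalues = [
    22966.3229, 22543.8473, 11251.2018, 11025.9367, 9314.8470, 5549.7150,
    5287.4921, 5259.7388, 4388.7498, 4197.7833, 4124.0324, 4077.8208,
    4057.7283, 4034.6898, 3423.6056, 2469.8018, 1285.1394, 1240.3846,
    188.9968, 166.0394, 0.0, 0.0, 0.0, 0.0,
]

total = sum(eigenvalues)

# Cumulative share (in %) of the total variance after each component.
cumulative = []
running = 0.0
for value in eigenvalues:
    running += value
    cumulative.append(100.0 * running / total)

# Smallest number of components whose cumulative share reaches 95%.
n_components = next(i + 1 for i, pct in enumerate(cumulative) if pct >= 95.0)
print(n_components, round(cumulative[n_components - 1], 2))
```

With these eigenvalues the threshold is first crossed at the 15th component, which is consistent with the PC15 suffix used in the method names of Table 2.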

Table 2. Accuracy and computation time.

Methods	Precision/%	Recall/%	IoU/%	Training Time (min)	Epoch
Att-Unet-deepcrack100	74	87	83	319	100
Att-Unet-deepcrack100-PC15	85	91	92	773	100
Att-Unet-SMC	94.42	88.7	84.29	210	100
Att-Unet-SMC-PC15	96.14	94.32	91	409	100
Att-Mask-deepcrack100	72	85	81	1424	100
Att-Mask-deepcrack100-PC15	81	86	88	1851	100
Att-Mask-SMC	92.48	87.64	83.32	808	100
Att-Mask-SMC-PC15	93.56	91.31	88.15	1214	100

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
