Article

A Crack Segmentation Model Combining Morphological Network and Multiple Loss Mechanism

Department of Information Science, Xi’an University of Technology, Xi’an 710048, China
*
Author to whom correspondence should be addressed.
Sensors 2023, 23(3), 1127; https://doi.org/10.3390/s23031127
Submission received: 6 December 2022 / Revised: 6 January 2023 / Accepted: 16 January 2023 / Published: 18 January 2023
(This article belongs to the Section Physical Sensors)

Abstract

With the wide application of computer vision technology and deep-learning theory in engineering, image-based detection of cracks in structures such as pipelines, pavements and dams has received increasing attention. To address the high cost, low efficiency and poor detection accuracy of traditional crack detection methods, this paper proposes a crack segmentation network that combines a morphological network with a multiple-loss mechanism. First, to improve the identification of cracks at different resolutions, a U-Net network is used to extract multi-scale features from the crack image. Second, to eliminate the effect of polarized light on cracks under different illuminations, the extracted crack features are further processed morphologically by a white top-hat transform and a black bottom-hat transform. Finally, a multiple-loss mechanism is designed to solve the problem of inaccurate segmentation of cracks at a single scale. Extensive experiments were carried out on five open crack datasets: Crack500, CrackTree200, CFD, AEL and GAPs384. The average ODS, OIS, AIU, sODS and sOIS are 75.7%, 73.9%, 36.4%, 52.4% and 52.2%, respectively. Compared with state-of-the-art methods, the proposed method achieves better crack segmentation performance. Ablation experiments also verified the effectiveness of each module in the algorithm.

1. Introduction

Concrete structures inevitably develop defects during construction and use because of design, construction, load and material factors. These defects, which initially present as cracks, can seriously affect the safety and durability of a structure, and they need to be discovered as early as possible. Crack segmentation based on machine vision is the process of automatically extracting cracks from images captured by cameras. Automatic crack segmentation not only has important theoretical research significance, but also serves as an important early warning tool in practical engineering.
Crack segmentation methods based on machine vision are mainly divided into traditional algorithms [1,2,3,4,5,6,7,8,9] and deep-learning-based methods [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33].
Compared with deep-learning methods, traditional crack segmentation methods do not require training and are simpler and more direct [1,2,3,4,5,6,7,8,9]. These algorithms are easy to implement and fast in operation. However, they are easily affected by noise, and improper threshold selection will significantly reduce the segmentation accuracy.
Deep-learning-based methods can automatically extract the shallow and deep characteristics of cracks, so they can greatly improve crack segmentation and detection accuracy. The backbone networks commonly used for crack segmentation include multi-scale pyramid structure networks [10,11,12,13,14,15,16,17] and the U-Net network [18,19,20,21,22]. Methods based on multi-scale features can extract richer crack image features, but existing methods struggle to detect cracks that resemble the background. Because of its good performance in medical image segmentation, the U-Net network and its extensions are widely used in crack detection [18,19,20,21,22]. The experimental results of many studies show that deep-learning-based methods significantly improve crack segmentation. However, the segmentation accuracy for cracks under polarized light is generally poor.
In order to improve crack segmentation performance under polarized light, this paper proposes a crack segmentation method that combines a morphological network and a multi-loss function. In summary, the main contributions of this paper are listed as follows.
(1) Existing deep network models lack an effective mechanism for handling polarized light, which causes inaccurate crack detection. This paper proposes a morphological-processing network module that processes the crack features with white top-hat and black bottom-hat transformations to correct the influence of uneven lighting, so that both bright cracks on a dark background and dark cracks on a bright background are effectively extracted.
(2) To address the incomplete detection of cracks across the different scales of the crack image, a multi-scale loss fusion mechanism is adopted to overcome the problem of cracks extracted by single-scale detection being too thin or too thick.
(3) The experimental results show that the proposed method can accurately detect various kinds of cracks in five public crack datasets.
The paper is organized as follows. Related works are presented in Section 2. The proposed crack segmentation method is described in Section 3. Experimental results and analysis are presented in Section 4. The conclusions are drawn in Section 5.

2. Related Works

Compared with deep-learning methods, traditional crack segmentation methods do not require training, and the execution process is simpler and more straightforward [1,2,3,4,5,6,7,8]. Traditional methods usually extract cracks using hand-crafted features. Because cracks exhibit certain edge characteristics, commonly used edge detection operators are applied to extract them [1]. Using Euclidean graphs as crack pattern descriptors, salient skeleton features are extracted to locate the cracks in an image [5]. A Gabor filter can detect cracks in any direction [8]. Morphological filters and dynamic thresholding are applied to extract cracks of different thicknesses [9]. Using the pixel brightness difference between cracks and surrounding areas, methods such as the random structured forest [4] and threshold processing [6,7] can also effectively extract cracks. Although traditional methods are simple to implement and fast in operation, they are easily affected by noise, and improper threshold selection will significantly reduce the segmentation accuracy.
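As a concrete illustration of the thresholding family cited above [6,7], the following is a minimal OpenCV sketch of Otsu segmentation on a grayscale pavement image; the file names and the Gaussian pre-blur are illustrative choices, not a reproduction of any cited method.

```python
# Minimal sketch of classical Otsu thresholding for crack extraction,
# assuming cracks are darker than the surrounding pavement.
import cv2

img = cv2.imread("pavement.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
blur = cv2.GaussianBlur(img, (5, 5), 0)                  # suppress sensor noise first
# THRESH_BINARY_INV marks dark pixels (candidate cracks) as foreground;
# Otsu picks the global threshold automatically.
_, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
cv2.imwrite("crack_mask.png", mask)
```

As the surrounding text notes, a single global threshold like this is fast but fragile: shadows or stains shift the histogram and degrade the mask.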
Because the shallow and deep features of an image can be extracted automatically, deep-learning-based methods can greatly improve the accuracy of crack segmentation and detection. Owing to their ability to extract information at different resolutions, multi-scale features are widely used in the segmentation and detection of cracks of different sizes [10,11,12,13,14,15,16,17]. Li et al. [10] proposed a pavement crack detection method based on multi-scale attention and hesitant fuzzy set (HFS) theory. Xie et al. [11] first performed edge detection with a multiple-loss structure and then fused multi-scale features to improve crack detection. Yang et al. [12] proposed a feature pyramid and hierarchical boosting network to detect pavement cracks. Wang et al. [13] proposed a bridge crack detection model that combines the Inception-ResNet-v2 module, multi-scale feature fusion and a GKA clustering mechanism to improve real-time detection performance; however, this algorithm has difficulty segmenting small cracks accurately. Sun et al. [14] used an adaptive bilateral filtering algorithm to reduce the influence of noise before using the FPHBN network to segment the cracks. Li et al. [15] fused feature maps of different scales and used a class-balanced cross-entropy loss function to improve the accuracy, speed and robustness of crack segmentation, but the edges of the segmented cracks were blurred. An attention mechanism can be imposed on the fused multi-scale features to enhance the distinctiveness of the crack feature representation [16]. A feature aggregation network with a spatial-channel squeeze-and-excitation attention module was proposed in [17] to accurately segment cracks. Although these multi-scale-feature-based methods can extract richer crack image features, they cannot distinguish cracks from the background under biased light, which leads to a decline in segmentation and detection performance.
The U-Net network not only extracts multi-scale features of objects of different sizes; the cross-layer connections in its encoder–decoder structure also avoid information loss in deep networks. Since being proposed for medical image segmentation, the U-Net network has therefore been widely used in image target segmentation. Owing to its excellent segmentation performance, the U-Net network and its extensions have also been widely applied to crack segmentation [18,19,20,21,22] and rust detection in metallic constructions [23]. Hong et al. [20] introduced an attention module into the U-Net network and fused the features of the skip connections to segment cracks in UAV aerial photographs of pavement. König et al. [21] proposed an improved U-Net algorithm and introduced a data augmentation strategy to improve the accuracy of crack segmentation. A U-Net network with an alternately updated clique was designed to separate cracks from the background [22].
In addition, scholars have also studied crack segmentation and detection based on combinations of detection and segmentation [24,25], transformers [26], super-resolution reconstruction [27], attention mechanisms [28,29], transfer learning [30], fully convolutional networks [31,32,33] and deep learning with heuristic image post-processing [34], obtaining relatively good results in the segmentation of various cracks.
Compared with traditional methods, deep-learning-based methods achieve better crack segmentation performance, but because they lack an effective mechanism to remove polarized light, it is difficult for them to accurately detect cracks with an uneven brightness distribution.
To address inaccurate crack detection caused by uneven illumination, this paper proposes a morphological network module that performs white top-hat and black bottom-hat processing on the crack feature map to correct uneven illumination. To address the incomplete description of cracks by the shallow and deep features of the network, a multi-scale loss fusion mechanism is used to overcome the problem of cracks extracted by single-scale detection being too thin or too thick.

3. The Proposed Crack Segmentation Method

A flow chart of the proposed crack segmentation method is shown in Figure 1, which consists of four main modules: feature extraction based on a backbone network, feature enhancement based on morphological processing, multi-scale feature fusion and multi-objective loss function calculation.
First, the crack image is input into the U-Net network to extract features. The U-Net network uses an encoder–decoder structure to extract features at different scales and fuses same-scale encoder and decoder features to make the crack features more prominent. Second, the extracted features are input into the morphological-processing network, where white top-hat and black bottom-hat transformations are performed to correct the influence of uneven illumination on the captured crack images. Then, the output features of the U-Net network and the morphological-processing network are fused and input into the side network to obtain a crack prediction map at each scale. Finally, the multi-scale prediction results are fused to obtain the final crack segmentation result, and the losses at each scale and the final prediction loss are combined into the final loss function.

3.1. Multi-Scale Feature Extraction Based on U-Net Network

This paper selects the U-Net network as the feature extraction module. As shown in Figure 2, the U-Net network consists of the encoding part on the left, the decoding part on the right and the two convolution and activation layers at the bottom. The encoding part consists of four repeating structures, each of which comprises two 3 × 3 convolutional layers with nonlinear ReLU activations and a 2 × 2 max pooling layer, corresponding to the blue and red arrows in Figure 2, respectively. The decoding part is similar and also consists of four repeating structures. Before each repeating structure, a deconvolution (up-conv) is used to halve the number of channels and double the size of the feature map, corresponding to the green arrows. The deconvolution result is concatenated with the feature map of the same scale from the corresponding encoding part, corresponding to the white/blue blocks of each repeating structure. The concatenated feature map is subjected to two 3 × 3 convolutions, corresponding to the blue arrows in the decoding part. At the last layer of the network, the 64-channel feature map is converted into the crack prediction result through a 1 × 1 convolution, corresponding to the cyan arrow. The parameters of the U-Net network structure are listed in Table 1.
In this paper, a pavement crack image of size $W \times H \times 3$ is used as the input of the U-Net network, and three features of the decoder part, with sizes $W \times H \times 1$, $\frac{W}{2} \times \frac{H}{2} \times 128$ and $\frac{W}{4} \times \frac{H}{4} \times 256$, are extracted and sent to the morphological network for depolarization processing.
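To make the backbone concrete, the following is a minimal tf.keras sketch of the U-Net structure in Table 1; the 'same' padding, layer naming and the exact decoder taps are our assumptions for illustration, not details taken from the paper's released code.

```python
# Minimal sketch of the U-Net backbone of Table 1 / Figure 2 (tf.keras).
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3 convolutions with ReLU: the repeating unit of Figure 2.
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(width, height):
    inputs = layers.Input((height, width, 3))
    skips, x = [], inputs
    # Encoder: four repeating structures (Conv1-x..Conv4-x, each with pooling).
    for f in (64, 128, 256, 512):
        x = conv_block(x, f)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, 1024)  # the two convolution layers at the bottom (Conv5-x)
    decoder_feats = []
    # Decoder: each up-conv halves channels and doubles resolution, then the
    # result is concatenated with the same-scale encoder feature.
    for f in (512, 256, 128, 64):
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skips.pop()])
        x = conv_block(x, f)
        decoder_feats.append(x)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)  # Conv9-x
    # The paper taps decoder features at W/4 x H/4 x 256, W/2 x H/2 x 128 and
    # the full-resolution output; the indices below reflect that reading.
    return tf.keras.Model(inputs, [decoder_feats[1], decoder_feats[2], out])

model = build_unet(256, 256)
model.summary()
```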

3.2. Morphological Network

When a camera photographs a crack, different parts of the crack may appear with different colors and brightness values because of changes in light and shadow and in shooting angle; this is referred to here as the polarized light phenomenon. It can cause the same crack to be treated as different objects, degrading target segmentation performance. To overcome the influence of polarized light on segmentation performance, a morphological network is designed to enhance the output features of the U-Net network, highlighting bright cracks on a dark background and dark cracks on a bright background.
The morphological network consists of the white top-hat transformation $T_{hat}(I)$ and the black bottom-hat transformation $B_{hat}(I)$:

$$T_{hat}(I) = I - (I \ominus W_e) \tag{1}$$

$$B_{hat}(I) = (I \oplus W_d) - I \tag{2}$$

where $I \in \mathbb{R}^{W \times H \times C}$ represents the output feature of the U-Net network, $(I \ominus W_e)$ represents the morphological erosion of $I$ and $(I \oplus W_d)$ represents the morphological dilation of $I$. Here, $W_d \in \mathbb{R}^{M \times N \times K}$ and $W_e \in \mathbb{R}^{M \times N \times K}$ are the dilation and erosion filters, and $W \times H$ and $C$ represent the resolution and the number of channels of the extracted features, respectively. $M \times N$ and $K$ are the size and the number of the filters.

For $k \in [1, K]$, $x \in [1, W]$, $y \in [1, H]$, the dilation ($\oplus$) and erosion ($\ominus$) operations on the feature map $I$ are as follows:

$$(I \oplus W_d)(x, y) = \max_{i \in S_1,\, j \in S_2} \big( I(x+i, y+j, k) + W_d(i, j, k) \big) \tag{3}$$

$$(I \ominus W_e)(x, y) = \min_{i \in S_1,\, j \in S_2} \big( I(x+i, y+j, k) - W_e(i, j, k) \big) \tag{4}$$

The value ranges $S_1$ and $S_2$ are as follows:

$$S_1 = \left[ -\frac{M-1}{2},\ \frac{M-1}{2} \right] \tag{5}$$

$$S_2 = \left[ -\frac{N-1}{2},\ \frac{N-1}{2} \right] \tag{6}$$
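For illustration, the following is a small NumPy sketch of Equations (3)–(6) on a single channel; the edge-replication padding and the toy flat filter are our assumptions, since the paper does not specify boundary handling or filter initialization.

```python
# NumPy sketch of grayscale dilation/erosion (Eqs. 3-4) and the hat
# transforms (Eqs. 1-2) for one channel and one M x N filter W.
import numpy as np

def dilate(I, W):
    M, N = W.shape
    pm, pn = (M - 1) // 2, (N - 1) // 2
    P = np.pad(I, ((pm, pm), (pn, pn)), mode="edge")
    out = np.empty_like(I, dtype=float)
    for x in range(I.shape[0]):
        for y in range(I.shape[1]):
            # max over the window of I(x+i, y+j) + W(i, j), Eq. (3)
            out[x, y] = np.max(P[x:x + M, y:y + N] + W)
    return out

def erode(I, W):
    M, N = W.shape
    pm, pn = (M - 1) // 2, (N - 1) // 2
    P = np.pad(I, ((pm, pm), (pn, pn)), mode="edge")
    out = np.empty_like(I, dtype=float)
    for x in range(I.shape[0]):
        for y in range(I.shape[1]):
            # min over the window of I(x+i, y+j) - W(i, j), Eq. (4)
            out[x, y] = np.min(P[x:x + M, y:y + N] - W)
    return out

I = np.random.rand(16, 16)   # toy single-channel feature map
W = np.zeros((3, 3))         # flat 3x3 structuring element (placeholder)
T_hat = I - erode(I, W)      # white top-hat, Eq. (1)
B_hat = dilate(I, W) - I     # black bottom-hat, Eq. (2)
```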
Examples of dilation and erosion operations are shown in Figure 3 and Figure 4, respectively. Erosion enlarges the dark areas of an image, while dilation enlarges the bright areas. Dilation and erosion can eliminate noise, segment independent cracks, connect adjacent elements in the image and find the maximal or minimal areas in the image. Therefore, subtracting the morphologically processed image from the original image (and vice versa) highlights areas that are brighter or darker than their surroundings, thus correcting the effects of uneven illumination. The structure of the morphological network is shown in Figure 5; it consists of parallel erosion and dilation layers, a subtraction operation and a concatenation operation. The K feature maps of the dilation layer are obtained by convolving the K dilation filters $W_d$ with the output features of the U-Net; in the same way, the K feature maps of the erosion layer are obtained by convolving the K erosion filters $W_e$ with those features. After dilation and erosion, the white top-hat feature and black bottom-hat feature are obtained through a difference operation. The difference feature maps and the morphological feature maps are concatenated, weighted by linear combination and convolved with a 1 × 1 filter to obtain the output feature map of the morphological network.
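The block structure of Figure 5 can be sketched as follows; flat structuring elements of several sizes stand in for the K learned filters, and random weights stand in for the trained 1 × 1 convolution, so both are placeholders for illustration only.

```python
# Sketch of the morphological block of Figure 5: parallel dilation/erosion,
# hat transforms by subtraction, concatenation, and a 1x1 fusion convolution.
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def morphological_block(feat, sizes=(3, 5, 7), rng=np.random.default_rng(0)):
    maps = []
    for s in sizes:                              # each size stands in for one filter
        d = grey_dilation(feat, size=(s, s))     # dilation layer output
        e = grey_erosion(feat, size=(s, s))      # erosion layer output
        maps += [feat - e, d - feat, d, e]       # top-hat, bottom-hat, raw maps
    stack = np.stack(maps, axis=-1)              # concatenation along channels
    w = rng.normal(size=stack.shape[-1])         # 1x1 conv = per-map weights
    return stack @ w                             # fused output feature map

feature = np.random.rand(32, 32)
out = morphological_block(feature)               # same spatial size as the input
```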
To verify the effectiveness of the morphological network module, feature maps before and after morphological processing are shown in Figure 6. Without morphological processing, the extracted crack shape differs substantially from the ground truth and contains many false detection areas, whereas the feature map after morphological processing is much closer to the ground truth. These results verify the effectiveness of the morphological processing.

3.3. Side Network and Loss Function

To obtain the crack prediction result, the output feature of the morphological network at each scale is input to the side network for channel merging and up-sampling. The specific operations are as follows:
(1) The enhanced feature of size $\frac{W}{4} \times \frac{H}{4} \times 256$ undergoes a 1 × 1 convolution to obtain a dimension-reduced feature with a single channel. Two 2 × 2 up-sampling convolutions then produce an up-sampled feature of the same size as the original image, and the prediction map of the first scale, denoted $\hat{Y}_{side}^{(1)}$, is obtained by applying the activation function.
(2) The prediction result of the second scale, denoted $\hat{Y}_{side}^{(2)}$, is obtained from the enhanced feature of size $\frac{W}{2} \times \frac{H}{2} \times 128$ by a 1 × 1 convolution, one 2 × 2 up-sampling and the activation function.
(3) The prediction result of the third scale, denoted $\hat{Y}_{side}^{(3)}$ with size $W \times H \times 1$, is obtained by applying the activation function only.
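The following NumPy sketch mirrors these three side branches; the nearest-neighbour up-sampling and random 1 × 1 weights are simplifications of the learned up-sampling convolutions described above.

```python
# Sketch of the side branches: 1x1 convolution to one channel,
# up-sampling back to W x H, and a sigmoid activation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def side_branch(feat, up_factor, rng=np.random.default_rng(0)):
    w = rng.normal(size=feat.shape[-1])   # 1x1 conv: weighted sum over channels
    m = feat @ w                          # (h, w) single-channel map
    m = m.repeat(up_factor, axis=0).repeat(up_factor, axis=1)  # up-sample
    return sigmoid(m)

W = H = 64
f1 = np.random.rand(H // 4, W // 4, 256)  # deepest tapped decoder feature
f2 = np.random.rand(H // 2, W // 2, 128)
f3 = np.random.rand(H, W, 1)
Y1 = side_branch(f1, 4)     # \hat{Y}_side^(1)
Y2 = side_branch(f2, 2)     # \hat{Y}_side^(2)
Y3 = sigmoid(f3[..., 0])    # \hat{Y}_side^(3): activation only
```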
Observation of the crack images shows that cracks occupy only a small proportion of each image; that is, the number of negative samples available for model training far exceeds the number of positive samples. A large number of negative samples encourages the model to neglect learning positive samples, leading to poor prediction of positive samples and a low F1 value. To address this sample imbalance during training, the Dice Loss is used to reduce the influence of easy negative samples and thereby improve segmentation performance. The Dice Loss is calculated as follows.
Given two sets A and B, their Dice similarity coefficient $S(A, B)$ is defined as Equation (7), and its value range is [0, 1]:

$$S(A, B) = \frac{2|A \cap B|}{|A| + |B|} \tag{7}$$

where $|A \cap B|$ is the number of elements in the intersection of A and B, and $|A|$ and $|B|$ represent the number of elements in sets A and B, respectively. In this paper, A and B are the predicted and true positive sample sets, respectively.
TP (True Positive), TN (True Negative), FP (False Positive) and FN (False Negative) are the usual evaluation quantities in binary classification. Here, $|A| = TP + FP$, $|B| = TP + FN$ and $|A \cap B| = TP$, so the Dice coefficient S becomes Equation (8):

$$S(A, B) = \frac{2TP}{2TP + FP + FN} \tag{8}$$

The Dice Loss $L_{dice}$ is defined as follows:

$$L_{dice} = 1 - S(A, B) = 1 - \frac{2TP}{2TP + FP + FN} \tag{9}$$
On the one hand, Dice Loss-based network learning directly uses the segmentation evaluation index as the loss function; on the other hand, the large number of background pixels is ignored when computing the ratio between the intersection and the union. The Dice Loss is therefore chosen to handle the imbalance between positive and negative samples in crack images while also improving the convergence speed.
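A minimal soft-Dice implementation corresponding to Equation (9) might look as follows; replacing the hard TP/FP/FN counts with their soft (probability-weighted) counterparts is a common practice that we assume here, as is the small epsilon for numerical stability.

```python
# Minimal soft Dice loss in NumPy, following Equations (7)-(9).
import numpy as np

def dice_loss(pred, gt, eps=1e-7):
    tp = np.sum(pred * gt)          # soft true positives
    fp = np.sum(pred * (1 - gt))    # soft false positives
    fn = np.sum((1 - pred) * gt)    # soft false negatives
    return 1.0 - 2.0 * tp / (2.0 * tp + fp + fn + eps)

pred = np.array([[0.9, 0.1], [0.2, 0.8]])
gt = np.array([[1.0, 0.0], [0.0, 1.0]])
print(dice_loss(pred, gt))  # small loss: prediction close to ground truth
```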
In order to make full use of the multi-scale information of the crack, this paper calculates the loss of the prediction result at each scale and then adds the loss of the final prediction result to obtain the objective loss function. The design of the multiple loss function is shown in Figure 7.
The final crack prediction result $\hat{Y}$ is obtained by concatenating the multi-scale prediction results and applying a 1 × 1 convolution. The objective loss function $L_{dice}^{sum}$ is the sum of the losses on all scales, $L_{dice}^{side}$, and the final prediction loss, $L_{dice}^{final}$; it is used to drive the training of the network model until it converges to the desired value:

$$L_{dice}^{sum} = L_{dice}^{side} + L_{dice}^{final} \tag{10}$$

$$L_{dice}^{side} = \sum_{m=1}^{M} L_{dice}\big(\hat{Y}_{side}^{(m)}, Y_{side}\big) \tag{11}$$

where $\hat{Y}_{side}^{(m)}$ and $Y_{side}$ are the predicted result of the $m$-th scale and the ground truth, and M is the total number of scales, which is set to three in the experiments.
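A compact sketch of Equations (10) and (11), assuming the same soft Dice loss as above:

```python
# Multiple-loss mechanism: sum of the Dice losses of the M = 3 side
# predictions plus the Dice loss of the fused final prediction.
import numpy as np

def dice_loss(p, g, eps=1e-7):
    tp, fp, fn = np.sum(p * g), np.sum(p * (1 - g)), np.sum((1 - p) * g)
    return 1.0 - 2.0 * tp / (2.0 * tp + fp + fn + eps)

def total_loss(side_preds, final_pred, gt):
    side = sum(dice_loss(p, gt) for p in side_preds)  # L_dice^side, Eq. (11)
    return side + dice_loss(final_pred, gt)           # + L_dice^final, Eq. (10)

side_preds = [np.full((4, 4), 0.5) for _ in range(3)]
print(total_loss(side_preds, np.eye(4), np.eye(4)))
```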
Figure 8 shows a crack image, its prediction results at the three scales, $\hat{Y}_{side}^{(1)}$, $\hat{Y}_{side}^{(2)}$ and $\hat{Y}_{side}^{(3)}$, and the final prediction result $\hat{Y}$, which is obtained by a 1 × 1 convolution on the concatenation of the features of the three scales. As can be seen from Figure 8, the shallow predictions $\hat{Y}_{side}^{(1)}$ and $\hat{Y}_{side}^{(2)}$ capture the location of the cracks, but they differ considerably from the ground truth because of the down-sampling operations. $\hat{Y}_{side}^{(3)}$ renders the details of the crack more accurately because it combines deep and shallow features through the cross-layer connections. The fused final prediction is the closest to the ground truth, which demonstrates the effectiveness of multi-scale prediction fusion.

4. Analysis of Experimental Results

In order to verify the performance of the proposed crack segmentation network, extensive comparisons are made on five open crack datasets, i.e., CrackTree200 [35], Crack500 [36], GAPs384 [37], AEL [38] and CFD [39]. Further, ablation experiments are used to verify the effectiveness and necessity of each module.

4.1. Experimental Dataset and Evaluation Indicators

CrackTree200 [35] is a road crack image dataset proposed in 2012, containing 206 road crack images with a resolution of 800 × 600. The images contain rich shadows and occlusions, and the cracks are slender and diverse in distribution. The training and test sets contain 126 and 80 images, respectively.
Crack500 [36] is a pavement crack dataset captured with mobile phones in 2016. Each high-definition image was cropped into 16 crack images of size 360 × 640. The dataset includes 1896 training images, 348 validation images and 1124 test images.
GAPs384 [37] is drawn from the German Asphalt Pavement Distress (GAPs) dataset, which contains common pavement distress types such as cracks, potholes and patches. The images are characterized by low illumination, oil stains, zebra crossings and other noise, with a resolution of 540 × 440. The training and test sets contain 409 and 100 images, respectively.
AEL [38] is a small dataset of crack images collected under various background environments, which consists of 47 training images and 11 test images.
CFD [39] is a widely used road crack dataset that was photographed by mobile phones, including 118 images with a size of 480 × 320, 95 of which are training images and 23 are test images. Sample labeling is performed at the pixel level.
The development environment used in the experiment included TensorFlow 1.14.0, OpenCV-Python 4.5.1.48, CUDA 10.0.0, cuDNN 8.04 and Python 3.6.2. The computing processor was an eight-core Intel Core i7-9700K CPU, and the graphics processor was a GeForce RTX 2080 SUPER.
The objective evaluation indicators used in this paper are ODS (Optimal Dataset Scale), OIS (Optimal Image Scale), AIU (Average Intersection over Union) [12], sODS (simplified ODS) and sOIS (simplified OIS) [18]. OIS is the aggregate F-measure obtained with the best threshold for each image in the dataset, while ODS is the best F-measure over the dataset at a fixed threshold. AIU is the overlap ratio between the predicted crack area and the real crack area. sOIS and sODS are simplified versions of OIS and ODS. For all five indicators, higher values indicate better segmentation.
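As a rough illustration of how these indicators aggregate, the sketch below computes a simplified ODS and OIS from per-image F-measures over a threshold grid; benchmark implementations differ in details, for example tolerance-based boundary matching and aggregating TP/FP/FN over the whole dataset rather than averaging per-image F values, so this is a didactic approximation only.

```python
# Simplified ODS/OIS: ODS fixes one threshold for the whole dataset,
# OIS picks the best threshold per image.
import numpy as np

def f_measure(pred_bin, gt, eps=1e-7):
    tp = np.sum(pred_bin * gt)
    prec = tp / (np.sum(pred_bin) + eps)
    rec = tp / (np.sum(gt) + eps)
    return 2 * prec * rec / (prec + rec + eps)

def ods_ois(preds, gts, thresholds=np.linspace(0.01, 0.99, 99)):
    per_t = np.array([[f_measure(p >= t, g) for t in thresholds]
                      for p, g in zip(preds, gts)])  # images x thresholds
    ods = per_t.mean(axis=0).max()  # best dataset-wide fixed threshold
    ois = per_t.max(axis=1).mean()  # best threshold chosen per image
    return ods, ois

preds = [np.random.rand(8, 8) for _ in range(3)]
gts = [(np.random.rand(8, 8) > 0.8).astype(float) for _ in range(3)]
print(ods_ois(preds, gts))
```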

4.2. Results and Analysis

4.2.1. Comparison and Analysis of Subjective Results

To verify the performance of the proposed algorithm, comparative experiments were conducted with the classical object detection and segmentation algorithms FCN [40], RCF [41], HED [11], FPHBN [12], DAUNet [18] and SPLAC U-Net [23]. Figure 9 shows the original crack images, the crack ground truth and the comparison results, where extracted cracks are shown as white pixels and the background as black pixels.
It can be seen from Figure 9 that the segmentation by FCN [40] is the most incomplete, especially for the slender cracks in the CrackTree200 [35], AEL [38] and CFD [39] datasets. Compared with FCN [40], the result of RCF [41] is slightly better, but its segmentation results contain a lot of noise, and the boundaries of the segmented cracks are fuzzy. The cracks segmented by FPHBN [12] and HED [11] are thicker than the ground truth, and the extracted slender cracks are incomplete; that is, an intact crack is divided into broken parts. DAUNet [18] and SPLAC U-Net [23] can accurately locate cracks, but the details of the extracted cracks are less distinct. As shown in the detection results in columns 3 to 9 of the fifth row of Figure 9, although all the compared methods detect the interfering object as a crack to some extent, the false detection ratio of the proposed method is the smallest. Although the crack segmentation result of the proposed method contains some noise and interfering objects, it is the most competitive among the compared methods.
To verify the segmentation performance on cracks with abrupt shape changes, Figure 10 shows three images randomly selected from the Crack500 [36], CFD [39] and GAPs384 [37] datasets together with the segmentation results of the proposed method. The rectangular boxes in the left and right images mark the selected crack region in the original image and the extracted crack region in the segmentation image, respectively. As can be seen from Figure 10, the proposed method easily identifies the cracks at turning, thinning and thickening points.
When the lighting conditions change while a crack image is being captured, different parts of the same crack may show different colors and brightness values, so the target can easily be misclassified visually as different objects. Under polarized light, the discriminability of the deep features extracted by the U-Net network worsens, which inevitably degrades target segmentation performance. The biased-light phenomenon also complicates manual labeling. To further verify the crack segmentation performance of the proposed method under biased light, three crack images with an obvious light change, a shadow change and low illumination, respectively, were selected for comparison experiments. Figure 11 shows the original crack images and the cracks segmented by FCN [40], RCF [41], HED [11], FPHBN [12], DAUNet [18], SPLAC U-Net [23] and our method. As shown in Figure 11, the segmentation results of FCN [40], RCF [41] and HED [11] contain heavy noise, resulting in poor segmentation performance. Compared with FCN [40], RCF [41], HED [11] and FPHBN [12], the segmentation performance of DAUNet [18] and SPLAC U-Net [23] is better; however, as shown in the sixth and seventh columns of the second row of Figure 11, these two methods can hardly detect cracks under low illumination. Although there are a few false alarms and missed detections, the proposed method still achieves the best crack segmentation performance under all of these conditions. This performance benefits from the white top-hat and black bottom-hat transformations, which eliminate the influence of polarized light and enhance the crack features.

4.2.2. Comparison and Analysis of Objective Results

To assess the performance of the proposed method on each dataset, Table 2, Table 3, Table 4, Table 5 and Table 6 report the ODS, OIS, AIU, sODS and sOIS indicators of the compared algorithms on the five datasets. These results show that FCN [40] performs worst on the five datasets, followed by RCF [41]. HED [11] and FPHBN [12] perform similarly, with a certain improvement over FCN [40] and RCF [41]. DAUNet [18] and SPLAC U-Net [23] improve further on the other four networks, but our method shows clear advantages on all five datasets. The average ODS, OIS, AIU, sODS and sOIS of the proposed method on the five crack datasets are 75.7%, 73.9%, 36.4%, 52.4% and 52.2%, respectively.

4.2.3. Comparison and Analysis of Ablation Experiments

Ablation experiments were performed to further verify the contribution of each module to the segmentation performance. Figure 12a–f show the original crack image, the ground truth and the segmentation results using only the single-objective loss function, only the multi-objective loss function, only the morphological module and the combination of the multi-loss function and the morphological module, respectively. As can be seen from Figure 12 and Table 7, the segmentation result using only the single-objective loss function is the worst, followed by that of the multi-loss function alone. The result using the morphological module with the single-loss function is better, and the combination of the multi-loss function and the morphological module is the best. Compared with the basic U-Net segmentation network, the ODS, OIS, AIU, sODS and sOIS of our method increased by 19.8%, 15.2%, 20.9%, 26.6% and 26.0%, respectively.

4.2.4. Comparison and Analysis of Computational Cost

Normally, FLOPs (floating point operations) and PARAMS (the number of network parameters) are used to measure the complexity and size of a neural network model, respectively. The FLOPs and PARAMS of our method were about 65.23 G and 32.45 M, respectively. Compared with the U-Net backbone alone, adding the morphological-processing network and the multi-loss function module increased the FLOPs and PARAMS by only 20.31 G and 1.41 M, respectively.
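For readers unfamiliar with how such figures are tallied, the following back-of-the-envelope sketch estimates the FLOPs of a single convolutional layer; counting one multiply plus one add per weight (the factor of 2) is a common convention, and the example input size is illustrative.

```python
# Rough FLOPs estimate for one k x k convolution: 2 * H * W * Cin * Cout * k^2.
def conv_flops(h, w, c_in, c_out, k=3):
    return 2 * h * w * c_in * c_out * k * k

# e.g. the first U-Net convolution on a 360 x 640 input:
print(conv_flops(360, 640, 3, 64) / 1e9, "GFLOPs")  # ~0.8 GFLOPs
```

Summing such per-layer counts over the whole network is how totals on the order of tens of GFLOPs, like the 65.23 G reported above, are typically obtained.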
We further investigated the execution performance of our crack detection method. Experimental tests were performed on an NVIDIA GeForce RTX 2080 SUPER. The crack segmentation ran at 3.0 fps on average; this computational cost includes reading the crack images, image preprocessing, feature extraction and fusion and crack segmentation.

5. Conclusions

A novel crack segmentation method combining a U-Net backbone network, a morphological-processing network and a multi-loss function was proposed in this paper. To address the inaccurate crack segmentation caused by the lack, in existing deep network models, of an effective mechanism for eliminating the influence of polarized light, this paper designed a morphological-processing network module composed of a white top-hat transformation and a black bottom-hat transformation. This module corrects the influence of uneven lighting and effectively extracts bright cracks on a dark background and dark cracks on a bright background. To avoid the problem of cracks segmented at a single scale being too thin or too thick, this paper extracts and fuses the cracks at each scale to obtain more accurate results. In addition, a multi-loss function was designed to enhance the robustness of the network model. The experimental results on five crack datasets show that the proposed algorithm is clearly superior to several classical crack segmentation algorithms.

Author Contributions

Conceptualization, F.Z.; methodology, F.Z., Y.C. and L.L.; software, L.L. and Y.C.; validation and formal analysis, Y.C.; writing—original draft preparation, F.Z.; writing—review and editing, F.Z., Y.C. and L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Shaanxi Natural Science Basic Research Project—Joint Fund Project of Hanjiang-to-Weihe River Valley Water Diversion Project Construction under Grant 2021JLM-59, the Key R&D Project of Shaanxi Province of China under grant 2022GY-305 and the National Natural Science Foundation of China (NSFC) under grant 62273273.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets CrackTree200, Crack500, GAPs384 and CFD in this paper can be obtained from the following link: https://data.lib.vt.edu/articles/dataset/Concrete_Crack_Conglomerate_Dataset/16625056, accessed on 5 December 2022. The dataset AEL can be obtained at https://tuprd-my.sharepoint.com/:u:/g/personal/tug13683_temple_edu/ESjezwsNLERMpvY85wOEKWkBQKY1A21M1rDhLID11pyRsg, accessed on 5 December 2022.

Acknowledgments

The authors would like to thank Qian Liu for her insightful experimental verification and formal analysis that helped improve the quality of this paper. She currently works in Hanjiang-to-Weihe River Valley Water Diversion Project Construction Co., Ltd., which is located in Xi'an, China.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Han, H.; Deng, H.; Dong, Q.; Gu, X.; Zhang, T.; Wang, Y. An advanced Otsu method integrated with edge detection and decision tree for crack detection in highway transportation infrastructure. Adv. Mater. Sci. Eng. 2021, 2021, 9205509. [Google Scholar] [CrossRef]
  2. Zhu, X. Detection and recognition of concrete cracks on building surface based on machine vision. Prog. Artif. Intell. 2022, 11, 143–150. [Google Scholar] [CrossRef]
  3. Cao, J. Research on crack detection of bridge deck based on computer vision. IOP Conf. Ser. Earth Environ. Sci. 2021, 768, 012161. [Google Scholar] [CrossRef]
  4. Peng, C.; Yang, M.; Zheng, Q.; Zhang, J.; Wang, D.; Yan, R.; Li, B. A triple-thresholds pavement crack detection method leveraging random structured forest. Constr. Build. Mater. 2020, 263, 120080. [Google Scholar] [CrossRef]
  5. Strini, A.; Schiavi, L. Euclidean Graphs as Crack Pattern Descriptors for Automated Crack Analysis in Digital Images. Sensors 2022, 22, 5942. [Google Scholar] [CrossRef]
  6. Parrany, A.M.; Mirzaei, M. A new image processing strategy for surface crack identification in building structures under nonuniform illumination. IET Image Process. 2022, 16, 407–415. [Google Scholar] [CrossRef]
  7. Akagic, A.; Buza, E.; Omanovic, S.; Karabegovic, A. Pavement crack detection using Otsu thresholding for image segmentation. In Proceedings of the 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 21–25 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1092–1097. [Google Scholar]
  8. Medina, R.; Llamas, J.; Gómez-García-Bermejo, J.; Zalama, E.; Segarra, M.J. Crack detection in concrete tunnels using a gabor filter invariant to rotation. Sensors 2017, 17, 1670. [Google Scholar] [CrossRef]
  9. Oliveira, H.; Correia, P.L. Automatic road crack segmentation using entropy and image dynamic thresholding. In Proceedings of the 17th European Signal Processing Conference, Glasgow, UK, 24–28 August 2009; IEEE: Piscataway, NJ, USA, 2015; pp. 622–626. [Google Scholar]
  10. Li, C.; Wen, Y.; Shi, Q.; Yang, F.; Ma, H.; Tian, X. A Pavement Crack Detection Method Based on Multiscale Attention and HFS. Comput. Intell. Neurosci. 2022, 2022, 1822585. [Google Scholar] [CrossRef]
  11. Xie, S.; Tu, Z. Holistically-Nested edge detection. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 1395–1403. [Google Scholar]
  12. Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1525–1535. [Google Scholar] [CrossRef] [Green Version]
  13. Wang, J.; He, X.; Faming, S.; Lu, G.; Cong, H.; Jiang, Q. A Real-Time Bridge Crack Detection Method Based on an Improved Inception-Resnet-v2 Structure. IEEE Access 2021, 9, 93209–93223. [Google Scholar] [CrossRef]
  14. Sun, M.; Zhao, H.; Li, J. Road crack detection network under noise based on feature pyramid structure with feature enhancement (road crack detection under noise). IET Image Process. 2022, 16, 809–822. [Google Scholar] [CrossRef]
  15. Li, H.; Zong, J.; Nie, J.; Wu, Z.; Han, H. Pavement crack detection algorithm based on densely connected and deeply supervised network. IEEE Access 2021, 9, 11835–11842. [Google Scholar] [CrossRef]
  16. Wang, W.; Su, C. Convolutional neural network-based pavement crack segmentation using pyramid attention network. IEEE Access 2020, 8, 206548–206558. [Google Scholar] [CrossRef]
  17. Qiao, W.; Liu, Q.; Wu, X.; Ma, B.; Li, G. Automatic pixel-level pavement crack recognition using a deep feature aggregation segmentation network with a scSE attention mechanism module. Sensors 2021, 21, 2902. [Google Scholar] [CrossRef]
  18. Polovnikov, V.; Alekseev, D.; Vinogradov, I.; Lashkia, G.V. DAUNet: Deep Augmented Neural Network for Pavement Crack Segmentation. IEEE Access 2021, 9, 125714–125723. [Google Scholar] [CrossRef]
  19. Lau, S.L.; Chong, E.K.; Yang, X.; Wang, X. Automated pavement crack segmentation using u-net-based convolutional neural network. IEEE Access 2020, 8, 114892–114899. [Google Scholar] [CrossRef]
  20. Hong, Z.; Yang, F.; Pan, H.; Zhou, R.; Zhang, Y.; Han, Y.; Liu, J. Highway Crack Segmentation from Unmanned Aerial Vehicle Images Using Deep Learning. IEEE Geosci. Remote Sens. Lett. 2021, 19, 6503405. [Google Scholar] [CrossRef]
  21. König, J.; Jenkins, M.D.; Mannion, M.; Barrie, P.; Morison, G. Optimized deep encoder-decoder methods for crack segmentation. Digit. Signal Process. 2021, 108, 102907. [Google Scholar] [CrossRef]
  22. Li, G.; Ma, B.; He, S.; Ren, X.; Liu, Q. Automatic tunnel crack detection based on u-net and a convolutional neural network with alternately updated clique. Sensors 2020, 20, 717. [Google Scholar] [CrossRef] [Green Version]
  23. Katsamenis, I.; Doulamis, N.; Doulamis, A.; Protopapadakis, E.; Voulodimos, A. Simultaneous Precise Localization and Classification of metal rust defects for robotic-driven maintenance and prefabrication using residual attention U-Net. Autom. Constr. 2022, 137, 104182. [Google Scholar] [CrossRef]
  24. Nguyen, N.H.T.; Perry, S.; Bone, D.; Le, H.T.; Nguyen, T.T. Two-stage convolutional neural network for road crack detection and segmentation. Expert Syst. Appl. 2021, 186, 115718. [Google Scholar] [CrossRef]
  25. Kang, D.; Benipal, S.S.; Gopal, D.L.; Cha, Y.J. Hybrid pixel-level concrete crack segmentation and quantification across complex backgrounds using deep learning. Autom. Constr. 2020, 118, 103291. [Google Scholar] [CrossRef]
  26. Wang, W.; Su, C. Automatic concrete crack segmentation model based on transformer. Autom. Constr. 2022, 139, 104275. [Google Scholar] [CrossRef]
  27. Xiang, C.; Wang, W.; Deng, L.; Shi, P.; Kong, X. Crack detection algorithm for concrete structures based on super-resolution reconstruction and segmentation network. Autom. Constr. 2022, 140, 104346. [Google Scholar] [CrossRef]
  28. Zhou, Q.; Qu, Z.; Cao, C. Mixed pooling and richer attention feature fusion for crack detection. Pattern Recognit. Lett. 2021, 145, 96–102. [Google Scholar] [CrossRef]
  29. Wang, J.; Liu, F.; Yang, W.; Xu, G.; Tao, Z. Pavement Crack Detection Using Attention U-Net with Multiple Sources. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Nanjing, China, 16–18 October 2020; Springer: Cham, Switzerland, 2020; pp. 664–672. [Google Scholar]
  30. Nogay, H.S.; Akinci, T.C.; Yilmaz, M. Detection of invisible cracks in ceramic materials using by pre-trained deep convolutional neural network. Neural Comput. Appl. 2022, 34, 1423–1432. [Google Scholar] [CrossRef]
  31. Song, W.; Jia, G.; Jia, D.; Zhu, H. Automatic pavement crack detection and classification using multiscale feature attention network. IEEE Access 2019, 7, 171001–171012. [Google Scholar] [CrossRef]
  32. Liu, Y.; Yao, J.; Lu, X.; Xie, R.; Li, L. DeepCrack: A deep hierarchical feature learning architecture for crack segmentation. Neurocomputing 2019, 338, 139–153. [Google Scholar] [CrossRef]
  33. Islam, M.M.; Kim, J.M. Vision-based autonomous crack detection of concrete structures using a fully convolutional encoder–decoder network. Sensors 2019, 19, 4251. [Google Scholar] [CrossRef] [Green Version]
  34. Protopapadakis, E.; Voulodimos, A.; Doulamis, A.; Doulamis, N.; Stathaki, T. Automatic crack detection for tunnel inspection using deep learning and heuristic image post-processing. Appl. Intell. 2019, 49, 2793–2806. [Google Scholar] [CrossRef]
  35. Zou, Q.; Cao, Y.; Li, Q.; Mao, Q.; Wang, S. CrackTree: Automatic crack detection from pavement images. Pattern Recognit. Lett. 2012, 33, 227–238. [Google Scholar] [CrossRef]
  36. Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network. In Proceedings of the 2016 IEEE international conference on image processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 3708–3712. [Google Scholar]
  37. Eisenbach, M.; Stricker, R.; Seichter, D.; Amende, K.; Debes, K.; Sesselmann, M.; Gross, H.M. How to get pavement distress detection ready for deep learning? A systematic approach. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 2039–2047. [Google Scholar]
  38. Amhaz, R.; Chambon, S.; Idier, J.; Baltazart, V. Automatic crack detection on two-dimensional pavement images: An algorithm based on minimal path selection. IEEE Trans. Intell. Transp. Syst. 2016, 17, 2718–2729. [Google Scholar] [CrossRef]
  39. Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic Road crack detection using random structured forests. IEEE Trans. Intell. Transp. Syst. 2016, 17, 3434–3445. [Google Scholar]
  40. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015; pp. 3431–3440. [Google Scholar]
  41. Liu, Y.; Cheng, M.M.; Hu, X.; Wang, K.; Bai, X. Richer convolutional features for edge detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 22–25 July 2017; pp. 3000–3009. [Google Scholar]
Figure 1. Schematic diagram of crack segmentation method.
Figure 2. U-Net network structure.
Figure 3. Example diagram of a dilation operation.
Figure 4. Example diagram of an erosion operation.
Figure 5. Structural diagram of the morphological network.
Figure 6. Feature map before and after the morphological network processing: (a) original image, (b) ground truth, (c) feature map before morphological processing, (d) feature map after morphological processing.
Figure 7. Multi-loss function design.
Figure 8. Side network prediction results and final prediction result: (a) original image; (b) ground truth image; (c) $\hat{Y}_{side}^{(1)}$; (d) $\hat{Y}_{side}^{(2)}$; (e) $\hat{Y}_{side}^{(3)}$; (f) $\hat{Y}$.
Figure 9. Subjective effect comparison results: (a) original images; (b) ground truth; (c) FCN [40]; (d) RCF [41]; (e) HED [11]; (f) FPHBN [12]; (g) DAUNet [18]; (h) SPLAC U-Net [23]; (i) our method.
Figure 10. Detail region of crack segmentation results: (a) original images; (b) segmentation images.
Figure 11. Segmentation comparison results under polarized light: (a) original images; (b) FCN [40]; (c) RCF [41]; (d) HED [11]; (e) FPHBN [12]; (f) DAUNet [18]; (g) SPLAC U-Net [23]; (h) our method.
Figure 12. Subjective comparison results of the ablation experiments: (a) original image; (b) the ground truth; (c) the segmentation result using only a single-loss module; (d) the segmentation results using only the multi-loss module; (e) the segmentation result using only the morphological module; (f) the segmentation results with the combination of multi-loss and morphological module.
Table 1. The parameter settings of the U-Net network structure.

| Network Layer Name | Convolutional Layer Parameter Settings | Output Dimension |
|---|---|---|
| Conv1-x | [3 × 3, 64] × 2 | W × H × 64 |
| Pooling1 | [2 × 2] | W/2 × H/2 × 64 |
| Conv2-x | [3 × 3, 128] × 2 | W/2 × H/2 × 128 |
| Pooling2 | [2 × 2] | W/4 × H/4 × 128 |
| Conv3-x | [3 × 3, 256] × 2 | W/4 × H/4 × 256 |
| Pooling3 | [2 × 2] | W/8 × H/8 × 256 |
| Conv4-x | [3 × 3, 512] × 2 | W/8 × H/8 × 512 |
| Pooling4 | [2 × 2] | W/16 × H/16 × 512 |
| Conv5-x | [3 × 3, 1024] × 2 | W/16 × H/16 × 1024 |
| Up-Conv1 | [2 × 2, 512] | W/8 × H/8 × 512 |
| Conv6-x | [3 × 3, 512] × 2 | W/8 × H/8 × 512 |
| Up-Conv2 | [2 × 2, 256] | W/4 × H/4 × 256 |
| Conv7-x | [3 × 3, 256] × 2 | W/4 × H/4 × 256 |
| Up-Conv3 | [2 × 2, 128] | W/2 × H/2 × 128 |
| Conv8-x | [3 × 3, 128] × 2 | W/2 × H/2 × 128 |
| Up-Conv4 | [2 × 2, 64] | W × H × 64 |
| Conv9-x | [1 × 1, 1] | W × H × 1 |
Table 2. Segmentation performance comparison results on the CrackTree200 dataset.

| Method | Year | ODS | OIS | AIU | sODS | sOIS |
|---|---|---|---|---|---|---|
| FCN [40] | 2015 | 0.334 | 0.333 | 0.008 | N/A | N/A |
| RCF [41] | 2017 | 0.255 | 0.487 | 0.032 | N/A | N/A |
| HED [11] | 2015 | 0.317 | 0.449 | 0.040 | N/A | N/A |
| FPHBN [12] | 2020 | 0.517 | 0.579 | 0.041 | 0.095 | 0.125 |
| DAUNet [18] | 2021 | 0.781 | 0.805 | 0.128 | 0.234 | 0.276 |
| SPLAC U-Net [23] | 2022 | 0.887 | 0.894 | 0.181 | 0.406 | 0.427 |
| Our method | 2022 | 0.930 | 0.932 | 0.216 | 0.429 | 0.430 |
Table 3. Segmentation performance comparison results on the Crack500 dataset.

| Method | Year | ODS | OIS | AIU | sODS | sOIS |
|---|---|---|---|---|---|---|
| FCN [40] | 2015 | 0.513 | 0.577 | 0.379 | N/A | N/A |
| RCF [41] | 2017 | 0.490 | 0.586 | 0.403 | N/A | N/A |
| HED [11] | 2015 | 0.575 | 0.625 | 0.481 | N/A | N/A |
| FPHBN [12] | 2020 | 0.604 | 0.635 | 0.489 | 0.647 | 0.591 |
| DAUNet [18] | 2021 | 0.676 | 0.706 | 0.565 | 0.750 | 0.731 |
| SPLAC U-Net [23] | 2022 | 0.681 | 0.691 | 0.583 | 0.746 | 0.753 |
| Our method | 2022 | 0.717 | 0.732 | 0.592 | 0.774 | 0.763 |
Table 4. Segmentation performance comparison results on the CFD dataset.

| Method | Year | ODS | OIS | AIU | sODS | sOIS |
|---|---|---|---|---|---|---|
| FCN [40] | 2015 | 0.585 | 0.609 | 0.021 | N/A | N/A |
| RCF [41] | 2017 | 0.542 | 0.607 | 0.105 | N/A | N/A |
| HED [11] | 2015 | 0.593 | 0.626 | 0.154 | N/A | N/A |
| FPHBN [12] | 2020 | 0.683 | 0.705 | 0.173 | 0.377 | 0.372 |
| DAUNet [18] | 2021 | 0.812 | 0.831 | 0.370 | 0.603 | 0.593 |
| SPLAC U-Net [23] | 2022 | 0.793 | 0.828 | 0.383 | 0.597 | 0.612 |
| Our method | 2022 | 0.828 | 0.839 | 0.403 | 0.611 | 0.620 |
Table 5. Segmentation performance comparison results on the AEL dataset.

| Method | Year | ODS | OIS | AIU | sODS | sOIS |
|---|---|---|---|---|---|---|
| FCN [40] | 2015 | 0.322 | 0.265 | 0.022 | N/A | N/A |
| RCF [41] | 2017 | 0.469 | 0.397 | 0.069 | N/A | N/A |
| HED [11] | 2015 | 0.429 | 0.421 | 0.075 | N/A | N/A |
| FPHBN [12] | 2020 | 0.492 | 0.507 | 0.079 | 0.319 | 0.283 |
| DAUNet [18] | 2021 | 0.615 | 0.660 | 0.223 | 0.400 | 0.394 |
| SPLAC U-Net [23] | 2022 | 0.656 | 0.679 | 0.235 | 0.356 | 0.370 |
| Our method | 2022 | 0.712 | 0.758 | 0.205 | 0.364 | 0.307 |
Table 6. Segmentation performance comparison results on the GAPs384 dataset.

| Method | Year | ODS | OIS | AIU | sODS | sOIS |
|---|---|---|---|---|---|---|
| FCN [40] | 2015 | 0.088 | 0.091 | 0.015 | N/A | N/A |
| RCF [41] | 2017 | 0.172 | 0.120 | 0.043 | N/A | N/A |
| HED [11] | 2015 | 0.209 | 0.175 | 0.069 | N/A | N/A |
| FPHBN [12] | 2020 | 0.220 | 0.231 | 0.081 | 0.121 | 0.156 |
| DAUNet [18] | 2021 | 0.514 | 0.342 | 0.217 | 0.349 | 0.388 |
| SPLAC U-Net [23] | 2022 | 0.573 | 0.391 | 0.389 | 0.453 | 0.473 |
| Our method | 2022 | 0.597 | 0.432 | 0.403 | 0.444 | 0.490 |
Table 7. Ablation experiment results.

| U-Net | Morphological Network | Multi-Loss | ODS | OIS | AIU | sODS | sOIS |
|---|---|---|---|---|---|---|---|
| ✓ | | | 0.519 | 0.580 | 0.383 | 0.508 | 0.503 |
| ✓ | | ✓ | 0.598 | 0.633 | 0.483 | 0.559 | 0.540 |
| ✓ | ✓ | | 0.654 | 0.659 | 0.533 | 0.660 | 0.604 |
| ✓ | ✓ | ✓ | 0.717 | 0.732 | 0.592 | 0.774 | 0.763 |