1. Introduction
Cracks are the most frequent defects in concrete structures, and their formation and propagation significantly degrade the safety and durability of infrastructure [1]. Accurately detecting the location and severity of cracks, which relies on detecting and quantifying crack parameters, is crucial for the operational safety and long-term performance of structures. Machine vision-based crack detection has been widely applied because it is efficient, convenient, and non-contact [2]. Traditional image processing methods (white-box methods) and artificial intelligence techniques (black-box methods) are replacing manual inspection, measurement, and analysis [3] and have drawn considerable attention from researchers.
White-box methods have a low computational cost and are traceable, transparent, and interpretable; they do not require large crack image datasets, and they have been widely used in crack image detection for more than a decade. Wang introduced a multi-angle, multi-structuring-element morphological filtering algorithm that preserves image details and improves the identification and extraction of cracks [4]. Xu et al. improved detection accuracy using the Otsu method and by adaptively setting Canny threshold parameters [5]. Dow et al. proposed a skeleton-marker method to remove binary noise and segment concrete cracks and achieved more reliable crack detection results than previous methods [6]. However, concrete crack images are often acquired in complex environments involving illumination variations, stains and oil residues on the crack surfaces, and sometimes occlusion by other objects. These unfavorable environmental factors can affect the gray threshold, the area threshold, and the connectivity used in image processing, which complicates crack detection and parameter calculation and limits the generalization capability of image segmentation [7].
Among AI-based crack segmentation methods, deep learning has become the mainstream approach because of its high accuracy, robustness, and strong generalization capability [8,9,10,11,12]. Liu et al. first applied U-Net, optimized with the Adam algorithm, to concrete crack segmentation and obtained more accurate, effective, and robust segmentation than the methods used before its introduction [8]. Xiang introduced a dual-encoder network, DTrC-Net, which outperformed other state-of-the-art segmentation networks and generalized better in complex scenes [7]. Su et al. proposed the CBAM-Unet algorithm for bridge crack identification, which effectively reduced detection costs and improved efficiency [9]. Ren et al. presented CrackSegNet, an improved deep fully convolutional neural network with higher accuracy and generalization capability, which made tunnel inspection and monitoring efficient and cost-effective [10]. In another study, combining white-box and black-box methods yielded superior pixel-level segmentation results [11]. Han et al. integrated deep learning with digital image processing for crack recognition [12]: their model automatically located and extracted cracks using a deep convolutional neural network combined with local-threshold image segmentation, and it could precisely locate the position of the maximum crack width and calculate that width.
Deep learning-based concrete crack segmentation depends on the deep convolutional neural network architecture, the dataset, and the evaluation metrics [13]. The dataset is the digital foundation of data-driven (also called data-dependent) image segmentation, and its quality and quantity are equally important. Producing high-quality concrete crack datasets is labor-intensive and expensive, so large-scale datasets are scarce; this scarcity poses a major challenge to crack semantic segmentation algorithms and limits their robustness and generalizability. Because the accuracy of deep learning-based crack segmentation is determined by dataset quality and labeling precision, standardizing concrete crack image capture scenes and dataset production is urgently needed, yet no standards or characterization parameters for producing concrete crack datasets exist so far.
To standardize the concrete crack imaging scene and produce high-quality datasets, this paper investigates how the segmentability of concrete crack images differs across exposure scenes, through an analysis of the standardization of current datasets, a study of the photoelectric principle of imaging, the selection of a core scene characterization index, and indoor experimental validation. The paper is organized as follows. In Section 2, current pavement and concrete crack datasets are collected and their levels of standardization are analyzed. In Section 3, the photoelectric principle of concrete crack imaging is analyzed, and equivalent exposure is proposed as a key control index for characterizing concrete crack scenes and verified using images from 21 designed indoor scenes. In Section 4, the mean values and standard deviations of image histograms are analyzed to reveal the grayscale distribution of concrete crack images from 50 equivalent exposure scenes, and the segmentation accuracy of the images in each scene is calculated and compared to find the optimal equivalent exposure interval for concrete crack image segmentation. Finally, Section 5 concludes the study.
2. Dataset Scene Standardization Analysis
As carriers of information about infrastructure cracks, crack image datasets are a prerequisite for both white-box and black-box crack image segmentation methods and are of great importance to researchers [14,15,16,17,18,19,20,21,22]. Consequently, a series of crack image datasets covering different materials, including concrete, asphalt pavement, and metals, has been designed and produced. Although a number of public infrastructure crack image datasets have been released, the popular datasets differ considerably in image capture device, infrastructure material, image size, number of images, and imaging environment, as listed in Table 1, which indicates that there is no standard for selecting these factors.
Image capture devices can be grouped into three main classes: handheld, vehicle-mounted, and unmanned aerial vehicle (UAV)-mounted cameras or laser radars [14], and the devices also differ in type and resolution. For example, the capture devices of the public datasets in Table 1 are an iPhone 5 [15], a mobile phone [20,22], an area-array camera [18], and a 2D laser [19,21]. This classification of devices is coarse and indistinct, which indicates that there is no standard for selecting crack capture devices. The datasets also differ in resolution and number of images, indicating that neither image size nor dataset size is standardized.
Moreover, current public infrastructure crack image datasets are not standardized in terms of image capture environment, image source, or image labeling. Among the datasets listed in Table 1, only Crack3238 [7], DeepCrack [17], and TITS [19] have high-quality labels; the other datasets contain mislabeled or incorrect annotations. Figure 1 shows typical non-standardized image samples from the datasets in Table 1. In Figure 1a,b, the captured building is too large for a reasonable spatial resolution to be achieved, and the field of view contains unnecessary background, such as the sky or unrelated buildings and objects. In Figure 1c, the watermarked pavement crack image was sourced from the internet, and its spatial resolution and capture device are unknown. In Figure 1d, the image is out of focus, possibly because of operational issues such as shaking of the acquisition equipment, an improper shutter speed, or an incorrect aperture. In Figure 1e,f, some cracks are left unlabeled because of overexposure.
In photography, image quality is strongly influenced by the camera type, by scene parameters such as the intensity, wavelength, and angle of the ambient illumination, and by capture parameters such as shooting angle, shutter speed, and aperture size. Understanding the photoelectric process of concrete crack image capture is therefore necessary to standardize crack imaging scenes.
4. Optimal Exposure Scene Analysis
All machine vision applications depend on image contrast, which reflects the differences between objects and is the basis of image processing algorithms. Contrast is a direct consequence of illumination or exposure. To explore how exposure affects concrete crack image capture scenes and segmentation precision, images of the four concrete specimens shown in Figure 6a–d were captured indoors under 50 equivalent exposures. All specimens were concrete boards with cracks except specimen A, shown in Figure 6a. Because illumination intensity and uniformity are difficult to control, the 50 exposure scenes listed in Table 4 were produced by controlling the exposure time, which was varied from 50 to 2500 ms in increments of 50 ms. One image of each specimen was captured in each scene, giving 200 images in total. In these scenes, the illuminance was held constant at 100 lx, and the image capture devices and verification zones were the same as those described in Section 3.
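The exposure schedule above can be reproduced with a few lines of Python. This is a minimal sketch assuming that equivalent exposure is the product of illuminance and exposure time, H = E·t; the assumption is consistent with the values used in this section (100 lx × 0.05 s = 5 lx·s and 100 lx × 2.5 s = 250 lx·s). The function `exposure_schedule` is hypothetical.

```python
def exposure_schedule(illuminance_lx=100.0, t_start_ms=50, t_stop_ms=2500, step_ms=50):
    """Return (exposure_time_ms, equivalent_exposure_lx_s) pairs for each scene.

    Assumes equivalent exposure H = E * t, with the illuminance E held constant.
    """
    scenes = []
    for t_ms in range(t_start_ms, t_stop_ms + 1, step_ms):
        h = illuminance_lx * t_ms / 1000.0  # convert ms to s before multiplying by lx
        scenes.append((t_ms, h))
    return scenes


scenes = exposure_schedule()
n_specimens = 4  # specimens A-D, one image per specimen per scene
print(len(scenes), scenes[0], scenes[-1], len(scenes) * n_specimens)
# 50 (50, 5.0) (2500, 250.0) 200
```

With a constant 100 lx, each 50 ms step adds 5 lx·s, so the 50 scenes span 5–250 lx·s in steps of 5 lx·s.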
4.1. Concrete Imaging Mechanism Analysis
In the 50 equivalent exposure scenes, the exposure received by the imaging sensor varied with the exposure time, so images with different gray levels were captured. Figure 7a–h shows images captured at exposures of 5, 40, 75, 110, 145, 180, 215, and 250 lx·s, respectively. The image in Figure 7a is clearly underexposed, and those in Figure 7e–h are overexposed; images from these exposure scenes do not have enough contrast for crack segmentation.
To clarify and quantify the gray-level distribution of concrete images from different exposure scenes, the image histograms from the first ten scenes are plotted in Figure 8, and the mean value μ_i and standard deviation σ_i of each image are calculated. The corresponding normal distribution curves are

$$f_i(x) = \frac{1}{\sigma_i\sqrt{2\pi}}\exp\left(-\frac{(x-\mu_i)^2}{2\sigma_i^2}\right)$$

where x is the gray level and i is the scene number. Because images from the larger equivalent exposure scenes are overexposed, only the normal distribution curves of the first ten exposure scenes are plotted; they are compared with the corresponding histograms in Figure 9.
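As a sketch of how μ_i, σ_i, and the fitted normal curve can be obtained from an image histogram, the NumPy code below computes the histogram-weighted mean and population standard deviation of the gray levels and evaluates the normal density on the same gray-level axis. It assumes 16-bit grayscale images (gray levels 0–65,535, consistent with the maximum HMV of 65,535 reported in this section); the function names and the synthetic test image are hypothetical, not data from the experiments.

```python
import numpy as np


def histogram_stats(image, levels=65536):
    """Mean and standard deviation of gray levels, computed from the image histogram."""
    hist = np.bincount(image.ravel(), minlength=levels).astype(np.float64)
    gray = np.arange(hist.size, dtype=np.float64)
    p = hist / hist.sum()                                 # normalized histogram
    mu = float(np.sum(gray * p))                          # histogram mean value
    sigma = float(np.sqrt(np.sum((gray - mu) ** 2 * p)))  # histogram standard deviation
    return mu, sigma


def normal_curve(x, mu, sigma):
    """Normal probability density with mean mu and standard deviation sigma."""
    return np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))


# Synthetic 300 x 300 16-bit image, for illustration only.
rng = np.random.default_rng(0)
img = np.clip(rng.normal(20000, 3000, size=(300, 300)), 0, 65535).astype(np.uint16)
mu, sigma = histogram_stats(img)
x = np.arange(65536, dtype=np.float64)
pdf = normal_curve(x, mu, sigma)  # compare with hist / hist.sum() when plotting
```

For 8-bit images, `levels=256` gives the same statistics on a 0–255 axis.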
The variations of the histogram mean value (HMV) and histogram standard deviation (HSD) over the 21 scenes in Section 3 and the 50 equivalent exposure scenes in Table 4 are shown in Figure 10a,b. The HMV and HSD curves are closely correlated across exposure scenes. In Figure 10, the HMV rises with a nearly constant gradient in the low-exposure interval and levels off at the cut-off point in the higher-exposure interval; beyond the cut-off exposure, the HMV remains at its maximum value of 65,535. Because Figure 10 includes more exposure scenes, it represents the variation of the HMV and HSD with exposure more precisely. With 50 scenes, the cut-off exposure is 145 lx·s for the HMV curve instead of the 154 lx·s shown in Figure 5, and 115 lx·s for the HSD curve instead of the 106 lx·s shown in Figure 5. The experimental results also show that the cut-off exposure values of the HMV–exposure curve are specimen-dependent, as they depend on the reflection coefficient of the concrete surface according to Equation (6); this dependence is not discussed further in this paper.
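The text does not state how the cut-off point is extracted from the HMV–exposure curve. One possible approach, shown here only as an assumption, is to fit a straight line to the nearly linear, unsaturated part of the curve and compute where it reaches the 16-bit saturation value. The helper `cutoff_exposure` and the synthetic curve are hypothetical.

```python
import numpy as np

SATURATION = 65535  # maximum gray level of a 16-bit image


def cutoff_exposure(exposure, hmv, saturation=SATURATION, fit_fraction=1.0):
    """Estimate the exposure at which a linearly rising HMV reaches saturation.

    Only samples below fit_fraction * saturation are used for the line fit,
    so partially saturated samples near the knee can be excluded.
    """
    exposure = np.asarray(exposure, dtype=np.float64)
    hmv = np.asarray(hmv, dtype=np.float64)
    linear = hmv < fit_fraction * saturation          # nearly linear, unsaturated part
    slope, intercept = np.polyfit(exposure[linear], hmv[linear], deg=1)
    return (saturation - intercept) / slope           # where the fitted line hits saturation


# Synthetic curve for illustration: HMV = 500 * H, clipped at saturation.
h = np.arange(5.0, 255.0, 5.0)                        # 5, 10, ..., 250 lx·s
hmv = np.minimum(500.0 * h, SATURATION)
print(round(float(cutoff_exposure(h, hmv)), 1))       # 131.1 (= 65535 / 500)
```

Because the fitted crossing is not restricted to sampled exposures, this estimate is less sensitive to scene spacing than simply taking the first saturated sample.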
4.2. Optimal Equivalent Exposure Analysis
In concrete crack image segmentation, crack pixels are far fewer than concrete pixels, and the extent of this imbalance depends mainly on the image size. Following the dataset image sizes used in previous research, images of 300 × 300 pixels were used for segmentation; the histograms of images of this size showed a clear bimodal property, which is the basis of the Otsu method and iterative thresholding-based segmentation. Figure 11a shows the original gray image, Figure 11b the binary image segmented by iterative thresholding, and Figure 11c the ground-truth binary image produced by manual labeling with LabelMe 4.5.6.
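The iterative thresholding variant is not specified here; the sketch below assumes the common ISODATA (Ridler–Calvard) scheme: start from the mean gray level, split the pixels into two classes, and move the threshold to the midpoint of the two class means until it stops changing. It also assumes that cracks are darker than the concrete surface, so pixels at or below the threshold are labeled as crack. The function names and the tolerance are hypothetical.

```python
import numpy as np


def iterative_threshold(gray, tol=0.5, max_iter=100):
    """ISODATA-style iterative threshold for a bimodal grayscale image."""
    g = gray.astype(np.float64).ravel()
    t = g.mean()                                 # initial guess: global mean gray level
    for _ in range(max_iter):
        low, high = g[g <= t], g[g > t]
        if low.size == 0 or high.size == 0:      # degenerate case, e.g. a constant image
            break
        t_new = 0.5 * (low.mean() + high.mean())  # midpoint of the two class means
        converged = abs(t_new - t) < tol
        t = t_new
        if converged:
            break
    return t


def segment_cracks(gray):
    """Binary crack mask: True where a pixel is at or below the threshold (dark)."""
    return gray <= iterative_threshold(gray)


# Synthetic 300 x 300 example: bright concrete with a 4-pixel-wide dark crack band.
img = np.full((300, 300), 40000, dtype=np.uint16)
img[:, 148:152] = 8000
mask = segment_cracks(img)
print(int(mask.sum()))  # 1200 crack pixels (300 rows x 4 columns)
```

On a clearly bimodal 300 × 300 image, the threshold settles between the two histogram peaks after a few iterations; Otsu's method would be a drop-in alternative for the threshold step.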
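Precision, recall, and the F1 score are the most popular segmentation accuracy metrics. Precision and recall are defined as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (9)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (10)$$

In Equations (9) and (10), true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) are the elements of the confusion matrix between the segmented image and the ground-truth image, with crack pixels as the positive class. The F1 score [24,25] is defined from precision and recall as follows:

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$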
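The metrics in Equations (9) and (10) and the F1 score can be computed directly from a predicted binary mask and a ground-truth mask. The sketch below treats crack pixels as the positive class; the function name `segmentation_scores` and the toy masks are hypothetical.

```python
import numpy as np


def segmentation_scores(pred, truth):
    """Pixel-level precision, recall and F1 with crack pixels as the positive class."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # crack predicted as crack
    fp = np.count_nonzero(pred & ~truth)   # concrete predicted as crack
    fn = np.count_nonzero(~pred & truth)   # crack predicted as concrete
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy 4 x 4 example: a one-pixel-wide vertical crack in column 1.
truth = np.zeros((4, 4), dtype=bool)
truth[:, 1] = True
pred = np.zeros((4, 4), dtype=bool)
pred[:3, 1] = True   # three crack pixels found, one missed (FN)
pred[0, 2] = True    # one concrete pixel wrongly marked as crack (FP)
print(segmentation_scores(pred, truth))  # (0.75, 0.75, 0.75)
```

TN does not enter any of the three metrics, which keeps them informative despite the large concrete-to-crack pixel imbalance.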
Figure 12 illustrates the precision and recall of the image segmentation for three specimens under the 50 exposure conditions. Together with the precision and recall values in Table 5, it shows that both precision and recall are high within the equivalent exposure range of 5–80 lx·s, suggesting that crack segmentation by digital image processing is effective within this range. Figure 13 shows a similar trend in the F1 scores of the three specimens: as exposure increases, the F1 scores drop by 0.5 from their peaks, and the optimal exposure is again 5–80 lx·s. The largest F1 scores in Figure 13, which indicate the most effective distinction between concrete and cracks, are also achieved within this interval, in the same scenes marked by red crosses in Figure 12.
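The optimal interval is read from the F1–exposure curves. As an illustration only, the hypothetical helper below returns the contiguous exposure interval around the F1 peak in which F1 stays within a tolerance `tol` of the peak value. The tolerance criterion and the synthetic curve are assumptions, not the procedure or data used in this study.

```python
import numpy as np


def optimal_interval(exposure, f1, tol=0.05):
    """Return (low, high, peak) exposures of the contiguous region with F1 >= peak - tol."""
    exposure = np.asarray(exposure, dtype=np.float64)
    f1 = np.asarray(f1, dtype=np.float64)
    k = int(np.argmax(f1))               # scene with the highest F1 (first one on ties)
    ok = f1 >= f1[k] - tol               # scenes close enough to the peak
    lo = k
    while lo > 0 and ok[lo - 1]:         # grow the interval to lower exposures
        lo -= 1
    hi = k
    while hi < len(f1) - 1 and ok[hi + 1]:  # grow the interval to higher exposures
        hi += 1
    return float(exposure[lo]), float(exposure[hi]), float(exposure[k])


# Synthetic F1 curve for illustration: flat up to 50 lx·s, then declining.
h = np.arange(5.0, 255.0, 5.0)
f1 = np.where(h <= 50, 0.95, 0.95 - 0.004 * (h - 50))
print(optimal_interval(h, f1, tol=0.05))
```

Growing the interval outward from the peak, rather than taking every scene above the threshold, keeps the result a single contiguous exposure range.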
Figure 13 indicates that higher exposure adversely affects crack detection accuracy. Moreover, the precision–recall curves and F1 scores of the three specimens show that segmentation accuracy is specimen-dependent, even in the same exposure scene and with the same image processing or segmentation method. This dependence of segmentation accuracy on the specimen is referred to as segmentability, and it requires further investigation.
More specifically, Table 5 presents the precision, recall, and F1 score values for the first 30 equivalent exposure scenes for specimen D. The F1 score gradually decreases as the equivalent exposure increases, and its highest value, 96.3%, is reached in the exposure range of 5–50 lx·s.
Metric comparisons between the iterative thresholding segmentation results and the ground truth for the three specimens are plotted in Figure 14, in which the TP, FP, FN, and TN areas are color-coded and annotated with their pixel counts. Panel (a) displays the original images with the numbers of true crack and concrete pixels, and panels (b) and (c) present the segmentation maps in the low-exposure scenes of 5 lx·s and 10 lx·s. Low exposure reduced the contrast of the concrete crack images, which produced false positives (FPs) and hence segmentation errors. Panels (d) and (e) show the optimal segmentation maps, from the scenes with the highest F1 scores, which lie in the equivalent exposure interval of 5–50 lx·s. Panels (f) and (g) show the segmentation results in the high equivalent exposure scenes of 245 lx·s and 250 lx·s. Segmentation accuracy decreases in the high-exposure intervals because higher exposure increases the number of false negatives (FNs): part of each crack is misclassified as concrete, so the cracks appear thinner. These results show that equivalent exposure is an important factor in characterizing the concrete image capture scene and that 5–50 lx·s is the optimal equivalent exposure interval for segmentation accuracy.
Figure 14 also shows that, in the low-exposure intervals, a small number of false positives (FPs) frequently occur in crack image segmentation because the gray values of the cracks are close to those of the concrete. Conversely, in the high-exposure intervals, cracks are frequently misclassified as concrete, resulting in false negatives (FNs).
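Figure 14 colors each pixel by its confusion-matrix category. A minimal sketch of how such a map and its pixel counts could be produced is given below; the integer codes, the RGB colors, and the function name `confusion_map` are arbitrary assumptions, not those used in the figure.

```python
import numpy as np

# Category order matches the code 2 * truth + prediction (both as 0/1).
CATEGORIES = ("TN", "FP", "FN", "TP")

# Hypothetical display colors, one RGB row per category in CATEGORIES order.
PALETTE = np.array([
    [200, 200, 200],  # TN: concrete kept as concrete (gray)
    [255, 0, 0],      # FP: concrete labeled as crack (red)
    [0, 0, 255],      # FN: crack labeled as concrete (blue)
    [0, 255, 0],      # TP: crack labeled as crack (green)
], dtype=np.uint8)


def confusion_map(pred, truth):
    """Return per-pixel category codes, an RGB overlay, and pixel counts per category."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    codes = 2 * truth.astype(np.uint8) + pred.astype(np.uint8)
    rgb = PALETTE[codes]  # fancy indexing gives shape (H, W, 3)
    counts = {name: int(np.count_nonzero(codes == k))
              for k, name in enumerate(CATEGORIES)}
    return codes, rgb, counts


# Toy 4 x 4 example: vertical crack in column 1, one miss and one false alarm.
truth = np.zeros((4, 4), dtype=bool)
truth[:, 1] = True
pred = np.zeros((4, 4), dtype=bool)
pred[:3, 1] = True
pred[0, 2] = True
codes, rgb, counts = confusion_map(pred, truth)
print(counts)
```

In this encoding, low-exposure errors show up as FP pixels beside the crack, while high-exposure errors show up as FN pixels along the crack edges.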
5. Conclusions
To standardize the concrete crack image capture scene and produce high-quality concrete crack image datasets, this paper proposes exposure as the parameter characterizing concrete crack image capture scenes, based on a standardization investigation and analysis of current public concrete crack segmentation datasets and an analysis of the photoelectric principle. Through the design and validation of equivalent exposure scenes, the optimal exposure interval is identified from 50 scenes. The main conclusions are as follows:
(1) The analysis of current publicly accessible datasets showed that non-uniform image capture devices, spatial resolutions, image sizes, and numbers of images, as well as mislabeling, inappropriate spatial resolution or unnecessary backgrounds, internet-sourced images, out-of-focus images, and motion blur, are frequent problems. Standardizing concrete crack image acquisition scenes is therefore a major challenge for high-precision concrete crack detection.
(2) Based on the photoelectric principle of the concrete crack imaging process, equivalent exposure was adopted as the scene characterization parameter for machine vision-based infrastructure crack detection. Twenty-one equivalent exposure scenes were designed, and the image histograms and their mean values and standard deviations were analyzed to validate the effectiveness of equivalent exposure, that is, the equivalence of exposure-time control and illumination control in crack detection.
(3) Segmentation of concrete crack images from 50 equivalent exposure scenes revealed that the highest segmentation accuracy occurred within the 5–50 lx·s equivalent exposure interval, where the F1 score reached 96.3%. In addition, high exposure was detrimental to concrete crack detection.
This study shows that standardizing the concrete crack image scene is important and that optimal equivalent exposure is a core characterization index that can help improve crack segmentation accuracy. However, the factors that constitute a concrete crack image scene are numerous and complex. In this study, the number of specimens was limited, the illumination was uniform, and validation was performed indoors. More outdoor or on-site experiments should be carried out on real infrastructure members in complex illumination environments, and a comprehensive understanding of the concrete crack imaging scene still requires further exploration. In addition, methods and characteristic indices for evaluating dataset quality are not yet clear and should be explored further, and factors that influence dataset quality, such as imaging spatial resolution and image size, need further investigation and validation with deep learning algorithms.