Article

LCSNet: Light-Weighted Convolution-Based Segmentation Method with Separable Multi-Directional Convolution Module for Concrete Crack Segmentation in Drones

School of Electronic and Communication Engineering, Sun Yat-sen University, Guangzhou 510275, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(7), 1307; https://doi.org/10.3390/electronics13071307
Submission received: 10 March 2024 / Revised: 28 March 2024 / Accepted: 29 March 2024 / Published: 31 March 2024

Abstract
Concrete cracks pose significant safety hazards to buildings, and deep-learning-based semantic segmentation models have achieved state-of-the-art results in concrete crack detection. However, these models are usually too large to deploy on drones. To solve this problem, we propose a Light-Weighted Convolution-Based Segmentation Method with a Separable Multi-Directional Convolution Module (LCSNet). In our proposed method, light-weighted convolution is used to substitute all traditional convolutions. In addition, a light-weighted structure named the Separable Multi-Directional Convolution Module (SMDCM) is used to substitute traditional parallel structures or attention modules to learn contextual or detail features. Thus, the model's ability to extract contextual feature information is retained while its computational complexity is largely reduced. Through these two improvements, the proposed model attains both a small size and a low computational complexity. The experimental results show that our proposed LCSNet can achieve accuracies of 94.2%, 83.6%, 99.2%, and 83.3% on the Cracktree200, CRACK500, CFD, and RECrack datasets, respectively, which are higher than those of traditional models, while the model size of our LCSNet is only 2 MB.

1. Introduction

In geologically active areas, concrete structures may develop various cracks after frequent geological activity. If these cracks are not repaired in a timely manner, they can become safety hazards. Previously, concrete cracks were mainly detected through manual inspection, which was inefficient and costly. Therefore, the automatic detection of concrete cracks is very important. With the rise of computer vision technology in recent years, research on the automatic detection of concrete cracks has flourished, greatly improving its efficiency.
The earliest methods used digital signal processing algorithms for concrete crack detection. By de-noising the original image and setting a series of manual thresholds, automated crack detection could be achieved effectively. For instance, Sahoo et al. [1] proposed a segmentation method based on a grayscale threshold, which segments cracks according to differences in the characteristics of crack and non-crack regions. Tanaka et al. [2] proposed a morphological method to detect cracks; by combining morphological operators such as dilation and erosion, their method can extract the shape and structure of cracks and thus detect them effectively. Cheng et al. [3] observed that the gray value of cracks in concrete images is lower than that of the environmental background, and that crack pixels are more continuous than background pixels. Based on these two characteristics, they designed a method using fuzzy logic that can distinguish the background easily. Zou et al. [4] proposed a wavelet-based road crack detection algorithm; their method decomposes the original image, de-noises it at each scale, and then recombines the images, improving crack detection accuracy. However, these crack detection methods based on digital signal processing rely on handcrafted features, which are difficult to design.
Along with developments in deep learning, researchers began to use deep-learning-based methods in crack detection tasks, since these methods can extract high-level features automatically. Early deep-learning-based methods mainly used CNNs for simple crack classification [5]. For example, Song [6] proposed a crack classification model based on a CNN consisting of seven layers: an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. The results showed that this CNN-based crack classification method can obtain good classification results. However, these classification-based deep learning methods could not delineate cracks clearly, so researchers began to adopt deep-learning-based pixel-level semantic segmentation for precise crack detection. The advantage of semantic segmentation is that it can extract cracks in images pixel by pixel. Thus, semantic segmentation models built on encoder-decoder structures, such as FCN [7], U-Net [8], and DeepLab [9], have been continuously applied to the detection of concrete cracks. For example, Wang [10] proposed an FCN-based crack detection model. In this model, firstly, data augmentation is used to generate images at different resolutions. Secondly, the training data are input into the proposed deep FCN for feature extraction and encoding. Finally, de-convolution in the top layers reconstructs the features and outputs the final segmentation results. However, the segmentation results obtained by FCNs are not precise enough, and FCNs do not fully consider the relationships between pixels when classifying each pixel. To solve these problems, Liu's [11] U-Net adopts a left-right symmetric network structure, in which the encoding and decoding layers are essentially mirrored. In addition, U-Net adds multiple down-sampling and up-sampling modules on the basis of the FCN: down-sampling enables the model to compress input image features, retain key information, and act as an encoder, while multiple up-sampling steps gradually restore the feature maps and reduce feature loss. More importantly, U-Net uses skip connections to fuse pixel-level features with semantic-level features of the image. Building on this framework, Jiang [12] presented an extended version of U-Net named MSK-UNet. Specifically, the U-shaped network structure is chosen as the framework to extract more hierarchical representations, and selective kernel (SK) units replace U-Net's standard convolution blocks to obtain receptive fields with distinct scales.
With the development of low-cost drone technology, it has become possible for drones to automatically detect cracks. The benefits of drone automatic crack detection are as follows. Firstly, compared to vehicles, drones are inexpensive, cost-effective, and suitable for large-scale deployment. Secondly, the group control technology of drones is currently very reliable and can deploy a large number of drones for detection at the same time.
However, the model size of current crack segmentation models is very large, usually larger than 50 MB. Embedded systems on low-cost drones typically do not have GPUs, so they can only rely on low-performing CPUs for neural network calculations. Moreover, the storage space of these embedded systems is also very small, usually less than 10 MB [13], so current models are completely unable to run on them. In order to deploy crack segmentation models on embedded systems, we propose a light-weighted crack segmentation model, the Light-Weighted Convolution-Based Segmentation Method with a Separable Multi-Directional Convolution Module (LCSNet). The main contributions of this study include the following:
(1)
To reduce the computational complexity of convolutional layers, we propose a light-weighted convolution named the LConv layer, which divides traditional convolution into a spatial feature extraction part and a channel feature extraction part. In the spatial feature extraction part, we evenly divide features into five groups based on the channel. Only one large filter is used to extract features in each group. A large number of 1 × 1 filters are used to extract channel features in the channel feature extraction part.
(2)
A Separable Multi-Directional Convolutional Module (SMDCM) is used to substitute parallel modules or attention modules to extract the contextual or detail feature information of the model. However, the computational complexity of the SMDCM is much smaller than that of parallel modules or attention modules.
(3)
Public crack datasets are very small; moreover, the ambient light and texture in these crack images provide little interference with the cracks. This study therefore creates a new dataset named RECrack, which contains a large number of irregular cracks with a variety of environmental and textural interferences.

2. Materials and Methods

2.1. The Structure of the LCSNet

In this paper, we design a Light-Weighted Convolution-Based Segmentation Method (LCSNet). As shown in Figure 1, our model is divided into an encoder stage and a decoder stage. In the encoder stage, stacked Light-Weighted Convolutions (LConv) are used to extract high-level features. Unlike traditional convolutions, which apply a large number of large filters to the features of every channel, LConv divides the features into five groups by channel and applies only a single large convolution filter to each group. Thus, LConv can remove a large number of parameters from the convolution layers. Also, to enhance the ability to extract contextual feature information without adding many parameters, we propose a Separable Multi-Directional Convolution Module (SMDCM) to substitute for the traditional parallel module or attention module. Specifically, the SMDCM enlarges the receptive field in different directions with a single convolution filter. In the decoder stage, up-sampling is used to enlarge the size of the features while decoding them. A detailed description of these modules follows.
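To make the overall data flow concrete, the following PyTorch sketch assembles an encoder-decoder model in the spirit of Figure 1. It is a minimal illustration, not the authors' exact configuration: the layer counts, the channel width (chosen as 120 here so that it divides evenly into five groups), and the pooling/up-sampling placement are assumptions, and the `LConv` and `SMDCM` blocks are the sketches given in Sections 2.2 and 2.3 below.

```python
import torch
import torch.nn as nn

class LCSNet(nn.Module):
    """Minimal encoder-decoder skeleton in the spirit of Figure 1.

    `LConv` and `SMDCM` refer to the sketches in Sections 2.2 and 2.3;
    depths and widths here are illustrative assumptions.
    """
    def __init__(self, in_ch=3, num_classes=2, width=120):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, width, kernel_size=3, padding=1)
        # Encoder: stacked LConv blocks with down-sampling, plus an SMDCM
        # to capture contextual features at low resolution.
        self.encoder = nn.Sequential(
            LConv(width), nn.MaxPool2d(2),
            LConv(width), nn.MaxPool2d(2),
            SMDCM(width),
        )
        # Decoder: up-sampling restores the input resolution while decoding.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            LConv(width),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(width, num_classes, kernel_size=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(self.stem(x)))
```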

2.2. The Light-Weighted Convolution

To reduce the computational complexity of the convolutional layers [14], we propose a light-weighted convolution named the LConv layer, which divides traditional convolution into a spatial feature extraction part and a channel feature extraction part. In the spatial feature extraction part, we decompose the features by channel into five groups, and a single convolution filter is shared within each group, as shown in Figure 2. Firstly, the input features are fed into the AverageSplit layer, where they are evenly divided into five groups (C1, C2, C3, C4, and C5) based on the channel. Then, each group of features is convolved with a single convolution kernel to extract spatial features. In order to extract the strip-like features of cracks, we design a 3 × 1 convolution kernel and a 1 × 3 convolution kernel for two of the groups, and a 7 × 1 convolution kernel and a 1 × 7 convolution kernel for another two groups; the remaining group uses an ordinary 3 × 3 convolution kernel. It is worth noting that each group of features is processed by only one convolution kernel, which greatly reduces the complexity of the convolution calculation. After the convolution calculation, the output features are fed into the Shuffle layer, which mixes them across groups. Finally, the mixed features are input into a convolutional layer containing a large number of 1 × 1 convolutional kernels, which extracts channel features and produces the final output.
The calculation of the whole process of the LConv is shown as follows:
$$\begin{aligned}
F_2 = (C_1, C_2, C_3, C_4, C_5) &= \mathrm{AverageSplit}(F_1, 5)\\
F_{3\_1} &= \mathrm{SingleConv}(C_1, 3 \times 3)\\
F_{3\_2} &= \mathrm{SingleConv}(C_2, 3 \times 1)\\
F_{3\_3} &= \mathrm{SingleConv}(C_3, 1 \times 3)\\
F_{3\_4} &= \mathrm{SingleConv}(C_4, 7 \times 1)\\
F_{3\_5} &= \mathrm{SingleConv}(C_5, 1 \times 7)\\
F_4 &= \mathrm{Shuffle}(F_{3\_1}, F_{3\_2}, F_{3\_3}, F_{3\_4}, F_{3\_5})\\
F_5 &= \mathrm{Conv}(F_4, 1 \times 1)
\end{aligned} \tag{1}$$
where SingleConv(x, k) denotes the spatial convolution of a feature group x with a single filter of size k, Shuffle(x) denotes the Shuffle operation, and Conv(x, 1 × 1) denotes the 1 × 1 channel convolution.
To illustrate the effectiveness of our LConv, we also compare the computational complexity of LConv with that of traditional convolution, with the number of convolution filters set to N.
The parameters in LConv can be calculated as follows:
$$(3 \times 3 + 3 \times 1 + 1 \times 3 + 7 \times 1 + 1 \times 7) + N \times 1 = N + 29 \tag{2}$$
The parameters in traditional convolution can be calculated as follows:
$$3 \times 3 \times N = 9N \tag{3}$$
In convolution layers, the number of convolution filters N is often set to 128 or 256; for such N, the parameter count in (2) is nearly an order of magnitude smaller than that in (3). Therefore, compared with traditional convolution, LConv can eliminate a large number of parameters.
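A quick numeric check of Equations (2) and (3) makes the comparison concrete:

```python
# Parameter counts of LConv (Eq. 2) vs. traditional convolution (Eq. 3).
for n in (128, 256):
    lconv = (3*3 + 3*1 + 1*3 + 7*1 + 1*7) + n * 1   # = n + 29
    trad = 3 * 3 * n                                 # = 9n
    print(n, lconv, trad, round(trad / lconv, 1))
# N = 128: 157 vs. 1152 (~7.3x); N = 256: 285 vs. 2304 (~8.1x)
```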
Our LConv decomposes the many large convolutional filters of a regular convolution into two parts: very few large convolutional filters for group-wise spatial convolution and multiple 1 × 1 convolutional filters for channel convolution. Because the computational cost of 1 × 1 filters is extremely low, the overall computational complexity of LConv is extremely low. In addition, the spatial features extracted by the large convolutional filters of ordinary convolutions are often repetitive, resulting in redundancy; LConv extracts spatial and channel features separately, reducing the extraction of redundant features. More importantly, we use convolutional filters of different shapes for spatial feature extraction and rearrange their output features, which effectively mixes the spatial features across groups. Therefore, LConv does not distort the distribution of high-level features, and it can reduce the number of parameters while maintaining the feature extraction performance.
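The following sketch shows one way to realize LConv in PyTorch, under two stated assumptions: the single kernel of each group is shared across all channels of that group (this is what makes the spatial part cost exactly 29 weights, matching Equation (2)), and the Shuffle layer is implemented as the deterministic ShuffleNet-style group interleaving rather than a literally random permutation. The channel count is assumed to be divisible by five.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def channel_shuffle(x, groups=5):
    # Interleave channels across groups (ShuffleNet-style); stands in for
    # the paper's Shuffle layer.
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class SharedKernelConv(nn.Module):
    """One 2-D kernel shared across every channel of the input group."""
    def __init__(self, kernel_size):
        super().__init__()
        kh, kw = kernel_size
        self.weight = nn.Parameter(torch.randn(1, 1, kh, kw) * 0.1)
        self.padding = (kh // 2, kw // 2)

    def forward(self, x):
        b, c, h, w = x.shape
        # Fold channels into the batch so the single kernel is applied per channel.
        y = F.conv2d(x.reshape(b * c, 1, h, w), self.weight, padding=self.padding)
        return y.reshape(b, c, h, w)

class LConv(nn.Module):
    """Sketch of LConv: five shared kernels (3x3, 3x1, 1x3, 7x1, 1x7) over
    five channel groups, then Shuffle and a 1x1 channel convolution."""
    def __init__(self, channels):
        super().__init__()
        assert channels % 5 == 0, "channels assumed divisible by 5 for AverageSplit"
        sizes = [(3, 3), (3, 1), (1, 3), (7, 1), (1, 7)]
        self.spatial = nn.ModuleList(SharedKernelConv(s) for s in sizes)
        # N filters of size 1x1: the N x 1 term in Equation (2).
        self.channel = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        groups = torch.chunk(x, 5, dim=1)                    # AverageSplit
        feats = [conv(g) for conv, g in zip(self.spatial, groups)]
        return self.channel(channel_shuffle(torch.cat(feats, dim=1)))

# Spatial-part weight count: 9 + 3 + 3 + 7 + 7 = 29, as in Equation (2).
```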

2.3. The Separable Multi-Directional Convolution Module

Because cracks usually have irregular, elongated shapes, traditional convolutions find it difficult to extract contextual or detail feature information from these elongated features. Although parallel modules [15] or attention blocks [16] can solve this problem, they have a high computational complexity. Inspired by the design of deformable convolution [17], we propose a light-weighted way to increase the receptive field of traditional convolution: the Separable Multi-Directional Convolutional Module (SMDCM). The SMDCM is divided into two parts: the first is the Multi-Directional Convolution and the second is a 1 × 1 channel convolution.
(1)
Multi-Directional Convolution
For traditional convolution, assume $w_{m,n}$ are the kernel weights, $x$ is the input feature, and $2d + 1$ is the side length of the receptive field. The calculation formula is:
$$y_{j,k} = \sum_{m=-d}^{d} \sum_{n=-d}^{d} w_{m,n}\, x_{j+m,\,k+n} \tag{4}$$
The Multi-Directional Convolution we propose is slightly different from traditional convolution. In addition to the central region covered by traditional convolution, convolution calculations are performed on the upper-left, upper-right, lower-left, and lower-right offset regions of each feature.
The formula for calculating the convolution in the upper-left offset region is:
$$y_{j,k} = \sum_{m=-2d}^{0} \sum_{n=-2d}^{0} w_{m,n}\, x_{j+m,\,k+n} \tag{5}$$
The formula for calculating the convolution in the upper-right offset region is:
$$y_{j,k} = \sum_{m=0}^{2d} \sum_{n=-2d}^{0} w_{m,n}\, x_{j+m,\,k+n} \tag{6}$$
The formula for calculating the convolution in the lower-left offset region is:
$$y_{j,k} = \sum_{m=-2d}^{0} \sum_{n=0}^{2d} w_{m,n}\, x_{j+m,\,k+n} \tag{7}$$
The formula for calculating the convolution in the lower-right offset region is:
$$y_{j,k} = \sum_{m=0}^{2d} \sum_{n=0}^{2d} w_{m,n}\, x_{j+m,\,k+n} \tag{8}$$
The convolution calculation in the central direction is the traditional convolution. Compared to ordinary convolution, we perform convolutions in multiple directions around each pixel, expanding the receptive field and extracting richer contextual information from multiple directions. For each pixel on the feature map, the same convolution kernel is used to extract the regional features in the upper-left, upper-right, lower-left, lower-right, and central directions. Therefore, compared to traditional convolution, our Multi-Directional Convolution can increase the receptive field without increasing the number of parameters.
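Because each offset-region convolution in Equations (5)-(8) reuses the same kernel on a window shifted by d pixels, its output equals the centred convolution of Equation (4) sampled at a shifted location. The sketch below exploits this: it computes the centred response once and derives the four directional responses by spatial shifts. How the five responses are fused is not spelled out above, so summation is assumed here; the kernel is a single 2-D filter shared across channels, consistent with the single-filter-per-group design.

```python
import torch
import torch.nn.functional as F

def multi_directional_conv(x, weight, d):
    """Multi-Directional Convolution sketch.

    x: (B, C, H, W) feature map; weight: (1, 1, 2d+1, 2d+1) shared kernel.
    Returns the sum of the centre response and the four offset-region
    responses (fusion by summation is an assumption).
    """
    b, c, h, w = x.shape
    # Centre direction: ordinary convolution with the shared kernel, Eq. (4).
    centre = F.conv2d(x.reshape(b * c, 1, h, w), weight, padding=d)
    centre = centre.reshape(b, c, h, w)

    def shift(t, dy, dx):
        # out[j, k] = t[j + dy, k + dx], zero-padded at the borders.
        return F.pad(t, (d, d, d, d))[..., d + dy : d + dy + h, d + dx : d + dx + w]

    upper_left  = shift(centre, -d, -d)   # Equation (5)
    upper_right = shift(centre,  d, -d)   # Equation (6)
    lower_left  = shift(centre, -d,  d)   # Equation (7)
    lower_right = shift(centre,  d,  d)   # Equation (8)
    return centre + upper_left + upper_right + lower_left + lower_right

# Example: a 5x5 kernel (d = 2, the size selected in Section 4.3).
y = multi_directional_conv(torch.randn(1, 10, 64, 64), torch.randn(1, 1, 5, 5), d=2)
```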
(2)
Separable Multi-Directional Convolutional Module (SMDCM)
The Separable Multi-Directional Convolutional Module (SMDCM) is shown in Figure 3.
Firstly, the input features are fed into the AverageSplit layer, where they are evenly divided into five groups (C1, C2, C3, C4, and C5) based on the channel. Each group of features is then input into the Multi-Directional Convolution to learn contextual or detail feature information. Note that, to reduce the number of parameters, only a single convolution filter is used for each group of features in the Multi-Directional Convolution for spatial feature extraction. The results generated by the Multi-Directional Convolution are shuffled and input into the 1 × 1 channel convolution, where a large number of 1 × 1 convolution filters perform the channel feature transformation. The calculation of the whole process of the SMDCM is as follows:
$$\begin{aligned}
A_2 = (C_1, C_2, C_3, C_4, C_5) &= \mathrm{AverageSplit}(A_1)\\
A_3, A_4, A_5, A_6, A_7 &= \mathrm{MDC}(A_2)\\
A_8 &= \mathrm{Shuffle}(A_3, A_4, A_5, A_6, A_7)\\
A_9 &= \mathrm{Conv}(A_8, 1 \times 1)
\end{aligned} \tag{9}$$
where MDC(x) represents the Multi-Directional Convolution, and shuffle(x) represents the Shuffle operation. Conv(x, 1 × 1) represents the 1 × 1 channel convolution.
The parameters in SMDCM can be calculated as follows:
$$(3 \times 3 \times 5) + N \times 1 = 45 + N \tag{10}$$
The parameters in traditional convolution can be calculated as follows:
$$3 \times 3 \times N = 9N \tag{11}$$
In convolution layers, the number of convolution filters N is often set to 128 or 256; for such N, the parameter count in (10) is nearly an order of magnitude smaller than that in (11). Therefore, compared with traditional convolution, the SMDCM can eliminate a large number of parameters.
In summary, our SMDCM also decomposes multiple large convolutional filters in ordinary convolutions into two parts, using very few large convolutional filters for spatial feature extraction and multiple 1 × 1 convolutional filters for channel feature extraction. This method effectively reduces the number of convolutional parameters. More importantly, we use Multi-Directional Convolution to extend the large convolutional filter for peripheral feature extraction, effectively increasing the receptive field of the model. Therefore, our model enhances its ability to extract the contextual features of cracks, which is highly helpful for extracting irregular crack features.
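Combining the pieces, a sketch of the full SMDCM (reusing `channel_shuffle` and `multi_directional_conv` from the sketches above) might look as follows. With d = 1, the five shared 3 × 3 kernels contribute the 3 × 3 × 5 = 45 spatial weights of Equation (10); Section 4.3 ultimately selects 5 × 5 kernels (d = 2), which is exposed here as a parameter.

```python
import torch
import torch.nn as nn

class SMDCM(nn.Module):
    """Sketch of the SMDCM: AverageSplit into five channel groups, one
    shared Multi-Directional Convolution kernel per group, Shuffle, then
    a 1x1 channel convolution."""
    def __init__(self, channels, d=1):
        super().__init__()
        assert channels % 5 == 0, "channels assumed divisible by 5"
        k = 2 * d + 1
        # Five shared (2d+1)x(2d+1) kernels; for d = 1 this is the
        # 3 x 3 x 5 = 45 spatial weights of Equation (10).
        self.kernels = nn.ParameterList(
            nn.Parameter(torch.randn(1, 1, k, k) * 0.1) for _ in range(5))
        self.d = d
        self.channel = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        groups = torch.chunk(x, 5, dim=1)                      # AverageSplit
        feats = [multi_directional_conv(g, w, self.d)          # MDC per group
                 for g, w in zip(groups, self.kernels)]
        return self.channel(channel_shuffle(torch.cat(feats, dim=1)))
```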

2.4. Our Collected Dataset

Because the publicly available crack datasets for semantic segmentation are limited in number and contain little interference, they differ significantly from real environments. Therefore, we created our own dataset, named the RECrack dataset. We used a vehicle equipped with high-definition cameras to capture road cracks; to facilitate data collection, we installed six cameras on the vehicle to capture cracks from different directions. We collected data in multiple scenarios, such as schools, parking lots, sidewalks, highways, and internal factory roads, and under different conditions, such as sunny, rainy, and shaded days. These data support our research on crack segmentation under different types of interference and are closer to the real environment, which helps our algorithm to be applied directly on crack detection equipment. The entire dataset consists of approximately 10,000 crack images; some examples are shown in Figure 4.

3. Datasets and Experimental Setup

3.1. Datasets

Table 1 shows the splitting of the four datasets. Detailed descriptions are as follows:
The Cracktree200 dataset [18]: This dataset was collected by experimental drones for pavement crack segmentation and contains cracks under several challenging conditions, such as shadow, occlusion, low contrast, and noise.
The CFD dataset [19]: This dataset was collected by a camera installed on an airplane for road crack segmentation. It contains many low-contrast crack images.
The Crack500 dataset [20]: This dataset includes 3020 images collected by Temple University, mainly captured on campus by students. Its images come in two sizes: 1440 × 2560 and 2560 × 1440.
The RECrack dataset: This dataset was collected by us in multiple scenarios such as schools, parking lots, sidewalks, highways, and internal factory roads. Different from other crack segmentation datasets, this dataset covers various weather conditions and illumination conditions, etc. Thus, its interference is more complex than that of other crack segmentation datasets.

3.2. Experimental Setup

In our experiments, all images were normalized and augmented before being input into our model. We set the number of convolution filters for the 1 × 1 convolution of the LConv to 128 and that of the SMDCM to 256. Stochastic Gradient Descent (SGD) [21] was used as the training policy, and E-Focal Loss was used as the loss function. In addition, we used accuracy, recall, and the F1 measure as evaluation criteria.
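For reference, the snippet below shows an illustrative version of this setup. The paper specifies SGD and E-Focal Loss but not the remaining hyper-parameters, so the learning rate, momentum, weight decay, normalization statistics, and augmentation choices here are assumptions (and the matching mask transform is omitted for brevity).

```python
import torch
from torch import optim
from torchvision import transforms

# Image normalization and augmentation before input to the model;
# the specific statistics and augmentations are assumptions.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = LCSNet(in_ch=3, num_classes=2, width=120)  # skeleton from Section 2.1
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                      weight_decay=1e-4)           # SGD training policy [21]
# criterion = focal_loss  # E-Focal Loss [45]; see the sketch in Section 4.6
```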

4. Results

4.1. Comparison with the State-of-the-Art Methods

In order to evaluate the performance of our proposed LCSNet, we designed several experiments. Segmentation accuracy, recall, and the F1 measure were used as the main indicators, and several mainstream crack detection methods proposed by other researchers served as baselines. These methods are described as follows:
ConvNet: a deep-convolution-based segmentation neural network; this is the basic convolution-based segmentation model.
U-Net proposed by Di: a U-Net-based network proposed by Di Benedetto, not pre-trained on ImageNet, for road crack segmentation. For a fair comparison, the ResNet backbone of this model is not pre-trained on ImageNet.
DWTA-U-Net: a U-Net-based network with discrete wavelet transformed image features for concrete crack segmentation.
CrackW-Net: a ResU-Net-based CNN for pavement crack segmentation proposed by Han.
Split-Attention Network: a channel-wise attention-based network.
DMA-Net: DeepLab with Multi-Scale attention for pavement crack segmentation proposed by Sun.
ACAU-Net: an atrous convolution and attention U-Net model for pavement crack segmentation proposed by Feng.
Cascaded Attention DenseU-Net: an attention-based network with global attention and core attention for road crack detection.
ECA-Net: a light-weighted channel attention-based convolution neural network.
FU-Net: a generative-adversarial-networks-based U-Net for road crack segmentation proposed by Gao.
Two-stage-CNN: a two-stage CNN for road crack detection and segmentation proposed by Nhung.
PSNet: a Parallel-Convolution-Based U-Net for Crack Detection with a Self-Gated Attention Block proposed by Zhang.
PHCNet: a Pyramid Hierarchical-Convolution-Based U-Net for Crack Detection with a Mixed Global Attention Module and Edge Feature Extractor proposed by Zhang.
As shown in Table 2, our LCSNet achieved the best accuracy, recall, and F1 measure compared with these baselines on the public datasets. Also, our LCSNet has the smallest model size. This demonstrates the effectiveness of our proposed model.
Firstly, LConv decomposes the convolution filters into a spatial feature extraction part and a channel feature extraction part. Single large spatial convolution filters are used for each group of features. In the channel feature extraction part, a large number of 1 × 1 convolution filters are used for channel feature extraction. Thus, this structure not only retains the function of multi-channel feature extraction, but also largely reduces the parameters in the convolution layers. In addition, the SMDCM can improve the model’s ability to extract contextual or detail feature information with far fewer parameters than traditional parallel modules. Therefore, our LCSNet could obtain a higher accuracy with a very small model size.
In addition, from Table 2a, it is worth noting that our model showed its largest improvement over the baselines on the RECrack dataset. The reason is that the RECrack dataset contains a large amount of shadow interference, which hampers crack detection, and covers a wide variety of crack styles, making the whole dataset more challenging.
We also compared attention-based models with extended U-Net models. The attention-based models, such as Two-stage-CNN, FU-Net, ECA-Net, ACAU-Net, DMA-Net, and the Split-Attention Network, achieved better performance because adding an attention mechanism improves a model's representation ability, effectively reduces the interference of marble stripes with cracks, and reduces false detections, thereby improving overall segmentation accuracy. Compared with the original U-Net, extended U-Net models such as CrackW-Net and DWTA-U-Net achieved better accuracy because they use backbones with stronger feature extraction capabilities, which extract the irregular features of cracks more effectively. Compared with fully convolutional networks such as ConvNet, U-Net models achieve higher performance since U-Net adds down-sampling and up-sampling modules on the basis of the FCN; down-sampling enables the model to compress input image features, retain key information, and act as an encoder to better encode features.
Specific segmentation results of LCSNet, taken from the Crack500 dataset, are shown in Figure 5; the regions marked by the red dotted boxes highlight the differences between the results. We also compare some detection results of our LCSNet with those of state-of-the-art methods. It can be seen that LCSNet produces better segmentation results.

4.2. Effects of Using Different Numbers of Filters in LConv

To evaluate the effect of using different numbers of filters in LConv, we conducted an experiment.
As shown in Figure 6, adjusting the number of filters in the LConv of our proposed LCSNet largely affected the model's segmentation accuracy: the accuracy improved quickly at first and then declined slowly. The reasons are as follows:
Firstly, the number of filters determines the complexity of the convolutional layers and the feature representation capability of the model. More filters mean that the network can learn more diverse features, leading to better performance.
Secondly, these filters can significantly increase the nonlinear transformation ability while keeping the scale of the feature map unchanged. These nonlinear transformations can help the model to capture the complex features of images, thereby improving the model’s ability to fit different cracks.
However, having too many filters would increase the number of parameters in the model, making it more complex and potentially increasing the risk of overfitting. Moreover, these filters might lead to redundancy in feature extraction, meaning that the features extracted by multiple filters are highly correlated, which would reduce the model’s generalization ability.
Thus, the number of filters in LConv is set to 256 in our proposed LCSNet.

4.3. Effects of Using Different Sizes of Filters in Multi-Directional Convolution in SMDCM

To evaluate the effect of using different sizes of filters in the Multi-Directional Convolution in the SMDCM, we conducted an experiment.
As shown in Figure 7, adjusting the sizes of the filters in the Multi-Directional Convolution in the SMDCM of our proposed LCSNet largely affected the model's segmentation accuracy: the accuracy improved quickly at first and then declined slowly. The reasons are as follows:
The size of the convolution filters is an important parameter that affects the model's receptive field, feature extraction ability, and complexity. If the convolution filter is too small, the model captures only local detail and misses the overall semantic features. In addition, smaller convolution filters typically require more convolution layers to increase their receptive fields and extract higher-level features, which increases the complexity and computational cost of the model.
Larger convolution filters might excessively smooth images or features, resulting in less accurate feature extraction: they may blur detailed information and reduce the model's sensitivity to subtle changes. Additionally, larger convolution filters might extract redundant, highly similar features, increasing model complexity.
Thus, the size of the filters in Multi-Directional Convolution in SMDCM is set to 5 × 5 in our proposed LCSNet.

4.4. Comparison of Different Light-Weighted Segmentation Models

To evaluate our light-weighted design, we also compared some other light-weighted segmentation models with our proposed LCSNet.
As shown in Table 3, our LCSNet obtained the best accuracy. Firstly, LCSNet uses strip convolutions (1 × 3, 3 × 1, 1 × 7, and 7 × 1 kernels) in LConv; these strip convolutions not only reduce the computational complexity of traditional convolutions but also fit elongated cracks better, whereas the other light-weighted models usually use square convolution filters and therefore cannot adapt to elongated cracks properly. Secondly, the SMDCM substitutes for parallel modules or attention modules to learn detail or contextual feature information, which improves the model's capability without adding many parameters.

4.5. Effects of Using Different Sizes of the LCSNet

To evaluate the effect of using different sizes of the LCSNet, we conducted an experiment.
From Table 4, we can see that, as the depth of the model increases, our accuracy also improves slightly, but the computational complexity and storage size of the model also increase accordingly. The reasons are as follows:
A deeper CNN model can provide stronger feature extraction capabilities by increasing the number of layers and parameters, capturing more complex and abstract features, thereby improving the performance and accuracy of the model. In addition, by increasing the number of layers, a deeper CNN model can reduce the loss of information during transmission, and deep layers can provide more nonlinear transformations, enabling the model to better process crack images. Finally, deep CNN models can gradually expand the receptive field by stacking multiple convolutional layers, providing a better spatial perception ability and capturing a larger range of contextual information, which helps to improve the performance of the model.
However, the matrix-computing capability of embedded chips on low-cost drones is currently very weak, and the storage space for models on these chips is also limited. Therefore, we need to strike a balance between performance and cost. Thus, we use only 36 LConvs in LCSNet, giving a model size of 2 MB.

4.6. Effects of Using Different Loss Functions

To evaluate the effect of using different loss functions, we conducted an experiment.
As shown in Table 5, the E-Focal Loss obtained the best accuracy. This is because E-Focal Loss introduces a category-related modulation factor into Focal Loss. This modulation factor has two dynamic components (a focusing factor and a weighting factor) that independently handle the positive-negative imbalance of each category. The focusing factor determines how strongly hard positive samples are learned based on the degree of imbalance of their corresponding categories, while the weighting factor increases the influence of rare categories, ensuring that the loss contribution of rare samples is not overwhelmed by frequent samples. Comparing Focal Loss with Cross Entropy Loss and Mean Square Error Loss, Focal Loss obtained the best result of the three, because crack images exhibit class imbalance, which Focal Loss compensates for with an adjustable parameter.
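As a concrete reference for this discussion, the sketch below implements plain binary Focal Loss [44] for pixel-wise crack segmentation; E-Focal Loss [45] extends it by making the focusing factor and the weighting factor category-dependent, an extension omitted here. The gamma and alpha values are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.75):
    """Binary Focal Loss for crack (1) vs. background (0) pixels.

    logits: (B, H, W) raw scores; targets: (B, H, W) in {0, 1}.
    """
    targets = targets.float()
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class weighting
    # (1 - p_t)^gamma down-weights easy pixels, focusing training on the
    # hard, rare crack pixels.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```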

5. Conclusions

With the development of low-cost drone technology, it has become possible for drones to detect cracks automatically. However, the model size of current crack segmentation models is very large, usually over 50 MB, while embedded systems on low-cost drones typically have no GPUs; thus, current models cannot be used on these embedded systems.
To solve the above problem, we propose a Light-Weighted Convolution-Based Segmentation Method with a Separable Multi-Directional Convolution Module (LCSNet). In our proposed method, light-weighted convolution is used to substitute all traditional convolutions. In LConv, features are divided into five groups by channel, and LConv uses only a single large convolution filter for each group of features, so LConv can remove a large number of parameters from the convolution layers. In addition, a light-weighted structure named the Separable Multi-Directional Convolution Module (SMDCM) is used to substitute for a traditional parallel structure or attention module to learn contextual or detail features. Thus, the model's ability to extract contextual feature information is retained while its computational complexity is largely reduced. Through these two improvements, the proposed model attains both a small size and a low computational complexity.
In addition, current public crack datasets do not contain complex cracks from real environments; thus, we created a new dataset named RECrack, which contains a large number of irregular cracks with a variety of environmental and textural interferences.
The experimental results showed that our proposed LCSNet could achieve accuracies of 94.2%, 83.6%, 99.2%, and 83.3% on the Cracktree200, CRACK500, CFD, and RECrack datasets, respectively, which are higher than those of traditional models. Additionally, the recall and F1 measure of our model were the best, while the model size of our LCSNet is only 2 MB.
Also, we compared our LCSNet with some state-of-the-art light-weighted segmentation models. The experimental results showed that our LCSNet obtained the highest accuracy, which demonstrates the effectiveness of our methods.
Drones can quickly cover large areas and complete crack detection tasks in a short time, improving work efficiency, and their high-resolution cameras can capture the details of cracks and provide accurate measurement data, making crack detection results more accurate and reliable. However, the embedded CPUs on inexpensive drones typically have weak matrix-computing power and are not optimized for special computing operations. To address this, our model uses a light-weighted computing structure with low storage requirements and low computational complexity. In addition, our model is composed purely of convolutions and has no specially designed complex operations. Therefore, compared with other lightweight models, it is easier to deploy on various embedded devices.
In this article, we provide a feasible approach for deploying crack segmentation algorithms on drones. Although this study achieved good results on four crack datasets, it focused mainly on concrete crack detection. In real environments, cracks also occur in other materials, such as steel and wood; these materials are also basic building materials, and the textures of their cracks differ from those of concrete cracks, so our proposed model might show limited performance on them. Therefore, in the future, we could explore transfer learning based on transformer models or integrate additional sensor data to enhance the detection capabilities of the model.
From another perspective, building damage may not present obvious crack characteristics; it may instead manifest as misalignment or tilting. Therefore, using only our existing crack segmentation model for building damage detection has certain limitations. In addition, not all cracks are dangerous or consequential to structural health, and the severity of cracks requires more accurate assessment. In response to these two issues, in the future we will focus on multimodal detection algorithms based on large models, integrating radar, image, and SHM (structural health monitoring) sensors, and on crack severity assessment for different buildings to evaluate the impact of cracks on building structures.

Author Contributions

Conceptualization, X.Z. and H.H.; methodology, X.Z.; software, X.Z.; validation, H.H.; formal analysis, H.H.; investigation, X.Z. and H.H.; resources, X.Z. and H.H.; data curation, X.Z. and H.H.; writing—original draft preparation, X.Z.; writing—review and editing, H.H.; visualization, X.Z.; supervision, X.Z.; project administration, X.Z. and H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This project was partially supported by the National Natural Science Foundation of China (No. 62071499).

Data Availability Statement

Part of the dataset used in this article is a public dataset, which can be found on the Internet, and the dataset we created can be requested from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Sahoo, P.K.; Soltani, S.; Wong, A.K.C. A survey of thresholding techniques. Comput. Vis. Graph. Image Process. 1988, 41, 233–260.
2. Tanaka, N.; Uematsu, K. A crack detection method in road surface images using morphology. In Proceedings of the 1998 IAPR Workshop on Machine Vision Applications (MVA), Chiba, Japan, 17–19 November 1998; Available online: http://b2.cvl.iis.u-tokyo.ac.jp/mva/proceedings/CommemorativeDVD/1998/papers/1998154.pdf (accessed on 9 November 2019).
3. Cheng, H.; Chen, J.; Glazier, C.; Hu, Y.G. Novel approach to pavement cracking detection based on fuzzy set theory. J. Comput. Civ. Eng. 1999, 13, 270–280.
4. Zou, Y.; Wang, G.; Zou, C. Wavelet packet denoising for pavement surface cracks detection. In Proceedings of the 2008 International Conference on Computational Intelligence and Security, Suzhou, China, 13–17 December 2008; IEEE Computer Society: Washington, DC, USA, 2008; pp. 481–484.
5. Cha, Y.J.; Choi, W.; Buyukozturk, O. Deep Learning-Based Crack Damage Detection Using Convolutional Neural Networks. Comput. Aided Civ. Infrastruct. Eng. 2017, 32, 361–378.
6. Song, J. Classification of Pavement Crack Images Based on CNN. China Comput. Commun. 2018, 21, 4325–4516.
7. Wan, H.; Gao, L.; Su, M.; Sun, Q.; Huang, L. Attention-Based Convolutional Neural Network for Pavement Crack Detection. Adv. Mater. Sci. Eng. 2021, 2021, 5520515.
8. Fu, H.; Meng, D.; Li, W.; Wang, Y. Bridge Crack Semantic Segmentation Based on Improved Deeplabv3+. J. Mar. Sci. Eng. 2021, 9, 671.
9. Zhang, L.; Shen, J.; Zhu, B. A research on an improved Unet-based concrete crack detection algorithm. Struct. Health Monit. 2020, 20, 1864–1879.
10. Wang, S.; Wu, X.; Zhang, Y.; Chen, Q. Image Crack Detection with Fully Convolutional Network Based on Deep Learning. J. Comput. Aided Des. Comput. Graph. 2018, 30, 859–867.
11. Liu, F.; Wang, L. UNet-based model for crack detection integrating visual explanations. Constr. Build. Mater. 2022, 322, 126265.
12. Jiang, X.; Jiang, J.; Yu, J.; Wang, J.; Wang, B. MSK-UNET: A Modified U-Net Architecture Based on Selective Kernel with Multi-Scale Input for Pavement Crack Detection. J. Circuits Syst. Comput. 2022, 32, 2350006.
13. Dong, L.; Yang, Z.; Cai, X.; Zhao, Y.; Ma, Q.; Miao, X. WAVE: Edge-device cooperated real-time object detection for open-air applications. IEEE Trans. Mob. Comput. 2022, 22, 4347–4357.
14. Maji, P.; Mullins, R. On the reduction of computational complexity of deep convolutional neural networks. Entropy 2018, 20, 305.
15. Yu, S.; Huan, K.; Liu, X.; Wang, L.; Cao, X. Quantitative model of near infrared spectroscopy based on pretreatment combined with parallel convolution neural network. Infrared Phys. Technol. 2023, 132, 104730.
16. Yang, L.; Bai, S.; Liu, Y.; Yu, H. Multi-scale triple-attention network for pixelwise crack segmentation. Autom. Constr. 2023, 150, 104853.
17. Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Lu, L.; Li, H.; et al. InternImage: Exploring large-scale vision foundation models with deformable convolutions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 14408–14419.
18. Sizyakin, R.; Voronin, V.; Gapon, N.; Pižurica, A. A deep learning approach to crack detection on road surfaces. In Proceedings of the Conference on Artificial Intelligence and Machine Learning in Defense Applications, Online, 21–25 September 2020.
19. Pan, X.; Kartal, E.; Giraldo, L.S.; Schwartz, O. Brain-Inspired Weighted Normalization for CNN Image Classification; Cold Spring Harbor Laboratory: Laurel Hollow, NY, USA, 2021.
20. Shuo, M.I.; Fengshou, T.; Ruibin, S.; Min, G.E.; Rucheng, Z. Performance of Swish Activation Function on Small- and Medium-Scale Data Sets. Technol. Innov. Appl. 2018.
21. Ramanjaneyulu, K.; Venkat Subbarao, N.; Sravani, N. Image Retrieval Based on CNN Architectures. Int. J. Innov. Eng. Manag. Res. 2018, 7, 115–122.
22. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
23. Jenkins, M.D.; Carr, T.A.; Iglesias, M.I.; Buggy, T.; Morison, G. A Deep Convolutional Neural Network for Semantic Pixel-Wise Segmentation of Road and Pavement Surface Cracks. In Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, 3–7 September 2018; pp. 2120–2124.
24. Nguyen, N.T.H.; Le, T.H.; Perry, S.; Nguyen, T.T. Pavement crack detection using convolutional neural network. In Proceedings of the International Symposium on Information and Communication Technology, Da Nang, Vietnam, 6–7 December 2018.
25. Di Benedetto, A.; Fiani, M.; Gujski, L.M. U-Net-Based CNN Architecture for Road Crack Segmentation. Infrastructures 2023, 8, 90.
26. Yang, G.; Geng, P.; Ma, H.; Liu, J.; Luo, J. DWTA-UNet: Concrete crack segmentation based on discrete wavelet transform and UNet. In Proceedings of 2021 Chinese Intelligent Automation Conference; Deng, Z., Ed.; Lecture Notes in Electrical Engineering; Springer: Singapore, 2022; Volume 801.
27. Han, C.; Ma, T.; Huyan, J.; Huang, X.; Zhang, Y. CrackW-Net: A novel pavement crack image segmentation convolutional neural network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22135–22144.
28. Zhang, C.; Jiang, W.; Zhao, Q. Semantic segmentation of aerial imagery via split-attention networks with disentangled nonlocal and edge supervision. Remote Sens. 2021, 13, 1176.
29. Sun, X.; Xie, Y.; Jiang, L.; Cao, Y.; Liu, B. DMA-Net: DeepLab with multi-scale attention for pavement crack segmentation. IEEE Trans. Intell. Transp. Syst. 2022, 23, 18392–18403.
30. Feng, J.; Li, J.; Shi, Y.; Zhao, Y.; Zhang, C. ACAU-Net: Atrous convolution and attention U-Net model for pavement crack segmentation. In Proceedings of the 2022 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), Shijiazhuang, China, 22–24 July 2022; pp. 561–565.
31. Li, J.; Liu, Y.; Zhang, Y.; Zhang, Y. Cascaded attention DenseUNet (CADUNet) for road extraction from very-high-resolution images. Int. J. Geo-Inf. 2021, 10, 329.
32. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020.
33. Gao, Z.; Peng, B.; Li, T.; Gou, C. Generative adversarial networks for road crack image segmentation. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8.
34. Nguyen, N.H.T.; Perry, S.; Bone, D.; Le, H.T.; Nguyen, T.T. Two-stage convolutional neural network for road crack detection and segmentation. Expert Syst. Appl. 2021, 186, 115718.
35. Zhang, X.; Huang, H. PSNet: Parallel-Convolution-Based U-Net for Crack Detection with Self-Gated Attention Block. Appl. Sci. 2023, 13, 9875.
36. Zhang, X.; Huang, H. PHCNet: Pyramid Hierarchical-Convolution-Based U-Net for Crack Detection with Mixed Global Attention Module and Edge Feature Extractor. Appl. Sci. 2023, 13, 10263.
37. Emara, T.; Munim, H.E.A.E.; Abbas, H.M. LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation. In Proceedings of the 2019 Digital Image Computing: Techniques and Applications (DICTA), Perth, Australia, 2–4 December 2019.
38. Wang, B.; Li, H.S. Lane detection algorithm based on MobileNet + UNet lightweight network. In Proceedings of the 2021 3rd International Symposium on Robotics & Intelligent Manufacturing Technology (ISRIMT), Changzhou, China, 25–26 September 2021; pp. 352–356.
39. Tsai, T.H.; Tseng, Y.W. BiSeNet V3: Bilateral segmentation network with coordinate attention for real-time semantic segmentation. Neurocomputing 2023, 532, 33–42.
40. Ruan, J.; Xie, M.; Gao, J.; Liu, T.; Fu, Y. EGE-UNet: An efficient group enhanced UNet for skin lesion segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Vancouver, BC, Canada, 8–12 October 2023; Springer Nature: Cham, Switzerland, 2023; pp. 481–490.
41. Jiang, W.; Xie, Z.; Li, Y.; Liu, C.; Lu, H. LRNNet: A light-weighted network with efficient reduced non-local operation for real-time semantic segmentation. In Proceedings of the 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), London, UK, 6–10 July 2020; pp. 1–6.
42. Zhang, Y.; Du, P. Parallel minimum mean square error equalization for reduced-zero-padding orthogonal time frequency space with the aid of unitary precoding. Trans. Emerg. Telecommun. Technol. 2023, 34, e4688.
43. Farahnak-Ghazani, F.; Baghshah, M.S. Multi-label classification with feature-aware implicit encoding and generalized cross-entropy loss. In Proceedings of the 2016 24th Iranian Conference on Electrical Engineering (ICEE), Shiraz, Iran, 10–12 May 2016.
44. Yang, L.; Zhang, F.; Wang, P.S.-P.; Li, X.; Luo, H. Multi-Content Merging Network Based on Focal Loss and Convolutional Block Attention in Hyperspectral Image Classification. Int. J. Pattern Recognit. Artif. Intell. 2022, 36, 2250018.
45. Li, B.; Yao, Y.; Tan, J.; Zhang, G.; Yu, F.; Lu, J.; Luo, Y. Equalized focal loss for dense long-tailed object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6990–6999.
Figure 1. The structure of the Light-Weighted Convolution-Based Segmentation Method (LCSNet).
Figure 2. The structure of the Light-Weighted Convolution (LConv).
Figure 3. The Separable Multi-Directional Convolutional Module (SMDCM).
Figure 4. Some examples of images from our RECrack dataset. (a) Crack on a sunny day; (b) crack on a shaded day; and (c) crack on a rainy day.
Figure 5. An example of the comparison of our proposed LCSNet with the state-of-the-art results; the example crack image is taken from the Crack500 dataset.
Figure 6. Accuracy comparison using different numbers of filters in LConv.
Figure 7. Accuracy comparison using different sizes of filters in Multi-Directional Convolution in SMDCM.
Table 1. Splitting of the datasets.

Dataset | Image Size | Train | Test
The Cracktree200 dataset | 800 × 600 | 165 | 41
The CFD dataset | 480 × 320 | 95 | 23
The Crack500 dataset | 1440 × 2560 or 2560 × 1440 | 1896 | 1124
The RECrack dataset | 1920 × 1080 | 8000 | 2000
Table 2. (a) Accuracy comparison with the state-of-the-art methods.

Methods | Cracktree200 | Crack500 | CFD | RECrack | Model Size
ConvNet [22] | 0.471 | 0.591 | 0.579 | 0.423 | -
U-Net by Jenkins [23] | 0.75 | 0.681 | 0.851 | 0.516 | -
U-Net by Nguyen [24] | 0.763 | 0.695 | 0.856 | 0.569 | -
U-Net proposed by Di [25] | 0.791 | 0.732 | 0.887 | 0.635 | -
DWTA-U-Net [26] | 0.90 | 0.77 | 0.973 | 0.675 | -
CrackW-Net [27] | 0.855 | 0.789 | 0.959 | 0.671 | -
Split-Attention Network [28] | 0.851 | 0.73 | 0.963 | 0.688 | -
DMA-Net [29] | 0.793 | 0.746 | 0.965 | 0.692 | -
ACAU-Net [30] | 0.861 | 0.792 | 0.967 | 0.713 | -
Cascaded Attention DenseU-Net [31] | 0.863 | 0.74 | 0.97 | 0.742 | 137 MB
ECA-Net [32] | 0.885 | 0.753 | 0.971 | 0.773 | 87 MB
FU-Net [33] | 0.89 | 0.795 | 0.983 | 0.765 | 90 MB
Two-stage-CNN [34] | 0.892 | 0.79 | 0.981 | 0.782 | 230 MB
PSNet [35] | 0.926 | 0.812 | 0.985 | 0.803 | 185 MB
PHCNet [36] | 0.929 | 0.823 | 0.989 | 0.812 | 167 MB
LCSNet | 0.942 | 0.836 | 0.992 | 0.833 | 2 MB

(b) Recall comparison with the state-of-the-art methods.

Methods | Cracktree200 | Crack500 | CFD | RECrack
Split-Attention Network [28] | 0.857 | 0.725 | 0.981 | 0.663
DMA-Net [29] | 0.823 | 0.775 | 0.978 | 0.702
ACAU-Net [30] | 0.854 | 0.776 | 0.959 | 0.707
Cascaded Attention DenseU-Net [31] | 0.853 | 0.732 | 0.956 | 0.696
ECA-Net [32] | 0.891 | 0.767 | 0.982 | 0.781
FU-Net [33] | 0.864 | 0.761 | 0.967 | 0.753
Two-stage-CNN [34] | 0.851 | 0.773 | 0.972 | 0.777
PSNet [35] | 0.932 | 0.829 | 0.986 | 0.809
PHCNet [36] | 0.914 | 0.817 | 0.972 | 0.805
LCSNet | 0.932 | 0.828 | 0.988 | 0.829

(c) F1 measure comparison with the state-of-the-art methods.

Methods | Cracktree200 | Crack500 | CFD | RECrack
Split-Attention Network [28] | 0.85 | 0.73 | 0.97 | 0.68
DMA-Net [29] | 0.81 | 0.76 | 0.97 | 0.70
ACAU-Net [30] | 0.86 | 0.78 | 0.96 | 0.71
Cascaded Attention DenseU-Net [31] | 0.86 | 0.74 | 0.96 | 0.72
ECA-Net [32] | 0.89 | 0.76 | 0.98 | 0.78
FU-Net [33] | 0.88 | 0.78 | 0.97 | 0.76
Two-stage-CNN [34] | 0.87 | 0.78 | 0.98 | 0.78
PSNet [35] | 0.93 | 0.82 | 0.99 | 0.81
PHCNet [36] | 0.92 | 0.82 | 0.98 | 0.81
LCSNet | 0.94 | 0.83 | 0.99 | 0.83
Table 3. Accuracy comparison with the state-of-the-art light-weighted models.

Methods | Cracktree200 | Crack500 | CFD | RECrack
LiteSeg [37] | 0.925 | 0.814 | 0.983 | 0.826
MobileNet+UNet [38] | 0.892 | 0.786 | 0.963 | 0.791
BiSeNet v3 [39] | 0.919 | 0.792 | 0.972 | 0.804
EGE-UNet [40] | 0.928 | 0.803 | 0.977 | 0.815
LRNNet [41] | 0.937 | 0.812 | 0.981 | 0.824
LCSNet | 0.942 | 0.836 | 0.992 | 0.833
Table 4. Accuracy comparison with different sizes of the LCSNet.

Number of LConvs in LCSNet (Model Size Variant) | Cracktree200 | Crack500 | CFD | RECrack | Model Size
18 | 0.892 | 0.783 | 0.951 | 0.776 | 1 MB
36 | 0.942 | 0.836 | 0.992 | 0.833 | 2 MB
72 | 0.943 | 0.839 | 0.993 | 0.837 | 4 MB
144 | 0.947 | 0.842 | 0.994 | 0.841 | 8 MB
Table 5. Accuracy comparison with different loss functions.

Methods | Cracktree200 | Crack500 | CFD | RECrack
Mean Square Error Loss [42] | 0.936 | 0.829 | 0.982 | 0.821
Cross Entropy Loss [43] | 0.937 | 0.831 | 0.983 | 0.827
Focal Loss [44] | 0.939 | 0.835 | 0.989 | 0.831
E-Focal Loss [45] | 0.942 | 0.836 | 0.992 | 0.833