Article

ALdamage-seg: A Lightweight Model for Instance Segmentation of Aluminum Profiles

Wenxuan Zhu, Bochao Su, Xinhe Zhang, Ly Li and Siwen Fang
1 Institute of Intelligent Manufacturing Technology, Shenzhen Polytechnic University, Shenzhen 518055, China
2 School of Electronic and Information Engineering, University of Science and Technology Liaoning, Anshan 114051, China
3 School of Mathematics, Harbin Institute of Technology, Harbin 150001, China
* Author to whom correspondence should be addressed.
Buildings 2024, 14(7), 2036; https://doi.org/10.3390/buildings14072036
Submission received: 10 April 2024 / Revised: 9 May 2024 / Accepted: 7 June 2024 / Published: 3 July 2024
(This article belongs to the Special Issue Study of Material Technology in Structural Engineering)

Abstract

Aluminum profiles are widely used in various manufacturing sectors due to their flexibility and chemical properties. However, these profiles are susceptible to defects during manufacturing and transportation, and detecting these defects is crucial. Existing instance segmentation models such as Mask R-CNN and YOLOv8-seg are not optimized for this task: they are large and computationally intensive, making them unsuitable for the edge devices used in industrial inspection. To address this issue, this study proposes a novel lightweight instance segmentation model, ALdamage-seg, based on the YOLOv8n-seg architecture. The model utilizes MobileNetV3 as its backbone. In YOLOv8n-seg, the C2f module enhances the nonlinear representation of the model so that complex image features are captured more efficiently; we upgrade it into a multi-layer feature extraction module (MFEM) and integrate a large separable kernel attention (LSKA) mechanism into the C2f module, resulting in C2f_LSKA, to further optimize model performance. Additionally, depth-wise separable convolutions are employed in the feature fusion process. On the Alibaba Tian-chi aluminum profile dataset, ALdamage-seg's weight is 43.9% of YOLOv8n-seg's and its GFLOPs are reduced to 53%, while it retains 99% of YOLOv8n-seg's mean average precision ($mAP$). With its compact size and lower computational requirements, this model is well suited for deployment on edge devices with limited processing capabilities.

1. Introduction

Aluminum profiles are widely used in the fields of vehicle manufacturing and construction [1]. Their shape is shown in Figure 1. Given their soft nature and chemically active characteristics, these profiles are susceptible to various forms of damage throughout manufacturing and transportation. Issues such as pitting, dents, and conductivity failures can significantly impact product quality and functionality. Consequently, the significance of defect detection and identification in aluminum profiles has been increasingly recognized, with a growing demand among manufacturers for methods that are both effective and precise [2]. In response, the past few years have witnessed the introduction of numerous advanced technologies and approaches aimed at enhancing the accuracy and efficiency of these processes, bringing technological advances to the aluminum profile manufacturing industry.
The advancement of electronic technology has catalyzed a shift from traditional, labor-intensive industrial inspection methods towards automated technologies, such as infrared sensor inspection [3,4]. In recent years, the evolution of deep learning and computer vision technologies has broadened their applicability in the domain of image segmentation. Convolutional neural networks (CNNs), characterized by their potent feature extraction capabilities and the ability to autonomously learn image features, have become instrumental in the realm of industrial inspection. For instance, in 2021, improvements to the YOLOv4 model were made by Li, X. et al. [5], incorporating techniques such as weighted residual connections (WRC), cross-stage partial connections (CSPC), cross-mini batch normalization (CmBN), self-adversarial training (SAT), and Mish activation to refine the model's accuracy. These enhancements have paved new avenues for augmenting the efficiency and precision of industrial inspection processes. Then, in 2023, an enhancement to the YOLOv5 model, designated DAYOLOv5, was introduced by Chen, L. et al. [6]. This model demonstrates enhanced performance on datasets comprising small objects, rendering it apt for applications within real-world industrial contexts.
Image segmentation can accurately segment different areas or objects in an image, achieve fine positioning and recognition, and better extract the information in the image [7,8]. In 2024, a novel semantic segmentation network model, HilbertSCNet, was proposed by Zheng, Q. et al. [9]. This model synergizes an image dimension reduction algorithm, facilitated by traversal of the Hilbert curve, with the concept of dual paths and introduces a novel spatial computation module. This innovation effectively addresses the loss of information about small objects during down-sampling. Furthermore, in 2023, a novel workflow predicated on the U-Net convolutional neural network was put forth by Bijal, C. et al. [10], which significantly enhances the accuracy of semantic segmentation. However, while semantic segmentation methods distinguish between categories well, they cannot separate individual entities within the same category.
To advance beyond these limitations, instance segmentation emerges as a solution, enabling the differentiation and segmentation of individual instances within a uniform category [11,12,13]. In 2023, Iván, G. et al. introduced a novel re-inference approach that enhances initial recognition outcomes through the integration of Mask R-CNN and super-resolution [14]. This method facilitates the identification of elements that were previously undetected, elevating recognition quality and notably augmenting the model's ability to generalize. Nonetheless, the computational demands of models akin to Mask R-CNN are excessive, rendering them less than ideal for deployment on edge devices where computational resources are constrained.
Addressing these challenges necessitates a more streamlined network architecture specifically tailored for instance segmentation. The YOLO [15,16] model series was selected for enhancement, given the recent strides and commendable performance demonstrated by the YOLOv8 series in object recognition. The YOLOv8 series offers models at varying scales to accommodate the demands of edge device environments; of these, YOLOv8n-seg is the most lightweight model designed for instance segmentation and serves as the foundational model here.
The research introduces a refined instance segmentation method, named ALdamage-seg, engineered to align with the specific requirements of industrial inspection on edge devices [17]. At the outset, a lightweight backbone network is established utilizing MobileNetV3, thereby reducing the model's weight. The incorporation of the multi-layer feature extraction module (MFEM) enhances the network's discernment of pivotal information within feature maps. By incorporating the conv module and Bottleneck in the multi-layer feature extraction process, the MFEM effectively integrates feature maps while preserving spatial dimensions, enhancing the accuracy and practicality of the model. Through the amalgamation of the C2f module with the large separable kernel attention (LSKA) mechanism, and the substitution of standard convolution in the neck network with depth-wise separable convolution (DPConv), the C2f_LSKA module is conceived. This module significantly curtails the model's complexity and overall parameter count. ALdamage-seg proves instrumental in extracting instance segmentation images of aluminum profile damage, furnishing robust support for the subsequent steps of damage type detection and damage severity evaluation. The main contributions of this work are as follows:
  • The research introduces a novel lightweight instance segmentation model called ALdamage-seg, specifically designed for detecting defects in aluminum profiles. This model performs instance segmentation on aluminum profile damage and exhibits reduced weight and lower GFLOP requirements, making it suitable for deployment on edge devices with limited computational resources.
  • Two new modules have been proposed: MFEM and C2f_LSKA. These modules not only reduce the model’s weight but also enhance feature extraction capabilities.
  • By utilizing MobileNetV3 as the backbone network and incorporating MFEM, C2f_LSKA, and DPConv, ALdamage-seg achieves significant lightweighting while maintaining high detection accuracy.

2. Methods

2.1. Methodological Flow and Data Collection

The ALdamage-seg model combines reduced size with high detection precision. The process employed by this model is depicted in Figure 2, beginning with the creation of an aluminum profile damage dataset. The dataset used in this study is the Alibaba Tian-chi aluminum profile dataset, and instance segmentation labels were created for its images. The process is shown in Figure 3, where Labelme 3.16.7 was used to segment the damaged areas on the surface of the aluminum profiles. This dataset encompasses various types of damage, including non-conductivity, scratches, orange-peel textures, dents, and spots. Data augmentation techniques, such as image flipping and rotation, were applied to enlarge the dataset. Following augmentation, the dataset was divided into training, validation, and test sets in an 8:1:1 ratio. The subsequent phase involves the design and development of ALdamage-seg. In the final stage, the ALdamage-seg model undergoes training and validation, with its efficacy evaluated on the test set. This approach ensures a comprehensive understanding of the model's capabilities and underscores the methodological rigor applied throughout the research.
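As an illustration of this pipeline, the following is a minimal sketch of the flip/rotation augmentation and the 8:1:1 split; the file-naming scheme is an assumption, and the Labelme polygon labels would need the same geometric transforms applied, which this sketch omits.

```python
import random
from pathlib import Path

from PIL import Image  # Pillow

def augment(image_path: Path, out_dir: Path) -> None:
    """Write flipped and rotated variants of one image (labels handled analogously)."""
    img = Image.open(image_path)
    img.transpose(Image.Transpose.FLIP_LEFT_RIGHT).save(out_dir / f"{image_path.stem}_flip.jpg")
    img.rotate(180).save(out_dir / f"{image_path.stem}_rot180.jpg")

def split_dataset(images: list[Path], seed: int = 0):
    """Shuffle and split into train/val/test with the paper's 8:1:1 ratio."""
    random.Random(seed).shuffle(images)
    n_train, n_val = int(0.8 * len(images)), int(0.1 * len(images))
    return (images[:n_train],
            images[n_train:n_train + n_val],
            images[n_train + n_val:])
```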

2.2. YOLO Introduction

The object detection algorithm YOLO was introduced by Joseph Redmon et al. in 2015 [18,19,20,21], presenting a novel approach to object detection by converting the problem into a regression challenge and enabling continuous training and detection. The inaugural release in the YOLO series, YOLOv1, marked a significant departure in object recognition methods, facilitating end-to-end training and detection through this transformation. Subsequent developments in deep learning have spurred iterative advancements in the YOLO series, incorporating enhancements such as novel network architectures, multi-level feature integration, and advanced prediction methodologies. These iterations have consistently improved recognition speed and accuracy.
The most recent iteration, YOLOv8, represents a notable leap forward in terms of detection accuracy and processing speed. This iteration underscores the YOLO series' commitment to balancing model efficiency with high performance. In the realm of computer vision and detection, the evolving requirements for YOLO variants, particularly in industrial surface defect detection, emphasize the need for rapid detection, precise accuracy, and compatibility with edge devices possessing limited computational capacity. From the perspective of industrial manufacturing, a comprehensive analysis of YOLO's evolution from its original version to the latest YOLOv8 has been undertaken. The structure diagram of YOLOv8n-seg is shown in Figure 4. This review highlights the YOLO series' dominance in the object detection domain, attributed to its optimal balance between speed and precision. The series has been extensively researched, refined, and applied across various domains, notably in identifying surface defects within industrial manufacturing settings [22]. This body of work exemplifies the YOLO series' integral role in advancing object detection technology.
YOLOv8n-seg, an instance segmentation model derived from YOLOv8 [23], amalgamates object detection and semantic segmentation capabilities to concurrently identify and delineate industrial components within images. This model is engineered to merge the tasks of object recognition and segmentation, facilitating a more precise analysis of industrial components. Leveraging the backbone and anchor-free detection head from YOLOv8, it excels in image feature extraction and object detection. Distinct from YOLOv8, YOLOv8n-seg incorporates a specialized segmentation branch within its detection head, enabling it to predict both the segmentation mask and the confidence levels for each bounding box. As a result, the model adeptly segments industrial components, rendering it suitable for a variety of industrial inspection tasks, including defect identification and size measurement. To enhance accuracy in both detection and segmentation, YOLOv8n-seg employs multiple loss functions: classification loss, regression loss, and distribution focal loss for object detection, alongside cross-entropy loss and dice loss for segmentation.
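To make the segmentation-branch objective concrete, below is a hedged sketch of a cross-entropy plus dice combination of the kind described; the weights `w_bce` and `w_dice` are illustrative assumptions, not the values used inside YOLOv8n-seg.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft dice loss between predicted mask logits and binary ground-truth masks."""
    p = logits.sigmoid().flatten(1)
    t = target.flatten(1)
    inter = (p * t).sum(dim=1)
    union = p.sum(dim=1) + t.sum(dim=1)
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

def segmentation_loss(logits: torch.Tensor, target: torch.Tensor,
                      w_bce: float = 1.0, w_dice: float = 1.0) -> torch.Tensor:
    """Weighted cross-entropy + dice combination for the mask branch (weights assumed)."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return w_bce * bce + w_dice * dice_loss(logits, target)
```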

2.3. Construction of ALdamage-seg

2.3.1. General Architecture

In pursuit of augmenting segmentation efficiency on edge devices, an innovative instance segmentation approach, ALdamage-seg, based on YOLOv8n-seg, is introduced. The principal objective is to refine the model's lightweight properties while maintaining segmentation efficacy. Illustrated in Figure 5 is the network structure of the proposed method. The architecture commences with the establishment of the backbone using MobileNetV3, followed by the construction of the MFEM and C2f_LSKA components, and the integration of DPConv in lieu of conventional convolutional layers to form the neck network. Feature fusion is achieved through concatenation layers at varied levels, with the resultant feature maps directed towards the detection head of YOLOv8n-seg for subsequent detection. The backbone, MFEM, DPConv, and C2f_LSKA components are elucidated in sequence, showcasing the method's strategic design to optimize model performance for industrial applications.

2.3.2. Backbone

Proposed by the Google team in 2019, MobileNetV3 is a lightweight network model that addresses the computational demands traditional convolutional neural networks impose, which are often too intensive for mobile and embedded devices. Achieving significant strides in mobile image classification, object detection, and semantic segmentation tasks, MobileNetV3 incorporates numerous innovative technologies. Characterized by its inverted residual structure and linear bottleneck, MobileNetV3 allows gradients to propagate effectively through the network during backpropagation, mitigating the vanishing-gradient problem. Furthermore, its reversible network design preserves input data details, enhancing the network's expressiveness regardless of network depth. Utilizing a constant memory footprint for gradient computation, this architecture, in conjunction with the computational efficiency of DPConv, assures a lightweight framework.
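As a sketch of how such a backbone can be wired up, the snippet below taps multi-scale features from torchvision's MobileNetV3-Small; the paper does not specify which stages ALdamage-seg taps, so the node names below (at strides 8, 16, and 32) are an assumption.

```python
import torch
from torchvision.models import mobilenet_v3_small
from torchvision.models.feature_extraction import create_feature_extractor

# Tap three stages at strides 8, 16, and 32 (assumed tap points; the paper does
# not name the exact layers used in ALdamage-seg's backbone).
backbone = create_feature_extractor(
    mobilenet_v3_small(weights="DEFAULT"),
    return_nodes={"features.3": "p3", "features.8": "p4", "features.12": "p5"},
)

feats = backbone(torch.randn(1, 3, 640, 640))
for name, f in feats.items():
    print(name, tuple(f.shape))  # p3 at 80x80, p4 at 40x40, p5 at 20x20
```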

2.3.3. MFEM Module

To enhance detection precision, the MFEM module combines conv and Bottleneck modules. It starts with a 1 × 1 conv layer that maintains spatial dimensions and splits the features into two branches; the second branch then passes through n iterations of Bottleneck forward propagation. The outputs tapped from these stages undergo 3 × 3 conv operations, producing distinct feature maps that are concatenated and fused by a final 1 × 1 conv for the output. Figure 6 illustrates the network architecture, designed for efficient feature map integration to optimize detection accuracy.
The MFEM is designed to construct a feature pyramid network through adaptive learning. This module takes input feature maps and produces feature maps at various scales via a sequence of convolutional operations and residual connections. The operating principle of the MFEM is described as follows:
Initially, a formula is established, expressed as:
$Y = \mathrm{Conv}(X, C, \text{kernel\_size}, \text{stride}, \text{padding})$
Here, $\mathrm{Conv}$ represents a convolution operation applied to the input $X$, resulting in the output $Y$. $C$ denotes the number of output channels; kernel_size is the size of the convolution kernel; stride is the convolution operation's step size in both horizontal and vertical directions, i.e., the pixel shift per convolution step; padding describes the pixels added around the edge of the input feature map.
Suppose the input feature map is $x$, the output is $y$, and $n$ stands for the number of $\mathrm{Bottleneck}$ modules. First, a $1 \times 1$ convolution with $2 \times c$ output channels is applied to the input $x$, yielding two feature maps:
$y_1, y_2 = \mathrm{Conv}(x, 2 \times c, 1 \times 1, 1, 1)$
Then, $n$ $\mathrm{Bottleneck}$ module operations are performed successively on $y_2$, producing a sequence of feature maps:
$y_3 = \mathrm{Bottleneck}(y_2), \quad y_4 = \mathrm{Bottleneck}(y_3), \quad \ldots, \quad y_{n+2} = \mathrm{Bottleneck}(y_{n+1})$
Next, a $3 \times 3$ convolution is applied to $y_1$, $y_3$, $y_4$, and $y_{n+2}$ to obtain four feature maps, namely:
$f_1 = \mathrm{Conv}(y_1, c, 3 \times 3, 1, 1)$
$f_2 = \mathrm{Conv}(y_3, c, 3 \times 3, 1, 1)$
$f_3 = \mathrm{Conv}(y_4, c, 3 \times 3, 1, 1)$
$f_4 = \mathrm{Conv}(y_{n+2}, c, 3 \times 3, 1, 1)$
Finally, $f_1$, $f_2$, $f_3$, and $f_4$ are concatenated using the $\mathrm{Concat}$ module, and a $1 \times 1$ convolution operation produces the output $y_{OUT}$:
$y_{OUT} = \mathrm{Conv}(\mathrm{Concat}(f_1, f_2, f_3, f_4), c, 1 \times 1, 1, 1)$
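A minimal PyTorch sketch of Equations (2)-(8) follows. The Bottleneck internals (two 3 × 3 convolutions with a shortcut) and the BatchNorm/SiLU placement are assumptions in the style of YOLOv8, since the paper specifies the kernel sizes and tap pattern but not those details.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Residual bottleneck (two 3x3 convs with a shortcut), assumed YOLOv8-style."""
    def __init__(self, c: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c, c, 3, 1, 1), nn.BatchNorm2d(c), nn.SiLU(),
            nn.Conv2d(c, c, 3, 1, 1), nn.BatchNorm2d(c), nn.SiLU(),
        )
    def forward(self, x):
        return x + self.body(x)

class MFEM(nn.Module):
    """Sketch of Eqs. (2)-(8): 1x1 split, n bottlenecks, 3x3 taps, concat, 1x1 fuse."""
    def __init__(self, c: int, n: int = 2):
        super().__init__()
        self.split = nn.Conv2d(c, 2 * c, 1)                                    # Eq. (2)
        self.bottlenecks = nn.ModuleList(Bottleneck(c) for _ in range(n))      # Eq. (3)
        self.taps = nn.ModuleList(nn.Conv2d(c, c, 3, 1, 1) for _ in range(4))  # Eqs. (4)-(7)
        self.fuse = nn.Conv2d(4 * c, c, 1)                                     # Eq. (8)

    def forward(self, x):
        y1, y = self.split(x).chunk(2, dim=1)          # y1, y2
        ys = []
        for b in self.bottlenecks:
            y = b(y)                                   # y3 ... y_{n+2}
            ys.append(y)
        inputs = [y1, ys[0], ys[min(1, len(ys) - 1)], ys[-1]]  # y1, y3, y4, y_{n+2}
        fs = [conv(t) for conv, t in zip(self.taps, inputs)]
        return self.fuse(torch.cat(fs, dim=1))

out = MFEM(c=64)(torch.randn(1, 64, 80, 80))  # spatial dimensions preserved
```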

2.3.4. DPConv

To mitigate computational complexity and diminish model size, this research incorporates DPConv in lieu of traditional convolutional layers within the model framework. As deep learning technology evolves, CNNs have showcased exceptional efficacy in image classification and object detection tasks. Nonetheless, standard convolutional layers are characterized by substantial computational demands and extensive parameter sets, limiting the feasibility of deploying these models on mobile devices and embedded systems. The adoption of DPConv serves as a significant solution to this limitation, positioning itself as a key innovation for developing lightweight CNN models. The configuration of this module is detailed in the subsequent sections. The architecture of DPConv is depicted in Figure 7.
The DPConv decomposes the conventional convolution layer into two distinct convolutional processes: depth-wise convolution and point-wise convolution. Depth-wise convolution operates by convolving each input feature channel independently, maintaining the original number of channels, which markedly reduces computational requirements and parameter count. Point-wise convolution, on the other hand, facilitates the interchange of information across channels by applying a $1 \times 1$ convolution kernel to the output of the depth-wise convolution, ensuring an efficient exchange of information between channels with minimal computational burden.
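A sketch of this decomposition follows, assuming a 3 × 3 depth-wise kernel and a BatchNorm/SiLU pairing (details the paper leaves open):

```python
import torch
import torch.nn as nn

class DPConv(nn.Module):
    """Depth-wise separable convolution: per-channel 3x3, then 1x1 point-wise mixing."""
    def __init__(self, c_in: int, c_out: int, stride: int = 1):
        super().__init__()
        # groups=c_in makes each kernel see only its own channel (depth-wise step).
        self.depthwise = nn.Conv2d(c_in, c_in, 3, stride, 1, groups=c_in, bias=False)
        # The 1x1 point-wise convolution exchanges information across channels.
        self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Parameter count: 9*c_in + c_in*c_out, versus 9*c_in*c_out for a standard 3x3 conv.
y = DPConv(64, 128)(torch.randn(1, 64, 40, 40))
```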

2.3.5. C2f_LSKA: LSKA Attention Mechanism Fusion Part

In an endeavor to enhance the model’s segmentation capabilities while simplifying its complexity, a fusion module termed C2f_LSKA, combining C2f and LSKA [24], is introduced. This module integrates the LSKA attention mechanism into C2f. LSKA, predicated on the large kernel attention (LKA) mechanism design, is an attention module that has demonstrated superior performance over visual transformers (ViTs) across a variety of visual tasks. The visual attention network (VAN) of the LKA mechanism, however, faces a quadratic rise in computational and memory demands with the increase in convolution kernel size. Addressing these challenges and facilitating the use of substantially large convolution kernels within the VAN’s attention module, Wai, L.K. et al. proposed LSKA—a suite of attention modules characterized by large separable kernels. The architectural designs of the LKA and LSKA modules are detailed, reflecting the strategic implementation of attention mechanisms to bolster the model’s segmentation efficiency. The structures of LKA and LSKA are depicted in Figure 8:
The LSKA concept involves dividing the first two layers of the LKA into four layers. Each LSKA layer comprises two one-dimensional convolution layers. The operational principle of LSKA is detailed below.
Initially, two depth-wise convolution operations are performed on the input $F$, the first with kernel_size $1 \times (2d-1)$ and the second with $(2d-1) \times 1$. The formulaic representation is:
$Z = \mathrm{Conv}(\mathrm{Conv}(F, c, 1 \times (2d-1), 1, 1), c, (2d-1) \times 1, 1, 1)$
where $Z$ is the output of the depth-wise convolutions with kernel sizes $1 \times (2d-1)$ and $(2d-1) \times 1$, and $d$ is the dilation rate.
$Z_C = \mathrm{Conv}(\mathrm{Conv}(Z, c, 1 \times \lfloor k/d \rfloor, 1, 1), c, \lfloor k/d \rfloor \times 1, 1, 1)$
$Z_C$ represents the output of the dilated depth-wise convolutions with kernel sizes $1 \times \lfloor k/d \rfloor$ and $\lfloor k/d \rfloor \times 1$, where $k$ represents the maximum receptive field.
Then, a convolution with kernel_size $1 \times 1$ is performed on $Z_C$ to obtain the output $A_C$, and the Hadamard product of $A_C$ and the input $F_C$ yields the final output $\bar{F}_C$. The formulas are expressed as follows:
$A_C = \mathrm{Conv}(Z_C, c, 1 \times 1, 1, 1)$
$\bar{F}_C = A_C \otimes F_C$
where $\otimes$ signifies the Hadamard product.
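The following PyTorch sketch mirrors Equations (9)-(12); the default kernel size k = 23 and dilation d = 3 are assumed example values from the LSKA design space, not a setting reported in this paper.

```python
import torch
import torch.nn as nn

class LSKA(nn.Module):
    """Sketch of Eqs. (9)-(12): separable large-kernel attention over c channels."""
    def __init__(self, c: int, k: int = 23, d: int = 3):
        super().__init__()
        kd = k // d  # floor(k/d): length of the dilated 1-D kernels
        # Eq. (9): local depth-wise 1x(2d-1) then (2d-1)x1 convolutions.
        self.dw_h = nn.Conv2d(c, c, (1, 2 * d - 1), padding=(0, d - 1), groups=c)
        self.dw_v = nn.Conv2d(c, c, (2 * d - 1, 1), padding=(d - 1, 0), groups=c)
        # Eq. (10): dilated depth-wise 1x(k//d) then (k//d)x1 convolutions.
        self.dwd_h = nn.Conv2d(c, c, (1, kd), padding=(0, (kd // 2) * d),
                               dilation=(1, d), groups=c)
        self.dwd_v = nn.Conv2d(c, c, (kd, 1), padding=((kd // 2) * d, 0),
                               dilation=(d, 1), groups=c)
        # Eq. (11): 1x1 convolution producing the attention map A_C.
        self.pw = nn.Conv2d(c, c, 1)

    def forward(self, f):
        z = self.dw_v(self.dw_h(f))     # Eq. (9)
        z = self.dwd_v(self.dwd_h(z))   # Eq. (10)
        a = self.pw(z)                  # Eq. (11)
        return a * f                    # Eq. (12): Hadamard product with the input

out = LSKA(c=64)(torch.randn(1, 64, 40, 40))  # shape preserved
```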
Following these operations, the LSKA module is created and incorporated into the C2f to develop the C2f_LSKA module. The structural diagram of this module is provided in Figure 9.

3. Training and Testing Results

3.1. Data Set and Experiment Setup

In this research, the aluminum profiles dataset from Alibaba's Tian-chi was chosen as the primary subject of investigation. This dataset comprises 1885 images, each depicting various types of damage on aluminum profiles. To address the limited size of the original dataset, enhancements such as image flipping and rotation were applied, resulting in a total of 7871 images. Among these, 6259 images were allocated for training, 806 for testing, and another 806 for validation. This distribution ensures the availability of sufficient data across the training, validation, and testing phases, facilitating the acquisition of reliable results with potential for generalization. Thus, the study provides a thorough examination of computer vision tasks and assesses the model's performance in practical scenarios. The experiments were conducted using a Windows 10 operating system, an Intel(R) Core(TM) i5-13400F CPU (Intel Corporation, Santa Clara, CA, USA), and an Nvidia RTX 3060 GPU with 12 GB of video memory (Nvidia Corporation, Santa Clara, CA, USA). The deep learning framework was PyTorch 2.0.1 with CUDA 11.8.

3.2. Evaluations Metrics

Mean average precision ($mAP$) is a critical metric for assessing performance in object detection and localization tasks within the computer vision domain. It gauges the accuracy and robustness of models by combining two vital metrics, precision and recall, offering a comprehensive evaluation of model performance across various thresholds. Specifically, $mAP_{Mask50}$ quantifies the model's segmentation accuracy at a 50% intersection over union (IoU) threshold, while $mAP_{Box50}$ measures the precision of the model's bounding box predictions at the same threshold. Furthermore, giga floating-point operations (GFLOPs) denote the computational cost of a forward pass, expressed in billions of floating-point operations. In this study, weight, $mAP_{Mask50}$, $mAP_{Box50}$, and GFLOPs are employed as primary evaluative indicators, ensuring a multifaceted assessment of the model's efficacy.
In the realm of object detection, the intersection over union (IoU) metric is commonly employed to ascertain the extent of overlap between the predicted bounding box and the true label box. Calculating $mAP$ involves arranging all detection outcomes in descending order of confidence score, then computing the precision and recall of each prediction box under various confidence thresholds. Aggregating the areas under the precision-recall curve across all confidence thresholds and dividing by the number of categories yields $mAP$, as delineated in Formula (13):
$mAP = \frac{1}{N} \sum_{i=1}^{N} \int_0^1 P(R)\,dR \times 100\%$
Precision ($P$) measures the proportion of correctly predicted positive samples among all predicted positives, while recall ($R$) measures the proportion of correctly predicted positives among all actual positive samples, as defined in Formulas (14) and (15), respectively.
$P = \frac{TP}{TP + FP}$
$R = \frac{TP}{TP + FN}$
$TP$ counts accurately identified positive instances, $FP$ counts negative instances erroneously classified as positive, and $FN$ counts positive instances wrongly categorized as negative.
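As a simplified illustration of Formula (13) (using uninterpolated rectangular integration, unlike the interpolated sweep of standard COCO/VOC toolkits), the per-class AP can be computed as follows:

```python
import numpy as np

def average_precision(confidences: np.ndarray, is_tp: np.ndarray, n_gt: int) -> float:
    """AP for one class: sort by confidence, accumulate P/R, integrate the P-R curve.

    is_tp is a boolean array marking each detection as true or false positive.
    """
    order = np.argsort(-confidences)            # descending confidence
    tp = np.cumsum(is_tp[order])                # cumulative true positives
    fp = np.cumsum(~is_tp[order])               # cumulative false positives
    recall = tp / max(n_gt, 1)                  # R = TP / (TP + FN)
    precision = tp / np.maximum(tp + fp, 1)     # P = TP / (TP + FP)
    # Rectangular integration of the precision-recall curve.
    return float(np.sum(np.diff(recall, prepend=0.0) * precision))

# Formula (13): mAP is the mean of per-class APs, expressed as a percentage.
# m_ap = 100.0 * np.mean([average_precision(c, t, n) for (c, t, n) in per_class])
```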

3.3. Experimental Results

The performance curves of the model, depicted in Figure 10, show a trend towards stabilization around the 400th training epoch. Based on this observation, the decision was made to conclude training at the 417th epoch.
In the comparative analysis, the performance of Mask R-CNN, YOLOv5n-seg, YOLOv7-seg, YOLOv8n-seg, YOLOv8s-seg, YOLOv8m-seg, MobileNetV3-YOLOv8n-seg, Ghost-YOLOv8n-seg, ShuffleNetv2-YOLOv8n-seg, and ALdamage-seg was evaluated on the same dataset. The assessment covered weight, $mAP_{Mask50}$, $mAP_{Box50}$, GFLOPs, and parameter count.
The findings, presented in Table 1, show that ALdamage-seg has a weight of 2.9 M, an $mAP_{Mask50}$ of 0.604, an $mAP_{Box50}$ of 0.688, 6.4 GFLOPs, and 1,373,412 parameters. ALdamage-seg thus achieves high accuracy while significantly reducing model weight: its $mAP_{Mask50}$ reaches 99% of YOLOv8n-seg's, and its $mAP_{Box50}$ reaches 98%. While Mask R-CNN and YOLOv7-seg display commendable $mAP_{Mask50}$ and $mAP_{Box50}$ scores, their considerable weight and GFLOPs render them less suited for edge devices constrained by computational resources.
As depicted in Figure 11, ALdamage-seg and YOLOv8n-seg show comparable effectiveness in image segmentation on aluminum profiles under various damage conditions, according to the $mAP_{Mask50}$ evaluation metric. However, ALdamage-seg stands out for its lightweight design, a critical attribute for enhancing deployment and operational efficiency in real-world applications, particularly where swift response and real-time processing are paramount. The streamlined nature of ALdamage-seg positions it advantageously for use in scenarios demanding high efficiency and rapid processing capabilities.

3.4. Ablation Experiment

The ablation study, summarized in Table 2, investigated the integration of the MobileNetV3 backbone, the MFEM module, the C2f_LSKA module, and DPConv. After conducting 15 sets of experiments, it was noted that the MFEM significantly improved the model's feature extraction efficiency, while the inclusion of MobileNetV3, DPConv, and C2f_LSKA contributed to the model's lightweight nature without sacrificing accuracy. Notably, omitting the MobileNetV3 backbone and replacing the neck network of YOLOv8n-seg with either DPConv or C2f_LSKA resulted in diminished performance. With the MobileNetV3 backbone, integrating DPConv with MFEM alone also decreased performance due to incompatible feature scale merging. In a similar vein, the combination of DPConv and C2f_LSKA significantly impaired segmentation performance, highlighting the inefficacy of that feature scale integration. The exclusive use of the MFEM module echoed the challenges noted with the DPConv and MFEM combination and considerably increased the model's complexity. Moreover, relying solely on the C2f_LSKA module replicated the issues found with the DPConv and C2f_LSKA pairing, manifesting in a notable degradation of segmentation capability.

3.5. Results of Detection

The test results are shown in Figure 12; segmentation was performed with the YOLOv5-seg, YOLOv8n-seg, and ALdamage-seg models on various defects such as non-conductive regions, orange peel, spots, scratches, and dents. The segmentation outcomes are depicted in the figures. From the detection results, it is evident that YOLOv8n-seg and ALdamage-seg exhibit superior performance compared to YOLOv5-seg across different defect scenarios. Furthermore, ALdamage-seg notably conserves computational resources while achieving a performance nearly equivalent to that of YOLOv8n-seg.

4. Discussion

This study introduces a novel approach for detecting damage on aluminum profiles through instance segmentation, presenting an advanced model, ALdamage-seg. This model, an enhancement of YOLOv8n-seg for aluminum profile instance segmentation, reduces the weight to 43.9% and the GFLOPs to 53% of the original YOLOv8n-seg, while retaining 99% of its average precision.
Traditionally, defect detection has relied on identifying bounding boxes around defective areas [26]. However, instance segmentation can precisely delineate object boundaries, achieve pixel-level segmentation, and furnish more comprehensive information. This research proposes a method for accurately estimating the actual damaged area on aluminum profiles by employing the damage information segmented from instances.
The deployment process begins with the trained ALdamage-seg model being implemented on edge devices designated for aluminum profiles instance segmentation within an image processing system. Following the stabilization of the camera position, data captured by the camera are inputted into the image processing system. The system then analyzes the data to produce detection results, which are illustrated in Figure 13. The method for estimating the detection area is shown in Figure 14.
We set the field of view angle as $\alpha$, the straight-line distance from the camera to the target as $b$, and the detection distance as $c$. Thus, we can obtain:
$c = 2 \times \tan\frac{\alpha}{2} \times b$
By employing the same calculation for both the length and width dimensions, the detection view's length and width are ascertained, enabling the calculation of its area $S$. Subsequently, multiplying $S$ by the ratio of segmented pixels to the total number of pixels yields the damage area $S'$ of the aluminum profiles. The formula is outlined as follows:
$S' = \frac{\text{Number of segmented pixels}}{\text{Total number of pixels}} \times S$
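A small sketch of this estimate follows; the 60° × 45° field of view and the 0.5 m distance in the usage example are illustrative numbers only, not measurements from the paper's setup.

```python
import math

def detection_area(alpha_h: float, alpha_v: float, b: float) -> float:
    """Area S of the camera's view at distance b, applying c = 2*tan(alpha/2)*b per axis."""
    width = 2.0 * math.tan(alpha_h / 2.0) * b
    height = 2.0 * math.tan(alpha_v / 2.0) * b
    return width * height

def damage_area(n_segmented: int, n_total: int, s: float) -> float:
    """Damage area S': the segmented-pixel fraction of the total view area S."""
    return (n_segmented / n_total) * s

# Illustrative only: a 60 x 45 degree camera 0.5 m from the profile, with 12,000
# of 640*640 pixels segmented as damage.
s = detection_area(math.radians(60), math.radians(45), 0.5)
print(f"damage area: {damage_area(12_000, 640 * 640, s):.4f} m^2")
```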
While this methodology theoretically offers a robust means of estimating the damaged area on aluminum profiles, it necessitates precise measurement of the pixel-to-area ratio and the camera-to-target distance. Achieving such accuracy may require complex algorithms, consequently elevating implementation complexity. Future investigations might explore the use of instruments, like laser radar or infrared ranging, or visual algorithms, such as binocular ranging, to ascertain the camera's distance. These methods, in conjunction with high-definition cameras for segmentation, hold substantial research value for industrial applications like non-destructive testing and detailed information extraction from damaged sections.

5. Conclusions

This research introduces an advanced model derived from YOLOv8n-seg for defect detection in aluminum profiles, incorporating MobileNetV3, MFEM, DPConv, and C2f_LSKA components. The effectiveness of the method was verified by comparative tests and ablation experiments. The experimental outcomes reveal that the model retains 99% of YOLOv8n-seg's performance in terms of $mAP_{Mask50}$, while reducing its weight to 43.9% and its GFLOPs to 53% of those of YOLOv8n-seg. These results signify a notable decrease in computational demand and model size, enhancing its suitability for edge devices constrained by computational resources. Despite these advancements, obtaining a sufficiently large labeled dataset to train such a model remains difficult. Future research might explore incorporating generative adversarial networks (GANs) into the model framework. Adversarial training between the generator and discriminator could augment the model's generalization to rare or new defect types without the explicit definition of a loss function, substantially elevating real-world application performance. In industrial contexts, where labeling resources are scarce and defect types continually evolve, the implications of this research are profound.

Author Contributions

Conceptualization, W.Z., B.S., X.Z. and L.L.; data curation, W.Z.; formal analysis, L.L. and S.F.; methodology, W.Z., B.S., X.Z. and L.L.; resources, B.S.; software, W.Z., X.Z. and S.F.; validation, W.Z.; visualization, S.F.; writing—original draft, W.Z.; writing—review and editing, W.Z., B.S., X.Z. and L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Scientific Research Startup Fund for Shenzhen High-Caliber Personnel of SZPU, the General Higher Education Project of Guangdong Provincial Education Department, the Guangdong Provincial General University Innovation Team Project and the college start-up fund of ShenZhen Polytechnic University, grant numbers 6023330002K, 2023KCXTD077, 2020KCXTD047, 6022312031K.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chen, J. Research on Aluminum Alloy Materials and Application Technology for Automotive Lightweighting. J. Mater. Chem. 2023, 4, 1–7. [Google Scholar] [CrossRef]
  2. Yu, Y.; Ti, J.; Lu, Z. Law and Fracture Characteristics of Stress Corrosion Cracking for 7B04 Aluminum Alloy. Mater. Sci. Forum 2021, 6181, 207–212. [Google Scholar] [CrossRef]
  3. Pratim, D.M.; Larry, M.; Shreyansh, D. Online Photometric Calibration of Automatic Gain Thermal Infrared Cameras. IEEE Robot. Autom. Lett. 2021, 6, 2453–2460. [Google Scholar] [CrossRef]
  4. Dionysios, L.; Vaia, K.; Niki, M.; Anastasios, K.; Athanasios, B.; George, F.; Ioannis, V.; Christos, M. On the Response of a Micro Non-Destructive Testing X-ray Detector. Materials 2021, 14, 888. [Google Scholar] [CrossRef] [PubMed]
  5. Li, X.; Chao, D.; Yan, Z.; Yin, P. Wafer Crack Detection Based on Yolov4 Target Detection Method. J. Phys. Conf. Ser. 2021, 1802, 022101. [Google Scholar] [CrossRef]
  6. Chen, L.; Yan, H.; Xiang, Q.; Zhu, S.; Zhu, P.; Liao, C.; Tian, H.; Xiu, L.; Wang, X.; Li, X. A Domain Adaptation YOLOv5 Model for Industrial Defect Inspection. Measurement 2023, 213, 112725. [Google Scholar] [CrossRef]
  7. Liu, S.; Chen, J.; Liang, L.; Bai, H.; Dang, W. Light-Weight Semantic Segmentation Network for UAV Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 8287–8296. [Google Scholar] [CrossRef]
  8. Wang, Y.; Qi, Q.; Jun, W.; Wang, M.; Ye, Y. The Potential of Image Segmentation Applied to Sampling Design for Improving Farm-level Multi-soil Property Mapping Accuracy. Precis. Agric. 2023, 24, 2350–2373. [Google Scholar] [CrossRef]
  9. Zheng, Q.; Xu, L.; Wang, F.; Xu, Y.; Chao, L.; Zhang, G. HilbertSCNet: Self-attention Networks for Small Target Segmentation of Aerial Drone Images. Appl. Soft Comput. 2024, 150, 111035. [Google Scholar] [CrossRef]
  10. Bijal, C.; Nikolas, O.; Jonne, T.; Nicklas, N.; Jon, E.; Ismo, A. Automated Mapping of Bedrock-fracture Traces from UAV-acquired Images Using U-Net Convolutional Neural Networks. Comput. Geosci. 2024, 182, 105463. [Google Scholar] [CrossRef]
  11. Li, J.; Xiang, L.; Li, M.; Yan, P. A Dual-path Instance Segmentation Network Based on Nuclei Contour in Histology Image. Discov. Artif. Intell. 2023, 3, 35. [Google Scholar] [CrossRef]
  12. Chen, C.; Guo, Y.; Tian, F.; Liu, S.; Yang, W.; Wang, Z.; Jing, W.; Hang, S.; Hanspeter, P.; Liu, S. A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision. IEEE Trans. Vis. Comput. Graph. 2023, 30, 76–86. [Google Scholar] [CrossRef] [PubMed]
  13. Park, J.J.; Doiphode, N.; Zhang, X.; Pan, L.; Blue, R.; Shi, J.; Buch, V.P. Developing the Surgeon-machine Interface: Using a Novel Instance-segmentation Framework for Intraoperative Landmark Labelling. Front. Surg. 2023, 10, 1259756. [Google Scholar] [CrossRef] [PubMed]
  14. García-Aguilar, I.; García-González, J.; Luque-Baena, R.M.; López-Rubio, E.; Domínguez, E. Optimized Instance Segmentation by Super-resolution and Maximal Clique Generation. Integr. Comput.-Aid. Eng. 2023, 30, 243–256. [Google Scholar] [CrossRef]
  15. Kim, K.; Kim, K.; Jeong, S. Application of YOLO v5 and v8 for Recognition of Safety Risk Factors at Construction Sites. Sustainability 2023, 15, 15179. [Google Scholar] [CrossRef]
  16. Li, G.; Zhao, S.; Zhou, M.; Li, M.; Shao, R.; Zhang, Z.; Han, D. YOLO-RFF: An Industrial Defect Detection Method Based on Expanded Field of Feeling and Feature Fusion. Electronics 2022, 11, 4211. [Google Scholar] [CrossRef]
  17. Pedro, A.; Vítor, S. Comparative Analysis of Multiple YOLO-based Target Detectors and Trackers for ADAS in Edge Devices. Robot. Auton. Syst. 2024, 171, 104558. [Google Scholar] [CrossRef]
  18. Sayyad, J.; Ramesh, B.T.; Attarde, K.; Bongale, A. Hexacopter-Based Modern Remote Sensing Using the YOLO Algorithm. Adv. Scitechnol.-Res. 2023, 6680, 75–84. [Google Scholar] [CrossRef]
  19. Chen, J.; Bao, E.; Pan, J. Classification and Positioning of Circuit Board Components Based on Improved YOLOv5. Procedia Comput. Sci. 2022, 208, 613–626. [Google Scholar] [CrossRef]
  20. Lv, L.; Li, X.; Mao, F.; Zhou, L.; Xuan, J.; Zhao, Y.; Yu, J.; Song, M.; Huang, L.; Du, H. A Deep Learning Network for Individual Tree Segmentation in UAV Images with a Coupled CSPNet and Attention Mechanism. Remote Sens. 2023, 15, 4420. [Google Scholar] [CrossRef]
  21. Rui, S.; Osvaldo, G.F.; Pedro, M.-P. Boosting the Performance of SOTA Convolution-Based Networks with Dimensionality Reduction: An Application on Hyperspectral Images of Wine Grape Berries. Intell. Syst. Appl. 2023, 19, 200252. [Google Scholar] [CrossRef]
  22. Hussain, M. YOLO-v1 to YOLO-v8, the Rise of YOLO and Its Complementary Nature toward Digital Manufacturing and Industrial Defect Detection. Machines 2023, 11, 677. [Google Scholar] [CrossRef]
  23. Wu, Y.; Han, Q.; Jin, Q.; Li, J.; Zhang, Y. LCA-YOLOv8-Seg: An Improved Lightweight YOLOv8-Seg for Real-Time Pixel-Level Crack Detection of Dams and Bridges. Appl. Sci. 2023, 13, 10583. [Google Scholar] [CrossRef]
  24. Wai, L.K.; Lai-Man, P.; Ur, R.Y.A. Large Separable Kernel Attention: Rethinking the Large Kernel Attention design in CNN. Expert Syst. Appl. 2024, 236, 121352. [Google Scholar] [CrossRef]
  25. Zhang, H.; Tang, C.; Sun, X.; Fu, L. A Refined Apple Binocular Positioning Method with Segmentation-Based Deep Learning for Robotic Picking. Agronomy 2023, 13, 1469. [Google Scholar] [CrossRef]
  26. Song, X.; Cao, S.; Zhang, J.; Hou, Z. Steel Surface Defect Detection Algorithm Based on YOLOv8. Electronics 2024, 13, 988. [Google Scholar] [CrossRef]
Figure 1. Aluminum profile illustration.
Figure 2. The structure of the proposed method.
Figure 3. Dataset production process.
Figure 4. The structure diagram of YOLOv8n-seg.
Figure 5. ALdamage-seg's overall architecture.
Figure 6. Multi-layer feature extraction module (MFEM).
Figure 7. Depth-wise separable convolution (DPConv).
Figure 8. (a) Large kernel attention (LKA) mechanism; (b) large separable kernel attention (LSKA) mechanism.
Figure 9. C2f_LSKA module.
Figure 10. Curves for metrics $mAP_{Mask50}$ and $mAP_{Box50}$ throughout the training process. Part (a) displays the $mAP$ value for bounding boxes at a 50% intersection over union (IoU) threshold. Part (b) shows the average $mAP$ value for bounding boxes over 50-90% IoU thresholds. Part (c) delineates the $mAP$ value for segmentation regions at a 50% IoU threshold, while part (d) presents the average $mAP$ value for segmentation regions across 50-90% IoU thresholds.
Figure 11. Average precision data statistical chart.
Figure 12. Test results: Part (a) displays non-conductive, orange-peel, spot, scratch, and dent defects. Part (b) displays leakage at the bottom, coating cracking, raised powder, and concavity. These represent different types of damage to aluminum profiles.
Figure 13. Simplified diagram of the detection device structure.
Figure 14. Camera perspective analysis.
Table 1. Comparative experimental results.

| Model | Weight | $mAP_{Mask50}$ | $mAP_{Box50}$ | GFLOPs | Parameters |
|---|---|---|---|---|---|
| Mask R-CNN [25] | 170 M | 0.661 | 0.731 | 136 | 43,970,546 |
| YOLOv5n-seg | 5.5 M | 0.586 | 0.677 | 10.8 | 2,672,590 |
| YOLOv7-seg | 79.6 M | 0.655 | 0.727 | 151 | 38,760,121 |
| YOLOv8m-seg | 57.4 M | 0.649 | 0.72 | 112 | 28,462,981 |
| YOLOv8s-seg | 24.8 M | 0.631 | 0.714 | 43.7 | 12,746,134 |
| YOLOv8n-seg | 6.6 M | 0.609 | 0.7 | 12 | 3,260,014 |
| YOLOv8n-seg * | 6.3 M | 0.585 | 0.681 | 11 | 3,480,724 |
| YOLOv8n-seg * | 5.6 M | 0.587 | 0.68 | 10.4 | 2,721,230 |
| YOLOv8n-seg * | 5.2 M | 0.554 | 0.642 | 9.1 | 2,494,420 |
| ALdamage-seg | 2.9 M | 0.604 | 0.688 | 6.4 | 1,373,412 |

* The three YOLOv8n-seg * rows are, in order: MobileNetV3-YOLOv8n-seg, Ghost-YOLOv8n-seg, and ShuffleNetv2-YOLOv8n-seg.
Table 2. Ablation experimental results.

| Methodology | Weight | $mAP_{Mask50}$ | $mAP_{Box50}$ | GFLOPs |
|---|---|---|---|---|
| +MobileNetV3+DPConv+MFEM * | 4.8 M | 0.582 | 0.67 | 7.5 |
| +MobileNetV3+DPConv+C2f_LSKA * | 2.5 M | 0.56 | 0.621 | 5.4 |
| +MobileNetV3+MFEM+C2f_LSKA * | 4.9 M | 0.598 | 0.68 | 9.4 |
| +DPConv+MFEM+C2f_LSKA * | 3.9 M | 0.572 | 0.633 | 7.2 |
| +MobileNetV3+MFEM * | 7.2 M | 0.591 | 0.672 | 10.5 |
| +MobileNetV3+C2f_LSKA * | 4.3 M | 0.558 | 0.654 | 8.4 |
| +MobileNetV3+DPConv * | 4.5 M | 0.544 | 0.647 | 7 |
| +DPConv+MFEM * | 5.9 M | 0.581 | 0.66 | 8.2 |
| +MFEM+C2f_LSKA * | 5.2 M | 0.583 | 0.663 | 9.7 |
| +DPConv+C2f_LSKA * | 3.6 M | 0.541 | 0.631 | 6.6 |
| +MFEM * | 7.8 M | 0.611 | 0.695 | 12.4 |
| +C2f_LSKA * | 4.7 M | 0.56 | 0.64 | 8 |
| +DPConv * | 5.4 M | 0.551 | 0.632 | 7.5 |
| +MobileNetV3 | 6.3 M | 0.585 | 0.681 | 11 |
| ALdamage-seg | 2.9 M | 0.604 | 0.688 | 6.4 |

* The symbol "+" in the table indicates modifications made only to certain modules of YOLOv8n-seg.