Article

FE-YOLO: A Lightweight Model for Construction Waste Detection Based on Improved YOLOv8 Model

1
School of Automotive and Traffic Engineering, Hubei University of Arts and Science, Xiangyang 441053, China
2
School of Civil Engineering and Architecture, Hubei University of Arts and Science, Xiangyang 441053, China
3
School of Mathematics and Statistics, Hubei University of Arts and Science, Xiangyang 441053, China
*
Authors to whom correspondence should be addressed.
Buildings 2024, 14(9), 2672; https://doi.org/10.3390/buildings14092672
Submission received: 15 July 2024 / Revised: 19 August 2024 / Accepted: 21 August 2024 / Published: 27 August 2024
(This article belongs to the Section Construction Management, and Computers & Digitization)

Abstract

Construction waste detection under complex scenarios poses significant challenges: existing models suffer from low detection accuracy, high computational complexity, and large parameter counts. These challenges matter because accurate and efficient detection is essential for effective waste management in a construction industry increasingly focused on sustainability and resource optimization. To address these problems, this paper proposes FE-YOLO, an improved YOLOv8-based algorithm. FE-YOLO replaces the C2f module in the backbone with the Faster_C2f module and integrates the ECA attention mechanism into the bottleneck layer. A custom multi-class construction waste dataset is also created for evaluation. FE-YOLO achieves an mAP@50 of 92.7% on this dataset, up by 3% compared to YOLOv8n, while the parameter count and floating-point operations are reduced by 12% and 13%, respectively. Finally, a test conducted on a publicly available construction waste dataset demonstrates the excellent generalization and robustness of the algorithm.

1. Introduction

Each year, approximately 10 billion tons of construction waste are generated around the world, the majority of which ends up in landfills; only a small proportion is directed to engineering backfill and recycling (2024) [1]. The construction and demolition waste (CDW) generated annually across the European Union (EU) is estimated at around 450 million tons (MT), of which only 28% is recycled and the remaining 72% goes to landfills (2024) [2]. The landfilling of construction waste occupies large amounts of land, causes serious environmental pollution, and threatens entire ecosystems (2024) [3].
Due to the increasing amount of CDW in recent years, there has been a growing recognition of the need to improve construction waste management practices. The traditional methods, heavily reliant on manual sorting, are increasingly viewed as outdated and insufficient to handle the escalating volume of waste. Manual sorting is not only labor-intensive but also operates in hazardous environments filled with dust, debris, and harmful substances, which jeopardize the health and safety of workers. As a result, there has been a shift towards the exploration and development of automated sorting systems that leverage advanced technologies such as robotics, machine vision, and, more recently, deep learning.
Among these, deep learning-based object detection has emerged as a particularly promising approach. These methods have demonstrated significant potential in accurately identifying and categorizing various types of construction waste from large datasets of images. The application of convolutional neural networks (CNNs) and other deep learning architectures has enabled the development of models that can process complex visual data with high accuracy, facilitating more efficient and precise sorting operations. Several pilot projects and research initiatives have explored the integration of these technologies into waste processing lines, aiming to replace or augment manual sorting processes.
Despite these advances, the field faces several significant challenges that hinder the widespread adoption of intelligent sorting systems. One of the primary challenges is the need for large, annotated datasets. Deep learning models require vast amounts of labeled data to train effectively, and creating such datasets for construction waste is both time-consuming and costly. Additionally, the diverse and often heterogeneous nature of construction waste—ranging from concrete and bricks to plastics and metals—presents a challenge for models to generalize effectively across different types of materials.
Another challenge lies in the computational complexity of deep learning models. The training and deployment of these models require substantial computational resources, which can be a barrier for many waste management facilities, particularly those operating on limited budgets. Furthermore, the models must be robust enough to operate in the harsh and dynamic environments of waste processing lines, where factors such as dust, lighting variations, and occlusions can adversely affect the performance of vision-based systems.
Adapting deep learning models to real-time processing requirements is also a challenge. The models need to be optimized for speed without sacrificing accuracy, which is critical for their practical application in sorting lines where throughput is a key concern. Additionally, integrating these models with existing infrastructure and ensuring they can work seamlessly with mechanical sorting systems poses another layer of complexity.
Finally, there is the challenge of industry acceptance. Many waste management facilities are hesitant to adopt new technologies due to concerns over reliability, cost, and the potential need for significant changes to their current operations. Overcoming these barriers requires not only technological advancements, but also efforts to demonstrate the long-term benefits and return on investment of intelligent sorting systems.
To overcome these challenges, it is imperative to refine the construction waste disposal process by leveraging emerging technologies. The development of advanced image processing and deep learning techniques offers a pathway to optimizing waste sorting systems. This optimization is crucial not only for improving the accuracy and efficiency of waste identification, but also for mitigating the health risks associated with manual sorting.
The contributions of this paper are outlined as follows.
(1)
A custom dataset is created to represent the key types of construction waste generated in Xiangyang City.
(2)
The FE-YOLO model, derived from YOLOv8, is proposed to improve the accuracy of detection and downscale the model.
(3)
The Faster_C2f module, developed by modifying the 2-Stage FPN (C2f) module based on the FasterNet block, reduces the complexity of the model and accelerates the process of detection.
(4)
The Efficient Channel Attention (ECA) mechanism is incorporated into the proposed model, which enhances the sensitivity to features while maintaining the minimal number of additional parameters for a lightweight model.
The remainder of this paper is structured as follows. In Section 2, the research background and concepts relevant to CDW detection will be introduced. Section 3 will provide a detailed introduction to the newly proposed method FE-YOLO, including FE-YOLO’s specific improvements compared with YOLOv8, the proposed Faster_C2f structure, and how ECA attention mechanisms were added to the model’s neck to enhance performance. In Section 4, the performance of this model will be evaluated through a series of tests.

2. Literature Review

Section 2.1 reviews and summarizes previous automated methods for construction waste detection. Section 2.2 explains the principles and historical development of the YOLO algorithm. Sections 2.3 and 2.4 then introduce the principles and advantages of FasterNet and the Efficient Channel Attention (ECA) mechanism, respectively.

2.1. Related Research

To detect and classify construction wastes, there have been various methods proposed in recent years on the basis of machine learning and deep learning. These methods, derived from optimizing traditional models, have significantly improved the accuracy and efficiency of classification and detection.
Research on models primarily focuses on two aspects: improving existing models and proposing new frameworks, as detailed below:
(1)
Improving Existing Models:
Several studies have enhanced detection accuracy and efficiency by modifying existing models. These include improvements to YOLOv7 [4], CenterNet [5], VGG16 [6], SSD [7], ResNet18 [8], EfficientNet-B5 [9], Faster R-CNN [10], VGGNet [11], Mask R-CNN [12], YOLOv5s [13], Ethseg [14], and other convolution-based networks [15,16,17]. The researchers modified the ShuffleNet v2 architecture to better handle the complexity and variability of waste images [18]. By incorporating more refined feature extraction methods and optimizing the network structure, they addressed issues related to low classification accuracy and high computational costs typically associated with waste sorting tasks.
(2)
Proposing New Frameworks:
Other research has focused on developing new frameworks tailored for specific waste detection tasks. These include an intelligent sorting system [19], the CoTR network based on self-attention [20], the W2R two-stage detection algorithm [21], a garbage classification system using CNNs and R-CNNs [22], a multi-modal deep learning network for RGB-D waste detection [23], and a deep residual network with knowledge transfer for construction and demolition waste classification [24].
In addition to these studies in the field of waste management, significant progress has been made in the automation of sorting systems and the application of engineering technology, as detailed below:
(1)
Annotation Techniques and Benchmark Datasets:
Hu Ke et al. [25] introduced three distinct annotation methods—regular rectangle, rotated rectangle, and polygon—to label construction wastes in high-resolution remote sensing images, aiding in the study of annotation impacts on detection outcomes. Complementing this, Demetriou D et al. [26] created a benchmark dataset for training object detection models, enabling the automatic classification of CDW, which is vital for advancing waste management practices. Wu X et al. [27] developed a deep learning-based instance segmentation method using 3D laser triangulation data, improving real-time monitoring and classification of waste in construction and demolition recycling.
(2)
Intelligent Sorting Systems and Edge Computing Applications:
Xu Wenjia et al. [19] designed an intelligent sorting system based on machine vision and deep learning, employing background modeling to enhance detection accuracy and positioning speed, thereby improving overall sorting efficiency. Similarly, Kang Zhuang et al. [28] integrated Inception v3 with transfer learning for automatic garbage classification and recycling on the Raspberry Pi 3B+, showcasing the application of edge computing devices in waste management.
(3)
Advanced Detection and Screening Techniques:
Lu et al. [29] developed a construction waste segmentation model using DeepLabv3+, which improved the precision of waste management operations. Radica F et al. [30] automated the screening of construction waste aggregates through near-infrared spectroscopy, highlighting the use of advanced detection technologies in the waste management process. Zeli Wang [31] further advanced detection techniques by employing drones to create a high-precision identification and localization method for construction wastes, significantly enhancing drone-based detection outcomes.
(4)
Real-World Challenges and Environmental Management:
Mei et al. [32] tackled the challenges of glass detection in real-world scenarios, providing insights into the complexities of automatic detection in challenging environments. Sirimewan et al. [33] contributed to environmental management by proposing deep learning-based models for detecting construction, renovation, and demolition wastes in the wild, thereby promoting automated detection processes for better environmental management. Yong Q et al. [34] introduced a computer vision approach for detecting illegal waste landfills, addressing unregulated disposal and aiding in environmental management.
To date, plenty of effort has been made to optimize the utilization of computational resources and accelerate processing through innovative architectural modifications and algorithmic enhancements. However, despite the significant progress these studies have made in waste detection and classification, some problems remain. Firstly, there are almost no publicly available large-scale datasets; individual datasets cannot cover all categories of construction waste and instead focus on only three representative ones. Secondly, models used for real-time detection of construction waste need to be lightweight, but there is little research on this; the relevant research focuses mostly on improving the accuracy of detection and classification. Lastly, little attention has been paid to the detection of construction waste in complex scenarios, where the waste is often piled, so existing models are prone to missed or false detections. To address these issues, a model named FE-YOLO is proposed in this paper.

2.2. Overview of YOLOv8

YOLOv8, released in 2023, represents a significant advancement over its predecessors, further optimizing the original model in terms of depth and width. By incorporating more advanced feature fusion techniques and more efficient inference models, it significantly enhances detection efficiency and accuracy, while maintaining the lightweight design and high efficiency of its predecessors, making it well suited to practical applications.
This model relies on CSPNet for feature extraction. CSPNet reduces the computational load through cross-stage partial connections, improving feature representation and efficiency. Additionally, YOLOv8 incorporates a spatial pyramid pooling module into its backbone network, which effectively captures multi-scale spatial information and enables the model to detect objects of different sizes better. In the detection head, YOLOv8 integrates PANet, which enhances the fusion of features extracted from different layers, further improving detection accuracy. The model features a refined loss function that accounts for localization error, classification error, and confidence error, significantly improving the accuracy and stability of detection while maintaining high time efficiency. Moreover, YOLOv8 incorporates flexible and diverse data augmentation strategies, such as Mosaic augmentation, random cropping, and color jittering, which enhance both the generalization performance of the model and the diversity of the training data. On the MS COCO test-dev 2017 benchmark, YOLOv8x achieves an impressively high AP of 53.9% at an input size of 640 pixels, outperforming YOLOv5. On an NVIDIA A100 equipped with TensorRT, YOLOv8 reaches 280 FPS, underscoring its high time efficiency [35]. Experimental results on multiple public datasets show the potential of YOLOv8 in detecting small objects, multi-scale objects, and objects in complex environments. The structure of YOLOv8 is illustrated in Figure 1.

2.3. Fasternet

In FasterNet, proposed by Chen et al. [36] (2023), partial convolution (PConv) is performed to reduce floating-point operations (FLOPs) by efficiently extracting spatial features and minimizing redundant computation and memory access. PConv exploits the high similarity of feature maps across channels by performing convolution on only a subset of channels, with the others kept unchanged, thus reducing computational complexity. To maintain memory contiguity, either the starting or ending sequence of consecutive channels is taken to represent the entire feature map. Subsequently, pointwise convolution (PWConv) is combined with PConv to make effective use of information from all channels, generating an effective receptive field on the input feature map similar to a T-shaped convolution. This decomposition into PConv and PWConv exploits the redundancy between convolution operations, further reducing FLOPs. It is crucial to determine the relationship between memory access and computational complexity, because both can be significantly reduced by minimizing the number of channels involved in convolution.
The memory access for PConv is expressed as follows:
$$h \times w \times 2c_p + k^2 \times c_p^2 \approx h \times w \times 2c_p$$
where $h$ and $w$ represent the height and width of the input feature map, respectively; $c_p$ denotes the number of channels in the subset used for PConv; $h \times w \times 2c_p$ describes the memory access required for both the input and output feature maps; and $k^2 \times c_p^2$ describes the memory access required for the convolutional kernel. The approximation indicates that the memory required by the feature maps dominates that of the kernel when $h$ and $w$ are large enough.
When the input and output channels are assumed to be the same, the FLOPs for the T-shaped convolution (TConv) can be calculated as follows:
$$h \times w \times \left( k^2 \times c_p \times c + c \times (c - c_p) \right)$$
where $c$ represents the total number of input and output channels (assumed equal); $k^2 \times c_p \times c$ represents the FLOPs of applying $k \times k$ kernels over the $c_p$ selected input channels for all $c$ output channels; and $c \times (c - c_p)$ represents the additional FLOPs of the pointwise operations over the remaining $c - c_p$ channels.
The FLOPs split into PConv and PWConv are expressed as:
$$h \times w \times \left( k^2 \times c_p^2 + c^2 \right)$$
where $k^2 \times c_p^2$ represents the FLOPs of the partial convolution on the $c_p$-channel subset of the input and $c^2$ represents the FLOPs of the pointwise convolution across all channels. The process is illustrated in Figure 2.
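To make the channel split concrete, the following is a minimal PyTorch sketch of PConv based on the description above; the class name and defaults are illustrative, with the 1/4 split ratio taken from the FasterNet paper [36].

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution (PConv): convolve only the first c_p channels and
    pass the remaining channels through untouched. A minimal sketch following
    the FasterNet paper [36]; the class name and defaults are illustrative."""

    def __init__(self, channels: int, ratio: float = 0.25, kernel_size: int = 3):
        super().__init__()
        # c_p = number of channels actually convolved (ratio = c_p / c, 1/4 in the paper)
        self.c_p = max(1, int(channels * ratio))
        self.conv = nn.Conv2d(self.c_p, self.c_p, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Take a contiguous leading block of channels to keep memory access regular.
        x1, x2 = torch.split(x, [self.c_p, x.size(1) - self.c_p], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)
```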
FasterNet is implemented in four stages, each involving a group of FasterNet blocks, with a preceding embedding or merging layer used for spatial downsampling and channel expansion. The overall architecture of FasterNet is shown in Figure 3. Each FasterNet block consists of a series of convolutional layers: one PConv layer and two PWConv layers. In the PConv layer, a $3 \times 3$ convolution is performed while the partial identity of the input is retained; in the PWConv layers ($1 \times 1$ convolutions), the feature maps are further processed to enhance their representativeness. Batch normalization and ReLU activation functions after each convolutional layer improve convergence and non-linearity.
To begin with, the input tensor undergoes an embedding process before moving to the first stage. This first stage features a FasterNet block, where the input tensor $x$ is downsampled to $x_1$ with dimensions $c_1 \times \frac{h}{4} \times \frac{w}{4}$. The output from Stage 1 is then merged and passed to the second stage, which also includes a FasterNet block; here the tensor is further downscaled to $x_2$ with dimensions $c_2 \times \frac{h}{8} \times \frac{w}{8}$. In the third stage, another FasterNet block processes the tensor, reducing it to $x_3$ with dimensions $c_3 \times \frac{h}{16} \times \frac{w}{16}$. As in the first stage, the PConv and PWConv layers are used in combination to efficiently extract and downsample the features: the identity of certain channels is retained in the PConv layer, while the feature maps are refined in the PWConv layers. This hierarchical feature extraction ensures that the network progressively captures higher-level representations as the spatial dimensions shrink. The fourth stage involves a final FasterNet block that produces a tensor $x_4$ with dimensions $c_4 \times \frac{h}{32} \times \frac{w}{32}$; its structure is identical to that of the previous stages, with a $3 \times 3$ PConv (partial identity maintained) followed by $1 \times 1$ PWConv layers that further enhance feature representation. Beyond Stage 4, the tensor is subjected to global pooling, reducing the spatial dimensions to a fixed size regardless of the input dimensions. A $1 \times 1$ convolution layer then further processes the pooled features, and the final output is produced by a fully connected (FC) layer. Global pooling allows the model to process inputs of different sizes, while the $1 \times 1$ convolution and FC layer perform the final classification or regression.
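Building on the PConv sketch above, one FasterNet block might be expressed as follows; the hidden-width expansion factor of 2 and the exact normalization placement are assumptions rather than details given in the text.

```python
import torch
import torch.nn as nn

class FasterNetBlock(nn.Module):
    """One FasterNet block: PConv followed by two PWConv (1x1) layers with a
    residual connection, as described above. The hidden-width expansion of 2
    and the exact BN/ReLU placement are assumptions for illustration."""

    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        hidden = channels * expansion
        self.pconv = PConv(channels)          # PConv as sketched earlier
        self.pwconvs = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),   # PWConv 1
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),   # PWConv 2
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.pwconvs(self.pconv(x))  # residual connection
```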

2.4. ECA Attention Mechanism

The ECA mechanism was first proposed by Wang et al. (2020) [37]. When a feature map is input into the neck, the ECA attention mechanism first subjects the original feature map to global average pooling, transforming it from dimensions $W \times H \times C$ (width, height, channel) into a tensor of size $1 \times 1 \times C$. The resulting tensor then undergoes a one-dimensional convolution and is weighted through a sigmoid function. Finally, the weights are multiplied with the input feature map through a residual structure. The flowchart of the ECA attention mechanism is illustrated in Figure 4.
The ECA module calculates the kernel size of the one-dimensional convolution flexibly based on the number of channels. The equation used to calculate the kernel size is expressed as follows:
$$k = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{odd}$$
where $C$ is the number of channels, $\gamma$ and $b$ are hyperparameters, and $|\cdot|_{odd}$ denotes the nearest odd number.
ECA shares weight information among all channels through the same set of learnable parameters, achieved through a 1D convolution with kernel size $k$ that enables interaction between neighboring channels:
$$\omega = \sigma\left( \mathrm{C1D}_k(y) \right)$$
To effectively capture local cross-channel interaction, a band matrix $\omega_k$ is used to describe the relationship between channels. $\omega_k$ involves $k \times C$ parameters and avoids complete independence between different channel groups. For a channel $y_i$, only the exchange of information between $y_i$ and its $k$ neighbors is considered, and its weight $\omega_i$ is calculated as follows:
$$\omega_i = \sigma\left( \sum_{j=1}^{k} w^j y_i^j \right), \quad y_i^j \in \Omega_i^k$$
When $k = 3$, the ECA module achieves an effect comparable to that of the SE (Squeeze-and-Excitation) mechanism at a lower model complexity. Capturing cross-channel information exchange in this way therefore ensures both performance and efficiency for the model.
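For illustration, a minimal PyTorch sketch of the ECA module as described above follows; the kernel-size rule implements the equation above with the paper's default $\gamma = 2$ and $b = 1$, and everything else is a plain reading of Figure 4, not the authors' code.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention [37]: global average pooling, a 1D
    convolution of adaptive kernel size k, a sigmoid gate, then channel-wise
    rescaling of the input. gamma = 2 and b = 1 follow the paper's defaults."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1                 # force an odd kernel size
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (N, C, H, W) -> (N, C, 1, 1): one descriptor per channel
        y = x.mean(dim=(2, 3), keepdim=True)
        # Slide the 1D kernel over the channel dimension so neighbors interact.
        y = self.conv(y.squeeze(-1).transpose(1, 2)).transpose(1, 2).unsqueeze(-1)
        return x * self.sigmoid(y)                # recalibrate the input channels
```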

3. Methodology

Section 3.1 introduces the proposed FE-YOLO model architecture. Section 3.2 presents the proposed Faster_C2f module, explaining how it was developed by replacing the bottleneck in the C2f module with the existing FasterNet block to create a new structure. Section 3.3 discusses how FE-YOLO enhances model performance by incorporating the ECA attention mechanism introduced in Section 2.

3.1. Overview of the FE-YOLO Model

Figure 5 illustrates the structure of the FE-YOLO model. Based on the original architecture of YOLOv8, the C2f modules in the backbone are replaced with Faster_C2f and the ECA attention mechanism is incorporated into the neck section.
Compared to the original model, this improved model achieves higher precision with fewer parameters. In the backbone of the original model, many C2f modules are involved in feature extraction; each contains a residual module composed of repeated 2D convolution layers, significantly increasing the computational load and parameter count. In comparison, the PConv in Faster_C2f applies convolution to only part of the feature map, with the rest unchanged. When the bottleneck in C2f is replaced with the PConv-based FasterNet block, the complexity of the module is significantly reduced. In the original model, C2f modules are positioned between feature map upsampling and feature map concatenation during feature fusion in the neck. By introducing the ECA attention mechanism into the C2f modules in the neck, the model captures details from the feature maps better. ECA is a lightweight attention mechanism that hardly increases the computational load, even when it is added behind every C2f module in the neck. Extensive tests conducted on both the dataset created for this study and public datasets show that FE-YOLO effectively incorporates the attention mechanism and meets the requirements.

3.2. Faster_C2f Module

With the incorporation of FasterNet blocks, the C2f bottlenecks can be replaced to achieve comparable or even better feature representation with fewer parameters and a reduction in computational load. Moreover, when the C2f structures in the neck of YOLOv8 are replaced with FasterNet blocks, the overall computational efficiency is further enhanced without any compromise on performance. This strategic modification therefore improves the model's processing speed and resource utilization, enhancing the efficiency and performance of YOLOv8. As shown in Figure 6, the bottleneck in C2f is replaced with FasterNet blocks, yielding Faster_C2f.
There are a large number of C2f modules used in both the backbone and the neck of YOLOv8 for feature extraction and fusion. The primary computational load on the C2f module is concentrated in its bottleneck, which consists of two convolutional layers, each of which is followed by batch normalization and SiLU activation. These bottlenecks are arranged in series, with the output of each upper-level bottleneck concatenated with the input at the next level to promote feature fusion. The time efficiency of computation is significantly affected by the repeated residual modules composed of 2D convolutions in the C2f bottleneck. Additionally, the process of feature concatenation within C2f is also subject to these bottlenecks, which further increases the computational load.
To accelerate computation and reduce the parameter count, it is necessary to redesign the C2f bottleneck. C2f, YOLOv8's cross-stage partial bottleneck with two convolutions, is designed to enhance the efficiency of feature extraction in neural networks through partial residual connection, dense aggregation, and depthwise separable convolution, which collectively improve the learning ability of the model, reduce the vanishing gradient, and alleviate the computational load. However, despite these benefits, the bottlenecks in C2f still lead to high computational complexity and parameter count. In the Faster_C2f module, the C2f bottlenecks are replaced with FasterNet blocks, which are incorporated into the C2f structure through partial convolution (PConv) and pointwise convolution (PWConv). PConv performs convolution on only a subset of the input feature map channels, with the remaining channels unchanged, which eliminates redundant computation. This selective operation significantly reduces both the computational load and the number of parameters. With PWConv combined with PConv, the FasterNet block ensures the utilization of information from all channels, maintaining the model's high feature extraction performance.
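As an illustration, a sketch of Faster_C2f under the stated assumptions follows, reusing the FasterNetBlock sketch from Section 2.3 and mirroring the split/concat pattern of YOLOv8's C2f; the bare $1 \times 1$ convolutions stand in for YOLOv8's Conv blocks (convolution + BN + SiLU), so this is a sketch rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class Faster_C2f(nn.Module):
    """C2f with its bottlenecks swapped for FasterNet blocks, mirroring the
    split/concat pattern of YOLOv8's C2f. The bare 1x1 convolutions stand in
    for YOLOv8's Conv blocks (convolution + BN + SiLU); a sketch only."""

    def __init__(self, c_in: int, c_out: int, n: int = 1):
        super().__init__()
        self.c = c_out // 2                       # hidden width per branch
        self.cv1 = nn.Conv2d(c_in, 2 * self.c, 1, bias=False)
        self.blocks = nn.ModuleList(FasterNetBlock(self.c) for _ in range(n))
        self.cv2 = nn.Conv2d((2 + n) * self.c, c_out, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = list(self.cv1(x).chunk(2, dim=1))     # split into two branches
        y.extend(block(y[-1]) for block in self.blocks)  # chain blocks, keep every output
        return self.cv2(torch.cat(y, dim=1))      # fuse all intermediate maps
```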

3.3. Incorporation of Attention Mechanism into the Bottleneck Layer

To improve the performance of the model, experiments were conducted in which an attention mechanism was incorporated into the C2f modules within the neck network. The neck network acts as an intermediary between the backbone and the detection head and plays a crucial role in integrating the features extracted from various layers. Several attention mechanisms were evaluated and incorporated into the neck. The rationale for introducing the attention mechanism specifically after each C2f module in the neck lies in the critical role these modules play in feature fusion: to improve accuracy, fine details in the feature maps must be captured before they are passed to the detection head. Thus, incorporating an attention mechanism into each C2f module within the neck helps the model perform better in feature fusion.
The ECA mechanism, adopted for this purpose, possesses some advantages. ECA improves upon the traditional Squeeze-and-Excitation (SE) block by replacing the fully connected layer with a one-dimensional convolution of variable kernel size. This enables ECA to maintain channel-wise interaction without a significant increase in computational load or parameter count. By focusing on the most informative channels, ECA makes the model perform better in identifying important features while maintaining high computational efficiency.
The attention mechanism was incorporated into the neck rather than other parts of the network because of the pivotal role the neck plays in merging the features extracted at different stages of the backbone: it aggregates and refines feature representations before they are passed to the detection head, directly affecting the quality of the final output. By refining the feature maps in the neck, the detection head receives more detailed and representative inputs, improving the accuracy of detection. The effectiveness of this approach was evaluated through extensive experiments on the construction waste dataset, with various attention mechanisms applied at the same position within the neck. The results show that the ECA attention mechanism fits this purpose, significantly enhancing the accuracy of the model without a sharp rise in parameter count or computational overhead. Implementing ECA behind each C2f module in the neck improves the outcome of feature fusion and the overall performance of the model.
The ECA mechanism was chosen for its effective channel-wise feature recalibration with minimal computational overhead. The one-dimensional convolutional layer of ECA, with a variable kernel size, allows for a dynamic adjustment to its receptive field to capture the most salient features without incurring any computational cost associated with the fully connected layers used in traditional SE blocks. This characteristic is particularly beneficial for the neck network, where it is critical to maintain computational efficiency while improving the performance in feature representation.
In practice, the implementation of ECA in the neck involves introducing the mechanism after each C2f module. This enables ECA to recalibrate the feature maps emerging from each C2f module, focusing on enhancing the representation of important features while suppressing the less relevant ones. Through strategic positioning, it is ensured that the features fused in the neck are optimally weighted and refined before being passed to the detection head.
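Concretely, this placement can be sketched as a thin wrapper around each neck C2f module, assuming the ECA module sketched in Section 2.4; this is an illustration of the placement described above, not the authors' implementation.

```python
import torch
import torch.nn as nn

class C2fWithECA(nn.Module):
    """Wrap a neck C2f module so its fused output is recalibrated by ECA
    before reaching the detection head; an illustration of the placement
    described above, not the authors' exact code."""

    def __init__(self, c2f: nn.Module, channels: int):
        super().__init__()
        self.c2f = c2f
        self.eca = ECA(channels)   # ECA as sketched in Section 2.4

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.eca(self.c2f(x))
```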
This approach was validated through a series of experiments conducted on the construction waste dataset. The results show that the incorporation of ECA into the neck significantly improves the accuracy of detection. As the ECA mechanism proves effective in enhancing feature representation without any substantial increase in computational load or parameter count, it is considered to be suited to refining the process of feature fusion in the neck network. Thus, the overall model performance is improved.

4. Experiments and Analysis of Results

4.1. Overview of Dataset

To train and evaluate the proposed construction waste detection model, five representative types of construction waste were collected from the Recycling Center of Xiangyang City: small stones, concrete blocks, rebar, foam, and bricks, captured in a total of 550 images. The captured images have an initial resolution of 1920 × 1080 pixels in RGB. Sample images are shown in Figure 7.
The dataset involves diverse and complex scenes with waste scattered and overlapping. Of the images, 10% were taken as the test set, and the remaining 90% were divided into training and validation sets at a ratio of 9:1.
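A minimal sketch of this split follows; the directory layout and file names are assumptions, and only the proportions (10% test, then 90% split 9:1 into train/val) come from the text.

```python
import random
from pathlib import Path

# Assumed directory layout and file extension; only the proportions come from the text.
images = sorted(Path("cdw_dataset/images").glob("*.jpg"))
random.seed(0)
random.shuffle(images)

n_test = int(0.10 * len(images))             # 10% held out as the test set
n_val = int(0.10 * (len(images) - n_test))   # 1/10 of the remainder for validation
test = images[:n_test]
val = images[n_test:n_test + n_val]
train = images[n_test + n_val:]
print(len(train), len(val), len(test))       # roughly 446 / 49 / 55 for 550 images
```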

4.2. Experimental Setup

The experiments used a batch size of 16, 8 data-loading workers, and 100 epochs. The input images were resized to 640 × 640, and all other settings were kept at their defaults. The experimental configuration is detailed in Table 1.
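Assuming the standard Ultralytics training API (the paper does not show its training code), these settings could be expressed as follows; "fe-yolo.yaml" and "cdw.yaml" are hypothetical names for the modified model definition and the dataset configuration.

```python
from ultralytics import YOLO

# Hypothetical file names; the hyperparameters follow the text above.
model = YOLO("fe-yolo.yaml")
model.train(
    data="cdw.yaml",
    epochs=100,    # number of epochs
    batch=16,      # batch size
    workers=8,     # data-loading workers
    imgsz=640,     # 640 x 640 input images; all other settings left at defaults
)
```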

4.3. Evaluation Metrics

In this experiment, metrics like precision (P), recall (R), mean average precision (mAP), and computational complexity were used to evaluate the performance in construction waste detection.
Precision (P): This indicates how many of the predicted positive samples are correctly predicted, as shown in Equation (7).
$$P = \frac{TP}{TP + FP} \times 100\%$$
Recall (R): This indicates how many of the positive samples in the dataset are correctly predicted, as shown in Equation (8).
$$R = \frac{TP}{TP + FN} \times 100\%$$
In these equations, FP represents the number of false positives, TP represents the number of true positives, and FN represents the number of false negatives for each type of construction waste.
mAP stands for the mean average precision across all categories. In this study, mAP is calculated at an IoU threshold of 0.5, also known as mAP@0.5. A higher mAP indicates better overall detection performance. The equation used for the calculation is expressed as follows:
$$mAP = \frac{\sum_{i=1}^{C} AP_i}{C}$$
There are two criteria used to evaluate model lightweighting: the number of parameters (Params) and the amount of computation (FLOPs).
Params: the number of parameters in the model, calculated for a convolutional layer as:
$$Params = C_o \times \left( C_i \times K_w \times K_h + 1 \right)$$
FLOPs stands for the number of floating-point operations:
$$FLOPs = \left( C_i \times K_w \times K_h + (C_i \times K_w \times K_h - 1) + 1 \right) \times W \times H \times C_o$$
where $C_i$ and $C_o$ denote the input and output channels, $K_w$ and $K_h$ the kernel dimensions, and $W$ and $H$ the width and height of the output feature map.
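As a quick sanity check, a short sketch implementing the Params and FLOPs equations above for a single convolutional layer follows; the example layer (3 × 3 kernel, 64 to 128 channels, 80 × 80 output map) is hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Parameter count of one convolutional layer per the Params equation
    above (bias term included)."""
    return c_out * (c_in * k * k + 1)

def conv_flops(c_in: int, c_out: int, k: int, w: int, h: int) -> int:
    """FLOPs of one convolutional layer per the FLOPs equation above:
    multiplications, additions, and the bias add at every output position."""
    macs = c_in * k * k
    return (macs + (macs - 1) + 1) * w * h * c_out

# Hypothetical example layer: 3x3 kernel, 64 -> 128 channels, 80x80 output map.
print(conv_params(64, 128, 3))         # 73,856 parameters
print(conv_flops(64, 128, 3, 80, 80))  # 943,718,400 FLOPs (~0.94 GFLOPs)
```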

4.4. Results Analysis

4.4.1. Comparison of Different Lightweight Backbone Networks

To reduce model complexity and achieve further lightweighting, the backbone of the original YOLOv8 model was replaced with various lightweight models in this experiment. Four lightweight backbone networks (ShuffleNet, MobileNetv3, GhostNet, and FasterNet) were tested on the construction waste dataset. The test results are shown in Table 2.
Among the lightweight models under test, FasterNet achieved the highest average precision, accuracy, and recall with the lowest parameter count and computational complexity. It was therefore used to improve the C2f modules in YOLOv8.

4.4.2. Performance Comparison of Adding Faster_C2f Module in Different Positions

To reveal the impact of position, three different methods of replacement were tested by replacing the C2f modules in the backbone, the neck, and all parts of the YOLOv8n network. The test results are shown in Table 3.
Replacing all C2f modules led to a significant decline in accuracy, making that option less applicable; replacing the C2f modules in the neck reduced precision; and replacing the C2f modules in the backbone with Faster_C2f produced the best results, reducing both parameter count and computational complexity while maintaining precision.

4.4.3. Comparison of Different Attention Mechanisms

To further improve precision, various attention mechanisms like Shuffle Attention (SA), ECA, Global Attention Mechanism (GAM), and ResBlock + CBAM (ResCBAM) [38] (2024) were incorporated into the C2f modules in the neck. The results are shown in Table 4.
Both GAM and ResCBAM caused significant increases in parameter count and computational complexity while yielding only slight improvements in precision. The ECA attention mechanism struck the best balance, slightly improving mAP and the other metrics without any significant increase in computational overhead. Thus, ECA was chosen for the neck of FE-YOLO.

4.4.4. Ablation Studies

Ablation experiments were conducted to assess the coupling capability and robustness of the FE-YOLO model. Introducing only the ECA attention mechanism failed to simplify the model, while replacing C2f with Faster_C2f alone reduced the parameter count and computational complexity but lowered mAP@50 by 1.8%. The full FE-YOLO model improved mAP@50 by 3%, with the parameter count and computational complexity reduced by 12% and 13%, respectively. The results of the ablation experiments are shown in Table 5.

4.4.5. Comparison with Mainstream Detection Models

In order to evaluate the improved model against other mainstream detection models, experiments were conducted comparing FE-YOLO with models such as Faster R-CNN, SSD, and RT-DETR-resnet50. The experimental results are shown in Table 6.
This table provides a comprehensive comparison of the network models across five key performance parameters: mAP50, precision (P), recall (R), number of parameters (Params), and GFLOPs. FE-YOLO achieved the highest mAP50 at 92.70%, surpassing YOLOv8n at 89.80% and YOLOv7 at 89.40%. In terms of precision, YOLOv8n led at 90.00%, followed by FE-YOLO at 88.40%. Recall was highest for FE-YOLO at 98.00%, marginally better than the 97.00% of YOLOv8n. In terms of efficiency, FE-YOLO had the lowest parameter count (2.64 M) and GFLOPs (7.0), significantly outperforming models such as RT-DETR-resnet50 (42.90 M, 130.0 GFLOPs) and Faster R-CNN (137.07 M, 370.0 GFLOPs). Overall, FE-YOLO was the most efficient model, striking the best balance between accuracy, precision, and computational resource consumption.

4.5. Comparative Analysis of Training Process Models

In order to compare YOLOv8n with FE-YOLO during training, the training curves of the two models were plotted for precision, recall, mAP@50, and loss. Figure 8 summarizes the performance of the YOLOv8n and FE-YOLO models during the training process.
According to this figure, FE-YOLO operated effectively on both scattered and piled construction waste. When the construction waste was piled, the model detected it with an accuracy of about 0.93, a significant improvement over existing detection models. This illustrates the robustness and reliability of the proposed detection model, which are crucial when high accuracy and comprehensive detection are required. Recall measures how comprehensively the model detects the relevant instances in the data.
FE-YOLO outperformed YOLOv8 in detecting both scattered and piled construction waste, despite some inconsistencies across scenarios; notably, it significantly outperformed the baseline model, indicating its advantage in capturing the complete set of construction waste instances. mAP_0.5 evaluates the concordance between precision and recall at an IoU threshold of 0.5; the mAP_0.5 of FE-YOLO was significantly higher than that of YOLOv8 for both scattered and piled waste, indicating enhanced detection performance at this threshold. Despite the variability inherent in the training process, FE-YOLO progressively improved its performance in these complex environments. Given its outstanding accuracy and the significant improvement in mAP relative to YOLOv8, FE-YOLO is considered superior at accurately identifying and classifying construction waste in complex environments, especially where high accuracy and comprehensive detection are required. In summary, FE-YOLO showed greater competitiveness.

4.6. Demonstration of Outcome of Detection and Generalization Performance

In order to illustrate the detection results, the YOLOv8n and FE-YOLO models were each used to detect the construction waste in four photos from the test sets. Figure 9 shows the results.
The original model was clearly prone to missed and repeated detections when the objects were densely piled, while the improved FE-YOLO avoided these problems.
FE-YOLO was then tested on the public Construction and Demolition Waste dataset created by Demetriou et al. (2023) [39] to evaluate its performance. This dataset was intended to promote the development and benchmarking of detection models that locate and classify three common types of CDW: concrete, brick, and tile. It consists of samples collected from manually sorted piles of CDW obtained from a recycling facility in Cyprus. The 550 images have a resolution of 1920 × 1200 pixels. There are two test sets: test set 1 consists of samples scattered on a conveyor belt, while test set 2 consists of densely piled samples, which enhances the representativeness of the model evaluation.
As models can easily detect samples that overlap only slightly, the FE-YOLO model was tested on test set 2 to assess its performance under realistic conditions. The results are shown in Table 7 and Figure 10.

4.7. Discussion

These experiments were intended to enhance the efficiency and accuracy of the YOLOv8 model for construction waste detection. Among the lightweight backbone networks evaluated (ShuffleNet, MobileNetv3, GhostNet, and FasterNet), FasterNet proved the most effective due to its superior mAP50, precision, and recall alongside the minimal parameter count and computational complexity. The effect of integrating the Faster_C2f module was further examined at different positions within the YOLOv8 network: the backbone, the neck, and the entire network. Placing Faster_C2f in the backbone was found to be optimal for balancing precision, parameter count, and computational load. Additionally, various attention mechanisms (SA, ECA, GAM, and ResCBAM) were incorporated into the neck for enhanced precision. Ablation studies confirmed the importance of each component, and the comparison with other mainstream models verified the advantages of FE-YOLO. Figure 8, Figure 9 and Figure 10 show the training process and actual detection performance, highlighting the applicability and efficiency of the model.
However, this study has some limitations. A major one is the limited size and variety of the dataset, which may affect the generalizability of the model across diverse types of construction waste and different environments. Additionally, the comparison with other mainstream models is insufficient, which hampers understanding of how the model performs relative to other advanced detection systems in a wider context. Another limitation is the model's performance in detecting construction waste whose color closely resembles the background; in such cases, the detection bounding boxes may not completely match the actual size and shape of the waste. Although the model can mitigate the issue of repeatedly detecting elongated objects like rebar when they are piled with other waste, detecting tiny objects like pebbles and concrete fragments remains challenging, and the confidence of the detection labels declines. Moreover, there is still room for improvement in cases of severe clutter and overlap, particularly when the background is similar in color to the waste.

5. Conclusions and Prospect

To tackle the challenges in construction waste detection, the FE-YOLO model is proposed in this study. Under the YOLOv8 framework, the FE-YOLO model incorporates the Faster_C2f module and the ECA mechanism. The Faster_C2f module enhances the outcome of feature extraction while reducing model complexity, and the ECA mechanism reinforces the focus on significant image features. The model improves the mean average precision by 3% at IoU 0.5 (mAP@50), which reaches 92.7% on a custom dataset representing the key types of construction wastes in Xiangyang City. Also, parameter count and floating-point operations are reduced by 12% and 13%, respectively. Furthermore, the FE-YOLO model achieves an mAP of 66.60% on a publicly available dataset with severe occlusions, which illustrates its effectiveness and high generalization performance. These results highlight the advantage of this model in terms of the complexity and the time efficiency of detection, confirming its applicability and robustness in various scenarios.
To resolve the identified limitations, our future study will be conducted from the following perspectives. Firstly, the dataset will be expanded and diversified by increasing the sample size and the categories of construction wastes so as to enhance the model’s generalization performance. Secondly, the model will be improved in terms of accurately identifying construction wastes similar in color to the background. Thirdly, the accuracy of small object classification will be improved in the scenario of severe cluttering. Lastly, the scope of comparative analysis will be expanded by using a wider range of mainstream models and optimizing the model for practical deployment.
Our plan includes pruning the model and exporting it in ONNX format to facilitate deployment on embedded platforms. Cameras could then be used to collect data and control robotic arms on assembly lines, enabling efficient recycling of sorted, moderately sized construction waste. In this way, the model can be further improved in generalization performance, accuracy, and applicability for construction waste detection, contributing to automated waste management.

Author Contributions

Conceptualization, Y.Y., Y.L. and M.T.; methodology, Y.Y. and M.T.; software, Y.Y.; validation, Y.L. and M.T.; formal analysis, Y.Y. and Y.L.; investigation, Y.Y., Y.L., and M.T.; resources, Y.Y. and Y.L.; data curation, Y.Y. and M.T.; writing, original draft preparation, Y.Y., Y.L., and M.T.; writing, review and editing, Y.Y., Y.L., and M.T.; visualization, Y.Y.; supervision, Y.Y., Y.L., and M.T.; project administration, Y.L. and M.T.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Project of the Joint Fund of the Hubei Provincial Natural Science Foundation for Innovation and Development (2022CFD009), the Excellent Young and Middle-Aged Innovation Research Groups of the Hubei Provincial Department of Education (T2021019), the project "Blockchain operational architecture and benefit distribution mechanism design for scientific data sharing consortium" (B2021211), and the Key Project of the Hubei Provincial Department of Education (D20182601).

Data Availability Statement

The datasets are available upon reasonable request from the corresponding author.

Acknowledgments

Thanks to the Xiangyang Construction Waste Resource Disposal Center, which provided the CDW materials used to build the dataset.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pereira, V.M.; Baldusco, R.; Silva, P.B.; Quarcioni, V.A.; Motta, R.S.; Suzuki, S.; Angulo, S.C. Thermoactivated cement from construction and demolition waste for pavement base stabilization: A case study in Brazil. Waste Manag. Res. 2024. [Google Scholar] [CrossRef] [PubMed]
  2. Khan, Z.A.; Balunaini, U.; Costa, S. Environmental feasibility and implications in using recycled construction and demolition waste aggregates in road construction based on leaching and life cycle assessment–A state-of-the-art review. Clean. Mater. 2024, 12, 100239. [Google Scholar] [CrossRef]
  3. Pereira, V.M.; Baldusco, R.; Nobre, T.; Quarcioni, V.A.; Coelho, A.C.V.; Angulo, S.C. High activity pozzolan obtained from selection of excavation soils in a Construction and Demolition Waste landfill. J. Build. Eng. 2024, 84, 108494. [Google Scholar] [CrossRef]
  4. Liu, J.; Li, L.; Zhang, H.; Rong, H.; Wang, S.; Zhao, J. Construction waste classification detection based on improved YOLOv7 model. J. Environ. Eng. 2024, 18, 270–279. [Google Scholar]
  5. Yue, X.; Li, J.; Hou, Y.; Lin, C. CenterNet-based waste classification detection method. Ind. Control Comput. 2020, 33, 78–79+82. [Google Scholar]
  6. Xia, J.; Xu, Z.; Tan, L. Application of lightweight network LW-GCNet in waste classification. Environ. Eng. 2023, 41, 173–180. [Google Scholar]
  7. Zhao, S.; Liu, Z.; Zheng, A.; Gao, Y. An improved real-time SSD garbage classification and detection method based on MobileNetV2 and IFPN. Comput. Appl. 2022, 42, 106–111. [Google Scholar]
  8. Zhang, Q.; Zhang, X.; Mu, X.; Wang, Z.; Tian, R.; Wang, X.; Liu, X. Recyclable waste image recognition based on deep Learning. Resour. Conserv. Recycl. 2021, 171, 105636. [Google Scholar] [CrossRef]
  9. Ming, G.; Yuhan, C.; Zehui, Z.; Yu, F.; Weiguo, F. A garbage image classification model based on novel spatial attention mechanism and migration learning. Syst. Eng. Theory Pract. 2021, 41, 498–512. [Google Scholar]
  10. Ma, W.; Yu, J.; Wang, X.; Chen, J. A spam detection and classification method based on improved Faster R-CNN. Comput. Eng. 2021, 47, 294–300. [Google Scholar]
  11. Lin, K.; Zhou, T.; Gao, X.; Li, Z.; Duan, H.; Wu, H.; Lu, G.; Zhao, Y. Deep convolutional neural networks for construction and demolition waste classification: VGGNet structures, cyclical learning rate, and knowledge transfer. J. Environ. Manag. 2022, 318, 115501. [Google Scholar] [CrossRef]
  12. Zhang, R.; Ning, Q.; Lei, Y.; Chen, B. Domestic Waste Detection Based on Improved Mask R-CNN. Comput. Eng. Sci. 2022, 44, 2003–2009. [Google Scholar]
  13. Xing, J.; Xie, D.; Yang, R.; Zhang, X.; Sun, W.; Wu, S. A lightweight detection method for farmland waste based on YOLOv5s. J. Agric. Eng. 2022, 38, 153–161. [Google Scholar]
  14. Qiu, L.; Xiong, Z.; Wang, X.; Liu, K.; Li, Y.; Chen, G.; Han, X.; Cui, S. Ethseg: An amodel instance segmentation network and a real-world dataset for x-ray waste inspection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2283–2292. [Google Scholar]
  15. Chen, Z.; Yang, J.; Feng, Z.; Zhu, H. RailFOD23: A dataset for foreign object detection on railroad transmission lines. Sci. Data 2024, 11, 72. [Google Scholar] [CrossRef] [PubMed]
  16. Wu, Y.; Zhao, S.; Xing, Z.; Wei, Z.; Li, Y.; Li, Y. Detection of foreign objects intrusion into transmission lines using diverse generation model. IEEE Trans. Power Deliv. 2023, 38, 3551–3560. [Google Scholar] [CrossRef]
  17. Nežerka, V.; Zbíral, T.; Trejbal, J. Machine-learning-assisted classification of construction and demolition waste fragments using computer vision: Convolution versus extraction of selected features. Expert Syst. Appl. 2024, 238, 121568. [Google Scholar] [CrossRef]
  18. Chen, Z.; Yang, J.; Chen, L.; Jiao, H. Garbage classification system based on improved ShuffleNet v2. Resour. Conserv. Recycl. 2022, 178, 106090. [Google Scholar] [CrossRef]
  19. Xu, W.; Jiang, Q.; Liu, G. Research on intelligent recognition of construction waste based on machine vision and deep learning. Electron. Devices 2022, 45, 1489–1496. [Google Scholar]
  20. Yang, S. Research on Construction Waste Detection Methods under Small Scale Dataset; Southwest University of Science and Technology: Mianyang, China, 2023. [Google Scholar] [CrossRef]
  21. Zhang, S.; Chen, Y.; Yang, Z.; Gong, H. Computer vision based two-stage waste recognition-retrieval model for waste classification. Resour. Conserv. Recycl. 2021, 169, 105543. [Google Scholar] [CrossRef]
  22. Nowakowski, P.; Pamuła, T. Application of deep learning object classifier to improve e-waste collection planning. Waste Manag. 2020, 109, 1–9. [Google Scholar] [CrossRef]
  23. Li, Y.; Zhang, X. Multi-modal deep learning networks for RGB-D pavement waste detection and recognition. Waste Manag. 2024, 177, 125–134. [Google Scholar] [CrossRef]
  24. Lin, K.; Zhao, Y.; Zhou, T.; Gao, X.; Zhang, C.; Huang, B.; Shi, Q. Applying machine learning to fine classify construction and demolition waste based on deep residual network and knowledge transfer. Environ. Dev. Sustain. 2023, 25, 8819–8836. [Google Scholar] [CrossRef]
  25. Hu, K.; Shen, J.; Ling, Z.; Wang, J. Comparative study of multiple labeling forms for automatic identification of urban construction waste in high-resolution remote sensing images. Autom. Appl. 2024, 65, 47–51+54. [Google Scholar] [CrossRef]
  26. Demetriou, D.; Mavromatidis, P.; Petrou, M.F.; Nicolaides, D. CODD: A benchmark dataset for the automated sorting of construction and demolition waste. Waste Manag. 2024, 178, 35–45. [Google Scholar] [CrossRef]
  27. Wu, X.; Kroell, N.; Greiff, K. Deep learning-based instance segmentation on 3D laser triangulation data for inline monitoring of particle size distributions in construction and demolition waste recycling. Resour. Conserv. Recycl. 2024, 205, 107541. [Google Scholar] [CrossRef]
  28. Kang, Z.; Yang, J.; Guo, H. Design of automatic garbage classification system based on machine vision. J. Zhejiang Univ. 2020, 54, 1272–1280+1307. [Google Scholar]
  29. Lu, W.; Chen, J.; Xue, F. Using computer vision to recognize composition of construction waste mixtures: A semantic segmentation approach. Resour. Conserv. Recycl. 2022, 178, 106022. [Google Scholar] [CrossRef]
  30. Radica, F.; Iezzi, G.; Trotta, O.; Bonifazi, G.; Serranti, S.; de Brito, J. Characterization of CDW types by NIR spectroscopy: Towards an automatic selection of recycled aggregates. J. Build. Eng. 2024, 88, 109005. [Google Scholar] [CrossRef]
  31. Kronenwett, F.; Maier, G.; Leiss, N.; Gruna, R.; Thome, V.; Längle, T. Sensor-based characterization of construction and demolition waste at high occupancy densities using synthetic training data and deep learning. Waste Manag. Res. 2024. [Google Scholar] [CrossRef]
  32. Mei, H.; Yang, X.; Wang, Y.; Liu, Y.; He, S.; Zhang, Q.; Wei, X.; Lau, R.W. Don’t hit me! glass detection in real-world scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3687–3696. [Google Scholar]
  33. Sirimewan, D.; Bazli, M.; Raman, S.; Mohandes, S.R.; Kineber, A.F.; Arashpour, M. Deep learning-based models for environmental management: Recognizing construction, renovation, and demolition waste in-the-wild. J. Environ. Manag. 2024, 351, 119908. [Google Scholar] [CrossRef]
  34. Yong, Q.; Wu, H.; Wang, J.; Chen, R.; Yu, B.; Zuo, J.; Du, L. Automatic identification of illegal construction and demolition waste landfills: A computer vision approach. Waste Manag. 2023, 172, 267–277. [Google Scholar] [CrossRef] [PubMed]
  35. Hussain, M. YOLOv1 to v8: Unveiling Each Variant—A Comprehensive Review of YOLO. IEEE Access 2024, 12, 42816–42833. [Google Scholar] [CrossRef]
  36. Chen, J.; Kao, S.H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H. Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks. arXiv 2023, arXiv:2303.03667. [Google Scholar]
  37. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  38. Chien, C.-T.; Ju, R.-Y.; Chou, K.-Y.; Lin, C.-S.; Chiang, J.-S. YOLOv8-AM: YOLOv8 with Attention Mechanisms for Pediatric Wrist Fracture Detection. arXiv 2024, arXiv:2402.09329. [Google Scholar]
  39. Demetriou, D.; Mavromatidis, P.; Robert, P.M.; Papadopoulos, H.; Petrou, M.F.; Nicolaides, D. Real-time construction demolition waste detection using state-of-the-art deep learning methods; single–stage vs. two-stage detectors. Waste Manag. 2023, 167, 194–203. [Google Scholar] [CrossRef]
Figure 1. Structure of YOLOv8.
Figure 2. Procedure of TConv in comparison with traditional conv.
Figure 3. Structure of FasterNet.
Figure 4. Structure of ECA.
Figure 5. Structure of FE-YOLO.
Figure 6. Structure of Faster_C2f.
Figure 7. Sample dataset.
Figure 8. Training curve comparison between FE-YOLO and YOLOv8n.
Figure 9. Comparison of the actual performance. (a) Original image; (b) result of detection by YOLOv8n; (c) result of detection by FE-YOLO.
Figure 10. Comparison of the actual performance.
Table 1. Experimental environment for research on construction waste detection.

Item                     | Configuration
------------------------ | ------------------------------
Operating System         | Windows 11
CPU                      | AMD Ryzen 9 7945HX
GPU                      | NVIDIA GeForce RTX 4060 Laptop
Memory                   | 8 GB
Acceleration Environment | CUDA v11.8
PyTorch Version          | 2.3.0
Python Version           | 3.9.19
Table 2. Comparison between different lightweight backbone networks.

Model        | mAP50/% | Precision P/% | Recall R/% | Params/M | GFLOPs
------------ | ------- | ------------- | ---------- | -------- | ------
ShuffleNetv2 | 81.90   | 84.50         | 92.00      | 6.40     | 16.7
MobileNetv3  | 81.50   | 86.70         | 94.00      | 2.30     | 5.8
GhostNet     | 83.20   | 85.40         | 95.00      | 2.30     | 6.3
FasterNet    | 84.70   | 92.80         | 95.00      | 1.70     | 5.1
Table 3. Effect of Faster_C2f added at different locations.

Location | mAP50/% | Precision P/% | Recall R/% | Params/M | GFLOPs
-------- | ------- | ------------- | ---------- | -------- | ------
Backbone | 88.00   | 92.80         | 96.00      | 2.65     | 7.1
Neck     | 87.50   | 81.20         | 96.00      | 2.66     | 7.5
All      | 84.80   | 88.10         | 97.00      | 2.30     | 6.4
Table 4. Comparison between different attention mechanisms.

Model           | mAP50/% | Precision P/% | Recall R/% | Params/M | GFLOPs
--------------- | ------- | ------------- | ---------- | -------- | ------
YOLOv8n         | 89.80   | 78.90         | 97.00      | 3.01     | 8.0
SA-YOLOv8       | 92.20   | 84.80         | 97.00      | 3.01     | 8.0
ECA-YOLOv8      | 92.30   | 92.70         | 98.00      | 3.01     | 8.0
GAM-YOLOv8      | 90.40   | 90.70         | 98.00      | 3.68     | 9.5
ResCBAM-YOLOv8  | 92.30   | 83.60         | 97.00      | 4.24     | 10.6
Table 5. Ablation experiments.

Model   | ECA | Faster_C2f | mAP50/% | Precision P/% | Recall R/% | Params/M | GFLOPs
------- | --- | ---------- | ------- | ------------- | ---------- | -------- | ------
YOLOv8n |     |            | 89.80   | 78.90         | 97.00      | 3.01     | 8.0
E-YOLO  | ✓   |            | 92.20   | 84.80         | 97.00      | 3.01     | 8.0
F-YOLO  |     | ✓          | 88.00   | 92.80         | 96.00      | 2.65     | 7.1
FE-YOLO | ✓   | ✓          | 92.70   | 88.40         | 98.00      | 2.64     | 7.0
Table 6. Performance comparison between different network models.

Model            | mAP50/% | Precision P/% | Recall R/% | Params/M | GFLOPs
---------------- | ------- | ------------- | ---------- | -------- | ------
Faster R-CNN     | 87.41   | 74.61         | 93.87      | 137.07   | 370.0
SSD              | 83.28   | 96.75         | 68.68      | 26.28    | 63.0
RT-DETR-resnet50 | 89.50   | 86.10         | 96.10      | 42.90    | 130.0
Swin-S           | 88.67   | 87.23         | 91.67      | 50.0     | 8.7
YOLOX-s          | 83.27   | 90.60         | 72.36      | 42.90    | 156.0
YOLOv5s          | 88.30   | 87.70         | 98.00      | 7.0      | 16.0
YOLOv7           | 89.40   | 89.10         | 73.50      | 36.50    | 103.2
YOLOv8n          | 89.80   | 78.90         | 97.00      | 3.01     | 8.0
FE-YOLO          | 92.70   | 88.40         | 98.00      | 2.64     | 7.0
Table 7. Test results on the public dataset.

Class    | mAP50/% | mAP50-95/% | Precision P/% | Recall R/%
-------- | ------- | ---------- | ------------- | ----------
All      | 66.60   | 31.40      | 65.40         | 64.20
Brick    | 69.90   | 31.30      | 68.50         | 69.80
Tile     | 75.30   | 39.40      | 71.30         | 73.80
Concrete | 54.60   | 23.30      | 56.40         | 49.00
