SiM-YOLO: A Wood Surface Defect Detection Method Based on the Improved YOLOv8

Xi, Honglei; Wang, Rijun; Liang, Fulong; Chen, Yesheng; Zhang, Guanghao; Wang, Bo

doi:10.3390/coatings14081001


by Honglei Xi ¹, Rijun Wang ²,³,*, Fulong Liang ²,³, Yesheng Chen ²,³, Guanghao Zhang ²,³ and Bo Wang ³,⁴,*

¹ Department of Information Engineering, Shanxi Institute of Mechanical & Electrical Engineering, Changzhi 046011, China

² School of Teachers College for Vocational and Technical Education, Guangxi Normal University, Guilin 541004, China

³ Key Laboratory of AI and Information Processing, Hechi University, Hechi 546300, China

⁴ School of Artificial Intelligence and Smart Manufacturing, Hechi University, Yizhou 546300, China

* Authors to whom correspondence should be addressed.

Coatings 2024, 14(8), 1001; https://doi.org/10.3390/coatings14081001

Submission received: 15 July 2024 / Revised: 2 August 2024 / Accepted: 5 August 2024 / Published: 7 August 2024

(This article belongs to the Section Surface Characterization, Deposition and Modification)


Abstract:

Wood surface defect detection is a challenging task due to the complexity and variability of defect types. To address these challenges, this paper introduces a novel deep learning approach named SiM-YOLO, which is built upon the YOLOv8 object detection framework. A fine-grained convolutional structure, SPD-Conv, is introduced to preserve detailed defect information during feature extraction, enabling the model to capture the subtle variations and complex details of wood surface defects. In the feature fusion stage, a SiAFF-PANet-based wood defect feature fusion module is designed to improve the model’s ability to focus on local contextual information and enhance defect localization. For classification and regression tasks, the multi-attention detection head (MADH) is employed to capture cross-channel information and accurate spatial location information of defects. In addition, MPDIoU is employed to optimize the loss function of the model and reduce missed detections caused by overlapping defects. The experimental results show that SiM-YOLO achieves superior performance compared to state-of-the-art YOLO algorithms, with a 9.3% improvement in mAP over YOLOX and a 4.3% improvement in mAP over YOLOv8. The Grad-CAM visualization further illustrates that SiM-YOLO provides more accurate defect localization and effectively reduces misdetection and omission issues. This study highlights the effectiveness of SiM-YOLO for wood surface defect detection and offers valuable insights for future research and practical applications in quality control.

Keywords:

wood surface defect; YOLOv8; fine-grained convolution; simplified iterative Attentional Feature Fusion-Path Aggregation Network; minimum point distance intersection over union

1. Introduction

Wood is an environmentally friendly and renewable raw material with high strength and good elasticity. According to statistics, the European Union produced approximately 450 million cubic meters of roundwood in 2020, and construction accounts for around 40% of the total timber use [1]. Wood is extensively utilized across various industries in both industrial production and daily life [2]. With the continuous development of the social economy, the demand for wood is steadily increasing, while the growth cycle of wood remains relatively long. In addition to vigorously promoting afforestation, it is imperative to enhance the comprehensive utilization rate of wood and reduce the waste of wood resources [3]. This represents a prevailing trend in the development of the wood industry. Therefore, achieving sustainable utilization and development of wood resources has become a pressing issue. In the wood processing industry, defect detection on wood surfaces is a primary measure to improve the comprehensive utilization rate of wood and minimize waste [4].

Wood surface defects are prevalent in all types of wood and can be broadly classified into two main categories: natural defects and man-made defects. Natural defects arise during the growth process due to weather, environmental, and biological factors, leading to uneven growth conditions, foreign matter intrusion, or abnormal growth patterns [5]. These defects include knots, joints, tree inclusions, decay, and insect damage. Man-made defects occur during logging, processing, and transportation due to improper handling or external forces, resulting in surface cracks, impact damage, wear, and deformation. These defects can significantly reduce the quality and strength of the wood. Additionally, they affect not only the aesthetic quality but also the structural performance of the wood, thereby diminishing both its utility and economic value [6].

Detecting surface defects in wood has become an essential step in wood processing, providing the necessary conditions for quality classification and efficient utilization [7]. Rapid and effective detection of the morphology, size, and precise location of wood surface defects enables rational processing of the wood. By mitigating the adverse effects of defects, this approach ensures the quality of wood products, including consistency, structural integrity, mechanical properties, and aesthetic appearance. Consequently, it enhances the comprehensive utilization rate of wood, thereby increasing its utility and economic value.

To detect wood surface defects, it is essential to understand the common types of defects and their characteristics. Depending on the formation process of wood defects, they are usually classified into three categories: growth defects, biohazardous defects, and processing defects. Growth defects are natural defects that occur during the growth of the tree. Biohazardous defects are defects that occur when the tree is affected by biological factors (e.g., insects and fungi). Processing defects are defects that occur as a result of human factors, such as improper felling or drying treatments. The literature [8] establishes a large wood surface defect dataset for automated wood processing, which covers the most common surface defects. The wood surface defects listed in the study include the following: Live_Knot (commonly found in Pine, Oak, Cherry, Teak, and Birch trees), Marrow or pitch (commonly found in Pine and Oak trees), Resin (commonly found in Pine trees), Dead_Knot (commonly found in Pine, Oak, Cherry, Teak, Walnut, and Birch trees), Knot_with_crack (commonly found in Pine, Oak, Maple, Cherry, Walnut, and Birch trees), Knot_missing (commonly found in Pine trees), Crack (commonly found in Pine, Oak, Maple, Cherry, Walnut, Mahogany, and Birch trees), Quartzity (commonly found in Oak, Maple, Cherry, Teak, Walnut, and Mahogany trees), Overgrown (commonly found in Pine and Maple trees), and Blue_stain (commonly found in Pine, Oak, Maple, Cherry, and Birch trees). Detecting wood surface defects is a typical object detection problem. Many instances of wood surface defects occupy a very small portion of the surface, making it a classic small object detection problem.

Deep learning-based algorithms have proven effective for many surface defect detection tasks and have shown superiority over manually designed feature detection methods [9,10,11]. However, for the detection of such a wide variety of wood surface defects, utilizing deep learning methods still presents numerous challenges, primarily manifested in the following areas:

(1): High variability within the same defect type: Even within the same type of defect, there can be significant differences in shape and size. For example, cracks can vary in length, width, and depth, and scars can exhibit different shapes and sizes. Furthermore, the location and distribution of the same type of defect can also differ. For instance, insect damage might be concentrated in a specific area or dispersed across the entire wood surface, while cracks can extend in different directions. Additionally, the severity of the same type of defect can vary. Some cracks might be shallow or localized, whereas others could be deep or affect the overall structural integrity of the wood.
(2): Inter-class similarity: Different categories of wood surface defects may exhibit similar visual features, making them difficult to distinguish. For instance, the visual similarity between insect boreholes and wood grain patterns might hinder their differentiation. Moreover, different categories of defects may share similar patterns in terms of location and distribution on the wood surface, leading to some defects having similar spatial layouts, thus appearing alike. Variability in shape and size among different defect categories may also contribute to their similarity. For example, cracks and knots may share certain similarities in terms of size and shape.
(3): Inter-class overlap: Different categories of wood surface defects may exhibit overlap in terms of morphology, size, location, or appearance features in the feature space. This overlap makes it challenging for detection algorithms to capture the diversity and subtle differences of such complex defects.

The diversity within the same category of wood surface defect increases the complexity of defect detection, thereby posing challenges for deep learning models to accurately classify and detect such defects. Different types of wood defects may exhibit similar characteristics in appearance, shape, location, and distribution, necessitating deep learning models to possess the ability to discern subtle differences and features to ensure accurate classification of different defect categories. The inter-class overlap of defects makes it difficult for deep learning models to determine the correct category attribution, especially when the features of defects are spatially similar or overlapping.

In summary, this paper addresses the challenges of wood surface defect detection by proposing a targeted deep learning approach called SiM-YOLO based on the YOLOv8 object detection algorithm. Taking into account the complexity of wood surface defect detection, this method aims to effectively retain finer-grained defect information during feature extraction by employing a granular convolution module and designing a space-to-depth convolution (SPD-Conv) structure. This allows for comprehensive learning of wood surface defect features and enables the model to discern subtle differences and features. In the feature fusion stage, a SiAFF-PANet-based wood defect feature fusion module is designed to integrate features with semantic inconsistency at different scales of wood surface defects. This approach facilitates the model to focus more on local context information, enhancing its perception of the local details of wood surface defects. For classification and regression, a multi-attention detection head (MADH) is constructed to help the model capture cross-channel information while obtaining accurate spatial position information. Additionally, the optimization of the model’s loss function helps mitigate the issue of missed detections caused by overlapping wood surface defects. By making the improvements described above, the paper addresses the issues of high variability within the same defect type, inter-class similarity, and inter-class overlap which affect the accuracy of wood surface defect detection. The main contributions of this paper are as follows:

(1): A novel wood surface defect detection method is proposed to address the challenges posed by the complexity of wood surface defect detection.
(2): An SPD-Conv module is designed to reduce the loss of fine-grained defect information and improve the accuracy of feature extraction during the feature extraction process.
(3): A wood defect feature fusion module based on a SiAFF-PANet is devised to enhance the model’s capability to perceive local details of wood defects.
(4): The MADH is constructed and the loss function is optimized to acquire more precise spatial position information, thereby mitigating missed detections caused by overlapping wood surface defects.

The remaining part of the paper is organized as follows: Section 2 presents the related works on wood surface defect detection. Section 3 provides a detailed design process and structure of SiM-YOLO. The analysis of SiM-YOLO and its benchmarking with other algorithms are presented in Section 4. Section 5 concludes the paper and outlines future directions for this work.

2. Related Works

The techniques for detecting surface defects in wood continue to change and evolve with increasing technological advances. Initially, wood surface defects were primarily detected manually. This method is highly influenced by subjective factors, leading to frequent missed detections and low efficiency [12]. Clearly, this approach is incompatible with the demands of today’s large-scale wood processing industry and limits the quality of wood products. To address the issues of manual wood surface defect detection, besides the partial applications of non-destructive testing technologies such as laser scanning [13,14], ultrasonic testing [15,16], and CT scanning [17,18], research has predominantly focused on two main directions: methods combining machine vision with traditional image processing techniques, and methods based on deep learning algorithms for wood defect detection.

The former utilizes image processing techniques to analyze and process images of wood surfaces, encompassing three main steps: image preprocessing, feature extraction, and defect recognition [19]. Manually designed feature extraction tools are used to describe defect information, and the feature vectors are then input into a classifier to determine the defect category. The processes of feature extraction, dimensionality reduction, and classification are complex, which makes it difficult to handle the complexity and diversity of wood surface defects and to ensure the accuracy of defect detection.

Given the advancements in deep learning technology within the field of image detection, particularly in object recognition and detection, an increasing number of researchers have started to employ deep learning techniques for wood defect detection, gradually making it a mainstream method in this field [20,21,22,23]. These methods utilize convolutional neural networks (CNNs) to automatically extract defect features, effectively avoiding the complex manual feature extraction process. By constructing and training deep learning models, these techniques automatically learn the characteristics of wood defects, integrate richer defect features, and complete defect classification [24].

Currently, such methods applied to wood defect detection can be categorized into two main types: single-stage and two-stage, similar to most existing object detection algorithms. The former treats the object detection problem as a single regression problem, directly outputting the category and location information of the target. Single-stage detection methods typically employ anchor-based detection techniques. Research utilizing single-stage detection for wood defects has yielded promising results, especially the You Only Look Once (YOLO) series [25,26,27,28,29]. The latter divides the object detection problem into two steps: first generating candidate boxes, and then performing classification and regression on these boxes. Two-stage detection methods generally employ region-based detection techniques. Studies utilizing two-stage detection methods for wood defects have also achieved notable success, such as Faster R-CNN [30,31], and Mask R-CNN [32,33,34].

3. Methodology and Design

3.1. SiM-YOLO Overview

In response to the numerous challenges faced in wood surface defect detection, this paper proposes the SiM-YOLO model based on the YOLOv8 architecture, with targeted improvements in the feature extraction stage, feature fusion stage, and classification and regression stage, as shown in Figure 1.

To address the issue of feature extraction for wood surface defects, which often exhibit significant intra-class variability and inter-class similarity, a wood defect feature extraction module based on SPD-Conv is adopted. This module reduces the loss of fine-grained defect information and achieves the goal of learning detailed features, preventing the loss of features both within the same defect class and between different defect classes.

To tackle the problem of feature fusion caused by scale and semantic inconsistencies of wood surface defects, a wood defect feature fusion module based on SiAFF-PANet is employed. This module integrates features with inconsistent semantics across different scales of wood surface defects, enabling the model to focus more on local contextual information and enhancing the model’s ability to perceive local details of wood surface defects.

To address bounding box distortion, missed detections, and inaccurate detections caused by overlapping wood defects, the MADH is used as the detection head. This mechanism captures cross-channel information while obtaining precise spatial location information, thereby improving the model’s classification capability. The Minimum Point Distance Intersection over Union (MPDIoU) loss function is used to replace the Complete Intersection over Union (CIoU) loss function in YOLOv8, effectively addressing the problem of bounding box distortion due to overlapping wood defects and reducing the occurrence of missed detections.

3.2. Wood Defect Feature Extraction Based on SPD-Conv

In feature extraction for target defects, YOLOv8 employs a conventional CNN structure. However, this design has inherent flaws, specifically the use of convolutional strides or pooling layers, which can lead to the loss of finely grained information or inefficient feature learning [35]. Consequently, the YOLOv8 model suffers from finely grained information loss when handling images with significant intra-class variability and inter-class similarity, such as wood surface defects, thereby failing to fully capture the characteristics of these defects.

To address this issue, SPD-Conv can replace the strided convolution and pooling layers in existing CNN architectures. SPD-Conv comprises a space-to-depth (SPD) layer followed by a non-strided convolution (Conv) layer. The SPD layer moves information from the spatial dimensions of the input feature map into the channel dimension while preserving intra-channel information: neighboring pixels or features are mapped to separate channels, which reduces the spatial dimensions while increasing the channel dimension. The Conv layer, which follows the SPD layer, performs a standard convolution with a stride of 1. Unlike strided convolution, it does not skip positions on the feature map but instead operates on every pixel or feature. This approach mitigates the potential over-downsampling issue in the SPD layer and retains more finely grained information about the wood surface defects.

By combining the SPD layer and the Conv layer, SPD-Conv reduces the spatial dimensions without losing information, while preserving intra-channel details. This helps reduce the loss of finely grained information and improves the accuracy of feature extraction for wood surface defects. Therefore, SiM-YOLO introduces SPD-Conv to replace the stride-2 convolution downsampling modules in YOLOv8, achieving finely grained feature learning and preventing the loss of small features. The convolution downsampling module is shown in Figure 2, and the SPD-Conv (scale = 2) structure is shown in Figure 3. When the intermediate feature map X has dimensions S × S × C1, the downsampling parameter scale can be specified to slice out a series of sub-feature maps. When scale = 2, the feature map X is sliced as shown below:

f_{0,0} = X[0:S:scale, 0:S:scale], \quad f_{0,1} = X[0:S:scale, 1:S:scale]

(1)

f_{1,0} = X[1:S:scale, 0:S:scale], \quad f_{1,1} = X[1:S:scale, 1:S:scale]

(2)
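The slicing in Equations (1) and (2) can be illustrated with a minimal NumPy sketch of the SPD layer alone. The function name `space_to_depth`, the channel-last layout, and the order in which the sub-feature maps are concatenated are assumptions made for illustration; the non-strided convolution that follows the SPD layer is not included.

```python
import numpy as np


def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Rearrange an (S, S, C1) feature map into (S/scale, S/scale, scale**2 * C1).

    Hypothetical helper: the concatenation order f00, f01, f10, f11 is an
    assumption, not something the text specifies.
    """
    height, width, _ = x.shape
    if height % scale or width % scale:
        raise ValueError("spatial dimensions must be divisible by scale")
    # f_{i,j} = X[i:S:scale, j:S:scale], as in Equations (1) and (2)
    sub_maps = [x[i::scale, j::scale, :]
                for i in range(scale)
                for j in range(scale)]
    # Stack the sub-feature maps along the channel axis: no value is discarded.
    return np.concatenate(sub_maps, axis=-1)
```

For a 4 × 4 × 3 input and scale = 2, the result has shape 2 × 2 × 12 and contains every input value exactly once, which is the sense in which the spatial size is halved without discarding information.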

3.3. Wood Defect Feature Fusion Based on SiAFF-PANet

The neck of YOLOv8 uses the PANet structure, a bidirectional path aggregation network that makes it easier to pass target information from the bottom layers to the top layers, enhancing target information transmission. To a certain extent, it alleviates the loss of target feature information caused by the network being too deep. However, YOLOv8 fuses target features by simple concatenation (Concat), which cannot cope with the feature fusion problem caused by the inconsistent scale and semantics of wood surface defects and is not tailored to wood surface defects as a specific object. This limits the model’s performance in detecting wood surface defects.

To address this problem, inspired by the literature [36], we utilized the attention mechanism to improve the traditional target feature fusion method. Instead of using simple concatenation, we introduced the multi-scale channel attention module (MS-CAM), which aggregates multi-scale target features within the attention module to mitigate issues caused by scale variations and small- to medium-sized objects. The MS-CAM enhances the local context by adding local context branches to the global context and employs point-wise convolution as the aggregator for these local branches. This approach dilutes the global average pooling operation, ensuring that the aggregation focuses more on local context information. Consequently, the final weight vector is reweighted to balance both local and global context information. Moreover, to reduce computational complexity, we eliminated the channel reduction strategy in the MS-CAM, creating a simplified version called SMS-CAM. Figure 4a illustrates the structure of the MS-CAM, where r denotes the channel reduction ratio. Figure 4b depicts the structure of the simplified SMS-CAM after removing the channel reduction.

Assuming that the output of the SMS-CAM module through the global branch is G(X_in) and the output through the local branch is L(X_in), the output X of the first spatial path is pooled through global averaging to obtain g(X), which is computed as follows:

g(X) = \frac{1}{H \times W} \sum_{i = 1}^{H} \sum_{j = 1}^{W} X_{[:, i, j]}

(3)

where H × W denotes the dimensional size of the feature map, and [:, i, j] denotes the location of all batches, which slices at height i and width j in the input tensor X.

The global context G(X_in) and local context L(X_in) outputs are computed as follows:

G(X_{in}) = δ(B(\mathrm{pwconv}(g(X))))

(4)

L(X_{in}) = δ(B(\mathrm{pwconv}(X)))

(5)

where δ denotes the linear rectification function ReLU, B denotes batch normalization (BN), and pwconv denotes point-wise convolution. Given the global context and local context, the following results are obtained after passing through the SMS-CAM module:

X_{out} = X \otimes \mathrm{SMS}(X) = X \otimes σ(G(X_{in}) \oplus L(X_{in}))

(6)

where σ denotes the Sigmoid function, SMS is the output through the SMS-CAM module, \otimes denotes element-by-element multiplication, and \oplus denotes broadcasting addition.
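Equations (3) to (6) can be traced step by step in a small NumPy sketch. It assumes a single feature map of shape (C, H, W), treats the point-wise convolution as a channel-mixing matrix, and omits batch normalization (that is, B is taken as the identity). The names `pwconv`, `sms_cam_weights`, and `sms_cam` are hypothetical and are not the authors' implementation.

```python
import numpy as np


def pwconv(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Point-wise (1x1) convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def sms_cam_weights(x: np.ndarray, w_global: np.ndarray, w_local: np.ndarray) -> np.ndarray:
    """Attention weights sigma(G ⊕ L) with shape (C, H, W).

    Assumption: batch normalization is omitted (treated as the identity).
    """
    pooled = x.mean(axis=(1, 2), keepdims=True)               # Eq. (3): (C, 1, 1)
    global_ctx = np.maximum(pwconv(pooled, w_global), 0.0)    # Eq. (4): ReLU(pwconv(g(X)))
    local_ctx = np.maximum(pwconv(x, w_local), 0.0)           # Eq. (5): ReLU(pwconv(X))
    # Broadcasting addition followed by the Sigmoid function
    return 1.0 / (1.0 + np.exp(-(global_ctx + local_ctx)))


def sms_cam(x: np.ndarray, w_global: np.ndarray, w_local: np.ndarray) -> np.ndarray:
    """Eq. (6): element-wise reweighting of the input by the attention weights."""
    return x * sms_cam_weights(x, w_global, w_local)
```

Because the global branch yields a (C, 1, 1) tensor and the local branch a (C, H, W) tensor, the broadcasting addition adds one global value per channel to every spatial position of the local context.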

Considering the significant semantic difference between the features of spatial and contextual paths, a two-stage SMS-CAM iterative Attentional Feature Fusion, called SiAFF, was utilized. Since a single-stage SMS-CAM cannot adequately mitigate the effects of the initial fusion, the SiAFF method effectively reduces the initial fusion’s influence on the final fusion weights. The input features X and Y are fused by SiAFF. The computation process is as follows:

X Φ Y = \mathrm{SMS}(X \oplus Y) \otimes X \oplus (1 - \mathrm{SMS}(X \oplus Y)) \otimes Y

(7)

F = \mathrm{SMS}(X Φ Y) \otimes X \oplus (1 - \mathrm{SMS}(X Φ Y)) \otimes Y

(8)

where XΦY denotes the output after one stage of SMS-CAM and F denotes the output after two stages of SMS-CAM. Figure 5 illustrates the structure of the SiAFF module.
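The two-stage fusion of Equations (7) and (8) can be sketched with the attention module passed in as a function. Whether the two stages share parameters is not stated here, so the sketch takes two separate callables; `siaff`, `attn_first`, and `attn_second` are hypothetical names, and any function that maps a tensor to weights in [0, 1] of the same shape can stand in for SMS-CAM.

```python
from typing import Callable

import numpy as np

Attention = Callable[[np.ndarray], np.ndarray]


def siaff(x: np.ndarray, y: np.ndarray,
          attn_first: Attention, attn_second: Attention) -> np.ndarray:
    """Two-stage iterative attentional fusion of x and y (same shape)."""
    # Stage 1, Eq. (7): weights from the initial sum X ⊕ Y
    m1 = attn_first(x + y)
    initial_fusion = m1 * x + (1.0 - m1) * y
    # Stage 2, Eq. (8): weights recomputed from the stage-1 result
    m2 = attn_second(initial_fusion)
    return m2 * x + (1.0 - m2) * y
```

Since the final weights are computed from the stage-1 output rather than from the plain sum, the initial fusion only influences the result through the second attention pass. With weights in [0, 1], each output element is a convex combination of the corresponding elements of X and Y.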

3.4. Detection Head Based on Multi-Attention Mechanism

Traditional detection heads process multiple tasks in parallel, which can easily lead to confusion and mutual interference of features. YOLOv8 adopts a decoupled head design that accounts for the different concerns of classification and localization: it uses separate feature channels for bounding-box coordinate regression and object classification. The head of YOLOv8 first applies a 3 × 3 convolution to transform the three effective feature outputs from the neck into C₁ and C₂ channels for the localization and classification tasks, respectively. Here, i denotes the i-th detection head and x denotes the corresponding channel number. Subsequently, feature extraction continues using the convolution module. Ultimately, the number of output channels for the localization task is 4 × reg_max, where each feature point outputs 4 × reg_max regression predictions. These correspond to the centroid of the frame (x, y) and the width and height information of the predicted frame (w, h), respectively. The number of output channels for the classification task is NC, representing the number of defect categories to be detected, with each feature point used for judging the category information of the defects.

The above process relies solely on stacked convolutional layers for feature extraction, neglecting the sensitivity of the localization task to spatial information and the sensitivity of the classification task to channel information. In our study, based on the YOLOv8 decoupling head, we constructed the MADH. The designed MADH structure is shown in Figure 6. The MADH has two segments, localization prediction and classification prediction, and is essentially a decoupled head as well.

In the localization prediction segment, we replaced some of the convolutional layers in the head part of YOLOv8 with the Coordinate Attention (CA) module [37]. In the classification prediction segment, we constructed the Channel Compression Attention (CCA) on top of the Squeeze and Excitation (SE) module [38] to adapt to the channel changes.

The CA module, as shown in Figure 7a, encodes the input features into one-dimensional features along the two spatial directions X and Y, capturing the long-range dependencies of the input feature maps. It generates direction-aware and spatially sensitive attention feature maps, which help the model capture cross-channel information while acquiring accurate spatial location information.

The CCA module, as shown in Figure 7b, takes the number of channels of the feature map X as C₂ and generates a C₂ × 1 × 1 vector after global average pooling. To match the output channel number C_N at the classification prediction end, a C_N × 1 × 1 weight vector corresponding to C_N defect types is generated after two fully connected layers and Sigmoid activation. Simultaneously, to obtain a feature map with the corresponding number of channels, X is passed through two convolutional layers that reduce the number of channels to C_N, and the result is multiplied by the weight vector to obtain the attention-weighted feature map. Finally, a residual connection is introduced to improve the training efficiency. The CCA module compresses the feature channels and focuses on the C_N channels closely related to the category features, thereby improving the model’s classification ability.
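The data flow of the CCA module can be sketched in NumPy under several assumptions that the description leaves open: both convolutions are taken as point-wise (1 × 1), a ReLU is placed between the two fully connected layers and between the two convolutions, the intermediate width `C_mid` is arbitrary, and the residual connection adds the channel-reduced feature map to the attention-weighted one. The function `cca` and all weight names are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def cca(x: np.ndarray, fc1: np.ndarray, fc2: np.ndarray,
        conv1: np.ndarray, conv2: np.ndarray) -> np.ndarray:
    """Channel Compression Attention sketch.

    x: (C2, H, W); fc1, conv1: (C_mid, C2); fc2, conv2: (C_N, C_mid).
    Returns a (C_N, H, W) feature map.
    """
    # Attention branch: global average pooling, two FC layers, Sigmoid
    squeezed = x.mean(axis=(1, 2))                               # (C2,)
    weights = sigmoid(fc2 @ np.maximum(fc1 @ squeezed, 0.0))     # (C_N,)
    # Feature branch: two convolutions reduce the channels from C2 to C_N
    hidden = np.maximum(np.einsum("oc,chw->ohw", conv1, x), 0.0)
    reduced = np.einsum("oc,chw->ohw", conv2, hidden)            # (C_N, H, W)
    # Attention weighting plus the (assumed) residual connection
    return reduced + reduced * weights[:, None, None]
```

With C_N set to the number of defect categories (seven in this study's dataset), the output has one attention-weighted channel per category.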

Through the role of the MADH, the sensitivity of the model to spatial information in locating wood surface defects and to channel information in classifying wood surface defects is enhanced, which can effectively improve the accuracy of the model’s localization and classification.

3.5. Optimization of Loss Function Based on MPDIoU

The bounding box regression loss function is crucial for target detection, as it enables the model to predict a bounding box position that closely matches the actual bounding box. This provides essential information about the precise location and area of the detected target. YOLOv8 adopts CIoU as the bounding box regression loss function, which calculates the loss based on the overlap area, centroid distance, and aspect ratio of the predicted and actual boxes. In bounding box regression for small wood defects, when the centroids of the predicted and actual frames overlap, the CIoU loss function can still be optimized through its penalty term for shape, position, and size deviations. However, if the predicted and actual frames have the same aspect ratio but different width and height values, the CIoU loss function loses its effectiveness and cannot accurately reflect the differences between the predicted and actual frames. This limitation hinders its ability to adapt to the bounding box regression of overlapping and non-overlapping wood surface defects, leading to missed or inaccurate detections.

To address this issue, we adopted the MPDIoU loss function to replace the CIoU loss function in the original network. MPDIoU considers the minimum point distance and uses this metric to redefine the loss function, thereby reducing the total degrees of freedom of the loss function. The specific computational process is as follows:

d_{1}^{2} = (x_{1}^{B} - x_{1}^{A})^{2} + (y_{1}^{B} - y_{1}^{A})^{2}

(9)

d_{2}^{2} = (x_{2}^{B} - x_{2}^{A})^{2} + (y_{2}^{B} - y_{2}^{A})^{2}

(10)

MPDIoU = \frac{A \cap B}{A \cup B} - \frac{d_{1}^{2}}{W^{2} + H^{2}} - \frac{d_{2}^{2}}{W^{2} + H^{2}}

(11)

where A and B represent two arbitrary convex shapes and W and H denote the width and height of the input image, respectively. (x_{1}^{A}, y_{1}^{A}) and (x_{2}^{A}, y_{2}^{A}) represent the top-left and bottom-right coordinates of A, while (x_{1}^{B}, y_{1}^{B}) and (x_{2}^{B}, y_{2}^{B}) represent the top-left and bottom-right coordinates of B. d_{1}^{2} and d_{2}^{2} denote the squared Euclidean distances between the top-left corners and bottom-right corners of A and B, respectively.

The MPDIoU loss function addresses the limitations of the CIoU loss function and is more suitable for detecting wood surface defects. According to the MPDIoU loss function formula, in scenarios such as non-overlapping center points, the function promotes the prediction frame to be closer to the actual frame. In cases where the center points of the predicted and actual frames overlap, and they have the same aspect ratio but different aspect values, the penalty term of the MPDIoU function is not zero and will not degenerate into the IoU loss. Therefore, compared to the CIoU loss function, using the MPDIoU loss function not only simplifies the calculation process but also stabilizes the model convergence and improves the model’s accuracy in detecting small targets with wood surface defects. Additionally, it effectively resolves the distortion of detection frames caused by overlapping wood defects and reduces the occurrence of missed detections.
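A plain-Python sketch of the MPDIoU computation for axis-aligned boxes is given below. It takes d₁² and d₂² as the squared Euclidean distances between matching corners and normalizes them by W² + H², the squared diagonal of the input image. The function names and the (x1, y1, x2, y2) box layout are assumptions for illustration, and the loss is assumed to take the common 1 − MPDIoU form, which is not spelled out in the text above.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """MPDIoU of two boxes given as (x1, y1, x2, y2): top-left and bottom-right corners."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union of the two boxes
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # Squared distances between the top-left and the bottom-right corners
    d1_sq = (bx1 - ax1) ** 2 + (by1 - ay1) ** 2
    d2_sq = (bx2 - ax2) ** 2 + (by2 - ay2) ** 2

    # Normalize by the squared diagonal of the input image
    norm = img_w ** 2 + img_h ** 2
    return iou - d1_sq / norm - d2_sq / norm


def mpdiou_loss(box_a, box_b, img_w, img_h):
    """Assumed loss form: 1 - MPDIoU."""
    return 1.0 - mpdiou(box_a, box_b, img_w, img_h)
```

As a worked case, take a 100 × 100 image with boxes (40, 40, 60, 60) and (30, 30, 70, 70), which share the same center and aspect ratio but differ in size. The IoU is 400/1600 = 0.25, each corner distance contributes 200/20000 = 0.01, and MPDIoU is therefore 0.23: the penalty term stays non-zero in exactly the situation described above.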

4. Experiment and Results

4.1. The Distribution of Wood Surface Defect Types in the Dataset

The dataset of wood surface defects provided in the literature [8] includes 20,275 images representing ten types of defects on wood surfaces. After extensive analysis and selection work, the dataset applied to our experiments was created, which consisted of 3600 images and retained the seven most common wood surface defects in the original dataset (shown in Figure 8).

The dataset was divided into a training set and a test set in a 9:1 ratio. Additionally, 10% of the training set was set aside for validation purposes. The distribution of wood defect types in our dataset is shown in Table 1.
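The split described above can be turned into approximate image counts with simple arithmetic. The sketch below assumes the proportions are applied exactly to the 3600 images with integer division; the helper `split_counts` is hypothetical, and the counts it yields are derived figures rather than values reported in Table 1.

```python
def split_counts(total, train_parts=9, test_parts=1, val_percent=10):
    """Split `total` images train:test by parts, then hold out val_percent of the training part."""
    test = total * test_parts // (train_parts + test_parts)
    train_and_val = total - test
    val = train_and_val * val_percent // 100
    return {"train": train_and_val - val, "val": val, "test": test}


counts = split_counts(3600)
# counts == {"train": 2916, "val": 324, "test": 360}
```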

4.2. Experimental Setup: Environment and Parameter Configurations

The configuration of the experimental environment is shown in Table 2.

The experimental hyperparameters and their configurations are listed in Table 3.

4.3. Evaluation Indicators

To validate the performance of SiM-YOLO in detecting surface defects in wood, the model was evaluated using the following indicators: average precision (AP) and mean average precision (mAP).

For AP, the average precision of a given type of wood defect is given as follows:

AP = \int_{0}^{1} P(R) \, dR

(12)

where P(R) is the precision expressed as a function of recall, i.e., the precision–recall (P-R) curve. The precision (P) is the percentage of truly defective samples among the samples that the model identifies as defective. The recall (R) is the percentage of truly defective samples that the model identifies. They are calculated as follows:

Precision = \frac{TP}{TP + FP}

(13)

Recall = \frac{TP}{TP + FN}

(14)

where TP, FP, and FN are the numbers of true-positive, false-positive, and false-negative detections, respectively.
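Equations (12) to (14) can be illustrated with a short sketch. It assumes that the detections of one defect class have already been matched to the ground truth and sorted by descending confidence, and it approximates the integral in Equation (12) with the common all-point interpolation of the P-R curve. The text does not specify the matching IoU threshold or the interpolation scheme, so both are assumptions here, and the function names are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts (Equations (13) and (14))."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def average_precision(hits, n_ground_truth):
    """Approximate AP (Equation (12)) for one class.

    hits: booleans for detections sorted by descending confidence,
          True if the detection matches a ground-truth defect.
    """
    tp = fp = 0
    precisions, recalls = [], []
    for hit in hits:
        if hit:
            tp += 1
        else:
            fp += 1
        p, r = precision_recall(tp, fp, n_ground_truth - tp)
        precisions.append(p)
        recalls.append(r)

    # Make precision non-increasing in recall (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the area under the interpolated P-R curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


print(precision_recall(tp=8, fp=2, fn=4))
print(average_precision([True, True, False, True], n_ground_truth=4))  # 0.6875
```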

For mAP, the mean of the average precision over all wood defect types is calculated by the following equation:

mAP = \frac{\sum_{i = 1}^{c} AP_{i}}{c}

(15)

where i indexes the defect categories and c is the number of wood defect categories, which is seven in this experiment.
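As a worked instance of Equation (15), the sketch below averages the seven per-defect AP values reported for SiM-YOLO in Table 4. The function name `mean_average_precision` is hypothetical, and the AP values are copied from the table rather than recomputed.

```python
def mean_average_precision(ap_by_class):
    """Mean of the per-class AP values (Equation (15))."""
    return sum(ap_by_class.values()) / len(ap_by_class)


# Per-defect AP of SiM-YOLO as reported in Table 4.
sim_yolo_ap = {
    "Live_Knot": 0.854,
    "Marrow": 0.871,
    "Resin": 0.779,
    "Dead_Knot": 0.831,
    "Knot_with_crack": 0.595,
    "Knot_missing": 0.866,
    "Crack": 0.693,
}

print(round(mean_average_precision(sim_yolo_ap), 3))  # 0.784
```

The result agrees with the mAP of 0.784 listed for SiM-YOLO in Table 4.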

4.4. Ablation Experiment

To validate the overall effectiveness of SiM-YOLO, ablation experiments were carried out based on YOLOv8. SPD-Conv was added to the feature extraction network of YOLOv8 to replace the original convolutional downsampling module. The SiAFF-PANet was designed and used in the feature fusion network, and the MADH was used in the head of YOLOv8 to replace the decoupled head. These modules were combined to create the SiM-YOLO model, which was then compared with YOLOv8. The results of the ablation experiments are shown in Table 4.

Relative to YOLOv8, adding SPD-Conv, SiAFF-PANet, and MADH individually improved the mAP by 2.6, 3.4, and 2.2 percentage points, respectively, and combining all three modules improved it by 4.3 percentage points. This demonstrates the model’s enhanced ability to detect surface defects in wood. At the level of individual defect types, each module raised the AP of most defects. For example, after adding the SPD-Conv module, the AP values of all defects except Knot_missing improved. When all three modules were combined, the AP improved for five of the seven defect types, while the AP of Marrow and Dead_Knot decreased (Table 4).
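The mAP gains quoted above are absolute differences between the mAP values in Table 4, expressed in percentage points. The short sketch below reproduces that arithmetic. The helper name `gains_in_points` is hypothetical, and the mAP values are copied from Table 4.

```python
# mAP values as reported in Table 4 (fractions between 0 and 1).
map_by_model = {
    "YOLOv8": 0.741,
    "YOLOv8-SPD-Conv": 0.767,
    "YOLOv8-SiAFF-PANet": 0.775,
    "YOLOv8-MADH": 0.763,
    "SiM-YOLO": 0.784,
}


def gains_in_points(scores, baseline="YOLOv8"):
    """Absolute mAP gain over the baseline, in percentage points."""
    base = scores[baseline]
    return {
        name: round((value - base) * 100, 1)
        for name, value in scores.items()
        if name != baseline
    }


for name, gain in gains_in_points(map_by_model).items():
    print(f"{name}: +{gain} points")
```

Read as a relative change instead, the combined gain would be 4.3 / 74.1, or roughly 5.8% of the baseline mAP, so it matters which of the two readings is intended.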

4.5. Comparative Experiment

To evaluate the performance of the proposed method for wood surface defect detection, SiM-YOLO was compared with mainstream object detection algorithms of the YOLO family: YOLOv5, YOLOv7, YOLOX, YOLOv8, and YOLOv9. The experimental results are shown in Table 5. SiM-YOLO achieved the highest mAP value among all the YOLO algorithms tested. Notably, its mAP value was 9.3 percentage points higher than that of YOLOX. Compared to the original YOLOv8 model, SiM-YOLO improved the AP values for most defect types, the exceptions being Marrow and Dead_Knot. Specifically, the AP value was improved by 16% for detecting Crack and by 13.9% for detecting Knot_with_crack.

The precision–recall (P-R) curves for YOLOv7, YOLOv8, and SiM-YOLO in detecting the seven defects in our dataset are shown in Figure 9. In a P-R curve, a larger area under the curve indicates better model performance. As shown in Figure 9, the area under the P-R curve for SiM-YOLO is consistently larger than that of the other two models, and its mAP value is also the highest.

Figure 10 shows a portion of the visualized results from the comparative experiment. SiM-YOLO detected and located the defects more accurately and produced more precise prediction frames, and it avoided the misdetections and omissions that the other two models, YOLOv7 and YOLOv8, exhibited. In the case of Dead_Knot detection, SiM-YOLO had the highest confidence levels (0.98, 0.99, and 0.99), and YOLOv8 had the lowest (0.67, 0.88, and 0.87). Although the detection result of YOLOv7 did not differ much from that of SiM-YOLO, YOLOv7 failed to detect the Crack defect. In the case of Knot_missing detection, YOLOv8 and YOLOv7 misdetected Knot_missing as Dead_Knot, whereas SiM-YOLO correctly detected both the Dead_Knot and the Knot_missing defects.

The Gradient-weighted Class Activation Mapping (Grad-CAM) technique was used to visualize which image regions the models attend to for each defect category. A portion of the Grad-CAM images from the experimental results is shown in Figure 11. When a model correctly recognizes a wood surface defect, the Grad-CAM image highlights the defect area with a distinct response. As shown in Figure 11, compared to the other two models, SiM-YOLO exhibits a darker and more concentrated color in the regions where wood defects are located, demonstrating a more accurate aggregation of the defect areas. In addition, the original image contains defects in three locations: YOLOv7 and YOLOv8 each accurately detected only one location, while SiM-YOLO detected the defects in all three. These results further illustrate that SiM-YOLO is effective in avoiding omissions.
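For readers unfamiliar with Grad-CAM, the core computation can be sketched in a few lines. The sketch assumes that the activations of one convolutional layer and the gradients of a class score with respect to those activations are already available as nested lists of shape [channels][height][width]. In practice they would come from a deep learning framework's automatic differentiation. The text does not state which layer the authors visualized, and the function name `grad_cam` and the toy inputs are hypothetical.

```python
def grad_cam(activations, gradients):
    """Grad-CAM heat map from one layer's activations and gradients.

    Both inputs are nested lists of shape [channels][height][width].
    Returns a [height][width] map scaled to the range 0..1.
    """
    n_channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average of the gradients over all positions.
    weights = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(n_channels)
    ]

    # Weighted sum of the activation maps, followed by ReLU.
    cam = [
        [
            max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to 0..1 so the map can be overlaid on the image.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: channel 0 supports the class, channel 1 opposes it.
activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(activations, gradients))  # [[1.0, 0.0], [0.0, 0.0]]
```

The ReLU keeps only the regions that contribute positively to the class score, which is why a correctly recognized defect appears as a concentrated response in the heat map.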

5. Conclusions

To address the challenges posed by the complexity and variability of wood surface defects, a novel deep learning-based approach named SiM-YOLO is proposed. By building on the YOLOv8 framework, SiM-YOLO incorporates several innovative components: the SPD-Conv module for preserving fine-grained defect information, the SiAFF-PANet-based fusion module for integrating multi-scale features with semantic consistency, and the MADH for capturing cross-channel information and achieving precise spatial localization. Additionally, optimizing the loss function effectively mitigates the issue of missed detections due to overlapping defects. Experimental results demonstrate the superior performance of SiM-YOLO compared to current state-of-the-art YOLO algorithms. Specifically, SiM-YOLO achieved a 9.3% improvement in mAP over YOLOX and a significant increase in AP for specific defect types, such as a 16% improvement for Crack detection compared to YOLOv8. The visualization results further confirm that SiM-YOLO offers more accurate defect localization and significantly reduces misdetection and omission issues. The successful application of SiM-YOLO in wood surface defect detection highlights its potential for practical implementation in quality control processes. This study not only demonstrates the effectiveness of SiM-YOLO but also provides valuable insights for future research and development in the field of automated defect detection in wood. Future research efforts will focus on further refining the model, particularly in terms of wood species and properties, to enhance its generalization ability and applicability to detecting surface defects across a wide range of wood species.

Author Contributions

H.X.: Writing—original draft, Resources, Methodology. R.W.: Supervision, Writing—review and editing, Funding acquisition, Methodology. F.L.: Validation, Software, Data curation. Y.C.: Software, Data curation. G.Z.: Data curation, Software. B.W.: Supervision, Investigation, Conceptualization, Funding Acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

This study was co-supported by the Science and Technology Planning Project of Guangxi Province, China (No. 2022AC21012); the industry-university-research innovation fund projects of China University in 2021 (No. 2021ITA10018); the fund project of the Key Laboratory of AI and Information Processing (No. 2022GXZDSY101); and the Scientific Research Project of Hechi University (No. 2022YLXK002).

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Food and Agriculture Organization. World Food and Agriculture—Statistical Yearbook 2022; Food and Agriculture Organization (FAO): Rome, Italy, 2022.
2. Sutton, W.R.J.W. Wood—the world’s most sustainable raw material. N. Z. J. For. 2010, 55, 22–26.
3. Yi, Z.; Luo, L.; Lu, Q.; Chen, M.; Zhu, W.; Zhang, Y. An efficient and accurate surface defect detection method for quality supervision of wood panels. Meas. Sci. Technol. 2024, 35, 055209.
4. Yuan, Y.; Zhang, D.; Sayed, U.; Zhu, H.; Wang, J.; Yang, X.J.; Wang, Z. Research and Application of Log Defect Detection and Visualization System Based on Dry Coupling Ultrasonic Method. J. Renew. Mater. 2023, 11, 3917–3932.
5. Xuebing, B.; Kejun, W.; Lihui, Z. Approach to Texture Segmentation of Wood Surface Defects Based on Gray Level Co-occurrence Matrix. J. Northeast For. Univ. 2008, 36, 23–25+27.
6. Ge, Y.; Jiang, D.; Sun, L. Wood Veneer Defect Detection Based on Multiscale DETR with Position Encoder Net. Sensors 2023, 23, 4837.
7. Conners, R.W.; Mcmillin, C.W.; Lin, K.; Vasquez-Espinosa, R.E. Identifying and Locating Surface Defects in Wood: Part of an Automated Lumber Processing System. IEEE Trans. Pattern Anal. Mach. Intell. 1983, 5, 573–583.
8. Kodytek, P.; Bodzas, A.; Bilik, P. A large-scale image dataset of wood surface defects for automated vision-based quality control processes [version 1; peer review: 2 approved with reservations]. F1000Research 2021, 10, 581.
9. Kursat, D.; Mustafa, A.; Mehmet, C.; Fatih, D. Automated steel surface defect detection and classification using a new deep learning-based approach. Neural Comput. Appl. 2023, 35, 8389–8406.
10. Jain, S.; Seth, G.; Paruthi, A.; Soni, U.; Kumar, G. Synthetic data augmentation for surface defect detection and classification using deep learning. J. Intell. Manuf. 2022, 33, 1007–1020.
11. Bhatt, P.M.; Malhan, R.K.; Rajendran, P.; Shah, B.C.; Gupta, S.K. Image-Based Surface Defect Detection Using Deep Learning: A Review. J. Comput. Inf. Sci. Eng. 2021, 21, 040801.
12. Liu, G. Surface Defect Detection Methods Based on Deep Learning: A Brief Review. In Proceedings of the 2020 2nd International Conference on Information Technology and Computer Application (ITCA), Guangzhou, China, 18–20 December 2020; pp. 200–203.
13. Thomas, L.; Mili, L.; Thomas, E.; Shaffer, C.A. Defect detection on hardwood logs using laser scanning. Wood Fiber Sci. 2007, 38, 682–695.
14. Zhao, P.; Li, Y.; Ning, X. Simultaneous Wood Defect and Species Detection with 3D Laser Scanning Scheme. Int. J. Opt. 2016, 2016, 7049523.
15. Karsulovic, J. Ultrasonic detection of knots and annual ring orientation in Pinus radiata lumber. Wood Fiber Sci. 2000, 32, 278–286.
16. Fang, Y.; Lin, L.; Feng, H.; Lu, Z.; Emms, G.W. Review of the use of air-coupled ultrasonic technologies for nondestructive testing of wood and wood products. Comput. Electron. Agric. 2017, 137, 79–87.
17. Qi, D.; Yu, L.; Feng, X. A detection method on wood defects of CT image using multifractal spectrum based on fractal brownian motion. In Proceedings of the IEEE International Conference on Automation & Logistics, Qingdao, China, 1–3 September 2008; IEEE: Qingdao, China, 2008; pp. 1539–1544.
18. Tong, Q.J.; Ding, J.W.; Wang, H.L. Nondestructive Detection of Internal Defects in Wood-Trend of Study on CT Scanning of Wood. China For. Prod. Ind. 2005, 32, 5–7.
19. Levent, T. The determination of the displacement of the wooden construction materials under load via digital image processing. Sci. Res. Essays 2010, 5, 1903–1910.
20. Xie, Y.; Ling, J. Wood defect classification based on lightweight convolutional neural networks. BioResources 2023, 18, 7663–7680.
21. He, T.; Liu, Y.; Yu, Y.; Zhao, Q.; Hu, Z. Application of deep convolutional neural network on feature extraction and detection of wood defects. Measurement 2020, 152, 107357.
22. Wang, X. Detection of natural wood defects with large color differences based on branched network. Multimed. Tools Appl. 2023, 82, 44719–44739.
23. Gao, M.; Qi, D.; Mu, H.; Chen, J. A Transfer Residual Neural Network Based on ResNet-34 for Detection of Wood Knot Defects. Forests 2021, 12, 212.
24. Chen, Y.; Sun, C.; Ren, Z.; Na, B. Review of the current state of application of wood defect recognition technology. BioResources 2023, 18, 2288–2302.
25. Meng, W.; Yuan, Y. SGN-YOLO: Detecting Wood Defects with Improved YOLOv5 Based on Semi-Global Network. Sensors 2023, 23, 8705.
26. Xu, J.; Yang, H.; Wan, Z.; Mu, H.; Qi, D.; Han, S. Wood Surface Defects Detection Based on the Improved YOLOv5-C3Ghost With SimAm Module. IEEE Access 2023, 11, 105281–105287.
27. Cui, W.; Li, Z.; Duanmu, A.; Xue, S.; Guo, Y.; Ni, C.; Zhu, T.; Zhang, Y. CCG-YOLOv7: A Wood Defect Detection Model for Small Targets Using Improved YOLOv7. IEEE Access 2024, 12, 10575–10585.
28. Zheng, Y.; Wang, M.; Zhang, B.; Shi, X.; Chang, Q. GBCD-YOLO: A High-Precision and Real-Time Lightweight Model for Wood Defect Detection. IEEE Access 2024, 12, 12853–12868.
29. Wang, B.; Yang, C.; Ding, Y.; Qin, G. Detection of wood surface defects based on improved YOLOv3 algorithm. BioResources 2021, 16, 6766–6780.
30. Chen, W.; Liu, J.; Fang, Y.; Zhao, J. Timber knot detector with low false-positive results by integrating an overlapping bounding box filter with faster R-CNN algorithm. BioResources 2023, 18, 4964–4976.
31. Urbonas, A.; Raudonis, V.; Maskeliūnas, R.; Damaševičius, R. Automated Identification of Wood Veneer Surface Defects Using Faster Region-Based Convolutional Neural Network with Data Augmentation and Transfer Learning. Appl. Sci. 2019, 9, 4898.
32. Shi, J.; Li, Z.; Zhu, T.; Wang, D.; Ni, C. Defect Detection of Industry Wood Veneer Based on NAS and Multi-Channel Mask R-CNN. Sensors 2020, 20, 4398.
33. Hu, K.; Wang, B.; Shen, Y.; Guan, J.; Cai, Y. Defect identification method for poplar veneer based on progressive growing generated adversarial network and MASK R-CNN model. BioResources 2020, 15, 3041–3052.
34. Li, D.; Xie, W.; Wang, B.; Zhong, W.; Wang, H. Data Augmentation and Layered Deformable Mask R-CNN-Based Detection of Wood Defects. IEEE Access 2021, 9, 108162–108174.
35. Sunkara, R.; Luo, T. No more strided convolutions or pooling: A new CNN building block for low-resolution images and small objects. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Cham, Switzerland, 2022; pp. 19–23.
36. Dai, Y.M.; Gieseke, F.; Oehmcke, S.; Wu, Y.Q.; Barnard, K. Attentional feature fusion. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2020; pp. 3559–3568.
37. Hou, Q.B.; Zhou, D.Q.; Feng, J.S. Coordinate attention for efficient mobile network design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
38. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023.

Figure 1. Structure of SiM-YOLO.

Figure 2. Convolution downsampling module.

Figure 3. Structure of SPD-Conv (scale = 2).

Figure 4. Structure of multi-scale channel attention module. (a) MS-CAM; (b) SMS-CAM.

Figure 5. Structure of SiAFF.

Figure 6. Structure of MADH.

Figure 7. Structure of CA and CCA. (a) CA, (b) CCA.

Figure 8. Pine surface defect types in the dataset. (a) Live_Knot; (b) Marrow (Pith); (c) Resin (Resin pocket); (d) Dead_Knot; (e) Knot_with_crack; (f) Knot_missing; and (g) Crack.

Figure 9. Precision–recall (P-R) curves. (a) YOLOv7; (b) YOLOv8; and (c) SiM-YOLO.

Figure 10. Portion of visualization results.

Figure 11. Grad-CAM images of portion of experimental results. (a) Original image of Pine surface defect; (b) YOLOv7; (c) YOLOv8; and (d) SiM-YOLO.

Table 1. Distribution of defect types in the dataset.

Defect Type of Pine	Number of Occurrences	Number of Images with the Defect	Images with the Defect in the Dataset (%)
Live_Knot	4070	2256	62.7
Marrow (Pith)	206	191	5.3
Resin (Resin pocket)	650	523	14.5
Dead_Knot	2934	1875	52.1
Knot_with_crack	542	398	11.1
Knot_missing	121	110	3.1
Crack	517	371	10.3
Without any defects	—	7	0.2

Table 2. Details of experimental environment configurations.

Configuration Environment	Configuration Name (Version)
Operating system	Windows 11
CPU	9th Gen Intel(R) Core(TM) i5-9300
GPU	NVIDIA GeForce GTX 1650
Programming language	Python 3.8.19
Deep Learning Framework	PyTorch 1.13.1
Acceleration Module	CUDA 11.7.0

Table 3. Details of experimental hyperparameter configurations.

Hyperparameters	Values
image size	640
batch size	4
iteration times	200
learning rate	0.01
Momentum	0.937
Weight_decay	0.0005

Table 4. Ablation experiment results. The mAP and per-defect AP values are given as fractions (0 to 1).

Model	mAP	Live_Knot	Marrow	Resin	Dead_Knot	Knot_with_crack	Knot_missing	Crack
YOLOv8	0.741	0.797	0.962	0.745	0.844	0.476	0.824	0.533
YOLOv8-SPD-Conv	0.767	0.819	0.966	0.782	0.864	0.556	0.751	0.630
YOLOv8-SiAFF-PANet	0.775	0.815	0.954	0.830	0.869	0.560	0.820	0.574
YOLOv8-MADH	0.763	0.814	0.831	0.771	0.833	0.551	0.850	0.694
SiM-YOLO	0.784	0.854	0.871	0.779	0.831	0.595	0.866	0.693

Table 5. Comparison results of SiM-YOLO with other YOLO family algorithms for wood surface defect detection. The mAP and per-defect AP values are given as fractions (0 to 1).

Model	mAP	Live_Knot	Marrow	Resin	Dead_Knot	Knot_with_crack	Knot_missing	Crack
YOLOv5	0.702	0.808	0.868	0.728	0.841	0.316	0.600	0.757
YOLOv7	0.777	0.839	0.839	0.733	0.840	0.650	0.773	0.762
YOLOX	0.691	0.823	0.857	0.711	0.822	0.476	0.653	0.492
YOLOv8	0.741	0.797	0.962	0.745	0.844	0.476	0.824	0.533
YOLOv9	0.774	0.817	0.816	0.787	0.842	0.578	0.899	0.678
SiM-YOLO	0.784	0.854	0.871	0.779	0.831	0.595	0.866	0.693

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
