Article

Self-Attention-Mechanism-Improved YoloX-S for Briquette Biofuels Object Detection

1 College of Engineering and Technology, Tianjin Agricultural University, Tianjin 300392, China
2 College of Basic Science, Tianjin Agricultural University, Tianjin 300392, China
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(19), 14437; https://doi.org/10.3390/su151914437
Submission received: 1 August 2023 / Revised: 15 September 2023 / Accepted: 18 September 2023 / Published: 3 October 2023

Abstract

Fuel types are essential inputs for the control systems of briquette biofuel boilers, as the optimal combustion conditions vary with fuel type. Moreover, burning coal in biomass boilers is illegal in China, so the timely detection of coal can provide effective information for environmental supervision. This study established a briquette biofuel identification method based on the object detection of fuel images, covering straw pellets, straw blocks, wood pellets, wood blocks, and coal. The YoloX-S model was used as the baseline network, and the proposed model improved detection performance by adding a self-attention mechanism module. The improved YoloX-S model showed better accuracy than the YoloX-L, YoloX-S, Yolov5, Yolov7, and Yolov8 models. The experimental results regarding fuel identification show that the improved model can effectively distinguish biomass fuel from coal and overcome the false and missed detections produced by the original YoloX model when recognizing straw pellets and wood pellets. However, interference from a complex background can greatly reduce the confidence of the improved YoloX-S model.

1. Introduction

Over 80% of global energy is generated from fossil fuels, resulting in severe climate problems and environmental pollution and forcing human beings to produce energy in a sustainable way [1,2,3]. Biomass is a carbon-neutral fuel that accounts for 10.3% of the global primary energy supply; the conversion of biomass to energy is crucial for sustainable eco-development and renewable energy production [4,5]. Therefore, bioenergy is considered an ideal alternative to fossil fuels. However, many kinds of biomass are treated as waste in various countries [5], representing an insufficient utilization of biomass resources. China is rich in agricultural straw, but most straw is burned in the field, and its use as fuel for domestic cooking or heating in rural areas is limited [6]. Recently, the application of biomass in bioenergy production has been extensively studied; the most widely used technology for converting biomass to energy is still direct combustion, partly because other biomass conversion technologies struggle to overcome high investment costs and high losses [4,7].
In China, over 900 million tons of agricultural crop and forestry biomass wastes are produced annually [8]. A significant portion of these biofuels can be effectively utilized for heat energy through direct combustion in biomass boilers, making this a practical and favorable option for biomass recycling and direct heat supply [7]. Before combustion, a variety of agricultural and forestry biomass wastes can be pulverized, stirred, and compacted into biomass briquettes to increase the mass and energy densities of the fuels [9,10,11]. The briquetting treatment of biomass is remarkably beneficial for biomass storage, transportation, and feeding into boilers, and it also improves the homogeneity and durability of the materials, further ensuring viable energy recovery [9,10]. Solid biofuels and biomass boilers have developed rapidly in China over the past two decades: the production of solid biofuels reached 3.82 million tons in 2012, and tens of thousands of industrial biomass boilers are now in operation [6,12]. In China, policy and incentive measures play crucial roles in promoting biomass energy development and standardizing the biomass industry. Under the regulations of the National Energy Administration (NEA) of China, biomass boilers fed with biomass briquettes are encouraged for uses such as providing heat energy to residential zones, schools, and factories located in densely populated areas with convenient access to biomass [13]. Additionally, the NEA explicitly stipulates that biomass boilers are prohibited from burning coal to ensure up-to-standard emissions [13], and relevant national and local laws have also been enacted. However, there is still a lack of effective supervisory measures to detect the illegal burning of coal in biomass boilers, and it is necessary to establish a monitoring method to distinguish coal from typical biomass briquettes in a timely manner.
The shapes and types of biomass briquettes are essential parameters related to the calorific values and pollutant emissions of the biofuels, which in turn dramatically affect the optimal operating conditions [9,11,14,15]. Additionally, the ash production characteristics of the various biomass briquettes during combustion differ markedly from those of coal; biomass fuels that contain more ash carry greater risks of slagging and fouling [12,16]. During the operation of a biomass boiler, multiple fuel types may be burnt; it is therefore desirable to determine the type of fuel about to be burnt in a timely manner so that appropriate operational adjustment strategies can be applied to optimize the combustion process [17]. To achieve the efficient and safe operation of biomass boilers, it is vital to rapidly identify fuel types as a basis for operational optimization and accurate control.
With the rapid development of artificial intelligence, image processing technology has been increasingly applied to online fuel type identification. Many studies have focused on flame monitoring, feature extraction, and the identification of fuel types using image processing technology and spectral analysis. Machine learning algorithms, including fuzzy logic, neural networks (NNs), and data mining, have been shown to be efficient approaches in this field [17,18]. In various studies, flame radiative signals in different spectral ranges, such as radical, ultraviolet, visible, and infrared, have been acquired, and time-domain and frequency-domain features of the flame signals have been extracted to establish soft-computing-based models for identifying fuel types [19]. However, fuel identification based on flame features may result in misidentification when the fuels have similar components and combustion behaviors [18]. Moreover, flame radiation mainly reflects the chemical characteristics of the fuel, while the shape features of biomass briquettes are hard to recognize. In contrast to the complicated and time-consuming flame-based identification methods, this article proposes a direct fuel type detection method based on images of the fuels themselves, which can also provide shape information within the same fuel type. In this study, images of straw and wood briquettes, as well as coal, with different shapes were selected to construct the fuel dataset, and an improved YoloX-S network with added self-attention modules was established for fuel type detection. Finally, the detection accuracy and loss of the improved YoloX-S were verified by comparison with common networks in the YOLO series.

2. Related Works

2.1. Image Datasets

2.1.1. The Making and Processing of the Fuel Image Datasets

There are currently no publicly available image datasets of biomass briquettes and coal; therefore, one was established for this study. The dataset consists of high-resolution JPG images of four types of biomass briquettes and coal, each with dimensions of 4624 × 3468 pixels (Figure 1). The numbers of images of straw pellets, straw blocks, wood pellets, wood blocks, and coal are 757, 266, 332, 325, and 357, respectively.

2.1.2. Setup of Experimental Platform

The experiment was performed using the PyTorch deep learning framework on a personal computer running Windows 10 Professional, equipped with an Intel Core i7-9700 CPU @ 3.00 GHz and an NVIDIA GeForce RTX 2080 Ti GPU (NVIDIA, Santa Clara, CA, USA). The algorithms were trained under Anaconda with Python 3.7.

2.1.3. Model Training

The training parameters were set as follows:
The input image size was 640 × 640, the batch size was 12 samples per iteration, and the model was trained for 150 epochs on the entire training dataset. During the testing phase, the input image size was also set to 640 × 640, and the intersection over union (IoU) threshold was set to 0.5.
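For concreteness, the configuration above can be expressed as a short PyTorch sketch. This is an illustration only, not the authors' code: the dataset and network are replaced here by clearly labeled dummy stand-ins so that the loop is self-contained and runnable.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

INPUT_SIZE = 640          # input images are 640 x 640
BATCH_SIZE = 12           # 12 samples per iteration
EPOCHS = 150              # full passes over the training dataset
IOU_THRESHOLD = 0.5       # IoU threshold used in the testing phase

# Dummy stand-ins so the loop runs: the real fuel-image dataset and the
# YoloX-S detector are substituted by random tensors and a tiny classifier.
data = TensorDataset(torch.randn(48, 3, INPUT_SIZE, INPUT_SIZE),
                     torch.randint(0, 5, (48,)))        # five fuel classes
loader = DataLoader(data, batch_size=BATCH_SIZE, shuffle=True)
model = nn.Sequential(nn.Conv2d(3, 8, 3, stride=32), nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(), nn.Linear(8, 5))    # placeholder network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

for epoch in range(EPOCHS):
    for images, labels in loader:
        loss = nn.functional.cross_entropy(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```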

2.1.4. Evaluation Metrics

To ensure a comprehensive and unbiased analysis of the results, we utilized five evaluation metrics: Precision (P), Recall (R), F1-score (F1), Average Precision (AP), and Mean Average Precision (mAP). The calculation formulas for these metrics are as follows [20]:
$$\mathrm{mAP} = \frac{\sum_{i=1}^{N} \mathrm{AP}_i}{N} \tag{1}$$

$$\mathrm{AP} = \sum_{i=1}^{n-1} \left( r_{i+1} - r_i \right) P_{\mathrm{interp}}\left( r_{i+1} \right) \tag{2}$$

$$R = \frac{TP}{TP + FN} \times 100\% \tag{3}$$

$$P = \frac{TP}{TP + FP} \times 100\% \tag{4}$$
In the context of object detection, the following definitions apply:
  • FN (False-Negative): The number of actual positive samples that are missed and incorrectly detected as negative by the model.
  • TP (True-Positive): The number of actual positive samples that are correctly detected as positive by the model.
  • FP (False-Positive): The number of actual negative samples that are mistakenly detected as positive by the model [21].
Recall is the ratio of TP to the sum of TP and FN, and Precision is the ratio of TP to the sum of TP and FP. In Equation (1), N represents the number of object categories, and AP_i is the average precision of category i; mAP is the mean of the per-class AP values. In Equation (2), r_1, r_2, ..., r_n are the recall values at which the precision is interpolated, sorted in ascending order, and P_interp(r) denotes the interpolated precision at recall r.
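To make the formulas concrete, the following NumPy sketch implements Equations (1)-(4), assuming that detections have already been matched to ground truth per class at the 0.5 IoU threshold (the matching step itself is outside the sketch).

```python
import numpy as np

def precision_recall(tp, fp, fn):
    p = tp / (tp + fp) * 100          # Equation (4), in percent
    r = tp / (tp + fn) * 100          # Equation (3), in percent
    return p, r

def average_precision(recalls, precisions):
    """Interpolated AP (Equation (2)): sum of (r_{i+1} - r_i) * P_interp(r_{i+1})."""
    r = np.concatenate(([0.0], recalls, [1.0]))
    p = np.concatenate(([0.0], precisions, [0.0]))
    # Interpolation: make the precision envelope monotonically non-increasing.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the areas of the segments where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return np.sum((r[idx + 1] - r[idx]) * p[idx + 1])

def mean_average_precision(ap_per_class):
    return sum(ap_per_class) / len(ap_per_class)   # Equation (1)
```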

2.2. Methodologies

The YOLO series algorithms are "one-stage" object detectors that strike a balance between detection accuracy and speed. One notable algorithm in this series is YoloX, developed by MEGVII, an open-source high-performance detection algorithm [22]. YoloX demonstrates superior average precision compared to Yolov3, Yolov4, and Yolov5 while maintaining a highly competitive processing speed.

2.2.1. YoloX-S Network

The YoloX-S model is a variant of the YoloX model, which consists of three main parts: the backbone feature extraction network (CSPDarknet), the enhanced feature extraction network (FPN), and the classifier and regression network (YOLO head) [22]. The input images were 640 × 640 (similar to Yolov5) [23].
The Focus network structure compresses the height and width of the input image to expand the number of channels [24]. Conv2D_BN_SiLU (CBS) and Resblock_body are employed for channel adjustment and feature extraction, respectively. CBS comprises a convolution layer, a batch normalization layer, and an activation function. In Resblock_body, the input first passes through a 3 × 3 convolution, after which feature extraction is carried out by the CSPLayer. The Resblock_body is repeated four times for feature extraction, and the Spatial Pyramid Pooling (SPP) structure is added to the fourth Resblock_body. In Yolov4, SPP was placed in the enhanced feature extraction network, whereas in YoloX-S it is part of the backbone feature extraction network [25]. The SPP structure employs three pooling kernels with sizes of 5, 9, and 13 to extract features from the input. Finally, three effective feature layers are obtained through the backbone, with shapes of 80 × 80 × 256, 40 × 40 × 512, and 20 × 20 × 1024, respectively [26].
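As an illustration of the structures just described, below is a minimal PyTorch sketch of the CBS block and the SPP module with pooling kernels of 5, 9, and 13; the halved intermediate channel width in SPP is an assumption for illustration, not a detail taken from the authors' implementation.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv2D -> BatchNorm -> SiLU, as described in the text."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPP(nn.Module):
    """Spatial Pyramid Pooling with kernels 5, 9, and 13."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_mid = c_in // 2                      # assumed intermediate width
        self.cv1 = CBS(c_in, c_mid)
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13))
        self.cv2 = CBS(c_mid * 4, c_out)
    def forward(self, x):
        x = self.cv1(x)
        return self.cv2(torch.cat([x] + [p(x) for p in self.pools], dim=1))

# e.g., applied to the deepest feature map, 20 x 20 x 1024:
y = SPP(1024, 1024)(torch.randn(1, 1024, 20, 20))   # shape preserved
```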
The three effective feature layers are fed into the enhanced feature extraction network, which performs feature fusion through both up-sampling and down-sampling paths [27]. In the up-sampling path, the 20 × 20 × 1024 feature layer is convolved, up-sampled, and stacked with the 40 × 40 × 512 feature layer [23], after which the CSPLayer structure performs feature extraction. The resulting features are convolved, up-sampled again, and stacked with the 80 × 80 × 256 feature layer. After another round of feature extraction with the CSPLayer, down-sampling is employed for feature fusion to compress the height and width of the feature maps. The final output is obtained by feeding the three effective feature layers into the YOLO head (Figure 2).
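The up-sampling fusion path can be sketched as follows, reusing the CBS block from the previous snippet. The CSPLayer is deliberately simplified to a single CBS here, so the sketch shows only the convolve/up-sample/stack pattern, not the exact network.

```python
import torch
import torch.nn.functional as F

p3 = torch.randn(1, 256, 80, 80)     # 80 x 80 x 256 feature layer
p4 = torch.randn(1, 512, 40, 40)     # 40 x 40 x 512 feature layer
p5 = torch.randn(1, 1024, 20, 20)    # 20 x 20 x 1024 feature layer

t = F.interpolate(CBS(1024, 512)(p5), scale_factor=2, mode="nearest")
f4 = torch.cat([t, p4], dim=1)       # stack with the 40 x 40 x 512 layer
f4 = CBS(1024, 512)(f4)              # stand-in for the CSPLayer

t = F.interpolate(CBS(512, 256)(f4), scale_factor=2, mode="nearest")
f3 = torch.cat([t, p3], dim=1)       # stack with the 80 x 80 x 256 layer
f3 = CBS(512, 256)(f3)               # stand-in for the CSPLayer
```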

2.2.2. Contextual Transformer Network

The Contextual Transformer Network (CoTNet) is a backbone architecture inspired by the Transformer model that is designed specifically for visual detection tasks [28]. While traditional designs apply self-attention directly on 2D feature maps to obtain an attention matrix, CoTNet takes advantage of the rich contextual information between adjacent keys to guide the learning of a dynamic attention matrix, thereby enhancing the visual representation capability [29].
In the CoTNet architecture, the CoT block plays a crucial role. It first encodes the input key through a context-aware convolution, generating a static context representation. This representation is then concatenated with the input query. The learned attention matrix is applied to the input value through two successive convolutions, resulting in a dynamic context representation of the input. The CoT block can easily replace each convolution in the ResNet architecture, making it an attractive choice for building the backbone of the CoTNet [30]. ResNet, short for residual neural network, addresses the issues of gradient dispersion and precision decline in deep networks. By utilizing residual connections, ResNet enables the network to be deeper while maintaining accuracy and controlling speed (Figure 3).

2.2.3. Convolutional Neural Network

A Convolutional Neural Network (CNN) is a kind of feedforward neural network whose artificial neurons respond to units within a local receptive field; CNNs perform well in large-scale image processing [31]. They are similar to ordinary neural networks in that they are composed of neurons with learnable weights and bias constants [32], and they have the advantage of capturing local features effectively. CNNs apply convolution kernels of different sizes in a sliding-window fashion, which differs from the full matrix multiplication used in traditional neural networks and allows weights to be assigned to local information [33]. This weight sharing during the convolution process effectively reduces model complexity, making CNNs suitable for large-scale machine learning tasks such as image processing. The CNN architecture comprises an input layer, convolutional layers, pooling layers, activation function layers, fully connected layers, and an output layer (Figure 4).
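As a minimal, concrete instance of this layer stack, the sketch below assembles a tiny CNN from the named layers; the channel counts, kernel sizes, and five-class output are illustrative assumptions.

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                   # activation function layer
    nn.MaxPool2d(2),                             # pooling layer
    nn.Flatten(),
    nn.Linear(16 * 112 * 112, 5),                # fully connected layer
)
logits = cnn(torch.randn(1, 3, 224, 224))        # output layer: class scores
```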

2.2.4. CoT Block in the Improved YoloX-s Network

The Contextual Transformer Network (CoTNet) offers advantages in long-distance information modeling and global perception compared to other representative channels and spatial attention mechanisms in the visual field, such as SENet and CBAM [34]. In the context of target detection problems related to biomass fuel placement, the CoT Block demonstrates its strength in accurately describing foreground positive sample features and significantly improving target recognition rates.
In the CoT block, given an input feature X, the query, key, and value are all set to X (Q = X, K = X, V = X) [35]. The key map undergoes a grouped convolution with a kernel size of k × k to obtain a representation containing contextual information, which can be seen as the static modeling of local information. A concatenation operation is then performed between Q and this static representation, and the resulting feature map is passed through two consecutive convolutions to generate an attention map containing rich contextual information. Finally, this attention map is multiplied with V to obtain a dynamic context modeling representation.
The improved YoloX network replaces the 3 × 3 convolution in ResNet with the CoT module, thereby constructing a new backbone feature extraction network (Figure 5).
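The CoT block described above can be sketched in PyTorch as follows. Note that the full implementation of Li et al. [29] computes the dynamic context via local matrix multiplication over each k × k grid; that step is reduced here to a channel-wise weighted product purely for brevity, so this is a structural sketch rather than a faithful reimplementation. The group count of 4 is also an assumption (it requires the channel count to be divisible by 4).

```python
import torch
import torch.nn as nn

class CoTBlock(nn.Module):
    def __init__(self, c, k=3):
        super().__init__()
        # Static context: k x k grouped convolution over the keys (K = X).
        self.key_embed = nn.Sequential(
            nn.Conv2d(c, c, k, padding=k // 2, groups=4, bias=False),
            nn.BatchNorm2d(c), nn.ReLU())
        # Value embedding (V = X).
        self.value_embed = nn.Sequential(
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c))
        # Two consecutive convolutions on [static context, Q] -> attention map.
        self.attn = nn.Sequential(
            nn.Conv2d(2 * c, c // 2, 1, bias=False),
            nn.BatchNorm2d(c // 2), nn.ReLU(),
            nn.Conv2d(c // 2, c, 1))
    def forward(self, x):
        k_static = self.key_embed(x)                 # static context
        v = self.value_embed(x)
        a = self.attn(torch.cat([k_static, x], 1))   # Q = X is concatenated
        k_dynamic = a.softmax(dim=1) * v             # simplified dynamic context
        return k_static + k_dynamic                  # fuse static and dynamic

y = CoTBlock(64)(torch.randn(1, 64, 40, 40))  # shape preserved: (1, 64, 40, 40)
```

Because the block preserves spatial size and channel count, it can replace a 3 × 3 convolution inside a residual block without altering the surrounding architecture, which is the substitution used to build the improved backbone.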

3. Experimental Analysis

3.1. Comparative Experiment

CNN-based object detection has been shown to significantly improve object detection accuracy. However, its excessive processing time hampers its usability for real-time applications [36]. In our study, we focused on detecting complex areas and small targets and made changes to address these challenges. We initially used the YoloX-S model as our baseline and then improved it by incorporating self-attention modules to enhance object detection performance.
A comparison across six networks demonstrated the effectiveness of our model. As shown in Table 1, YoloX-S achieved a high mAP of 95.4% but a low FPS (29 f/s), indicating a trade-off between accuracy and speed. Our improved YoloX-S model increased the FPS to 73 f/s, demonstrating a substantial gain in processing speed. The improved YoloX-S, Yolov7 [37], and Yolov8 models exhibited much higher FPS than the other models, indicating higher detection speeds.
In terms of training performance, the loss values of the improved YoloX-S network and the Yolov8 model were lower than those of the other models, indicating better convergence. The variation of the loss value with the number of epochs during the training of the improved YoloX-S model is shown in Figure 6. The loss gradually decreased over training, and the loss curve converged at around 2.25 after approximately 140 epochs.
Regarding object detection accuracy, the mAP of the improved YoloX-S was higher than those of the other five network models. Yolov7 was intended to be a faster and better Yolo algorithm; its authors claimed that it exceeded the then-known object detectors in both speed and accuracy over the range of 5 FPS to 160 FPS [38]. In our study, Yolov7 was chosen as a comparison model for training and testing. The results show that the AP for straw block detection was only 74%, while the APs for the other four fuels were above 98%, resulting in an mAP of only 94.32% (Figure 7).
Yolov8 represents the latest advancement in the Yolo series of detection algorithms; it uses a single prediction framework to locate and classify objects, exhibiting superior capabilities in detecting multiple objects with enhanced speed and accuracy [39]. Compared to the Yolov7 and Yolov8 networks, the improved YoloX-S increases identification accuracy, with mAP rising by 2.28 and 2.64 percentage points, respectively, while its speed decreases slightly, with FPS lower by 2 f/s and 3 f/s, respectively.
Accuracy and speed are both vital performance parameters for object detection algorithms, and the balance between them should be chosen according to the application demand. For biomass boilers, enhanced accuracy in detecting fuel types is more valuable than detection speed for combustion control and illegal fuel supervision. Therefore, the improved YoloX-S algorithm is suitable for fuel identification in biomass boilers.

3.2. Ablation Experiments

The YoloX network was used as the test network, and the various improvement modules were trained and evaluated on the biomass fuel dataset. Their impact on the benchmark model is presented in Table 2, which shows the experimental results for object detection with the improved modules. The benchmark model used in Experiment 1 achieved an mAP of 94.28%, a recall of 89.10%, and an FPS of 83 f/s.
The introduction of the Focal loss function alone (Experiment 2) resulted in a decrease in performance. However, the CSPDarknet53 + CoT, CSPDarknet53 + SE + Focal loss, CSPDarknet53 + CBAM + Focal loss, and CSPDarknet53 + CoT + Focal loss modules progressively increased the mAP and recall, improving performance. It is important to note that all of these modules reduced the FPS compared to the benchmark model.
Considering the significance of accuracy in fuel identification, the algorithm incorporating the CoT self-attention model in CSPDarknet53 achieved the highest mAP and recall. Therefore, it was selected as the optimal algorithm for this study.

4. Results and Discussion

Figure 8 visualizes the feature map outputs of the backbone networks before and after the improvement using the Grad-CAM algorithm. The red regions represent the areas of highest attention, while the yellow and blue regions indicate areas of diminishing attention. Figure 8 shows that the class activation mapping of the enhanced backbone network with the CoT block focuses more on the fuel targets and pays less attention to the background area. These findings validate that the improved network possesses stronger feature extraction capabilities and better discrimination of irrelevant background features, in line with previous studies [40].
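For readers who wish to reproduce this kind of visualization, below is a generic Grad-CAM sketch built from forward and backward hooks in PyTorch. A stock torchvision ResNet-18 stands in for the detector backbone, since wiring Grad-CAM into a full YoloX model is model-specific; the layer and input choices are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None).eval()
target_layer = model.layer4           # last convolutional stage (assumed)

feats, grads = {}, {}
def fwd_hook(module, inp, out):
    feats["y"] = out                  # feature maps, shape (1, C, h, w)
def bwd_hook(module, grad_in, grad_out):
    grads["y"] = grad_out[0]          # gradients w.r.t. the feature maps

target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)       # stand-in input image
score = model(x)[0].max()             # score of the top class
model.zero_grad()
score.backward()

# Channel weights = global-average-pooled gradients; CAM = weighted sum + ReLU.
w = grads["y"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((w * feats["y"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```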
Figure 9 and Table 3 compare the detection performance for the five fuel targets with and without the CoT block. The results indicate that both methods achieve high precision in fuel identification, with the mean average precision of the improved YoloX-S algorithm (96.60%) surpassing that of the YoloX-S algorithm without the CoT module (95.40%) after 150 epochs. The data show that the improved model reduces the missed detection rates of straw blocks, straw pellets, wood pellets, coal, and wood blocks by 6%, 9%, 7%, 2%, and 3%, respectively (Figure 9).
In our self-made briquette biofuel dataset, straw pellets served as representative small targets. The algorithm proposed in this study improves the detection of straw pellets, as indicated by the increased Recall, AP, and F1 values. Specifically, compared to the original model at a score threshold of 0.5, the proposed algorithm achieves a 3.68% increase in Recall, a 3.13% increase in AP, and a 0.02% increase in F1 for straw pellet detection (Figure 10).
These results highlight the effectiveness of the self-attention mechanism in focusing on relevant information within hidden features and enhancing the attention towards small targets with low edge resolutions and limited useful information.
To verify the practicality and limitations of the improved model, we conducted tests on different models using the same image. A comparison of the instance detection results before and after adding the CoT block is presented (Figure 11). In each set of detection result diagrams, the left figures (Figure 11a,c,e,g,i) are the original model, while the right figures (Figure 11b,d,f,h,j) are the improved model.
The improved YoloX model achieves a higher confidence (0.89–0.94) in detecting coal than the original YoloX model (0.85–0.92) (Figure 11a,b). Moreover, the improved model effectively distinguishes coal from biomass fuels, meaning that it has the potential for application in the monitoring of illegal coal burning in biomass boilers.
Figure 11c–f show the detection results of straw and wood blocks. The improved YoloX model achieves a higher confidence (0.90–0.92) in detecting wood blocks than the original YoloX model (0.89–0.91). However, the original and improved models exhibit low confidence (0.52–0.69) in detecting straw blocks and actually missed some objects. This was attributed to the similarity in features between the straw pellets and wood pellets and their dense distribution.
Figure 11g–j illustrate the detection results for the straw and wood pellets. In the actual operation of biomass boilers, the detection of fuel type is crucial to prevent issues such as coking and ash accumulation, which directly impact safety and economic benefits [41]. The original YoloX model shows false and missed detections in recognizing straw pellets and wood pellets, and the improved YoloX model overcame these issues, implying that the improved model enhances detection accuracy in cases of dense distribution. Additionally, the confidence for straw pellets and wood pellets was higher for the improved YoloX model than for the original YoloX model. The proposed model effectively recognizes the shape features of straw and wood pellets, as evidenced by the comparisons in Figure 11g–j.
We also tested the improved YoloX model on images containing different object categories, and the results with high confidence (0.90–0.95) indicate high accuracy, as shown in Figure 11k. Notably, Figure 11k achieves a high confidence (0.93) in recognizing the straw block, while Figure 11d, which contains multiple straw blocks, obtains a lower confidence (0.52–0.68) despite being derived from the same improved model. By comparing the backgrounds of Figure 11d and Figure 11k, it can be observed that there are scattered straw clippings in the background of Figure 11d, which may contribute to the lower confidence. Therefore, it can be inferred that the improved YoloX model is effective and precise in identifying briquette biofuels when there are sufficient differences between the foreground and background.

5. Conclusions

Considering the limitations of deep learning technology in detecting biomass fuel types and the potential for false and missed detections in complex scenarios, we improved the YoloX algorithm by adding self-attention: the 3 × 3 convolution in ResNet was replaced with the CoT module to create a new backbone feature extraction network. Our experimental results show that the improved YoloX-S model significantly increased the detection speed compared to the original YoloX-S model and achieved the highest detection accuracy among YoloX-L, YoloX-S, Yolov5, Yolov7, and Yolov8.
Our ablation experiments demonstrated that the CoT self-attention module in CSPDarknet53 effectively enhances the algorithm's accuracy. When applied to fuel identification, the improved YoloX-S model accurately distinguishes coal from briquette biofuels, and wood blocks are detected with a high confidence level of 0.90–0.92; it therefore has potential applications in the monitoring of illegal coal burning in biomass boilers.
However, both the original and improved YoloX models exhibit low confidence levels of 0.52–0.69 for straw blocks with complex backgrounds and occasionally fail to detect some objects. On the other hand, the detection of straw blocks with clean backgrounds achieves a high confidence level of 0.93. Future studies should focus on enhancing the detection confidence under complex backgrounds, which will facilitate the practical application of this detection method.

Author Contributions

Conceptualization, Y.W., X.L. and Y.J.; methodology, Y.W., X.L., Y.L. and Z.M.; software, D.R.; validation, S.L.; formal analysis, Y.W.; investigation, F.W.; resources, Y.J.; data curation, F.W., D.R., Y.L. and Z.M.; writing—original draft preparation, Y.W. and X.L.; writing—review and editing, X.L. and S.L.; visualization, S.L.; supervision, Y.J.; project administration, Y.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research study was funded by the Heilongjiang Province key research and development plan (funding number: GA21C026) and the Key Research and Development Plan of Hebei Province in 2022 (funding number: 22347402D).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, F.; Li, Y.; Novoselov, K.S.; Liang, F.; Meng, J.; Ho, S.H.; Zhao, T.; Zhou, H.; Ahmad, A.; Zhu, Y.; et al. Bioresource upgrade for sustainable energy, environment, and biomedicine. Nanomicro Lett. 2023, 15, 35. [Google Scholar] [CrossRef] [PubMed]
  2. Shokri Kalan, A.; Heidarabadi, S.; Khaleghi, M.; Ghiasirad, H.; Skorek-Osikowska, A. Biomass-to-energy integrated trigeneration system using supercritical CO2 and modified Kalina cycles: Energy and exergy analysis. Energy 2023, 270, 126845. [Google Scholar] [CrossRef]
  3. Hu, B.-B.; Lin, Z.-L.; Chen, Y.; Zhao, G.-K.; Su, J.-E.; Ou, Y.-J.; Liu, R.; Wang, T.; Yu, Y.-B.; Zou, C.-M. Evaluation of biomass briquettes from agricultural waste on industrial application of flue-curing of tobacco. Energy Source Part A 2020, 1–12. [Google Scholar] [CrossRef]
  4. Codina Gironès, V.; Moret, S.; Peduzzi, E.; Nasato, M.; Maréchal, F. Optimal use of biomass in large-scale energy systems: Insights for energy policy. Energy 2017, 137, 789–797. [Google Scholar] [CrossRef]
  5. Odzijewicz, J.I.; Wołejko, E.; Wydro, U.; Wasil, M.; Jabłońska-Trypuć, A. Utilization of ashes from biomass combustion. Energies 2022, 15, 9653. [Google Scholar] [CrossRef]
  6. Zhang, C.; Wang, H.; Bai, L.; Wu, C.; Shen, L.; Sippula, O.; Yang, J.; Zhou, L.; He, C.; Liu, J.; et al. Should industrial bagasse-fired boilers be phased out in China? J. Clean. Prod. 2020, 265, 121716. [Google Scholar] [CrossRef]
  7. Güler, B. Investigation of efficiency of pellet burning methods in a full scale rotary dryer. Biomass Convers. Bior. 2022, 1–13. [Google Scholar] [CrossRef]
  8. Zhao, F.; Bai, F.; Liu, X.; Liu, Z. A review on renewable energy transition under China’s carbon neutrality target. Sustainability 2022, 14, 15006. [Google Scholar] [CrossRef]
  9. Ito, H.; Tokunaga, R.; Nogami, S.; Miura, M. Influence of biomass raw materials on combustion behavior of highly densified single cylindrical biomass briquette. Combust. Sci. Technol. 2020, 194, 2072–2086. [Google Scholar] [CrossRef]
  10. Olugbade, T.; Ojo, O.; Mohammed, T. Influence of Binders on Combustion Properties of Biomass Briquettes: A Recent Review. Bioenergy Res. 2019, 12, 241–259. [Google Scholar] [CrossRef]
  11. Kpalo, S.Y.; Zainuddin, M.F.; Manaf, L.A.; Roslan, A.M. A review of technical and economic aspects of biomass briquetting. Sustainability 2020, 12, 4609. [Google Scholar] [CrossRef]
  12. Zhou, Y.; Zhang, Z.; Zhang, Y.; Wang, Y.; Yu, Y.; Ji, F.; Ahmad, R.; Dong, R. A comprehensive review on densified solid biofuel industry in China. Renew. Sustain. Energy Rev. 2016, 54, 1412–1428. [Google Scholar] [CrossRef]
  13. IEA. A Circular of the National Energy Administration on Heating with Renewable Energy Based on Local Conditions (No. 000019705/2021-00020); IEA: Paris, France, 2021. [Google Scholar]
  14. Velusamy, S.; Subbaiyan, A.; Murugesan, S.R.; Shanmugamoorthy, M.; Sivakumar, V.; Velusamy, P.; Veerasamy, S.; Mani, K.; Sundararaj, P.; Periyasamy, S.; et al. Comparative analysis of agro waste material solid biomass briquette for environmental sustainability. Adv. Mater. Sci. Eng. 2022, 2022, 3906256. [Google Scholar] [CrossRef]
  15. Dinesha, P.; Kumar, S.; Rosen, M.A. Biomass briquettes as an alternative fuel: A comprehensive review. Energy Technol. 2019, 7, 1801011. [Google Scholar] [CrossRef]
  16. Li, G.; Hu, R.; Hao, Y.; Yang, T.; Li, L.; Luo, Z.; Xie, L.; Zhao, N.; Liu, C.; Sun, C.; et al. CO2 and air pollutant emissions from bio-coal briquettes. Environ. Technol. Innov. 2023, 29, 102975. [Google Scholar] [CrossRef]
  17. Li, X.; Wu, M.; Lu, G.; Yan, Y.; Liu, S. On-line identification of biomass fuels based on flame radical imaging and application of radical basis function neural network techniques. IET Renew. Power Gener. 2015, 9, 323–330. [Google Scholar] [CrossRef]
  18. Zhou, H.; Li, Y.; Tang, Q.; Lu, G.; Yan, Y. Combining flame monitoring techniques and support vector machine for the online identification of coal blends. J. Zhejiang Univ.-Sci. A 2017, 18, 677–689. [Google Scholar] [CrossRef]
  19. Ge, H.; Li, X.; Li, Y.; Lu, G.; Yan, Y. Biomass fuel identification using flame spectroscopy and tree model algorithms. Combust. Sci. Technol. 2019, 193, 1055–1072. [Google Scholar] [CrossRef]
  20. Tian, C.; Hao, D.; Ma, M.; Zhuang, J.; Mu, Y.; Zhang, Z.; Zhao, X.; Lu, Y.; Zuo, X.; Li, W. Graded diagnosis of Helicobacter pylori infection using hyperspectral images of gastric juice. J. Biophotonics 2023, e202300254. [Google Scholar] [CrossRef]
  21. Lin, J.; Zhang, K.; Yang, X.; Cheng, X.; Li, C. Infrared dim and small target detection based on U-Transformer. J. Vis. Commun. Image Represent. 2022, 89, 103684. [Google Scholar] [CrossRef]
  22. Zeng, Y.; Zhou, Z.; Yu, Y. Study of YOLOX target detection method based on stand-alone self-attention. Acad. J. Comput. Inf. Sci. 2022, 5, 29–37. [Google Scholar] [CrossRef]
  23. Wang, Y.; Wu, H.; Hua, X.; Ren, D.; Li, Y.; Mu, Z.; Xu, W.; Wei, Y.; Zhang, T.; Jiang, Y. Biological characters identification for hard clam larva based on the improved YOLOX-s. Comput. Electron. Agric. 2023, 212, 108103. [Google Scholar] [CrossRef]
  24. Mamalis, M.; Kalampokis, E.; Kalfas, I.; Tarabanis, K. Deep learning for detecting Verticillium fungus in olive trees: Using YOLO in UAV imagery. Algorithms 2023, 16, 343. [Google Scholar] [CrossRef]
  25. Luo, M.; Xu, L.; Yang, Y.; Cao, M.; Yang, J. Laboratory flame smoke detection based on an improved YOLOX algorithm. Appl. Sci. 2022, 12, 12876. [Google Scholar] [CrossRef]
  26. Cui, G.; He, H.; Zhou, Q.; Jiang, J.; Li, S. Research on camera-based target detection enhancement method in complex environment. In Proceedings of the 2022 5th International Conference on Robotics, Control and Automation Engineering (RCAE), Changchun, China, 28–30 October 2022. [Google Scholar]
  27. Liu, Y.; Duan, M.; Ding, G.; Ding, H.; Hu, P.; Zhao, H. HE-YOLOv5s: Efficient Road Defect Detection Network. Entropy 2023, 25, 1280. [Google Scholar] [CrossRef]
  28. Zhao, S.; Wu, Y.; Tong, M.; Yao, Y.; Qian, W.; Qi, S. CoT-XNet: Contextual transformer with Xception network for diabetic retinopathy grading. Phys. Med. Biol. 2022, 67, 245003. [Google Scholar] [CrossRef]
  29. Li, Y.; Yao, T.; Pan, Y.; Mei, T. Contextual transformer networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 1489–1500. [Google Scholar] [CrossRef]
  30. Liu, Z.; Dai, C.; Li, X. Pedestrian detection method in infrared image based on improved YOLOv7. In Proceedings of the 2023 IEEE 3rd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), Chongqing, China, 26–28 May 2023. [Google Scholar]
  31. Li, T.; Yin, Y.; Yi, Z.; Guo, Z.; Guo, Z.; Chen, S. Evaluation of a convolutional neural network to identify scaphoid fractures on radiographs. J. Hand Surg. (Eur. Vol.) 2023, 48, 445–450. [Google Scholar] [CrossRef]
  32. Ye, M.; Tan, G.; Tang, J.; Feng, J.; Huang, X.; Sun, W.F. Detection & tracking of multi-scenic lane based on segnet-LSTM semantic split network. SAE Int. J. Adv. Curr. Pract. Mobil. 2021, 3, 2494–2500. [Google Scholar]
  33. Bandy, A.D.; Spyridis, Y.; Villarini, B.; Argyriou, V. Intraclass clustering-based CNN approach for detection of malignant melanoma. Sensors 2023, 23, 926. [Google Scholar] [CrossRef]
  34. Zhang, Z.; Wu, S.; Peng, X.; Wang, W.; Li, R. Continuous learning deraining network based on residual FFT convolution and contextual transformer module. IET Image Process. 2023, 17, 747–760. [Google Scholar] [CrossRef]
  35. Ji, Z.; Wu, Y.; Zeng, X.; An, Y.; Zhao, L.; Wang, Z.; Ganchev, I. Lung nodule detection in medical images based on improved YOLOv5s. IEEE Access 2023, 11, 76371–76387. [Google Scholar] [CrossRef]
  36. He, T.; Zhang, Z.; Zhang, H.; Zhang, Z.; Xie, J.; Li, M. Bag of tricks for image classification with convolutional neural networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  37. Jiang, K.; Xie, T.; Yan, R.; Wen, X.; Li, D.; Jiang, H.; Jiang, N.; Feng, L.; Duan, X.; Wang, J. An attention mechanism-improved YOLOv7 object detection algorithm for hemp duck count estimation. Agriculture 2022, 12, 1659. [Google Scholar] [CrossRef]
  38. Yang, Z.; Feng, H.; Ruan, Y.; Weng, X. Tea tree pest detection algorithm based on improved Yolov7-Tiny. Agriculture 2023, 13, 1031. [Google Scholar] [CrossRef]
  39. Santos, J.; Peixinho, N.; Barata, T.; Pereira, C.; Coimbra, A.P.; Crisostomo, M.M.; Mendes, M. Sunspot detection using YOLOv5 in spectroheliograph H-alpha images. Appl. Sci. 2023, 13, 5833. [Google Scholar] [CrossRef]
  40. Jia, F.; Tan, J.; Lu, X.; Qian, J. Radar timing range-doppler spectral target detection based on attention ConvLSTM in traffic scenes. Remote Sens. 2023, 15, 4150. [Google Scholar] [CrossRef]
  41. Chi, S.; Liang, Y.; Chen, W.; Hou, Z.; Luan, T. Numerical simulation of tail over-fire air supply of a grate biomass boiler. Energies 2022, 15, 7664. [Google Scholar] [CrossRef]
Figure 1. Dataset for biomass briquettes and coal. (a) Straw blocks; (b) straw pellets; (c) wood blocks; (d) wood pellets; (e) coal.
Figure 2. Structure of original YoloX-S model.
Figure 3. Contextual Transformer (CoT) Block. H: height, W: width, C: number of channels. Ch: the number of heads; * denotes local matrix multiplication that measures the pairing relationship between each query and the corresponding keys within a local k × k grid in the space.
Figure 4. Convolutional neural network.
Figure 5. Improved YoloX-S network structure.
Figure 6. Change curve of loss value of the model.
Figure 7. The mAP curve after training the YOLOv7 model.
Figure 8. Feature map visualization (a) without CoT block and (b) with CoT block.
Figure 9. The target missed detection rate with and without CoT: (a) without CoT block; (b) with CoT block.
Figure 10. Evaluations of Recall, AP, and F1: (a) Recall without CoT block. (b) Recall with CoT block. (c) AP without CoT block. (d) AP with CoT block. (e) F1 without CoT block. (f) F1 with CoT block.
Figure 11. Example detection diagram: (a) coal of the original model; (b) coal of the improved model; (c) straw blocks of the original model; (d) straw blocks of the improved model; (e) wood blocks of the original model; (f) wood blocks of the improved model; (g) straw pellets of the original model; (h) straw pellets of the improved model; (i) wood pellets of the original model; (j) wood pellets of the improved model; (k) image with different object categories.
Table 1. The backbone, mAP, loss, and FPS of the improved YoloX-S model and the other models.
Model               Backbone              mAP/%    Loss    FPS/(f/s)
YoloX-L             CSPDarknet53          94.28    2.73    36
Yolov7              ELAN                  94.32    3.11    75
Yolov8              CSPDarknet            93.96    2.27    76
Yolov5              Darknet53             83.79    5.20    22
YoloX-S             CSPDarknet53          95.40    2.67    29
Improved YoloX-S    CSPDarknet53 + CoT    96.60    2.25    73
Table 2. The results of the ablation experiments with different modules. The ‘—’ sign indicates that no improvement modules were added to the benchmark model in experiment 1.
Experiment    Module                              mAP/%    Recall/% (Straw Pellets)    FPS/(f/s)
1             —                                   94.28    89.10                       83
2             Focal loss                          83.79    87.35                       82
3             CSPDarknet53 + CoT                  94.32    89.37                       80
4             CSPDarknet53 + SE + Focal loss      95.40    90.53                       75
5             CSPDarknet53 + CBAM + Focal loss    95.54    90.64                       75
6             CSPDarknet53 + CoT + Focal loss     96.60    93.52                       73
Table 3. AP of each class with and without CoT block.
Class           AP without CoT Block    AP with CoT Block
wood block      0.97                    0.98
wood pellets    0.95                    0.97
coal            0.96                    0.97
straw pellet    0.95                    0.96
straw block     0.94                    0.95

