Article

Infrared Imaging Detection for Hazardous Gas Leakage Using Background Information and Improved YOLO Networks

by Minghe Wang, Dian Sheng, Pan Yuan, Weiqi Jin * and Li Li
MOE Key Laboratory of Optoelectronic Imaging Technology and System, Beijing Institute of Technology, Beijing 100081, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(6), 1030; https://doi.org/10.3390/rs17061030
Submission received: 9 January 2025 / Revised: 7 March 2025 / Accepted: 13 March 2025 / Published: 15 March 2025
(This article belongs to the Special Issue Recent Advances in Infrared Target Detection)

Abstract:
Hazardous gas leakage in the petrochemical industry frequently results in major incidents. A significant challenge arises due to the limitations of the current gas plume target feature extraction and identification techniques, which reduce the automated detection capabilities of remote monitoring systems. To address this, we propose BBGFA-YOLO, a real-time detection method leveraging background information and an improved YOLO network. This approach is designed specifically for the infrared imaging of gas plume targets, fulfilling the requirements of visual remote monitoring for hazardous gas leaks. We introduce a synthetic image colorization method based on background estimation, which leverages background estimation techniques to integrate motion features from gas plumes within the synthesized images. The resulting dataset can be directly employed by existing target detection networks. Furthermore, we introduce the MSDC-AEM, an attention enhancement module based on multi-scale deformable convolution, designed to enhance the network’s perception of gas plume features. Additionally, we incorporate an improved C2f-WTConv module, utilizing wavelet convolution, within the neck stage of the YOLO network. This modification strengthens the network’s capacity to learn deep gas plume features. Finally, to further optimize the network performance, we pre-train the network using a large-scale smoke detection dataset that includes reference background information. The experimental results, based on our self-acquired gas plume dataset, demonstrate a significant improvement in detection accuracy with the BBGFA-YOLO method, specifically achieving an increase in the average precision (AP50) from 74.2% to 96.2%. This research makes a substantial contribution to industrial hazardous gas leak detection technology, automated alarm systems, and the development of advanced monitoring equipment.

Graphical Abstract

1. Introduction

Industrial hazardous gases are typically flammable, explosive, or toxic, and their release can lead to large-scale deflagration and significant casualties. These gases can be categorized as follows: alkanes (e.g., methane, ethane), unsaturated hydrocarbons (including alkenes such as ethylene and alkynes such as acetylene), nitrogenous gases (e.g., ammonia, nitrogen oxides), sulfurous gases (e.g., hydrogen sulfide, sulfur dioxide), and toxic inorganic gases, notably carbon monoxide (CO). While the human eye and conventional visible light imaging systems cannot directly detect typical industrial hazardous gas leaks in a non-contact manner, most of these gases exhibit characteristic absorption windows within the infrared spectrum. Consequently, remote optical gas imaging (OGI) utilizing multi-infrared spectral imaging systems can effectively capture the visual traces of floating hazardous gas leaks [1,2]. Compared to traditional point-source gas detectors, OGI offers a visual, non-contact measurement technique. OGI systems capture real-time changes in the gas absorption of infrared radiation, enabling rapid responses and real-time monitoring, making them well suited for industrial applications. This method also provides superior visibility and detection efficiency compared with point-source, laser-based gas leak detection. Infrared multispectral imaging technology has been successfully applied in recent years to address the pressing need for rapid gas plume leak detection at industrial sites. However, the inherent trade-off between the imaging detection signal-to-noise ratio and detection efficiency, stemming from the use of narrow-band infrared spectroscopy, often renders the current algorithms susceptible to false alarms or missed detections due to noise interference. This poses a challenge for reliable gas plume target detection and automated alarm systems. Therefore, further research into robust gas plume target identification and automated detection methods is urgently required.
The physical process of gas plume leakage involves the diffusion of gas into the surrounding environment. Existing theoretical models describing this process primarily include the simplified Gaussian diffusion model and complex hydrodynamic-based diffusion models [3]. In practical leak detection scenarios, the gas concentrations are often low. Furthermore, uncooled infrared imaging systems, while suitable for long-duration operation, typically exhibit lower sensitivity than cooled systems. Consequently, gas plumes often present a low signal-to-noise ratio, with weakly translucent features in the acquired images. The factors influencing the intensity of the gas plume include, but are not limited to, the gas concentration, the gas temperature, the temperature differential between the gas and its background, and the complexity of the background itself [4]. Specifically, infrared images of gas plumes are characterized by the following: (1) the imaging system does not always capture the entirety of the gas plume structure; (2) the gas plume features are frequently superimposed on complex background features, making differentiation difficult; (3) the images contain both gas concentration distribution features and plume morphology features, with the latter being more visually dominant. Figure 1 illustrates gas plume imaging with three representative backgrounds. With a uniform blackbody background, the main features of the gas plume are clearly discernible. When imaged against an industrial site with a textured background, the gas plume features are superimposed onto structural textures (e.g., pipelines) but remain identifiable. However, in the case of an industrial site with a textured background causing high interference, many areas exhibit texture patterns similar to the gas plume, potentially causing false detections and reducing the performance of detection algorithms. As highlighted by the boxed region of a genuine gas leakage, relying solely on the concentration distribution or plume morphology features in a single-frame image may not be adequate for accurate identification. This necessitates further analysis based on the gas plume’s dynamic characteristics and the application of more advanced image processing and deep learning techniques to enhance both the detection accuracy and robustness.
This study introduces a novel, real-time hazardous gas leakage detection method using infrared imaging, termed Background-Based Gas Feature Attention YOLO (BBGFA-YOLO). First, given that outdoor gas plume target discrimination often depends on plume motion information, a traditional Gaussian mixture model (GMM) is employed in this approach for real-time background estimation, generating reference background frames. These background frames are then combined with the current infrared image frames to capture plume target motion information. Synthesized color infrared images are generated by overlaying the motion information with the original infrared images, which is beneficial for dataset labeling and network training; the resulting synthetic image dataset is directly applicable to existing target detection networks, leading to a substantial enhancement in the gas plume detection performance. The improved network utilizes YOLOv8 as a base architecture. The multi-scale deformable convolution attention enhancement module (MSDC-AEM) is designed to address the weak and variable morphologies of gas plume targets in the synthetic colorized plume images input into the network. Specifically, the MSDC-AEM employs multi-scale deformable convolution to enhance gas edge features and improve the correlation between plume features. In addition, an improved C2f-WTConv module targets the deep feature maps of the gas plumes, enhancing plume feature learning via wavelet convolution mechanisms. The proposed method significantly improves the gas plume target detection performance of YOLOv8 while preserving its speed advantages. Validation on self-collected datasets demonstrates superior detection results. Furthermore, pre-training the network using a large-scale synthetic smoke dataset further enhances its gas detection capabilities and deployment speed. Background estimation-based colorization for image synthesis demonstrates practical utility and potential applicability to dim target detection.
The contributions of this study are summarized as follows.
  • We employ background estimation and image synthesis methods for the first time to incorporate background information into gas plume images, seeking to address the challenge of the feature texture information of leaking gas plume targets being easily disrupted by complex backgrounds. This approach significantly enhances the learning capacity of existing neural networks regarding the motion characteristics of gas plume targets and reduces the difficulty in manual dataset labeling.
  • To effectively manage the characteristics of weak and unfixed gas plume targets, we introduce a multi-scale, deformable, large-kernel convolution gas plume attention enhancement module (MSDC-AEM). This module is designed to flexibly capture the diverse features of gas plume targets, improving the overall network’s perception of gas plume features.
  • We integrate an enhanced C2f-WTConv module, based on wavelet convolution, into the neck section of the YOLO network. This module strengthens the learning of gas plume features from deep features, ensuring that gas plume targets can still be accurately identified even under complex background conditions.

2. Related Works

Traditional hazardous gas leakage detection methods based on infrared imaging often rely on the motion characteristics of gas plumes [5]. However, these methods exhibit limited discriminatory capabilities due to complex scene interferences, frequently resulting in missed detections and false alarms, which undermines their reliability in industrial applications. In recent years, the rapid advancement of deep learning technologies has significantly propelled progress in computer vision, particularly in the realm of target detection algorithms. Deep learning exhibits substantial potential for the intelligent identification of gas targets within the context of gas detection via passive infrared imaging.
Existing deep-learning-based infrared gas target detection algorithms predominantly employ convolutional neural networks (CNNs) for the automatic learning and recognition of gas characteristics. For example, ref. [6] extended the U-Net architecture from the spatial to the temporal domain, proposing a spatiotemporal U-Net architecture for the detection of time-varying gas features. This model is capable of extracting pixel-level gas masks for each frame in an input sequence. In another approach, ref. [7] developed GasNet, a deep learning model for methane leakage detection. GasNet operates in two primary steps: (1) extracting foreground moving targets via background modeling and (2) using an improved CNN to determine whether a moving target is a gas. However, this model exhibits limitations due to its training dataset, which consists of a single sky-background scene, resulting in inadequate generalization capabilities. Conversely, ref. [8] introduced a two-stage network that sequentially extracts temporal and spatial gas features and uses texture features for gas target determination, although this tandem structure is susceptible to error accumulation. Furthermore, ref. [9] designed an asymmetric 3D convolutional neural network (A3DNet) that extracts temporal features through 1D convolutional pathways and spatial texture features via 2D convolutional pathways, thereby forming an aggregated spatiotemporal feature representation. In [10], hydrocarbon leakage videos were analyzed by integrating the established Faster R-CNN two-stage detection model with optical gas imaging techniques to create a video dataset encompassing 3205 frames. To address the problem of limited training samples, a transfer learning approach was employed. Building on the U-Net architecture, ref. [11] proposed a 2.5D-Unet semantic segmentation model that uses stacked 2D spatial convolution, 1D temporal convolution, and 3D spatiotemporal convolution to enhance the network’s ability to model the appearance and motion patterns of leaking gases. Similarly, ref. [12] utilized the GasVid dataset and VideoGasNet to address methane leakage classification, retaining a “foreground–background separation followed by suspicious target judgment” strategy. More recently, ref. [13] introduced a network model based on YOLOv8-seg, coupled with a cooled mid-wave infrared (MWIR) system, to achieve the efficient detection of low-contrast gas leaks. This model employs a global attention mechanism (GAM) to enhance small-scale target detection and leverages transfer learning to incorporate prior features. Finally, ref. [14] presented a spatiotemporal feature fusion network (TSFF-Net) designed specifically for real-time SF6 gas leakage video detection. A key innovation of TSFF-Net is the inclusion of foreground pixel images depicting SF6 gas motion characteristics, which, when used in conjunction with infrared gas plume images, significantly improve the model’s capacity to detect SF6 gas leakages. While methods employing traditional techniques to extract gas plume target motion features and subsequently inputting them into neural networks have been developed, the approach of generating gas plume target foreground pixels based on foreground detection is inherently limited by the robustness of the foreground detection method itself. Under complex background conditions, these methods are susceptible to variations in illumination, background interference, and other factors, which hinders the accurate extraction of gas plume foreground pixels. 
Consequently, this limitation leads to a reduction in the detection performance of such algorithms.
Therefore, existing infrared gas target detection algorithms based on deep learning still exhibit significant shortcomings. Researchers often rely on pre-existing neural network architectures, with insufficient consideration given to the specific characteristics of gas plumes. The gas plume target detection performance is inherently constrained when using a single-frame image. It is therefore crucial to integrate multi-frame analysis and incorporate additional gas plume characteristics to improve the detection accuracy. Although some methods introduce foreground detection algorithms for motion feature extraction, these are prone to interference from noise, occlusion, illumination changes, and other factors under complex industrial background conditions, leading to degradation in the detection performance. The lack of comprehensive research on gas plume characteristics not only affects the detection efficiency but also contributes to missed detections and false positives in practical industrial applications. Hence, there is a pressing need to develop more efficient and robust target detection methods to address these existing challenges.

3. Methods

The detection workflow of the BBGFA-YOLO gas plume detection method is illustrated in Figure 2 and comprises three primary stages. First, a Gaussian mixture model (GMM)-based background estimation technique is employed. This estimated background is then utilized to synthesize colorized gas plume images, which subsequently serve as input to the target detection network. Second, the YOLOv8 target detection network forms the foundational structure, which is augmented by a custom-designed target detection module—the multi-scale deformable convolution with attention enhancement module (MSDC-AEM). This module enhances the network’s capacity to detect gas plume features by leveraging multi-scale deformable convolutions and attention mechanisms. Finally, the enhanced C2f-WTConv module is integrated into the neck of the YOLOv8 network to emphasize the extraction of gas plume features from the deep feature maps, thereby increasing the network’s focus on these features.

3.1. Synthesis of Gas Plume Images Based on Reference Backgrounds

Gas plume targets captured by imaging systems often appear faint due to factors such as the gas concentration and temperature. Despite their low visibility, their potential hazards necessitate effective detection. These weak plumes, exemplified in Figure 3, present a challenge for direct recognition and detection using a single feature. Figure 3a depicts a representative frame containing a gas plume target, with the plume’s location annotated. Due to the gas plume’s appearance as a faint, translucent plume, the concentration distribution is barely perceptible, and the plume is superimposed onto the background texture, making direct detection difficult. In such cases, the gas plume target can be effectively discriminated by using a clean background image (Figure 3b) as a reference. This approach is predicated on the fact that the subtle differences between the two images are discernible to the human eye. By combining the differential information from Figure 3a,b, colorization is applied, resulting in a colorized gas plume image, as shown in Figure 3c. In human observers, this colorized information enhances the prominence of the gas plume target. In a neural network, the differential information introduced by the reference background provides crucial additional features of the gas plume, which is expected to improve the network’s ability to detect and recognize these features effectively.
Given the crucial role of the reference background in gas plume detection, this paper introduces a background-informed gas plume target detection scheme. This scheme is complemented by a set of tailored detection and training mechanisms designed to enhance both the detection accuracy and reliability. We propose an image synthesis colorization approach to effectively integrate reference background information with gas-containing image data, as depicted in Figure 4. This approach not only facilitates the channel-wise superposition of information but also enhances the visual prominence of the gas plume. This is particularly advantageous for both the annotation of gas plume datasets and the subsequent training of neural networks. Specifically, a Gaussian mixture model (GMM) is employed for background modeling [15], enabling the real-time generation of a reference background frame. This reference frame is then superimposed onto the current frame of the gas plume image, resulting in a multi-channel synthetic image.
A Gaussian mixture model is a classical background modeling approach that characterizes individual pixel points in an image. The probability distribution of a pixel in a frame sequence image is obtained by counting this pixel, a process that can be described as follows:
$$P(x_t) = \sum_{k=1}^{K} \omega_{k,t}\, \mathcal{N}(x_t;\ \mu_{k,t}, \sigma_{k,t})$$
where $x_t$ represents the observed pixel value at time $t$; $K$ denotes the number of Gaussian distributions in the mixture; and $\omega_{k,t}$, $\mu_{k,t}$, and $\sigma_{k,t}$ represent the weight, mean, and variance of the Gaussian component at time $t$, respectively.
The background value is typically estimated by the mean vector of the Gaussian component with the highest weight that satisfies a predefined threshold $T$. Using the estimated Gaussian mixture model and the predefined threshold $T$, the background image $B_{\mathrm{GMM}}$ is generated as follows:
$$B_{\mathrm{GMM}}(x, y) = \mu_B, \quad \text{where}\ B = \arg\min_{b} \left( \sum_{k=1}^{b} \omega_k > T \right)$$
To adapt to dynamic video scenes, the Gaussian mixture model (GMM) requires real-time parameter updates. For each new pixel value, the model assesses its compatibility with the existing Gaussian components. A match is determined if the Mahalanobis distance between the observation and the mean of a Gaussian component falls below a predefined threshold. Upon a successful match, the mean, variance, and weight of the Gaussian component are updated based on the new pixel value. This update process can be represented as follows:
$$\mu_{k,t} = (1-\alpha)\,\mu_{k,t-1} + \alpha\, x_t$$
$$\sigma_{k,t}^{2} = (1-\alpha)\,\sigma_{k,t-1}^{2} + \alpha\, (x_t - \mu_{k,t-1})^{T} (x_t - \mu_{k,t-1})$$
$$\omega_{k,t} = \omega_{k,t-1} + \rho\,(1 - \omega_{k,t-1})$$
where $\alpha$ represents the learning rate for updating the mean and variance of the Gaussian model, and $\rho$ represents the learning rate for updating the weight of the Gaussian model.
Multi-channel images are generated by combining the background frame, $B_{\mathrm{GMM}}$, and the current frame, $I_{\mathrm{current}}$. These composite images are readily usable as input for network training. We employ a visible-light RGB image composition approach to streamline both the storage and display of these synthesized multi-channel images. Specifically, the reference background image, $B_{\mathrm{GMM}}$, is embedded into the blue (B) channel, while the current gas plume frame, $I_{\mathrm{current}}$, is embedded into both the red (R) and green (G) channels, resulting in the synthesized image $I_{\mathrm{fusion}}$, as follows:
$$I_{\mathrm{fusion}} = \begin{bmatrix} I_R \\ I_G \\ I_B \end{bmatrix} = \begin{bmatrix} I_{\mathrm{current}}(x, y) \\ I_{\mathrm{current}}(x, y) \\ B_{\mathrm{GMM}}(x, y) \end{bmatrix}$$
This design ensures that the background pixels across all three channels maintain a grayscale appearance after synthesis. The presence of gas plume absorption within the blue (B) channel accentuates the gas plume’s visibility in this channel, thereby imparting a distinct color to the gas plume within the synthesized multi-channel image. The synthesized infrared gas plume color images not only capture the differential features between the gas plume and the reference background but also provide more comprehensive feature information for the neural network. This approach allows for direct integration with existing target detection networks. Furthermore, the incorporation of color features into the gas plume representation significantly simplifies the manual labeling process and enhances the detection capabilities of gas plume datasets. While the Gaussian mixture model (GMM) method is primarily effective in static scenarios and may be affected by background changes, limiting its capacity for ideal background estimation, the neural network can independently recognize the gas plume based on the gas channel. This mitigates the potential interference introduced by the GMM method in changing background scenarios and enhances the algorithm’s stability compared to directly extracting gas plume targets using the GMM.
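The following is a minimal sketch of this stage, assuming OpenCV's MOG2 background subtractor as the Gaussian mixture model; the video path, parameter values, and output file names are illustrative assumptions rather than the exact configuration used in this work.

```python
import cv2

# Sketch: GMM background estimation and three-channel image synthesis.
# cv2.createBackgroundSubtractorMOG2 implements an adaptive per-pixel Gaussian mixture;
# history and varThreshold values here are illustrative, not the paper's settings.
gmm = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=False)

cap = cv2.VideoCapture("ir_gas_sequence.avi")        # hypothetical input video path
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)    # I_current, single-channel IR frame

    gmm.apply(gray)                                   # update the per-pixel Gaussian mixture
    b_gmm = gmm.getBackgroundImage()                  # B_GMM, estimated reference background
    if b_gmm is None:
        continue
    if b_gmm.ndim == 3:
        b_gmm = cv2.cvtColor(b_gmm, cv2.COLOR_BGR2GRAY)

    # I_fusion: R = I_current, G = I_current, B = B_GMM (OpenCV stores channels as B, G, R).
    i_fusion = cv2.merge([b_gmm, gray, gray])
    cv2.imwrite("fusion_frame.png", i_fusion)         # ready for annotation / network input
cap.release()
```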

3.2. BBGFA-YOLO Network Architecture

The architecture of the proposed BBGFA-YOLO network is depicted in Figure 5. It utilizes the synthesized infrared multi-channel color image as input and employs YOLOv8 as its base network. To enhance the performance, the network incorporates a gas feature attention enhancement module, the MSDC-AEM, and an improved C2f-WTConv-based module to produce gas plume target detection results. YOLOv8 represents the current state of the art in real-time object detection networks. While maintaining a high speed, YOLOv8 achieves significant improvements in accuracy compared to its predecessors, rendering it suitable for real-time detection tasks such as gas plume target detection. Building upon the YOLOv8 network, the multi-scale deformable convolution-based attention enhancement module (MSDC-AEM) is strategically placed at the initial feature extraction stage to capture the overall features of the gas plume and apply attentional enhancement. Furthermore, the C2f-WTConv module is integrated into the backend of the network’s neck to perform complex attentional computations based on the depth feature map of the gas plume. This extracts essential gas information from the channels while mitigating the influence of background features.

3.3. Gas Feature Attention Enhancement Module, MSDC-AEM

In this paper, a multi-scale deformable convolutional attention module, the MSDC-AEM (shown in Figure 6), is proposed to enhance the feature learning capabilities of YOLOv8 in infrared gas plume image detection. The module is designed to provide more accurate detection performance by taking into account the characteristics of gas plumes, including their irregular shapes, variable scales, and diffusion properties.
Considering the unique characteristics of gas plumes, the proposed MSDC-AEM incorporates deformable large-kernel convolution (DLK-Conv) [16]. This approach leverages the efficient implementation of deformable large-kernel convolution to balance the receptive field with the computational demands. Specifically, the module employs three parallel deformable convolution branches, each with a distinct kernel size (3 × 3, 5 × 5, and 7 × 7), to effectively capture multi-scale features. These branches incorporate large-kernel deformable convolution operations and conventional deformable convolution (DConv) operations to accommodate the dynamic and irregular morphologies inherent in gas plumes. Mathematically, given an input feature map, $F \in \mathbb{R}^{C \times H \times W}$, where $C$, $H$, and $W$ represent the number of channels, height, and width, respectively, the output of each branch can be described as follows:
$$F_{\mathrm{attn}}^{k} = \mathrm{CConv}\big(\mathrm{SConv}\big(\mathrm{Conv}_{\mathrm{DLK}}^{k}\big(\mathrm{Conv}_{\mathrm{D}}^{k}(F)\big)\big)\big)$$
where $k \in \{3, 5, 7\}$ denotes the convolutional kernel size; $\mathrm{Conv}_{\mathrm{D}}^{k}$ denotes regular deformable convolution; $\mathrm{Conv}_{\mathrm{DLK}}^{k}$ denotes deformable large-kernel convolution; $\mathrm{SConv}$ denotes spatial feature convolution; and $\mathrm{CConv}$ denotes channel feature convolution.
Regarding the features of the three branches, the adaptive weights learned by the network are used for feature fusion, and the fusion weights $w_{\mathrm{fusion}} \in \mathbb{R}^{3 \times H \times W}$ are obtained as follows:
$$w_{\mathrm{fusion}} = \sigma_{S}\big(\mathrm{Conv}\big(\mathrm{Concat}\big[F_{\mathrm{attn}}^{3}, F_{\mathrm{attn}}^{5}, F_{\mathrm{attn}}^{7}\big]\big)\big)$$
where $\sigma_{S}$ denotes the Sigmoid activation function, and $\mathrm{Conv}$ denotes the convolution operation.
Based on the three channels $w_{\mathrm{fusion}}^{(i)}$ of the fusion weights $w_{\mathrm{fusion}} \in \mathbb{R}^{3 \times H \times W}$, the fusion process can be expressed as follows:
$$F' = w_{\mathrm{fusion}}^{(0)} \odot F_{\mathrm{attn}}^{3} + w_{\mathrm{fusion}}^{(1)} \odot F_{\mathrm{attn}}^{5} + w_{\mathrm{fusion}}^{(2)} \odot F_{\mathrm{attn}}^{7}$$
where $\odot$ denotes element-wise feature map multiplication.
The final output of the module is obtained as follows:
$$\mathrm{Output} = F \odot \mathrm{Conv}(F')$$
The MSDC-AEM demonstrates a significantly enhanced receptive field compared to conventional convolution methods. Through the adaptive sampling capabilities of deformable convolution, the module effectively focuses on gas features within the receptive field. Building upon conventional deformable convolution, dilated convolution is incorporated into the large deformable convolution layer. This integration effectively expands the receptive field, and the resulting effective kernel size can be characterized as follows:
$$k' = k + (k - 1)(d - 1)$$
where $k$ is the kernel size, $d$ is the dilation rate, and $k'$ is the resulting effective kernel size.
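For instance, a 7 × 7 kernel with a dilation rate of $d = 2$ (an illustrative value, not necessarily the configuration used in the module) yields an effective kernel size of $k' = 7 + (7-1)(2-1) = 13$.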
Given the receptive field $R_{\mathrm{eff},1}$ of the deformable convolution layer and a predefined step size $s$, the resulting receptive field of the deformable large-kernel convolution operation is defined as follows:
$$R_{\mathrm{eff},2} = R_{\mathrm{eff},1} \times k' - (k' - 1) \times (s - 1)$$
This module offers notable advantages in gas plume detection. The deformable convolutional operations allow the network to adapt to a variety of gas plume morphologies, thus enhancing the detection accuracy across diverse environmental conditions. The multi-scale architecture facilitates effective feature extraction for gas plumes of varying sizes, enabling the recognition of both small-scale leaks and large-scale diffusion. The module’s fusion mechanism intelligently combines features from different scales, providing the comprehensive characterization of gas plume attributes. Despite its sophisticated design, the module maintains manageable computational complexity through the use of group convolution and an efficient feature fusion approach. Experimental and visualization results demonstrate that the module achieves significant performance improvements in gas plume detection tasks.
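As a rough illustration of the branch-and-fuse structure described above, the following PyTorch sketch builds three deformable-convolution branches with 3 × 3, 5 × 5, and 7 × 7 kernels, fuses them with Sigmoid-normalized weights, and gates the input features with the result. It uses torchvision's DeformConv2d and omits the deformable large-kernel/dilated stage, the group convolutions, and the exact layer ordering of the MSDC-AEM; the channel sizes and the final gating step are assumptions for illustration, not the module's definitive implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBranch(nn.Module):
    """One branch: offset prediction -> deformable conv -> spatial conv -> channel conv."""
    def __init__(self, channels: int, k: int):
        super().__init__()
        self.offset = nn.Conv2d(channels, 2 * k * k, 3, padding=1)   # (dx, dy) per kernel tap
        self.dconv = DeformConv2d(channels, channels, k, padding=k // 2)
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.channel = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.channel(self.spatial(self.dconv(x, self.offset(x))))

class MSDCAttentionSketch(nn.Module):
    """Multi-scale branches fused by learned per-pixel weights (cf. the fusion equations above)."""
    def __init__(self, channels: int):
        super().__init__()
        self.branches = nn.ModuleList([DeformBranch(channels, k) for k in (3, 5, 7)])
        self.weight_conv = nn.Conv2d(3 * channels, 3, 1)   # one fusion weight map per branch
        self.out_conv = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        w = torch.sigmoid(self.weight_conv(torch.cat(feats, dim=1)))   # (B, 3, H, W)
        fused = sum(w[:, i:i + 1] * feats[i] for i in range(3))
        return x * self.out_conv(fused)   # gate the input features with the fused attention

if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)
    print(MSDCAttentionSketch(64)(x).shape)   # torch.Size([1, 64, 80, 80])
```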

3.4. C2f-WTConv Module Based on Wavelet Convolution Improvement

The irregular morphologies and dynamic characteristics of gas plumes in infrared imagery present significant challenges for target detection models. To address these challenges, we propose a novel gas plume feature extraction module, termed C2f-WTConv, which builds upon the foundational C2f module of YOLOv8. As illustrated in Figure 7, this module integrates a wavelet convolution [17] mechanism. The inclusion of these elements allows the model to better adapt to the unique feature learning requirements inherent in infrared images of gas plumes.
As illustrated in Figure 7, the fundamental architecture of the C2f-WTConv module is consistent with that of the C2f module. Within this structure, the CBS (Conv-BN-SiLU) module serves as a fundamental building block in the YOLO network. The CBS module comprises a convolutional layer (Convolution), batch normalization, and the SiLU activation function arranged sequentially. The redesigned Bottleneck module integrates a wavelet convolution mechanism to facilitate the learning of gas plume characteristics with an expanded receptive field, thereby enhancing the overall network performance.
Let the input feature map of the Bottleneck-WTConv module be $F \in \mathbb{R}^{C \times H \times W}$; then, the output of the module can be described as follows:
$$M(F) = \sigma_{\mathrm{SL}}\big(\mathrm{BN}\big(\mathrm{Conv}_{\mathrm{wt}}(F)\big)\big)$$
where $\sigma_{\mathrm{SL}}$ denotes the SiLU activation function; $\mathrm{BN}$ denotes batch normalization; and $\mathrm{Conv}_{\mathrm{wt}}$ denotes wavelet convolution.
The redesigned C2f-WTConv module’s feature processing can be represented as follows:
$$X_{\mathrm{attn}} = \mathrm{CBS}\big(\mathrm{Concat}\big(F, M(F)\big)\big)$$
Given that gas diffusion exhibits varying feature patterns across spatial scales—from a high concentration in the near field to gradual dispersion in the far field—the enhanced module facilitates the learning of features with a larger receptive field. This capability allows for the more comprehensive capture of these intricate spatial characteristics within the neck of the network.
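To make the wavelet-convolution idea concrete, the sketch below implements a single-level Haar wavelet convolution and the Bottleneck-WTConv composition $M(F) = \mathrm{SiLU}(\mathrm{BN}(\mathrm{WTConv}(F)))$. It is a simplified stand-in for the WTConv of [17], which uses multi-level decomposition and learned per-band scaling; the Haar filter choice, kernel sizes, and the residual spatial path are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HaarWTConv2d(nn.Module):
    """Single-level Haar wavelet convolution (simplified stand-in for WTConv [17])."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Orthonormal 2x2 Haar analysis filters: LL, LH, HL, HH.
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
        lh = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
        hl = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
        filt = torch.stack([ll, lh, hl, hh]).unsqueeze(1)              # (4, 1, 2, 2)
        self.register_buffer("haar", filt.repeat(channels, 1, 1, 1))   # depthwise, 4 bands/channel
        self.channels = channels
        pad = kernel_size // 2
        # Small depthwise conv applied in the wavelet (half-resolution) domain.
        self.band_conv = nn.Conv2d(4 * channels, 4 * channels, kernel_size,
                                   padding=pad, groups=4 * channels)
        # Direct spatial-domain depthwise conv, summed with the reconstructed branch.
        self.spatial_conv = nn.Conv2d(channels, channels, kernel_size,
                                      padding=pad, groups=channels)

    def forward(self, x):                      # x: (B, C, H, W) with even H and W
        sub = F.conv2d(x, self.haar, stride=2, groups=self.channels)   # analysis transform
        sub = self.band_conv(sub)                                      # filter each sub-band
        rec = F.conv_transpose2d(sub, self.haar, stride=2, groups=self.channels)  # synthesis
        return rec + self.spatial_conv(x)

class BottleneckWTConv(nn.Module):
    """Bottleneck variant computing M(F) = SiLU(BN(WTConv(F))), as in Section 3.4."""
    def __init__(self, channels: int):
        super().__init__()
        self.wt = HaarWTConv2d(channels)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.wt(x)))
```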

3.5. Pre-Training Method for Transfer Learning Based on Reference Background

Currently, datasets of gas plumes derived from infrared imagery are relatively scarce. Furthermore, substantial variations exist across diverse industrial environments and gas leakage modalities, necessitating data collection tailored to specific real-world scenes and gas types. However, field-collected datasets for particular industrial sites are often limited in size, posing a challenge in the adequate training of deep learning models. Consequently, leveraging pre-trained models, obtained from large-scale datasets, and subsequently applying transfer learning to smaller, field-collected datasets offers a significant advantage. This approach facilitates rapid network training and can substantially improve the model performance.
Smoke and gas plumes share feature similarities, leading to overlapping detection methodologies and techniques. Smoke often exhibits a semi-transparent appearance and overlays complex background features in images, making it a suitable proxy for the training and validation of the plume feature attention network proposed in this study. Furthermore, the maturity of visible smoke detection technologies allows for the convenient acquisition of image data. Prior research has demonstrated that transfer learning using smoke data can enhance networks’ performance. However, the existing fire smoke datasets, primarily based on visible imagery, are generally large but designed for single-frame image detection, rather than for pre-training datasets that incorporate reference backgrounds, due to technical differences.
To address the aforementioned challenges, this paper introduces a transfer learning approach incorporating reference background images to enhance the gas plume detection capabilities (illustrated in Figure 8). Specifically, a synthetic smoke dataset comprising clean background images and corresponding smoke masks is utilized as a pre-training dataset to facilitate transfer learning. The Smoke100k dataset, detailed in reference [18], serves as a large-scale benchmark dataset for smoke detection, containing 100,000 synthetically generated smoke images. Photoshop is employed to create smoke masks, which are then superimposed onto smoke-free images at various angles and shapes to produce synthetic smoke images for the dataset. By merging the synthetic smoke images with the smoke-free background images, a transfer learning dataset with reference background image channels is constructed, thereby providing a foundation for subsequent network training.
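A minimal sketch of this pre-training step is shown below, assuming Smoke100k-style pairs of smoke-free background images and corresponding synthetic smoke images with matching file names, and using the Ultralytics YOLOv8 training interface; the directory layout, dataset YAML files, weight paths, and hyperparameters are illustrative assumptions.

```python
from pathlib import Path
import cv2

def build_fusion_pretrain_set(smoke_dir: str, background_dir: str, out_dir: str) -> None:
    """Compose synthetic smoke frames (R, G channels) with their clean backgrounds (B channel)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for smoke_path in sorted(Path(smoke_dir).glob("*.png")):
        bg_path = Path(background_dir) / smoke_path.name            # assumes matching file names
        smoke = cv2.imread(str(smoke_path), cv2.IMREAD_GRAYSCALE)
        background = cv2.imread(str(bg_path), cv2.IMREAD_GRAYSCALE)
        fusion = cv2.merge([background, smoke, smoke])               # B = background, G = R = smoke
        cv2.imwrite(str(out / smoke_path.name), fusion)

if __name__ == "__main__":
    from ultralytics import YOLO                                     # Ultralytics YOLOv8 API
    build_fusion_pretrain_set("smoke100k/smoke", "smoke100k/background", "smoke100k/fusion")
    model = YOLO("yolov8s.yaml")                                     # pre-train from scratch on smoke
    model.train(data="smoke_fusion.yaml", epochs=100, imgsz=640)
    # Transfer the pre-trained weights to the gas plume dataset (illustrative weight path).
    model = YOLO("runs/detect/train/weights/best.pt")
    model.train(data="gas_plume_fusion.yaml", epochs=500, imgsz=640)
```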

4. Experiments

4.1. Comparative Experiments on Background Estimation Models

The Gaussian mixture model (GMM), a classical background modeling technique, exhibits widespread use due to its computational efficiency and broad applicability. While numerous advanced background modeling methods have emerged in recent years, these are primarily designed for foreground detection tasks, with a focus on the precise segmentation of foreground objects. In contrast, the core objective of the background estimation module within the BBGFA-YOLO method is to generate background images devoid of any gas motion information. This allows the subsequent neural network to effectively extract inter-channel features, thereby enhancing the accuracy of gas leak detection. Consequently, the quality of the generated background image (e.g., sharpness and structural similarity) and the processing speed of the algorithm are paramount considerations.
To validate the applicability and advantages of the GMM in this study, we conducted a comparative experiment against several mainstream background modeling methods, including KNN [19], GMG [20], SubSENSE [21], LOBSTER [22], and FuzzyChoquetIntegral [23]. The experimental platform consisted of an Intel E5-2630 v4 processor and OpenCV version 4.1.0. Additionally, the single-frame processing time of the GMM method was evaluated on an NVIDIA GTX 1080Ti GPU.
We selected three video sequences containing multiple moving objects and gas release events as test data, ensuring that the first frame of each sequence was devoid of moving objects. Initially, each algorithm estimated the background image sequence from each video sequence. Subsequently, the background image generated by each algorithm was compared with the first frame of the original video sequence. The mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) were calculated to objectively evaluate the image quality of the backgrounds estimated by each algorithm. Concurrently, the single-frame processing time for each algorithm was recorded to assess its computational efficiency. The experimental results are presented in Table 1. As shown, the GMM exhibits advantages in the MSE, PSNR, and SSIM metrics while maintaining a high processing speed, indicating superior quality in the generated background images.
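The image-quality comparison can be reproduced with standard routines, e.g., from scikit-image, as in the sketch below; the frame-loading details and file paths are assumptions for illustration.

```python
import numpy as np
from skimage.metrics import mean_squared_error, peak_signal_noise_ratio, structural_similarity

def background_quality(estimated_bg: np.ndarray, clean_first_frame: np.ndarray) -> dict:
    """Compare an estimated background against the object-free first frame (8-bit grayscale)."""
    return {
        "MSE": mean_squared_error(clean_first_frame, estimated_bg),
        "PSNR": peak_signal_noise_ratio(clean_first_frame, estimated_bg, data_range=255),
        "SSIM": structural_similarity(clean_first_frame, estimated_bg, data_range=255),
    }

# Example usage (paths are illustrative):
#   first = cv2.imread("sequence01/frame_0000.png", cv2.IMREAD_GRAYSCALE)
#   bg = cv2.imread("sequence01/gmm_background.png", cv2.IMREAD_GRAYSCALE)
#   print(background_quality(bg, first))
```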
Figure 9 provides a subjective visual comparison of each algorithm’s performance. The results demonstrate that the GMM’s gas target removal capabilities are comparable to those of more advanced mainstream methods. Meanwhile, the background images generated by KNN, SubSENSE, and LOBSTER exhibit a blurring artifact, which diminishes the image quality. Considering both the objective metrics and subjective visual effects, the GMM effectively balances a high processing speed with robust image quality and gas target removal. Therefore, the GMM is better suited to the specific requirements of the BBGFA-YOLO method and performs more effectively in real-time monitoring scenarios.

4.2. Dataset Preparation

Given the limited availability of infrared imaging data for gas plume detection, an LT640P uncooled infrared focal plane camera (640 × 512 pixels, 12 μm pixel size, NETD = 50 mK, 50 mm infrared objective lens focal length), manufactured by the iRaytek Company (Yantai, China), was utilized in this study to capture difluoroethane gas release under varying outdoor conditions (characteristic absorption band: 8.5–9 μm). The gas release flow rate was controlled within the range of 0.5 to 5 L/min, and a synthetic colorized infrared gas plume target dataset, incorporating reference backgrounds, was compiled and constructed. Following the synthetic colorization method detailed in this paper, a total of 3222 synthetic images from 155 unique scenes were obtained. The dataset was subsequently annotated using the LabelImg software (version 1.8.6), resulting in a complete and annotated dataset.
We paired the annotation files with the corresponding raw infrared gas plume gray-scale images to evaluate the performance enhancement afforded by the synthetic color dataset compared to standard infrared grayscale images. This process yielded an infrared grayscale image gas plume dataset for comparative analysis, as depicted in Figure 10.

4.3. Training Configuration and Evaluation Indicators

The network training was conducted on a single NVIDIA 3090 GPU utilizing a batch size of 128, an initial learning rate of 1 × 10−4, and the Adam optimizer for network parameter updates. All experiments were performed on the Ubuntu 20.04 operating system. The neural network implementation was based on an adaptation of the YOLOv8 architecture. The training procedure consisted of 500 epochs, during which input images were resized to 640 × 640 pixels. We employed the stochastic gradient descent (SGD) algorithm for optimization, with an initial learning rate of 0.01, which was gradually reduced to 0.0001. Critically, during both the training and comparison phases, each model was trained from scratch, without the use of pre-trained weights.
To evaluate the performance of our model in target detection tasks, the average precision (AP) was selected as the primary evaluation metric. The AP provides a comprehensive assessment, considering both precision and recall, and effectively characterizes models’ performance across varying recall levels. AP values are calculated based on the Intersection over Union (IoU) between the predicted bounding boxes and ground truth bounding boxes, quantifying the degree of overlap between a predicted bounding box and its corresponding ground truth. It is expressed as follows:
$$IoU = \frac{\left| \mathrm{BoundingBox} \cap \mathrm{GroundTruth} \right|}{\left| \mathrm{BoundingBox} \cup \mathrm{GroundTruth} \right|}$$
Specifically, the AP50 and AP50-95 metrics were employed to evaluate the model’s performance. The AP50 denotes the average AP value for an IoU threshold of 0.5, while the AP50-95 represents the average AP value across a range of IoU thresholds from 0.5 to 0.95, with a step size of 0.05. These metrics provide a comprehensive evaluation of the model’s performance across varying levels of localization accuracy, encompassing both coarse localization and precise object capture. Consequently, the AP50 and AP50-95 are widely adopted as standard evaluation metrics in the target detection domain, facilitating comparisons with other studies.
In the context of production safety, the performance evaluation of target detection systems must consider not only the detection accuracy but also the prevalence of false positives and false negatives. A false positive (FP) occurs when a non-gas-leak event is incorrectly identified as a gas leak, potentially leading to unnecessary emergency responses and wasted resources. Conversely, a false negative (FN) represents a failure to detect an actual gas leak, which can result in unaddressed safety hazards and, ultimately, serious safety accidents. Therefore, the false positive ratio (FPR) and false negative ratio (FNR) serve as critical indicators in evaluating the reliability and practicality of gas leak detection systems.
To comprehensively evaluate the performance of the BBGFA-YOLO method, we incorporated the FPR and FNR as evaluation metrics. Calculating the FPR and FNR requires the definition of the following four key metrics, which are commonly used to evaluate networks’ performance on a given dataset: true positives (TP), i.e., the number of gas leak events correctly detected; false positives (FP), i.e., the number of non-gas-leak events incorrectly identified as gas leaks; true negatives (TN), i.e., the number of non-gas-leak events correctly identified; and false negatives (FN), i.e., the number of gas leak events that were not detected.
Based on these four metrics, the FPR and FNR are calculated as follows:
$$FPR = \frac{FP}{FP + TN}, \qquad FNR = \frac{FN}{FN + TP}$$
In addition to the AP50, AP50-95, FPR, and FNR, the computational complexity of the network was assessed using the GFLOPs metric.
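For reference, the box-overlap and error-rate metrics above can be computed directly as in the following sketch; the (x1, y1, x2, y2) box convention and the event-counting scheme are assumptions for illustration.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def fpr_fnr(tp: int, fp: int, tn: int, fn: int) -> tuple:
    """False positive ratio and false negative ratio from event counts."""
    return fp / (fp + tn), fn / (fn + tp)

# Example: iou((0, 0, 10, 10), (5, 5, 15, 15)) == 25 / 175, i.e. about 0.143.
```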

4.4. Comparison with Classical Target Detection Models

To demonstrate the efficacy of our proposed BBGFA-YOLO detection model, we benchmarked its performance against several established object detection architectures. This comparative analysis included classical models like RetinaNet [24] and EfficientDet [25], as well as the YOLO series networks YOLOv5 [26], YOLOv8 [27], and YOLOv10 [28]. Furthermore, we included OGI Faster-RCNN [10], TSFF-Net [14], and FFBGD [29], which are dedicated networks for gas detection based on infrared imagery. Under consistent experimental conditions, we evaluated the performance of several target detection algorithms using both a gas plume grayscale image dataset and a gas plume synthetic color dataset. The results of these evaluations are presented in Table 2. TSFF-Net and FFBGD both employ a foreground detection method to extract potential gas targets before feeding the data into the neural network. These extracted foregrounds are then fused with the current image frame and passed to the neural network for final detection. Consequently, the dataset used to train these models is labeled in the “Fusion“ format.
As designed, the inclusion of a reference background significantly enhanced both the characterization of gas plumes and the network’s detection capabilities. When comparing the training results using the grayscale image dataset to those using the synthetic RGB image dataset, we observed that, for all tested target detection networks, the synthetic dataset substantially improved the network’s gas plume detection performance. This improvement was particularly pronounced for the YOLO-series networks, with the benchmark network YOLOv8 demonstrating the best performance within this series. Specifically, our results for the AP50 and AP50-95 indicate that the BBGFA-YOLO method achieved the highest gas plume target detection performance, reaching an AP50 of 96.1%. This represents a notable improvement compared to the benchmark YOLOv8, demonstrating the efficacy of our proposed module. A similar gas plume detection method, TSFF-Net, which is based on YOLOv5, generates a gas plume target foreground image using image sequences and inputs it, along with the current frame image, into the network for detection. While this method achieved an AP50 of 90.5%, it still exhibited a considerable performance gap compared to our BBGFA-YOLO method. This difference is likely because the differencing method used to calculate the foreground image from the reference background and the current frame is susceptible to the introduction of interfering features, such as image noise and various moving targets, which degrade the detection performance of the neural network.
BBGFA-YOLO benefits from the speed advantages of the YOLOv8 architecture. Despite the addition of a background estimation method, which contributes approximately 0.5 ms to the processing time, the BBGFA-YOLO network runs in real time on our 3090 GPU platform, achieving an FPS of 22.3. The overall performance improvement is significant when compared with traditional YOLO-series networks. The improvements introduced through our MSDC-AEM and the refined C2f-WTConv further enhance the performance, which confirms their efficacy for gas plume feature extraction. The network has been thoroughly tested at various petroleum and chemical sites, and the results are satisfactory for field deployment.
Figure 11 illustrates representative dataset images alongside their corresponding detection results. These results are derived from three different approaches: YOLOv8 trained on the grayscale image dataset, YOLOv8 trained on the synthetic color image dataset, and the BBGFA-YOLO network trained on the synthetic color image dataset. Each detection result is visualized with a bounding box, and the associated confidence score is displayed in the lower-left corner of each image. As evidenced by these results, the synthetic color image dataset yields significantly higher detection confidence than the grayscale image dataset. This observation suggests that color information provides richer feature representation, thereby enhancing the network recognition accuracy. Moreover, the BBGFA-YOLO network demonstrates consistently higher confidence scores than the baseline YOLOv8 network, further confirming its superior performance for infrared imaging gas leakage detection tasks.
In practical applications, non-target object interference can be effectively mitigated by judiciously selecting a confidence threshold for the network. This approach further reduces the likelihood of missed detections and false alarms, enhancing the reliability and adaptability of the detection system in complex environments. The improvements afforded by the proposed method are of significant practical value for the field of infrared imaging gas leakage detection.

4.5. Ablation Experiments

We conducted a series of ablation experiments to rigorously validate the effectiveness of the specific enhancements incorporated into BBGFA-YOLO. These experiments were based on the original YOLOv8 network as a baseline, and the proposed improvements were added incrementally. We maintained identical experimental conditions across all trials, including the use of the same hyperparameters, to ensure a fair comparison. Furthermore, we included a comparison of the results with and without transfer learning. The incremental improvements evaluated in these ablation studies were as follows:
I.
Training with the gas plume grayscale image dataset;
II.
Training with the gas plume synthetic color image dataset;
III.
Adding the multi-scale gas attention enhancement module, MSDC-AEM;
IV.
Adding the refined feature extraction module, C2f-WTConv;
V.
Adding the synthetic color smoke dataset for transfer learning.
The experimental results demonstrate that the incorporation of our proposed improvements consistently enhanced the accuracy of the gas plume target detection network, as detailed in Table 3. Notably, Improvement II yielded the most significant performance gain, increasing the AP50 index by 26.8% compared to the baseline YOLOv8. This outcome underscores the efficacy of augmenting the network’s awareness of background information through the addition of reference background images. Improvements III and IV also contributed substantial performance enhancements, demonstrating the effectiveness of each of these proposed modules.

4.6. Visualization Experiments

To evaluate the feature extraction capabilities of the BBGFA-YOLO method in the context of gas plume detection, we employed the GradCAM [30] visualization technique and compared the results against the benchmark YOLOv8 model. Gradient-weighted class activation mapping (GradCAM) is a network interpretability method designed for convolutional neural networks. It leverages the gradient information of a specific convolutional layer to assign importance values to each neuron, visualizing the network’s attention weights as a heatmap superimposed on the input image. We generated feature activation maps from the layers containing the MSDC-AEM using GradCAM and compared these maps with those generated by the baseline model, which lacked the MSDC-AEM. To facilitate a detailed analysis of the performance differences between the models, we selected three representative images from our self-constructed synthetic color gas plume dataset, representing small, medium, and large gas plume scales. The visualization results effectively highlight the feature extraction process of each model, revealing their ability to focus on relevant gas plume features.
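As a rough sketch of how such a heatmap is produced, the following minimal Grad-CAM implementation weights a chosen layer's activations by their spatially pooled gradients; the choice of target layer and the scalar score used for back-propagation (e.g., the maximum detection confidence) are illustrative assumptions, not the exact configuration used in our experiments.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, score_fn):
    """Minimal Grad-CAM: weight a layer's activations by their global-average gradients."""
    acts, grads = {}, {}
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    try:
        model.zero_grad()
        output = model(image)            # forward pass through the detector
        score_fn(output).backward()      # scalar score, e.g. max objectness/class confidence
        weights = grads["g"].mean(dim=(2, 3), keepdim=True)        # pooled gradients per channel
        cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize to [0, 1]
        return cam
    finally:
        h1.remove()
        h2.remove()
```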
In the attention heatmaps generated, brighter regions indicate areas where the model places high attention, which corresponds to its attentional focus during gas plume detection. For comparison purposes, we overlaid the detection bounding box onto the image and displayed the attention heatmap within the bounding box. The visualization results clearly delineate the heatmap regions extracted by each model and the distribution of the attentional weights. As shown in Figure 12, with the three different target scales, the baseline YOLOv8 model provides reasonably accurate detection results. However, the feature attention is insufficient for medium- and large-scale gas plume targets, and the network primarily focuses on the edge features of the gas plume. Such edge-focused feature extraction can make the network vulnerable to interference from various minor environmental textures in real-world scenarios. In contrast, BBGFA-YOLO, incorporating the multi-scale deformable attention module (MSDC-AEM), effectively attends to both the edges and the center of the gas plume. Through its multi-scale attention design, BBGFA-YOLO achieves complete gas plume coverage for all three scales, demonstrating the effective enhancement of the attention given to gas plume features.

5. Discussion

This study introduces BBGFA-YOLO, a real-time infrared imaging gas plume detection method that leverages background information and an enhanced YOLO network architecture. This approach has demonstrated significant success in gas leak detection scenarios. The experimental results confirm that BBGFA-YOLO effectively improves both the accuracy and robustness of gas leakage detection.
For background estimation, we employed the classical Gaussian mixture model (GMM). While superior background modeling techniques exist, the GMM offers a compelling balance between computational efficiency and image quality, aligning well with the requirements of this study. Although the GMM facilitates the rapid generation of high-quality background images, thereby providing a solid foundation for subsequent neural network feature extraction, its adaptability to illumination changes and dynamic backgrounds is limited in inspection applications. Given the requirement for the detection of moving gas plumes, future work should explore the integration of alternative adaptive background modeling techniques, such as Kalman filtering, optical flow estimation, or deep learning approaches, to enhance the system’s background modeling capabilities in dynamic environments. This integration would facilitate the deployment of the gas detection system in inspection and other application scenarios, further bolstering its robustness in complex operational environments.

6. Conclusions

This paper introduced BBGFA-YOLO, a real-time detection method that leverages background information in conjunction with an enhanced YOLO network to address the challenges of detecting faint gas plumes within complex infrared imagery. We constructed an enhanced detection framework with the aim of improving the accuracy and reliability of gas plume detection. Our approach introduces a real-time method for the generation of reference background frames, allowing the network to perceive background information and synthesize multi-channel images with the current frame. This significantly enhances the network’s ability to perceive subtle gas plumes. Furthermore, we designed a multi-scale deformable attention enhancement module (MSDC-AEM) and a refined feature extraction module (C2f-WTConv) to address the weak and irregular plume features of gas plumes. The proposed synthetic colorization method exhibits considerable practical value and holds promise for broad application within the domain of dim small target detection.
Collectively, this research demonstrated a substantial improvement in the detection accuracy through the proposed novel architecture. It provides new insights and directions for further exploration in the field of gas plume infrared imaging detection. However, the current research predominantly focuses on general gas plume detection tasks, whereas real-world applications often involve a multitude of interfering objects. Future investigations should explore the integration of BBGFA-YOLO with multi-spectral infrared imaging systems to further enhance the classification performance and interference rejection capabilities of gas plume detection systems.

Author Contributions

Conceptualization, M.W. and W.J.; methodology, M.W.; validation, M.W., D.S. and P.Y.; formal analysis, M.W. and D.S.; writing—original draft preparation, M.W.; writing—review and editing, M.W., D.S. and P.Y.; visualization, M.W.; supervision, L.L.; project administration, W.J.; funding acquisition, W.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Capital of Science and Technology Platform of China, grant number Z171100002817011.

Data Availability Statement

The data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Acknowledgments

The authors are grateful to the Beijing Wisdom Sharing Technology Service Co., Ltd. for the device support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, J.; Wang, L.; Wang, M.; Gao, Y.; Jin, W. Gas Imaging Detectivity Model Combining Leakage Spot Size and Range. In Proceedings of the Thermosense: Thermal Infrared Applications XXXIV, Bellingham, WA, USA, 18 May 2012; SPIE: Bellingham, WA, USA; Volume 8354, pp. 360–371.
  2. Hinnrichs, M. Imaging Spectrometer for Fugitive Gas Leak Detection. In Proceedings of the Environmental Monitoring and Remediation Technologies II, Boston, MA, USA, 21 December 1999; SPIE: Bellingham, WA, USA; Volume 3853, pp. 152–161.
  3. Tan, Y.; Li, J.; Jin, W.; Wang, X. Model Analysis of the Sensitivity of Single-Point Sensor and IRFPA Detectors Used in Gas Leakage Detection. Infrared Laser Eng. 2014, 43, 2489–2495.
  4. Li, J.; Jin, W.; Wang, X.; Zhang, X. MRGC Performance Evaluation Model of Gas Leak Infrared Imaging Detection System. Opt. Express 2014, 22, A1701–A1712.
  5. Lu, Q.; Li, Q.; Hu, L.; Huang, L. An Effective Low-Contrast SF6 Gas Leakage Detection Method for Infrared Imaging. IEEE Trans. Instrum. Meas. 2021, 70, 5009009.
  6. Bhatt, R.; Gokhan Uzunbas, M.; Hoang, T.; Whiting, O.C. Segmentation of Low-Level Temporal Plume Patterns From IR Video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019.
  7. Wang, J.; Tchapmi, L.P.; Ravikumar, A.P.; McGuire, M.; Bell, C.S.; Zimmerle, D.; Savarese, S.; Brandt, A.R. Machine Vision for Natural Gas Methane Emissions Detection Using an Infrared Camera. Appl. Energy 2020, 257, 113998.
  8. Badawi, D.; Pan, H.; Cetin, S.C.; Enis Cetin, A. Computationally Efficient Spatio-Temporal Dynamic Texture Recognition for Volatile Organic Compound (VOC) Leakage Detection in Industrial Plants. IEEE J. Sel. Top. Signal Process. 2020, 14, 676–687.
  9. Tan, J.; Cao, Y.; Wang, F.; Xia, X.; Xu, Z. VOCs Leakage Detection Based on Weak Temporal Attention Asymmetric 3D Convolution. In Proceedings of the 2022 International Conference on Advanced Robotics and Mechatronics (ICARM), Guilin, China, 9 July 2022; IEEE: Piscataway, NJ, USA; pp. 200–205.
  10. Shi, J.; Chang, Y.; Xu, C.; Khan, F.; Chen, G.; Li, C. Real-Time Leak Detection Using an Infrared Camera and Faster R-CNN Technique. Comput. Chem. Eng. 2020, 135, 106780.
  11. Lin, H.; Gu, X.; Hu, J.; Gu, X. Gas Leakage Segmentation in Industrial Plants. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6 November 2020; IEEE: Piscataway, NJ, USA; pp. 1639–1644.
  12. Wang, J.; Ji, J.; Ravikumar, A.P.; Savarese, S.; Brandt, A.R. VideoGasNet: Deep Learning for Natural Gas Methane Leak Classification Using an Infrared Camera. Energy 2022, 238, 121516.
  13. Xu, S.; Wang, X.; Sun, Q.; Dong, K. MWIRGas-YOLO: Gas Leakage Detection Based on Mid-Wave Infrared Imaging. Sensors 2024, 24, 4345.
  14. Yao, J.; Xiong, Z.; Li, S.; Yu, Z.; Liu, Y. TSFF-Net: A Novel Lightweight Network for Video Real-Time Detection of SF6 Gas Leaks. Expert Syst. Appl. 2024, 247, 123219.
  15. Stauffer, C.; Grimson, W.E.L. Adaptive Background Mixture Models for Real-Time Tracking. In Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), Fort Collins, CO, USA, 23–25 June 1999; Volume 2, pp. 246–252.
  16. Azad, R.; Niggemeier, L.; Hüttemann, M.; Kazerouni, A.; Aghdam, E.K.; Velichko, Y.; Bagci, U.; Merhof, D. Beyond Self-Attention: Deformable Large Kernel Attention for Medical Image Segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2024; pp. 1287–1297.
  17. Finder, S.E.; Amoyal, R.; Treister, E.; Freifeld, O. Wavelet Convolutions for Large Receptive Fields. In Proceedings of the Computer Vision—ECCV 2024, Milan, Italy, 29 September–4 October 2024; Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Eds.; Springer Nature Switzerland: Cham, Switzerland, 2025; pp. 363–380.
  18. Cheng, H.-Y.; Yin, J.-L.; Chen, B.-H.; Yu, Z.-M. Smoke 100k: A Database for Smoke Detection. In Proceedings of the 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE), Osaka, Japan, 15–18 October 2019; pp. 596–597.
  19. Zivkovic, Z.; Van Der Heijden, F. Efficient Adaptive Density Estimation per Image Pixel for the Task of Background Subtraction. Pattern Recognit. Lett. 2006, 27, 773–780.
  20. Godbehere, A.B.; Matsukawa, A.; Goldberg, K. Visual Tracking of Human Visitors under Variable-Lighting Conditions for a Responsive Audio Art Installation. In Proceedings of the 2012 American Control Conference (ACC), Montreal, QC, Canada, 27–29 June 2012; IEEE: Piscataway, NJ, USA; pp. 4305–4312.
  21. St-Charles, P.-L.; Bilodeau, G.-A.; Bergevin, R. Flexible Background Subtraction with Self-Balanced Local Sensitivity. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; IEEE: Piscataway, NJ, USA; pp. 414–419.
  22. St-Charles, P.-L.; Bilodeau, G.-A. Improving Background Subtraction Using Local Binary Similarity Patterns. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision, Steamboat Springs, CO, USA, 24–26 March 2014; IEEE: Piscataway, NJ, USA; pp. 509–515.
  23. El Baf, F.; Bouwmans, T.; Vachon, B. Fuzzy Integral for Moving Object Detection. In Proceedings of the 2008 IEEE International Conference on Fuzzy Systems (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–6 June 2008; pp. 1729–1736.
  24. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef] [PubMed]
  25. Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and Efficient Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790. [Google Scholar]
  26. Zhang, Y.; Guo, Z.; Wu, J.; Tian, Y.; Tang, H.; Guo, X. Real-Time Vehicle Detection Based on Improved YOLO V5. Sustainability 2022, 14, 12274. [Google Scholar] [CrossRef]
  27. YOLOv8: State-of-the-Art Computer Vision Model. Available online: https://yolov8.com/ (accessed on 7 March 2025).
  28. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar]
  29. Bin, J.; Bahrami, Z.; Rahman, C.A.; Du, S.; Rogers, S.; Liu, Z. Foreground Fusion-Based Liquefied Natural Gas Leak Detection Framework From Surveillance Thermal Imaging. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 7, 1151–1162. [Google Scholar] [CrossRef]
  30. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations From Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
Figure 1. Gas plumes against different backgrounds: (a) uniform blackbody background; (b) textured background; (c) background with interference textures.
Figure 2. The workflow of BBGFA-YOLO.
Figure 3. Gas plume images: (a) background images; (b) gas plume images; (c) gas plume images with background.
Figure 4. Synthesis of infrared gas plume images based on video frame sequences.
Figure 5. Network structure of BBGFA-YOLO.
Figure 6. MSDC-AEM.
Figure 7. C2f-WTConv module.
Figure 8. Synthesizing dataset for transfer learning based on Smoke100k.
Figure 9. Comparison of image quality metrics for different background models.
Figure 10. Self-collected datasets and annotation files.
Figure 11. Detection results for some of the test images. Panels (a–e) present the five experimental scenarios.
Figure 12. Comparison of feature attention based on Grad-CAM heatmaps: (a) grayscale gas images; (b) color synthetic gas images; (c) heatmap of the original YOLOv8 network; (d) heatmap of the BBGFA-YOLO method.
Table 1. Comparative results for background estimation models.

| Method | MSE ↓ | PSNR ↑ | SSIM ↑ | Time (ms) ↓ |
| --- | --- | --- | --- | --- |
| GMM | 51.85 | 31.48 | 0.941 | 12.2 / 0.53 (GPU) |
| KNN | 206.05 | 28.79 | 0.870 | 13.8 |
| GMG | 154.00 | 28.80 | 0.876 | 12.1 |
| SubSENSE | 241.46 | 24.30 | 0.873 | 116.1 |
| LOBSTER | 230.76 | 24.50 | 0.856 | 81.6 |
| FuzzyChoquetIntegral | 241.45 | 24.31 | 0.873 | 61.4 |
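As a minimal sketch of how such a comparison can be run, the snippet below uses OpenCV's GMM (MOG2) and KNN background subtractors and scores each estimated background against a gas-free reference frame with MSE, PSNR, and SSIM. The file names are hypothetical, and this is not the authors' exact evaluation pipeline; the remaining models in Table 1 (GMG, SubSENSE, LOBSTER, fuzzy Choquet integral) require additional libraries and are omitted here.

```python
# Sketch only: estimate the background of an infrared sequence with two OpenCV
# subtractors and score the estimate against a gas-free reference frame.
# "gas_sequence.avi" and "reference_background.png" are hypothetical file names.
import cv2
from skimage.metrics import (mean_squared_error, peak_signal_noise_ratio,
                             structural_similarity)

def estimate_background(video_path, subtractor):
    """Feed every frame to the subtractor and return its final background image."""
    cap = cv2.VideoCapture(video_path)
    background = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        subtractor.apply(gray)                        # update the per-pixel model
        background = subtractor.getBackgroundImage()  # current background estimate
    cap.release()
    return background

reference = cv2.imread("reference_background.png", cv2.IMREAD_GRAYSCALE)
models = {
    "GMM (MOG2)": cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False),
    "KNN": cv2.createBackgroundSubtractorKNN(history=200, detectShadows=False),
}
for name, sub in models.items():
    bg = estimate_background("gas_sequence.avi", sub)
    print(name,
          "MSE %.2f" % mean_squared_error(reference, bg),
          "PSNR %.2f" % peak_signal_noise_ratio(reference, bg, data_range=255),
          "SSIM %.3f" % structural_similarity(reference, bg, data_range=255))
```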
Table 2. Comparison with classical object detection networks.

| Network | Dataset | AP50 (%) ↑ | AP50-95 (%) ↑ | FPR (%) ↓ | FNR (%) ↓ | Parameters (M) | GFLOPs (G) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RetinaNet | Grayscale | 74.6 | 40.1 | 9.03 | 21.7 | 37.97 | 61.26 |
| EfficientDet | Grayscale | 80.2 | 46.8 | 6.74 | 19.8 | 3.83 | 4.27 |
| YOLOv5(s) | Grayscale | 69.7 | 41.4 | 8.39 | 29.2 | 9.12 | 24.0 |
| YOLOv8(s) | Grayscale | 67.5 | 42.7 | 7.74 | 30.5 | 11.13 | 28.4 |
| YOLOv8(m) | Grayscale | 66.4 | 42.7 | 5.16 | 34.6 | 25.86 | 79.1 |
| YOLOv10(s) | Grayscale | 71.9 | 44.7 | 5.16 | 28.6 | 8.04 | 24.4 |
| OGI Faster R-CNN | Grayscale | 77.8 | 39.3 | 6.71 | 19.8 | 46.95 | 147.61 |
| BBGFA-YOLO | Grayscale | 74.2 | 45.4 | 7.15 | 28.5 | 10.47 | 52.3 |
| RetinaNet | Synthetic | 79.1 | 42.6 | 8.39 | 9.9 | 37.97 | 61.26 |
| EfficientDet | Synthetic | 92.0 | 58.1 | 3.45 | 12.3 | 3.83 | 4.27 |
| YOLOv5(s) | Synthetic | 94.2 | 62.8 | 2.58 | 11.1 | 9.12 | 24.0 |
| YOLOv8(s) | Synthetic | 94.3 | 62.5 | 1.94 | 10.5 | 11.13 | 28.4 |
| YOLOv8(m) | Synthetic | 94.2 | 62.7 | 1.29 | 12.3 | 25.96 | 79.1 |
| YOLOv10(s) | Synthetic | 92.0 | 60.1 | 3.22 | 13.7 | 8.04 | 24.4 |
| OGI Faster R-CNN | Synthetic | 88.6 | 53.2 | 4.10 | 12.7 | 46.95 | 147.61 |
| TSFF-Net | Fusion | 90.5 | 53.6 | 3.22 | 13.4 | 2.67 | 8.7 |
| FFBGD | Fusion | 90.2 | 54.3 | 3.89 | 14.8 | 55.19 | 109.16 |
| BBGFA-YOLO | Synthetic | 96.1 | 64.7 | 0.65 | 8.6 | 10.47 | 52.3 |

The bold font indicates the best values.
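For context, the AP50 and AP50-95 values reported for the YOLO baselines in Table 2 are the standard COCO-style metrics produced by common detection toolkits. The snippet below is a minimal sketch, assuming the Ultralytics YOLOv8 package, of training a stock YOLOv8(s) model on a synthetic-colorized gas plume dataset and reading back those metrics; the dataset file "gas_plume.yaml" is hypothetical, and the sketch does not include the MSDC-AEM or C2f-WTConv modifications that distinguish BBGFA-YOLO.

```python
# Sketch only: train and evaluate a stock YOLOv8(s) detector with the Ultralytics API.
# "gas_plume.yaml" is a hypothetical dataset description file (paths + class names).
from ultralytics import YOLO

model = YOLO("yolov8s.pt")                                   # COCO-pretrained starting point
model.train(data="gas_plume.yaml", epochs=100, imgsz=640, batch=16)

metrics = model.val(data="gas_plume.yaml", split="test")     # run evaluation on the test split
print("AP50:    %.3f" % metrics.box.map50)                   # mean AP at IoU 0.50
print("AP50-95: %.3f" % metrics.box.map)                     # mean AP over IoU 0.50-0.95
```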
Table 3. The results of the ablation experiment.

| Network | I | II | III | IV | V | AP50 | AP50-95 | GFLOPs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BBGFA-YOLO | | | | | | 67.5 | 42.4 | 28.4 |
| | | | | | | 94.3 | 62.5 | 28.4 |
| | | | | | | 95.6 | 62.4 | 25.1 |
| | | | | | | 96.1 | 64.7 | 52.3 |
| | | | | | | 96.2 | 64.6 | 52.3 |

The bold font indicates the best values.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
