Communication

Chemical Gas Source Localization with Synthetic Time Series Diffusion Data Using Video Vision Transformer

1 School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon 34141, Republic of Korea
2 Chem-Bio Technology Center, Advanced Defense Science and Technology Research Institute, Agency for Defense Development, Daejeon 34186, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4451; https://doi.org/10.3390/app14114451
Submission received: 22 April 2024 / Revised: 14 May 2024 / Accepted: 22 May 2024 / Published: 23 May 2024

Abstract

Gas source localization is vital in emergency scenarios to enable swift and effective responses. In this study, we introduce a gas source localization model leveraging the video vision transformer (ViViT). Utilizing synthetic time series diffusion data, the source grid is predicted by classifying the grid with the highest probability of gas occurrence within the diffusion data coverage. Through extensive experimentation using the NBC-RAMS simulator, we generate large datasets of gas diffusion under varied experimental conditions and meteorological environments, enabling comprehensive model training and evaluation. Our findings demonstrate that the ViViT outperforms other deep learning models in processing time series gas data, showcasing a superior estimation performance. Leveraging a transformer architecture, the ViViT exhibits a robust classification performance even in scenarios influenced by weather conditions or incomplete observations. Furthermore, we conduct an analysis of accuracy and parameter count across various input sequence lengths, revealing the ability of the ViViT to maintain high computational efficiency while achieving accurate source localization. These results underscore the effectiveness of the ViViT as a model for gas source localization, particularly in situations demanding a rapid response in real-world environments, such as gas leaks or attacks.

1. Introduction

Chemical gases, when released into the atmosphere, spread over time and cause widespread harm to humans across large areas [1,2]. Identification of the gas source is important for rapid response in the event of a gas diffusion accident. For instance, in industrial accidents like toxic gas leaks or the dispersal of combustion gases resulting from fires, pinpointing the gas source aids in analyzing the causes of incidents and mitigating further damage. Similarly, in military contexts where gas bomb attacks occur, accurate identification of the gas source enables the implementation of effective defense measures. Gas source localization involves estimating the initial location of gas emission using diffusion data, playing a critical role in responding to gas diffusion incidents [3]. Strategies for gas source localization typically rely on static sensor networks [4,5,6,7] or mobile sensors [8,9], depending on the deployment configuration. Our focus lies in leveraging static sensor networks, where sensors are deployed to monitor gas concentrations over time, facilitating the detection of changes in diffusion patterns within predefined areas.
To estimate the origin position of a gas source, researchers utilize a gas dispersion process model that incorporates the origin position as a modeling variable; the origin is then inferred by back-estimating this variable from the modeled gas diffusion distribution and the measured diffusion distribution. However, pinpointing the exact location of chemical gases is challenging due to their irregular diffusion patterns and the complex atmospheric conditions that influence their dispersion [3,6,10]. In response to these challenges, studies on gas source localization within static sensor networks have taken diverse approaches, foremost among them optimization methodologies and probabilistic approaches based on Bayesian inference. Optimization methods [11,12] encompass gradient-based techniques that leverage various adaptations of least squares methodologies; direct search methods [13] and genetic algorithms [14,15] have also made notable progress in this domain. Probabilistic methodologies employ techniques such as Markov chain Monte Carlo [16], sequential Monte Carlo [17], differential evolution Monte Carlo [18], and polynomial chaos quadrature [19].
Recent advancements have seen the emergence of approaches employing machine learning models such as support vector machines and kernel ridge regression for gas source localization [4]. Moreover, active research is underway exploring methods utilizing deep neural networks, renowned for their efficacy in processing complex data, including time series data [20]. Yeon et al. [21] employed a deep neural network (DNN) to estimate gas source locations in 2D coordinates from sensor array data. Cho et al. [22] utilized DNN and random forest classifiers to track gas sources using data generated from computational fluid dynamics (CFD) simulations. Bilgera et al. [5] implemented convolutional long short-term memory neural network (CNN-LSTM) architectures to extract temporal and spatial features from gas concentration data collected in a real environment, enabling the estimation of gas source locations in static sensor networks amidst varying air conditions. Similarly, Yamamoto et al. [23] leveraged a long short-term memory deep neural network (LSTM-DNN) to estimate gas source locations using gas sensor arrays and anemometers. Kim et al. [24] proposed a model integrating feedforward neural networks and recurrent neural networks with long short-term memory (LSTM-RNN) to predict gas source locations using data derived from CFD simulations. While RNNs and LSTMs traditionally process time series data, recent attention has shifted towards transformers [25], originally introduced for natural language processing, due to their remarkable performance in handling time series data [26]. Operating on the attention mechanism, the transformer model adeptly learns correlations among input tokens. Son et al. [27] deployed a transformer model to recursively track chemical gas dispersion and estimate the source location, using a 2D convolutional layer to convert each gas diffusion frame into a token.
Transformer models have recently found applications across diverse machine learning tasks, including image and video processing, showcasing a performance surpassing that of existing models [28,29,30]. In image and video processing, transformers excel by extracting multiple patches or tubes with spatial or spatio-temporal characteristics, surpassing the simple tokenization of individual frames.
In this study, we introduce a deep neural network utilizing a video vision transformer (ViViT) for gas source localization. This model processes time series gas diffusion data and predicts the grid of gas sources within the area where diffusion data are available. We rigorously train and evaluate the model using diffusion data under diverse experimental conditions and meteorological environments generated via the nuclear biological chemical reporting and modeling system (NBC-RAMS). To validate the effectiveness of the ViViT in gas source localization, we conduct a comparative analysis with existing deep learning methods. Our results not only demonstrate a superior performance compared to other models but also highlight the structural efficiency of the ViViT, which achieves a high performance even with fewer input frames. Section 2 details the experimental data sources and the model structure employed in our study. Section 3 outlines the experimental setup, method validation, and performance analysis against other models. Finally, Section 4 provides the conclusion.

2. Materials and Methods

2.1. Data Source

We acquire time series gas dispersion data using the NBC-RAMS simulator, which simulates gas diffusion distribution following a gas bomb explosion under specific meteorological and experimental conditions in predefined spatial regions. NBC-RAMS, established by the Agency for Defense Development of the Republic of Korea, is a simulation program with purposes and functions similar to the Hazard Prediction and Assessment Capability software (version 6) [31] developed by the Defense Threat Reduction Agency in the United States. Previous studies have utilized NBC-RAMS for similar purposes [27,32]. NBC-RAMS comprises a meteorological model and a pollution diffusion model. The meteorological model generates a three-dimensional gridded meteorological field, incorporating factors such as wind orientation, wind velocity, and temperature, accounting for topography, land cover, and building heights within the calculation area based on input weather information. The pollution diffusion model calculates the transport and diffusion process of pollutants, including chemical, biological, radiological, and smoke agents within the calculation area, utilizing the generated meteorological field.
The data generation area encompasses a 5 km × 5 km region within the metropolitan area of South Korea, divided into grids of 200 m × 200 m. Gas concentrations are computed at the center of each grid. Simulations extend for 30 min from the time of the gas incident, with data recorded every minute starting from 1 min post-explosion. Each incident scenario yields gas diffusion data comprising 30 frames of size $(H, W, C)$, where $H = 25$, $W = 25$, and $C = 1$. To diversify the gas diffusion data, we vary parameters such as the gas generation grid, warhead payload mass, and meteorological environment. SF6 is used as the gas in the diffusion simulation, with the payload mass ranging from 0.0005 kg to 0.0025 kg in 0.0005 kg increments. Meteorological conditions are set by adjusting the wind speed from 1 m/s to 3 m/s in 0.5 m/s intervals and the wind direction from 0° to 315° in 45° intervals. The temperature remains constant at 10 °C. We generate data by applying the various experimental conditions described above to each of the $N_C = H \times W$ grids, each representing a potential gas source. The generated dataset comprises 200 experimental conditions per 625 grids, totaling 125,000 instances comprising 3.75 million frames. Of these, 70% are allocated for training, 20% for validation, and 10% for testing. Figure 1a presents a sample of data generated using NBC-RAMS. Frames depict gas dispersion at 2 min intervals from 1 to 27 min post-explosion. The gradual spread of gas over time and its varying distribution across each time step are discernible. The gas source point, represented by a white flask, and its subsequent diffusion are apparent, illustrating the influence of both inherent chemical properties and the weather environment on dispersion. As time progresses, the gas disperses, and concentrations at the source diminish due to atmospheric dilution.
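The dataset bookkeeping above can be checked with a few lines of arithmetic (the variable names are ours, not the paper's):

```python
# Dataset counts from the text: 5 payload masses x 5 wind speeds x 8 wind
# directions = 200 conditions, applied to each of the 625 source grids.
H, W = 25, 25                       # 5 km x 5 km area in 200 m grid cells
n_grids = H * W                     # 625 candidate source grids
n_conditions = 5 * 5 * 8            # payload masses x wind speeds x directions
n_instances = n_grids * n_conditions
n_frames = n_instances * 30         # 30 recorded frames per incident

assert n_conditions == 200
assert n_instances == 125_000
assert n_frames == 3_750_000

# 70/20/10 split into training, validation, and test subsets
n_train = n_instances * 70 // 100   # 87,500
n_val = n_instances * 20 // 100     # 25,000
n_test = n_instances - n_train - n_val  # 12,500
```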
We preprocess the acquired data through sequence extraction and normalization. During sequence extraction, we extract $T$ consecutive frames from each simulation run. Immediately after gas generation, the environment experiences dynamic instability due to the sudden introduction of chemical substances, potentially causing irregularities in the gas diffusion distribution. These irregularities may destabilize the data and impede model generalization. As time elapses post-explosion, the surrounding environment stabilizes and the gas diffusion pattern becomes consistent. Hence, we commence the extraction process from the 6th frame among the 30 frames obtained for each scenario. The extracted consecutive frames undergo validation: if the number of frames devoid of any concentration exceeds $T/3$, the frames are disregarded and the extraction recommences from the subsequent frame. This ensures that the dataset encompasses ample gas diffusion information. Post-extraction, min–max normalization is applied so that all frames fall within the range of 0 to 1. The preprocessed time series frames $V \in \mathbb{R}^{T \times H \times W \times C}$ serve as input to the model, with the label $y \in \mathbb{R}^{N_C}$ representing the grid where the gas occurred in the input $V$. Figure 1b depicts the result of preprocessing the samples portrayed in Figure 1a.
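A minimal sketch of this preprocessing, under our reading of the extraction rule (start at the 6th frame, reject windows in which more than T/3 frames contain no concentration); the function names and exact threshold comparison are assumptions:

```python
import numpy as np

def extract_sequence(frames: np.ndarray, T: int, start: int = 5):
    """Extract T consecutive frames, skipping windows that are mostly empty.

    `frames` has shape (30, H, W, C); extraction begins at the 6th frame
    (index 5).  A window is rejected when more than T/3 of its frames
    contain no concentration at all, and the search restarts from the
    next frame.  Returns None if no valid window exists.
    """
    for s in range(start, len(frames) - T + 1):
        window = frames[s:s + T]
        empty = sum(1 for f in window if not np.any(f))
        if empty <= T / 3:
            return window
    return None

def min_max_normalize(seq: np.ndarray) -> np.ndarray:
    """Scale the whole extracted sequence into the range [0, 1]."""
    lo, hi = seq.min(), seq.max()
    return (seq - lo) / (hi - lo) if hi > lo else np.zeros_like(seq)
```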

2.2. Proposed Method

We present a deep neural network for gas source localization using the ViViT. This model takes time series gas diffusion data V as the input and accomplishes gas source localization by classifying the grid where gas occurred among N C grids. Transformers, which employ attention mechanisms, were initially introduced in [25] to process time series data. They have since been applied not only to natural language processing [26] but also to various fields such as image and video classification [28,29], demonstrating a remarkable performance. Our model is designed based on the video vision transformer [30], which employs the transformer structure for video classification. The model structure comprises embedding layers, transformer encoders, and a classifier, as illustrated in Figure 2.
Within the embedding layers, the input $V$ is converted into a sequence of tokens $z_0 \in \mathbb{R}^{(N+1) \times d}$, expressed as
$$ z_0 = [z_{\mathrm{cls}}, E x_1, \dots, E x_N] + p, $$
where $z_{\mathrm{cls}} \in \mathbb{R}^d$ represents the learned classification token [26]. Each token $E x_i \in \mathbb{R}^d$ corresponds to a tubelet $x_i \in \mathbb{R}^{t \times h \times w \times C}$ extracted from the video and linearly projected, while $p \in \mathbb{R}^{(N+1) \times d}$ denotes a learned positional embedding. Nonoverlapping spatio-temporal tubelets $x_i$ are extracted from $V$. Given tubelet dimensions $t$, $h$, and $w$ along the temporal, height, and width axes, respectively, the total number of extracted tubelets is $N = n_t \times n_h \times n_w$, where $n_t = \lfloor T/t \rfloor$, $n_h = \lfloor H/h \rfloor$, and $n_w = \lfloor W/w \rfloor$. Figure 3 illustrates the extraction of tubelets spanning the spatio-temporal input $V$. These tubelets $x_i$, for $i = 1, 2, \dots, N$, are linearly projected by $E$ into tokens $E x_i$. Subsequently, $z_{\mathrm{cls}}$ is prepended to these tokens, yielding the sequence $[z_{\mathrm{cls}}, z_1, \dots, z_N]$, where $z_i = E x_i$. This sequence, augmented by the learned positional embedding $p$, forms the input $z_0$ to the transformer encoders.
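The tubelet extraction and embedding step can be sketched in NumPy as follows; truncating dimensions that do not divide evenly (so n_h = n_w = 1 for H = W = 25 and h = w = 13) is our assumption, and the function names are ours:

```python
import numpy as np

def extract_tubelets(V, t=2, h=13, w=13):
    """Split V of shape (T, H, W, C) into non-overlapping t x h x w x C tubelets.

    Dimensions that do not divide evenly are truncated.  Returns an array of
    shape (N, t*h*w*C) with N = n_t * n_h * n_w flattened tubelets.
    """
    T, H, W, C = V.shape
    n_t, n_h, n_w = T // t, H // h, W // w
    V = V[:n_t * t, :n_h * h, :n_w * w]
    V = V.reshape(n_t, t, n_h, h, n_w, w, C)
    V = V.transpose(0, 2, 4, 1, 3, 5, 6)          # (n_t, n_h, n_w, t, h, w, C)
    return V.reshape(n_t * n_h * n_w, t * h * w * C)

def embed(tubelets, E, z_cls, p):
    """Project tubelets with E, prepend z_cls, and add positional embedding p."""
    tokens = tubelets @ E                          # (N, d)
    z0 = np.concatenate([z_cls[None, :], tokens])  # (N+1, d)
    return z0 + p
```

With T = 10 and the paper's tubelet size (t, h, w) = (2, 13, 13), this yields N = 5 tokens before the classification token is prepended.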
The sequence of tokens $z_0$ passes through $L$ transformer layers, resulting in $z_L \in \mathbb{R}^{(N+1) \times d}$, where the dimension of each token remains $d$ throughout each layer. Each layer consists of multi-headed self-attention (MSA) [25], layer normalization (LN) [33], and multi-layer perceptron (MLP) blocks:
$$ y_l = \mathrm{MSA}(\mathrm{LN}(z_l)) + z_l, \qquad z_{l+1} = \mathrm{MLP}(\mathrm{LN}(y_l)) + y_l, $$
where $l = 0, \dots, L-1$. The MSA consists of $k$ self-attention heads operating in parallel, while the MLP block comprises two dense layers with the GELU activation function [34] between them, as developed in [30]. The numbers of nodes in the two dense layers of the MLP are set to $d_{ff}$ and $d$, respectively.
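A NumPy sketch of one pre-norm encoder layer with the hyperparameters from the text (d = 128, k = 8 heads, d_ff = 512, GELU); the parameter shapes and names are our assumptions, and the tanh approximation of GELU stands in for the exact form:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def msa(z, Wq, Wk, Wv, Wo, k=8):
    """Multi-headed self-attention: k heads over d/k channel slices each."""
    N, d = z.shape
    dh = d // k
    q, kk, v = z @ Wq, z @ Wk, z @ Wv
    heads = []
    for i in range(k):
        s = slice(i * dh, (i + 1) * dh)
        att = softmax(q[:, s] @ kk[:, s].T / np.sqrt(dh))  # (N, N) attention
        heads.append(att @ v[:, s])
    return np.concatenate(heads, axis=1) @ Wo              # (N, d) output

def encoder_layer(z, params, k=8):
    """Pre-norm layer: y = MSA(LN(z)) + z; z' = MLP(LN(y)) + y."""
    Wq, Wk, Wv, Wo, W1, b1, W2, b2 = params
    y = msa(layer_norm(z), Wq, Wk, Wv, Wo, k) + z
    return gelu(layer_norm(y) @ W1 + b1) @ W2 + b2 + y
```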
Among the output tokens of the transformer encoders, $z_{\mathrm{cls}}^{L} \in \mathbb{R}^d$, the token obtained by passing the classification token $z_{\mathrm{cls}}$ through the encoder, is forwarded to the classifier, which predicts the label $\hat{y} \in \mathbb{R}^{N_C}$ of the grid where the gas detonation occurred. The classifier comprises LN and a single dense layer (CLS), expressed as
$$ \hat{y} = \mathrm{CLS}(\mathrm{LN}(z_{\mathrm{cls}}^{L})). $$
The model is trained using the Adam optimizer [35], aiming to minimize the loss function
$$ L(y, \hat{y}) = -\frac{1}{N_B} \sum_{i=1}^{N_B} \sum_{c=1}^{N_C} y_{i,c} \log \frac{\exp(\hat{y}_{i,c})}{\sum_{j=1}^{N_C} \exp(\hat{y}_{i,j})} + \frac{1}{2} \lambda \|W\|_2^2, $$
where $N_B$ is the number of data points in the batch. The term $\|W\|_2^2$ is the $L_2$ regularization term, with the coefficient $\lambda$ set to $10^{-4}$. To prevent overfitting, dropout with a rate of 0.05 is applied to the embedding layer and the transformer layers. Both $z_{\mathrm{cls}}$ and $p$ are initialized to 0 and learned during training. The initial learning rate of the Adam optimizer is $10^{-3}$.
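The loss above can be sketched directly; the one-hot label format and the set of weights entering the L2 term are our assumptions:

```python
import numpy as np

def loss_fn(y_true, y_hat, weights, lam=1e-4):
    """Cross-entropy over N_C grid classes plus (lam/2) * ||W||_2^2.

    y_true: one-hot labels of shape (N_B, N_C); y_hat: raw logits of the
    same shape; weights: iterable of weight arrays for the L2 penalty.
    """
    # log-softmax, stabilized by subtracting the row-wise maximum logit
    z = y_hat - y_hat.max(-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(-1, keepdims=True))
    ce = -np.mean(np.sum(y_true * log_p, axis=-1))   # mean over the batch
    l2 = 0.5 * lam * sum(np.sum(W**2) for W in weights)
    return ce + l2
```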

3. Results

3.1. Experiment Setting

The model undergoes training and validation using the respective datasets and is subsequently evaluated on the test dataset. The weights of the model are updated using the training data, with the validation loss computed on the validation data at each epoch. The model achieving the lowest validation loss across all epochs is selected for evaluation. The following hyperparameters are used in the experiments: $L = 10$, $k = 8$, $d_{ff} = 512$, $d = 128$, $t = 2$, $h = 13$, and $w = 13$. During training, the total epochs and batch size are set to 300 and 512, respectively. Evaluation metrics include the F1-score, precision, recall, and accuracy, commonly employed in classification performance analysis. Additionally, the average error distance, computed as the distance between the predicted and actual grid centers averaged over all test data, is used as an evaluation metric, considering that each grid unit corresponds to a length of 200 m.
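The average error distance metric can be sketched as follows; the row-major decoding of the class index into a (row, column) grid position is our assumption:

```python
import numpy as np

GRID_M = 200.0  # each grid cell is 200 m x 200 m

def avg_error_distance(pred: np.ndarray, true: np.ndarray, W: int = 25) -> float:
    """Mean Euclidean distance (in meters) between predicted and actual
    grid centers, with class indices flattened row-major over the W-wide grid."""
    pr, pc = np.divmod(pred, W)   # predicted (row, col)
    tr, tc = np.divmod(true, W)   # actual (row, col)
    return float(np.mean(np.hypot(pr - tr, pc - tc)) * GRID_M)
```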
To facilitate the comparison and analysis of performance, CNN-LSTM and 3D-CNN models are considered. The CNN-LSTM model comprises a CNN block, LSTM block, and DNN block, akin to those introduced in [5], with a hidden dimension size of 625 to accommodate frames with a larger width and height than those in [5]. The 3D-CNN model utilizes the VideoGasNet architecture proposed in [36], which comprises six Conv-Pool structures [37] and two fully connected layers. The convolution layers in the Conv-Pool structure have a kernel size of (3, 3, 2), considering that the input tokens of the ViViT span two channels of data, and the dropout rate is set to 0.5, consistent with CNN-LSTM. The configuration of the fully connected layers mirrors that of the DNN blocks of CNN-LSTM. The training settings for the control groups are the same as those of the ViViT, with $L_2$ regularization also applied.

3.2. Performance Analysis

Table 1 presents the performance metrics and the number of parameters for each model when the input sequence length is T = 10. The ViViT achieves metrics of 0.93 or higher across all classification performance metrics, demonstrating the highest accuracy in source grid prediction. Additionally, it exhibits the lowest error distance, with an average distance of 12.689 m between the predicted source location and the actual one. CNN-LSTM shows a classification performance of 0.91 or higher and an average error distance of 17.7394 m, ranking second in performance. 3D-CNN exhibits metrics of 0.81 or higher, showing the lowest performance among the comparison groups, with an average error distance of 37.1616 m, approximately three times larger than that of the ViViT. The ViViT comprises 2.10 million parameters, which is approximately 4.6 times fewer than CNN-LSTM and 3.3 times fewer than 3D-CNN. Despite the potential for learning more complex nonlinear relationships with a larger number of parameters, the ViViT demonstrates a superior performance with the fewest parameters.
Figure 4 illustrates three samples from the test data along with the gas source locations predicted by each model. For each sample, the results display the 10 data sequences used as input on the left and the prediction results of each model alongside the ground truth on the right. In Figure 4a, the gas distribution widens over time, indicating a spreading pattern across the entire area. Since the distribution of grids showing high concentrations changes little between frames, weather evidently has little effect on gas diffusion in this sample, as occurs when the wind is weak. This sample not only shows the entire distribution changing over time but also confirms that all models accurately predict the source grid, because the change in the concentration distribution is monotonic. In Figure 4b, the gas distribution moves to the bottom right over time, indicating a stronger influence of the wind compared to Figure 4a. Although CNN-LSTM and 3D-CNN predict locations close to the actual source, they do not accurately identify the source when the gas distribution moves or changes under the influence of weather. The ViViT, on the other hand, accurately localizes the source grid despite these difficulties. In Figure 4c, the gas distribution moves significantly to the left over time due to the weather, and in the last two frames it extends beyond the detection area, leaving no observable values. CNN-LSTM and 3D-CNN fail to accurately estimate the source location, with larger errors than in Figure 4b, whereas the ViViT accurately predicts the source grid even under these conditions. These examples make it evident that the ViViT outperforms the other models in gas source localization.
The ViViT can accurately predict the source grid even when the distribution changes significantly over time under the influence of weather, and even when the distribution extends outside the area of interest or some observations are missing. Given that responding is difficult once gases diffuse beyond the designated area, the ViViT offers an efficient solution to this challenge. The superior performance of the ViViT stems from its transformer structure, which acquires spatio-temporal features more effectively than the other models for processing time series data. While training a transformer-based model demands a substantial dataset, such data can be readily acquired through simulation. Alternatively, the parameters of a pretrained model can be loaded and only some of them fine-tuned with a small dataset.

3.3. Sequence Length Variants

We vary the sequence length $T \in \{8, 10, 15, 20, 25\}$ and analyze its impact on the performance of the models for gas source localization. For $T = 20$ and $T = 25$, CNN-LSTM and 3D-CNN are trained for 500 epochs: because the number of parameters of these models increases with the number of input frames, we extend the training epochs to ensure convergence during the learning process.
Figure 5 illustrates the accuracy and number of parameters of the models as the sequence length changes. Figure 5a shows that all models tend to improve as the number of input frames increases. The ViViT consistently demonstrates the highest performance across all sequence lengths. Even in the $T = 8$ condition, which uses the fewest frames, the ViViT outperforms the other models at every sequence length. Using more input frames provides more information and potentially higher performance; in practice, however, there is a trade-off in data acquisition time. To respond swiftly to gas leaks or incidents, a model capable of accurate prediction with a limited number of frames is crucial. Thus, the ability of the ViViT to achieve accurate source localization with a small number of frames offers a significant advantage in real-time applications. While the performance of CNN-LSTM improves with increasing frames, its accuracy gradually converges, showing similar performance between the $T = 20$ and $T = 25$ conditions. 3D-CNN consistently exhibits lower performance across all sequence lengths.
We evaluate the computational efficiency of the transformer-based model structure in Figure 5b. The transformer architecture used in the ViViT ensures that the number of parameters to be learned in the encoder or classifier remains constant even as the number of input frames increases. Although the parameters for $z_{\mathrm{cls}}$ and the positional embedding $p$ increase with the sequence length, they contribute only a fraction of the total number of parameters. In contrast, CNN-LSTM and 3D-CNN exhibit linear increases in parameters proportional to the number of input frames. The parameters of CNN-LSTM grow with the size of the hidden state of the LSTM and the number of nodes in the first layer of the DNN block. Furthermore, the sequential computation of LSTM outputs for each frame adds computational inefficiency. Similarly, the parameter increase in 3D-CNN is proportional to the product of the number of filters in the last convolutional layer, the dimensions of the output tensor of the last convolutional layer, and the number of nodes in the following dense layer. These models, with their larger parameter counts, demand extensive training and computational resources, making real-world deployment challenging. Even with increased epochs, CNN-LSTM and 3D-CNN fail to surpass the performance of the ViViT, which is particularly evident in the $T = 20$ and $T = 25$ conditions. Notably, the excessive parameters of 3D-CNN hinder training convergence in the $T = 20$ and $T = 25$ conditions, resulting in a slight performance decrease. Thus, the consistent performance of the ViViT across varying sequence lengths and its constant parameter count make it highly efficient for real-world applications.

4. Conclusions

In this study, we introduce a gas source localization model utilizing the ViViT. The model processes time series diffusion data to predict the source grid by classifying the grid with the highest probability of gas occurrence within the covered area. Through extensive experimentation using the NBC-RAMS simulator, we generate large datasets of gas diffusion under varied experimental conditions and meteorological environments. Subsequently, we train and evaluate the model using these data. Our findings demonstrate that the ViViT outperforms other deep learning models in processing time series gas data, showcasing a superior estimation performance. Leveraging a transformer architecture, the ViViT exhibits robust classification performance even in scenarios heavily influenced by weather conditions or when observations are incomplete. Furthermore, we conduct an analysis of accuracy and parameter count across various input sequence lengths. The ViViT not only achieves a high estimation performance with a small number of frames but also maintains high computational efficiency by keeping the number of learned parameters constant regardless of sequence length. These results underscore the effectiveness of the ViViT as a model for gas source localization, particularly in situations demanding a rapid response in real-world environments, such as gas leaks or attacks. In future work, since the simulated data are limited to a single gas type despite the varied experimental conditions, we aim to generate gas dispersion data with various volatile characteristics. Additionally, we intend to explore the application of the ViViT in real-world scenarios, leveraging data collected from gas diffusion experiments conducted in real environments.

Author Contributions

Conceptualization, H.-D.J., S.K. and H.N.; methodology, H.-D.J.; software, H.-D.J.; validation, H.-D.J., S.K. and H.N.; formal analysis, H.-D.J.; investigation, H.-D.J., S.K. and H.N.; resources, H.-D.J., S.K. and H.N.; data curation, H.-D.J., S.K. and H.N.; writing—original draft preparation, H.-D.J. and S.K.; writing—review and editing, D.E.C.; visualization, H.-D.J.; supervision, D.E.C.; project administration, D.E.C.; funding acquisition, D.E.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Agency For Defense Development Grant funded by the Korean Government (UI220075ZD).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to legal restrictions.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Jang, H.D.; Park, J.H.; Nam, H.; Chang, D.E. Deep neural networks for gas concentration estimation and the effect of hyperparameter optimization on the estimation performance. In Proceedings of the 2022 22nd International Conference on Control, Automation and Systems (ICCAS), Busan, Republic of Korea, 27 November–1 December 2022; pp. 15–19. [Google Scholar] [CrossRef]
  2. Park, J.H.; Yu, H.G.; Park, D.J.; Nam, H.; Chang, D.E. Dynamic one-shot target detection and classification using a pseudo-Siamese network and its application to Raman spectroscopy. Analyst 2021, 146, 6997–7004. [Google Scholar] [CrossRef]
  3. Hutchinson, M.; Oh, H.; Chen, W.H. A review of source term estimation methods for atmospheric dispersion events using static or mobile sensors. Inf. Fusion 2017, 36, 130–148. [Google Scholar] [CrossRef]
  4. Mahfouz, S.; Mourad-Chehade, F.; Honeine, P.; Farah, J.; Snoussi, H. Gas source parameter estimation using machine learning in WSNs. IEEE Sensors J. 2016, 16, 5795–5804. [Google Scholar] [CrossRef]
  5. Bilgera, C.; Yamamoto, A.; Sawano, M.; Matsukura, H.; Ishida, H. Application of Convolutional Long Short-Term Memory Neural Networks to Signals Collected from a Sensor Network for Autonomous Gas Source Localization in Outdoor Environments. Sensors 2018, 18, 4484. [Google Scholar] [CrossRef]
  6. Singh, S.K.; Rani, R. A least-squares inversion technique for identification of a point release: Application to Fusion Field Trials 2007. Atmos. Environ. 2014, 92, 104–117. [Google Scholar] [CrossRef]
  7. Singh, S.K.; Rani, R. Assimilation of concentration measurements for retrieving multiple point releases in atmosphere: A least-squares approach to inverse modelling. Atmos. Environ. 2015, 119, 402–414. [Google Scholar] [CrossRef]
  8. Brink, J. Boundary tracking and estimation of pollutant plumes with a mobile sensor in a low-density static sensor network. Urban Clim. 2015, 14, 383–395. [Google Scholar] [CrossRef]
  9. Menon, P.P.; Ghose, D. Simultaneous source localization and boundary mapping for contaminants. In Proceedings of the 2012 American Control Conference (ACC), Montreal, QC, Canada, 27–29 June 2012; pp. 4174–4179. [Google Scholar]
  10. Singh, S.K.; Sharan, M.; Issartel, J.P. Inverse modelling methods for identifying unknown releases in emergency scenarios: An overview. Int. J. Environ. Pollut. 2015, 57, 68–91. [Google Scholar] [CrossRef]
  11. Sharan, M.; Issartel, J.P.; Singh, S.K. A point-source reconstruction from concentration measurements in low-wind stable conditions. Q. J. R. Meteorol. Soc. 2012, 138, 1884–1894. [Google Scholar] [CrossRef]
  12. Singh, S.K.; Sharan, M.; Issartel, J.P. Inverse modelling for identification of multiple-point releases from atmospheric concentration measurements. Bound.-Layer Meteorol. 2013, 146, 277–295. [Google Scholar] [CrossRef]
  13. Zheng, X.; Chen, Z. Back-calculation of the strength and location of hazardous materials releases using the pattern search method. J. Hazard. Mater. 2010, 183, 474–481. [Google Scholar] [CrossRef] [PubMed]
  14. Haupt, S.E. A demonstration of coupled receptor/dispersion modeling with a genetic algorithm. Atmos. Environ. 2005, 39, 7181–7189. [Google Scholar] [CrossRef]
  15. Long, K.J.; Haupt, S.E.; Young, G.S. Assessing sensitivity of source term estimation. Atmos. Environ. 2010, 44, 1558–1567. [Google Scholar] [CrossRef]
  16. Senocak, I.; Hengartner, N.W.; Short, M.B.; Daniel, W.B. Stochastic event reconstruction of atmospheric contaminant dispersion using Bayesian inference. Atmos. Environ. 2008, 42, 7718–7727. [Google Scholar] [CrossRef]
  17. Ristic, B.; Gunatilaka, A.; Gailis, R. Localisation of a source of hazardous substance dispersion using binary measurements. Atmos. Environ. 2016, 142, 114–119. [Google Scholar] [CrossRef]
  18. Robins, P.; Thomas, P. Non-linear Bayesian CBRN source term estimation. In Proceedings of the 2005 7th International Conference on Information Fusion, Philadelphia, PA, USA, 25–28 July 2005; Volume 2, p. 8. [Google Scholar]
  19. Madankan, R.; Singla, P.; Singh, T. Application of conjugate unscented transform in source parameters estimation. In Proceedings of the 2013 American Control Conference, Washington, DC, USA, 17–19 June 2013; pp. 2448–2453. [Google Scholar]
  20. Caterini, A.L.; Chang, D.E. Deep Neural Networks in a Mathematical Framework; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
  21. Yeon, A.S.A.; Zakaria, A.; Zakaria, S.M.M.S.; Visvanathan, R.; Kamarudin, K.; Kamarudin, L.M. Gas Source Localization via Mobile Robot with Gas Distribution Mapping and Deep Neural Network. In Proceedings of the 2nd International Conference on Electronic and Electrical Engineering and Intelligent System (ICE3IS), Yogyakarta, Indonesia, 4–5 November 2022; pp. 120–124. [Google Scholar] [CrossRef]
  22. Cho, J.; Kim, H.; Gebreselassie, A.L.; Shin, D. Deep neural network and random forest classifier for source tracking of chemical leaks using fence monitoring data. J. Loss Prev. Process Ind. 2018, 56, 548–558. [Google Scholar] [CrossRef]
  23. Yamamoto, A.; Bilgera, C.; Sawano, M.; Matsukura, H.; Sawada, N.; Leow, C.S.; Nishizaki, H.; Ishida, H. Application of Sequence Input and Output Long Short-Term Memory Neural Networks for Autonomous Gas Source Localization in an Outdoor Environment. In Proceedings of the 2019 IEEE International Symposium on Olfaction and Electronic Nose (ISOEN), Fukuoka, Japan, 26–29 May 2019; pp. 1–3. [Google Scholar]
  24. Kim, H.; Park, M.; Kim, C.W.; Shin, D. Source localization for hazardous material release in an outdoor chemical plant via a combination of LSTM-RNN and CFD simulation. Comput. Chem. Eng. 2019, 125, 476–489. [Google Scholar] [CrossRef]
  25. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 1–52. [Google Scholar]
  26. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
  27. Son, J.; Kang, M.; Lee, B.; Nam, H. Source Localization of the Chemical Gas Dispersion using Recursive Tracking with Transformer. IEEE Access 2024, 12, 40105–40113. [Google Scholar] [CrossRef]
  28. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  29. Girdhar, R.; Grauman, K. Anticipative video transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 13505–13515. [Google Scholar]
  30. Arnab, A.; Dehghani, M.; Heigold, G.; Sun, C.; Lučić, M.; Schmid, C. ViViT: A Video Vision Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 6816–6826. [Google Scholar] [CrossRef]
  31. Chang, J.C.; Hanna, S.R.; Boybeyi, Z.; Franzese, P. Use of Salt Lake City URBAN 2000 field data to evaluate the urban hazard prediction assessment capability (HPAC) dispersion model. J. Appl. Meteorol. 2005, 44, 485–501. [Google Scholar] [CrossRef]
  32. Ku, H.; Seo, J.; Nam, H. A Study on Transport and Dispersion of Chemical Agent According to Lagrangian Puff and Particle Models in NBC_RAMS. J. Korea Inst. Mil. Sci. Technol. 2023, 26, 102–112. [Google Scholar] [CrossRef]
  33. Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar]
  34. Hendrycks, D.; Gimpel, K. Gaussian error linear units (gelus). arXiv 2016, arXiv:1606.08415. [Google Scholar]
  35. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  36. Wang, J.; Ji, J.; Ravikumar, A.P.; Savarese, S.; Brandt, A.R. VideoGasNet: Deep learning for natural gas methane leak classification using an infrared camera. Energy 2022, 238, 121516. [Google Scholar] [CrossRef]
  37. Wang, J.; Tchapmi, L.P.; Ravikumar, A.P.; McGuire, M.; Bell, C.S.; Zimmerle, D.; Savarese, S.; Brandt, A.R. Machine vision for natural gas methane emissions detection using an infrared camera. Appl. Energy 2020, 257, 113998. [Google Scholar] [CrossRef]
Figure 1. Sample (a) illustrates time series gas diffusion data generated by NBC-RAMS, while sample (b) depicts the data after preprocessing. The color bar positioned to the right of the samples indicates the value of each data point.
Figure 2. Network architecture of the ViViT, which consists of embedding layers, transformer encoders, and a classifier.
Figure 3. Tubelets extracted from a sequence of input frames.
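The tubelet extraction illustrated in Figure 3 partitions the input frame sequence into non-overlapping spatio-temporal blocks before linear embedding. The following sketch shows the partitioning step; the tubelet dimensions (t = 2, h = w = 4) and the 8 × 16 × 16 input size are illustrative assumptions, not the hyperparameters used in the paper.

```python
import numpy as np

def extract_tubelets(frames, t=2, h=4, w=4):
    """Split a (T, H, W) frame sequence into non-overlapping
    spatio-temporal tubelets of shape (t, h, w)."""
    T, H, W = frames.shape
    assert T % t == 0 and H % h == 0 and W % w == 0
    x = frames.reshape(T // t, t, H // h, h, W // w, w)
    # reorder so tubelets are indexed by (time block, row block, col block)
    x = x.transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(-1, t, h, w)  # (num_tubelets, t, h, w)

# e.g. 8 frames of a 16x16 concentration grid -> 4*4*4 = 64 tubelets
frames = np.random.rand(8, 16, 16)
tubelets = extract_tubelets(frames)
print(tubelets.shape)  # (64, 2, 4, 4)
```

Each tubelet is then flattened and linearly projected to a token, so a single token carries both spatial and temporal context, unlike per-frame patch embedding.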
Figure 4. Visualization of three input samples under different conditions and the source localization results compared with the ground truth: (a) input data and prediction results under minimal weather influence; (b) input data and prediction results under significant weather impact; (c) input data and prediction results under adverse weather conditions with partial measurement loss.
Figure 5. (a) Accuracy and (b) number of model parameters as a function of input sequence length.
Table 1. Performance evaluation scores of models on test data.
Model     F1-Score  Precision  Recall  Accuracy  Average Error Distance [m]  Params [M]
CNN-LSTM  0.9110    0.9189     0.9134  0.9217    17.739                      49.86
3D-CNN    0.8102    0.8298     0.8163  0.8456    37.161                      66.74
ViViT     0.9363    0.9413     0.9383  0.9478    12.689                      42.10
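The aggregate scores in Table 1 follow standard multi-class definitions over the source-grid classes. A minimal sketch of how they could be computed is given below; the macro-averaging convention and the grid cell size used for the error distance are assumptions, since the table itself does not state them.

```python
import numpy as np

def macro_scores(y_true, y_pred, n_classes):
    """Macro-averaged precision, recall, F1, and overall accuracy
    from predicted and true source-grid class indices."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1  # rows: true class, columns: predicted class
    tp = np.diag(cm).astype(float)
    col, row = cm.sum(axis=0), cm.sum(axis=1)
    prec = np.divide(tp, col, out=np.zeros_like(tp), where=col > 0)
    rec = np.divide(tp, row, out=np.zeros_like(tp), where=row > 0)
    f1 = np.divide(2 * prec * rec, prec + rec,
                   out=np.zeros_like(tp), where=(prec + rec) > 0)
    acc = tp.sum() / cm.sum()
    return prec.mean(), rec.mean(), f1.mean(), acc

def error_distance(src_true, src_pred, cell_size_m):
    """Euclidean distance between true and predicted grid cells, in meters,
    for a hypothetical cell size (not specified in the table)."""
    d = np.linalg.norm(np.asarray(src_true) - np.asarray(src_pred), axis=-1)
    return d * cell_size_m
```

The average error distance in Table 1 would then be the mean of `error_distance` over all test samples, which penalizes near-miss predictions less than the classification metrics do.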
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Jang, H.-D.; Kwon, S.; Nam, H.; Chang, D.E. Chemical Gas Source Localization with Synthetic Time Series Diffusion Data Using Video Vision Transformer. Appl. Sci. 2024, 14, 4451. https://doi.org/10.3390/app14114451

