1. Introduction
Air contaminant emissions and contaminant gas leakage incidents in a chemical cluster pose a potential threat to public health and surrounding environment. Therefore, modeling atmospheric dispersion is a popular issue these years since it plays an important role in evaluating the impact of hazardous gas leak accidents [
1,
2]. Traditional methods (e.g., Gaussian-based dispersion models and Lagrangian dispersion models) usually use static model, wherein model parameters are pre-determined and invariant. However, due to the dynamic and stochastic nature of atmospheric dispersion, it is impractical to measure these model parameters precisely, especially the meteorological data (e.g., the wind field) [
3]. Further, the computational error of static model may be accumulated with time during the calculation process. To address this problem, a data-driven modeling method is proposed in this paper for updating model parameters to generate the simulation results being as close as possible to the real data. However, some errors cannot be avoided, such as measurement errors and floating-point errors. Since these errors are inevitable, people have to know whether the predicted result is computed in an acceptable scope. Error propagation detection is a quite useful approach for error analysis during the computation. Consequently, it is necessary to develop an atmospheric dispersion modeling method that supports both data-driven modeling and error propagation detection.
Data-driven modeling methods based on data assimilation (DA) provide an approach of dynamically estimating model parameters and effectively improving the accuracy of model predictions. This approach assimilates the observations into the model to produce a time sequence of estimates of system states [
4]. With the model parameters adjusted, the accuracy of the model prediction is consequently improved. Therefore, data-driven modeling methods based on DA have been widely used in various fields, especially in numerical weather forecasting and meteorological pre-processing. Some studies implemented data-driven modeling into short-range atmospheric dispersion [
5,
6]. They used Gaussian diffusion model and optimization method for minimizing the cost function to support parameter real time updating. A previous study used a particle filter and the European Tracer Experiment (ETEX) dataset to assimilate observations into an atmospheric transport model [
7]. Data-driven modeling can also be used in atmospheric dispersion model to assess the impact of nuclear accidents [
8]. Furthermore, Kalman filter and its extended methods are also extensively used in data-driven modeling due to its extensive framework [
9,
10]. A modified ensemble Kalman filter for nuclear accident prediction was proposed [
11,
12]. Reddy et al. utilized particle filter to improve diffusion model based on Gaussian multi-puff equation [
13]. Among these methods, particle filter is one of the most suitable approaches for highly nonlinear and non-Gaussian models [
14]. Using a series of weighted random sampling particles to approximate the posterior probability density function of the system state, particle filter is able to estimate arbitrary probability densities with few assumption constraints. Therefore, particle filter is applied as the data assimilation method in the air contaminant dispersion in this paper.
Error propagation is a troubling and common problem in numerical computation. The problem in atmospheric dispersion simulation is no exception. Many methods were proposed in previous studies to find how the computational error propagates [
15,
16,
17]. Thus, researchers can then modify the model to reduce computational error according to error propagation detection results. In 1980, Ginsberg proposed a method to monitor floating-point error propagation in scientific computation [
18]. Some researchers analyzed static program to find the floating-point accuracy problems [
19,
20,
21,
22,
23]. Bao et al. developed an on-the-fly floating-point error propagation monitoring method to find the potential floating-point unstable errors [
24]. A point system based on significance arithmetic was described in Sofroniou’s research [
17]. Previous studies only gave analysis results marked as Boolean tag and just analyzed several common operations and statements. However, atmospheric dispersion modeling needs some advanced operations such as exponential and square root. Therefore, the error propagation detection applied in this paper is developed to support more operations and give multi-level results instead of Boolean tags.
The rest of this paper is organized as follows:
Section 2 briefly introduces the data-driven modeling based on particle filter, Gaussian-PF model, error propagation detection and relative error exponent. In
Section 3, a field study was utilized to illustrate the feasibility of our proposed method. Results and discussion are concluded in
Section 4.
Section 5 summarizes the conclusions.
2. Methods
2.1. Data-Driven Modeling Based on Particle Filter
Previously, Kalman Filter (KF) based methods work successfully in linear systems. However, when KF and its variations are used in a non-linear system, the system must be linearized. For most practical systems, the work of linearization is hard to complete. Fortunately, Particle Filter (PF), a suitable filtering method for non-linear system, can be extensively applied in data-driven modeling. PF uses a set of particles to update the system parameters via a Monte Carlo method [
25]. Therefore, it can be used in complex systems that are difficult to linearize. Generally, the expression of a non-linear system is as shown as Equation (1):
where
represents the state condition at time step
, and
represents the observation at time step
,
and
are state transition function and observation function respectively.
and
are noises following Gaussian distribution, being added in state condition and observation respectively. When PF is applied in an atmospheric dispersion process, state condition
means the combination of system parameters that can represent the current state of dispersion process. One of the common practices is dividing the area of dispersion into numerous grids and choosing the concentrations by grid as the state condition. This choice directly describes the atmospheric dispersion. However, the vast region of the chemical cluster means a high dimension of the state parameters, which results in a high computation cost. Observation
means the concentration dataset measured from monitoring stations. At each step of particle filtering, an expected state of dispersion process is estimated for forecasting the concentration distribution in next step. Furthermore, the state transition function
should be fast due to the real-time requirement of the dynamic data-driven modeling, so Gaussian dispersion model is a quite appropriate option.
2.2. Gaussian-PF Model
In Gaussian dispersion model, most parameters (e.g., wind speed, wind direction, source term and etc.) can be obtained by monitoring data (i.e., observations or measurements in this paper). However, the diffusion coefficients are very difficult to be directly measured, which are affected by environmental conditions (e.g., diffusion terrain, atmospheric stability, and sunlight). Generally, diffusion coefficients can be expressed by the function of downwind distance and atmospheric stability class. Therefore, the diffusion coefficients are chosen as the system states to realize data-driven modeling.
Gaussian plume model is utilized for atmospheric dispersion modeling with constant release rate and static wind field. To obtain the expression, the following assumptions are proposed: the location of gas release source is at
; wind direction is the forward direction of
x-axis; wind speed is
m/s; release rate of the source is
kg/s; and atmospheric condition is stable. Under these assumptions, the Gaussian plume model is expressed as follows:
where
represents the concentration at location
.
is the decay factor containing radioactive decay and deposition. It is an exponential function of time
. When the wind direction is
, set the release source as the rotation center and then clockwise rotate the concentration distribution
with degree
. If the release source is not at
but
, the concentration distribution
needs translation transformation.
In the Gaussian plume model, the air contaminant concentration in axis
and
is considered to follow the Gaussian distribution. Therefore, the key parameters of the model are
and
, which represent the standard deviations that describe the crosswind and vertical mixing of air contaminants and satisfy following relation [
26,
27]:
where
means the downwind distance, and
,
,
, and
depend on the diffusion environment (i.e., it refers to diffusion terrain and other parameters, like atmospheric stability, cloud cover, sunlight and wind). In Gaussian plume model, source term (i.e., release rate and source location) and meteorological data (wind speed and wind direction) can be easily measured. But diffusion coefficients are quite difficult to estimate because they are influenced by various environmental factors. Therefore, the key of data-driven modeling is to update diffusion coefficients. The selection of the state parameters is also the key to the construction of the state transition model. In this paper, system state is expressed by the formula
.
When applying a particle filter in atmospheric dispersion modeling, it is assumed that , i.e., . Moreover, can be computed according to Equation (2). After using particle filter and observation to update model parameters, the probability distribution of concentration at each step can be therefore obtained.
In terms of the Gaussian puff model, it is utilized for atmospheric dispersion modeling wherein source releases are instantaneous like an explosion. The advantage of the Gaussian puff model is that it is able to support time-variant wind fields though it cannot model the continuous release. Assuming the release source is located at
and the center of puff has moved to
at time
in wind field
, the equation of Gaussian puff model is shown as follows:
where
is the instantaneous release quantity and
represents the concentration at
and at time
.
is the radioactive decay and deposition factor.
,
and
are diffusion coefficients at
x-axis,
y-axis and
z-axis respectively. They can be computed by the following equations:
In the Gaussian puff model, diffusion coefficients are the function of total distance
that the puff center has moved. The expression of
is:
where
is the wind speed at time
. The combination of diffusion coefficients is also utilized as system state in Gaussian puff model
. State function
also satisfies
, while the observation function
can be computed by multi-puff merging.
A single puff cannot calculate the function
because a Gaussian puff model cannot simulate the dispersion process of a continuous release. Fortunately, the gas plume can be regarded as the combination of gas puffs that are released from a source with an infinitely little temporal interval, which is named multi-puff merging. Therefore, a series of puffs can be merged to approximately substitute a plume. When the release rate
is time-variant,
can be discretized by substituting the plume by a series of puffs with different release quantities. Let
represent a single puff containing all essential parameters for atmospheric dispersion modeling, where
can be described as a puff released from
at time
with instantaneous release quantity
. The corresponding function
represents the concentration of location
at time
of puff
. Thus, the whole dispersion process can be expressed as:
where
means the
puff divided from the plume. By substituting the plume by puffs, it is feasible to model the atmospheric dispersion whose release rate is varying or whose wind field is dynamic. Thus, the observation function
can be easily derived via this approach.
The structure and workflow of the particle filter as well as the detailed procedure of data assimilation are included in an independent section of our previous studies. For simplicity, these contents will not be introduced again and interested readers are referred to those references [
3,
28].
2.3. Error Propagation Detection
In numerical computing, error propagation would lead to devastating results. For example, a Patriot missile failed to intercept its target, and this caused many casualties during the Gulf War because of a floating-point computational error. In this paper, errors of atmospheric dispersion modeling and simulation are classified into two categories: measurement error and floating-point error . Measurement errors usually represent the errors generated by monitoring devices. They usually include errors in source term information (release rate and WGS84 coordinate), meteorological data (wind speed and wind direction), concentration data, etc. Nevertheless, floating-point errors are generated during the numerical computation due to the limits of memory and the expression of floating-point numbers. Since most real numbers cannot be expressed precisely in a computer, floating-point errors may appear in any places of the atmospheric dispersion simulation, like wind field generation, dispersion equation calculation and particle filtering iteration. In this paper, an error propagation detection method based on the relative error exponent is developed. The exponent of relative error can be calculated by . Consequently, this method can find the potential floating-point instability problems during the computation, which can assist environmental protection authorities to evaluate whether the results are of high-confidence.
In this paper, the floating-point errors and measurement errors are combined in the source term parameters, wind field and concentration data. To fit the concentration measurements, the integration of particle filtering and error propagation detection is then used to update the system state of the atmospheric dispersion process. The estimated concentration distribution and error analysis are finally computed.
Errors are highly possible to seriously affect the accuracy of results and they are inevitable because of the limitations of sensors and computers. Therefore, any atmospheric dispersion modeling results without a high-confidence error analysis are not convincing. The proposed error detection method is able to verify that to what extent the results are seriously influenced by errors. In order to explain the error propagation detection, several definitions should be introduced firstly [
24]. For a variable,
means “computational value”, which is the value stored in the computer memory. Computational value is limited by measurement error and floating-point error. “Actual value”
means the value of the variable in reality with no error (infinite precision). Absolute error
represents the difference between actual value and computational value with infinite precision, which can also be expressed as
(i.e., sum of measurement error and floating-point error). Relative error
is computed as
. Obviously, when computational value is close to zero, it is possible to generate high relative error. For example, consider the addition operation:
The computational result of this expression is , and the absolute error is . Therefore, the relative error can be computed by equation . When , , and , both and are relatively little (close to ). However, the relative error of the sum reaches 20, which means the relative error is times greater than the original ones.
In atmospheric dispersion modeling and simulation, errors usually contain measurement errors and floating-point errors. Measurement errors exist in all the sampled data from monitoring devices. Therefore, all the input measurements have initial errors. In terms of floating-point errors, most real numbers cannot be expressed precisely in a computer because they are limited by the floating-point format. When a program runs, the floating-point errors will propagate gradually. As concluded in the example, addition operations may cause huge relative error. Furthermore, errors could also be propagated because of multiplication, division, exponential, operations, etc. As a result, a detailed list of error propagation rules for relative error inspection is important in atmospheric dispersion modeling and simulation.
2.4. Relative Error Exponent
For one thing, the computer needs huge resources to record the actual values with no error. For another thing, some irrational numbers are impossible to be recorded in memory. Thus, the precise error of each variable cannot be calculated in a normal program. To address this problem, a method of estimating the exponent of relative error
is proposed to classify the error of each variable. Firstly, according to the property of monitoring device and its floating-point value, the exponent
of variable
(calculated by equation
) and its relative error exponent
(calculated by equation
) are determined. Moreover, exponent of relative measurement error
can be easily calculated, and initial exponent of relative floating-point error
can also be estimated precisely. However, it may take too much time for error estimation. Thus, it is assumed that the initial exponent of relative floating-point error
of a “float” variable is
, and that of a “double” variable is
. The initial exponent of relative error is the maximum of
and
. Furthermore, error propagation rules of operations appearing in data-driven atmospheric dispersion modeling (addition, multiplication, inversion, exponential, square root, and sine) are shown in
Table 1. Addition and multiplication can be found anywhere during computation, while inversion, exponential and square root are used in Gaussian models and radioactive decay factors. Sine functions are essential in wind field generation.
In terms of addition, the relative error of addition has also been analyzed before. Consequently, the relative error
can be obtained by following equation:
As shown in this equation, the value of relative error exponent
depends on the values of
and
. Thus, the exponent can be estimated by
. As for multiplication, the derivation process is similar to addition. Relative error
can be calculated first:
Because
and
, we have:
Thus, the relative error exponent of multiplication is
. Division can be decomposed into multiplication and inversion operations. In terms of the inversion
, the derivation equation of
is:
Therefore, the relative error exponent of inversion operation is
. The calculation of exponential operation needs approximation. The relative error
can be expressed by the following equation:
Then, the expression of relative error exponent
of exponential operation
can be obtained. In terms of square root operation, its relative error
can be calculated by:
so the exponent of relative error is
. In order to estimate the relative error exponent of sine function, the relative error also needs some transformation so that it can be expressed by
and
. The derivation equation is:
Therefore, the relative error exponent of sine function is
. These rules cover all operations used in this study. They are written inside the program to trace the error propagation and give the final analysis. In summary, the error propagation rules of all necessary operations are shown in
Table 1.
4. Results and Discussion
The concentration generated in the same monitoring station declines with time (see
Figure 2a). This seems unreasonable because the meteorological measurements show that the wind direction only showed a little fluctuation during this period and the wind speed only increased slightly. Moreover, the release rate measured by the monitoring sensor installed on the stack also shows that it remained at the same level. Therefore, the reason that causes the decrease of concentration is the environmental conditions, which is affected largely by atmospheric stability, cloud cover, sunlight and other complex factors [
27]. The factors of atmospheric stability, cloud cover, sunlight would generate an effect of plume rise [
30]. It is worth noting that the plume rise is beneficial to reduce the concentration of pollutants on the ground. Moreover, atmospheric stability, sunlight and terrain greatly influence the determination of dispersion coefficients [
31,
32,
33]. Dispersion coefficients are of vital importance to dispersion simulation. The disadvantage of traditional plume models is that diffusion coefficients cannot be directly calculated.
The results of using the traditional Gaussian plume model without data-driven modeling to recover the SF
6 atmospheric dispersion are shown in
Figure 3a and
Table 3. The mean squared error (MSE) of the traditional Gaussian plume model is
. Meanwhile, the correlation coefficient (
r) between observations and model predictions is 0.5409. The mean value of real measurements, mean value of model predictions, standard deviation value of real measurements and standard deviation value of model predictions are
,
,
and
respectively.
Figure 4a illustrates the concentration error (absolute difference between measured concentration and modeled concentration) distribution at 10:00. Obviously, the Gaussian plume model shows quite high error at some sensors.
Figure 3b and
Table 3 display the results of the traditional Gaussian multi-puff model without particle filtering. In this figure, the modeling results of traditional Gaussian plume and puff models are quite similar, which means merging a series of puffs to substitute a plume is a feasible approach. However, both traditional Gaussian plume and multi-puff models cannot change the diffusion coefficients during the simulation process. The MSE of Gaussian multi-puff model is
, even higher than plume model. Meanwhile, the correlation coefficient (
r) between observations and model predictions is 0.6046, which outperforms the value of Gaussian plume model. The mean value of real measurements, mean value of model predictions, standard deviation value of real measurements and standard deviation value of model predictions are
,
,
and
respectively. Seen from
Figure 4b, traditional Gaussian puff model shows high error at some sensors, which is similar to the traditional Gaussian plume model.
In terms of PF-based methods,
Figure 3c and
Table 3 illustrate the results of Gaussian-PF plume. Obviously, data-driven modeling based on particle filtering has played an important role during computing. The accuracy of modeled concentrations improves markedly. The MSE of the final result is
, which is significantly less than the MSEs of traditional Gaussian models. Meanwhile, the correlation coefficient (
r) between observations and model predictions is 0.4577, which experiences an obvious decrease compared to that of traditional Gaussian plume model. The mean value of real measurements, mean value of model predictions, standard deviation value of real measurements and standard deviation value of model predictions are
,
,
and
, respectively.
Figure 4c shows that the concentration error at 10:00, which has been reduced significantly comparing to traditional models.
As shown in
Figure 3d and
Table 3, the results of Gaussian-PF multi-puff model is also satisfactory. The MSE of this model is only
. Therefore, Gaussian-PF multi-puff model has the optimal fit to the actual measurements. Meanwhile, the correlation coefficient (
r) between observations and model predictions is 0.5211, which also experiences an obvious decrease compared to that of traditional Gaussian multi-puff model. The mean value of real measurements, mean value of model predictions, standard deviation value of real measurements and standard deviation value of model predictions are
,
,
and
respectively. However, as shown in
Table 4, the Gaussian-PF multi-puff model needs quite a long execution time. Therefore, when meteorological conditions are relatively stable, Gaussian-PF plume model is appropriate since it performs well in both fitness and execution time. However, if wind direction and wind speed are unstable and dynamic, Gaussian-PF multi-puff model is the only choice that is able to fit measurements well [
34].
As seen in
Figure 3,
Table 3 and
Table 4, the Gaussian-PF models perform better in MSE while the traditional Gaussian-based models have better correlation coefficients. The measured concentration values obey the approximate Gaussian distribution and the concentration curve computed by Gaussian-based models also conforms to the Gaussian distribution. Thus, the correlation coefficient between real measurements and predictions of Gaussian-based models is higher. However, the predicted error is larger because Gaussian-based models are simple. For the Gaussian-PF models, in order to approximate the real concentration value, the Gaussian-PF models have to change the original Gaussian distribution curve. So the correlation coefficient between real measurements and predictions of Gaussian-PF models decreases. The purpose of the proposed method is to reduce the prediction error, and therefore the proposed method is acceptable. As for computational time, the computational efficiency is sacrificed for the improvement of prediction accuracy. The addition of the particle filter and error propagation detection makes the proposed models very computational expensive. However, it is noteworthy that the unit of computational time is ms and most of the computational-time results (e.g., 1388.19, 2453.13 and 120,040.87 ms) are acceptable. Moreover, when applied into practical utilization, it is better for users to consider the practical needs and purposes. For example, if users require accurate prediction results in a limited time, Gaussian-PF plume model would be the optimal choice. However, if the wind field is varying and the users do not care about the calculation time, Gaussian-PF multi-puff model would be the better choice.
The results of error analysis are shown in
Figure 5. After experiments, it is found that the final relative error exponents of traditional models and their corresponding data-driven modeling methods are same because randomly generating system states can eliminate the historical error at each step. Therefore, data-driven modeling method based on particle filtering will not accumulate errors. In this figure, it is noteworthy that values of relative error exponent are all integers. This is because the definition of relative error exponent
(calculated by equation
and it is an integer) and its operations (i.e., operations of plus and minus in third line of
Table 1). Furthermore, both plume and multi-puff model do not have large relative errors after computing. After comparing the final relative error exponents of plume model and multi-puff model, it is obvious that Gaussian plume model and its corresponding data-driven modeling method have less probability to cause floating-point unstable problem.
Figure 6a shows the distribution of concentration at 11:00 using Gaussian-PF plume model.
Figure 6b illustrates the corresponding relative error exponents of calculated concentrations. Because the expression of Gaussian plume model is quite simple, the relative error exponent mostly depends on the wind direction. As shown in this figure, the relative error exponents of calculated concentrations at crosswind direction are low, while the relative error exponents at upwind direction are quite high. The concentration at upwind direction is usually very close to zero, so it is reasonable that its corresponding error exponent is high. Therefore, Gaussian plume model is stable because the maximum relative error exponent is only −5 and it will not increase with time.
The error analysis results of Gaussian-PF multi-puff model and its corresponding data-driven modeling method are also shown in
Figure 6.
Figure 6c demonstrates the concentration distribution of Gaussian-PF multi-puff model at 11:00 and its relative error exponent distribution is shown in
Figure 6d. A significant difference between Gaussian plume and Gaussian multi-puff model is that the error propagates with time in Gaussian multi-puff model while it does not increase in the Gaussian plume model. Thus, in the Gaussian multi-puff model, it is obvious that points far from release source generally have high relative errors because puffs need more time to reach these points. However, points that are very close to the release point also have high relative errors because their downwind distances are too short (very close to zero) to avoid errors.
As can be seen in
Table 4, after using error detection, the computational times of the Gaussian multi-puff and Gaussian-PF multi-puff models become very long, while error detection has relatively little influence on Gaussian plume model. The computational time of Gaussian-PF plume model is shorter than the Gaussian multi-puff model. Thus, Gaussian-PF plume model is an appropriate choice when the wind field is stable. The Gaussian-PF multi-puff model has the highest accuracy, quite low error and long computational time, so it might be a better option when the time requirement is not so strict. More importantly, the utilization of Equation (3) for the diffusion coefficient is usually applied in the case of the open terrain, while the Indianapolis experiment was performed in urban terrain [
35]. Actually, the focus of this paper is to improve the prediction performance of atmospheric dispersion simulation. As seen from experimental results, it is concluded that the data-driven method based on particle filter can improve the prediction ability of Gaussian-based models and error propagation detection can inform users whether or not the prediction results are of high confidence. Thus, the results imply that our proposed method is feasible in actual experiments. For further improving the fitting results in
Figure 3 and
Table 3, the diffusion coefficients applied in urban terrain should be used. Interested readers can refer to our future work.
5. Conclusions
In this paper, a data-driven method using a particle filter is developed to improve the accuracy of air contaminant dispersion predictions based on Gaussian-based models. In order to judge whether the predicted results are convincing, an error propagation detection method is also proposed in this paper to analyze the influence of measured or computational errors. In terms of data-driven modeling, diffusion coefficients are regarded as the system states, and particle filtering is then used to update diffusion coefficients during each-step calculation. As for error propagation detection, six rules are listed to inspect the error propagation of essential operations during numerical computing. Then, a series of experiments are designed based on Indianapolis Field Study to test the performances of the proposed data-driven modeling method. Compared to traditional Gaussian-based models (the MSEs of Gaussian plume model and Gaussian multi-puff model are and respectively), the proposed methods based on PF and error propagation detection witness a significant improvement in prediction accuracy and provide convincing results (the MSEs of Gaussian-PF plume model and Gaussian-PF multi-puff model are and respectively) though requiring more computational time (1.39 s for Gaussian-PF plume model; 2.45 s for Gaussian-PF plume model with error detection; 120.04 s for Gaussian-PF multi-puff model; 3672.95 s for Gaussian-PF multi-puff model with error detection). In conclusion, data-driven modeling methods have higher accuracy but longer computational time. However, when error detection is applied in the program, the Gaussian-PF plume is still faster than the traditional puff model, while the computational time of Gaussian-PF multi-puff model becomes extremely long. The results reveal that when applied into practical utilization, it is better for users to consider the practical needs and purposes. If users require accurate prediction results in a limited time, Gaussian-PF plume model would be the optimal choice. However, if the wind field is varying and the users do not care about the calculation time, Gaussian-PF multi-puff model would be the better choice. Therefore, the proposed methods provide strong support for the prediction of air contaminant dispersion and emergency management in chemical clusters.
Future works should include implementing the field experiment in a chemical cluster to verify the data-driven method in real situations, improving the computational efficiency of the proposed methods and dynamic modeling of the wind field for a more accurate prediction of the atmospheric dispersion model.