1. Introduction
Companies around the world are being encouraged and, in some cases, required to measure and report their greenhouse gas (GHG) emissions [1]. Proper accounting of GHG emissions allows investors and other stakeholders to assess the risks and opportunities related to investment decisions and to evaluate the company’s commitments to net-zero carbon emissions goals [2,3,4].
The Intergovernmental Panel on Climate Change (IPCC) leads an effort to develop guidelines to standardize the calculation and reporting of GHG emissions and inventories [5]. Although the procedures are based on data and statistical parameters [6], the IPCC [7] emphasizes that there may be uncertainties in estimating emission factors.
Companies that are more attuned to market changes are adopting procedures and indicators to measure environmental impacts, which allow them to ensure the quality of their environmental, social, and governance (ESG) actions and report their commitments to sustainable development. Faulty, ambiguous, or imprecise measurements and indicators can generate criticism and associate a company with “greenwashing” [8].
In this study, we examine the process of estimating GHG emissions from a formaldehyde producer, highlighting the challenges related to addressing uncertainties and deviations involved. The case is particularly interesting since many variables, possibly correlated, are evaluated to arrive at a final estimate of emissions. It is important to note that, when there are correlations among variables, evaluating uncertainties in the resulting emissions is not trivial.
Carotenuto et al. [9] state, “The uncertainties associated with each step in compiling emission inventories typically accumulate, although it is complicated to estimate the total effect due to the difficulty in determining the uncertainty in each step and how uncertainties interact with each other”. Furthermore, uncertainties may change over time as activities that produce emissions improve and sources become better characterized [10,11].
Uncertainty analysis is an important aspect of Life Cycle Assessment (LCA), a widely used method for evaluating the energy efficiency and environmental impact of products, services, and processes, from raw material production to end-of-life management of a service or product [12,13].
Park et al. [14] used the error propagation method to identify the main input parameters affecting the uncertainty of carbon footprint results for dairy farms in Korea. Similarly, Igos et al. [15] discuss the importance of uncertainties in LCA studies. Marujo et al. [16] studied uncertainties and deviations in the estimates of a company’s GHG inventory when there are significant correlations among emissions from various sources.
The objective of this study is to investigate how the uncertainties typically present in the operations data of an industrial plant must be considered to determine unbiased and precise estimates of its greenhouse gas emissions. We focused on the case where emissions are a function of the product of many random variables that are subject to variability. In such cases, the expected emission cannot be determined by simply multiplying the expected values due to the phenomenon known as the propagation of variance. Variances and covariances must be included in the calculation. The procedure we propose is based on the Monte Carlo method, and a specific algorithm was developed for the calculations. The procedure was applied to a real-life case study of an industrial plant dedicated to formaldehyde production.
In the Materials and Methods section, we present the necessary steps to estimate the direct emissions from the formaldehyde production plant. We demonstrate that, even for the point estimate of the emission average, the calculation is not straightforward, and we provide a procedure to obtain the mean and variance of the emission.
Finally, in the Results, Discussion, and Conclusion sections, we illustrate the method’s application in a real case and discuss the limitations of the proposed method, as well as potential applications for future studies. We end this article by arguing that this study will contribute to the assessment and quantification of uncertainties in emissions from an industrial plant and present suggestions for ways to improve the accuracy of these estimates.
2. Materials and Methods
2.1. Scope of Analysis
The case that motivated this study refers to the formaldehyde production process of Copenor, a chemical company that has been operating since 1986 in the municipality of Camaçari, Bahia, Brazil. Copenor is recognized as a leader in ESG practices in Brazil and holds certificates that attest to the conformity of its quality and environmental management systems, including health and safety at work. Moreover, the company invests in technology and innovation to improve its processes and reduce the environmental impact of its activities.
Currently, its industrial plant has three active operational units: two that produce formaldehyde using metal oxide or silver catalysts and a hexamethylenetetramine unit. The plant also has methanol tanks for consumption and resale.
Our analysis refers to the main unit of the plant, called formaldehyde oxide, or simply the oxide route, in reference to the catalyst used in the reaction, which is based on molybdenum oxide and iron molybdate.
Formaldehyde is a chemical compound used as a stabilizing, bactericidal, and plasticizing agent, as well as a component of synthetic resins, in agricultural production chains and a variety of industries, including textiles, leather, rubber, cosmetics, and pharmaceuticals [17]. Due to its wide range of applications, formaldehyde is produced on a large scale worldwide, totaling approximately 25 to 27 million metric tons annually. The consumption rate of this product has maintained a notable consistency over the past two decades, indicating its significance and continuous demand in industrial sectors [18].
The most efficient formaldehyde production process, in terms of conversion and yield, is the one that uses iron oxide and molybdenum as catalysts, with a conversion rate above 92%. Alternative processes that use metallic silver as a catalyst are less efficient and produce more carbon dioxide equivalent (CO2e).
According to the GHG Protocol [19], our analysis is in a category called “Direct Emissions” and is included in Scope 1. The period of analysis corresponds to the year 2021.
2.2. The Formaldehyde Production Process through the “Oxide Route”
Formaldehyde is obtained by the partial oxidation of methanol in the gaseous phase using a fixed-bed reactor with a catalyst made of molybdenum oxide and iron molybdate (Equation (1)). The concentration of methanol is kept at controlled levels between 4% and 9% by volume to avoid the formation of explosive mixtures and to ensure an atmosphere suitable for oxidation. The heat generated by the reaction is removed from the reactor by boiling a heat transfer fluid (HTF), which is used for steam generation.
The actual yield of formaldehyde ranges from 91% to 94% of the theoretical maximum. The loss of methanol is expressed by measuring the unreacted methanol and the presence of carbon monoxide, dimethyl ether, and small amounts of formic acid present in the formaldehyde stream.
The oxidation of methanol occurs on the surface of the catalyst inside the tubes of the R-3006 reactor. When the gaseous mixture reaches the heated catalyst layer, the reaction starts, and the temperature rapidly increases, reaching its maximum point. Once the methanol has completely reacted, the temperature tends to approach the HTF boiling point.
The reaction gas is redirected from the bottom of the reactor to the vaporizer H-3004. The remaining gases at the top of the column are directed to the absorber tower C-5004. The exhaust gas from the top of C-5004 is divided between the emission control system (ECS) and the fans F-2008 and F-2009 in series. The amount of gas sent to the ECS is controlled by the oxygen content control valve, AV-30023. The gas recirculation and fresh air proportion control valves are adjusted to maintain a constant oxygen concentration after the fans.
Then, the exhaust gas is sent to the chemical incinerator R-5506, where it is oxidized over a platinum-based catalyst bed. During the exothermic reaction, the temperature increases, and the outlet gas is cooled through heat exchange with a water boiler. The single source of emissions from the formaldehyde oxide unit is reactor R-3006.
2.3. The Method to Estimate Direct Emissions in Formaldehyde Production
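The direct emissions were estimated using measurements and approximate calculations. The basic stoichiometric relationship to produce formaldehyde (HCHO) from methanol (CH3OH) is shown in Equation (1):
$$\mathrm{CH_3OH} + \tfrac{1}{2}\,\mathrm{O_2} \rightarrow \mathrm{HCHO} + \mathrm{H_2O} \qquad (1)$$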
Note that the reaction does not produce any GHG. However, in an industrial environment, the reaction is not perfect, generating some waste that needs to be burned, and it is also influenced by the so-called “parallel reactions,” which depend on the quality of catalysts and other factors. Therefore, because of the complexity of the many reactions involved, GHG emissions are determined using direct measurements in the purging stream that are monitored by the emission control system. Thus, we determine the emissions resulting from the industrial process using a formula that has as its inputs the concentration of CO2 in the exhaust gases (measured by chromatography), temperature, production hours, and average daily production rate. This method was devised to calculate the CO2 emissions arising from the partial conversion of methanol into formaldehyde. The formula is based on the value of the CO2 concentration measured in the gas exhaustion tower and incorporates adjustment factors to account for variations in temperature and gas flow speed.
The mass of CO2 emitted during a certain period is estimated by Equation (2):
$$M = Q_{\text{mean}} \cdot V \cdot h \cdot r \cdot k \qquad (2)$$
where:
M = mass of CO2 emitted during the period (in t).
Qmean = average hourly flow rate during the period (in N m3/h).
V = CO2 concentration in exhaust gases (in N m3/N m3).
h = number of production hours during the period (in hours).
r = conversion factor from Nm3 of CO2 to tons of CO2 (in t/N m3).
k = adjustment factor for the Clapeyron formula (in K/K).
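The Qmean is calculated based on the ratio of the flow rate (7216 N m3/h) for a projected daily production (134.4 t/d) using the following Equation (3):
$$Q_{\text{mean}} = \frac{7216}{134.4} \cdot P \qquad (3)$$
where P is the average daily production during the period (in t/d).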
This is an approximation because it is known that there is a relationship between the load (production) and the flow rate (purged amount), although it is not perfectly linear. In this work, we ignored the potential deviation that this linear approximation may cause.
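The linear scaling described above can be sketched in a few lines of Python. This is only an illustration: it assumes that Equation (3) is the simple proportional rule implied by the quoted design point (7216 N m3/h at 134.4 t/d), and the function and constant names are ours, not the company's.

```python
# Design-point values quoted in the text.
DESIGN_FLOW_NM3_PER_H = 7216.0      # purge flow at design load, N m3/h
DESIGN_PRODUCTION_T_PER_D = 134.4   # projected daily production, t/d


def mean_hourly_flow(daily_production_t_per_d: float) -> float:
    """Scale the design purge flow linearly with the production load.

    Assumption: flow is strictly proportional to load, which the text
    itself describes as an approximation.
    """
    ratio = DESIGN_FLOW_NM3_PER_H / DESIGN_PRODUCTION_T_PER_D
    return ratio * daily_production_t_per_d


# At the design load the function returns the design flow; at 90% of the
# design load it returns 90% of the design flow.
print(mean_hourly_flow(134.4))
print(mean_hourly_flow(0.9 * 134.4))
```

Any non-linearity between load and purge flow would enter here as a systematic deviation, which the study explicitly leaves out of scope.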
The factor r was used to convert m3 of CO2 to tons of CO2 through Equation (4), derived from the Clapeyron equation [20]:
$$r = \frac{44 \times 10^{-6}}{0.022414} \qquad (4)$$
in which 44 is the molecular mass of CO2 (in g/mol), 0.022414 is the molar volume of an ideal gas at 0 °C and 1 atm (in m3/mol), and 10−6 is the conversion factor from grams to tons. The factor k, according to Equation (5), was necessary to adjust the ideal gas equation to the case where the temperature is not at 0 °C:
$$k = \frac{273.15}{T} \qquad (5)$$
Recall that 273.15 K is the temperature of 0 °C expressed in kelvin and that the exhaust temperature is T = 393.15 K. Thus, the emitted mass of CO2 (M) is a function of four variables: P, V, h, and the inverse of the exhaust gas temperature (1/T), as shown in Equation (6):
$$M = \frac{7216}{134.4} \cdot r \cdot 273.15 \cdot P \cdot V \cdot h \cdot \frac{1}{T} \qquad (6)$$
We investigated how these variables are measured and what their associated uncertainties are. Then, as reported in the following sections, we calculated the estimated mean and variance of each variable based on our analysis of variability in the process and the data collection routines in a practical case study.
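As an illustration of how the factors combine into a point estimate, the following Python sketch chains the flow scaling, the volume-to-mass conversion, and the temperature adjustment. It rests on assumptions that should be read as ours: the molar volume of an ideal gas at 0 °C and 1 atm is taken as 0.022414 m3/mol, the temperature adjustment is taken as 273.15/T, and the input values in the example call are hypothetical, not the plant's data.

```python
MOLAR_MASS_CO2_G_PER_MOL = 44.0
MOLAR_VOLUME_M3_PER_MOL = 0.022414   # assumption: ideal gas at 0 degC, 1 atm
GRAMS_TO_TONNES = 1e-6
T_REF_K = 273.15                     # 0 degC in kelvin


def co2_mass_t(p_t_per_d: float, v_frac: float, hours: float, t_exhaust_k: float) -> float:
    """Point estimate of emitted CO2 mass (t) from P, V, h and T.

    p_t_per_d   : average daily production (t/d)
    v_frac      : CO2 volumetric concentration in the exhaust (N m3/N m3)
    hours       : production hours in the period
    t_exhaust_k : exhaust gas temperature (K)
    """
    q_mean = 7216.0 / 134.4 * p_t_per_d                                      # N m3/h
    r = MOLAR_MASS_CO2_G_PER_MOL * GRAMS_TO_TONNES / MOLAR_VOLUME_M3_PER_MOL  # t per N m3
    k = T_REF_K / t_exhaust_k                                                # dimensionless
    return q_mean * v_frac * hours * r * k


# Hypothetical inputs, chosen only to exercise the function.
print(co2_mass_t(p_t_per_d=120.0, v_frac=0.05, hours=8000.0, t_exhaust_k=393.15))
```

Because the expression is a pure product, the estimate is linear in P, V, and h and inversely proportional to T, which is why the statistical treatment that follows works with 1/T rather than T.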
2.4. Statistical Model
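Observe that, according to Equation (6), the emitted mass of CO2 (M) is estimated by the product of a constant (k) and four other variables to be obtained by measurements in the field. See Equation (7), in which k denotes the product of all constant factors of Equation (6) rather than the temperature adjustment factor of Equation (5):
$$M = k \cdot P \cdot V \cdot h \cdot \frac{1}{T} \qquad (7)$$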
2.4.1. The Expected Value of M
Unless the variables P, V, h, and 1/T are statistically independent, the product of their mean values (and the constant k) will not, in general, equal the mean of M. The mean of a variable, also called its “expected value”, is the probability-weighted average of its possible values.
To compute the expected value of M, we needed a formula for the expectation of the product of four random variables. Such a formula is not a simple one in the general case where covariances are present. The exact formula for the mean and variance of the product of K random variables involves the expected values, variances, and correlations among the original random variables, as well as the correlations of their squares. However, Bohrnstedt and Goldberger [21] proposed approximate formulas and discussed their efficiency under assumptions such as the symmetry of the probability distributions and the absence of certain correlations. We did not use those formulas and preferred a procedure based on Monte Carlo simulation because it can be applied in cases where the probability distributions are neither normal nor symmetrical and are subject to any sort of correlation. Monte Carlo simulation is a computational technique that uses randomly generated samples to simulate the occurrence of a series of results. Another advantage of the Monte Carlo method is that it simultaneously solves the problem of estimating the variance of the product, as we comment in Section 2.4.2 below.
The information that was used as input for the method encompasses:
The mean and standard deviation of P, V, h, and 1/T.
The covariances among these variables.
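A minimal sketch of how such inputs could feed a Monte Carlo estimate is given below. It is not the algorithm of Supplementary File S1: it simply draws correlated normal samples through a Cholesky factor of the covariance matrix, using only the Python standard library. All numerical inputs (means, relative standard deviations, and the single non-zero correlation) are hypothetical placeholders, and the constant factor of the emission formula is left at 1.

```python
import math
import random
import statistics


def cholesky(cov):
    """Lower-triangular factor of a positive-definite covariance matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def simulate_product(means, cov, const=1.0, n_draws=50_000, seed=0):
    """Mean and standard deviation of const * x1 * ... * xK for jointly normal x."""
    rng = random.Random(seed)
    low = cholesky(cov)
    dim = len(means)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Correlated sample: x = mean + L z
        x = [means[i] + sum(low[i][j] * z[j] for j in range(i + 1)) for i in range(dim)]
        draws.append(const * math.prod(x))
    return statistics.fmean(draws), statistics.stdev(draws)


# Hypothetical inputs (NOT the plant's data); order is P, V, h, 1/T.
means = [120.0, 0.05, 8000.0, 1.0 / 393.15]
rel_sd = [0.02, 0.03, 0.01, 0.01]
sds = [m * c for m, c in zip(means, rel_sd)]
corr = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
corr[1][3] = corr[3][1] = -0.5  # hypothetical correlation between V and 1/T
cov = [[corr[i][j] * sds[i] * sds[j] for j in range(4)] for i in range(4)]

mean_m, sd_m = simulate_product(means, cov)
print(mean_m, sd_m)
```

The same function handles the zero-correlation case by passing a diagonal covariance matrix, which makes it easy to compare the two scenarios on identical inputs.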
2.4.2. The Variance of M
The variance of a variable is the usual measure of its dispersion around its mean value. Covariance is a measure of the joint variability of two random variables, and their correlation is defined as the covariance divided by the product of their standard deviations. The variance of the product in Equation (7) could be approximately computed using the so-called “variance propagation formula” [22]. This formula is derived using a Taylor expansion, and its accuracy diminishes as the variation of the components increases [23].
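For comparison with the simulation approach, the first-order (Taylor) propagation formula takes a compact form for a pure product: the squared coefficient of variation of the product is approximately the sum over all pairs (i, j) of ρij·cvi·cvj, where cvi is the coefficient of variation of factor i and ρii = 1. The sketch below implements that textbook approximation; the input values are hypothetical and are not taken from the case study.

```python
import math


def product_cv_first_order(cvs, corr):
    """First-order coefficient of variation of a product of random variables.

    cvs  : coefficients of variation (standard deviation / mean) of each factor
    corr : correlation matrix of the factors (ones on the diagonal)
    """
    n = len(cvs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += corr[i][j] * cvs[i] * cvs[j]
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(total, 0.0))


# Hypothetical relative uncertainties for P, V, h, 1/T.
cvs = [0.02, 0.03, 0.01, 0.01]
independent = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
correlated = [row[:] for row in independent]
correlated[1][3] = correlated[3][1] = -0.5  # hypothetical V vs 1/T correlation

print(product_cv_first_order(cvs, independent))
print(product_cv_first_order(cvs, correlated))
```

This approximation only addresses the spread of the product; it says nothing about the shift in the mean caused by correlations, which is one reason a simulation-based procedure can be preferable.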
Other approximate formulas are discussed in Bohrnstedt and Goldberger [21]. As we pointed out in Section 2.4.1, we did not use the approximate formulas but proposed an algorithm based on the Monte Carlo method to compute the expected value as well as the variance of the product of variables in Equation (7).
We used an algorithm detailed in Supplementary File S1 for the case under study; all variables were assumed to be normally distributed, with the mean and variation parameters determined by field measurements. The adequacy and effects of the normality assumption are discussed in Supplementary File S2.
2.4.3. Data Gathering
Each of the variables in Equation (7) was measured or estimated according to the following models:
P: The daily production was directly measured according to the company’s management system records from 1 January to 31 December 2021. The value of the formaldehyde production in tons is considered quite accurate for long periods. For short periods (such as a day), there may be distortions between actual and recorded values. These variations cancel out in the values accumulated over long periods. P was considered a normal random variable whose mean and standard deviation remained constant throughout the year. The average hourly flow rate during production periods was estimated without considering the variation of P caused by interruptions and resumptions of activity. Variations in production due to interruptions and resumptions were captured in the computation of the number of hours, h. The value of P was used to calculate the Qmean during a day, and h was used to estimate the time during which this flow was maintained during the day. Thus, the uncertainty about the value of P affected the uncertainty about the value of the hourly flow, which was controlled to correspond to approximately 90% of the maximum design flow.
V: Volumetric concentration measurements were made by laboratory analysis using gas chromatography, which exhibits a high degree of precision. The volumetric concentration of CO2 corresponds to the average value of the readings during operations throughout the year. Therefore, V was considered a normal variable with a constant mean and standard deviation throughout the year.
h: The number of hours worked per day was a variable recorded by the company’s management and was subject to imprecision due to approximations and measurement errors. For example, some hours worked in a given month could be recorded in the subsequent month, or vice versa. Deviation in the recording of production hours was possible when there were interruptions in production, and it was difficult to determine when the counting of production time should begin, as the process required a certain interval of time to stabilize.
1/T: The temperature of the exhaust gases should remain controlled at 120 °C (393.15 K). However, variations in the process, load, flow, or even external temperature can affect this temperature. We considered 1/T a normal variable whose mean and standard deviation remained constant throughout the year.
2.4.4. Correlations between the Variables
There are six possible correlations among these variables (Figure 1). In the first analysis, the variables P, V, h, and 1/T appear to be independent of each other. However, in a detailed examination of the data-gathering procedures and the physical operation of the industrial process, we observed a strong correlation among some of these variables, as reported in the next section.
4. Discussion
4.1. Known and Unknown Uncertainties
The computation of the confidence interval for the emission is subject to limitations. Perhaps the most important one is that we cannot guarantee that we are capturing all uncertainties possibly affecting the resulting emissions. We have only considered those that we know exist and those that we could imagine using our experience and understanding of similar operations. The observations reinforce the points highlighted by Molina-Castro [24], who also emphasizes the innovative use of expert criteria to determine the expected variabilities in certain input variables when such information is not available. However, it is important to acknowledge that this approach is not exempt from limitations.
Another fact that might distort the result is the presence of data outliers and possible data-gathering mistakes that went unnoticed. In practical applications, beyond the natural variability in the values associated with operational parameters, such as temperature or the quantity of raw materials processed over a specific period, the presence of outliers and recording errors can confound the analysis. Our approach in this study assumes that all data should be utilized once validated by company management. Furthermore, if this data contains mistakes and erroneous records, they must be considered constituents of the inherent data variability. In the present case, all recommended procedures were followed to ensure maximum reliability of the data and data processing procedures.
4.2. The Effect of Correlations
An important part of this article is dedicated to the analysis of correlations among P, V, h, and 1/T and the effect of these correlations in the estimation of the emission.
Assessing these correlations, though, depends on understanding the chemical and physical relationships in the production process. Groen and Heijungs [25] discuss the risks involved in ignoring correlations between the input parameters and how ignoring correlations can lead to an underestimation or overestimation of the output variation.
Thus, it is interesting to evaluate what would be missed if correlations were all assumed to be zero. Considering this assumption, while maintaining the rest of the data from the case study presented, the result changes to:
The average value obtained above by the Monte Carlo method with the assumption of zero correlation is the same result that we would obtain by simply multiplying the mean values of P, V, h, and 1/T. This observation illustrates the statistical result that the mean of a product of independent random variables equals the product of their means, which in general no longer holds when the factors are correlated.
We must stress that such a simplification might produce a result that appears more precise. This would occur if the disregarded correlations were positive. However, a result obtained in this way, although apparently more precise, would be biased.
In our example, disregarding the correlation produced a bias of 1% in the estimation of the mean emission value. Neglecting the correlation would enlarge the standard deviation from 175 t to 199 t, a 14% increase.
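The origin of the bias can be made explicit for a single correlated pair using the exact identity E[XY] = E[X]·E[Y] + Cov(X, Y), which holds for any two variables with finite variances. The sketch below applies it with hypothetical numbers; the function names and the input values are ours and do not reproduce the figures of the case study.

```python
def mean_of_product(mu_x, sd_x, mu_y, sd_y, rho):
    """Exact mean of X*Y given means, standard deviations and correlation."""
    return mu_x * mu_y + rho * sd_x * sd_y


def relative_bias_if_correlation_ignored(cv_x, cv_y, rho):
    """(product of means - true mean) / true mean, for positive means.

    cv_x and cv_y are coefficients of variation (standard deviation / mean).
    """
    shift = rho * cv_x * cv_y
    return -shift / (1.0 + shift)


# Hypothetical pair with 10% relative uncertainty each and correlation -0.8.
print(mean_of_product(10.0, 1.0, 5.0, 0.5, -0.8))
print(relative_bias_if_correlation_ignored(0.10, 0.10, -0.8))
```

When the remaining factors are independent of this pair, the relative bias of the full product equals that of the pair, so the size of the bias is governed by the product of the correlation and the two coefficients of variation.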
4.3. Length of the Period of Analysis
The estimate of emissions for a period shorter than one year depends on knowing h, the variable accounting for the number of hours in that time interval, and the standard deviation of h. The standard deviation of h for an annual period and for a monthly period need not be the same, either in percentage terms or in absolute terms. This deviation depends on the number of disruptions during the period and on other factors.
Also, it is necessary to consider that the catalyst used in the formaldehyde production process wears out and is replaced approximately every 12 to 15 months. As the catalyst wears out, the process tends to lose efficiency, and management adjusts the controls to maintain maximum efficiency. These adjustments alter the production load, flow rate, etc. The intake and exhaust of gases are also adjusted. Temperature, waste level, and CO2 concentration are affected. Therefore, all variables in Equation (7) might change from month to month, and it seems to be a gross approximation to use the annual statistics for a monthly analysis.
4.4. Control of the Process and Precision
The measurement of exhaust gas temperature is another issue that requires attention. It should remain controlled at 120 °C (393.15 K). The formaldehyde production process is exothermic and uses a special fluid to remove the reaction heat. The unit has an important thermal recovery system, as the output stream from the reactor heats the methanol and air mixture streams. The control of the system is guaranteed by a set of valves operated via a distributed control system (SDCD, from the Portuguese acronym). The control is targeted to maintain the temperature within a range of values where the process is most efficient.
However, we found that this temperature is subject to variation. In this case, a temperature fluctuation of ±10 °C results in variations of +2.6% to −2.5% in the normal flow and CO2 emission estimates (Table 1). This shows that controlling the temperature of the exhaust tower might significantly improve results in terms of emissions and emission estimate precision.
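The quoted sensitivity can be checked with a short calculation, assuming (as the estimation formula implies) that the estimate is proportional to 1/T with T in kelvin and a set point of 393.15 K. The function name below is ours.

```python
T_SET_K = 393.15  # controlled exhaust temperature, 120 degC


def relative_change(delta_c: float) -> float:
    """Relative change of a quantity proportional to 1/T when T shifts by delta_c."""
    return T_SET_K / (T_SET_K + delta_c) - 1.0


for delta in (-10.0, 10.0):
    print(f"{delta:+.0f} degC -> {100.0 * relative_change(delta):+.1f}%")
# prints:
# -10 degC -> +2.6%
# +10 degC -> -2.5%
```

The asymmetry between the two figures comes from the hyperbolic dependence on T: a 10 °C drop raises 1/T slightly more than a 10 °C rise lowers it.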
4.5. Comparison with Other Emission Estimates
It is interesting to compare the estimates obtained by the methodology presented in this case study with the results obtained by the company itself and by third-party technicians hired in 2022 to analyze the emissions related to the 2021 formaldehyde production process.
The annual emissions for 2021 in formaldehyde production by the oxide method are presented in Table 3. The difference between estimates A and B was only 3%, while the difference between A and C was 48%.
Estimate C adopted a different procedure for measuring the volumetric flow rate in the exhaust gases, and it was not clear whether the choice of time to collect this information was appropriate. Another possible reason for the discrepancy was that the third-party contractors only took measurements during October 2021 and somehow extrapolated them to the entire year.
In a recent study, Barahmand and Eikeland [26] concluded that the use of different databases and the application of different methodologies are among the main sources of variability in data and uncertainties. However, the methodologies and data used in A and B were the same. The difference in results was due to the procedures used to calculate the averages from the collected data (procedure B did not calculate standard deviations).
5. Conclusions
The Copenor case study analyzed how variations in the data affect GHG emissions in an industrial plant dedicated to formaldehyde production. The means and standard deviations were recorded for the estimation of direct emissions, and the effects of variations in the variables P, V, h, and 1/T on the emission estimates were studied. It was observed that variations in the daily production of formaldehyde (P), the volumetric concentration of CO2 in the exhaust gases (V), the number of hours worked (h), and the temperature of the exhaust gases (1/T) led to deviations in the estimates of GHG emissions. Additionally, a correlation between the variables 1/T and V was identified, which required the application of Monte Carlo simulation to adequately estimate the variance considering the four variables involved (P, V, h, and 1/T).
This research highlights the significance of considering these variations in accurately estimating greenhouse gas emissions in industrial processes. It also holds great relevance for informing “carbon peak and carbon neutrality” policies, as it provides valuable insights into the complexities of emissions estimation. Policymakers can leverage these findings to design more effective strategies for emission reduction and carbon removal, contributing to a more sustainable and environmentally responsible industrial landscape. Moreover, this study’s methodology using Monte Carlo simulation offers a robust framework applicable to similar industries, facilitating the development of emission reduction initiatives.