Random Forest Classifier for Cloud Clearing of the Operational TROPOMI XCH4 Product

Borsdorff, Tobias; Martinez-Velarte, Mari C.; Sneep, Maarten; ter Linden, Mark; Landgraf, Jochen

doi:10.3390/rs16071208


by Tobias Borsdorff ¹,*, Mari C. Martinez-Velarte ¹, Maarten Sneep ², Mark ter Linden ² and Jochen Landgraf ¹

¹ SRON Netherlands Institute for Space Research, 2333 CA Leiden, The Netherlands

² Royal Netherlands Meteorological Institute, KNMI, 3731 GA De Bilt, The Netherlands

* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(7), 1208; https://doi.org/10.3390/rs16071208

Submission received: 7 February 2024 / Revised: 18 March 2024 / Accepted: 21 March 2024 / Published: 29 March 2024

(This article belongs to the Section Atmospheric Remote Sensing)


Abstract

The TROPOMI XCH₄ data product requires rigorous cloud filtering to achieve a product accuracy of <1%. To this end, operational XCH₄ data processing has been based on SUOMI-NPP VIIRS cloud observations. However, SUOMI-NPP is nearing the end of its operational life and encountered malfunctions in 2022 and 2023. In this study, we introduce a novel machine learning cloud-clearing approach based on a random forest classifier (RFC). The RFC is trained on collocated TROPOMI and SUOMI-NPP VIIRS data to emulate VIIRS-like cloud clearing. After training, cloud masking requires only TROPOMI data, and so becomes operationally independent of SUOMI-NPP. We demonstrate the RFC approach by applying cloud clearing to operational TROPOMI XCH₄ data for August 2022, a period in which VIIRS was not operational. For validation, we analyze the TROPOMI XCH₄ data at 12 TCCON stations. Comparison of cloud clearing using the RFC and the original VIIRS method reveals excellent agreement with a similar station-to-station bias (−7.4 ppb versus −5.6 ppb), a similar standard deviation of the station-to-station bias (11.6 ppb versus 12 ppb), and the same Pearson correlation coefficient of 0.9. Remarkably, the RFC cloud clearing provides a slightly higher volume of data (2182 versus 2035 daily means) and appears to have fewer outliers. Since 21 November 2023, the RFC approach has been part of the operational processing chain of the European Space Agency (ESA). For now, the default practice is to use SUOMI-NPP VIIRS data when available; the RFC cloud mask is applied only when VIIRS data are unavailable.

Keywords:

TROPOMI; CH₄; SUOMI NPP; VIIRS; CLOUDS; machine learning; random forest classifier

1. Introduction

Methane (CH₄) is the second most important anthropogenic greenhouse gas after carbon dioxide (CO₂), surpassing CO₂ in its heat-trapping effectiveness [1]. Human activities, such as livestock digestion, rice cultivation, and the fossil fuel industry, are the primary sources of anthropogenic CH₄ emissions [2]. Satellite-based CH₄ measurements play an important role in identifying emission sources, quantifying sinks, and devising strategies to mitigate climate change [3,4].

The Tropospheric Monitoring Instrument (TROPOMI) on ESA’s Sentinel 5 Precursor (S5-P) satellite has global coverage within a day and provides the column-averaged dry-air mole fraction of methane, XCH₄, for clear-sky scenes with a spatial resolution of up to 5.5 × 7 km². TROPOMI is a grating spectrometer with a wide spectral coverage including ultraviolet (UV), visible (VIS), near infrared (NIR) and short-wave infrared (SWIR) [5]. The TROPOMI XCH₄ data product is retrieved from the instrument’s NIR and SWIR measurements by deploying the RemoTeC algorithm [6]. It is worth noting that, due to algorithmic limitations on the viewing zenith angle, the swath used for TROPOMI XCH₄ is narrower, so the XCH₄ product lacks full daily global coverage.

During recent years, the TROPOMI XCH₄ product was continuously improved; e.g., a new bias correction scheme was introduced to correct the dependence of XCH₄ on the brightness of the scene [7]. The dataset was extended to include observations over oceans under glint geometry [8] and the identified XCH₄ anomalies over carbonated rock formations were solved by improving the spectral fit of the surface reflectivity [9]. The TROPOMI XCH₄ dataset is widely used, for example, for the detection of anthropogenic CH₄ emissions and atmospheric modeling [3,4,10].

The TROPOMI XCH₄ data product meets the stringent mission requirements on precision (<1.5%) and accuracy (<1%). Extensive validation efforts using ground-based Fourier transform infrared (FTIR) measurements from the Total Carbon Column Observing Network (TCCON) have confirmed that the errors of the TROPOMI XCH₄ data are in line with these demands [11]. Achieving such precision necessitates rigorous cloud clearing of the TROPOMI measurements, as clouds in the satellite’s observation path can introduce significant retrieval errors. Although the S5-P satellite itself lacks a dedicated cloud imager, it flies in loose formation with the SUOMI-National Polar-orbiting Partnership (SUOMI-NPP) satellite. This mission synergy allows us to use the Visible Infrared Imaging Radiometer Suite (VIIRS) cloud data for cloud clearing of TROPOMI measurements, since both instruments observe the same ground scene within <3 min [12].

Here, the VIIRS data serve a dual role in ESA’s data processing workflow. Initially, they are employed for pre-filtering TROPOMI data, reducing the volume of data to be processed—a step representing weak data filtering. The more demanding second step involves cloud clearing, where VIIRS data contribute to a posteriori data filtering of the retrieved XCH₄ data, ensuring the high quality of the data product. This study primarily focuses on the second step of cloud filtering because it is more demanding and interesting from a scientific point of view. However, we also applied our method in the pre-filtering step and it is implemented in ESA’s processing framework.

In the event of missing VIIRS data, a backup filter was established for the TROPOMI XCH₄ retrieval, relying solely on TROPOMI measurements. This alternative filter incorporates information from both weak and strong CH₄ absorption lines in the shortwave infrared, which are included in the output of the TROPOMI CO product [12]. A similar approach, for instance, was applied by [13] for the OCO-2 instrument. However, our investigation revealed that the performance of the backup filter is insufficient. It is crucial to highlight that the filter thresholds were defined preflight and have not been tested on real data.

SUOMI-NPP is approaching the end of its life and will be succeeded by the NOAA-20 satellite [14]. Although NOAA-20 carries a cloud imager equivalent to VIIRS, the different orbit positions cause a time gap of more than 20 min with TROPOMI, which presents a challenge for critical cloud clearing of TROPOMI data. Malfunctions of the VIIRS instrument lasting a full month in 2022 (August) and 7 days in November 2023 clearly demonstrated the dependence of the TROPOMI XCH₄ data quality on the VIIRS data product. The daily distribution of XCH₄ values in regions such as North America, Siberia, and Australia showed a significant low bias due to the inclusion of cloud-contaminated scenes. In August 2022, the XCH₄ distribution was skewed towards lower values, changing the mean by 3.9% and the standard deviation by 303% for the three regions while introducing numerous outliers. This clearly showed that the current backup filter is not sufficient and that an alternative cloud clearing of TROPOMI XCH₄ data is needed that is complementary to the SUOMI-NPP measurements.

This study introduces an innovative machine learning approach for cloud clearing of the TROPOMI XCH₄ data product using a random forest classifier (RFC). Trained on a subset of five years of collocated TROPOMI and SUOMI-NPP VIIRS measurements (about 20,000 orbits), the RFC can replace the VIIRS cloud-clearing process while relying solely on TROPOMI data. Therefore, it represents an alternative cloud clearing in the absence of VIIRS data and so can solve data-processing issues due to the expected end-of-life of SUOMI-NPP. Moreover, the RFC approach is an essential step toward a near-real-time TROPOMI XCH₄ data product, as it eliminates the need to await the availability of VIIRS data. A near-real-time XCH₄ data product is requested for chemical forecasting of the atmosphere, as conducted by the Copernicus Atmosphere Monitoring Service (CAMS) with the Integrated Forecasting System (IFS) developed by the European Centre for Medium-Range Weather Forecasts (ECMWF). CAMS focuses on monitoring and forecasting atmospheric composition, including greenhouse gases, aerosols, and reactive gases, and already assimilates TROPOMI CO in near-real time [15].

Our study is structured as follows. In Section 2, we discuss the datasets utilized in our research, and Section 3 explains our machine learning approach. In Section 4, we apply the RFC to address the one-month absence of VIIRS data in August 2022. Furthermore, a validation is presented for measurements at 12 TCCON stations. Finally, Section 5 summarizes our findings and draws conclusions based on our research.

2. Data

2.1. SUOMI-NPP VIIRS

The SUOMI-NPP VIIRS cloud data are our reference standard for cloud clearing. SUOMI-NPP is a collaborative satellite mission led by the National Aeronautics and Space Administration (NASA) and the National Oceanic and Atmospheric Administration (NOAA), and was successfully launched in 2011 [16]. Among its suite of instruments, the VIIRS product provides essential data, including cloud-related parameters. VIIRS cloud information is sampled on the TROPOMI footprints [17], providing valuable information on cloud coverage of TROPOMI observations with a minimal time delay of <3 min [12]. Resampling VIIRS data on the SWIR footprints of TROPOMI introduces a processing delay of approximately 1.5 days. This study used the resampled VIIRS data product version 1.0.3. All threshold values for the VIIRS data filtering are sourced from [18].

2.2. TROPOMI CO

To train an RFC algorithm on the relationship between TROPOMI and resampled VIIRS data, we must select TROPOMI data with sufficient cloud sensitivity sampled on the SWIR pixel mesh. First, we use the CH₄ columns retrieved from both weak and strong CH₄ absorption features in the SWIR band using a non-scattering radiative transfer model [12]. The difference between the two columns indicates the presence of atmospheric scattering and was suggested by [13] for cloud filtering. Furthermore, we employ the TROPOMI CO product, which is also retrieved from TROPOMI SWIR observations. For this product, the SICOR algorithm [19] infers the total column density together with effective cloud parameters, using prior knowledge of the atmospheric CH₄ concentration to derive the cloud information. This improves CO coverage with data quality that is well within mission objectives [20,21,22]. The data product includes the column-averaging kernel [23], which describes the vertical sensitivity of the CO retrieval and thus indicates the presence of clouds. The ground-level value of the total column-averaging kernel is particularly sensitive to cloud contamination: under clear-sky conditions it approaches about 0.9, and it gradually decreases to 0 in the presence of clouds. Note that the cloud sensitivity could be biased by the retrieval approach and errors in the a priori CH₄ information. However, these biases can be corrected by the training of the RFC, and so do not necessarily hamper the accuracy of the trained model. Moreover, from an implementation perspective, the TROPOMI CO product is well suited, as this retrieval precedes the TROPOMI processing of XCH₄.

In summary, we use the following key TROPOMI parameters: (1) the CH₄ column retrieved from a strong and weak CH₄ absorption feature in the SWIR spectrum; (2) the CO-averaging kernel at the ground level; (3) the latitude of the observed ground scene; (4) the viewing zenith angle of the observation; (5) the retrieved surface albedo at 2334 nm; and (6) ECMWF surface pressure. All the before-mentioned parameters are provided by the TROPOMI CO L2 product version number 2.4.0 and later.

To demonstrate the validity of our RFC cloud-clearing approach, we use the operational TROPOMI XCH₄ dataset derived from the NIR and SWIR measurements of the instrument. The operational algorithm is RemoTeC [12,24]. First, we consider the TROPOMI data for August 2022. During this period, the VIIRS instrument malfunctioned and, therefore, the data quality of the TROPOMI XCH₄ product was significantly compromised. To illustrate the potential of the RFC approach, we apply the new cloud-clearing method to filter the data using the operational TROPOMI XCH₄ data from the latest data reprocessing and corresponding forward stream (version number 2.4.0).

Second, we validated the TROPOMI XCH₄ product against TCCON measurements for the full mission period by comparing cloud clearing based on the resampled VIIRS data and the new RFC machine learning approach. This enables us to verify that no seasonal effects or trends are introduced into the TROPOMI XCH₄ data when applying the RFC cloud-clearing method. To this end, we reprocessed the TROPOMI XCH₄ data using the same configuration employed in the operational retrieval, except that no pre-filtering was applied. This poses a significant processing challenge, but it is crucial to demonstrate the validity of our cloud-clearing approach across all potential cloud contamination cases. Furthermore, this validation ensures the independence of our method from VIIRS data, which are also utilized in the pre-filtering stage. To limit the computational effort, we restricted the processing to a region of 300 km around the designated TCCON stations.

2.3. TCCON XCH₄

We used XCH₄ data from TCCON to show the impact of the RFC cloud-clearing model on the validity of the TROPOMI XCH₄ dataset. TCCON comprises a globally distributed network of ground-based stations designed to provide accurate and precise retrievals of the vertically integrated total column of various trace gases. These measurements have gained recognition as a standard for satellite validation and have been utilized to validate satellite data products such as TROPOMI XCH₄ and CO (see, e.g., [11]). In this study, we used data from 12 TCCON stations, and their details are summarized in Table 1. The subset of TCCON stations was chosen to include stations both along the coast and within inland areas. Although a more recent TCCON dataset is available, namely GGG2020, we opt to use the version GGG2014 to maintain comparability with previous studies assessing the quality of the TROPOMI XCH₄ dataset [7,8,25]. Our validation approach is the same as that used in the previously mentioned publications: TCCON XCH₄ measurements are collocated with TROPOMI XCH₄ measurements within a radius of 300 km around each station, allowing a time difference of $\pm 2$ h. The collocation process yields daily means from both the TROPOMI and TCCON data, and from the corresponding time series, we compute the mean bias $b$, defined as the mean difference between the daily XCH₄ means (TROPOMI − TCCON), along with its corresponding standard deviation $\sigma$. Furthermore, we calculate the mean station-to-station bias $\bar{b}$, which is the average of the biases of all stations, and the standard deviation of the station-to-station bias $\bar{\sigma}$.
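The validation statistics defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the inputs are already-collocated daily means in ppb, and the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which estimator is used.

```python
import numpy as np

def station_statistics(tropomi_daily, tccon_daily):
    """Mean bias b and its standard deviation sigma for one station.

    Inputs are paired daily XCH4 means in ppb (same days, same order).
    """
    diff = np.asarray(tropomi_daily, dtype=float) - np.asarray(tccon_daily, dtype=float)
    # ddof=1 (sample standard deviation) is an assumption, not taken from the text
    return diff.mean(), diff.std(ddof=1)

def network_statistics(station_biases):
    """Mean station-to-station bias and its standard deviation over all stations."""
    biases = np.asarray(station_biases, dtype=float)
    return biases.mean(), biases.std(ddof=1)
```

Calling `station_statistics` once per station and passing the resulting biases to `network_statistics` would give the four quantities $b$, $\sigma$, $\bar{b}$ and $\bar{\sigma}$ used in the validation.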

3. Methods

3.1. Random Forest Classifier

The concept of the RFC is well established in the field of machine learning [40]. It comprises a set of $n$ decision trees, where each individual tree is trained on a randomly selected subset of the training data. This subset includes $m$ input vectors $[x_{1}, x_{2}, \dots, x_{m}]$ paired with their corresponding correct classifications $[y_{1}, y_{2}, \dots, y_{m}]$, where $y_{i}$ assumes values of 0 or 1 [41]. For this study, $x$ consists of the parameters taken from the CO data product and $y$ is the cloud clearing deploying the SUOMI-NPP data as defined above. When classifying the input data, the random forest aggregates the predictions of all $n$ trees and makes a decision based on the majority vote. RFCs are known to be more robust against overfitting than other machine learning approaches, can handle outliers in the input data, and provide valuable insight into feature importance [42]. For this study, we deployed the implementation of the RFC as provided by the sklearn Python library version 1.2.2 [43].

We opted for an RFC configuration with specific parameters (n_estimators = 150, max_depth = 50, max_features = sqrt, min_samples_leaf = 1, min_samples_split = 2). In this configuration, the choice of 150 estimators determines the number of trees in the forest. A maximum depth of 50 is set for each tree, allowing for intricate pattern capturing. The use of max_features = sqrt introduces randomness by considering the square root of the total features for node splitting, fostering diversity among trees and potentially enhancing model generalization. A minimum leaf size of 1 ensures that each leaf in the tree contains at least one sample and a minimum of 2 samples is required to split an internal node. These parameter choices were made based on the hyperparameter tuning that is implemented in the sklearn python library, aiming to achieve a well-balanced and effective RFC model. The hyperparameter tuning process employs a random search across the parameter space to find the best set of parameters that maximizes the accuracy of predictions.
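As an illustration, the configuration quoted above maps directly onto the constructor of `RandomForestClassifier` in scikit-learn. The sketch below uses synthetic data in place of the TROPOMI CO parameters; the number of input columns, the label convention (1 = clear sky) and the fixed `random_state` are assumptions made only for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the TROPOMI CO input parameters (7 columns is illustrative)
X = rng.normal(size=(400, 7))
# Hypothetical label: 1 = clear sky, 0 = cloudy (the text only states that y is 0 or 1)
y = (X[:, 1] > 0).astype(int)

rfc = RandomForestClassifier(
    n_estimators=150,       # number of trees in the forest
    max_depth=50,           # maximum depth of each tree
    max_features="sqrt",    # features considered per split: sqrt of the total
    min_samples_leaf=1,     # each leaf holds at least one sample
    min_samples_split=2,    # at least two samples needed to split a node
    random_state=0,         # fixed only to make this sketch reproducible
)
rfc.fit(X, y)
pred = rfc.predict(X)       # array of 0/1 class labels, one per input row
```

Note that scikit-learn aggregates the trees by averaging their class probabilities rather than by counting hard votes; for fully grown trees the two usually coincide.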

For this study, we implemented an iterative training approach of the RFC to optimize its performance. To ensure proper balance in the training set, we matched the number of clear-sky and cloud-contaminated observations using the collocated VIIRS data. To this end, we first selected all clear-sky retrievals and then randomly chose an equal number of cloudy retrievals from the same orbit. The large number of cloudy observations makes this possible (e.g., a typical orbit may yield approximately 20% clear-sky retrievals). Following the initial training, we evaluated the RFC’s performance on a test set comprising 1000 randomly selected orbits from a pool of approximately 20,000 available orbits. We identified the orbit with the highest number of false clear-sky predictions by the RFC, which we subsequently used to update the training set. Here, we ensured that the orbits in the training set were excluded from the test set. False clear-sky classifications are the most undesirable, as they diminish the accuracy of our data product. Here, accuracy is defined as the relative number of true clear-sky and true cloudy classifications. In contrast, false cloudy classifications primarily reduce data coverage, as true clear-sky measurements are omitted.
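The per-orbit balancing and the three classification rates can be sketched as follows. The function names are hypothetical. The sketch reads accuracy as the fraction of classifications that agree with the VIIRS reference, an interpretation chosen because the accuracy, false clear-sky and false cloudy rates quoted for the baseline classifier sum to 100%.

```python
import numpy as np

def balanced_indices(is_clear, rng):
    """Indices of all clear-sky pixels plus an equal number of random cloudy pixels.

    `is_clear` is the VIIRS-based reference mask for one orbit (True = clear sky).
    If fewer cloudy than clear-sky pixels exist, all cloudy pixels are used.
    """
    is_clear = np.asarray(is_clear, dtype=bool)
    clear = np.flatnonzero(is_clear)
    cloudy = np.flatnonzero(~is_clear)
    n_cloudy = min(clear.size, cloudy.size)
    picked = rng.choice(cloudy, size=n_cloudy, replace=False)
    return np.concatenate([clear, picked])

def classification_rates(viirs_clear, rfc_clear):
    """Return (accuracy, false clear-sky rate, false cloudy rate) as fractions."""
    ref = np.asarray(viirs_clear, dtype=bool)
    est = np.asarray(rfc_clear, dtype=bool)
    accuracy = np.mean(ref == est)      # true clear-sky plus true cloudy
    false_clear = np.mean(est & ~ref)   # RFC says clear, VIIRS says cloudy
    false_cloudy = np.mean(~est & ref)  # RFC says cloudy, VIIRS says clear
    return accuracy, false_clear, false_cloudy
```

By construction the three returned fractions add up to one, so any increase in one error class shows up directly as a loss of accuracy.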

Our approach to selecting orbits with the most significant false clear-sky classifications aims to mitigate this specific error class. Figure 1 illustrates the relationship between the performance of the RFC prediction and the size of the learning set. In panel A, we observe the accuracy, fraction of false cloudy, and false clear sky classifications, whereas panel B displays the scatter of these characteristics. Initial training using only one orbit proved insufficient, yielding an accuracy below 60%. Moreover, the false cloudy classification rate exceeds 40%, leading to a substantial reduction in data coverage compared to the original cloud filtering based on the resampled VIIRS data.

When the training set is expanded to include more orbits, there is a rapid increase in accuracy as the number of false cloudy classifications decreases. However, this improvement comes at the expense of a slight increase in false clear-sky measurements. The slow increase in false clear-sky classifications is primarily due to our selection criteria to update the learning set. A similar dependency is evident when examining the scatter of predictions in Figure 1. The scatter of accuracy is predominantly influenced by the high variability in false-cloudy classifications, a variability that diminishes with the inclusion of more orbits in the learning set.

Therefore, the final number of orbits for training is a crucial performance parameter. On the one hand, it should effectively mitigate the initially high number of false-cloudy classifications and their associated scatter. On the other hand, the objective is to minimize the occurrence of false clear-sky classifications. As illustrated in Figure 1, this optimal number of orbits tends to fall within the range of 100 to 150 orbits. Throughout this iterative process, we validated the corresponding XCH₄ data product against TCCON observations for various numbers of orbits (see Figure A1 in Appendix A). The RFC trained on 110 orbits emerged as our baseline for this study (accuracy = 82%, false clear-sky = 11%, false cloudy = 7%), as it exhibits the lowest bias with respect to TCCON (−7 ppb) within the orbit range between 100 and 150.
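The iterative growth of the training set described in this section can be sketched as a simple loop. Everything here is a hypothetical outline: `train` and `count_false_clear` are placeholder callables standing in for the actual RFC training and evaluation, and drawing a fresh random test set in every iteration is an assumption, since the text does not say whether the test orbits are resampled.

```python
import random

def grow_training_set(orbits, seed_orbit, n_target, train, count_false_clear,
                      test_size=1000, seed=0):
    """Grow the list of training orbits one orbit at a time.

    orbits            -- identifiers of all available orbits
    seed_orbit        -- orbit used for the initial training
    n_target          -- final number of training orbits
    train             -- callable(list of orbit ids) -> model (placeholder)
    count_false_clear -- callable(model, orbit id) -> number of false clear-sky
                         predictions on that orbit (placeholder)
    """
    rng = random.Random(seed)
    training = [seed_orbit]
    while len(training) < n_target:
        # Test orbits never overlap with the current training orbits
        pool = [o for o in orbits if o not in training]
        if not pool:
            break
        model = train(training)
        test_orbits = rng.sample(pool, min(test_size, len(pool)))
        # Add the orbit on which the current model produces most false clear-sky cases
        worst = max(test_orbits, key=lambda o: count_false_clear(model, o))
        training.append(worst)
    return training
```

With `n_target=110` such a loop would stop at the size of the baseline classifier; the choice of that number itself comes from the TCCON comparison, not from the loop.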

3.2. Destriping Approach

The new cloud-clearing approach suggested in this study is based on parameters retrieved from the SWIR spectral range by deploying the TROPOMI CO retrieval as discussed in Section 2.2. The data retrieved from the SWIR measurements of TROPOMI show a significant artificial striping pattern in the flight direction of the satellite. This pattern is, e.g., noticeable in the TROPOMI CO, H₂O/HDO, and XCH₄ products [44]. Although the exact cause of this artifact has yet to be determined, it is likely associated with the calibration of the TROPOMI level 0 data.

Hence, all the parameters sourced from the TROPOMI CO retrieval, essential inputs for our cloud-clearing approach, are susceptible to stripe artifacts. This susceptibility necessitates a preliminary destriping step to ensure that these artifacts do not inadvertently influence the cloud mask for the TROPOMI XCH₄ data. The destriping process is applied to several key parameters, including the XCH₄ column retrieved from both strong and weak CH₄ absorption features in the SWIR spectrum, the CO-averaging kernel, and the retrieved surface albedo at 2334 nm. An illustrative example is depicted in Figure 2, showcasing stripes in the retrieved XCH₄, a side parameter derived from the TROPOMI CO retrieval and indicative of strong methane absorption in the shortwave infrared (SWIR).

We apply the following approach, which is robust against outliers and computationally fast. Assume $v(i, j)$ to be the parameter that is retrieved from the TROPOMI level 1 product. Here, the index $i$ counts the swaths (in the flight direction), and $j$ counts the ground pixels within a swath (across the flight direction). In the initial step, we estimate a smoothed background $b(i, j)$ (see Figure 2B). This is achieved by applying a moving window median smoothing along the swath direction. The width of the window is $w_{1} = 7$ pixels. The residual component $r(i, j) = v(i, j) - b(i, j)$ contains valuable information on the presence of stripes in the data (shown in Figure 2C). We proceed with a second moving window median smoothing of the residual $r(i, j)$, this time in the flight direction. The width of this second smoothing operation is defined as $w_{2} = 20$ pixels. This second smoothing step limits the stripe sensitivity of our method along the track direction and results in the stripe pattern $s(i, j)$ that we are going to remove. Finally, the destriped data product is obtained by calculating $d(i, j) = v(i, j) - s(i, j)$ as shown in Figure 2D. The destriping approach is very efficient, as it operates on subsets of an orbit, which distinguishes it from earlier approaches using entire orbit data (see, e.g., [21]). Moreover, we implement a NaN value masking approach in the TROPOMI data to ensure that NaN values do not influence the median smoothing process. NaN values are disregarded unless an entire window is filled with NaN values, in which case propagation occurs. Consequently, there is no requirement for a destriping mask in this scenario.
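The two-step median destriping described above can be sketched with NumPy. This is a minimal illustration, not the operational implementation: the function names are hypothetical, axis 0 is assumed to be the flight direction (index i) and axis 1 the across-track direction (index j), and the NaN padding at the array edges (so that edge windows use fewer valid samples) is an assumption, as the text does not describe edge handling.

```python
import warnings
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def nan_moving_median(a, width, axis):
    """Moving-window median along `axis` that ignores NaN values.

    A window consisting only of NaN values yields NaN.
    """
    left = width // 2
    right = width - 1 - left
    pad = [(0, 0)] * a.ndim
    pad[axis] = (left, right)
    # Pad with NaN so that edge windows simply contain fewer valid samples
    padded = np.pad(np.asarray(a, dtype=float), pad, constant_values=np.nan)
    windows = sliding_window_view(padded, width, axis=axis)
    with warnings.catch_warnings():
        # np.nanmedian warns on all-NaN windows; NaN is the intended result there
        warnings.simplefilter("ignore", RuntimeWarning)
        return np.nanmedian(windows, axis=-1)

def destripe(v, w1=7, w2=20):
    """Return d = v - s, with s the stripe pattern estimated from v."""
    v = np.asarray(v, dtype=float)
    b = nan_moving_median(v, w1, axis=1)   # smoothed background, across track
    r = v - b                              # residual containing the stripes
    s = nan_moving_median(r, w2, axis=0)   # stripe pattern, along the flight direction
    return v - s
```

For a constant field with a single offset ground-pixel column, `destripe` returns the constant field again, while pixels that are NaN in the input stay NaN in the output.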

This novel destriping method holds significance not only for enhancing the destriping of input parameters crucial to our new cloud-clearing method but also for its potential application directly on the TROPOMI XCH₄ data in the future. To demonstrate this potential, we applied the destriping technique to a 5-year dataset of TROPOMI XCH₄ data and validated it against the 12 TCCON stations featured in this study. The results revealed that whereas the bias between TROPOMI and TCCON remained unchanged, there was a slight reduction in the standard deviation of the bias by 0.2 ppb. This outcome is particularly promising, as the removal of stripe noise in the data is expected to primarily reduce scatter rather than bias. Future plans include the implementation of destriping for ESA’s TROPOMI XCH₄ and CO products, as well as for our scientific water vapor isotope product [45].

4. Results

The RFC cloud clearing for TROPOMI XCH₄ as described in Section 3.1 depends on the TROPOMI CO data and its availability. This dependency does not impose any restriction, as the TROPOMI CO data product has significantly larger coverage than TROPOMI XCH₄. CO retrieval is more resilient with respect to cloud contamination and level 1 data quality, so data that are rejected by the CO data processing are also not usable for XCH₄ processing. To evaluate the performance of our new machine learning-based cloud-clearing approach, we reprocessed the TROPOMI XCH₄ data for August 2022, a period when VIIRS data were unavailable, over North America [37.0° ± 17.0°N, 101.5° ± 34.5°W], Siberia [62.5° ± 6.5°N, 110.0° ± 14.0°E], and Australia [27.2° ± 16.5°S, 133.45° ± 20.12°E]. As a backup option when VIIRS data are not available, the operational TROPOMI XCH₄ retrieval applied a simple threshold filter to remove cloud-contaminated measurements [12]. Figure 3 illustrates the performance of the new RFC filtering method compared to threshold filtering for the three regions. The comparison reveals that threshold filtering introduces significant errors into the dataset, permitting numerous cloud-contaminated measurements to persist across all examined regions. This discrepancy becomes evident when comparing the data distribution for August with that for July and September, when VIIRS data were available.

For August, the global XCH₄ distribution was skewed towards lower values, affecting both the mean value by 3.9% and the standard deviation by 303% for the three regions while introducing numerous outliers. This is mainly due to the shielding of air masses below the cloud. Thus, for this month, the data no longer align with the mission requirements and are flagged to be of low quality. Figure 3B shows the filtered data using our RFC learning approach. For July and September, this dataset maintained the cloud screening based on the resampled VIIRS data. For August, the distribution of XCH₄ for the three regions is in good agreement with the other months. The mean and standard deviations are no longer skewed towards lower values, and even the number of outliers, indicated by the red data points, is reduced when compared to July and September.

The RFC cloud-clearing approach appears to be more restrictive compared to the VIIRS-based method. This is illustrated in Figure 4 for TROPOMI overpasses over the US. In this example, 68% of all ground pixels flagged using VIIRS data are also flagged by the RFC approach (white), with 16% exclusively masked by RFC (red), and 17% exclusively by the VIIRS cloud mask (blue). Here, the RFC cloud clearing identifies similar clear-sky regions as the VIIRS approach but filters out more lower XCH₄ columns. As previously discussed, lower XCH₄ values are indicative of cloud contamination, suggesting that the RFC method may perform even better than VIIRS in this case. An explanation for this difference might be that the RFC approach is less sensitive to challenges in the VIIRS data because it has learned a general correlation between the VIIRS and the TROPOMI CO data. The persistence of cloud-contaminated scenes in the VIIRS-filtered data might be attributed to the 2–3 min time difference between TROPOMI and VIIRS. Depending on the meteorological situation, the time delay can introduce a wrong clear-sky classification of the TROPOMI data using VIIRS.
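The overlap percentages quoted above can be computed from two boolean masks. The sketch below assumes that the percentages are taken relative to all ground pixels flagged by at least one of the two methods, which is an interpretation suggested by the three values adding up to roughly 100%; the function name is hypothetical.

```python
import numpy as np

def mask_overlap(viirs_flag, rfc_flag):
    """Fractions (both, RFC only, VIIRS only) relative to pixels flagged by either method."""
    viirs = np.asarray(viirs_flag, dtype=bool)
    rfc = np.asarray(rfc_flag, dtype=bool)
    n_union = np.count_nonzero(viirs | rfc)
    if n_union == 0:
        return 0.0, 0.0, 0.0
    both = np.count_nonzero(viirs & rfc) / n_union
    rfc_only = np.count_nonzero(rfc & ~viirs) / n_union
    viirs_only = np.count_nonzero(viirs & ~rfc) / n_union
    return both, rfc_only, viirs_only
```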

The quality of the operational TROPOMI XCH₄ data is subject to validation and continuous monitoring using XCH₄ reference data from the TCCON network. For each station, we derived XCH₄ time series of daily means for both TCCON and TROPOMI. This is illustrated in Figure 5, which shows data for Sodankyla, Finland, and Wollongong, Australia. The time series of all stations are shown in the Appendix (Figure A2 and Figure A3). When comparing the TROPOMI XCH₄ data filtered with the RFC approach (A, C) to the current VIIRS filtering (B, D), we observe highly comparable results. Both approaches exhibit a similar data density, but when applying the new approach, we notice slightly fewer outliers with lower XCH₄ values. This agrees with the discussion above.

Figure 6 shows the validation statistics for all 12 stations. Overall, we see very good agreement in the bias (−7.4 ppb for RFC vs. −5.6 ppb for VIIRS) and in the standard deviation of the bias (11.6 ppb for RFC vs. 12 ppb for VIIRS). The amount of data is about 7% higher for RFC compared to the VIIRS cloud clearing. In Figure 7, we present the correlation between the daily TCCON and TROPOMI XCH₄ means for all stations. The error bars in the figure represent the error of the daily means. The TROPOMI-TCCON correlation is highly comparable using the two cloud-clearing approaches. The Pearson coefficient remains consistent even with the addition of more data through RFC filtering. This observation suggests that the extra data incorporated through the filtering process are indeed valuable. Overall, from the results presented in Figure 6 and Figure 7, we conclude that the data quality around the TCCON sites is very comparable using both cloud-clearing approaches.
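As a small worked check of the numbers discussed here, the sketch below computes a Pearson correlation coefficient with NumPy and the relative data gain from the daily-mean counts quoted in the abstract (2182 for RFC versus 2035 for VIIRS). The function names are hypothetical.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    return float(np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1])

def relative_gain(n_new, n_reference):
    """Relative increase of n_new over n_reference as a fraction."""
    return (n_new - n_reference) / n_reference

gain_percent = round(100 * relative_gain(2182, 2035), 1)  # 7.2, i.e. about 7% more daily means
```

Applied to the paired daily TCCON and TROPOMI means, `pearson_r` would give the coefficient reported for each cloud-clearing approach.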

5. Discussions

The quality of the TROPOMI XCH₄ data product depends on accurate cloud clearing of spectrometer data, a process facilitated by using the SUOMI-NPP VIIRS cloud product. During the last six years of mission operation, this approach proved to be very effective because both satellites operate in a formation, ensuring that they capture the same ground scene with a minimal delay of 2–3 min. However, the upcoming decommissioning of SUOMI-NPP necessitates the development of an alternative cloud-clearing method for the TROPOMI XCH₄ data product. This need arises due to the significant time delay of about 20 min between TROPOMI and the NOAA-20 satellite, which is the successor of SUOMI-NPP, and the malfunctioning backup cloud filter of the current TROPOMI XCH₄ processor. To maintain the accuracy and reliability of the TROPOMI XCH₄ data, a new cloud-clearing approach had to be established in response to this new situation.

In this study, we introduced a new machine learning approach based on the random forest classifier technique, which replicates VIIRS cloud clearing using only TROPOMI data. The classifier is trained on a subset of 5 years of collocated TROPOMI and SUOMI-NPP VIIRS data (about 20,000 orbits). To this end, we used parameters derived from the TROPOMI CO retrieval, which is inherently sensitive to the presence of clouds and is processed prior to the retrieval of TROPOMI XCH₄ in the mission operational pipeline. This strategic choice simplifies the integration of our approach into the existing processing framework. In addition, we presented an efficient and robust method for destriping TROPOMI data, relying on median smoothing techniques. This approach was validated with ground-based TCCON measurements to show that it is not changing the bias, but slightly improves the standard deviation of the bias. This is a positive result since removing the stripe noise from the data should only improve the scatter and not introduce a bias change. The destriping will be suggested as an update for the operational TROPOMI XCH₄ and CO retrieval in the future.

We demonstrated the performance of the new cloud-clearing approach by filtering three months of TROPOMI data in summer 2022 over North America, Siberia, and Australia. During one of these months, SUOMI-NPP experienced a temporary outage, so that no VIIRS data were available. Remarkably, our machine learning approach demonstrated comparable performance during this data gap. Additionally, we conducted a TCCON validation covering the entire 5-year duration of the TROPOMI mission for 12 stations. The results showed that the RFC cloud clearing matches the performance of VIIRS filtering, with a very similar bias and about 7% more data. This agreement also indicates that the RFC cloud clearing can handle more difficult cases of cloud contamination, such as cirrus clouds, which would significantly affect the retrieved XCH₄ values, as shown, e.g., by [46].

Currently, our cloud-clearing approach is tailored to land-only scenes, and we plan to extend it to glint geometries observed over the oceans. We will also apply the RFC for improved quality filtering of TROPOMI XCH₄ data in the future. Once the dependence on SUOMI-NPP data is eliminated, faster processing becomes possible. This is particularly interesting for scientific applications such as monitoring CH₄ point sources or assimilating TROPOMI data in near-real time, as performed by CAMS-IFS for TROPOMI CO. An intriguing avenue for future research would be to apply the RFC cloud-filtering technique to other spectral measurements and greenhouse gas retrievals, as well as to different satellite missions.

Author Contributions

T.B., M.C.M.-V., J.L., M.S. and M.t.L. provided the TROPOMI XCH₄ retrieval and data analysis. The TCCON partners provided the validation datasets. T.B. wrote the original draft with input from the authors. All authors have read and agreed to the published version of the manuscript.

Funding

Funding through the TROPOMI national program from the NSO and Methane+ is acknowledged. Darwin and Wollongong TCCON sites are funded by the Australian Research Council (DP140101552, DP160101598, LE0668470) and NASA (NAG5-12247, NNG05-GD07G). Nicholas M. Deutscher is supported by an ARC Future Fellowship (FT180100327). The Edwards (Armstrong) TCCON measurements are supported by NASA’s Earth Science Division.

Data Availability Statement

The TROPOMI XCH₄ dataset is available for download at https://dataspace.copernicus.eu/ (last access: 12 December 2023). TCCON data are available from the TCCON Data Archive: Total Carbon Column Observing Network (TCCON), available at https://tccondata.org/ (last access: 12 December 2023).

Acknowledgments

The presented material contains modified Copernicus data [2017, 2023]. The TROPOMI data processing was carried out on the Dutch national e-infrastructure with the support of the SURF Cooperative. We express gratitude to the TCCON network for providing the valuable validation data showcased in this study.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.

Appendix A

Figure A1. Validation with TCCON as shown in Figure 6 but conducted for all orbit subsets shown in Figure 1 (x-axis). Panel (A) shows the bias with TCCON, panel (B) the standard deviation of the bias, panel (C) the number of collocations used, and panel (D) the Pearson correlation coefficient.

Figure A2. As Figure 5 but for additional stations.

Figure A3. As Figure 5 but for additional stations.

References

1. Myhre, G.; Samset, B.H.; Schulz, M.; Balkanski, Y.; Bauer, S.; Berntsen, T.K.; Bian, H.; Bellouin, N.; Chin, M.; Diehl, T.; et al. Radiative forcing of the direct aerosol effect from AeroCom Phase II simulations. Atmos. Chem. Phys. 2013, 13, 1853–1877.
2. Kirschke, S.; Bousquet, P.; Ciais, P.; Saunois, M.; Canadell, J.G.; Dlugokencky, E.J.; Bergamaschi, P.; Bergmann, D.; Blake, D.R.; Bruhwiler, L.; et al. Three decades of global methane sources and sinks. Nat. Geosci. 2013, 6, 813–823.
3. Lunt, M.F.; Palmer, P.I.; Feng, L.; Taylor, C.M.; Boesch, H.; Parker, R.J. An increase in methane emissions from tropical Africa between 2010 and 2016 inferred from satellite data. Atmos. Chem. Phys. 2019, 19, 14721–14740.
4. Maasakkers, J.D.; Varon, D.J.; Elfarsdóttir, A.; McKeever, J.; Jervis, D.; Mahapatra, G.; Pandey, S.; Lorente, A.; Borsdorff, T.; Foorthuis, L.R.; et al. Using satellites to uncover large methane emissions from landfills. Sci. Adv. 2022, 8, eabn9683.
5. Veefkind, J.; Aben, I.; McMullan, K.; Förster, H.; de Vries, J.; Otter, G.; Claas, J.; Eskes, H.; de Haan, J.; Kleipool, Q.; et al. TROPOMI on the ESA Sentinel-5 Precursor: A GMES mission for global observations of the atmospheric composition for climate, air quality and ozone layer applications. Remote Sens. Environ. 2012, 120, 70–83.
6. Haili, H.; Landgraf, J.; Detmers, R.; Borsdorff, T.; de Brugh, J.A.; Aben, I.; Butz, A.; Hasekamp, O. Toward Global Mapping of Methane With TROPOMI: First Results and Intersatellite Comparison to GOSAT. Geophys. Res. Lett. 2018, 45, 3682–3689.
7. Lorente, A.; Borsdorff, T.; Butz, A.; Hasekamp, O.; aan de Brugh, J.; Schneider, A.; Wu, L.; Hase, F.; Kivi, R.; Wunch, D.; et al. Methane retrieved from TROPOMI: Improvement of the data product and validation of the first 2 years of measurements. Atmos. Meas. Tech. 2021, 14, 665–684.
8. Lorente, A.; Borsdorff, T.; Martinez-Velarte, M.C.; Butz, A.; Hasekamp, O.P.; Wu, L.; Landgraf, J. Evaluation of the methane full-physics retrieval applied to TROPOMI ocean sun glint measurements. Atmos. Meas. Tech. 2022, 15, 6585–6603.
9. Lorente, A.; Borsdorff, T.; Martinez-Velarte, M.C.; Landgraf, J. Accounting for surface reflectance spectral features in TROPOMI methane retrievals. Atmos. Meas. Tech. 2023, 16, 1597–1608.
10. Pandey, S.; Gautam, R.; Houweling, S.; van der Gon, H.D.; Sadavarte, P.; Borsdorff, T.; Hasekamp, O.; Landgraf, J.; Tol, P.; van Kempen, T.; et al. Satellite observations reveal extreme methane leakage from a natural gas well blowout. Proc. Natl. Acad. Sci. USA 2019, 116, 26376–26381.
11. Sha, M.K.; Langerock, B.; Blavier, J.F.L.; Blumenstock, T.; Borsdorff, T.; Buschmann, M.; Dehn, A.; De Mazière, M.; Deutscher, N.M.; Feist, D.G.; et al. Validation of methane and carbon monoxide from Sentinel-5 Precursor using TCCON and NDACC-IRWG stations. Atmos. Meas. Tech. 2021, 14, 6249–6304.
12. Hu, H.; Hasekamp, O.; Butz, A.; Galli, A.; Landgraf, J.; Aan de Brugh, J.; Borsdorff, T.; Scheepmaker, R.; Aben, I. The operational methane retrieval algorithm for TROPOMI. Atmos. Meas. Tech. 2016, 9, 5423–5440.
13. Taylor, T.E.; O’Dell, C.W.; Frankenberg, C.; Partain, P.T.; Cronk, H.Q.; Savtchenko, A.; Nelson, R.R.; Rosenthal, E.J.; Chang, A.Y.; Fisher, B.; et al. Orbiting Carbon Observatory-2 (OCO-2) cloud screening algorithms: Validation against collocated MODIS and CALIOP data. Atmos. Meas. Tech. 2016, 9, 973–989.
14. Cao, C.; Blonski, S.; Wang, W.; Uprety, S.; Shao, X.; Choi, J.; Lynch, E.; Kalluri, S. NOAA-20 VIIRS on-orbit performance, data quality, and operational Cal/Val support. Earth Obs. Mission. Sens. Dev. Implement. Charact. 2018, 10781, 21.
15. Inness, A.; Aben, I.; Ades, M.; Borsdorff, T.; Flemming, J.; Landgraf, J.; Langerock, B.; Parrington, M.; Ribas, R. Monitoring and assimilation of S5P/TROPOMI carbon monoxide data with the global CAMS near-real time system. Atmos. Chem. Phys. Discuss. 2022, 2022, 1–39.
16. Jackson, J.M.; Liu, H.; Laszlo, I.; Kondragunta, S.; Remer, L.A.; Huang, J.; Huang, H.C. Suomi-NPP VIIRS aerosol algorithms and data products. J. Geophys. Res. Atmos. 2013, 118, 12,673–12,689.
17. Siddans, R. S5P-NPP Cloud Processor ATBD; Atbd, RAL, Harwell Campus: Oxfordshire, UK, 2016.
18. Hasekamp, O.; Lorente, A.; Hu, H.; Butz, A.; de Brugh, J.; Landgraf, J. Algorithm Theoretical Baseline Document for Sentinel-5 Precursor Methane Retrieval; Atbd, SRON: Utrecht, The Netherlands, 2016.
19. Landgraf, J.; de Brugh, J.a.; Scheepmaker, R.A.; Borsdorff, T.; Houweling, S.; Hasekamp, O.P. Algorithm Theoretical Baseline Document for Sentinel-5 Precursor: Carbon Monoxide Total Column Retrieval; Atbd, SRON: Utrecht, The Netherlands, 2016.
20. Borsdorff, T.; de Brugh, J.A.; Hu, H.; Aben, I.; Hasekamp, O.; Landgraf, J. Measuring Carbon Monoxide with TROPOMI: First Results and a Comparison With ECMWF-IFS Analysis Data. Geophys. Res. Lett. 2018, 45, 2826–2832.
21. Borsdorff, T.; aan de Brugh, J.; Hu, H.; Hasekamp, O.; Sussmann, R.; Rettinger, M.; Hase, F.; Gross, J.; Schneider, M.; Garcia, O.; et al. Mapping carbon monoxide pollution from space down to city scales with daily global coverage. Atmos. Meas. Tech. Discuss. 2018, 2018, 1–19.
22. Borsdorff, T.; García Reynoso, A.; Maldonado, G.; Mar-Morales, B.; Stremme, W.; Grutter, M.; Landgraf, J. Monitoring CO emissions of the metropolis Mexico City using TROPOMI CO observations. Atmos. Chem. Phys. 2020, 20, 15761–15774.
23. Borsdorff, T.; Hasekamp, O.P.; Wassmann, A.; Landgraf, J. Insights into Tikhonov regularization: Application to trace gas column retrieval and the efficient calculation of total column averaging kernels. Atmos. Meas. Tech. 2014, 7, 523–535.
24. Butz, A.; Guerlet, S.; Hasekamp, O.; Schepers, D.; Galli, A.; Aben, I.; Frankenberg, C.; Hartmann, J.M.; Tran, H.; Kuze, A.; et al. Toward accurate CO2 and CH4 observations from GOSAT. Geophys. Res. Lett. 2011, 38.
25. Lorente, A.; Boersma, K.F.; Eskes, H.J.; Veefkind, J.P.; van Geffen, J.H.G.M.; de Zeeuw, M.B.; Denier van der Gon, H.A.C.; Beirle, S.; Krol, M.C. Quantification of nitrogen oxides emissions from build-up of pollution over Paris with TROPOMI. Sci. Rep. 2019, 9, 20033.
26. Kivi, R.; Heikkinen, P.; Kyrö, E. TCCON Data from Sodankylä (FI), Release GGG2014.R0; Finnish Meteorological Institute: Helsinki, Finland, 2014.
27. Kivi, R.; Heikkinen, P. Fourier transform spectrometer measurements of column CO₂ at Sodankylä, Finland. Geosci. Instrum. Methods Data Syst. 2016, 5, 271–279.
28. Wunch, D.; Mendonca, J.; Colebatch, O.; Allen, N.T.; Blavier, J.F.; Roche, S.; Hedelius, J.; Neufeld, G.; Springett, S.; Worthy, D.; et al. TCCON Data from East Trout Lake, SK (CA), Release GGG2014.R1; Canada Foundation for Innovation: Ottawa, ON, Canada, 2018.
29. Hase, F.; Blumenstock, T.; Dohe, S.; Groß, J.; Kiel, M. TCCON Data from Karlsruhe (DE), Release GGG2014.R0; Helmholtz Association of German Research Centres: Bonn, Germany, 2014.
30. Warneke, T.; Messerschmidt, J.; Notholt, J.; Weinzierl, C.; Deutscher, N.M.; Petri, C.; Grupe, P. TCCON Data from Orléans (FR), Release GGG2014.R0; European Union: Brussels, Belgium, 2014.
31. Wennberg, P.O.; Roehl, C.M.; Wunch, D.; Toon, G.C.; Blavier, J.F.; Washenfelder, R.; Keppel-Aleks, G.; Allen, N.T.; Ayers, J. TCCON Data from Park Falls (US), Release GGG2014.R1; National Aeronautics and Space Administration: Washington, DC, USA, 2017.
32. Wennberg, P.O.; Wunch, D.; Roehl, C.M.; Blavier, J.F.; Toon, G.C.; Allen, N.T. TCCON Data from Lamont (US), Release GGG2014.R1; National Aeronautics and Space Administration: Washington, DC, USA, 2016.
33. Wennberg, P.O.; Wunch, D.; Roehl, C.M.; Blavier, J.F.; Toon, G.C.; Allen, N.T. TCCON Data from Caltech (US), Release GGG2014.R0; National Aeronautics and Space Administration: Washington, DC, USA, 2016.
34. Iraci, L.T.; Podolske, J.R.; Hillyard, P.W.; Roehl, C.; Wennberg, P.O.; Blavier, J.F.; Landeros, J.; Allen, N.; Wunch, D.; Zavaleta, J.; et al. TCCON Data from Edwards (US), Release GGG2014.R1; National Aeronautics and Space Administration: Washington, DC, USA, 2016.
35. Kawakami, S.; Ohyama, H.; Arai, K.; Okumura, H.; Taura, C.; Fukamachi, T.; Sakashita, M. TCCON Data from Saga (JP), Release GGG2014.R0. 2014. Available online: https://data.caltech.edu/records/n2823-2yt07 (accessed on 20 March 2024).
36. Griffith, D.W.; Deutscher, N.M.; Velazco, V.A.; Wennberg, P.O.; Yavin, Y.; Keppel-Aleks, G.; Washenfelder, R.A.; Toon, G.C.; Blavier, J.F.; Paton-Walsh, C.; et al. TCCON Data from Darwin (AU), Release GGG2014.R0; National Aeronautics and Space Administration: Washington, DC, USA, 2014.
37. Griffith, D.W.; Velazco, V.A.; Deutscher, N.M.; Paton-Walsh, C.; Jones, N.B.; Wilson, S.R.; Macatangay, R.C.; Kettlewell, G.C.; Buchholz, R.R.; Riggenbach, M.O. TCCON Data from Wollongong (AU), Release GGG2014.R0; National Aeronautics and Space Administration: Washington, DC, USA, 2014.
38. Pollard, D.F.; Robinson, J.; Shiona, H. TCCON Data from Lauder (NZ), Release GGG2014.R0; National Institute of Water and Atmospheric Research: Auckland, New Zealand, 2019.
39. Sherlock, V.; Connor, B.; Robinson, J.; Shiona, H.; Smale, D.; Pollard, D.F. TCCON Data from Lauder (NZ), 125HR, Release GGG2014.R0; National Institute of Water and Atmospheric Research: Auckland, New Zealand, 2014.
40. Schneising, O.; Buchwitz, M.; Reuter, M.; Bovensmann, H.; Burrows, J.P.; Borsdorff, T.; Deutscher, N.M.; Feist, D.G.; Griffith, D.W.T.; Hase, F.; et al. A scientific algorithm to simultaneously retrieve carbon monoxide and methane from TROPOMI onboard Sentinel-5 Precursor. Atmos. Meas. Tech. 2019, 12, 6771–6802.
41. Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. 1998, 20, 832–844.
42. Breiman, L. Random Forests. Mach. Learn. 2001, 40, 5–32.
43. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
44. Borsdorff, T.; aan de Brugh, J.; Schneider, A.; Lorente, A.; Birk, M.; Wagner, G.; Kivi, R.; Hase, F.; Feist, D.G.; Sussmann, R.; et al. Improving the TROPOMI CO data product: Update of the spectroscopic database and destriping of single orbits. Atmos. Meas. Tech. 2019, 12, 5443–5455.
45. Schneider, A.; Borsdorff, T.; aan de Brugh, J.; Lorente, A.; Aemisegger, F.; Noone, D.; Henze, D.; Kivi, R.; Landgraf, J. Retrieving H₂O/HDO columns over cloudy and clear-sky scenes from the Tropospheric Monitoring Instrument (TROPOMI). Atmos. Meas. Tech. 2022, 15, 2251–2275.
46. Butz, A.; Galli, A.; Hasekamp, O.; Landgraf, J.; Tol, P.; Aben, I. TROPOMI aboard Sentinel-5 Precursor: Prospective performance of CH4 retrievals for aerosol and cirrus loaded atmospheres. Remote Sens. Environ. 2012, 120, 267–276.

Figure 1. Evaluation of the iterative training performance of the random forest classifier (RFC). (Top panel) (A) Relative number μ of true clear-sky and true-cloudy (red), false cloudy (yellow), and false clear-sky (blue) classifications as a function of the number of orbits used for training the RFC. These metrics are derived from the analysis of 1000 randomly selected orbits in each iteration. (Bottom panel) (B) Corresponding prediction scatter σ, given as a percentage.

Figure 2. Destriping method based on moving median smoothing. (A) Original TROPOMI XCH₄ data retrieved from the strong methane absorption by the TROPOMI CO retrieval (lat 37.42°N, lon 61.45°E) with evident striping and a pollution plume in the center, likely due to a pipeline leak. (B) Background derived by smoothing (A) in the across-track direction. (C) Striping pattern obtained by smoothing the difference between A and B in the flight direction. (D) Final destriped TROPOMI XCH₄ data with the pollution plume well preserved.
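
The steps (A) to (D) of Figure 2 can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the operational code: the function names `moving_median` and `destripe`, the window lengths, the edge padding, and the convention that rows run along the flight direction and columns across track are all hypothetical choices made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_median(data, window, axis):
    """Moving median along `axis` with edge padding; NaN values are ignored."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the output keeps the input shape")
    half = window // 2
    pad = [(0, 0)] * data.ndim
    pad[axis] = (half, half)
    padded = np.pad(data, pad, mode="edge")
    # The window dimension is appended as the last axis of the view.
    windows = sliding_window_view(padded, window, axis=axis)
    return np.nanmedian(windows, axis=-1)


def destripe(xch4, across_window=7, along_window=21):
    """Return (destriped field, stripe pattern) for a 2D (along, across) array."""
    # (B) background: smooth the original data in the across-track direction
    background = moving_median(xch4, across_window, axis=1)
    # (C) stripes: smooth the difference A - B in the flight direction
    stripes = moving_median(xch4 - background, along_window, axis=0)
    # (D) destriped data: remove the stripe pattern from the original data
    return xch4 - stripes, stripes


# Synthetic demonstration: smooth along-track gradient plus per-column offsets.
rng = np.random.default_rng(0)
background_truth = np.linspace(1850.0, 1870.0, 60)[:, None] * np.ones((1, 40))
stripe_offsets = rng.normal(scale=5.0, size=40)
striped = background_truth + stripe_offsets[None, :]
destriped, stripes = destripe(striped)
print(np.std(striped - background_truth), np.std(destriped - background_truth))
```

Because both smoothing steps use medians rather than means, a compact enhancement that covers only a minority of the pixels in each window barely changes the estimated background or stripe pattern, which is consistent with the plume being preserved in panel (D).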

Figure 3. Time series of TROPOMI XCH₄ for the months July, August, and September in 2022 over Siberia, North America, and Australia. Daily histograms are shown, with the mean as a black strip, standard deviation as a box, and whiskers depicting the 2.5th and 97.5th percentiles. Outliers are highlighted in red. July and September data are filtered using VIIRS data as reference across all subplots. However, August experiences degraded data quality in TROPOMI XCH₄ due to missing VIIRS cloud data for screening (panels A,C,E). Deploying the RFC filtering approach significantly improves data quality for August across all regions (panels B,D,F).

Figure 4. TROPOMI XCH₄ retrieval for orbit 22706 over North America on 1 March 2020. (A) Cloud clearing utilizing the RFC methodology as outlined in this study. (B) Cloud clearing utilizing the resampled VIIRS cloud product. (C) Evaluation of cloud masks derived from RFC and VIIRS (red indicates ground pixels solely flagged by RFC, blue represents those exclusively identified by VIIRS, and white are flagged by both methods).

Figure 5. Validation of TROPOMI XCH₄ against ground-based TCCON measurements at Sodankyla, Finland and Wollongong, Australia. Daily means are displayed (blue for TCCON, red for TROPOMI), with whiskers indicating the daily measurement standard deviation. (A,C) TROPOMI XCH₄ using the RFC cloud-clearing approach for the two TCCON stations. (B,D) Same, but using the collocated VIIRS data for cloud clearing.

Figure 6. Validation statistics for TROPOMI XCH₄ daily means at 12 TCCON stations (2017 to 2022). (A) RFC cloud-clearing method and (B) employing VIIRS data. The top panel shows the mean bias (TROPOMI-TCCON) for each TCCON station, calculated from the daily means (e.g., as shown in Figure 5) and the lower panel gives the standard deviation of the mean bias. $\bar{b}$ represents the average bias across all stations, and $\sigma(\bar{b})$ is the corresponding standard deviation. $\bar{\sigma}$ denotes the average standard deviation across all station standard deviations, and $\sigma(\bar{\sigma})$ is the standard deviation of the mean standard deviations.
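
The four summary statistics of Figure 6 can be computed from per-station daily differences as in the following sketch. The function name `station_statistics`, the dictionary keys, the use of the sample standard deviation (`ddof=1`), and the numbers in the example are assumptions for illustration only; they are not taken from the validation itself. The last statistic is read here as the standard deviation of the per-station standard deviations.

```python
import numpy as np


def station_statistics(daily_differences):
    """Summarize daily-mean differences (TROPOMI - TCCON, in ppb) per station.

    `daily_differences` maps a station name to a sequence of daily differences.
    """
    biases = np.array([np.mean(d) for d in daily_differences.values()])
    scatters = np.array([np.std(d, ddof=1) for d in daily_differences.values()])
    return {
        "mean_bias": float(biases.mean()),          # average bias over stations
        "bias_std": float(biases.std(ddof=1)),      # station-to-station scatter of the bias
        "mean_scatter": float(scatters.mean()),     # average of the station standard deviations
        "scatter_std": float(scatters.std(ddof=1)), # scatter of the station standard deviations
    }


# Made-up numbers for two hypothetical stations.
example = {
    "station_a": [-4.0, -6.0, -8.0],
    "station_b": [-1.0, -3.0, -5.0],
}
stats = station_statistics(example)
print(stats)
```

For the made-up input, the station biases are −6 ppb and −3 ppb, so the average bias is −4.5 ppb, and both stations have a daily scatter of 2 ppb.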

Figure 7. Correlation between collocated TCCON and TROPOMI XCH₄ daily means (2017–2022). (A) Cloud clearing based on the RFC approach of this study and (B) using VIIRS data. The error bars represent the standard deviation of individual measurements within a day. Data for the different TCCON stations are color-coded.

Table 1. Details and references for the TCCON sites used in this study.

Site (Country)	Coordinates (lat., lon.; °)	Altitude (m a.s.l.)	Reference
Sodankyla (Finland)	67.37, 26.63	190	[26,27]
East Trout Lake (Canada)	54.36, −104.99	500	[28]
Karlsruhe (Germany)	49.1, 8.44	110	[29]
Orleans (France)	47.97, 2.11	130	[30]
Park Falls (USA)	45.94, −90.27	440	[31]
Lamont (USA)	36.6, −97.49	320	[32]
Pasadena (USA)	34.14, −118.13	240	[33]
Edwards (USA)	34.95, −117.88	30	[34]
Saga (Japan)	33.24, 130.29	10	[35]
Darwin (Australia)	−12.46, 130.93	30	[36]
Wollongong (Australia)	−34.41, 150.88	30	[37]
Lauder (New Zealand)	−45.04, 169.68	370	[38,39]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
