Prediction of Water Carbon Fluxes and Emission Causes in Rice Paddies Using Two Tree-Based Ensemble Algorithms

Gu, Xinqin; Yao, Li; Wu, Lifeng

doi:10.3390/su151612333

by Xinqin Gu ^1,2, Li Yao ^1,2,* and Lifeng Wu ^1,2,3

¹ School of Hydraulic and Ecological Engineering, Nanchang Institute of Technology, Nanchang 330099, China

² Jiangxi Provincial Technology Innovation Center for Ecological Water Engineering in Poyang Lake Basin, Shangrao 334100, China

³ State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower Research, Beijing 100038, China

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(16), 12333; https://doi.org/10.3390/su151612333

Submission received: 8 July 2023 / Revised: 30 July 2023 / Accepted: 3 August 2023 / Published: 13 August 2023

Abstract: Quantification of water carbon fluxes in rice paddies and analysis of their causes are essential for agricultural water management and carbon budgets. In this regard, two tree-based machine learning models, extreme gradient boosting (XGBoost) and random forest (RF), were constructed to predict evapotranspiration (ET), net ecosystem carbon exchange (NEE), and methane flux (FCH₄) at seven rice paddy sites. During training, k-fold cross-validation was used, splitting the available data into multiple subsets (folds) to avoid overfitting, and the XGBoost model was used to assess the importance of input factors. When predicting ET, the XGBoost model outperformed the RF model at all sites. Solar radiation was the most important input to ET predictions. For NEE, the XGBoost models also performed better at six of the seven sites (all except KR-CRK), with a root mean square error 0.90–11.21% lower than that of the RF models. Among all sites (except JP-Mse, where net radiation (NETRAD) data were absent), NETRAD and normalized difference vegetation index (NDVI) performed well for predicting NEE. Air temperature, soil water content (SWC), and longwave radiation were particularly important at individual sites. Similarly, the XGBoost model was more capable of predicting FCH₄ than the RF model, except at the IT-Cas site. FCH₄ sensitivity to input factors varied from site to site. SWC, ecosystem respiration, NDVI, and soil temperature were important for FCH₄ prediction. We therefore recommend the XGBoost model for modeling water carbon fluxes in rice paddies.

Keywords: evapotranspiration; net ecosystem carbon exchange; methane flux; extreme gradient boosting (XGBoost); random forest (RF)

1. Introduction

Rice is a major staple crop that feeds over 50% of the world’s population. Rice cultivation requires large amounts of fresh water, and growing demand from competing and expanding sectors makes better agricultural water management for rice necessary. Evapotranspiration (ET) is the main component of water consumption in rice paddies. Therefore, accurate estimation of ET is essential for rice water management: it helps formulate and evaluate water-saving strategies and improves the understanding of the rice water cycle [1,2,3]. In addition to the water cycle, the carbon cycle is significant for maintaining the health and sustainable development of the rice paddy ecosystem [4]. Continuous measurement of net ecosystem CO₂ exchange (NEE) in rice paddies helps determine whether the ecosystem is a carbon source or sink and analyze the temporal variation of carbon exchange [5]. Methane (CH₄) is the second most important greenhouse gas after CO₂ in terms of radiative forcing [6]. Long-term flooding creates an anaerobic soil environment in rice paddies. Organic matter in the soil, such as primary carbon and plant residues, is gradually decomposed into soluble organic matter that is utilized by methanogens, and methane is produced by acetic acid fermentation or hydrogen/carbon dioxide reduction. Thus, rice paddies are considered one of the main sources of atmospheric CH₄ [7,8,9,10]. To better understand the water and carbon cycles in rice paddies, it is important to conduct field studies that measure these water and carbon fluxes.

The eddy covariance technique [11] has been extensively used for continuous measurements of evapotranspiration (ET) [12,13,14], net ecosystem carbon exchange (NEE) [15], and methane flux (FCH₄) [16,17]. This technique provides nearly continuous measurements at the ecosystem scale without interfering with gas exchange between the terrestrial ecosystem and the atmosphere, and it has thus become a widely used means of measuring trace gas exchange. However, it is limited by high installation and maintenance costs. Building more sites to observe these water and carbon fluxes is complicated, laborious, and not sustainable. On the one hand, the instruments need calibration and servicing and are inconvenient to maintain in the field. On the other hand, their service life and performance can be affected by unstable factors, such as weather conditions, environmental factors, and interference from animals and insects. These aspects affect the sustainability of the measurements. To address these issues and make the quantification of ET, NEE, and FCH₄ sustainable, other approaches are needed. Machine learning algorithms such as the support vector machine (SVM) [18,19,20,21] and the artificial neural network (ANN) [22,23,24] have been used as alternatives for predicting these water and carbon fluxes. Apart from accurately predicting these fluxes, it is also necessary to understand their environmental driving factors in rice paddies. ET, NEE, and FCH₄ depend nonlinearly on multiple driving factors, such as air temperature, soil temperature, soil water content, air pressure, and radiation. Identifying the key factors can also inform subsequent boosting algorithms.

In this study, we used the XGBoost and RF models to analyze seven rice paddy sites. The objectives of this study are (1) to compare the performance of the two models in predicting ET, NEE, and FCH₄; (2) to analyze the importance of the input factors for ET, NEE, and FCH₄; and (3) to provide further information for subsequent water carbon flux predictions in rice paddies.

2. Materials and Methods

2.1. Site Data

Seven rice paddy sites were selected from Version 1 of the FLUXNET-CH₄ database (Table 1, Figure 1) [17]. In this study, two tree-based machine learning models (XGBoost and RF) were used to predict ET, NEE, and FCH₄ in rice paddies. The input factors are shown in Table 2, Table 3, and Table 4, respectively. When soil water content (SWC) and soil temperature (TS) were observed at more than one depth, the mean SWC across depths was used as the model input, and TS was taken from the depth with the highest statistical correlation with the predicted flux. The most commonly used input factors in predicting FCH₄ include air temperature (TA), incoming shortwave radiation (SW_IN), outgoing longwave radiation (LW_OUT), vapor pressure deficit (VPD), atmospheric pressure (PA), wind speed (WS), wind direction (WD), SWC, friction velocity (USTAR), net radiation (NETRAD), ecosystem respiration (RECO), sensible heat turbulent flux (H), gross primary productivity (GPP), soil heat flux (G), TS, normalized difference vegetation index (NDVI), latent heat turbulent flux (LE), NEE, and the temperature difference between the previous day and the current day (DeltaTA).
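The handling of multi-depth soil observations described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the dictionary layout, and the sample values are hypothetical, and the use of the absolute Pearson correlation as the "statistical correlation" is an assumption, since the text does not name the statistic.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_swc(swc_by_depth):
    """Average SWC over all observation depths, record by record."""
    return [mean(values) for values in zip(*swc_by_depth.values())]


def best_ts_depth(ts_by_depth, target):
    """Return the key of the TS series most strongly correlated with the target.

    Assumption: strength is measured by the absolute Pearson correlation.
    """
    return max(ts_by_depth, key=lambda depth: abs(pearson(ts_by_depth[depth], target)))


# Made-up values for illustration only (not FLUXNET-CH4 data).
swc = {"SWC_1": [0.30, 0.32, 0.35], "SWC_2": [0.40, 0.42, 0.45]}
ts = {"TS_1": [20.0, 21.0, 23.0, 22.0], "TS_2": [19.0, 19.5, 19.0, 19.5]}
flux = [10.0, 12.0, 16.0, 14.0]

print(mean_swc(swc))            # one averaged SWC value per record
print(best_ts_depth(ts, flux))  # name of the selected TS depth
```

Note that `pearson` fails with a division by zero if a series is constant; real data would need a guard for that case and for missing values.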

2.2. Extreme Gradient Boosting (XGBoost)

The XGBoost model is a machine learning algorithm implemented in a gradient-boosting framework. As an ensemble algorithm, it sums the outputs of all weak learners (classification and regression trees, CART). The XGBoost model is trained by successively adding weak learners to optimize the objective function; that is, it builds one CART at a time, and the score of each newly built CART is added to the scores of all previous CARTs. The objective function at the t-th iteration, O^(t), can be expressed as follows:

O^{(t)} = \sum_{i = 1}^{n} l (y_{i}, {\hat{y}}_{i}^{t - 1} + f_{t} (x_{i})) + Ω (f_{t}) + C

(1)

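where l is the loss function, y_i is the observed value, \hat{y}_i^{t-1} is the prediction after the first t − 1 CARTs, f_t(x_i) is the output of the t-th CART for sample x_i, n is the number of samples, C is a constant term, and Ω (f_t) is the regularization term of the model, defined as: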

Ω (f_{t}) = γ T + \frac{1}{2} λ \sum_{j = 1}^{T} w_{j}^{2}

(2)

where T is the number of leaves of the t-th CART, w_j is the weight of leaf j, and γ and λ are user-defined regularization parameters. In general, the larger these two values are, the simpler the tree structure will be, which helps to reduce over-fitting. Taking a second-order Taylor expansion of Equation (1) gives:

O^{(t)} \approx \sum_{i = 1}^{n} [ l (y_{i}, {\hat{y}}_{i}^{t - 1}) + g_{i} f_{t} (x_{i}) + \frac{1}{2} h_{i} f_{t}^{2} (x_{i}) ] + Ω (f_{t}) + C

(3)

where g_i and h_i are the first and second derivatives of the loss function with respect to the previous prediction, respectively:

g_{i} = \partial_{{\hat{y}}_{i}^{t - 1}} l (y_{i}, {\hat{y}}_{i}^{t - 1})

(4)

h_{i} = \partial_{{\hat{y}}_{i}^{t - 1}}^{2} l (y_{i}, {\hat{y}}_{i}^{t - 1})

(5)

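Substituting (2), (4), and (5) into (3), dropping the terms that do not depend on f_t, and setting the derivative with respect to each leaf weight w_j to zero gives the optimal leaf weight (6) and the corresponding minimum of the objective (7), where I_j denotes the set of samples assigned to leaf j:

w_{j}^{*} = - \frac{\sum_{i \in I_{j}} g_{i}}{\sum_{i \in I_{j}} h_{i} + λ}

(6)

O^{*} = - \frac{1}{2} \sum_{j = 1}^{T} \frac{{(\sum_{i \in I_{j}} g_{i})}^{2}}{\sum_{i \in I_{j}} h_{i} + λ} + γ T

(7)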

The XGBoost model has the advantages of high computational efficiency and high accuracy. More details about the XGBoost model can be found in Chen et al. [25].
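To make the closed-form solutions concrete, the following sketch evaluates the optimal leaf weight and the minimum objective for a tiny example, summing g and h over the samples in each leaf as in the standard XGBoost formulation [25]. It assumes a squared-error loss l = ½(y − ŷ)², for which g_i = ŷ_i − y_i and h_i = 1; the loss, the function names, and the numbers are illustrative assumptions, not taken from the study.

```python
def squared_error_grad_hess(y_obs, y_prev):
    """g_i and h_i for the assumed loss l = 0.5 * (y - y_hat)^2."""
    g = [p - o for o, p in zip(y_obs, y_prev)]  # first derivative: y_hat - y
    h = [1.0] * len(y_obs)                      # second derivative: constant 1
    return g, h


def leaf_weight(g, h, lam):
    """Optimal weight of one leaf: -sum(g) / (sum(h) + lambda)."""
    return -sum(g) / (sum(h) + lam)


def structure_score(leaves, lam, gamma):
    """Minimum objective of a tree; `leaves` is a list of (g, h) pairs, one per leaf."""
    total = sum(sum(g) ** 2 / (sum(h) + lam) for g, h in leaves)
    return -0.5 * total + gamma * len(leaves)


# Hypothetical data: three samples, previous prediction 0 for all of them.
g, h = squared_error_grad_hess([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])

# One leaf holding all samples versus a split into {sample 1} and {samples 2, 3}.
one_leaf = structure_score([(g, h)], lam=1.0, gamma=0.5)
two_leaves = structure_score([(g[:1], h[:1]), (g[1:], h[1:])], lam=1.0, gamma=0.5)

print(leaf_weight(g, h, lam=1.0))  # weight of the single leaf
print(one_leaf, two_leaves)        # the lower score is the preferred structure
```

Raising `lam` shrinks the leaf weights toward zero, and raising `gamma` penalizes every additional leaf, which is how larger values of the two parameters lead to simpler trees.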

2.3. Random Forest (RF)

RF is an ensemble algorithm proposed by Breiman [26]. RF consists of multiple classification and regression trees (CART), and the average of the predictions of these trees is the final prediction. The RF model has several advantages: (1) it can handle very high-dimensional data (that is, data with numerous features) and needs no feature selection; (2) it can detect the interaction and importance of input factors; (3) it can be easily parallelized because the trees are independent of each other during training; and (4) it has high accuracy and maintains good accuracy when data are missing.

In this study, k-fold cross-validation was applied to assess the performance of the XGBoost and RF models. The complete data set was divided into 5 parts, with 4 parts used for training and 1 part for testing. Further, the importance of the input factors for each predicted flux was calculated by the XGBoost model. The gain represents the fractional contribution of each factor to the model, based on the total gain of the splits that use this factor. A higher percentage means a more important predictive feature.
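The five-part split described above can be sketched in plain Python. This is an illustrative outline only: whether the authors shuffled the records, how they seeded the split, and how they aggregated the fold results are not stated, so the shuffling, the `seed` argument, and the averaging of scores below are assumptions, and `fit`, `predict`, and `score` are placeholders for any model and metric.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # assumption: random assignment to folds
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validate(x, y, fit, predict, score, k=5):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(x), k):
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = predict(model, [x[i] for i in test_idx])
        scores.append(score([y[i] for i in test_idx], y_pred))
    return sum(scores) / len(scores)


# Usage with a trivial "predict the training mean" model on made-up data.
x = list(range(10))
y = [float(v) for v in x]
mean_model = lambda xs, ys: sum(ys) / len(ys)
mean_predict = lambda model, xs: [model] * len(xs)
mae = lambda obs, pred: sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)
print(cross_validate(x, y, mean_model, mean_predict, mae, k=5))
```

For half-hourly flux series, a random split places neighboring records in both training and test folds; a blocked split by time period is a common alternative when temporal autocorrelation is a concern.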

2.4. Statistical Evaluation

The statistical indicators used in this study include root mean square error (RMSE), coefficient of determination (R²), mean absolute error (MAE), mean bias error (MBE), and global relative indicator (GRI), which are expressed as:

\mathrm{RMSE} = \sqrt{\frac{\sum_{i = 1}^{n} {(Y_{i, o} - Y_{i, p})}^{2}}{n}}

(8)

R^{2} = \frac{{[\sum_{i = 1}^{n} (Y_{i, o} - {\bar{Y}}_{o}) (Y_{i, p} - {\bar{Y}}_{p})]}^{2}}{\sum_{i = 1}^{n} {(Y_{i, o} - {\bar{Y}}_{o})}^{2} \sum_{i = 1}^{n} {(Y_{i, p} - {\bar{Y}}_{p})}^{2}}

(9)

\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |Y_{i, o} - Y_{i, p}|

(10)

\mathrm{MBE} = \frac{1}{n} \sum_{i = 1}^{n} (Y_{i, o} - Y_{i, p})

(11)

\mathrm{GRI} = \sum_{j = 1}^{4} α_{j} \frac{|Y_{i, j}| - |g_{j, \min}|}{|g_{j, \max}| - |g_{j, \min}|}

(12)

where Y_{i, o}, Y_{i, p}, {\bar{Y}}_{o}, and {\bar{Y}}_{p} are the observed values, the predicted values, and the means of the observed and predicted values, respectively; n is the number of observations; α_{j} is a coefficient, which equals 1 for RMSE, MAE, and MBE, and −1 for R²; g_{j, \max} and g_{j, \min} represent the maximum and minimum of indicator j, respectively; and Y_{i, j} is the scaled value of indicator j. A lower GRI value indicates higher model accuracy.
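The indicators in Equations (8)–(12) can be computed as in the sketch below. The first four functions follow the equations directly, with errors taken as observed minus predicted. The `gri` function is one possible reading of Equation (12): it assumes that the minimum and maximum of each indicator are taken over the absolute values obtained by the models being compared, and that an indicator on which all models tie contributes zero. Both points, the function names, and the sample numbers are assumptions.

```python
import math


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mbe(obs, pred):
    return sum(o - p for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy ** 2 / (sxx * syy)


def gri(table):
    """Map each model name to its GRI; `table` is {model: {indicator: value}}."""
    alphas = {"RMSE": 1, "MAE": 1, "MBE": 1, "R2": -1}
    result = {model: 0.0 for model in table}
    for name, alpha in alphas.items():
        values = {model: abs(table[model][name]) for model in table}
        lo, hi = min(values.values()), max(values.values())
        for model, value in values.items():
            scaled = 0.0 if hi == lo else (value - lo) / (hi - lo)  # min-max scaling
            result[model] += alpha * scaled
    return result


# Made-up indicator values for two hypothetical models.
table = {
    "model_A": {"RMSE": 0.02, "MAE": 0.01, "MBE": 0.001, "R2": 0.93},
    "model_B": {"RMSE": 0.03, "MAE": 0.02, "MBE": -0.002, "R2": 0.90},
}
print(gri(table))  # the model with the lower GRI ranks as more accurate
```

Because R² enters with a coefficient of −1, a model with smaller errors and a higher R² receives the lower GRI, in line with the ranking rule stated above.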

3. Results

3.1. Models Performance and Driving Factors of ET Prediction

The statistical results of the two machine learning models for predicting half-hourly ET at the different sites are provided in Table 5. The GRI of the XGBoost models was smaller than that of the RF models. In addition, the XGBoost models showed higher R² and lower RMSE and MAE than the RF models: for XGBoost, RMSE ranged from 0.0187 to 0.0314 mm hr⁻¹, R² from 0.8671 to 0.9377, and MAE from 0.0110 to 0.0169 mm hr⁻¹; for RF, RMSE ranged from 0.0207 to 0.0325 mm hr⁻¹, R² from 0.8583 to 0.9240, and MAE from 0.0121 to 0.0198 mm hr⁻¹. The MBEs of both models were close to zero.

The scatter plots of the observed ET values and the values predicted by the two machine learning models are shown in Figure 2. For both models, many points were scattered on both sides of the 1:1 line. The correlation between the observed and predicted values was slightly better for the XGBoost models than for the RF models. The R² values of the XGBoost and RF models reached their maxima at the IT-Cas site, at 0.9338 and 0.9240, respectively.

Analyzing the sensitivity of rice paddy ET to input factors is of great significance for further understanding the impact of global climate change on rice paddy ET. The XGBoost model can evaluate the importance of the input factors. Figure 3 shows the importance of different factors for ET at each site according to the XGBoost model (a higher gain value implies greater importance). SW was the most important factor for ET prediction at every site, with a mean gain value of 0.6872. This is expected, since solar radiation provides the energy needed for evapotranspiration. Except for SW, the gain values of all factors were less than 0.3. After averaging the gain of each factor across sites, the order was SW (0.6872), VPD (0.0989), TA (0.0611), NDVI (0.0470), SWC (0.0402), WS (0.0385), PA (0.0177), and DeltaTA (0.0150).
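The cross-site averaging of gain values can be illustrated with a short sketch. The per-site numbers below are invented for the example and are not the values behind Figure 3, and the function name is hypothetical. The sketch also assumes that a factor missing at a site (such as SWC at US-Twt) is left out of that factor's average rather than counted as zero; the text does not say which convention was used.

```python
def mean_gain_by_factor(site_gains):
    """Average each factor's gain over the sites that report it, ranked high to low."""
    totals, counts = {}, {}
    for gains in site_gains.values():
        for factor, gain in gains.items():
            totals[factor] = totals.get(factor, 0.0) + gain
            counts[factor] = counts.get(factor, 0) + 1
    means = {factor: totals[factor] / counts[factor] for factor in totals}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Invented per-site gain fractions; SWC is absent at the second site.
site_gains = {
    "site_A": {"SW": 0.70, "VPD": 0.20, "SWC": 0.10},
    "site_B": {"SW": 0.60, "VPD": 0.40},
}
for factor, gain in mean_gain_by_factor(site_gains):
    print(f"{factor}: {gain:.4f}")
```

With this convention the averaged gains need not sum to exactly 1, because different factors are averaged over different numbers of sites.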

3.2. Models Performance and Driving Factors of NEE Prediction

The statistical results of the two machine learning models for predicting half-hourly NEE at the different sites are provided in Table 6. As seen in the table, except at the KR-CRK and US-HRC sites, the XGBoost models showed higher R² and lower RMSE and MAE than the RF models. Only at the KR-CRK site was the GRI value of XGBoost larger than that of RF. Figure 4 shows the scatter plots of the observed and predicted NEE values based on the XGBoost and RF models. Both models made good predictions at most points, with R² ranging from 0.7525 to 0.9668 for XGBoost and from 0.7548 to 0.9581 for RF. Overestimation occurred when the predicted values were small, and underestimation occurred when they were large.

Figure 5 shows the importance of different factors for NEE at each site according to the XGBoost model (a higher gain value implies greater importance). It can be seen from Figure 5 that, except at the JP-Mse site, where NETRAD data were missing, the gain of NETRAD exceeded 0.2 and its contribution to the prediction of NEE ranked among the highest. After averaging over these sites, the gain of NETRAD was 0.38, ranking first, followed by NDVI (0.26), LW (0.15), and SWC (0.12); the gain values of the remaining factors were less than 0.10. TA was important at only one site (IT-Cas).

3.3. Models Performance and Driving Factors of FCH₄ Prediction

The statistical results of the two machine learning models for predicting half-hourly FCH₄ at the different sites are provided in Table 7. In terms of GRI, the XGBoost models performed better than the RF models except at the IT-Cas and US-HRA sites. At the US-HRA site, the XGBoost model had lower RMSE and higher R², but its MAE and MBE were further from 0 than those of the RF model. At the IT-Cas site, in contrast, the RF model showed higher R² and lower RMSE and MAE than the XGBoost model; only its MBE was worse.

Figure 6 shows the scatter plots of the observed and predicted FCH₄ values based on the XGBoost and RF models. The accuracy of both models was within the acceptable range, with a minimum R² value of 0.6421. As seen in the figure, relatively serious underestimation occurred at several sites; for example, at the US-Twt site the underestimation was obvious when the observed FCH₄ exceeded 200 nmol m⁻² s⁻¹. However, many points were still distributed on both sides of the 1:1 line.

Figure 7 shows the importance of different factors for FCH₄ at each site according to the XGBoost model. Across all seven sites, the most important factors were SWC, NDVI, RECO, and TS, each with an average gain value above 0.10. The gain of SWC exceeded 0.76 at two sites (US-HRA and US-HRC), where it contributed greatly to the prediction of FCH₄. LW, NEE, and PA were useful at some, but not all, sites.

4. Discussion

4.1. Analysis of Influencing Factors of Evapotranspiration in Rice Paddies

Evapotranspiration (ET) plays a crucial role in water resource management in rice paddies. With the growth of data, computing power, and storage capacity, the utilization of machine learning models for predicting ET has gradually increased [27,28]. Agrawal et al. [29] found that the XGBoost model significantly enhances the performance in predicting Penman–Monteith reference evapotranspiration (ET₀) compared with the RF model. Ge et al. [30] used the same models, XGBoost and RF, as this study did, based on three years of experimental data to predict crop evapotranspiration. The results revealed that XGBoost outperformed RF in predicting crop evapotranspiration, consistent with the findings of this study regarding XGBoost’s prediction of rice evapotranspiration. In addition, Ge et al. revealed the order of importance of input factors obtained from the XGBoost model. Similarly, this study used the XGBoost model to rank the importance of the input factors in predicting rice ET.

Previously, Liu et al. [31] indicated through multiple stepwise regression analyses that radiation was the dominant factor for rice ET. Ahmadi et al. [32] found that solar radiation was the most critical factor in California and all its climatic zones. This study used eight factors (TA, SW, VPD, PA, WS, NDVI, SWC, and DeltaTA) as inputs for rice ET modeling, except at the US-Twt site, where SWC was unavailable. According to the gain values obtained from the XGBoost model, radiation had the most significant effect on ET. The main reason radiation plays such a large role in predicting evapotranspiration, here and in these studies, is that it provides the energy required for rice evapotranspiration [33].

In addition, Zhang et al. [34] also found that radiation had the strongest influence, with VPD and TA as the second and third most important factors. VPD was clearly more important than TA at the JP-Mse and US-Twt sites, consistent with the previously mentioned studies, whereas at the IT-Cas site TA had higher feature importance than VPD. The former may be due to the influence of rough irrigation and water supply on rice during growth. A higher VPD usually occurs when the leaf surface is humid and the ambient temperature is high. In this case, the rice may need to release water more quickly to adapt to environmental conditions, resulting in larger evapotranspiration [35]. The latter, the greater sensitivity of ET to temperature than to VPD at the IT-Cas site, may be due to the lower temperature at this site, which results in a relatively small VPD, while the large variation in temperature gives TA a greater effect on rice evapotranspiration.

Apart from SW, TA, and VPD, the other factors were less important in predicting evapotranspiration in rice paddies. In some water-scarce areas, soil moisture is the dominant factor of evapotranspiration [36]. Rice paddies, however, are irrigated to maintain a standing water layer for long periods during the growing season, so soil moisture had little importance in this study.

4.2. Analysis of Influencing Factors of Net Ecosystem Carbon Exchange in Rice Paddies

The net ecosystem carbon exchange (NEE) is a critical parameter for quantifying the rice ecosystem and its contributions to climate change. Liu et al. [37] indicated that both the XGBoost and RF models were applicable machine learning algorithms for predicting ecosystem NEE. However, the XGBoost model had higher computational efficiency than the RF model. Among the environmental input factors, NETRAD, SWC, and TS were the most important, while precipitation and WS were less important in predicting NEE. This differs from the findings of the present study.

The feature importance at the seven sites is shown in Figure 5, and it can be seen that the ranking of input factor importance differs among sites. However, both NETRAD and NDVI were in the top three at all sites (except JP-Mse, where net radiation data were absent). This shows that NETRAD and NDVI were important factors in predicting NEE in rice paddies. Safa et al. [23] indicated through sensitivity analysis that the most effective inputs for NEE at irrigated sites were NETRAD and LAI, similar to our findings. The reason is that NETRAD and NDVI reflect the input of solar radiation energy and the degree of vegetation coverage, respectively.

Zhou et al. [38] indicated that SWC was one of the most important factors based on the RF model. However, in this study, SWC only performed better at the US-HRC site, ranking first. Five sites (no data available at the other site) showed a relatively moderate effect of SWC on the NEE at the half-hour time scale, which is similar to Xue et al. [39].

Temperature affects both photosynthesis and respiration, thus affecting NEE. Rice sensitivity to temperature varies in different regions. Too high or too low a temperature can affect the growth of rice. Among them, IT-Cas, JP-Mse, US-HRA, and US-HRC belong to humid subtropical climates, while KR-CRK, PH-RiF, and US-Twt have humid continental, tropical monsoon, and Mediterranean climates, respectively. Rice originates in tropical or subtropical regions, so in humid subtropical areas, the climate is warm and humid, which is very conducive to the growth of rice. However, in these humid subtropical regions, only the IT-Cas site was sensitive to temperature when predicting NEE, with a gain value of 0.3916. This may be due to the fact that the temperature at the IT-Cas site varied more than at other sites, ranging from −14.05 °C to 34.41 °C, resulting in NEE’s increased sensitivity to temperature.

Differences in geographical location and climatic conditions lead to different effects of LW on NEE in different regions. In this study, LW had the greatest impact when modeling JP-Mse and KR-CRK and a smaller impact at the remaining sites (except for two sites missing LW data). These two sites are located in East Asia. The range of LW at these two sites was 299.71–509.32 W m⁻² and 196.94–517.96 W m⁻², respectively. Compared with other sites, their LW variation span was larger. It is possible that this variation affected the growth of rice, and thereby its respiration and photosynthesis, which made the prediction of NEE more sensitive to LW.

4.3. Analysis of Influencing Factors of Methane Flux in Rice Paddies

Accurate prediction of methane flux (FCH₄) for rice paddies is crucial for understanding the greenhouse gas budget of rice paddy ecosystems and achieving environmental sustainability. Wu et al. [40] identified the XGBoost model as the most suitable model with outstanding efficiency and accuracy for predicting FCH₄. However, Wu et al. have not studied the environmental factors influencing FCH₄ emissions. Understanding the causes of FCH₄ emissions contributes to more sustainable agricultural development. This study investigated the ability of the XGBoost and RF models to predict FCH₄ and its causes.

The importance of the input features for methane prediction at each site is shown in Figure 7. The dominant factors of methane prediction also varied with the location of the site. RECO performed best at the IT-Cas and JP-Mse sites, while at the KR-CRK and US-Twt sites, NDVI had the greatest impact on the model. At the remaining three sites, SWC obtained the largest gain value.

Morin et al. [41] found that RECO was a powerful factor that is necessary to explain methane emissions. The importance of RECO in paddy field CH₄ prediction was also well reflected in this study. RECO played the most important role at the IT-Cas and JP-Mse sites. Although the gain value obtained by RECO was no more than 0.1 at some sites, it still ranked among the top three there. This good performance is attributed to the fact that RECO may represent the final result of complex nonlinear processes that also affect CH₄ exchange and are the direct drivers of CH₄ production and methane transport in paddy fields.

Shi et al. [42] indicated that vegetation indexes could be used as input factors for determining CH₄ flux in rice paddies. NDVI is one indicator of vegetation change and performed best when predicting CH₄ at the KR-CRK and US-Twt sites. In addition, it also performed well at the IT-Cas and PH-RiF sites, with a gain value higher than those of most other variables. These results support the potential of NDVI for FCH₄ prediction.

Ge et al. [43] indicated through partial F tests that TS and SWC were the most important factors controlling CH₄ emissions from rice paddies on seasonal timescales. Knox et al. [44] showed that the average annual soil temperature was the strongest predictor of annual CH₄ flux across wetland sites globally. This study found that TS and SWC were also major factors in CH₄ prediction at the half-hourly scale. For instance, TS ranked second in importance at two sites and third at one site; SWC ranked first at three sites and within the top four at the rest. The main reason is that TS and SWC are important regulators of soil reduction conditions and enzymatic processes. Higher soil temperatures can enhance methane production, molecular diffusion, and transport within plants [45,46]. SWC regulates the balance between CH₄ production and oxidation by influencing the depth of anaerobic and aerobic zones in the soil [47].

5. Conclusions

In this study, two tree-based machine learning models (the XGBoost and RF models) were used to model the water carbon fluxes (ET, NEE, and FCH₄) of seven rice paddies, and the importance of the input factors for these fluxes was analyzed.

The statistical indicators showed that both the XGBoost and RF models were suitable for predicting rice paddy ET, NEE, and FCH₄, and that the XGBoost model outperformed the RF model at most sites. Thus, using the XGBoost model to predict water carbon fluxes can achieve highly accurate results while reducing resource consumption, such as the installation and calibration of measuring instruments. This is particularly significant for the sustainable development of rice paddy agriculture.

Similarly, understanding the causes of water carbon fluxes also contributes to sustainable development by allowing us to focus on the most important factors and save time and resources. When predicting ET, SW was the most important factor at all sites. NETRAD and NDVI were the most important factors in predicting NEE, while TA, LW, and SWC were important at individual sites. FCH₄ exhibited varying sensitivity to input factors across different sites, with SWC, TS, RECO, and NDVI emerging as the most critical factors influencing its emissions.

For future research, we suggest further investigating the performance of XGBoost in predicting water carbon fluxes in different wetland types. We also suggest combining XGBoost with optimization algorithms, such as binary particle swarm optimization, to iteratively search for the optimal parameter combination of the XGBoost model and to study the optimal input combination for water carbon fluxes.

Author Contributions

Software, L.W.; Writing—original draft, X.G.; Writing—review & editing, L.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Science and the Natural Science Foundation of Jiangxi Province of China (20192ACBL20041 and 20212BDH80016), and the Key Project of Water Resources Department of Jiangxi Province of China (202124ZDKT14).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is available for download at https://fluxnet.org/data/fluxnet-ch4-community-product/.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bormann, H. Sensitivity analysis of 18 different potential evapotranspiration models to observed climatic change at German climate stations. Clim. Chang. 2011, 104, 729–753.
2. Timm, A.U.; Roberti, D.R.; Streck, N.A.; Gustavo, G.; de Gonçalves, L.; Acevedo, O.C.; Moraes, O.L.; Moreira, V.S.; Degrazia, G.A.; Ferlan, M.; et al. Energy partitioning and evapotranspiration over a rice paddy in Southern Brazil. J. Hydrometeorol. 2014, 15, 1975–1988.
3. Masseroni, D.; Facchi, A.; Romani, M.; Chiaradia, E.A.; Gharsallah, O.; Gandolfi, C. Surface energy flux measurements in a flooded and an aerobic rice field using a single eddy-covariance system. Paddy Water Environ. 2015, 13, 405–424.
4. Bhattacharyya, P.; Neogi, S.; Roy, K.S.; Dash, P.K.; Tripathi, R.; Rao, K.S. Net ecosystem CO₂ exchange and carbon cycling in tropical lowland flooded rice ecosystem. Nutr. Cycl. Agroecosys. 2013, 95, 133–144.
5. Schmitt, M.; Bahn, M.; Wohlfahrt, G.; Tappeiner, U.; Cernusca, A. Land use affects the net ecosystem CO₂ exchange and its components in mountain grasslands. Biogeosciences 2010, 7, 2297–2309.
6. Forster, P.; Ramaswamy, V.; Artaxo, P.; Berntsen, T.; Betts, R.; Fahey, D.W.; Haywood, J.; Lean, J.; Lowe, D.C.; Myhre, G.; et al. Changes in atmospheric constituents and in radiative forcing. In Proceedings of the Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the 4th Assessment Report of the Intergovernmental Panel on Climate Change, Oberpfaffenhofen, Germany, 22–26 October 2007; Available online: https://elib.dlr.de/51416/ (accessed on 29 October 2007).
7. Jacobson, M.Z. Atmospheric Pollution: History, Science, and Regulation; Cambridge University Press: Cambridge, UK, 2002.
8. Hatala, J.A.; Detto, M.; Sonnentag, O.; Deverel, S.J.; Verfaillie, J.; Baldocchi, D.D. Greenhouse gas (CO₂, CH₄, H₂O) fluxes from drained and flooded agricultural peatlands in the Sacramento-San Joaquin Delta. Agric. Ecosyst. Environ. 2012, 150, 1–18.
9. Alberto, M.C.R.; Wassmann, R.; Buresh, R.J.; Quilty, J.R.; Correa, T.Q., Jr.; Sandro, J.M.; Centeno, C.A.R. Measuring methane flux from irrigated rice fields by eddy covariance method using open-path gas analyzer. Field Crops Res. 2014, 160, 12–21.
10. Knox, S.H.; Sturtevant, C.; Matthes, J.H.; Koteen, L.; Verfaillie, J.; Baldocchi, D. Agricultural peatland restoration: Effects of land-use change on greenhouse gas (CO₂ and CH₄) fluxes in the Sacramento-San Joaquin Delta. Glob. Change Biol. 2015, 21, 750–765.
11. Aubinet, M.; Vesala, T.; Papale, D. (Eds.) Eddy Covariance: A Practical Guide to Measurement and Data Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012.
12. Baldocchi, D.D.; Hincks, B.B.; Meyers, T.P. Measuring biosphere-atmosphere exchanges of biologically related gases with micrometeorological methods. Ecology 1988, 69, 1331–1340.
13. Rana, G.; Katerji, N. Measurement and estimation of actual evapotranspiration in the field under Mediterranean climate: A review. Eur. J. Agron. 2000, 13, 125–153.
14. Baldocchi, D.D. Assessing the eddy covariance technique for evaluating carbon dioxide exchange rates of ecosystems: Past, present and future. Glob. Change Biol. 2003, 9, 479–492.
15. Baldocchi, D.; Falge, E.; Gu, L.; Olson, R.; Hollinger, D.; Running, S.; Anthoni, P.; Bernhofer, C.; Davis, K.; Evans, R.; et al. FLUXNET: A new tool to study the temporal and spatial variability of ecosystem-scale carbon dioxide, water vapor, and energy flux densities. In Bulletin of the American Meteorological Society; American Meteorological Society: Boston, MA, USA, 2001; Volume 82, pp. 2415–2434.
16. Baldocchi, D. Measuring fluxes of trace gases and energy between ecosystems and the atmosphere–the state and future of the eddy covariance method. Glob. Change Biol. 2014, 20, 3600–3609.
17. Delwiche, K.B.; Knox, S.H.; Malhotra, A.; Fluet-Chouinard, E.; Jackson, R.B. FLUXNET-CH4: A global, multi-ecosystem dataset and analysis of methane seasonality from freshwater wetlands. Earth Syst. Sci. Data 2021, 13, 3607–3689.
18. Yang, F.; White, M.A.; Michaelis, A.R.; Ichii, K.; Hashimoto, H.; Votava, P.; Zhu, A.-X.; Nemani, R.R. Prediction of continental-scale evapotranspiration by combining MODIS and AmeriFlux data through support vector machine. IEEE Trans. Geosci. Remote Sens. 2006, 44, 3452–3461.
19. Yao, Y.; Liang, S.; Li, X.; Chen, J.; Liu, S.; Jia, K.; Zhang, X.; Xiao, Z.; Fisher, J.B.; Mu, Q.; et al. Improving global terrestrial evapotranspiration estimation using support vector machine by integrating three process-based algorithms. Agric. Forest Meteorol. 2017, 242, 55–74.
Ichii, K.; Ueyama, M.; Kondo, M.; Saigusa, N.; Kim, J.; Alberto, M.C.; Ardo, J.; Euskirschen, E.S.; Kang, M.; Hirano, T.; et al. New data-driven estimation of terrestrial CO₂ fluxes in Asia using a standardized database of eddy covariance measurements, remote sensing data, and support vector regression. J. Geophys. Res. Biogeosci. 2017, 122, 767–795. [Google Scholar] [CrossRef]
Cui, X.; Goff, T.; Cui, S.; Menefee, D.; Wu, Q.; Rajan, N.; Nair, S.; Phillips, N.; Walker, F. Predicting carbon and water vapor fluxes using machine learning and novel feature ranking algorithms. Sci. Total Environ. 2021, 775, 145130. [Google Scholar] [CrossRef]
Wang, X.; Yao, Y.; Zhao, S.; Jia, K.; Zhang, X.; Zhang, Y.; Zhang, L.; Xu, J.; Chen, X. MODIS-based estimation of terrestrial latent heat flux over North America using three machine learning algorithms. Remote Sens. 2017, 9, 1326. [Google Scholar] [CrossRef] [Green Version]
Safa, B.; Arkebauer, T.J.; Zhu, Q.; Suyker, A.; Irmak, S. Net Ecosystem Exchange (NEE) simulation in maize using artificial neural networks. IFAC J. Syst. Control 2019, 7, 100036. [Google Scholar] [CrossRef]
Abbasi, T.; Luithui, C.; Abbasi, S.A. A model to forecast methane emissions from topical and subtropical reservoirs on the basis of artificial neural networks. Water 2020, 12, 145. [Google Scholar] [CrossRef] [Green Version]
Chen, T.; Guestrin, C. Xgboost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef] [Green Version]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
Cui, Y.; Jia, L.; Fan, W. Estimation of actual evapotranspiration and its components in an irrigated area by integrating the Shuttleworth-Wallace and surface temperature-vegetation index schemes using the particle swarm optimization algorithm. Agric. Forest Meteorol. 2021, 307, 108488. [Google Scholar] [CrossRef]
Filgueiras, R.; Almeida, T.S.; Mantovani, E.C.; Dias, S.H.B.; Fernandes-Filho, E.I.; da Cunha, F.F.; Venancio, L.P. Soil water content and actual evapotranspiration predictions using regression algorithms and remote sensing data. Agric. Water Manag. 2020, 241, 106346. [Google Scholar] [CrossRef]
Agrawal, Y.; Kumar, M.; Ananthakrishnan, S.; Kumarapuram, G. Evapotranspiration modeling using different tree based ensembled machine learning algorithm. Water Resour. Manag. 2022, 36, 1025–1042. [Google Scholar] [CrossRef]
Ge, J.; Zhao, L.; Yu, Z.; Liu, H.; Zhang, L.; Gong, X.; Sun, H. Prediction of greenhouse tomato crop evapotranspiration using XGBoost machine learning model. Plants 2022, 11, 1923. [Google Scholar] [CrossRef]
Liu, X.; Xu, J.; Yang, S.; Zhang, J. Rice evapotranspiration at the field and canopy scales under water-saving irrigation. Meteorolo. Atmos. Phys. 2018, 130, 227–240. [Google Scholar] [CrossRef]
Ahmadi, A.; Daccache, A.; Snyder, R.L.; Suvočarev, K. Meteorological driving forces of reference evapotranspiration and their trends in California. Sci. Total Environ. 2022, 849, 157823. [Google Scholar] [CrossRef]
Aladenola, O.O.; Madramootoo, C.A. Evaluation of solar radiation estimation methods for reference evapotranspiration estimation in Canada. Theor. Appl. Climatol. 2014, 118, 377–385. [Google Scholar] [CrossRef]
Zhang, B.; Xu, D.; Liu, Y.; Li, F.; Cai, J.; Du, L. Multi-scale evapotranspiration of summer maize and the controlling meteorological factors in north China. Agric. For. Meteorol. 2016, 216, 1–12. [Google Scholar] [CrossRef]
Xu, M.; An, T.; Zheng, Z.; Zhang, T.; Zhang, Y.; Yu, G. Variability in evapotranspiration shifts from meteorological to biological control under wet versus drought conditions in an alpine meadow. J. Plant Ecol. 2022, 15, 921–932. [Google Scholar] [CrossRef]
Qiu, J.; Crow, W.T.; Nearing, G.S. The impact of vertical measurement depth on the information content of soil moisture for latent heat flux estimation. J. Hydrometeorol. 2016, 17, 2419–2430. [Google Scholar] [CrossRef]
Liu, J.; Zuo, Y.; Wang, N.; Yuan, F.; Zhu, X.; Zhang, L.; Zhang, J.; Sun, Y.; Guo, Z.; Guo, Y.; et al. Comparative analysis of two machine learning algorithms in predicting site-level net ecosystem exchange in major biomes. Remote Sens. 2021, 13, 2242. [Google Scholar] [CrossRef]
Zhou, Q.; Fellows, A.; Flerchinger, G.N.; Flores, A.N. Examining interactions between and among predictors of net ecosystem exchange: A machine learning approach in a semi-arid landscape. Sci. Rep. 2019, 9, 2222. [Google Scholar] [CrossRef] [Green Version]
Xue, Y.; Chen, Y.; Hu, Y.; Chen, H. Fuzzy Rough Set algorithm with Binary Shuffled Frog-Leaping (BSFL-FRSA): An innovative approach for identifying main drivers of carbon exchange in temperate deciduous forests. Ecol. Indic. 2017, 83, 41–52. [Google Scholar] [CrossRef]
Wu, Q.; Wang, J.; He, Y.; Liu, Y.; Jiang, Q. Quantitative assessment and mitigation strategies of greenhouse gas emissions from rice fields in China: A data-driven approach based on machine learning and statistical modeling. Comput. Electron. Agric. 2023, 210, 107929. [Google Scholar] [CrossRef]
Morin, T.H.; Bohrer, G.; Frasson, R.D.M.; Naor-Azreli, L.; Mesi, S.; Stefanik, K.C.; Schäfer, K.V.R. Environmental drivers of methane fluxes from an urban temperate wetland park. J. Geophys. Res. Biogeosci. 2014, 119, 2188–2208. [Google Scholar] [CrossRef] [Green Version]
Shi, Y.; Lou, Y.; Zhang, Z.; Ma, L.; Ojara, M.A. Estimation of methane emissions based on crop yield and remote sensing data in a paddy field. Greenh. Gases Sci. Technol. 2020, 10, 196–207. [Google Scholar] [CrossRef]
Ge, H.X.; Zhang, H.S.; Zhang, H.; Cai, X.H.; Song, Y.; Kang, L. The characteristics of methane flux from an irrigated rice farm in East China measured using the eddy covariance method. Agric. Forest Meteorol. 2018, 249, 228–238. [Google Scholar] [CrossRef]
Knox, S.H.; Jackson, R.B.; Poulter, B.; McNicol, G.; Fluet-Chouinard, E.; Zhang, Z.; Hugelius, G.; Bousquet, P.; Canadell, J.G.; Saunois, M.; et al. FLUXNET-CH 4 synthesis activity: Objectives, observations, and future directions. B Am. Meteorol. Soc. 2019, 100, 2607–2632. [Google Scholar] [CrossRef] [Green Version]
Kim, J.; Verma, S.B.; Billesbach, D.P.; Clement, R.J. Diel variation in methane emission from a midlatitude prairie wetland: Significance of convective throughflow in Phragmites australis. J. Geophys. Res. Atmos. 1998, 103, 28029–28039. [Google Scholar] [CrossRef]
Chanton, J.P. The effect of gas transport on the isotope signature of methane in wetlands. Org. Geochem. 2005, 36, 753–768. [Google Scholar] [CrossRef]
Ma, L.; Liu, B.; Cui, Y.; Shi, Y. Variations and drivers of methane fluxes from double-cropping paddy fields in Southern China at diurnal, seasonal and inter-seasonal timescales. Water 2021, 13, 2171. [Google Scholar] [CrossRef]

Figure 1. Location of eddy covariance sites, with sites colored by wetland type. Cyan represents the seven rice paddy sites studied.

Figure 2. Scatter plots of the observed and predicted ET at seven rice paddy sites using the XGBoost and RF models. Red points represent the observed and predicted ET using the XGBoost model. Blue points represent the observed and predicted ET using the RF model.

Figure 3. Feature importance plots for the input factors of the XGBoost model for predicting ET. Colors identify the input factors. Gain measures the improvement in the model's predictive power contributed by the splits on a factor; cover refers to the number of samples covered by the nodes split on that factor; frequency represents how often a factor is used to split nodes across the trees. Input factors are air temperature (TA), incoming shortwave radiation (SW_IN), vapor pressure deficit (VPD), atmospheric pressure (PA), wind speed (WS), soil water content (SWC), normalized difference vegetation index (NDVI), and the temperature difference between the previous day and the current day (DeltaTA).
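
The three importance measures named in the caption can be illustrated with a small, self-contained sketch. It assumes the convention commonly used in XGBoost importance tables, in which each measure is aggregated over all splits that use a factor and then normalized so that the values sum to one. The split records and the helper name `importance_table` are hypothetical and chosen for illustration only; they are not values or code from this study.

```python
from collections import defaultdict

# Hypothetical split records from a fitted tree ensemble:
# (factor used by the split, loss reduction of the split, samples reaching the node).
# Illustrative numbers only; they are not taken from the study.
SPLITS = [
    ("SW_IN", 8.0, 100),
    ("SW_IN", 4.0, 60),
    ("VPD", 3.0, 40),
    ("TA", 1.0, 50),
]


def importance_table(splits):
    """Return {factor: {"gain", "cover", "frequency"}}, each normalized to sum to 1."""
    gain = defaultdict(float)
    cover = defaultdict(float)
    count = defaultdict(int)
    for factor, split_gain, split_cover in splits:
        gain[factor] += split_gain    # total loss reduction from splits on this factor
        cover[factor] += split_cover  # total samples passing through those splits
        count[factor] += 1            # number of splits that use this factor
    total_gain = sum(gain.values())
    total_cover = sum(cover.values())
    total_count = sum(count.values())
    return {
        f: {
            "gain": gain[f] / total_gain,
            "cover": cover[f] / total_cover,
            "frequency": count[f] / total_count,
        }
        for f in gain
    }


if __name__ == "__main__":
    for factor, scores in importance_table(SPLITS).items():
        print(factor, {k: round(v, 3) for k, v in scores.items()})
```

In this toy example SW_IN ranks first on all three measures, but the measures need not agree in general: a factor used in many low-gain splits can have a high frequency and a low gain.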

Figure 4. Scatter plots of the observed and predicted NEE at seven rice paddy sites using the XGBoost and RF models. Red points represent the observed and predicted NEE using the XGBoost model. Blue points represent the observed and predicted NEE using the RF model.

Figure 5. Feature importance plots for the input factors of the XGBoost model for predicting NEE. Colors identify the input factors. Gain measures the improvement in the model's predictive power contributed by the splits on a factor; cover refers to the number of samples covered by the nodes split on that factor; frequency represents how often a factor is used to split nodes across the trees. Input factors are air temperature (TA), incoming shortwave radiation (SW_IN), outgoing longwave radiation (LW_OUT), vapor pressure deficit (VPD), atmospheric pressure (PA), wind speed (WS), wind direction (WD), soil water content (SWC), friction velocity (USTAR), net radiation (NETRAD), soil temperature (TS), normalized difference vegetation index (NDVI), and the temperature difference between the previous day and the current day (DeltaTA).

Figure 6. Scatter plots of the observed and predicted FCH₄ at seven rice paddy sites using the XGBoost and RF models. Red points represent the observed and predicted FCH₄ using the XGBoost model. Blue points represent the observed and predicted FCH₄ using the RF model.

Figure 7. Feature importance plots for the input factors of the XGBoost model for predicting FCH₄. Colors identify the input factors. Gain measures the improvement in the model's predictive power contributed by the splits on a factor; cover refers to the number of samples covered by the nodes split on that factor; frequency represents how often a factor is used to split nodes across the trees. Input factors are air temperature (TA), incoming shortwave radiation (SW_IN), outgoing longwave radiation (LW_OUT), vapor pressure deficit (VPD), atmospheric pressure (PA), wind speed (WS), wind direction (WD), soil water content (SWC), friction velocity (USTAR), net radiation (NETRAD), ecosystem respiration (RECO), sensible heat turbulent flux (H), gross primary productivity (GPP), soil heat flux (G), soil temperature (TS), normalized difference vegetation index (NDVI), latent heat turbulent flux (LE), net ecosystem carbon exchange (NEE), and the temperature difference between the previous day and the current day (DeltaTA).

Table 1. Description of seven rice paddy sites.

Sites	Latitude (°)	Longitude (°)	Start Year	End Year	Mean ET (mm hr⁻¹)	Mean NEE (gC m⁻² d⁻¹)	Mean FCH₄ (nmol m⁻² s⁻¹)
IT-Cas	45.07	8.72	2009	2010	0.06	−10.50	94.66
JP-Mse	36.05	140.03	2012	2012	0.11	−15.39	61.17
KR-CRK	38.20	127.25	2015	2018	0.06	−7.72	119.32
PH-RiF	14.14	121.27	2012	2014	0.08	−12.49	40.88
US-HRA	34.59	−91.75	2017	2017	0.07	−32.60	63.80
US-HRC	34.59	−91.75	2017	2017	0.09	−28.57	106.62
US-Twt	38.11	−121.65	2009	2017	0.10	−8.60	43.57

Table 2. The mean values and number of samples of the input factors for ET prediction at seven rice paddy sites.

Factors/Sites	IT-Cas	JP-Mse	KR-CRK	PH-RiF	US-HRA	US-HRC	US-Twt
TA	15.9	21.1	13.3	27.0	24.0	23.9	18.9
SW_IN	265.0	342.2	282.7	219.5	211.2	283.5	358.4
VPD	14.0	8.4	6.2	9.7	5.4	7.6	12.3
PA	100.4	101.0	99.2	100.5	100.9	100.8	101.3
WS	1.1	2.4	2.2	1.9	1.5	2.0	4.3
NDVI	0.5	0.5	0.4	0.6	0.7	0.6	0.5
SWC	58.9	42.5	38.7	63.5	42.3	47.7
DeltaTA	−0.5	−0.6	−0.4	−0.1	0.1	0.0	−0.5
Number	15,497	4952	31,067	21,010	2149	2144	66,455

Note: Blank cells indicate missing data. Input factors are air temperature (TA, °C), incoming shortwave radiation (SW_IN, W m⁻²), vapor pressure deficit (VPD, hPa), atmospheric pressure (PA, kPa), wind speed (WS, m s⁻¹), soil water content (SWC, %), normalized difference vegetation index (NDVI), and the temperature difference between the previous day and the current day (DeltaTA, °C).

Table 3. The mean values and number of samples of the input factors for NEE prediction at seven rice paddy sites.

Factors/Sites	IT-Cas	JP-Mse	KR-CRK	PH-RiF	US-HRA	US-HRC	US-Twt
TA	13.5	21.2	14.1	27.3	26.5	26.6	18.9
SW_IN	221.1	364.1	239.5	267.7	353.7	370.4	348.9
LW_OUT		431.6	391.6	468.8	464.1	463.5
VPD	12.9	8.5	5.7	10.7	7.7	9.7	12.4
PA	100.3	101.0	99.2	100.6	100.8	100.8	101.3
WS	1.1	2.4	2.2	2.1	2.5	2.5	4.4
WD	173.6	167.9	212.4	165.8	179.3	190.4	268.2
SWC	60.4	42.6	40.7	62.1	41.7	47.8	0.4
USTAR	0.1	0.2	0.2	0.2	0.2	0.2
NETRAD	125.8		141.9	188.6	232.0	249.3	187.1
TS	12.9	19.5	14.3	28.5			18.6
NDVI	0.5	0.5	0.4	0.6	0.6	0.6	0.6
DeltaTA	−0.4	−0.6	−0.3	−0.2	−0.3	−0.4	−0.4
Number	10,917	4207	23,594	13,844	1562	1813	55,854

Note: Blank cells indicate missing data. Input factors are air temperature (TA, °C), incoming shortwave radiation (SW_IN, W m⁻²), outgoing longwave radiation (LW_OUT, W m⁻²), vapor pressure deficit (VPD, hPa), atmospheric pressure (PA, kPa), wind speed (WS, m s⁻¹), wind direction (WD), soil water content (SWC, %), friction velocity (USTAR, m s⁻¹), net radiation (NETRAD, W m⁻²), soil temperature (TS, °C), normalized difference vegetation index (NDVI), and the temperature difference between the previous day and the current day (DeltaTA, °C).

Table 4. The mean values and number of samples of the input factors for FCH₄ prediction at seven rice paddy sites.

Factors/Sites	IT-Cas	JP-Mse	KR-CRK	PH-RiF	US-HRA	US-HRC	US-Twt
TA	18.7	22.0	15.7	27.4	26.8	26.6	19.1
SW_IN	314.9	380.8	313.0	273.1	373.1	375.5	356.1
LW_OUT		436.1	401.8	469.8	465.8	464.0
VPD	15.7	8.9	7.4	11.1	8.1	10.0	12.7
PA	100.5	101.0	99.1	100.6	100.8	100.8	101.3
WS	1.2	2.3	2.4	2.1	2.5	2.5	4.3
WD	184.9	163.3	216.5	164.0	180.7	191.2	268.6
SWC	59.1	43.4	40.0	62.3	42.1	47.7
USTAR	0.2	0.2	0.2	0.2	0.2	0.2	0.4
NETRAD	192.8		195.4	195.1	246.6	253.6	191.4
RECO	4.3	3.1	3.7	4.0	5.4	5.2	4.4
H	25.8	20.2	23.6	23.0			9.2
GPP	9.1	7.9	6.6	7.5	14.5	13.3	7.1
G	−21.7		4.9	10.9	20.6	23.7	6.7
TS	17.7	20.3	14.8	28.6			18.7
NDVI	0.6	0.5	0.4	0.6	0.6	0.6	0.6
LE	101.5	163.4	101.7	120.8	154.9	148.8	148.4
NEE	−4.5	−4.7	−2.6	−3.4	−9.0	−7.8	−2.5
DeltaTA	−0.6	−0.6	−0.4	−0.2	−0.3	−0.4	−0.4
Number	8258	3594	18,366	9720	1326	1634	49,401

Note: Blank cells indicate missing data. Input factors are air temperature (TA, °C), incoming shortwave radiation (SW_IN, W m⁻²), outgoing longwave radiation (LW_OUT, W m⁻²), vapor pressure deficit (VPD, hPa), atmospheric pressure (PA, kPa), wind speed (WS, m s⁻¹), wind direction (WD), soil water content (SWC, %), friction velocity (USTAR, m s⁻¹), net radiation (NETRAD, W m⁻²), ecosystem respiration (RECO, µmolCO₂ m⁻² s⁻¹), sensible heat turbulent flux (H, W m⁻²), gross primary productivity (GPP, µmolCO₂ m⁻² s⁻¹), soil heat flux (G, W m⁻²), soil temperature (TS, °C), normalized difference vegetation index (NDVI), latent heat turbulent flux (LE, W m⁻²), net ecosystem carbon exchange (NEE, µmolCO₂ m⁻² s⁻¹), and the temperature difference between the previous day and the current day (DeltaTA, °C).

Table 5. Statistical performance of the two machine learning models for ET prediction at seven rice paddy sites.

Models	RMSE (mm hr⁻¹)	R²	MAE (mm hr⁻¹)	MBE (mm hr⁻¹)	GRI
IT-Cas
RF	0.0207	0.9240	0.0121	0.0002	3.0000
XGBoost	0.0187	0.9377	0.0110	0.0000	−1.0000
JP-Mse
RF	0.0262	0.9128	0.0163	0.0002	3.0000
XGBoost	0.0255	0.9175	0.0157	0.0000	−1.0000
KR-CRK
RF	0.0250	0.8816	0.0147	0.0004	3.0000
XGBoost	0.0248	0.8841	0.0145	0.0000	−1.0000
PH-RiF
RF	0.0268	0.8906	0.0171	0.0006	3.0000
XGBoost	0.0243	0.9098	0.0156	0.0000	−1.0000
US-HRA
RF	0.0269	0.9025	0.0151	0.0003	3.0000
XGBoost	0.0255	0.9127	0.0142	0.0000	−1.0000
US-HRC
RF	0.0325	0.8583	0.0174	−0.0001	1.5783
XGBoost	0.0314	0.8671	0.0169	−0.0003	0.0000
US-Twt
RF	0.0297	0.8980	0.0198	0.0005	3.0000
XGBoost	0.0246	0.9301	0.0160	0.0001	−1.0000
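
The error statistics reported in Tables 5–7 can be computed for any pair of observed and predicted series with a few lines of code. The sketch below uses the standard textbook definitions of RMSE, R², MAE, and MBE. Two points are assumptions rather than statements from this section: R² is taken as 1 − SS_res/SS_tot, and MBE is taken as the mean of predicted minus observed values. GRI is omitted because its formula is not given here. The function name `regression_metrics` and the sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, R2, MAE and MBE for two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    return {
        "RMSE": math.sqrt(ss_res / n),
        # Assumed definition: 1 - SS_res / SS_tot (undefined for constant observations).
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": sum(abs(e) for e in errors) / n,
        # Assumed sign convention: positive MBE means over-prediction on average.
        "MBE": sum(errors) / n,
    }


if __name__ == "__main__":
    # Hypothetical hourly ET values (mm hr^-1), for illustration only.
    obs = [0.02, 0.05, 0.10, 0.08]
    pred = [0.03, 0.05, 0.09, 0.09]
    for name, value in regression_metrics(obs, pred).items():
        print(f"{name}: {value:.4f}")
```

RMSE and MAE describe the size of the errors, whereas MBE describes their direction, so a model can have an MBE near zero and still show a large RMSE when over- and under-predictions cancel.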

Table 6. Statistical performance of the two machine learning models for NEE prediction at seven rice paddy sites.

Models	RMSE (gC m⁻² d⁻¹)	R²	MAE (gC m⁻² d⁻¹)	MBE (gC m⁻² d⁻¹)	GRI
IT-Cas
RF	9.2419	0.9341	4.6274	0.0623	2.0000
XGBoost	9.1589	0.9355	4.6318	0.0429	0.0000
JP-Mse
RF	6.5770	0.9581	4.2104	−0.1047	−0.3422
XGBoost	6.4059	0.9603	4.1446	−0.0420	−1.0000
KR-CRK
RF	17.4811	0.7548	8.9706	0.0292	0.0000
XGBoost	17.5608	0.7525	9.1083	0.0103	2.0000
PH-RiF
RF	10.7097	0.9096	7.1376	−0.1542	0.4636
XGBoost	9.6703	0.9254	6.6271	−0.0326	−1.0000
US-HRA
RF	17.6100	0.9173	8.8268	0.3322	3.0000
XGBoost	16.7583	0.9250	8.7333	0.2344	−1.0000
US-HRC
RF	24.3017	0.8196	8.3080	0.0480	1.0000
XGBoost	23.4708	0.8286	8.8489	−0.1704	1.0000
US-Twt
RF	7.7431	0.9579	4.9390	−0.0023	−1.0596
XGBoost	6.8748	0.9668	4.4939	−0.0012	−3.0596
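
As a quick arithmetic check, the relative RMSE reduction of XGBoost over RF can be recomputed from the values in Table 6. The sketch below copies the two RMSE values for each site from the table; the helper names `rmse_reduction_percent` and `reductions` are introduced here for illustration. Under this calculation, the smallest and largest reductions among the six sites where XGBoost has the lower RMSE are consistent with the 0.90–11.21% range quoted in the abstract, and KR-CRK is the only site with a negative value.

```python
# RMSE values (gC m^-2 d^-1) copied from Table 6: site -> (RF, XGBoost).
TABLE6_RMSE = {
    "IT-Cas": (9.2419, 9.1589),
    "JP-Mse": (6.5770, 6.4059),
    "KR-CRK": (17.4811, 17.5608),
    "PH-RiF": (10.7097, 9.6703),
    "US-HRA": (17.6100, 16.7583),
    "US-HRC": (24.3017, 23.4708),
    "US-Twt": (7.7431, 6.8748),
}


def rmse_reduction_percent(rmse_rf, rmse_xgb):
    """Percentage by which XGBoost lowers RMSE relative to RF (negative if higher)."""
    return 100.0 * (rmse_rf - rmse_xgb) / rmse_rf


def reductions(table):
    """Map each site to its relative RMSE reduction in percent."""
    return {site: rmse_reduction_percent(rf, xgb) for site, (rf, xgb) in table.items()}


if __name__ == "__main__":
    for site, pct in reductions(TABLE6_RMSE).items():
        print(f"{site}: {pct:+.2f}%")
```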

Table 7. Statistical performance of the two machine learning models for FCH₄ prediction at seven rice paddy sites.

Models	RMSE (nmol m⁻² s⁻¹)	R²	MAE (nmol m⁻² s⁻¹)	MBE (nmol m⁻² s⁻¹)	GRI
IT-Cas
RF	63.3657	0.7641	29.4863	0.7413	0.0000
XGBoost	63.5845	0.7623	30.3872	−0.4892	2.0000
JP-Mse
RF	36.8322	0.8979	14.2688	0.2430	2.0000
XGBoost	30.1109	0.9293	13.4638	−0.2704	0.0000
KR-CRK
RF	84.6257	0.7677	40.3356	0.9057	3.0000
XGBoost	80.4056	0.7893	37.5336	0.1151	−1.0000
PH-RiF
RF	43.9581	0.6421	18.6736	0.6135	2.0000
XGBoost	42.2099	0.6641	18.9505	−0.0848	0.0000
US-HRA
RF	37.0595	0.8292	15.0479	0.1860	1.0000
XGBoost	35.8628	0.8408	15.0570	−0.5114	1.0000
US-HRC
RF	28.7857	0.9444	13.7692	0.3039	2.0000
XGBoost	28.0140	0.9475	15.4187	0.1520	0.0000
US-Twt
RF	29.5718	0.8293	14.4098	0.2981	2.0000
XGBoost	28.9092	0.8355	14.8292	0.0876	0.0000

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

