Article

Prediction of Ship CO2 Emissions and Fuel Consumption Using Voting-BRL Model

School of Economics and Management, Shanghai Maritime University, Shanghai 201306, China
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(4), 1726; https://doi.org/10.3390/su17041726
Submission received: 3 December 2024 / Revised: 23 January 2025 / Accepted: 14 February 2025 / Published: 19 February 2025

Abstract

The accurate prediction of ship carbon dioxide (CO2) emissions and fuel consumption is critical for enhancing environmental sustainability in the maritime industry. This study introduces a novel ensemble learning approach, the Voting-BRL model, which integrates Bayesian Ridge Regression and Lasso Regression to improve prediction accuracy and robustness. Utilizing four years of real-world data from the THETIS-MRV platform managed by the European Maritime Safety Agency (EMSA), the proposed model first employs Analysis of Variance (ANOVA) for feature selection, effectively reducing dimensionality and mitigating noise interference. The Voting-BRL model then combines the strengths of Bayesian Ridge Regression in handling uncertainty and feature correlations with Lasso Regression’s capability for automatic feature selection through a voting mechanism. Experimental results demonstrate that Voting-BRL achieves an R² of 0.9981 and a Root Mean Square Error (RMSE) of 8.53, outperforming traditional machine learning models such as XGBRegressor, which attains an R² of 0.97 and an RMSE of 45.03. Additionally, ablation studies confirm that the ensemble approach significantly enhances predictive performance by leveraging the complementary strengths of individual models. The Voting-BRL model not only provides superior accuracy but also exhibits enhanced generalization capabilities and stability, making it a reliable tool for predicting ship CO2 emissions and fuel consumption. This advancement contributes to more effective emission management and operational efficiency in the shipping sector, supporting global efforts to reduce greenhouse gas emissions.

1. Introduction

In the context of increasing global warming and heightened environmental awareness, the issues of carbon dioxide (CO2) emissions and fuel consumption in the shipping industry have garnered extensive attention. As one of the major sources of global greenhouse gas emissions, the maritime transportation sector faces escalating emission challenges with the growth of international trade volumes [1,2,3]. According to data from the International Maritime Organization (IMO), the shipping industry accounts for approximately 2.5% of global greenhouse gas emissions, and this proportion is expected to rise further as global trade continues to expand. Therefore, enhancing ship operational efficiency and reducing carbon footprints have become core objectives for industry development.
However, the collection of fuel consumption and emission data during actual shipping processes encounters numerous challenges. Firstly, the data acquisition process may be incomplete or inaccurate, introducing substantial noise into the models. Secondly, shipping data typically exhibit highly nonlinear characteristics, making it difficult for traditional linear regression models to effectively capture these complex interactions, thereby limiting the improvement of predictive performance. Additionally, the variability of the marine environment—such as weather conditions, route choices, and ship loading—further increases the complexity of the data and the difficulty of prediction.
To address these issues and achieve more accurate predictions, many researchers have adopted advanced machine learning methods such as Random Forest [4,5], Support Vector Machines (SVMs) [6,7], and Neural Networks [8,9]. These methods have, to some extent, improved the accuracy and robustness of predictions, but they also present significant limitations. Firstly, these models often require large amounts of high-quality data to ensure their performance when handling high-dimensional features, making them highly dependent on data quantity and quality. Secondly, these models have high computational complexity, especially when dealing with large-scale datasets, as the training and prediction processes consume substantial computational resources. Furthermore, the “black-box” [10,11,12] nature of these methods makes it difficult to interpret the contribution of specific features to the prediction results, limiting their interpretability and transparency in practical applications. Finally, in scenarios with insufficient data or significant noise, these models are prone to overfitting, resulting in a substantial decline in their generalization capabilities.
In contrast, existing linear regression models [13,14] perform poorly when dealing with multi-dimensional and heterogeneous feature data, especially when features of different scales have unbalanced effects on the model. Although feature scaling (e.g., standardization) can alleviate this issue to some extent, the model remains susceptible to interference from irrelevant features, leading to decreased prediction accuracy. Traditional machine learning methods, when directly applied to features, have advantages over deep learning methods in terms of interpretability and performance; however, single models struggle to achieve high-precision results. Additionally, machine learning methods lack the ability of deep learning to filter effective features and suppress low-relevance features through mechanisms like attention, which can lead to training failures and increased prediction errors when handling a large number of features. These issues introduce significant errors and challenges to CO2 prediction.
To address these challenges, this study proposes a Voting Regressor model (Voting-BRL) that combines Bayesian Ridge Regression and Lasso Regression, aiming to enhance the prediction performance of ship CO2 emissions and fuel consumption. Specifically, this study first employs Analysis of Variance (ANOVA) to select features highly correlated with the dependent variable, thereby reducing the dimensionality of the independent variables and effectively filtering out features with low relevance to the prediction task, thus decreasing model complexity and noise interference. Subsequently, an ensemble learning method that combines Bayesian Ridge Regression and Lasso Regression is utilized, leveraging the advantages of Bayesian Ridge Regression in handling uncertainty and feature correlations, and Lasso Regression’s capability for automatic feature selection. Through a voting mechanism, the predictions of the two models are integrated, further enhancing the model’s generalization capability and prediction stability.
The main contributions of this study include the following:
  • Proposed the Voting-BRL (Voting-Bayesian Ridge and Lasso) method: This method combines Bayesian Ridge Regression and Lasso Regression through a voting mechanism to achieve more precise carbon dioxide emission predictions.
  • Conducted detailed ablation experiments: These experiments analyze the impact of different modules on the performance of the Voting-BRL model across multiple datasets, validating the effectiveness of each component of the model.
  • Validated the method using real-world data: Utilizing four years of actual data from the THETIS-MRV platform managed by the European Maritime Safety Agency (EMSA), the experimental results demonstrate that the Voting-BRL model achieves or exceeds an R² of 0.99 in prediction performance, significantly outperforming traditional methods and showcasing its efficiency and reliability in practical applications.

2. Related Work

2.1. Ship Energy Saving and Emission Reduction

The study of energy saving and emission reduction in shipping faces many challenges. The complexity of data acquisition and quality assurance arises from the inconsistency of multi-source heterogeneous data and its inherent noise, which seriously affects the accuracy and reliability of the analysis. Feature selection and the identification and extraction of relevant data also play a key role in model performance and predictive results. Moreover, the highly complex models commonly used in current research place significant demands on computing resources in practical applications. The complexity of shipping data thus requires researchers to develop methods that adapt to changes in uncertainty.
In the study of carbon dioxide emissions prediction, many methods have been applied. Song et al. [1], using data from 2010 to 2018, identified causal relationships between factors influencing the logistics environment and specific modes of transportation through the OLS method. Zincir’s research [15] focused on the potential of ammonia as an alternative fuel: adopting ammonia fuel could effectively reduce CO2 emissions in shipping, but commercialization still faces challenges related to infrastructure and fuel efficiency. Wang et al. [16] explored the main challenges of decarbonizing the shipping industry, including the long-term sustainability of the industry’s response to regulatory and policy changes on emissions. Mocerino et al. [17] provided a detailed review of the mutual impacts between climate change and the shipping industry, which offers significant insights into CO2 emissions prediction. Nguyen et al. [18] reviewed the application of electric propulsion systems in the shipping industry and highlighted their potential to reduce CO2 emissions. Xing et al. [19] proposed several measures to reduce CO2 emissions from ships, including reducing ship speed and optimizing sailing routes. Mersin [20] analyzed multiple existing emission reduction methods, noting that the inconsistency of multi-source data and the accurate identification and extraction of relevant features are major challenges in this field. It is evident that reducing CO2 emissions during shipping has become a significant issue, and predicting the impact of different shipping methods on CO2 emissions is a crucial step in addressing this problem.

2.2. Bayesian Ridge Regression

Bayesian Ridge Regression is a linear regression method that incorporates a probabilistic perspective, adding regularization to prevent overfitting by applying a prior distribution to the regression coefficients. This method is particularly useful when dealing with multicollinearity or when the number of features exceeds the number of samples. In shipping energy-saving and emission reduction studies, Bayesian Ridge Regression has been applied to estimate and predict various factors influencing CO2 emissions, including fuel consumption, ship speed, and cargo load. The use of Bayesian Ridge Regression allows researchers to introduce uncertainty into the model, which is essential when dealing with noisy or incomplete multi-source data in shipping. For instance, Crosby and Rysanek [21] applied Bayesian logistic regression to examine the relationship between occupants’ perceived thermal comfort and indoor CO2 levels in buildings, providing an effective predictive method. Affholder et al. [22] used Bayesian statistical methods to quantify the likelihood that methanogenesis (biomethane production) could explain the escape rates of molecular hydrogen and methane in the plumes of Enceladus. Michimae and Emura [23] employed vine copula to construct a copula-based joint prior distribution, yielding more accurate estimates in cases of multicollinearity. As computing power increases, the application of Bayesian methods is expected to grow, offering more robust and accurate predictions in energy-saving and emission reduction efforts.

2.3. Lasso Regression

Lasso Regression (Least Absolute Shrinkage and Selection Operator) is a popular regression method that performs both feature selection and regularization to enhance the prediction accuracy and interpretability of the model. By imposing a penalty on the absolute values of the coefficients, Lasso forces some of them to be exactly zero, effectively selecting a simpler model by excluding less important features. This makes Lasso particularly well suited for high-dimensional datasets with many predictors, a common scenario in shipping research, where multiple variables such as fuel types, operating speeds, and environmental factors need to be considered. Michalakopoulos et al. [24] conducted a comparative analysis of machine learning algorithms used to predict CO2 emissions in the maritime sector. Zhou et al. [25] proposed an adaptive hyperparameter tuning method combining ANN and Lasso, taking into account the impact of marine environmental factors on fuel consumption. Monisha et al. [26] developed two machine learning models, including Lasso, using actual voyage data collected from noon reports of ships in Bangladesh. Lasso’s application is expected to increase as the volume and complexity of shipping data continue to grow.

3. Voting-BRL

In maritime emission and fuel consumption prediction models, ship CO2 emission data are influenced by a myriad of environmental and operational factors, such as weather conditions, cargo load levels, and route choices. These factors not only complicate the data but also introduce varying degrees of uncertainty in prediction outcomes. The inconsistent collection of emission data further exacerbates the prediction bias inherent in traditional models. Given that the relationship between emissions and fuel consumption is typically complex and nonlinear, traditional linear models, such as ordinary least squares (OLS) regression, are often insufficient to capture the intricate dynamics between variables. This underscores the need for more sophisticated modeling approaches to ensure higher accuracy.
Moreover, the environmental factors—such as wind speed, sea waves, and temperature—directly influence the emission levels. These variables are often neglected in traditional models or are difficult to quantify accurately, further contributing to prediction errors. The interactions between various factors, including ship type, navigation conditions, and fuel type, create a high-dimensional dataset that poses additional challenges for traditional modeling techniques. Therefore, to manage the inherent nonlinearity and complexity, more advanced methods that can effectively handle such high-dimensional data are required.

3.1. Overall Structure

To address data nonlinearity, complexity, and uncertainty in predicting ship emissions, we drew inspiration from Bayes’ theorem. Our approach leverages the probabilistic correlation between known and unknown events to improve the accuracy of predictions. Based on this principle, we developed a Bayesian-based predictive model that utilizes both historical data and the inherent uncertainty in maritime operations to make more informed predictions about ship emissions.
The overall structure of our model is illustrated in Figure 1. In this framework, we first apply an ANOVA (Analysis of Variance) technique for feature selection, then perform regression using a hybrid method that combines Bayesian Ridge Regression and Lasso Regression, ultimately outputting final predictions through an ensemble learning strategy known as Voting Regressor.
In the following sections, we introduce the individual components of the model in detail, explaining the rationale behind each step and its contribution to improving the prediction accuracy of maritime emissions.

3.2. ANOVA Feature Selection

Feature selection plays a pivotal role in the development of predictive models, especially when dealing with high-dimensional data that contain numerous independent variables. In our model, we employ one-way ANOVA analysis to identify the most critical features that have a statistically significant impact on the target variable, which is ship emissions in this context.
Prior to conducting ANOVA, we applied one-hot encoding to all categorical independent variables. While this transformation enables the model to capture essential information from categorical variables, it also substantially increases the dimensionality of the feature space. Too many features can reduce the efficiency of model fitting and introduce overfitting risks, whereas too few features may fail to represent the true complexity of the data. Feature selection is therefore crucial in balancing model complexity and accuracy.
The ANOVA method allows us to test the hypothesis of whether there are significant differences between the means of different groups with respect to the target variable. Specifically, for each independent variable, we test the following hypotheses:
  • Null Hypothesis H0: There is no significant relationship between the feature and the target variable.
  • Alternative Hypothesis H1: There is a significant relationship between the feature and the target variable.
The test statistic for one-way ANOVA, the F-statistic, is calculated as follows:

$$F = \frac{\text{between-group variance}}{\text{within-group variance}} = \frac{\frac{1}{k-1}\sum_{i=1}^{k} n_i \left(\bar{Y}_i - \bar{Y}\right)^2}{\frac{1}{N-k}\sum_{i=1}^{k}\sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_i\right)^2}$$

where k is the number of groups, n_i is the sample size of the i-th group, Ȳ_i is the mean of the i-th group, Ȳ is the overall mean, Y_ij is the j-th observation in the i-th group, and N is the total sample size.
Based on the resulting F-statistic and its corresponding p-value, we select the features whose p-value falls below the significance threshold α = 0.05. These selected features are deemed important for the subsequent regression analyses.
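For readers who wish to reproduce this step, the sketch below uses scikit-learn’s SelectKBest with the f_regression scorer (the library’s univariate F-test for continuous targets, the natural analogue of the ANOVA selection described here); the synthetic data are placeholders, and k = 100 follows the setting reported in Section 4.2 rather than being part of this derivation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in for the one-hot encoded feature matrix.
X, y = make_regression(n_samples=500, n_features=300, n_informative=40,
                       noise=10.0, random_state=0)

# Keep the k features with the largest F-statistics (k = 100, as in Section 4.2).
selector = SelectKBest(score_func=f_regression, k=100)
X_selected = selector.fit_transform(X, y)

# Features whose p-value falls below alpha = 0.05 pass the hypothesis
# test formulated above.
significant = np.where(selector.pvalues_ < 0.05)[0]
print(X_selected.shape)                                  # (500, 100)
print(len(significant), "features significant at alpha = 0.05")
```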

3.3. Ensemble Learning Methods

To improve the predictive performance of our model, we employ an ensemble learning approach by combining Bayesian Ridge Regression with Lasso Regression. This approach harnesses the strengths of both regression techniques: Bayesian Ridge Regression excels in handling uncertainty and correlated features, while Lasso Regression is effective at performing automatic feature selection.
Bayesian Ridge Regression incorporates a probabilistic framework that introduces prior distributions for the model parameters, leading to more robust estimates. The regression model is represented as
$$Y = X\beta + \epsilon$$

where Y is the target variable, X is the matrix of independent variables, β represents the regression coefficients, and ε ∼ N(0, σ²) is the error term. The use of priors allows for quantifying the uncertainty in the parameter estimates, which is especially beneficial in high-dimensional settings.
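As a brief illustration of this uncertainty quantification, scikit-learn’s BayesianRidge can return the standard deviation of the predictive distribution alongside the point estimate; the data below are synthetic placeholders, not the shipping dataset.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import BayesianRidge

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

model = BayesianRidge().fit(X, y)

# return_std=True yields the predictive standard deviation per query point,
# reflecting the uncertainty in the parameter estimates discussed above.
y_mean, y_std = model.predict(X[:5], return_std=True)
for m, s in zip(y_mean, y_std):
    print(f"prediction = {m:.2f} +/- {s:.2f}")
```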
Lasso Regression, on the other hand, is well known for its feature selection capability due to its regularization term, which encourages sparsity in the model coefficients. The objective function of Lasso Regression is
$$\hat{\beta} = \arg\min_{\beta} \left\{ \frac{1}{n}\sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$
where λ is the regularization parameter and p is the number of features. Lasso effectively shrinks irrelevant feature coefficients to zero, which not only improves model interpretability but also prevents overfitting.
By combining these two methods in an ensemble learning framework, we enhance the model’s overall predictive power. We first obtain parameter estimates $\hat{\beta}_{BR}$ from Bayesian Ridge Regression and then pass these estimates into the Lasso Regression model to refine them further. This two-step approach ensures that we benefit from the strengths of both techniques.

3.4. Voting Regressor

The final step in our model involves the use of a Voting Regressor, which is an ensemble method that generates predictions by taking a weighted average of the outputs from multiple base regression models. In our case, we combine the predictions from Bayesian Ridge Regression and Lasso Regression to form a more robust prediction framework. The final prediction Y ^ can be expressed as
$$\hat{Y} = \frac{1}{n}\sum_{i=1}^{n} w_i\,\hat{Y}_i = \frac{1}{2}\left( w_{BR}\,\hat{Y}_{BR} + w_{Lasso}\,\hat{Y}_{Lasso} \right)$$

where $w_{BR}$ and $w_{Lasso}$ are the weights assigned to the predictions from the Bayesian Ridge and Lasso models, respectively, and n = 2 is the number of base models. This method effectively balances the strengths of each model and reduces the potential bias that a single model might introduce.
The overall structure of the Voting Regressor is depicted in Figure 1, illustrating how the predictions from the bayesian_model and the lasso_model are aggregated to produce the final result. This approach improves the stability, accuracy, and robustness of the model by leveraging the complementary strengths of Bayesian Ridge and Lasso Regression.
Algorithm 1 Voting-BRL algorithm framework.
  1: procedure VotingRegressor(X, y)
  2:   X_encoded ← OneHotEncoding(X)                      ▹ One-hot encode the independent variables
  3:   features ← ANOVA_Feature_Selection(X_encoded, y)   ▹ Feature selection
  4:   X_selected ← X_encoded[:, features]                ▹ Select relevant features
  5:   bayesian_model ← BayesianRidge(X_selected, y)      ▹ Train Bayesian Ridge model
  6:   lasso_model ← Lasso(X_selected, y)                 ▹ Train Lasso model
  7:   Y_BR ← Predict(bayesian_model, X_selected)         ▹ Bayesian Ridge prediction
  8:   Y_Lasso ← Predict(lasso_model, X_selected)         ▹ Lasso prediction
  9:   w_BR, w_Lasso ← Calculate_Weights(Y_BR, Y_Lasso)   ▹ Calculate weights
 10:   Ŷ ← ½ (w_BR · Y_BR + w_Lasso · Y_Lasso)            ▹ Weighted average to obtain final prediction
 11:   return Ŷ                                           ▹ Return final prediction results
 12: end procedure
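A minimal Python sketch of Algorithm 1 is given below, assuming the input matrix has already been one-hot encoded. The paper does not fully specify the Calculate_Weights step, so equal weights are assumed here; the hyperparameters follow Table 2.

```python
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import BayesianRidge, Lasso

def voting_brl(X_train, y_train, X_test, w_br=1.0, w_lasso=1.0):
    """Sketch of Algorithm 1; equal weights by default (an assumption,
    since the Calculate_Weights step is not fully specified)."""
    # Steps 3-4: ANOVA-style feature selection (k = 100, as in Table 2).
    selector = SelectKBest(score_func=f_regression, k=100).fit(X_train, y_train)
    X_tr, X_te = selector.transform(X_train), selector.transform(X_test)

    # Steps 5-6: train the two base models with the Table 2 hyperparameters.
    # Note: max_iter requires scikit-learn >= 1.3 (earlier versions used n_iter).
    bayesian_model = BayesianRidge(alpha_1=1e-6, alpha_2=1e-6,
                                   lambda_1=1e-6, lambda_2=1e-6,
                                   max_iter=300, tol=1e-3).fit(X_tr, y_train)
    lasso_model = Lasso(alpha=0.1).fit(X_tr, y_train)

    # Steps 7-10: weighted average of the two base predictions.
    y_br = bayesian_model.predict(X_te)
    y_lasso = lasso_model.predict(X_te)
    return 0.5 * (w_br * y_br + w_lasso * y_lasso)
```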

4. Experiments

4.1. Data Preparation

The data originate from the THETIS-MRV platform managed by the European Maritime Safety Agency (EMSA), focusing on the monitoring, reporting, and verification (MRV) system of ship emissions. This platform provides public ship emission data, helping users view and analyze CO2 emissions from ships within EU waters. The data cover detailed information such as fuel consumption, navigation distance, and CO2 emissions, aiding in promoting transparency and environmental compliance in the shipping industry. Table 1 presents relevant information about these data.
For the original data series, unnecessary rows and irrelevant data are first removed to make the data suitable for model construction. Several feature columns related to the target are selected, and missing values are filled. Numerical data are converted to numeric types and standardized, while categorical data are transformed through one-hot encoding. Finally, the ANOVA method is used for feature selection to prepare the data for model training.
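A hypothetical preprocessing sketch of these steps is shown below; the file name and column names are placeholders for illustration, not the actual THETIS-MRV export schema.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Placeholder file and column names -- not the real THETIS-MRV schema.
df = pd.read_csv("thetis_mrv.csv")
numeric_cols = ["fuel_consumption", "distance", "time_at_sea"]
categorical_cols = ["ship_type"]
target = "co2_emissions"

df = df.dropna(subset=[target])                        # remove rows without a target
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())  # fill missing values

# Standardize numeric features and one-hot encode categorical ones.
df[numeric_cols] = StandardScaler().fit_transform(df[numeric_cols])
df = pd.get_dummies(df, columns=categorical_cols)
```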

4.2. Parameter Settings

Table 2 shows the parameters for Bayesian Ridge, Lasso, and Voting Regressor models.
For Bayesian Ridge Regression, hyperparameters such as alpha and lambda control the strength of regularization. These hyperparameters constrain the model’s complexity, reducing overfitting and enhancing the model’s generalization ability on new data. Specifically, α1 and α2 specify the strength of the prior distribution, while λ1 and λ2 help control the distribution of the weights. Since Bayesian Ridge Regression involves uncertainty estimation, regularization effectively improves the model’s robustness.
In Lasso Regression, the alpha parameter directly affects the strength of regularization. By applying L1 regularization to the regression coefficients, Lasso Regression can achieve feature selection, effectively removing unimportant features. This setup helps enhance the model’s interpretability and improve computational efficiency, especially when the number of features is large.
In Voting Regressor, using multiple base models’ predictions can effectively improve prediction stability and accuracy. Combining different models can compensate for each model’s shortcomings, resulting in more balanced and reliable predictions.
In the feature engineering part, selecting the top 100 features using SelectKBest aims to extract the most influential features on the target variable, further reducing noise interference in the model. Additionally, data standardization ensures all features have a mean of 0 and a variance of 1, eliminating the impact of different units and scales on model training, facilitating faster convergence, and improving prediction accuracy.

4.3. Test Results

4.3.1. Comparative Experiments

To demonstrate the superiority of our method, this experiment compares the regression results for 2020 to 2023 against existing classical machine learning methods using the RMSE and R² metrics. The comparison results are shown in Table 3.
As the comparison shows, the advanced methods exhibit varying performance on this dataset. For instance, GaussianProcessRegressor is entirely unsuitable for these data from 2020 to 2023. This may be partly due to the sparsity or insufficiency of the CO2 emission data; in addition, GPR is extremely sensitive to noise, and existing noise and outliers significantly degrade model performance.
MLPRegressor, SVR, RandomForestRegressor, and LinearSVR can handle nonlinear relationships to some extent, but their ability to capture the complex interactions underlying the data still depends on feature quality, resulting in poor test results. Although LinearSVR attains good R² values on the 2020 and 2021 data, its RMSE values still show significant errors, indicating that the model has learned excessive noise and randomness; such R² values are a symptom of overfitting and lack generalizability.
Among the remaining methods, ExtraTreeRegressor, DecisionTreeRegressor, and XGBRegressor achieve relatively high R² values thanks to their tree structures and nonlinear processing characteristics. However, they still struggle with data features exhibiting strong uncertainty and complexity.
On both the R² and RMSE metrics, Voting-BRL consistently demonstrates superior predictive performance. This is because Voting-BRL combines multiple models, allowing it to learn the diverse features and patterns captured by each, effectively avoiding overfitting.
To visualize the comparison among methods more intuitively, we plot bar charts. However, because the values differ greatly in magnitude, smaller values are difficult to discern; we therefore normalize the values to the range [0, 1], as shown in Figure 2.
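For reference, the two reported metrics can be computed as follows (a generic sketch, not the authors’ evaluation script):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

def evaluate(y_true, y_pred):
    """Return the RMSE and R^2 metrics used throughout Tables 3 and 4."""
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    return rmse, r2_score(y_true, y_pred)
```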

4.3.2. Visualization Analysis

To more accurately analyze the model’s predictive performance, we plot scatter plots to observe the discrepancies between the predicted and actual values as shown in Figure 3.
In this scatter plot, each point represents an observation from the test dataset. The x-axis represents the actual values, and the y-axis represents the corresponding predictions generated by the model. The dashed line indicates the ideal fit line, illustrating that if the prediction perfectly matches the actual values, the points would lie on this line. This result graphically demonstrates the alignment between actual and predicted values.
To observe the distribution of results, we also plot a box plot as shown in Figure 4.
The median line within the box represents the median of negative MSE values, indicating that most scores are concentrated around this value. The upper and lower edges of the box represent the first quartile (Q1) and third quartile (Q3), covering the middle 50% of the data. This span shows that MSE exhibits some degree of variation. The whiskers extend to the non-outlier minimum and maximum values, indicating that there are few extreme error values, and most of the model’s prediction errors are within a reasonable range.
To understand the relationship between predicted values and residuals and to evaluate potential issues with the model, we plot a residual plot as shown in Figure 5.
Most residuals are concentrated around the zero horizontal line, meaning that most predictions are close to the actual values. However, there are larger residuals at low prediction values, indicating significant errors in these predictions. The residual plot shows no obvious non-random patterns, suggesting that the model may not have significant systematic errors. A few residuals are observed in the lower right and left areas, indicating that the model performs poorly on these data points. This suggests that the model may overfit on some data, necessitating the addition of regularization methods for optimization.

5. Discussion

5.1. Ablation Experiments

To verify that the Voting-BRL method indeed combines the advantages of each model and outperforms other models, we conduct ablation experiments by comparing single models with their ensemble as shown in Table 4.
The performance of the Lasso and Bayesian Ridge models is relatively similar, with high R² values indicating strong flexibility and good adaptation to the data structure. However, their RMSE values are still relatively large, indicating significant prediction errors. This suggests that while they have advantages in regularization, they may struggle to capture key patterns in the presence of substantial data noise.
Compared to Lasso and Bayesian Ridge, BRL exhibits stronger fitting capabilities, and its RMSE values are significantly lower. However, in 2023, it still experiences considerable errors, indicating that BRL can still overfit in complex data scenarios.
Voting-BRL combines the predictions of multiple models, meaning it likely utilizes different weight assignments or voting mechanisms to leverage the strengths of each model while reducing the errors associated with single models.

5.2. Significance Tests

To determine which mechanisms in the model optimization process have more significant impacts on the results, we conduct t-tests on RMSE and R². The results are shown in Figure 6 and Figure 7.
In Figure 6, each bar represents the p-value of the t-test between the RMSE of two models. The red dashed line indicates the 0.05 significance level; a p-value below this line is typically considered statistically significant. All p-values are above 0.05, indicating that the RMSE differences between these models are not statistically significant. However, all are below 0.4, suggesting that the modifications yield some improvement in RMSE.
In Figure 7, different significance results are observed. The highest p-value occurs for “Lasso vs. BayesianRidge”, indicating no significant difference. The p-values for “Lasso vs. BRL” and “Lasso vs. Voting-BRL” are close to 0.05, suggesting that BRL improves R² noticeably relative to Lasso and Bayesian Ridge. Similarly, Voting-BRL shows a further improvement over BRL. These results support the effectiveness of the Voting-BRL method.
Thus, combining multiple models significantly enhances results, leveraging each model’s strengths to effectively predict CO2 emissions.
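The pairwise tests can be reproduced along the following lines; the paper does not state whether paired or independent t-tests were used, so an independent two-sample test over the per-year RMSE values from Table 4 is assumed here.

```python
from scipy import stats

# Per-year (2023-2020) RMSE values taken from Table 4.
rmse_lasso = [61.59, 36.945, 7651.643, 18.970]
rmse_voting_brl = [8.529, 8.529, 6.946, 6.259]

# Independent two-sample t-test (an assumption; the paper does not specify).
t_stat, p_value = stats.ttest_ind(rmse_lasso, rmse_voting_brl)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```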

6. Conclusions

In this study, we introduced the Voting-BRL model, an innovative ensemble learning approach that integrates Bayesian Ridge Regression and Lasso Regression, to predict ship carbon dioxide (CO2) emissions and fuel consumption with high accuracy and robustness. By leveraging Analysis of Variance (ANOVA) for feature selection, the model effectively reduced dimensionality and minimized noise interference, enhancing its predictive performance. Experimental results demonstrated that Voting-BRL achieved an outstanding R² of 0.9981 and a Root Mean Square Error (RMSE) of 8.53, markedly outperforming traditional machine learning models such as XGBRegressor, which attained an R² of 0.97 and an RMSE of 45.03. Ablation studies confirmed that the ensemble strategy harnesses the complementary strengths of Bayesian Ridge and Lasso Regression, resulting in superior generalization capabilities and prediction stability.
The exceptional performance of the Voting-BRL model underscores its potential as a reliable tool for emission management and operational optimization within the maritime industry. Accurate predictions of CO2 emissions and fuel consumption are crucial for developing strategies to enhance environmental sustainability and comply with increasingly stringent regulatory standards. By providing precise forecasts, the Voting-BRL model can assist stakeholders in making informed decisions that contribute to reducing the carbon footprint of shipping operations.
Future work may focus on expanding the model to incorporate additional environmental and operational factors, thereby further enhancing its predictive accuracy and applicability. Additionally, integrating real-time data streams could enable dynamic emission monitoring and adaptive decision-making in response to changing maritime conditions. Exploring the application of the Voting-BRL framework to other sectors within the transportation industry may also yield valuable insights and broaden its impact on global efforts to mitigate greenhouse gas emissions.

Author Contributions

Methodology, Y.L.; Writing—original draft, Y.L.; Writing—review & editing, C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BRL     Bayesian Ridge and Lasso
ANOVA   Analysis of Variance
EMSA    European Maritime Safety Agency
SVR     Support Vector Regression
ML      Machine Learning
ANN     Artificial Neural Network
MSE     Mean Squared Error
RMSE    Root Mean Squared Error

References

  1. Song, M.J.; Seo, Y.J.; Lee, H.Y. The dynamic relationship between industrialization, urbanization, CO2 emissions, and transportation modes in Korea: Empirical evidence from maritime and air transport. Transportation 2023, 50, 2111–2137. [Google Scholar] [CrossRef]
  2. Ha, S.; Jeong, B.; Jang, H.; Park, C.; Ku, B. A framework for determining the life cycle GHG emissions of fossil marine fuels in countries reliant on imported energy through maritime transportation: A case study of South Korea. Sci. Total Environ. 2023, 897, 165366. [Google Scholar] [CrossRef] [PubMed]
  3. Pelić, V.; Bukovac, O.; Radonja, R.; Degiuli, N. The impact of slow steaming on fuel consumption and CO2 emissions of a container ship. J. Mar. Sci. Eng. 2023, 11, 675. [Google Scholar] [CrossRef]
  4. Zhang, H.; Peng, J.; Wang, R.; Zhang, M.; Gao, C.; Yu, Y. Use of random forest based on the effects of urban governance elements to forecast CO2 emissions in Chinese cities. Heliyon 2023, 9, e16693. [Google Scholar] [CrossRef] [PubMed]
  5. de Lima Nogueira, S.C.; Och, S.H.; Moura, L.M.; Domingues, E.; dos Santos Coelho, L.; Mariani, V.C. Prediction of the NOx and CO2 emissions from an experimental dual fuel engine using optimized random forest combined with feature engineering. Energy 2023, 280, 128066. [Google Scholar] [CrossRef]
  6. Zhang, K.; Zhang, K.; Bao, R.; Liu, X. A framework for predicting the carbonation depth of concrete incorporating fly ash based on a least squares support vector machine and metaheuristic algorithms. J. Build. Eng. 2023, 65, 105772. [Google Scholar] [CrossRef]
  7. Gordon, D.; Norouzi, A.; Blomeyer, G.; Bedei, J.; Aliramezani, M.; Andert, J.; Koch, C.R. Support vector machine based emissions modeling using particle swarm optimization for homogeneous charge compression ignition engine. Int. J. Engine Res. 2023, 24, 536–551. [Google Scholar] [CrossRef]
  8. Khoshraftar, Z.; Ghaemi, A. Modeling and prediction of CO2 partial pressure in methanol solution using artificial neural networks. Curr. Res. Green Sustain. Chem. 2023, 6, 100364. [Google Scholar] [CrossRef]
  9. Sedighi, M.; Mohammadi, M.; Ameli, F.; Amiri-Ramsheh, B.; Hemmati-Sarapardeh, A. A comparative study of machine learning frameworks for predicting CO2 conversion into light olefins. Fuel 2025, 379, 133017. [Google Scholar] [CrossRef]
  10. Nagao, M.; Yao, C.; Onishi, T.; Chen, H.; Datta-Gupta, A.; Mishra, S. An efficient deep learning-based workflow for CO2 plume imaging considering model uncertainties with distributed pressure and temperature measurements. Int. J. Greenh. Gas Control. 2024, 132, 104066. [Google Scholar] [CrossRef]
  11. Lv, Q.; Zheng, R.; Guo, X.; Larestani, A.; Hadavimoghaddam, F.; Riazi, M.; Hemmati-Sarapardeh, A.; Wang, K.; Li, J. Modelling minimum miscibility pressure of CO2-crude oil systems using deep learning, tree-based, and thermodynamic models: Application to CO2 sequestration and enhanced oil recovery. Sep. Purif. Technol. 2023, 310, 123086. [Google Scholar] [CrossRef]
  12. Nguyen, V.G.; Duong, X.Q.; Nguyen, L.H.; Nguyen, P.Q.P.; Priya, J.C.; Truong, T.H.; Le, H.C.; Pham, N.D.K.; Nguyen, X.P. An extensive investigation on leveraging machine learning techniques for high-precision predictive modeling of CO2 emission. Energy Sources Part Recover. Util. Environ. Eff. 2023, 45, 9149–9177. [Google Scholar]
  13. Karakurt, I.; Aydin, G. Development of regression models to forecast the CO2 emissions from fossil fuels in the BRICS and MINT countries. Energy 2023, 263, 125650. [Google Scholar] [CrossRef]
  14. Pan, B.; Song, T.; Yue, M.; Chen, S.; Zhang, L.; Edlmann, K.; Neil, C.W.; Zhu, W.; Iglauer, S. Machine learning-based shale wettability prediction: Implications for H2, CH4 and CO2 geo-storage. Int. J. Hydrog. Energy 2024, 56, 1384–1390. [Google Scholar] [CrossRef]
  15. Zincir, B. A short review of ammonia as an alternative marine fuel for decarbonised maritime transportation. In Proceedings of the ICEESEN2020, Kayseri, Turkey, 19–21 November 2020. [Google Scholar]
  16. Wang, S.; Wang, X.; Han, Y.; Wang, X.; Jiang, H. Decarbonizing in maritime transportation: Challenges and opportunities. Open J. Transp. 2023, 13, 301–325. [Google Scholar] [CrossRef]
  17. Mocerino, L.; Quaranta, F.; Rizzuto, E. Climate changes and maritime transportation: A state of the art. In Technology and Science for the Ships of the Future; IOS Press: Amsterdam, The Netherlands, 2018. [Google Scholar]
  18. Nguyen, H.; Hoang, A.; Nizetic, S. The electric propulsion system as a green solution for management strategy of CO2 emission in ocean shipping: A comprehensive review. J. Electr. Energy 2021, 13, e12580. [Google Scholar]
  19. Xing, H.; Spence, S.; Chen, H. A comprehensive review on countermeasures for CO2 emissions from ships. Renew. Sustain. Energy Rev. 2020, 13, 110222. [Google Scholar] [CrossRef]
  20. Mersin, K. Review of CO2 emission and reducing methods in maritime transportation. Therm. Sci. 2019, 23, 2073–2079. [Google Scholar] [CrossRef]
  21. Crosby, S.; Rysanek, A. Predicting thermal satisfaction as a function of indoor CO2 levels: Bayesian modelling of new field data. Build. Environ. 2022, 209, 108569. [Google Scholar] [CrossRef]
  22. Affholder, A.; Guyot, F.; Sauterey, B.; Ferrière, R.; Mazevet, S. Bayesian analysis of Enceladus’s plume data to assess methanogenesis. Nat. Astron. 2021, 5, 805–814. [Google Scholar] [CrossRef]
  23. Michimae, H.; Emura, T. Bayesian ridge estimators based on copula-based joint prior distributions for regression coefficients. Comput. Stat. 2022, 37, 2741–2769. [Google Scholar] [CrossRef]
  24. Michalakopoulos, V.; Ilias, L.; Kapsalis, P.; Mouzakitis, S.; Askounis, D. Comparison of Machine Learning Algorithms For Predicting CO2 Emissions in the maritime domain. In Proceedings of the 2023 14th International Conference on Information, Intelligence, Systems & Applications (IISA), Volos, Greece, 10–12 July 2023; pp. 1–4. [Google Scholar]
  25. Zhou, T.; Hu, Q.; Hu, Z.; Zhen, R. An adaptive hyper parameter tuning model for ship fuel consumption prediction under complex maritime environments. J. Ocean Eng. Sci. 2022, 7, 255–263. [Google Scholar] [CrossRef]
  26. Monisha, I.I.; Mehtaj, N.; Awal, Z.I. A step towards IMO greenhouse gas reduction goal: Effectiveness of machine learning based CO2 emission prediction model. In Proceedings of the 13th International Conference on Marine Technology (MARTEC 2022), Dhaka, Bangladesh, 21–22 December 2022. [Google Scholar]
Figure 1. Overall structure of Algorithm 1. First, relevant features are extracted through ANOVA analysis; then regression is performed using the Voting-BRL method to output predictions.
Figure 2. Comparison of model predictive performance. The left chart visualizes the unnormalized prediction results, while the right chart shows the normalized R² and RMSE. Except for Voting-BRL, no method simultaneously achieves the maximum R² and the minimum RMSE.
Figure 3. Scatter plot of actual vs. predicted values.
Figure 4. Box plot of MSE.
Figure 5. Residual plot.
Figure 6. RMSE significance test; the red dashed line indicates the 0.05 significance level.
Figure 7. R² significance test; the red dashed line indicates the 0.05 significance level.
Table 1. Data information.

| Data Type | Details |
|---|---|
| Fuel Consumption | Total fuel consumed by ships during specific voyages (measured in tons or kilograms) |
| CO2 Emissions | Total CO2 emitted by ships during specific voyages (measured in kilograms or tons) |
| Energy Efficiency Indicators | Includes technical efficiency, assessing ships’ energy usage efficiency |
| Navigation Time and Distance | Total time (in hours) and distance (in nautical miles) ships spent at sea |
| Other Relevant Parameters | May include time and distance spent navigating through ice-covered areas and other operational parameters under specific environmental conditions |
Table 2. Model parameter settings.

| Model | Parameter | Value |
|---|---|---|
| Bayesian Ridge | α1 | 1 × 10⁻⁶ |
| | α2 | 1 × 10⁻⁶ |
| | λ1 | 1 × 10⁻⁶ |
| | λ2 | 1 × 10⁻⁶ |
| | max_iter | 300 |
| | tol | 0.001 |
| Lasso Regression | α | 0.1 |
| Voting Regressor | estimators | [(‘bayesian’, bayesian_model), (‘lasso’, lasso_model)] |
| Feature Selection | k (SelectKBest) | 100 |
| Feature Scaling | StandardScaler | n/a |
Table 3. Comparative experiment results.

| Model | 2023 R² | 2023 RMSE | 2022 R² | 2022 RMSE | 2021 R² | 2021 RMSE | 2020 R² | 2020 RMSE | Time Taken |
|---|---|---|---|---|---|---|---|---|---|
| GaussianProcessRegressor | −182.96 | 3012.16 | −10,256.40 | 24,831.24 | −0.01 | 123,487.96 | −4.07 | 526.18 | 0.27 |
| MLPRegressor | −0.22 | 68.98 | −0.20 | 268.98 | 0.96 | 24,412.46 | −0.22 | 258.08 | 1.18 |
| SVR | 0.18 | 221.78 | 0.18 | 221.78 | 0.00 | 123,228.64 | 0.17 | 212.72 | 0.2 |
| RandomForestRegressor | 0.40 | 189.96 | 0.40 | 189.96 | 0.00 | 123,152.67 | 0.98 | 34.17 | 0.98 |
| LinearSVR | 0.53 | 167.67 | 0.53 | 167.67 | 0.9984 | 4922.38 | 0.93 | 61.55 | 0.01 |
| ExtraTreeRegressor | 0.94 | 61.59 | 0.94 | 61.59 | 0.00 | 123,145.66 | 0.98 | 30.86 | 0.01 |
| DecisionTreeRegressor | 0.94 | 59.64 | 0.94 | 59.64 | 0.00 | 123,126.29 | 0.97 | 42.48 | 0.02 |
| XGBRegressor | 0.97 | 45.03 | 0.97 | 45.03 | 0.00 | 123,126.01 | 0.97 | 38.82 | 0.92 |
| Voting-BRL | 0.9981 | 8.53 | 0.9981 | 8.529 | 0.9981 | 6.946 | 0.9967 | 6.259 | 0.1725 |
Table 4. Ablation experiment results.

| Model | 2023 R² | 2023 RMSE | 2022 R² | 2022 RMSE | 2021 R² | 2021 RMSE | 2020 R² | 2020 RMSE | Time Taken |
|---|---|---|---|---|---|---|---|---|---|
| Lasso | 0.94 | 61.59 | 0.9773 | 36.945 | 0.9961 | 7651.643 | 0.9934 | 18.970 | 0.01 |
| BayesianRidge | 0.94 | 59.64 | 0.9772 | 37.099 | 0.9971 | 6583.479 | 0.9932 | 19.262 | 0.02 |
| BRL | 0.97 | 45.03 | 0.9979 | 8.934 | 0.9981 | 7.024 | 0.9963 | 6.688 | 0.92 |
| Voting-BRL | 0.9981 | 8.529 | 0.9981 | 8.529 | 0.9981 | 6.946 | 0.9967 | 6.259 | 0.1725 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

