1. Introduction
Energy is fundamental to industrial and economic activities, with fossil fuels remaining the primary energy source. However, their excessive use has significantly increased greenhouse gas (GHG) emissions, contributing to severe environmental pollution [1]. According to the International Energy Agency, environmental pollution causes approximately 6.5 million deaths annually, prompting many countries, including those in Europe, to implement GHG reduction strategies [2]. Among various environmental concerns, GHG emissions have gained increasing attention, particularly in the maritime sector, where their regulation remains a critical issue [3,4]. According to the International Maritime Organization (IMO), carbon dioxide (CO2) emissions from vessels rose by approximately 9.7%, from 962 Mt in 2012 to 1056 Mt in 2018, accounting for around 3% of global anthropogenic CO2 emissions [5]. Projections indicate that emissions from the shipping sector could increase by up to 17% by 2050. In response, the IMO has introduced regulatory measures such as the Energy Efficiency Design Index, Energy Efficiency Existing Ship Index, and Carbon Intensity Indicator as part of its global carbon reduction strategy [6].
To achieve net-zero GHG emissions, annual statistics are compiled by analyzing CO2 emissions across various industries, including on-road mobile sources such as automobiles [7]. In sectors such as automotive, construction, and manufacturing, extensive research has been conducted to measure and analyze emissions, leading to the development of emission factors with relatively low uncertainty [8,9,10]. However, direct measurement of emissions from vessels presents significant challenges due to the high costs and time required for installing monitoring equipment, evaluating emissions under varying environmental conditions, and establishing emission factors. Consequently, many studies have employed modeling techniques that incorporate vessel navigation direction and airflow characteristics to estimate exhaust emissions indirectly.
Vessel emissions are typically estimated using two approaches: the top-down and bottom-up methods [11]. The top-down method calculates emissions based on fuel sales or global fuel consumption data [12]. In contrast, the bottom-up method estimates emissions using operational data from individual vessels, offering greater accuracy in reflecting vessel-specific characteristics and operating conditions [5]. Consequently, the bottom-up approach is more widely adopted for regional and vessel-specific emission assessments [1,2,3,4,5,6,7,8,9,10,11,12,13,14]. However, the bottom-up method has several limitations. It does not account for emission variations caused by technical differences in engine performance, and calculation errors may arise due to uncertainties in emission factors and reliance on average values. Additionally, discrepancies between reported and actual fuel consumption can lead to significant deviations in emission estimates [5].
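For context, bottom-up (activity-based) inventories are commonly built from installed engine power, load factor, operating time, and pollutant-specific emission factors. The expression below is only an illustrative sketch of this general form, as the exact variables and correction terms differ between studies:

\[ E = \sum_{i} P_i \cdot LF_i \cdot T_i \cdot EF_i , \]

where, for each operating segment i, P is the engine power (kW), LF the engine load factor, T the operating time (h), and EF the pollutant-specific emission factor (g/kWh).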
To address these limitations, recent research has explored emission prediction and assessment models leveraging artificial intelligence (AI), including deep learning and machine learning. Shen et al. (2023) developed a prediction model combining a convolutional neural network (CNN) and long short-term memory (LSTM) using engine data such as revolutions per minute (RPM), torque, fuel consumption, and nitrogen oxide (NOx) emissions from a diesel engine test bench. The model achieved a coefficient of determination (R2) of 0.977 and a mean absolute percentage error (MAPE) of 18.4% [15]. Chen et al. (2024) proposed an artificial neural network (ANN) model for predicting NOx and carbon monoxide (CO) emissions, utilizing emission measurement data, RPM, shaft power, speed, and wind direction from operating vessels. Their approach, incorporating vessel-related and weather variables, significantly outperformed the traditional bottom-up method in prediction accuracy [16]. Cammin et al. (2023) employed automatic identification system (AIS) data (including vessel speed, position, route, engine power, operating time, mode data, vessel type, and gross tonnage) alongside emissions estimated via the bottom-up method. They developed prediction models using ANN, multiple linear regression, and support vector regression, demonstrating that ANN-based models effectively mitigate the limitations of traditional bottom-up approaches [17]. Šilas et al. (2023) collected particulate matter emission data by integrating vessel tonnage, size, power, weather conditions, and AIS data with measured exhaust gas plumes. Their ANN-based prediction model, utilizing 17 input variables, achieved higher accuracy than conventional bottom-up methods [18]. Recently, transformer-based models, originally developed for natural language processing (NLP) and computer vision (CV) [19,20,21,22,23,24,25,26], have been actively extended to emission prediction tasks as well. For example, Z. Li et al. (2022) proposed a time series forecasting (TSF) transformer model to predict exhaust gas emissions from commercial trucks. The proposed model outperformed traditional machine learning models such as gradient-boosted regression tree (GBRT), support vector machine (SVM), and extreme gradient boosting (XGBoost), as well as deep learning approaches such as LSTM [27]. Similarly, J. Li et al. (2024) applied a transformer-based model to predict NOx and CO emissions from gas turbines. Compared to LSTM and CNN, their model demonstrated both superior prediction accuracy and faster execution time [28]. In addition, physics-informed neural networks (PINNs) have gained attention for their ability to incorporate physical constraints into deep learning models. For instance, Zhu et al. (2024) proposed a PINN-based model to predict NOx emissions from coal-fired boilers by embedding a monotonic relationship between NOx emissions and three typical operating parameters. Their results demonstrated superior prediction accuracy and generalization capability compared to conventional machine learning models [29].
However, existing emission prediction models have several limitations. Most studies rely primarily on variables external to the combustion process, such as fuel consumption and wind direction, to estimate CO2 emissions, with relatively little focus on factors that directly influence engine combustion. Notably, few studies have explicitly analyzed the correlations between engine parameters (such as exhaust gas temperature, maximum cylinder pressure, and operating conditions) and emission characteristics for modeling purposes. This gap is largely due to technical and experimental challenges in collecting comprehensive engine operation data.
This study addresses the limitations of existing research by proposing a deep learning model for predicting CO2 emissions from vessel engines using engine operation data. The model incorporates 109 engine parameters, including fuel consumption, operating conditions (RPM, torque, power, etc.), and cylinder-related variables such as exhaust gas temperature, maximum cylinder pressure, and maximum compression pressure, to enhance prediction accuracy. Both single-architecture models (CNN, LSTM, and temporal convolutional network [TCN]) and a hybrid architecture (TCN–LSTM) were implemented to estimate CO2 emissions based on engine operation data. The models' reliability was validated by comparing predictions with measured data collected under actual operating conditions. Through this study, we propose an effective method for estimating CO2 emissions, a critical foundation for carbon reduction policies, and analyze the factors influencing emission levels, thereby providing insight for formulating strategies to reduce overall emissions.
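Although the architectures and hyperparameters are described in Section 3, the snippet below is a minimal sketch of how a TCN–LSTM hybrid of this kind could be assembled, assuming a TensorFlow/Keras implementation; the window length, filter counts, and dilation rates are illustrative placeholders rather than the settings used in this study.

```python
# Minimal TCN–LSTM sketch (TensorFlow/Keras); hyperparameters are illustrative only.
import tensorflow as tf
from tensorflow.keras import layers, models

N_FEATURES = 109   # engine parameters per time step, as described in the text
WINDOW = 60        # hypothetical input window length

def tcn_block(x, filters, dilation):
    """Dilated causal convolution block with a residual connection."""
    shortcut = x
    x = layers.Conv1D(filters, kernel_size=3, padding="causal",
                      dilation_rate=dilation, activation="relu")(x)
    x = layers.Conv1D(filters, kernel_size=3, padding="causal",
                      dilation_rate=dilation, activation="relu")(x)
    if shortcut.shape[-1] != filters:          # match channel dimension for the residual
        shortcut = layers.Conv1D(filters, kernel_size=1, padding="same")(shortcut)
    return layers.Add()([shortcut, x])

inputs = layers.Input(shape=(WINDOW, N_FEATURES))
x = inputs
for dilation in (1, 2, 4, 8):                  # growing receptive field over the window
    x = tcn_block(x, filters=64, dilation=dilation)
x = layers.LSTM(64)(x)                         # long-term dependencies on the TCN features
outputs = layers.Dense(1)(x)                   # predicted CO2 emission value

model = models.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.summary()
```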
The remainder of this paper is organized as follows: Section 2 describes the experimental setup and data preprocessing. Section 3 introduces the deep learning model architectures used in this study. Section 4 presents and discusses the prediction results and the performance of the proposed prediction model. Finally, Section 5 concludes the study and outlines future research directions.
4. Results and Discussion
The validation loss change rates for the CNN, LSTM, TCN, and TCN–LSTM models are shown in Figure 7. These rates were analyzed to assess the stability of the models. With the exception of the TCN model, all models exhibited a sharp decrease in loss during the initial training phase, indicating rapid optimization in the early learning stages. The TCN model, by contrast, showed the largest change rate in loss at the beginning of training, followed by stable learning, with its change rate converging to zero between epochs 13 and 17. The LSTM model showed gradual optimization over the first five epochs due to its recurrent structure. Most models stabilized, with the change rate approaching zero after 10 epochs. Among all models, the TCN–LSTM architecture displayed the fastest decline in change rate, indicating stable learning. These findings suggest that the hybrid model offers better generalization ability and a lower risk of overfitting compared to the individual architectures.
To prevent overfitting, we applied early stopping before 10 epochs for the hybrid model and after 10 epochs for the single models. The validation change rate analysis demonstrated that the TCN–LSTM model proposed in this study exhibited a consistently decreasing change rate, confirming its stability and applicability in predicting CO2 emissions from engine data. However, for models that displayed fluctuations in the change rate during training, we anticipate that advanced optimization techniques and further hyperparameter tuning will be necessary to enhance their performance and stability.
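As an illustration of this training setup, the snippet below sketches how early stopping and the epoch-to-epoch validation loss change rate could be implemented in Keras; the patience value is a placeholder rather than the setting used here, the `model` object is assumed to be a compiled network such as the TCN–LSTM sketch above, and the random arrays merely stand in for the preprocessed engine data.

```python
# Sketch of early stopping and validation-loss change-rate tracking (Keras).
import numpy as np
import tensorflow as tf

# Placeholder data with the same shape convention as the earlier sketch.
x_train = np.random.rand(1000, 60, 109).astype("float32")
y_train = np.random.rand(1000, 1).astype("float32")
x_val = np.random.rand(200, 60, 109).astype("float32")
y_val = np.random.rand(200, 1).astype("float32")

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,                  # illustrative patience, not the study's setting
    restore_best_weights=True,
)

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=50,
    batch_size=32,
    callbacks=[early_stop],
    verbose=0,
)

# Epoch-to-epoch change rate of the validation loss (analogous to Figure 7).
val_loss = np.asarray(history.history["val_loss"])
change_rate = np.diff(val_loss) / val_loss[:-1]
print("Validation loss change rate per epoch:", change_rate)
```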
The prediction results of the CNN, LSTM, and TCN single architectures and the TCN–LSTM hybrid model are shown in Figure 8. Figure 8a shows the overall time-series comparison, while Figure 8b–e provides zoomed-in views of representative segments with notable differences in prediction performance. In Figure 8b, all models exhibit noticeable prediction inaccuracies, with significant deviations from the measured values. This discrepancy may stem from the models' limited capacity to respond effectively to abrupt transitions or irregular input patterns. These results indicate that the architectures, including the proposed model, may struggle to fully capture nonlinear behaviors or unmodeled dynamics in such regions. To address this limitation, future work should consider incorporating additional representative training data and performing cross-model analysis to enhance robustness and generalization under transient conditions. While all models generally performed well, some exhibited underprediction or overprediction tendencies. Compared to the hybrid model, the LSTM and TCN single architectures (excluding CNN) showed greater deviations, particularly in sections where engine RPM gradually increased (see Figure 8d) or decreased (see Figure 8c). This discrepancy likely stems from differences in training behavior arising from each model's structural characteristics and the nature of the data. The LSTM model excels at capturing long-term dependencies by retaining past information; however, in sections with rapid changes, its predictive accuracy decreases because it relies heavily on historical data, leading to noticeable deviations. The TCN model is effective at learning localized patterns, but its predictions may be unstable due to sensitivity to noise or outliers in the input data [49]. Further parameter optimization could mitigate this issue. CNN models, although traditionally used for image processing, have proven highly effective for time-series prediction [50]. Their convolutional layers excel at detecting localized features, allowing them to capture pattern changes in short segments of extensive time-series data. Additionally, CNNs naturally filter noise, leading to superior predictive performance compared to the LSTM and TCN models, even in rapidly changing data [51].
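To make the notion of localized pattern learning more concrete, the receptive field of a stack of dilated causal convolutions (the core operation of a TCN) is bounded by the kernel size and dilation rates. As an illustrative expression, assuming one convolution per dilation level with kernel size k and dilations d_1, ..., d_L,

\[ \text{RF} = 1 + (k - 1) \sum_{i=1}^{L} d_i , \]

so, for example, k = 3 with dilations 1, 2, 4, and 8 spans 31 time steps, whereas an LSTM's gated memory is not limited to a fixed window; this is consistent with the complementary behavior described above.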
A hybrid architecture can compensate for the weaknesses of a single model by leveraging their complementary strengths. The TCN–LSTM model effectively captures complex time-series patterns by simultaneously considering short-term fluctuations and long-term dependencies, thereby enhancing prediction accuracy. As shown in Figure 8, the hybrid model demonstrates high predictive accuracy, particularly in variation sections where the single models struggle, confirming its robustness in handling dynamic changes in engine data (see Figure 8c,e).
To enable a quantitative evaluation of each model, Table 9 presents the R2, mean absolute error (MAE), root mean squared error (RMSE), MAPE, and Pearson's correlation coefficient (R) for the test results. Additionally, Figure 9 provides a bar graph for a visual comparison of the results. All four models exhibited high accuracy, with R2 ≥ 0.9, and the TCN–LSTM model outperformed the others across all metrics. Among the single-architecture models, the CNN model achieved the highest accuracy (R2 = 0.9697) and the smallest errors, as indicated by its MAE (49.6663), RMSE (60.0875), and MAPE (4.2337%), highlighting its minimal deviation from actual measurements.
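For reference, metrics of this kind can be computed from paired measured and predicted series along the lines of the sketch below, which assumes scikit-learn and SciPy are available; `y_true` and `y_pred` are placeholder arrays, and the MAPE definition used here (mean absolute percentage error over nonzero targets) may differ slightly from the exact formulation used in this study.

```python
# Sketch of the evaluation metrics used for model comparison (R2, MAE, RMSE, MAPE, R).
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

def evaluate(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r2 = r2_score(y_true, y_pred)
    mae = mean_absolute_error(y_true, y_pred)
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0  # assumes no zero targets
    r, _ = pearsonr(y_true, y_pred)
    return {"R2": r2, "MAE": mae, "RMSE": rmse, "MAPE_%": mape, "R": r}
```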
Overall, the hybrid architecture demonstrated strong performance, with the TCN–LSTM model achieving the highest accuracy (R2 = 0.9726) and slightly outperforming the CNN-based model. Additionally, the TCN–LSTM model exhibited the lowest prediction deviation and variance, as indicated by its MAE (47.3447) and RMSE (58.5737). These results suggest that the TCN–LSTM hybrid model provides the most accurate CO2 emission predictions, particularly in response to variations in engine operation.
The TCN–LSTM hybrid model outperformed all other models across all evaluation metrics, as it effectively leverages the strengths of both TCN and LSTM to capture complex time-series patterns in the input data. Compared to the TCN single model, the TCN–LSTM model achieved improvements of 3.6% in R2, 24.9% in MAE, 19.8% in RMSE, and 48.8% in MAPE. Additionally, compared to the LSTM single model, it showed enhancements of approximately 3.6% in R2, 21.1% in MAE, 19% in RMSE, and 45.7% in MAPE. These results highlight the effectiveness of the hybrid approach in improving CO2 emission prediction accuracy.
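The improvement percentages quoted above are consistent with the standard relative-change convention. As an illustrative formulation (not an equation stated in the text), for any error metric M (e.g., MAE, RMSE, or MAPE),

\[ \text{Improvement}(\%) = \frac{M_{\text{single}} - M_{\text{hybrid}}}{M_{\text{single}}} \times 100 , \]

while for R2, where larger values are better, the numerator is reversed to \( R^2_{\text{hybrid}} - R^2_{\text{single}} \).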
Across all evaluation metrics, the TCN–LSTM hybrid model thus achieved higher prediction accuracy than the single-architecture models. In contrast, the TCN and LSTM models exhibited relatively higher errors and lower R2 values, indicating their individual limitations in accurately predicting CO2 emissions.
Figure 10a visually compares the errors between the actual and predicted values across the four models. The LSTM and TCN models exhibit larger residual variances, with noticeable outliers in sections where engine-operating conditions change rapidly. This suggests that these models struggle to adapt to sudden variations in the data, highlighting the need for additional training data and further model-data correlation analysis in future research. In contrast, the hybrid model shows residuals that are more evenly distributed around zero (the green-shaded region), indicating better stability and accuracy in capturing fluctuations in engine operation.
Figure 10b presents the mean of the residuals, illustrating the extent of underestimation and overestimation for each model. The single-architecture models exhibited larger discrepancies between underestimation and overestimation than the hybrid model. Of all the models, the TCN–LSTM model had the smallest difference (34.7827) and the highest prediction accuracy, further demonstrating its effectiveness in minimizing prediction errors.
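As an illustration of this residual analysis, the sketch below separates over- and underestimation from a residual series and reports their means and the gap between them; the function name and the interpretation of the reported difference (taken here as the gap between the mean overestimation and mean underestimation magnitudes) are assumptions for illustration rather than the exact procedure behind Figure 10b.

```python
# Sketch of a residual (error) analysis: over- vs. underestimation per model.
import numpy as np

def residual_summary(y_true, y_pred):
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    over = residuals[residuals < 0]    # model predicted above the measured value
    under = residuals[residuals > 0]   # model predicted below the measured value
    mean_over = np.abs(over).mean() if over.size else 0.0
    mean_under = under.mean() if under.size else 0.0
    return {
        "mean_residual": residuals.mean(),
        "mean_overestimation": mean_over,
        "mean_underestimation": mean_under,
        "over_under_gap": abs(mean_over - mean_under),
    }
```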
Figure 11 illustrates the regression accuracy of each prediction model, with adjusted R2 values exceeding 0.9 for all models, indicating generally high prediction accuracy. However, distinct deviations appear in certain sections, likely due to interactions between model structure and data characteristics or to suboptimal parameter settings. These deviations were particularly evident during transitional engine states, such as sudden changes in RPM or torque, where the input data distributions shift rapidly. Such conditions are often underrepresented in the training dataset, which may lead to localized prediction biases. In addition, structural characteristics of the hybrid model, such as the temporal sensitivity of the LSTM and the local responsiveness of the TCN, may contribute to deviations in specific intervals. To address these limitations, future work will involve augmenting the training database with more samples of dynamic operating conditions and exploring advanced techniques such as attention mechanisms or adaptive loss functions to enhance the model's robustness and generalization performance in these regions.
Among the single-architecture models, the CNN model demonstrated more consistent deviations, except in sections where all models exhibited large errors. Unlike LSTM and TCN, it did not show a wide range of errors, suggesting greater robustness. In contrast, LSTM and TCN models exhibited high variance even in high-density data sections (400–800), implying that they lack sufficient explanatory power when used individually for CO2 emission predictions. However, the hybrid model exhibited uniformly distributed deviations across all sections, including in sparsely distributed data ranges (0–200), with the exception of common error-prone regions. This suggests that the hybrid model effectively mitigated the weaknesses of the individual architectures. Notably, the TCN–LSTM hybrid model achieved the highest adjusted R2 value (0.9779), confirming its superior performance in accurately predicting CO2 emissions based solely on engine operation data.
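For completeness, the adjusted R2 used here is conventionally defined from the ordinary R2, the number of samples n, and the number of predictors p; the expression below is the standard formulation and is assumed, rather than stated in the text, to be the one underlying Figure 11:

\[ R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} . \]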