1. Introduction
Maritime transport is one of the most widely used and energy-efficient means of transportation [1]. International seaborne trade accounts for more than 80% of total international cargo trade [2], and the shipping industry mainly uses heavy fuel oil (HFO), liquefied natural gas (LNG), and other fuels to power ships; ships are therefore considered the largest contributor to fuel consumption in the transportation industry. As the volume of international cargo trade has grown year by year, the emissions of carbon dioxide (CO2), nitrogen oxides (NOx), and sulfur oxides (SOx) generated by the fuel consumption of ships' main engines and auxiliary engines (boilers) have also increased, with serious impacts on the global climate and human health [3]. The fourth greenhouse gas (GHG) study of the International Maritime Organization (IMO) shows that shipping emissions continue to increase overall, despite the IMO's implementation of various regulations and the establishment of emission control areas around the world [4]: between 2012 and 2018, total GHG emissions from shipping increased by 9.6% and CO2 emissions increased by 9.3%. Over the same period, the share of shipping in total global emissions increased from 2.76% to 2.89% [5]. On the other hand, because fuel costs often account for 20% to 61% of a ship's operating costs [6], reducing fuel consumption plays an important role in cost reduction for shipping enterprises and has long received wide attention. It is therefore urgent to promote more effective and practical measures for reducing fuel consumption during ship operation, so as to improve the energy efficiency of company fleets and reduce emissions.
The IMO and shipping companies are currently focusing their efforts on finding technical and operational ways to increase ship energy efficiency. On the technical side, innovative technologies such as propeller design optimization, hull design optimization, and more efficient power systems can increase ship energy efficiency [7]. At the operational level, adjusting various parameters during navigation can increase the ship's energy efficiency, for example, through speed optimization, longitudinal inclination (trim) optimization, and route optimization. The engineering design innovations required by technical solutions demand substantial investment and do not yield immediate benefits [8]. As a result, shipping companies often prefer to reduce fuel consumption through operational measures.
However, applying operational measures effectively remains a challenge, mainly because many factors influence fuel consumption during a voyage, such as speed, displacement, wind, waves, and air temperature, which makes it difficult to quantify the intrinsic relationship between these factors and the fuel consumption rate through empirical formulas. For ships sailing on a fixed route, shipping companies need to minimize fuel consumption while still arriving on time. Accurate prediction of fuel consumption along the route is the basis for optimizing energy efficiency and reducing emissions during the voyage, yet making such predictions accurately remains challenging.
Leifsson et al. summarized three types of models that have been widely used for ship fuel consumption prediction in recent years: the white box model (WBM), the black box model (BBM), and the gray box model (GBM) [9]. The WBM relies on a priori knowledge and the physical principles of the ship's power system, whose structure and parameters are known. It obtains the fuel consumption under given sailing conditions by calculating the combined effect of the resistance components acting on the ship during the voyage (calm-water resistance, wind resistance, wave resistance, etc.). It is a tool often used in the ship design and sea trial stages to predict fuel consumption, and it offers good interpretability [10]. For example, Li et al. developed a WBM based on the Kwon model for ship fuel consumption prediction, which uses multiple sources of data, such as ship operation records, machinery test data, and ocean weather, and applies the predictions on a given route to maximize fuel consumption reduction [11]. WBMs still have drawbacks in application, such as the need for a priori knowledge during model building and the neglect of interactions between the resistance components, which limit their applicability and generalization.
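To make the white-box chain from resistance to fuel concrete, the following Python sketch converts summed resistance components into a fuel flow rate via effective power (P_E = R_T · V), brake power, and specific fuel oil consumption. It is a minimal sketch under stated assumptions: the function name, the lumped propulsive and shaft efficiencies, the SFOC, and the fuel density are hypothetical placeholders, not the values or the exact formulation of Kwon's method or of this study.

```python
# Knots to metres per second: 1 knot = 1852 m per 3600 s.
KNOTS_TO_MS = 1852.0 / 3600.0


def wbm_fuel_rate_lph(
    speed_knots: float,
    calm_water_resistance_n: float,
    wind_resistance_n: float,
    wave_resistance_n: float,
    propulsive_efficiency: float = 0.65,  # assumed quasi-propulsive efficiency
    shaft_efficiency: float = 0.98,  # assumed shaft/transmission efficiency
    sfoc_g_per_kwh: float = 190.0,  # assumed specific fuel oil consumption
    fuel_density_kg_per_m3: float = 850.0,  # assumed fuel density
) -> float:
    """Hypothetical white-box estimate of fuel flow in litres per hour.

    Resistance components are simply summed, i.e. interactions between
    them are ignored.
    """
    total_resistance_n = calm_water_resistance_n + wind_resistance_n + wave_resistance_n
    speed_ms = speed_knots * KNOTS_TO_MS
    # Effective (towing) power P_E = R_T * V, converted from W to kW.
    effective_power_kw = total_resistance_n * speed_ms / 1000.0
    # Brake power the engine must deliver after propulsive and shaft losses.
    brake_power_kw = effective_power_kw / (propulsive_efficiency * shaft_efficiency)
    # Fuel mass flow in kg/h from the specific fuel oil consumption (g/kWh).
    fuel_kg_per_h = sfoc_g_per_kwh * brake_power_kw / 1000.0
    # Convert mass flow to volume flow: kg/h -> m^3/h -> L/h.
    return fuel_kg_per_h / fuel_density_kg_per_m3 * 1000.0


# Example call with hypothetical resistance values (newtons) at 18 knots.
print(round(wbm_fuel_rate_lph(18.0, 600_000.0, 40_000.0, 60_000.0), 1))
```

Because the components are simply added, the sketch reproduces exactly the simplification criticized above: any interaction between wind, waves, and calm-water resistance is lost.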
With the continuous development of computer technology and mathematical methods, the BBM has begun to receive wide attention from researchers. The BBM is completely data-driven and does not require the a priori knowledge needed by the WBM, but it requires a large amount of high-quality operational monitoring data to build a reliable model. During actual navigation, fuel consumption data mainly come from engine-room log data or sensor data. Engine-room logs are filled in manually by crew members at specified times in a fixed format, which inevitably introduces data errors, and their long sampling period means that they cannot accurately describe the ship's actual fuel consumption. In recent years, smart sensing devices with high sampling rates have been increasingly installed on modern ships, and data acquisition systems that collect continuous real-time data have been developed in conjunction with Internet of Things (IoT) technology. Through such systems, it is possible to obtain a large number of variables describing the external environment and the ship's own state that affect fuel consumption during navigation, including, but not limited to, longitude, latitude, speed over ground (SOG), course over ground (COG), and wind speed. These data lay the foundation for data-driven fuel consumption prediction [12]. Selected multisource monitoring data are then merged and preprocessed (data cleaning, feature dimensionality reduction, and data transformation) so that they can be used to train and analyze predictive models [13].
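The merging and preprocessing steps described above can be sketched in plain Python. This is an illustrative sketch, not the pipeline of [13] or of this study: it assumes each source is a time-sorted list of records keyed by a timestamp field `t` in seconds, joins sources by nearest timestamp, cleans with a missing-value check and Tukey's IQR rule, and transforms with min-max scaling. The field names and the 30 s tolerance are hypothetical, and dimensionality reduction is not shown.

```python
from statistics import quantiles


def merge_by_timestamp(primary, secondary, tolerance_s=30.0):
    """Join each primary record with the nearest secondary record in time.

    Both inputs are lists of dicts with a numeric 't' key (seconds), sorted by 't'.
    Records with no secondary match within tolerance_s are dropped.
    """
    merged, j = [], 0
    for rec in primary:
        # Move the pointer forward while the next secondary record is at least as close.
        while j + 1 < len(secondary) and abs(secondary[j + 1]["t"] - rec["t"]) <= abs(secondary[j]["t"] - rec["t"]):
            j += 1
        if secondary and abs(secondary[j]["t"] - rec["t"]) <= tolerance_s:
            merged.append({**secondary[j], **rec})  # primary values win on key clashes
    return merged


def drop_incomplete(rows, keys):
    """Data cleaning: discard rows with a missing value in any required key."""
    return [r for r in rows if all(r.get(k) is not None for k in keys)]


def iqr_filter(rows, key, k=1.5):
    """Data cleaning: discard rows whose value lies outside Tukey's IQR fences."""
    q1, _, q3 = quantiles([r[key] for r in rows], n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if lo <= r[key] <= hi]


def min_max_scale(rows, keys):
    """Data transformation: rescale each key to [0, 1] (constant columns map to 0)."""
    scaled = [dict(r) for r in rows]
    for key in keys:
        col = [r[key] for r in rows]
        lo, span = min(col), (max(col) - min(col)) or 1.0
        for r in scaled:
            r[key] = (r[key] - lo) / span
    return scaled


# Hypothetical fuel-flow records joined with hypothetical weather records.
primary = [{"t": 0, "fuel_lph": 100.0}, {"t": 60, "fuel_lph": 110.0}, {"t": 120, "fuel_lph": None}]
secondary = [{"t": 5, "wind_ms": 3.0}, {"t": 100, "wind_ms": 4.0}]
clean = drop_incomplete(merge_by_timestamp(primary, secondary), ["fuel_lph", "wind_ms"])
print(clean)
```

A nearest-timestamp join is only one option; interpolating the slower source onto the faster one's time grid is a common alternative when sensors sample at very different rates.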
BBMs are mainly divided into two categories: BBMs based on statistical modeling and BBMs based on machine learning (ML). The former establishes the relationship between influencing factors and fuel consumption through regression models, which generally assume that fuel consumption is proportional to the third power of speed; however, this approach does not take into account the influence of the sailing environment and ship condition [14]. Lepore et al. used multivariate partial least squares (PLS) regression to predict the hourly fuel consumption of cruise ships, using multiple sources of data collected from sensors during ship operation, such as ocean and weather data [15]. Erto et al. further added ship maintenance data to these sources and modeled them by multiple linear regression (MLR) to obtain more accurate predictions [16]. With the development of ML technology in recent years, ML-based BBMs have come into wide use for ship fuel consumption prediction. The core idea is to accurately predict new data based on a model learned from historical data. Compared with statistical BBMs, ML-based BBMs can better identify both linear and nonlinear relationships between the influencing factors and fuel consumption. In addition, ML has a clear advantage in handling high-dimensional data and can therefore include more influencing factors in the modeling process. According to Petersen et al., ML-based BBMs can adapt to more application scenarios and generalize better [17]. For example, Chaal et al. used a ship's operational data and the surrounding ocean and weather data to model the fuel consumption of tankers with ML models such as decision trees, AdaBoost, KNN, and artificial neural networks (ANNs), and compared their prediction performance [18]. Compared with the WBM and the statistical BBM, however, ML-based BBMs lose the interpretability of the prediction results and focus more on prediction accuracy and generalization.
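The cubic speed-fuel assumption of statistical BBMs can be illustrated with a closed-form least-squares fit. The sketch below fits F = k·V³ (minimizing Σ(F − kV³)² gives k = ΣFV³ / ΣV⁶) and, as an illustrative generalization that is not taken from the cited works, a power law F = a·Vⁿ fitted on logarithms so that the exponent is estimated rather than fixed at 3. The data are synthetic; like the models criticized above, neither fit uses any environmental or ship-condition input.

```python
import math


def fit_cubic_law(speeds, fuel_rates):
    """Least-squares k for F = k * V**3; closed form k = sum(F * V^3) / sum(V^6)."""
    numerator = sum(f * v**3 for v, f in zip(speeds, fuel_rates))
    denominator = sum(v**6 for v in speeds)
    return numerator / denominator


def fit_power_law(speeds, fuel_rates):
    """Fit F = a * V**n by ordinary least squares on log F = log a + n * log V."""
    xs = [math.log(v) for v in speeds]
    ys = [math.log(f) for f in fuel_rates]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n = sxy / sxx
    a = math.exp(y_mean - n * x_mean)
    return a, n


# Synthetic example: data generated exactly from F = 0.5 * V^3.
speeds = [10.0, 12.0, 14.0, 16.0, 18.0]
fuel = [0.5 * v**3 for v in speeds]
print(fit_cubic_law(speeds, fuel))  # close to 0.5
print(fit_power_law(speeds, fuel))  # close to (0.5, 3.0)
```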
WBMs reveal the relationship between the influencing factors and the prediction target on the basis of a priori knowledge of known structures and parameters. BBMs place more emphasis on the bias–variance tradeoff and achieve good generalization through powerful learning capabilities. GBMs combine the advantages of both, generally including at least one WBM and one BBM, and in theory should outperform both BBMs and WBMs [19]. For example, Coraddu et al. established a WBM, a BBM, and a GBM to predict the fuel consumption of a Panamax chemical tanker; the results showed that the GBM and BBM outperformed the WBM, and that the GBM achieved the best prediction results with less data [20].
The purpose of this study is to propose a data-driven modeling method, based on sensor monitoring data, that can be widely applied to ship fuel consumption prediction. The predictive performance of the proposed WBM, BBM, and GBM was verified on real data collected by various types of high-precision sensors installed on different parts of a ro-ro passenger ship. Another requirement for the model is transparency, or interpretability: the underlying working process of the prediction model should be interpretable, rather than the model aiming only at high prediction accuracy. Because BBMs and GBMs built on black box models lack good interpretability, they cannot provide companies or maritime agencies with a physically interpretable analysis of how different factors affect ship fuel consumption, which makes it difficult for the technicians involved to trust the final predictions. Therefore, this study introduces SHAP (SHapley Additive exPlanations), a framework of additive feature attribution methods, to improve the understanding of the predictions of black box models. In addition, researchers may include redundant features when selecting the input features of a prediction model. Most of the information carried by redundant features is already represented by other features, and too many input features can increase the capital investment in data collection for shipping companies, as well as significantly increase the memory requirements and computational cost of data analysis [21]. Regression performance depends on how effectively the relationship between the response and predictor variables can be captured, and redundant features that are highly correlated with each other also complicate prediction and reduce the stability of the predicted fuel consumption [22]. Therefore, this study further removes redundant input features based on the results of the SHAP analysis. The main components of this study are shown in Figure 1, and the main contributions are as follows:
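SHAP attributes a single prediction additively: the model output equals a base value plus the sum of one contribution φᵢ per feature. The sketch below computes exact Shapley values by brute force over all feature coalitions, replacing "absent" features with a single baseline point, and averages |φᵢ| over rows as a global importance score. This is an illustrative simplification: the shap library estimates these values efficiently against a background dataset (for example with a tree-specific explainer for boosted ensembles), and the toy model and baseline here are hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict at x.

    A feature absent from a coalition is set to its baseline value. The cost
    grows as 2**len(x), so this is only for small illustrative models.
    """
    m = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(m)]
        return predict(z)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (m - |S| - 1)! / m! for coalitions S without i.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for coalition in combinations(others, size):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(predict, rows, baseline):
    """Global importance: mean absolute Shapley value of each feature over rows."""
    totals = [0.0] * len(baseline)
    for x in rows:
        for i, p in enumerate(exact_shapley(predict, x, baseline)):
            totals[i] += abs(p)
    return [t / len(rows) for t in totals]


# Toy model with an additive term and an interaction: f(z) = 2*z0 + z1*z2.
def toy_model(z):
    return 2.0 * z[0] + z[1] * z[2]


phi = exact_shapley(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)  # about [2.0, 3.0, 3.0]; the interaction is split equally
print(sum(phi), toy_model([1.0, 2.0, 3.0]) - toy_model([0.0, 0.0, 0.0]))
```

The second print checks the additivity property: the contributions sum to the difference between the prediction and the baseline prediction.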
In this study, 19 multisource sensing variables were preprocessed so that they could be used for ship fuel consumption prediction analysis;
In this study, first, a WBM based on physical principles is established, which converts the ship resistance calculated from the external environment, together with the energy transfer relationship among engine, propeller, and ship, into fuel consumption at a specific speed; second, six BBMs covering statistical and machine learning models are established to map the relationship between multiple input features and fuel consumption; finally, the GBM combines the WBM and BBM through a chaining strategy, achieves better prediction performance and stability than either the WBM or the BBM, and yields a ship fuel consumption prediction model suitable for practical engineering applications;
This study combines the high-performance GBM with the SHAP framework, which addresses the poor interpretability of the model's underlying working principle, quantifies the influence of the input features on ship fuel consumption, and further validates the contribution of the WBM to improving GBM prediction performance. In addition, the importance analysis of the input features provides an effective reference method for selecting the best input features for ship fuel consumption prediction, taking into account both prediction performance and input cost.
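The contributions state that the GBM combines the WBM and BBM through a chaining strategy. One common reading of chaining, assumed here, is that the WBM's fuel estimate is appended to the measured features and the augmented vector is passed to the black-box learner. In the sketch, a plain k-nearest-neighbour regressor stands in for the study's BBMs (such as CatBoost); the white-box term, feature layout, and data are hypothetical, and the inputs are assumed to be pre-scaled so that no single feature dominates the Euclidean distance.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Plain k-nearest-neighbour regression: mean target of the k closest rows."""
    ranked = sorted(zip((math.dist(row, query) for row in train_x), train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def chain(rows, wbm):
    """Chaining step: append the white-box estimate to every feature vector."""
    return [list(row) + [wbm(row)] for row in rows]


def gbm_predict(train_x, train_y, query_rows, wbm, k=3):
    """Gray-box prediction: a black-box learner fed with features plus the WBM output."""
    train_chained = chain(train_x, wbm)
    return [knn_regress(train_chained, train_y, q, k) for q in chain(query_rows, wbm)]


# Hypothetical white-box term: fuel rate grows with the cube of scaled speed.
def toy_wbm(row):
    return row[0] ** 3


train_x = [[0.1, 0.5], [0.4, 0.2], [0.6, 0.9], [0.9, 0.3]]  # e.g. scaled speed, wind
train_y = [toy_wbm(r) + 0.1 * r[1] for r in train_x]  # synthetic targets
print(gbm_predict(train_x, train_y, [[0.55, 0.8]], toy_wbm, k=2))
```

Feeding the physics-based estimate in as a feature lets the learner start from a physically meaningful signal and learn only the corrections the WBM misses.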
4. Conclusions
In this study, the passenger roll-on/roll-off vessel MS Smyril was used as the research object. Based on sensor data describing the ship's characteristics and environmental factors, including nineteen influencing factors such as the SOG, latitude and longitude, longitudinal inclination (trim), wind, fuel density, and heading, three types of ship fuel consumption prediction models (WBM, BBM, and GBM) were established to map the fuel consumption per unit time and its underlying patterns. The prediction performance of the three model types is compared horizontally, and the prediction performance of BBM and GBM models based on different algorithms is compared vertically, to determine the best model for ship fuel consumption prediction. The SHAP method is also used to analyze the importance of the input features affecting ship fuel consumption from a global perspective, which addresses the poor interpretability of the GBM. In addition, the model prediction performance under different subsets of input features is compared based on the SHAP importance ranking to determine the best input features for predicting ship fuel consumption. The following conclusions can be drawn:
In this study, the WBM, together with BBMs and GBMs based on six different ML algorithms, was established and tested on sensor data from a passenger roll-on/roll-off vessel. The experimental results show that the prediction error of the WBM is much higher than that of the BBM and GBM, so the WBM cannot be used effectively to predict fuel consumption during actual ship operation. Ensemble learning algorithms based on boosting, especially CatBoost, show the best prediction performance, with RMSE values below 50 L/h on the test dataset. The GBM further improves prediction accuracy by incorporating the prior knowledge of the WBM and can meet the demands of ship fuel consumption prediction. In practical engineering applications, companies only need to integrate the CatBoost-based GBM into their fuel consumption prediction systems to achieve accurate fuel consumption prediction.
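The models above are compared by RMSE on the test set, which is expressed in the same unit as the target (here L/h). For reference, a minimal sketch of RMSE together with MAE and the coefficient of determination R², which are commonly reported alongside it; whether this study also used MAE or R² is not stated in this section, so those two are assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """RMSE and MAE (same units as the targets, e.g. L/h) and the coefficient R^2."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)  # sum of squared errors
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - sse / sst,
    }


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```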
This paper combines the SHAP framework with the CatBoost-based GBM to provide a method that can accurately analyze the relative importance of different influencing factors on fuel consumption. The experiments verified that the prior knowledge of the WBM improves the predictive performance of the GBM, and found that the four most important input factors affecting fuel consumption were port pitch, starboard pitch, fuel temperature, and SOG.
Nineteen GBMs based on different input feature subsets were established according to the feature importance ranking provided by SHAP. Based on the prediction evaluation metrics, the best input features were selected from the 19 GBMs by considering prediction accuracy, data collection cost, and computational cost. The selected subset can predict ship fuel consumption with fewer input features while maintaining acceptable prediction accuracy, and provides a reference for selecting input features in sensor-based ship fuel consumption prediction.
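The nested-subset procedure can be sketched as follows. The exact selection rule used to weigh accuracy against data-collection and computation cost is not given here, so the sketch assumes a simple one: take the smallest top-k subset of the SHAP ranking whose validation error is within a relative tolerance of the best error. The `evaluate` callable is a hypothetical placeholder for training and scoring a GBM on a feature list, and the feature names and error table are invented for illustration only.

```python
def select_feature_subset(ranked_features, evaluate, tolerance=0.05):
    """Choose the smallest top-k prefix of SHAP-ranked features whose error is
    within a relative `tolerance` of the best error over all prefixes.

    ranked_features: feature names sorted by decreasing mean |SHAP| value.
    evaluate: callable returning a validation error (e.g. RMSE) for a feature list.
    """
    errors = [evaluate(ranked_features[:k]) for k in range(1, len(ranked_features) + 1)]
    threshold = min(errors) * (1.0 + tolerance)
    # The best prefix always satisfies the threshold, so next() always finds one.
    k_best = next(k for k, err in enumerate(errors, start=1) if err <= threshold)
    return ranked_features[:k_best], errors


# Hypothetical ranking and a stand-in error table indexed by subset size.
ranked = ["f1", "f2", "f3", "f4", "f5"]
toy_errors = {1: 100.0, 2: 60.0, 3: 52.0, 4: 50.0, 5: 50.5}
subset, errors = select_feature_subset(ranked, lambda feats: toy_errors[len(feats)])
print(subset)  # ['f1', 'f2', 'f3']
```

A larger tolerance favors cheaper sensor setups at some cost in accuracy; a cost-weighted objective would be an alternative if per-sensor costs were known.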
It is worth noting that the SHAP-based GBM proposed in this paper still requires further research to perform better in practical engineering applications. On the one hand, more high-quality input factors (e.g., ship maintenance records) should be incorporated to further improve the stability of the prediction performance and expand the model's scope of application. On the other hand, the modeling approach proposed in this study is not specific to a particular ship type; it aims to establish a unified, universally applicable fuel consumption prediction model built on high-performance sensing data, so that researchers can choose appropriate sensor arrangements and model structure assumptions according to actual conditions to predict ship fuel consumption. The prediction performance of the model has been confirmed in a specific case. By randomly dividing the original data into five different training and test sets, it was further demonstrated that the proposed model is not affected by data randomness or time dependence. In the future, more data can be collected from different voyages or ships to further verify the generality of the model and to improve both the fuel consumption prediction model and the interpretability framework. The proposed model can therefore provide a reference for the IMO and shipping companies in addressing the environmental sustainability of shipping.
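The repeated random division into five training and test sets can be sketched as below. The split ratio and seed are not given in this section, so the 80/20 ratio and the fixed seed are assumptions; the function only produces index lists and leaves model training to the caller.

```python
import random


def repeated_random_splits(n_samples, n_repeats=5, test_fraction=0.2, seed=0):
    """Yield (train_indices, test_indices) for n_repeats independent random shuffles."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield sorted(indices[n_test:]), sorted(indices[:n_test])


for train_idx, test_idx in repeated_random_splits(10, n_repeats=2):
    print(len(train_idx), len(test_idx))
```

Because consecutive sensor samples are strongly correlated, a random split tends to place near-identical samples in both sets; a voyage-wise or time-blocked split would be a complementary check of time dependence.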