1. Introduction
The international shipping community is paying more attention to the issue of greenhouse gas (GHG) emissions with the gradual warming of the global climate. According to the Fourth International Maritime Organization (IMO) GHG Study, the carbon intensity (i.e., CO2 emissions per unit of Gross Domestic Product) of international shipping decreased by 10.7% between 2012 and 2018, while annual GHG emissions rose by 9.6% [
1]. In general, the international shipping industry accounts for approximately 2% of global anthropogenic GHG emissions [
2]. Meanwhile, shipping companies are more concerned about the energy efficiency of their ships due to the increasing proportion of fuel costs relative to operating costs [
3,
4]. Energy efficiency improvement and fuel consumption reduction are essential to decrease operating costs and enhance maritime operations’ sustainability [
5].
As ships have gradually become colossal sensor hubs, they generate a massive volume of data [6]. Analyzing and monitoring these data can lead to methods for optimizing energy usage. Mathematical and machine learning methods have been used broadly across industries concerned with data-intensive applications [7]. Shipping companies apply mathematical and machine learning models to analyze the data as an energy optimization and decision support system [8,9]. Such a model reflects the correlation between ship fuel consumption and other parameters, such as speed, main engine power, and weather information [10]. Therefore, we can employ fuel consumption models as a robust tool to predict and study the fuel consumption law under different sailing states of ships [11]. The models need to be accurate on the validation and test sets and capable of reflecting the results in actual situations. In this respect, we are committed to building the models and improving the prediction accuracy under specific requirements.
The rest of the paper is organized as follows. Section 2 reviews existing research on ship energy consumption prediction using mathematical and machine learning models. Section 3 describes the data and the steps of data preprocessing. In Section 4, we build one white-box model and two black-box models and propose a data-cleaning method for the black-box model that improves the performance in a specific scenario. In Section 5, we evaluate the models’ accuracy and interpret the prediction results under simulated conditions. Section 6 discusses the effects of these models and the data-cleaning method, followed by the conclusions in Section 7.
2. Related Work
Prior research provides a foundation for ship fuel consumption models. The prediction model is the basis of various optimizations and analyses, and existing approaches mainly comprise mechanism-based analysis and machine learning methods.
The propulsion principles and mathematical analysis of fuel consumption form the basis of the white-box model [
12], which includes ship statics and dynamics [
13,
14]. Ref. [
15] modeled the fuel consumption mechanism based on the principle of the ship engine propeller and the law of resistance transfer. Ref. [
16] optimized the navigation process using mathematical modeling of ship energy consumption, such as the resistance in different wind and wave conditions. The white-box models are connected internally; therefore, the internal parameters are easily affected by the environment, which incurs errors in the entire model [
11]. In addition, the internal parameters cannot be adjusted during the voyage, and the limitation of the resistance calculating formula causes the over-time changes in the propulsion system’s operating parameters to be ignored [
8]. Ref. [
17] presents a six-degree-of-freedom (6DOF) ship performance model to evaluate the best method of using a pair of Flettner rotors and analyzes the performance of this propulsion system in consideration of weather and sea conditions, evaluating the related reduction in fuel consumption.
While the white-box model represents the relationship hidden in the formula, the black-box model finds a relationship based on data, which helps to explain information and make decisions [18]. Therefore, it is necessary to analyze the data generated by ships in different states through classification and clustering methods. Statistical analysis is one of the common and widely accepted black-box methods and remains interpretable to some extent [15]. Ref. [19] analyzed engine fuel consumption rates at different trim values and achieved optimal sailing conditions by identifying different draft values. The authors of [20] proposed a data processing framework, including preprocessing, post-processing, a data-driven model, sensors, and fault identification.
However, the authors of [11] state that machine learning models sacrifice interpretability but gain predictive accuracy compared with statistical analysis. Ref. [21] calibrated fuel consumption–speed curves by polynomial regression based on 418 noon-report records, thus obtaining a set of ship fuel consumption–speed curves that can be used under most weather and loading conditions. Ref. [
22] proposed the fine segmentation of the shipping route using the Hadoop and MapReduce frameworks [
23] by applying the ship’s sensor data. They optimized the engine speed of inland ships by finding the optimal segment set using the particle swarm optimization algorithm. Ref. [
24] produced an artificial neural network (ANN) model using the noon report data. Then, they optimized the speed and trim by a two-stage, shore-based, and offshore optimization method during navigation. Because of the nonlinearity of the ANN model, the authors proposed a dynamic programming algorithm to solve the objective function of the optimization problem. Optimizing speed and trim can reduce ships’ fuel consumption by 2–7% in actual navigation. Ref. [
25] proposed a random forest model for predicting the fuel consumption of dry bulk carriers based on 242 noon-report records. The mean absolute percentage error (MAPE) reached 7.91% in the model’s evaluation results. Moreover, 6.53% of fuel consumption can be saved after speed optimization. However, there is an inherent uncertainty in noon-report data [26], which can be addressed by using data from onboard continuous monitoring systems. Ref. [27] studied the performance of three data-based models: black-box (BBM), white-box (WBM), and gray-box (GBM) models. The authors proposed a new strategy for optimizing the trim of a vessel, and the results showed that the BBM can remarkably improve on the state-of-the-art WBM, while the GBM can encapsulate the a priori knowledge of the WBM into the BBM.
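For reference, the MAPE quoted above is the mean of the absolute relative errors, expressed as a percentage. The following is a minimal Python sketch of that metric; the function name and the sample values are illustrative assumptions and are not taken from any of the cited studies.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the measured value,
    # so measured values of zero are not allowed.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Illustrative values only (e.g., tons of fuel per day); not real noon-report data.
measured = [20.0, 25.0, 40.0]
forecast = [22.0, 24.0, 36.0]
print(round(mape(measured, forecast), 2))
```

Because each error is divided by the measured value, MAPE weights a given absolute error more heavily when consumption is low.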
The black-box models can capture the impact of weather/sea conditions and other external factors on ship fuel consumption from continuous sensor data [
28,
29,
30]. The accuracy and simplicity of the black-box model can also provide an illustration and potential for ship energy consumption analysis [
31,
32]. The importance of data preprocessing has increased due to the black-box models’ dependence on data, which can reflect the relationship between the various parameters of the ship. Ref. [
33] removed NaN values, zeros, and measurement errors from the speed and fuel consumption values in sensor data. Ref. [34] identified and rejected engine transients and recording anomalies, extracted valuable features, and standardized them. Ref. [35] detected and synchronized data discontinuities in time. They also removed the ship’s maneuvering (dynamic) conditions in the sea passage, such as voluntary acceleration and deceleration, sharp power increases, and sharp course changes. Ref. [36] proposed a data-driven solution based on deep learning sequence methods and historical ship trip data to predict ship speeds at different stages of a voyage. The results showed that deep learning models combined with maritime data can address the challenge of estimating ship speed and improve shipping operational efficiency, navigation safety and security, and ship emissions estimation and monitoring. Ref. [37] applied artificial neural networks (ANN) to predict the total fuel consumption of ships in various operational scenarios and used state-of-the-art deep learning techniques to train and optimize feedforward neural networks (FNN). The performance of the ship’s propulsion model can thereby be improved, leading to a better understanding of the ship’s performance regulation and a reduction of fuel consumption and emissions. Ref. [
38] introduced an innovative platform that coordinates data collected from various sensors on board through Big Data technology and implements extreme-scale processing techniques to perform operational efficiency and performance optimization. The technology of data collection and processing has been gradually improved.
Based on the above literature analysis, we take an oil tanker’s continuous dataset, including ship parameters and ship model test data, as the research object. This paper establishes two black-box models and a white-box model. We propose a data-cleaning method that uses Kwon’s formula as the primary calculation method. We also discuss the prediction accuracy of the different models as a basis for future research.
3. Data Description and Preprocessing
The raw sensor dataset in this study is a sailing test case from an oil tanker and contains 496 data features (operating parameters). During the trial, in which we participated, sensor failures caused fault signals in the collected data. After these were excluded, 378,468 data records (4.38 days) were retained, collected at one-second intervals. Raw data usually contain noise that can cause over-fitting or mislead the model’s decisions [39], which lowers the generalization ability of the model. Data preprocessing always has an essential effect on the generalization performance of a supervised machine learning algorithm [40].
In this study, most features are alarm signal detection points and various temperature, pressure, flow, and other detection signals. Fuel consumption matters most to shipping companies because it reflects navigation economy more directly than rpm or power. To this end, the fuel consumption rate (FCR) has been chosen as the model output. We need to filter the data and select the modeling features relevant to the ship’s fuel consumption, such as speed, engine power, trim, and draft. Furthermore, the model needs to consider the influence of the selected parameters on fuel consumption. Because the strong correlation between engine power and fuel consumption would mask the impact of ship speed and other features, the interior features selected for modeling are speed, FCR, trim, and fore and aft draft. External features such as wind and waves also affect ships’ fuel consumption. Because the sensor dataset lacks wave and current features, we parsed wave data (wave height and wave direction) from ECMWF (European Centre for Medium-Range Weather Forecasts) and matched them to the sensor data by geographical location (latitude and longitude) and collection time. To account for the relationship between absolute wind direction and ship heading, we calculated the angle between the two as the relative wind direction. Because the ship is symmetric, wind from the port side and wind from the starboard side have the same impact, so the relative wind direction 0°∼360° is converted to 0°∼180°; 0° means the wind is from the bow and 180° means it is from the stern.
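The relative wind direction step can be sketched as follows. This is a minimal illustration rather than the exact procedure used in the study: it assumes that both the wind direction (the direction the wind comes from) and the heading are given in degrees clockwise from north, and that the folded angle runs from 0° (bow) to 180° (stern). The function name is hypothetical.

```python
def relative_wind_direction(wind_dir_deg, heading_deg):
    """Fold the angle between wind direction and heading into [0, 180] degrees.

    Assumed convention: 0 = wind from the bow, 180 = wind from the stern.
    """
    diff = (wind_dir_deg - heading_deg) % 360.0  # 0 <= diff < 360
    # Port and starboard are treated as equivalent, so mirror angles above 180.
    return 360.0 - diff if diff > 180.0 else diff


# Wind 40 degrees off the starboard bow and 90 degrees off the port beam.
print(relative_wind_direction(30.0, 350.0))
print(relative_wind_direction(270.0, 0.0))
```

The modulo operation handles headings near north, where the raw difference between the two angles would otherwise be negative or exceed 360°.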
After extracting the above data features from the sensor dataset, the following data preprocessing was carried out. First, we removed records in which the FCR, wind speed, or wind direction was lower than 0. Second, we removed records with speed outside the range of 10 to 16.8 knots (determined by the ship design speed and the maximum speed under full load); once the data below 10 knots are excluded, no berthing or start-up acceleration records remain. The resulting dataset includes nine features: ship speed (V), fore draft, aft draft, trim (T), wave height, wave direction, absolute wind speed, wind direction, and FCR, with 147,845 rows after these steps.
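The two filtering steps described above can be sketched in plain Python as follows. The record layout and field names (`speed`, `fcr`, `wind_speed`, `wind_dir`) are assumptions made for illustration, the sample records are invented, and the speed bounds are treated as inclusive, which the text does not specify.

```python
SPEED_MIN_KN = 10.0  # lower speed bound from the text
SPEED_MAX_KN = 16.8  # upper speed bound from the text


def keep_record(rec):
    """Return True if a record passes both preprocessing filters."""
    # Step 1: drop records with negative FCR, wind speed, or wind direction.
    if rec["fcr"] < 0 or rec["wind_speed"] < 0 or rec["wind_dir"] < 0:
        return False
    # Step 2: keep only speeds inside the 10 to 16.8 knot range (inclusive here).
    return SPEED_MIN_KN <= rec["speed"] <= SPEED_MAX_KN


# Invented sample records; not data from the study.
records = [
    {"speed": 12.5, "fcr": 2.1, "wind_speed": 6.0, "wind_dir": 45.0},   # kept
    {"speed": 8.0, "fcr": 1.2, "wind_speed": 4.0, "wind_dir": 90.0},    # too slow
    {"speed": 13.0, "fcr": -1.0, "wind_speed": 5.0, "wind_dir": 10.0},  # faulty FCR
    {"speed": 17.5, "fcr": 3.9, "wind_speed": 7.0, "wind_dir": 120.0},  # too fast
]
cleaned = [r for r in records if keep_record(r)]
print(len(cleaned))
```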
Figure 1 shows the distribution of ship speed and fuel consumption, and the statistics of the data are shown in Table 1. In Figure 1, the fuel consumption values span 1 to 4 tons/h at speeds below 14 knots, so the cubic relationship between speed and fuel consumption is not easy to see. During the ship’s sea trial, acceleration and deceleration produce high fuel consumption at low speeds that does not represent standard navigation. Further data processing is necessary if optimization analysis is to be performed based on the model.
6. Discussion and Future Work
Based on the assessment outcomes, all the models forecasting the FCR within their respective conditions achieve an acceptable accuracy. Moreover, the predictions of the black-box models need to reflect the regular relationship between speed and fuel consumption. The quality of the data collected during the voyage directly affects the model’s accuracy. The machine learning methods behind the black-box models are trained on data and, in some cases, cannot reflect prior experience, for example, that fuel consumption is related to the cube of speed. It is therefore necessary to verify accuracy and also to discuss the cases in which prior experience is taken into account.
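One simple way to compare a black-box prediction against the cubic prior is to fit a one-parameter baseline of the form FCR = k·V³ and inspect how far the predictions depart from it. The sketch below is a simplification assumed for illustration and is not one of the models built in this paper; the coefficient and data points are synthetic.

```python
def fit_cubic_coefficient(speeds, fcrs):
    """Least-squares k for the one-parameter model FCR = k * V**3."""
    # Setting the derivative of sum((f - k*v^3)^2) to zero gives
    # k = sum(v^3 * f) / sum(v^6).
    num = sum((v ** 3) * f for v, f in zip(speeds, fcrs))
    den = sum((v ** 3) ** 2 for v in speeds)
    return num / den


# Synthetic points generated from k = 0.001 exactly; illustrative only.
speeds = [10.0, 12.0, 14.0, 16.0]
fcrs = [0.001 * v ** 3 for v in speeds]
k = fit_cubic_coefficient(speeds, fcrs)
print(round(k, 6))
```

A pure cubic law ignores draft, trim, wind, and waves, so it serves only as a sanity check on the shape of the speed and fuel consumption curve.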
During sailing, acceleration and deceleration occur, and they are the main reason for the higher fuel consumption values at lower speeds in the raw data (or lower fuel consumption values at higher speeds, as shown in Figure 1). These values are not outliers, though they affect the reliability of ship energy consumption prediction and related optimization research. Although we filtered out the data with speed values lower than 10 knots in the preprocessing, accelerations and decelerations above 10 knots remain. Such a process lasts only a short time compared with regular sailing (more than ten days), especially for ocean-going vessels, but there are many of them here because the data come from a sailing test. Data from these processes, if left in, will affect the decisions of the models. As a cleaning method, calculating the additional fuel consumption of a ship in wind and waves can largely resolve the problem. Above all, accuracy that does not reflect the real situation is not reliable for some research cases.
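The general idea of cleaning against an empirical reference can be sketched as a relative-deviation filter. The code below is only an illustration under stated assumptions: the reference function is a hypothetical placeholder (a bare cubic law with an invented coefficient) and is not Kwon’s formula, and the 10% tolerance and the sample records are arbitrary.

```python
def clean_by_reference(records, reference_fcr, tolerance):
    """Keep records whose measured FCR is within a relative tolerance of a reference."""
    kept = []
    for rec in records:
        expected = reference_fcr(rec)
        if abs(rec["fcr"] - expected) / expected <= tolerance:
            kept.append(rec)
    return kept


def placeholder_reference(rec):
    # Hypothetical stand-in: a bare cubic law with an invented coefficient.
    # It is NOT Kwon's formula and ignores wind and waves entirely.
    return 0.001 * rec["speed"] ** 3


# Invented records; speed in knots, FCR in tons/h.
records = [
    {"speed": 14.0, "fcr": 2.80},  # close to the reference value
    {"speed": 11.0, "fcr": 3.50},  # high consumption at low speed
    {"speed": 15.0, "fcr": 1.50},  # low consumption at high speed
]
kept = clean_by_reference(records, placeholder_reference, tolerance=0.10)
print([r["speed"] for r in kept])
```

The tolerance controls how aggressive the cleaning is: a tighter value removes more acceleration and deceleration records but also makes the retained data follow the reference formula more closely.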
Additionally, the cleaning method has some limitations that should be taken into consideration in real applications. Firstly, it requires detailed and sensitive information and test data on the ship and engine. Secondly, using the Kwon formula to clean the data carries risk. If the cleaning process is too aggressive, the model may end up being trained on the cleaning method rather than on the raw data. Some of the forced elimination of data may be unjustified because the Kwon formula cannot fully capture different weather and ship states; after all, the Kwon formula still has an error (4%). Therefore, not all of the data flagged as abnormal are truly abnormal, and removing them could hinder the accuracy of the model. It is essential to balance the need for cleaning the data against maintaining the integrity of the raw data to achieve the best results. The filtering during preprocessing also leaves gaps between the preprocessed and original data. These gaps disrupt the time series of the original data, making it impossible to build a time-series model. In our following research on dynamic optimization, we will study the acceleration and deceleration data eliminated by the Kwon formula, since these data hold some value, and include dynamic speed optimization that considers the acceleration and deceleration process. We will also test the model’s performance in the speed optimization process and conduct a comparative analysis with the real ship to ensure the feasibility of the model in actual sailing. Under the current IMO policy, it is imperative to reduce the fuel consumption of the whole voyage in different conditions.
7. Conclusions
Ship fuel consumption and emission reduction are significant challenges facing the maritime industry. In order to reduce the fuel consumption and emissions of ships, shipping companies optimize their operation strategies through speed, route optimization, load optimization, ship maintenance, and fuel management. An accurate fuel consumption prediction model is the basis for implementing these optimization strategies. In this research, two black- and one white-box models were built to predict the ship’s fuel consumption using the sensor data and main engine parameters. A data cleaning method was proposed to calculate the additional fuel consumption caused by wind and waves.
Amongst the models, the white-box model predicts with an overall error of 4%. For the black-box models, the R² values of the XGBoost and RF models on the test set are 0.9977 and 0.9922, respectively, and these values reached 0.9973 and 0.9921 on the validation set. After applying the Kwon cleaning method, the R² of the XGBoost model was still 0.9954. The agreement between the validation and test accuracy shows that the models are not over-fitting, and the results also confirm that the white-box model built from the main engine and ship parameters can predict fuel consumption. The machine learning models can accurately predict fuel consumption based on input parameters such as speed, trim, draft, and weather conditions. However, in Figure 9, the change in R² is less than 0.01, showing that the benefit of hyperparameter optimization is insignificant. Additionally, the RF and XGBoost models are similar in that both are based on decision trees. We will continue to build more models to explore different results.
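For readers less familiar with the metric, the coefficient of determination reported above can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The following minimal Python sketch uses invented values, not results from this study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared deviations from the mean of the measurements.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative FCR values in tons/h; not data from this study.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.9]
print(round(r_squared(measured, predicted), 3))
```

Computing the same quantity separately on the validation and test sets, as done in the paper, gives a quick indication of over-fitting when the two values diverge.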
The data-cleaning method demonstrates that empirical formulas can improve data quality. The prediction results under ten simulated wind and wave conditions show that the data-cleaning method effectively eliminates the low (high) speed and high (low) fuel consumption values generated by the acceleration and deceleration process. Our research study provides a reference for shipping companies and ship data analysis.