Assessing Ships’ Environmental Performance Using Machine Learning

Skarlatos, Kyriakos; Fousteris, Andreas; Georgakellos, Dimitrios; Economou, Polychronis; Bersimis, Sotirios

doi:10.3390/en16062544


by Kyriakos Skarlatos ¹, Andreas Fousteris ¹, Dimitrios Georgakellos ¹,*, Polychronis Economou ²,* and Sotirios Bersimis ¹

¹ Department of Business Administration, University of Piraeus, 18534 Piraeus, Greece

² Department of Civil Engineering, University of Patras, 26504 Patras, Greece

* Authors to whom correspondence should be addressed.

Energies 2023, 16(6), 2544; https://doi.org/10.3390/en16062544

Submission received: 13 February 2023 / Revised: 1 March 2023 / Accepted: 6 March 2023 / Published: 8 March 2023


Abstract:

Environmental performance of ships is a critical factor in the shipping industry due to evolving climate change and the respective regulations imposed by authorities all over the world. As shipping moves towards digitization, a large amount of ships’ environmental performance-related data, collected during ships’ voyages, provide opportunities to develop and enhance data-driven performance models by using different machine learning algorithms. This paper introduces new indices of ships’ environmental performance using machine learning techniques. The new indices are produced by combining clustering algorithms as well as principal component analysis. Based on the analysis of the data (14 variables with operational and design characteristics), the ships are divided into four clusters based on the new suggested indices. These clusters categorize the ships according to their physical dimensions, operating region, and operational environmental efficiency, offering insight into the distinctive traits of each cluster.

Keywords:

ship’s environmental performance; machine learning in shipping; data-driven environmental indices; shipping environmental categorization

1. Introduction

Ships have been the dominant means of transporting goods for many years, and more than 80% of world trade is transported by sea [1]. The increased interest of the global community in reducing environmental pollution has led to the introduction of new regulations by the authorities to improve vessels’ energy and operational efficiency. However, the total greenhouse gas (GHG) emissions produced by the shipping industry increased by 9.6% from 2012 to 2018 [2]; by 2050, shipping emissions are projected to increase by 90–130% from 2008 levels [3].

The International Maritime Organization (IMO), as the main regulatory body for international shipping, with the adoption of the OILPOL Convention in 1954, introduced environmental regulations in the shipping sector. More recently, the IMO has adopted mandatory operational and technical measures, and committed to controlling GHG emissions via technological improvements, operational performance indicators, and the use of alternative fuels [3,4].

Therefore, the IMO introduced the Energy Efficiency Design Index (EEDI) for new ship design, which sets a minimum $\mathrm{CO}_{2}$ emission per cargo carried, the Energy Efficiency Operational Indicator (EEOI), and the Ship Energy Efficiency Management Plan (SEEMP) for all ships, aiming to improve the operational energy efficiency of ships by using operational strategies and practices [3,5,6]. Most of the measures available are speed based, because ships’ energy efficiency is highly sensitive to speed, which therefore has a significant impact on the reduction of GHG emissions [7,8]. The monitoring of such measures does not involve the investment of new funds or incur significant costs.

In addition, the European Commission (EC) presented several initiatives to limit GHG emissions [9], speed up decarbonization by setting the target of a climate-neutral Europe by 2050, and incorporate maritime transportation in emissions trading [10]. Moreover, a wide range of research groups, bodies, and authorities promote new energy indicators, such as the Clean Shipping Index (CSI), the Environmental Ship Index (ESI), and Rightship’s Existing Vessel Design Index (EVDI).

The shipping industry can improve its environmental performance and meet the targets either through ship design or operation-related measures. However, as [11] argue, although devices for data monitoring have a relatively low cost, the data processing method is quite complex, particularly when the activities of a ship vary. Therefore, energy inefficiencies can occur due to the limited information about energy efficiency and the lack of timeliness with which this information is produced and provided, as [12] concluded. The performance modeling of a ship can be achieved with multiple levels of sophistication [13], such as theory-based models or data-driven models. The former were mainly developed for ship design purposes and carry significant uncertainties about a ship’s operational measures [14].

The shift of the shipping industry towards a more digital era has led to the collection of large amounts of data related to energy consumption (e.g., Kyma and Lean Marine systems). As a result, the shipping ecosystem aims to use the collected data to improve the operational efficiency of ships, whether this concerns their design or their maintenance plan. All stakeholders are therefore eager to make deeper use of complex machine learning methods and to develop data-driven performance models with high prediction accuracy.

This eagerness does not characterize only the shipping industry but every scientific and industrial field. A key tool to fulfill this eagerness and to develop and apply advanced machine learning techniques is the integration of computer science and statistics, as well as the theoretical foundation of artificial intelligence and data science. Thus, it is not surprising that machine learning is one of the technical domains with the fastest growth rates today resulting in the creation of new learning theories and algorithms, their application in new cases and fields, as well as the continual explosion in the accessibility of online data and low-cost processing [15].

One such example is artificial neural networks (ANNs), which have been used to estimate the shaft power of large merchant ships via data-driven performance models [16,17,18,19]. Moreover, a Bayesian belief network (BBN) was applied to the interface of a dry-bulk ship with the port to quantify energy performance [20]. The capabilities of ANNs and multiple linear regression (MLR) were compared in [21] to establish the relationship between fuel consumption and main engine RPM, ship speed, etc. Both ANNs and Gaussian Process Regression (GPR) were applied by [22] to predict the fuel consumption in relation to shaft power and ship speed.

Furthermore, many products that are using machine learning algorithms and utilizing the available ship performance data have already been developed and launched to the market (e.g., BMT, GreenSteam, HITACHI, and NAPA). However, most of these models are not easily understood and their sensitivity and accuracy are not well defined or explained [23].

The current slow pace of change has increased the pressure on regulatory bodies to intensify their effort and improve their effectiveness, making it difficult to predict future shipping industry trends. In addition, the absence of standardized measurement of environmental performance, due to the complexity of calculations and the dependency on the quantity and quality of data input, makes it a challenging and time-consuming task for humans to assess and implement a holistic approach. Hence, it is of paramount importance for the proper data analysis and the industry experience to be combined.

In this context, the present work intends to provide useful objective indices to aid the assessment of commercial ships’ environmental performance based on machine learning. Thus, this paper is organized as follows: the two next sections are related to the theoretical background and the proposed methodological framework, respectively, while Section 4 presents the application of the proposed methodology as well as the results and the proposed indexes incorporated in a graphical tag. Finally, in Section 5, concluding remarks are given as well as some directions for further research.

2. Theoretical Background

The concern for sustainable transformation in maritime has been at the top of the agenda for many years now. However, it involves complex decisions and multiple factors that must be considered [24]. Hence, most of the decisions that need to be made to improve the environmental performance of vessels and the general shipping industry have conflicting results. As a result, it is difficult to minimize emissions and at the same time maximize service levels [25]. For this reason, most of the existing management decision systems focus on cost or operational performance indexes [26].

The keen interest in environmental sustainability has led to extensive research; however, many of the recommended solutions are theoretical and impracticable. In addition, the multiple and controversial environmental initiatives available to the shipping industry do not offer clarity in making decisions and create additional administrative burden [12]. Further, many of the current studies propose solutions that focus only on the technical side, such as the use of alternative fuels [27], fuel life cycle calculations [28], hull cleaning [29], and vessel design [30,31].

In the existing literature, some initiatives provide indications about vessels’ performance based on environmental factors that are considered to be performance-related and others are developed as incentive schemes where environmental improvements to vessels or practices are rewarded with certifications or class notations, and consequently provide a market advantage [9]. Some other initiatives deal with a single environmental issue or have been developed for a specific use, location, or vessel type, while others assess a broader range of environmental issues and provide an overview of vessels’ environmental performance. However, the effectiveness of these initiatives in improving environmental performance has been questioned. A comparative analysis of the CSI and the ESI suggested that there are several drawbacks in assessing environmental performance [32]. In their study, ref. [33] was cautious about the contribution of “private standards” in mitigating GHG in shipping due to the lack of transparency and the ambition of several schemes analyzed.

In the literature, several studies exist regarding the modeling of vessel fuel consumption and emissions. The traditional “resistance modeling”, with the objective to estimate the vessel’s total resistance in relation to speed and external factors (e.g., wind and waves), is the theoretical foundation of ship fuel consumption [34,35]. However, it cannot handle complex issues, which is why alternative methods have been developed [36,37,38,39,40]. In general, these studies confirm that the speed of a vessel is the principal factor of fuel consumption, although resistance, due to weather, also has a significant influence [29,40].

The approach, the complexity, and the use of raw data are critical to achieving accurate and well-understood results related to the ship’s environmental performance. By applying ANN models, ref. [41] achieved prediction of propulsive power from the indicators that mainly affect vessel resistance (speed, wind speed, direction, temperature, etc.). Other empirical studies have applied ship data from noon reports [29,40,42] or vessel positions from the Automatic Identification System (AIS) [43]. Moreover, ref. [44] confirmed that the use of ANN-based fuel prediction is appropriate to analyze the bunker fuel efficiency of a single oil tanker when noon reports are the primary source of information. Furthermore, ANN models outperform traditional models, such as polynomial regression and support vector machine (SVM) learning, in accuracy and efficiency [45].

This paper proposes an alternative method for assessing ship environmental performance based on machine learning by using an objective and quantified approach.

3. Materials and Methods

The framework used in this paper makes extensive use of machine learning techniques to create a new composite energy efficiency index based on real ship operational data (see Figure 1 for the simplified framework process). The actual framework combines Principal Component Analysis (PCA) and clustering techniques to acquire from real data a new combined efficiency index and aims to minimize the number of parameters characterizing the environmental performance of a certain ship.

The best scenario is to conclude with one representative artificial environmental performance index containing the total information (or as much as possible) from the data. Nevertheless, even if only one environmental performance index could summarize the information contained in the data while mixing the various meanings of information, it would still be difficult to draw useful conclusions from it. An alternative and possibly more informative scenario would be the extraction of more than one index incorporating different information (e.g., pollution level and/or pollution reason) from the data providing practical interpretations.

For acquiring appropriate indices from the data, the PCA will be used and then Cluster Analysis (CA) will be applied to create groups of ships with similar environmental performance. PCA is a renowned method that has been applied in a wide range of scientific problems, especially in industry (see, for example, [46]), to reduce the dimensionality of the data at hand, taking into consideration the relations among variables. Moreover, PCA has been used historically to produce environmental performance-related indices in various production fields [47,48].

3.1. Principal Component Analysis

PCA is a mathematical technique [49] that does not make any assumptions about the nature of the data (e.g., the distribution of the available variables). PCA uses an orthogonal transformation to convert several dependent variables into a reduced number of linearly uncorrelated variables called principal components (PCs). PCA is used for revealing the internal structure of the data in a way that best explains the variance in the data [49]. An interesting feature of this method is that the extracted PCs may be appropriately interpreted or labeled by identifying which of the original variables contribute to each of the PCs.

Assuming that there are $p$ original variables (say $X_{i}$, $i = 1, 2, \dots, p$), each of the $p$ PCs (say $Y_{i}$, $i = 1, 2, \dots, p$) may be written as a linear combination of the original variables. Specifically, the $j$th PC can be written in the following form:

$$Y_{j} = a_{j1} X_{1} + a_{j2} X_{2} + \dots + a_{jp} X_{p} \qquad (1)$$

where $a_{ju}$ ($u = 1, 2, \dots, p$) are appropriate weights that quantify the contribution of the $u$th original variable to the $j$th PC. The PCA model is extracted by appropriately decomposing the $p \times p$ covariance matrix $S$ of $X_{i}$, $i = 1, 2, \dots, p$, which contains the variances and the covariances of the original variables ($s_{ij}$ denotes the covariance of the $i$th and the $j$th variable). In the case of standardized variables, the PCA model is extracted by appropriately decomposing the $p \times p$ correlation matrix $P$ of $X_{i}$, $i = 1, 2, \dots, p$, which contains the correlation coefficients among the variables (or in other words, the variances and covariances of the standardized variables). In case the population parameters (matrices) are unknown, appropriate estimators are used.
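The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes NumPy is available, uses the sample correlation matrix as the estimator of $P$, and runs on synthetic data; the function name `pca_from_correlation` and the toy matrix `X` are hypothetical.

```python
import numpy as np

def pca_from_correlation(X):
    """Return eigenvalues, weights (one row per PC) and PC scores for an n x p data matrix."""
    X = np.asarray(X, dtype=float)
    # Standardize each column with the sample standard deviation (ddof=1).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Sample correlation matrix of the original variables.
    R = np.corrcoef(X, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    A = eigvecs[:, order].T   # row j holds the weights a_{j1}, ..., a_{jp} of Equation (1)
    Y = Z @ A.T               # column j holds the scores of the j-th PC
    return eigvals, A, Y

# Synthetic, strongly correlated toy data (hypothetical, not the ship data set).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(3)])

eigvals, A, Y = pca_from_correlation(X)
explained = eigvals / eigvals.sum()   # proportion of variance explained by each PC
```

The eigenvalues sum to $p$ because the correlation matrix has ones on its diagonal, and the sample covariance matrix of the scores `Y` is diagonal, which is the "linearly uncorrelated" property of the PCs.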

Among PCs, the first PC accounts for as much of the information present in the data as possible, and each succeeding PC in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding PCs. Usually, the first two (or three) uncorrelated PCs explain the majority of the information contained in a data set. In cases, such as the case examined here, where just a small number of original variables is available, usually, the two first PCs explain most of the information in a data set. Thus, it is evident that by using PCA the problem under study may be simplified. In Figure 2, the application of PCA in the space defined by two original variables is presented.

Moreover, since each PC is a weighted sum of the random variables $X_{i}$, $i = 1, 2, \dots, p$, where each $X_{i}$ represents one of the original parameters, then by using the Central Limit Theorem (see, for example, [50]), it is assumed that at least the approximate distribution of each PC is Normal or, equivalently, that the distribution of the standardized PCs is approximately Standard Normal. Moreover, since the two PCs are uncorrelated and are assumed to be normally distributed, the lack of correlation is equivalent to the independence of the PCs.

As a consequence, it is evident that the PCA method offers an opportunity to reform the multivariate problem to many univariate problems in the sense that the PCs are independent. This can simplify the procedure of evaluating ships’ environmental performance. Additionally, since PCA permits the interpretation of each PC, it offers an additional tool to assign qualitative meaning to quantitative data.

3.2. Cluster Analysis

CA or clustering refers to algorithms that aim to organize a set of items/observations into groups or clusters so that they share similar (in some manner) characteristics and differ from other observations that belong to other groups. Clustering is a key function of exploratory data analysis and a widely used method for statistical data analysis in various fields.

CA may be accomplished by several algorithms, which vary greatly in their comprehension of what defines a cluster and how to effectively discover them. Some common definitions of clusters include groupings with close spacing between cluster members, crowded regions of the data space, intervals, or certain statistical distributions. Therefore, clustering may be described as a multi-objective optimization problem. The proper clustering technique and parameter settings (including factors such as the distance function to employ, a density threshold, or the number of predicted clusters) rely on the specific data set and the intended application of the findings. The task of such analysis can be viewed as a challenge of categorizing items based on how similar they are to one another. To group objects into clusters, this similarity measure is typically—and in most applications—based on distance functions such as Euclidean distance, Manhattan distance, Minkowski distance, Cosine similarity, etc. A homogeneous group is made up of objects that are sufficiently similar to one another (a cluster). CA as such is not an automatic task but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and failure. It is often necessary to modify data preprocessing and model parameters until the result achieves the desired properties.
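To make the distance functions mentioned above concrete, the following standard-library sketch computes the Minkowski distance (which reduces to the Manhattan distance for order 1 and to the Euclidean distance for order 2) and the cosine similarity. The function names and the two example points are illustrative assumptions.

```python
import math

def minkowski(u, v, p):
    """Minkowski distance of order p; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Two hypothetical observations in a two-dimensional space.
u, v = (0.0, 0.0), (3.0, 4.0)
manhattan = minkowski(u, v, 1)   # |3| + |4|
euclidean = minkowski(u, v, 2)   # sqrt(3^2 + 4^2)
```

Distance-based measures depend on the scale of each variable, which is one reason for standardizing the data before clustering; cosine similarity compares directions only and ignores vector length.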

A graphical representation of an application of CA is presented in Figure 3.

3.3. Available Real Data

To develop new ship environmental performance indices, data were combined from two different sources. The first data source was the EU Monitoring, Reporting, Verification (MRV) mechanism, which collects the $\mathrm{CO}_{2}$ emissions reports for ships above 5000 gross tonnage (regardless of the ship’s flag) operating in ports under the jurisdiction of any EU Member State. The second data source was that of the startup “27 Research” in Greece and was used to extract information regarding the physical characteristics of all the ships with at least one voyage in the EU zone during 2018–2021. The final merged data set consisted of two major groups, namely, General Cargo ships with 2650 records and Container ships with 62 records. The variables recorded for each ship are described in Table 1.

4. Analysis and Results

In this section, the data described above are analyzed using mainly the PCA and CA methods described in Section 3 after a necessary preprocessing procedure and an initial exploratory analysis of the original data to identify possible outliers.

4.1. Data Preprocessing

Before applying the PCA method to the data, the observations in each variable $X_{i}$ are standardized using the following formula:

$$Z_{i} = \frac{X_{i} - \mu_{i}}{\sigma_{i}} \qquad (2)$$

where $\mu_{i}$ is the mean of the $i$th variable and $\sigma_{i}$ is the standard deviation of the $i$th variable. Since, in most cases, the true means and variances are unknown, their unbiased sample estimates can be used.
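Equation (2) with sample estimates can be sketched as follows. The example uses only the Python standard library; the function name `standardize` and the three draught values are hypothetical and chosen so that the arithmetic is easy to follow.

```python
from statistics import mean, stdev

def standardize(values):
    """Apply Equation (2) using the sample mean and the sample standard deviation."""
    m = mean(values)
    s = stdev(values)   # sample standard deviation (divides by n - 1)
    return [(x - m) / s for x in values]

# Hypothetical draught values in metres: mean 8, sample standard deviation 2.
draught = [6.0, 8.0, 10.0]
z = standardize(draught)   # one standard deviation below, at, and above the mean
```

After this step every variable has mean 0 and unit variance, so variables measured in very different units (tonnes, metres, hours) contribute on a comparable scale to the PCA.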

4.2. Exploratory Analysis of the Original Data

After preprocessing the data (381 missing values were removed and 2209 remained), a basic exploratory analysis was carried out to determine the correlations between the variables and to detect possible outliers. Both analyses are crucial for the PCA since they not only justify its use, due to the presence of highly correlated variables, but also result in more concrete PCs by removing any outlier that introduces noise into the multivariate data set.

One of the most noticeable points from the correlation analysis was the positive correlation between fuel consumption ($X_{1}$) and the variables related to environmentally harmful emissions ($X_{2}, X_{3}, \dots, X_{5}$). There was also a stronger positive correlation between variables related to the construction characteristics of the vessels ($X_{8}, X_{9}, \dots, X_{14}$). To highlight the significant correlations, a correlation heatmap matrix reporting only the significant Pearson’s correlation coefficients among the variables is depicted in Figure 4.
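The text does not state which significance test was used for the heatmap. A common choice, given here only as an assumption, is the two-sided $t$-test for Pearson's correlation coefficient $r$ computed from $n$ pairs of observations:

```latex
t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^{2}}}, \qquad t \sim t_{n-2} \ \text{under } H_{0}\colon \rho = 0 .
```

With this test, a coefficient is displayed in the heatmap only when $|t|$ exceeds the critical value of the $t_{n-2}$ distribution at the chosen significance level. For samples as large as the one analyzed here, even small values of $|r|$ can be statistically significant, so the magnitude of $r$ remains the more informative quantity.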

Moreover, as we observe, the annual total fuel consumption for voyages ($X_{1}$) is correlated with the total distance traveled ($X_{7}$), which is related to the annual total time spent at sea on a voyage ($X_{6}$). On the other hand, there are many correlations among the variables corresponding to the technical specifications of a ship, and some of them are strongly positively correlated ($\geq 0.9$), such as the weight that a ship can safely carry ($X_{8}$) with (a) the width of the ship ($X_{11}$) and (b) the maximum space available for cargo ($X_{13}$). The maximum space available for cargo ($X_{13}$) is also strongly positively correlated with (a) the maximum length of a vessel ($X_{10}$) and (b) the width of the ship ($X_{11}$).

To examine if there are any outliers in the data set, the variables $X_{8}$–$X_{14}$ (ship characteristics) were examined following two different approaches, namely, a univariate graphical examination and a multivariate approach based on the Mahalanobis Distances (MDs) of the observations (see Figure 5) from the center (mean) of their joint distribution, $\sqrt{(u - \mu) V^{-1} (u - \mu)^{T}}$, where $u$ is a vector of the observed values, $\mu$ is the vector with the sample means of the variables, and $V$ is the variance–covariance matrix.
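The Mahalanobis Distance defined above can be computed for all observations at once, as in the following sketch. It assumes NumPy, estimates $\mu$ and $V$ from the sample, and uses synthetic data with one planted gross error; the function name and the data are hypothetical, and the cut-off of 10 simply mirrors the threshold discussed in the text.

```python
import numpy as np

def mahalanobis_distances(X):
    """Distance of every row of X from the sample mean, scaled by the sample covariance."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                # vector of sample means
    V = np.cov(X, rowvar=False)        # variance-covariance matrix
    diff = X - mu
    V_inv = np.linalg.inv(V)
    # Row-wise quadratic form (u - mu) V^{-1} (u - mu)^T.
    d2 = np.einsum("ij,jk,ik->i", diff, V_inv, diff)
    return np.sqrt(d2)

# Synthetic data (hypothetical): 300 observations of 3 variables, first row corrupted.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
X[0] = [25.0, -25.0, 25.0]             # planted, grossly misreported observation

md = mahalanobis_distances(X)
keep = md <= 10.0                      # observations retained for further analysis
```

Note that a gross error also inflates the estimated covariance matrix, which shrinks its own distance somewhat; robust estimators of $\mu$ and $V$ are sometimes preferred for that reason, although the text gives no indication that they were used here.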

As we may observe in Figure 5, most of our data give an MD value smaller than 10, while a small number of ships give MD values larger than 10; moreover, an even smaller number of ships give an MD value larger than 20. From the graphical representation, an intuitive threshold for in-bound values was set equal to 10, and all observations with MD values larger than 10 were removed from further analysis because they probably correspond to falsely reported values. For example, the distance between the waterline and the keel ($X_{9}$) of the ship that corresponds to the upper right dot in Figure 5 is reported to be equal to $X_{9} = 81.15$ m when the average value of this characteristic is only $\bar{x}_{9} = 8.30$ m ($sd_{X_{9}} = 3.32$ m), which obviously indicates a falsely reported value. Another observation that was removed corresponds to a ship with a maximum available space for cargo ($X_{13}$) of 62,400 m³, which is 3.24 standard deviations larger than the corresponding mean value of this variable ($\bar{x}_{13} = 19{,}452$ m³, $sd_{X_{13}} = 13{,}265.28$ m³).
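The two examples can be checked directly by expressing each reported value in standard deviations from its mean, using the summary statistics quoted above:

```latex
z_{9} = \frac{81.15 - 8.30}{3.32} = \frac{72.85}{3.32} \approx 21.94,
\qquad
z_{13} = \frac{62{,}400 - 19{,}452}{13{,}265.28} = \frac{42{,}948}{13{,}265.28} \approx 3.24 .
```

The first value lies almost 22 standard deviations above the mean, which is implausible for a physical dimension, whereas the second agrees with the 3.24 standard deviations reported in the text.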

4.3. Application of PCA

The PCA was applied sequentially to (a) all the available data and (b) to data after removing observations (ships) with an MD value larger than 10 (labeled as reduced data set). The number of principal components that need to be kept in a PCA is usually determined with the help of the so-called scree plot. In the left plot of Figure 6, the scree plot, created based on all data, is presented, which helps determine the number of PCs to keep. Based on the plot, one can conclude that three PCs should be preserved since the curve declines steeply and then bends when the number of PCs equals 3, which serves as an indicator of a cut-off point. It is worth mentioning that the same cut-off point is indicated in both cases.
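A scree plot displays the eigenvalue (or the proportion of explained variance) of each PC against its index. A numerical companion to the visual elbow criterion is the cumulative proportion of explained variance, sketched below with NumPy. The eigenvalues in the example are hypothetical and are not those of the ship data set, and the 90% target is an illustrative choice rather than the criterion used in the paper.

```python
import numpy as np

def explained_variance(eigvals):
    """Proportion and cumulative proportion of variance explained by each PC."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    ratio = eigvals / eigvals.sum()
    return ratio, np.cumsum(ratio)

def components_for(cumulative, target):
    """Smallest number of PCs whose cumulative explained variance reaches the target."""
    return int(np.searchsorted(cumulative, target) + 1)

# Hypothetical eigenvalues of a 5 x 5 correlation matrix (they sum to 5).
ratio, cumulative = explained_variance([2.5, 1.5, 0.6, 0.3, 0.1])
n_keep = components_for(cumulative, 0.90)   # PCs needed to explain at least 90%
```

Plotting `ratio` against the component index gives the scree plot itself; the elbow and the cumulative threshold do not always agree, in which case interpretability of the retained PCs usually decides.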

In Table 2, the PCA loadings for the first three PCs using the initial standardized data set and the reduced data set are reported. It is worth mentioning that the loadings of the first three PCs in both cases are similar: observe the small values in the columns labeled “Pairwise Differences among loadings”, which report the differences between the loadings of the initial data set and those of the reduced data set. The first three PCs explain more than 80% of the observed variance in the original variables (80.2% and 85.1% for the two cases, respectively). Moreover, from the loadings, it is clear that almost all variables, with the exception of $Z_{2}$, $Z_{7}$, and $Z_{8}$, have a positive correlation with the first PC. Variables $Z_{8}$, $Z_{10}$, $Z_{11}$, and $Z_{13}$ have a negative correlation with the second PC. A negative correlation with the third PC is also found for the variables $Z_{1}$, $Z_{3}$, $Z_{4}$, $Z_{7}$, and $Z_{14}$. These relationships can be easily depicted, for example, for the first two PCs, with the help of the biplot presented in the right plot of Figure 6, which is created based on the reduced data set.

Interpretation of the First PCs

Large (absolute) values of PC loadings indicate that the variables have a strong effect on that principal component. From the values reported in Table 2, it is clear that the first PC is more related to the variables $Z_{8}$–$Z_{14}$, which represent the ship’s physical dimensions, while the second PC is related more strongly to the variables $Z_{1}$–$Z_{7}$, which represent the ship’s consumption, $\mathrm{CO}_{2}$ emissions, and operational data. The third PC seems to be strongly related to variables $Z_{2}$–$Z_{5}$, which represent the ship’s $\mathrm{CO}_{2}$ emissions in different geographical regions.

For the first PC, all the large loadings are positive, meaning that ships with large physical dimensions will have large positive values in the first PC while smaller ships will have smaller values. Regarding the second PC, it is worth noticing that all the large loadings (related to variables $Z_{1}$–$Z_{7}$) are negative, meaning that ships with poor environmental performance (large consumption, large $\mathrm{CO}_{2}$ emissions, etc.) will have small values in the second PC and ships with good environmental performance will have large positive values. Finally, in the case of the third PC, a contrast between the loadings representing the $\mathrm{CO}_{2}$ emissions during operation under different conditions is observed. Based on the sign of the loadings, it is clear that the third PC gives large positive scores to ships operating mostly in a Member State’s jurisdiction while it gives large negative scores to ships operating mostly outside a Member State’s jurisdiction.

The above indicates that the first three PCs can be clearly interpreted and labeled as follows:

Ship’s “Physical Dimensions” (first PC);
Ship’s “Operational Env. Efficiency” (second PC);
Ship’s “Operating Region” (third PC).

These three PCs can be considered as three independent indices that characterize ships’ size, operational environmental performance, and operating region (in terms of operating mostly inside or outside a Member State’s jurisdiction).

4.4. Definition of a Graphical Tag Based on the Three PCs (Three Indices)

Based on the three aforementioned PCs (independent indices), an appropriate graphical index (tag) for describing the environmental performance of a ship, conditional to its size and its working region, can be defined. The template icon for the graphical index is given in Figure 7.

The icon of the ship in Figure 7 has two distinct zones at the main body of the ship. The lower zone can be used to depict the score of a ship in the first PC, while the upper zone can be used to represent the score of a ship for the second PC. These scores can be depicted with the aid of colors. Specifically, since the first PC takes positive values (recall that all the large loadings were positive), the lower zone can be filled (for example, with blue color) proportionally to its value. Large ships will be indicated with an almost full lower zone while the small ships will be indicated with an almost empty lower zone. The upper zone that represents the second PC can be filled gradually with green, orange, and red colors, representing the environmental efficiency of a ship. Large values of the second PC indicate ships that can be considered environmentally “friendly” and can be depicted with only green color in the upper zone. On the other hand, the upper zone can be filled with red for ships that contribute significantly to pollution, i.e., ships with small values for the second PC. In order to be able to depict also all the intermediate values/cases, the upper zone can be filled gradually with green, orange, and red to indicate the environmental efficiency of the ship. Finally, the third PC is assigned to the stern of the ship and indicates the operating region index of each ship.

The aforementioned procedure can be summarized in the following algorithm, which also clarifies the procedure for determining the gradual fill of the two zones and the color assigned to the stern of the ship in the image given in Figure 7.

Step 1: The two zones in the tag are standardized so that their length is equal to 1.
Step 2: Under the assumption that the first PC is, at least approximately, normally distributed, the lower zone of a ship with a first PC score equal to $pc_{1}$ is filled with the color blue up to the value $F(pc_{1})$, where $F(\cdot)$ denotes the empirical cumulative distribution function of the first PC.
Step 3: Following a similar reasoning, the upper zone is filled gradually with green, orange, and red, with each color assigned to the interval [0, 1/3), [1/3, 2/3), and [2/3, 1], respectively.
Step 4: The third PC, depicted as the stern of the ship, is colored according to the following rule: If the score of the third PC (denoted as PC3 score) is smaller than the first quartile of its values in the available data, then it is filled in red. If the PC3 score is larger than the third quartile, then the stern of the ship is filled with the color green. In all other cases, the stern is filled with the color orange.
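The four steps above can be sketched with the Python standard library. Several details are assumptions rather than statements from the text: the upper zone is driven by a "pollution level" defined as one minus the empirical distribution function of the second PC (so that small PC2 scores, which indicate poor performance, fall in the red interval), only the dominant color of the upper zone is returned instead of a gradual fill, quartiles are computed with the default method of `statistics.quantiles`, and all function names and scores are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles

def ecdf(sample):
    """Empirical cumulative distribution function of a sample."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n

def upper_zone_colour(level):
    """Step 3: map a level in [0, 1] to green, orange or red."""
    if level < 1 / 3:
        return "green"
    if level < 2 / 3:
        return "orange"
    return "red"

def stern_colour(pc3, q1, q3):
    """Step 4: colour the stern from the PC3 score and its quartiles."""
    if pc3 < q1:
        return "red"
    if pc3 > q3:
        return "green"
    return "orange"

def make_tag(pc1, pc2, pc3, scores1, scores2, scores3):
    q1, _, q3 = quantiles(scores3, n=4)
    # ASSUMPTION: pollution level = 1 - F2(pc2), so that large PC2 scores map to green.
    pollution = 1.0 - ecdf(scores2)(pc2)
    return {
        "lower_fill": ecdf(scores1)(pc1),          # Step 2: blue fill fraction
        "upper_colour": upper_zone_colour(pollution),
        "stern_colour": stern_colour(pc3, q1, q3),
    }

# Hypothetical PC scores of five ships, reused for all three indices.
s = [-2.0, -1.0, 0.0, 1.0, 2.0]
large_clean = make_tag(2.0, 2.0, 2.0, s, s, s)
small_dirty = make_tag(-2.0, -2.0, -2.0, s, s, s)
```

In this toy example `large_clean` describes a large ship with good environmental performance operating mostly inside a Member State's jurisdiction, while `small_dirty` describes the opposite case.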

Some characteristic examples of the proposed tag are given in Figure 8. For example, in Figure 8a, a ship with good environmental performance (upper green bar), average physical dimensions (blue bottom bar), and operating mostly inside a Member State’s jurisdiction (green rectangle) is depicted. The second tag (Figure 8b) presents a ship with good environmental performance and large physical dimensions, which also operates mostly inside a Member State’s jurisdiction. The third tag (Figure 8c) represents a ship that differs from the first two only in the dimensions, having a size somewhere in the middle of the two previous ships. The fourth example (Figure 8d) demonstrates a graphical index of a ship with average to poor environmental performance and small physical dimensions that operates mostly outside a Member State’s jurisdiction. The fifth tag (Figure 8e) represents a ship with extremely poor environmental performance and large physical dimensions that operates inside a Member State’s jurisdiction. The last tag (Figure 8f) represents a ship similar to the previous ship with the following differences: (1) it has a poor, but not as extreme as the previous ship, environmental performance and (2) it operates both inside and outside a Member State’s jurisdiction.

From the above examples, it is clear the proposed tag can serve as a unified index that represents the environmental impact based on carbon dioxide (${CO}_{2}$) emissions adjusted to the cargo capacity, which is directly related to the physical dimensions of a ship. Thus, this graphical tag is able to distinguish the large vessels with high environmental impact from the vessels with similar dimensions with low ${CO}_{2}$ emissions. The same applies to smaller vessels as well.

4.5. Cluster Analysis Based on the Three Indices

Following the production of the three indices related to ships’ environmental impact derived by the operation time and emissions in combination with technical characteristics, such as physical dimensions, resulting in the aforementioned graphical tag, the K-Means algorithm was used to further explore the data. More specifically, the K-Means algorithm was implemented using the three indices produced by the PCA to trace and group vessels in clusters with similar characteristics in terms of size, ${CO}_{2}$ emissions, time of operation, energy consumption, etc.

4.5.1. Choosing Optimal Number of Clusters

Determining the number, k, of clusters in a data set is one of the most crucial tasks in CA. There are several methods to achieve this, each one exploring different characteristics of the data and the clusters, which do not always conclude with the same number of clusters. In such cases, an analysis with all the possible scenarios should be carried out. The information gained by this procedure should then be combined with the knowledge of a domain expert to determine not only a statistically realistic option but also a pragmatic choice from the expert’s point of view.

In the present study, two of the most frequently used methods, namely, the silhouette score and the Gap statistic, will be used to determine the optimal number of clusters k in the available data set (PCs for the General Cargo ships). Each of these statistics is calculated for a range of values for the number k of clusters. Large values or, in general, any peaks in the plots of these statistics versus k indicate that the observations in the clusters defined are well-matched with each other and well-separated from neighboring clusters.

The silhouette score [51] for a given separation, i.e., by fixing the number of clusters in the data, is defined as

\frac{b - a}{\max (a, b)}

(3)

where a denotes the mean intra-cluster distance and b denotes the mean nearest-cluster distance.
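
As an illustration, the silhouette score can be computed directly from its definition. The sketch below uses the standard form $(b - a) / \max (a, b)$ of [51], evaluated per observation and then averaged; the function names and the toy points are hypothetical, Euclidean distance is assumed, and at least two clusters are required.

```python
from math import dist


def silhouette_values(points, labels):
    """Per-observation silhouette (b - a) / max(a, b); needs at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    values = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            values.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point itself contributes a zero distance to the sum).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        values.append(0.0 if denom == 0 else (b - a) / denom)
    return values


def silhouette_score(points, labels):
    """Mean silhouette over all observations."""
    values = silhouette_values(points, labels)
    return sum(values) / len(values)


# Hypothetical toy data: two tight groups that are far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_split = silhouette_score(toy_points, [0, 0, 1, 1])
bad_split = silhouette_score(toy_points, [0, 1, 0, 1])
```

Values close to 1 indicate compact, well-separated clusters, whereas negative values (as for the second labeling of the toy data) suggest that observations sit closer to a neighboring cluster than to their own.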

The Gap statistic, on the other hand, compares, for each number k of clusters, the total intra-cluster variation $W_{k}$ (in the log scale) with its expected value, determined by generating a large number of reference data sets from a uniform distribution on the hypercube defined by the range of the available variables (in this case, the three PCs). For more details, the reader is referred to [52,53].
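
A minimal sketch of this comparison is given below, assuming NumPy is available. It is not the code used in the study: the k-means routine is a plain Lloyd's iteration with k-means++ style seeding written for illustration, $W_{k}$ is taken as the within-cluster sum of squared Euclidean distances, all function names and the demo data are hypothetical, and the standard-error correction proposed in [52] for selecting k is omitted.

```python
import numpy as np


def _init_centers(X, k, rng):
    # k-means++ style seeding: spread the starting centers across the data.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def within_ss(X, k, rng, n_init=5, n_iter=100):
    """Smallest within-cluster sum of squares W_k over a few k-means runs."""
    best = np.inf
    for _ in range(n_init):
        centers = _init_centers(X, k, rng)
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            updated = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(updated, centers):
                break
            centers = updated
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        best = min(best, float(d2.min(axis=1).sum()))
    return best


def gap_statistic(X, k_values, n_refs=20, seed=0):
    """Gap(k) = mean over references of log W*_k minus log W_k of the data."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)  # hypercube spanned by the data
    gaps = {}
    for k in k_values:
        log_wk = np.log(within_ss(X, k, rng))
        ref_logs = [
            np.log(within_ss(rng.uniform(lo, hi, size=X.shape), k, rng))
            for _ in range(n_refs)
        ]
        gaps[k] = float(np.mean(ref_logs) - log_wk)
    return gaps


# Hypothetical data: three well-separated groups in a three-dimensional space.
demo_rng = np.random.default_rng(1)
demo_centers = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [-10.0, 10.0, 0.0]])
demo_X = np.vstack([demo_rng.normal(c, 0.5, size=(40, 3)) for c in demo_centers])
demo_gaps = gap_statistic(demo_X, range(1, 6))
```

Plotting `demo_gaps` against k gives the kind of curve described in the text; a pronounced rise in the Gap value marks a number of clusters for which the data are much more compact than uniformly scattered reference data.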

The silhouette score and the Gap Statistic for the data set are depicted in Figure 9 using the k-means algorithm. Both procedures present a peak at $k = 4$, indicating that there are four clusters in the data. Additionally, the Silhouette score indicates that analysis with two clusters could also be a reasonably good option. As a result, both analyses were carried out and presented briefly next.

4.5.2. K-Means Algorithm

The k-means algorithm was applied to the data set by setting the number of clusters equal to 2 and 4. In Figure 10, the two clusters defined by the k-means algorithm are plotted with respect to the available variables, i.e., with respect to the PCs. More specifically, in the upper left plot, the two clusters (colored with red and green) are plotted against the first and the second PCs. In the upper right plot, the same clusters are plotted against the first and the third PCs, while in the lower plot, the clusters are plotted against the second and the third PCs.

From the plots, the clustering algorithm segments the data set into two segments, mainly with respect to the size of the ships. For example, clusters on the plane defined by the first and the second PC or on the plane defined by the first and the third PC (plots in the upper row of Figure 10) seem to be separated well in terms of physical dimensions, i.e., with respect to the horizontal axis. However, there seems to be no significant separation regarding the other two PCs (see, for example, the lower plot in Figure 10 or the plots in the upper row with respect to the vertical axis). Therefore, it seems that the k-means algorithm with $k = 2$ manages to separate the ships with respect to their physical dimensions and fails to capture any other difference regarding the other two indices.

The plots in Figure 11 represent the clusters, colored in four different colors, identified by the k-means algorithm in the case of four clusters with respect—as in Figure 10—to the PCs. From the plots, it is clear that the four clusters are well-separated with respect to the first two PCs (see upper left plot), namely, the “Physical Dimensions” and “Operational Env. Efficiency”. The third PC, namely, the “Operating Region”, seems to play a relatively smaller role in the separation of the clusters (see the upper right and the lower plots).

From the above analysis, it is clear that the four-cluster analysis is more informative than the two-cluster analysis. The four-cluster approach manages not only to separate the ships according to their “Physical Dimensions”, as the two-cluster analysis did, but also to exploit the information hidden in the second PC (Operational Env. Efficiency).

4.5.3. Interpretation of Clusters

The differences in the four clusters are also highlighted in Figure 12, in which the values of the three indices (PCs) at the centroid of the four clusters identified by the k-means algorithm are presented.

The first cluster consists of 416 ships, while the corresponding numbers for clusters 2, 3, and 4 are 625, 309, and 848, respectively. The differences between the clusters were also tested using an ANOVA (assuming that PCs are independent and approximately normally distributed). The results of ANOVA confirmed, indeed, that there is a statistically significant difference between the means of the three PCs in the four clusters (p-value < 0.0001). The four identified clusters can be briefly labeled as follows:

Cluster 1: “large, environmentally friendly ships”;
Cluster 2: “small, environmentally friendly ships”;
Cluster 3: “large, non-environmentally friendly ships”;
Cluster 4: “small, non-environmentally friendly ships”.

These are in accordance with the existing literature on the environmental sustainability in maritime shipping (see, for example, [24,26]).
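
For reference, the ANOVA comparison of the cluster means mentioned above can be illustrated with the one-way F statistic. The sketch assumes that a separate one-way ANOVA is run for each PC, which is one plausible reading of the text rather than a documented detail of the study; the function name and the numbers in the example are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per cluster."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the cluster means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own cluster mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores of one PC in three small clusters.
f_stat, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

The p-value follows from the upper tail of the F distribution with (`df_b`, `df_w`) degrees of freedom, for example via `scipy.stats.f.sf(f_stat, df_b, df_w)` if SciPy is available.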

As a final remark, one can notice that while the third index (Operating Region) seems to play, as already mentioned, a relatively smaller role in the separation of the clusters, there is still some information that can be extracted with respect to this index. More specifically, it seems that the small, environmentally friendly ships (Cluster 2) tend to operate exclusively inside a Member State’s jurisdiction. In addition, it is interesting to mention that while a group of large ships with poor environmental performance due to their size and design is indeed expected to be observed [30,31], there is also a large number of small ships with poor environmental performance (Cluster 4), which operate almost exclusively outside a Member State’s jurisdiction.

4.5.4. Further Investigation of the Characteristics of Clusters

To delve deeper into the nature and the characteristics of the identified clusters, the correlations among the PCs within each cluster were also calculated. In Table 3, the Pearson correlation coefficients and their corresponding p-values (in parentheses) are presented for all the possible pairs of PCs in each cluster. All the correlation coefficients demonstrate a weak but significant (at a significance level of 0.05) correlation between all the PCs.
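
The within-cluster correlations can be sketched as follows. The helper names and the toy scores are hypothetical, and the p-values are not reproduced: under the usual normality assumption, they would be obtained by comparing the t statistic returned by `t_statistic` with a Student t distribution with n − 2 degrees of freedom.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: zero correlation (requires |r| < 1 and n > 2)."""
    return r * sqrt((n - 2) / (1 - r ** 2))


def correlations_by_cluster(scores, labels):
    """scores: one (pc1, pc2, pc3) tuple per ship; labels: cluster of each ship."""
    by_cluster = {}
    for row, label in zip(scores, labels):
        by_cluster.setdefault(label, []).append(row)
    result = {}
    for label, rows in by_cluster.items():
        columns = list(zip(*rows))  # one sequence per PC
        result[label] = {
            (i + 1, j + 1): pearson_r(columns[i], columns[j])
            for i, j in combinations(range(len(columns)), 2)
        }
    return result


# Hypothetical scores for two tiny clusters.
toy_scores = [(1, 2, 3), (2, 4, 1), (3, 6, 2), (1, 3, 0), (2, 2, 1), (3, 1, 5)]
toy_labels = ["A", "A", "A", "B", "B", "B"]
toy_corr = correlations_by_cluster(toy_scores, toy_labels)
```

Each entry of `toy_corr` is keyed by a pair of PC numbers, so `toy_corr["A"][(1, 2)]` is the correlation between the first and the second PC among the ships of cluster A.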

More specifically, Physical Dimensions (PC1) and Ship’s Operational Env. Efficiency (PC2) present a weak positive correlation in all clusters, meaning that the ship’s size influences positively its environmental impact. This positive correlation seems to be larger in Clusters 1 and 3—i.e., among large ships—and smaller among small ships (Clusters 2 and 4).

Ship’s Operational Env. Efficiency (PC2) and Operating Region (PC3) seem to have a weak negative correlation in all the Clusters except Cluster 2, i.e., the cluster defined by the small, environmentally friendly ships. One possible explanation for this is that in the group of small, environmentally friendly ships (Cluster 2), the better the environmental performance, the more likely it is to use eco-friendly fuel and operate mostly inside a Member State’s jurisdiction. On the other hand, for the other three clusters, the observed negative correlation is quite surprising since this implies that ships that operate mostly in areas where low-quality oil is used (i.e., outside a Member State’s jurisdiction) tend to have a better environmental performance, i.e., large values of the second index (PC2). This may be explained by the better engine specifications usually adopted by ships that make large international voyages to reduce travel costs.

Regarding the correlation between the Physical Dimensions (PC1) and the Operating Region (PC3) in each cluster, it seems that there is a weak but statistically significant negative correlation among the environmentally friendly ships (Clusters 1 and 2) and a weak but statistically significant positive correlation among the non-environmentally friendly ships (Clusters 3 and 4). This may again be explained by the better engines that ships that make large international voyages use to reduce travel costs.

4.6. PCA Validation of the Proposed Indices

To validate the PCs produced by the PCA and used afterward in the CA, the 62 Container ships in the merged data set (see Section 3.3) were used. According to Regulation (EC) No 1367/2006 of the European Parliament and of the Council of 6 September 2006, the main difference between the two categories is the weight and volume of cargo carried. As a result, the 62 Container ships can serve as a validation set to assess the quality, reliability, and consistency of the analytical findings of the PCA and the creation of the three indices (PCs).

In Table 4, the PCs produced by this data set are presented along with the PCs from the reduced data from the General Cargo ships. Additionally, pairwise differences among the loadings are also given for comparison purposes. From the results, it is obvious that the PCs for the Container ships present similar values to those derived by the General Cargo ships and can produce three similar, in nature and behavior, indices. Therefore, it seems that the three proposed indices provide a concrete description of the environmental performance and can be used in other categories of ships. It is of note that no outlier was detected among the Container ships observations.
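
The comparison of loadings between two data sets can be sketched as below, assuming NumPy is available. The sketch takes the loadings to be unit-norm eigenvectors of the correlation matrix (PCA on standardized variables) and flips the sign of each validation component to match the reference before taking absolute differences, because the sign of a principal component is arbitrary. The function names and the synthetic data are hypothetical and are not the ship data of the study.

```python
import numpy as np


def pca_loadings(X, n_components=3):
    """Unit-norm loadings and explained-variance ratios from the correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)       # PCA on standardized variables
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order] / eigvals.sum()


def align_signs(reference, other):
    """Flip columns of `other` so that each points the same way as in `reference`."""
    signs = np.sign((reference * other).sum(axis=0))
    signs[signs == 0] = 1.0
    return other * signs


def loading_differences(X_ref, X_val, n_components=3):
    """Absolute pairwise differences among the loadings of two data sets."""
    ref, _ = pca_loadings(X_ref, n_components)
    val, _ = pca_loadings(X_val, n_components)
    return np.abs(ref - align_signs(ref, val))


# Hypothetical data: six observed variables driven by three latent factors.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 6))


def sample(n):
    latent = rng.normal(size=(n, 3)) * np.array([3.0, 2.0, 1.0])
    return latent @ mixing + 0.1 * rng.normal(size=(n, 6))


reference_set = sample(500)   # plays the role of the larger data set
validation_set = sample(60)   # plays the role of the smaller validation set
diffs = loading_differences(reference_set, validation_set)
```

Small entries in `diffs` indicate that the two data sets yield components with a similar composition, which is the kind of agreement reported in Table 4.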

5. Discussion and Conclusions

The increasing focus on environmental sustainability has spurred considerable research and the production of numerous theoretical solutions. The traditional “resistance modeling” has been widely accepted as the theoretical foundation of ship fuel consumption and emissions, as it serves to estimate the vessel’s total resistance in relation to speed and external influences [34,35]. Although this method is widely used, it is limited in its ability to address complex issues. Consequently, alternative methods have been developed to further improve the accuracy of fuel consumption predictions [36,37,38,39,40]. These studies have generally concluded that speed is the primary factor of fuel consumption, with external conditions such as weather playing a significant secondary role [29,40]. Nonetheless, many current studies prioritize the technical aspects of sustainability and the multitude of environmental initiatives available to the shipping industry can be confusing, as well as add to administrative burdens.

This research presents a large number of environmental initiatives or indices that are currently available in the shipping industry, including instruments developed by the IMO. The framework used makes extensive use of machine learning techniques to create new composite energy efficiency indices that are based on real ship operational data. The actual framework combines PCA and clustering techniques to acquire from real data new combined efficiency indices with an easy interpretation. These indices are combined in a graphical tag to depict the environmental impact of a ship. Although there is a plethora of clustering and dimensionality reduction techniques that could be applied in future studies, PCA appears well suited to the task of linearly transforming the variables and reducing them to a small number of composite variables.

Moreover, based on the three proposed indices, the ships are categorized into four clusters that incorporate the information of 14 operational and design variables. These clusters distinguish the vessels based on their environmental impact, physical dimensions, and operation region, thus shedding light on the specific characteristics of each cluster. For example, it was shown that small, environmentally friendly ships usually operate exclusively inside a Member State’s jurisdiction, which is a characteristic that is not met in any other group of ships. Moreover, a significant number of small ships with poor environmental performance were identified, which operate exclusively outside a Member State’s jurisdiction.

The proposed indices and the corresponding graphical tag manage, indeed, to represent the environmental footprint of a ship. These indices are incorporated in an innovative graphical tag that can serve as an environmental impact label for the ships. However, aggregated data such as those used in the present study can only serve as a snapshot of the ship’s performance. More frequently recorded observations would provide more detailed information, ensure the robustness, and secure the quality of the data. Using statistical process monitoring (SPM) systems to continuously monitor carbon emissions from businesses can have several advantages. At the industrial level, it can assist in detecting excessive emissions at an early stage and ensuring that the necessary measures can be implemented in advance to limit them. This can minimize the estimated overall cost, which includes emission-related and operational costs of the SPM program. In addition, it can help with determining whether the emissions are within the regulatory limit or at a high risk of non-compliance. Monitoring and measuring the impact and associated costs of emissions on the environment can also help to establish guidelines for comparing the actual with the targeted emissions. Most importantly, SPM programs can assist decision-makers in determining an acceptable emission charge [54]. A more dynamic tool will require collecting data on a more regular basis (for example, monthly), which will make it possible not only to incorporate changes in the second (Operational Env. Efficiency) and the third (Operating Region) indices and to monitor the environmental footprint of a specific ship, but also (a) to identify seasonal patterns and/or (b) to detect early trends and changes in the shipping market that affect the performance of the ships in general. With even more frequently collected data (for example, every hour), this could also allow the real-time monitoring of the second index (Operational Env. Efficiency), which could result in a real-time decision-making tool by updating the permissible limit and alerting all the ships in a region.

Author Contributions

Conceptualization, S.B. and A.F.; methodology, S.B., K.S. and P.E.; software, K.S.; validation, D.G. and P.E.; formal analysis, K.S., S.B. and P.E.; resources, K.S.; data curation, K.S.; writing—original draft preparation, K.S., S.B., A.F. and P.E.; writing—review and editing, K.S., S.B., A.F., D.G. and P.E. All authors have read and agreed to the published version of the manuscript.

Funding

The Article Processing Charges of the paper were funded by University of Piraeus Research Center.

Data Availability Statement

The first data source used in this study (EU Monitoring, Reporting, Verification (MRV) mechanism) can be accessed through https://mrv.emsa.europa.eu/#public/emission-report (accessed on 12 January 2023). The second data source provided by the startup “27 Research” is unavailable due to privacy.

Acknowledgments

The authors thank the 27 Research company and especially its Shipping Director, Vasilis Molaris, for providing the data for this analysis.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

GHG	Greenhouse Gas
IMO	International Maritime Organization
EEDI	Energy Efficiency Design Index
EEOI	Energy Efficiency Operational Indicator
SEEMP	Ship Energy Efficiency Management Plan
EC	European Commission
CSI	Clean Shipping Index
ESI	Environmental Ship Index
EVDI	Existing Vessel Design Index
ANN	Artificial Neural Network
BBN	Bayesian Belief Network
MLR	Multiple Linear Regression
GPR	Gaussian Process Regression
AIS	Automatic Identification System
SVM	Support Vector Machine
PCA	Principal Component Analysis
CA	Cluster Analysis
MRV	Monitoring Reporting Verification
MD	Mahalanobis Distances
${CO}_{2}$	Carbon Dioxide
SPM	Statistical Process Monitoring

References

Wu, Z.; Xia, X. Tariff-driven demand side management of green ship. Sol. Energy 2018, 170, 991–1000.
Faber, J.; Hanayama, S.; Zhang, S.; Pereda, P.; Comer, B.; Hauerhof, E.; van der Loeff, W.S.; Smith, T.; Zhang, Y.; Kosaka, H.; et al. Fourth IMO Greenhouse Gas Study 2020; International Maritime Organization: London, UK, 2020; pp. 1689–1699.
International Maritime Organization. MEPC 75/18/Add.1. Resolution MEPC.324(75). Amendments to MARPOL Annex VI; Technical Report; International Maritime Organization: London, UK, 2020.
Jia, H.; Adland, R.; Prakash, V.; Smith, T. Energy efficiency with the application of Virtual Arrival policy. Transp. Res. Part D Transp. Environ. 2017, 54, 50–60.
Bazari, Z.; Longva, T. Assessment of IMO Mandated Energy Efficiency Measures for International Shipping; International Maritime Organization: London, UK, 2011; Volume 10.
Rehmatulla, N.; Calleya, J.; Smith, T. The implementation of technical energy efficiency and CO₂ emission reduction measures in shipping. Ocean Eng. 2017, 139, 184–197.
Smith, T.; Parker, S.; Rehmatulla, N. On the speed of ships. In Proceedings of the International Conference on Technologies, Operations, Logistics and Modelling for Low Carbon Shipping, LCS2011, Glasgow, UK, 22–24 June 2011; pp. 22–24.
Capezza, C.; Coleman, S.; Lepore, A.; Palumbo, B.; Vitiello, L. Ship fuel consumption monitoring and fault detection via partial least squares and control charts of navigation data. Transp. Res. Part D Transp. Environ. 2019, 67, 375–387.
Gibson, M.; Murphy, A.J.; Pazouki, K. Evaluation of environmental performance indices for ships. Transp. Res. Part D Transp. Environ. 2019, 73, 152–161.
Haas, T.; Sander, H. Decarbonizing transport in the European Union: Emission performance standards and the perspectives for a European Green Deal. Sustainability 2020, 12, 8381.
Trodden, D.; Murphy, A.; Pazouki, K.; Sargeant, J. Fuel usage data analysis for efficient shipping operations. Ocean Eng. 2015, 110, 75–84.
Lister, J.; Poulsen, R.T.; Ponte, S. Orchestrating transnational environmental governance in maritime shipping. Glob. Environ. Chang. 2015, 34, 185–195.
Javdani, S.; Fabian, M.; Carlton, J.S.; Sun, T.; Grattan, K.T. Underwater free-vibration analysis of full-scale marine propeller using a fiber Bragg grating-based sensor system. IEEE Sens. J. 2015, 16, 946–953.
Tillig, F.; Ringsberg, J.W.; Mao, W.; Ramne, B. Analysis of uncertainties in the prediction of ships’ fuel consumption–from early design to operation conditions. Ships Offshore Struct. 2018, 13, 13–24.
Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260.
Parkes, A.; Sobey, A.; Hudson, D. Physics-based shaft power prediction for large merchant ships using neural networks. Ocean Eng. 2018, 166, 92–104.
Zhang, C.; Zhang, D.; Zhang, M.; Mao, W. Data-driven ship energy efficiency analysis and optimization model for route planning in ice-covered Arctic waters. Ocean Eng. 2019, 186, 106071.
Moreira, L.; Vettor, R.; Guedes Soares, C. Neural network approach for predicting ship speed and fuel consumption. J. Mar. Sci. Eng. 2021, 9, 119.
Karagiannidis, P.; Themelis, N. Data-driven modelling of ship propulsion and the effect of data pre-processing on the prediction of ship fuel consumption and speed loss. Ocean Eng. 2021, 222, 108616.
Canbulat, O.; Aymelek, M.; Turan, O.; Boulougouris, E. An application of BBNs on the integrated energy efficiency of ship–port interface: A dry bulk shipping case. Marit. Policy Manag. 2019, 46, 845–865.
Kim, Y.R.; Jung, M.; Park, J.B. Development of a fuel consumption prediction model based on machine learning using ship in-service data. J. Mar. Sci. Eng. 2021, 9, 137.
Hu, Z.; Jin, Y.; Hu, Q.; Sen, S.; Zhou, T.; Osman, M.T. Prediction of fuel consumption for enroute ship based on machine learning. IEEE Access 2019, 7, 119497–119505.
Lang, X.; Wu, D.; Mao, W. Comparison of supervised machine learning methods to predict ship propulsion power at sea. Ocean Eng. 2022, 245, 110387.
Smith, T.; Jalkanen, J.; Anderson, B.; Corbett, J.; Faber, J.; Hanayama, S.; O’Keeffe, E.; Parker, S.; Johansson, L.; Aldous, L.; et al. Third IMO Greenhouse Gas Study 2014; International Maritime Organization: London, UK, 2015.
Moldan, B.; Janoušková, S.; Hák, T. How to understand and measure environmental sustainability: Indicators and targets. Ecol. Indic. 2012, 17, 4–13.
Mansouri, S.A.; Lee, H.; Aluko, O. Multi-objective decision support to enhance environmental sustainability in maritime shipping: A review and future directions. Transp. Res. Part E Logist. Transp. Rev. 2015, 78, 3–18.
Balcombe, P.; Brierley, J.; Lewis, C.; Skatvedt, L.; Speirs, J.; Hawkes, A.; Staffell, I. How to decarbonise international shipping: Options for fuels, technologies and policies. Energy Convers. Manag. 2019, 182, 72–88.
Greene, S.; Jia, H.; Rubio-Domingo, G. Well-to-tank carbon emissions from crude oil maritime transportation. Transp. Res. Part D Transp. Environ. 2020, 88, 102587.
Adland, R.; Cariou, P.; Jia, H.; Wolff, F.C. The energy efficiency effects of periodic ship hull cleaning. J. Clean. Prod. 2018, 178, 1–13.
Motley, M.R.; Nelson, M.; Young, Y.L. Integrated probabilistic design of marine propulsors to minimize lifetime fuel consumption. Ocean Eng. 2012, 45, 1–8.
Doulgeris, G.; Korakianitis, T.; Pilidis, P.; Tsoudis, E. Techno-economic and environmental risk analysis for advanced marine propulsion systems. Appl. Energy 2012, 99, 1–12.
Murphy, A.; Landamore, M.; Pazouki, K.; Gibson, M. Modelling ship emission factors and emission indices. In Proceedings of the Low Carbon Shipping Conference, London, UK, 9–10 September 2013.
Scott, J.; Smith, T.; Rehmatulla, N.; Milligan, B. The promise and limits of private standards in reducing greenhouse gas emissions from shipping. J. Environ. Law 2017, 29, 231–262.
Telfer, E. The Practical Analysis of Merchant Ship Trials and Service Performance; North East Coast Institution of Engineers and Shipbuilders: Newcastle upon Tyne, UK, 1927.
Messerly, J.F.; Guthrie Jr, G.B.; Todd, S.S.; Finke, H.L. Low-temperature thermal data for pentane, n-heptadecane, and n-octadecane. Revised thermodynamic functions for the n-alkanes, C5-C18. J. Chem. Eng. Data 1967, 12, 338–346.
Lagouvardou, S.; Psaraftis, H.N.; Zis, T. A literature survey on market-based measures for the decarbonization of shipping. Sustainability 2020, 12, 3953.
Psaraftis, H.N.; Kontovas, C.A. Speed models for energy-efficient maritime transportation: A taxonomy and survey. Transp. Res. Part C Emerg. Technol. 2013, 26, 331–351.
Xia, J.; Li, K.X.; Ma, H.; Xu, Z. Joint planning of fleet deployment, speed optimization, and cargo allocation for liner shipping. Transp. Sci. 2015, 49, 922–938.
He, Q.; Zhang, X.; Nip, K. Speed optimization over a path with heterogeneous arc costs. Transp. Res. Part B Methodol. 2017, 104, 198–214.
Du, Y.; Meng, Q.; Wang, S.; Kuang, H. Two-phase optimal solutions for ship speed and trim optimization over a voyage using voyage report data. Transp. Res. Part B Methodol. 2019, 122, 88–114.
Pedersen, B.P.; Larsen, J. Prediction of full-scale propulsion power using artificial neural networks. In Proceedings of the 8th International Conference on Computer and IT Applications in the Maritime Industries (COMPIT’09), Budapest, Hungary, 10–12 May 2009.
Adland, R.; Cariou, P.; Wolff, F.C. Optimal ship speed and the cubic law revisited: Empirical evidence from an oil tanker fleet. Transp. Res. Part E Logist. Transp. Rev. 2020, 140, 101972.
Yang, D.; Wu, L.; Wang, S.; Jia, H.; Li, K.X. How big data enriches maritime research–a critical review of Automatic Identification System (AIS) data applications. Transp. Rev. 2019, 39, 755–773.
Beşikçi, E.B.; Arslan, O.; Turan, O.; Ölçer, A.I. An artificial neural network based decision support system for energy efficient ship operations. Comput. Oper. Res. 2016, 66, 393–401.
Jeon, M.; Noh, Y.; Shin, Y.; Lim, O.K.; Lee, I.; Cho, D. Prediction of ship fuel consumption by using an artificial neural network. J. Mech. Sci. Technol. 2018, 32, 5785–5796.
Bersimis, S.; Psarakis, S.; Panaretos, J. Multivariate statistical process control charts: An overview. Qual. Reliab. Eng. Int. 2007, 23, 517–543.
Dede, D.; Didaskalou, E.; Bersimis, S.; Georgakellos, D. A Statistical Framework for Assessing Environmental Performance of Quality Wine Production. Sustainability 2020, 12, 10246.
Bersimis, S.; Georgakellos, D. A probabilistic framework for the evaluation of products’ environmental performance using life cycle approach and Principal Component Analysis. J. Clean. Prod. 2013, 42, 103–115.
Jackson, J.; Edward, A. User’s Guide to Principal Components; John Wiley & Sons, Inc.: New York, NY, USA, 1991; Volume 40.
Feller, W. An Introduction to Probability Theory and Its Applications, 3rd ed.; Technical Report; Wiley Series in Probability and Mathematical Statistics; Wiley: New York, NY, USA, 1967.
Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
Tibshirani, R.; Walther, G.; Hastie, T. Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2001, 63, 411–423.
El-Mandouh, A.M.; Abd-Elmegid, L.A.; Mahmoud, H.A.; Haggag, M.H. Optimized K-means clustering model based on gap statistic. Int. J. Adv. Comput. Sci. Appl. 2019.
Shamsuzzaman, M.; Shamsuzzoha, A.; Maged, A.; Haridy, S.; Bashir, H.; Karim, A. Effective monitoring of carbon emissions from industrial sector using statistical process control. Appl. Energy 2021, 300, 117352.

Figure 1. Framework of the analysis process.

Figure 2. PCA in the 2-dimensional space.

Figure 3. Clusters in the 2-dimensional space. The left plot indicates the initial data and the plot on the right side corresponds to the identified clusters, denoted as $C_{1}$ (Cluster 1), $C_{2}$ (Cluster 2), and $C_{3}$ (Cluster 3).

Figure 4. Correlation matrix of data set.

Figure 5. Mahalanobis Distance and Outlier Detection.

Figure 6. Left plot: scree Plot. Right plot: biplot.

Figure 7. Base Ship (d).

Figure 8. Typical graphical tags of the ship environmental performance. (a) Ship with good environmental performance, average physical dimensions, operating mostly inside Member State’s jurisdiction. (b) Ship with good environmental performance, large physical dimensions, operating mostly inside Member State’s jurisdiction. (c) Ship with very good environmental performance, average physical dimensions, operating mostly inside Member State’s jurisdiction. (d) Ship with average environmental performance, small physical dimensions, operating mostly outside Member State’s jurisdiction. (e) Ship with poor environmental performance, large physical dimensions, operating mostly inside Member State’s jurisdiction. (f) Ship with poor environmental performance, large physical dimensions, operating both inside and outside Member State’s jurisdiction.

Figure 9. Silhouette Score (left plot) and Gap statistic (right plot) obtained for clustering results on the three PCs using k-means.

Figure 10. The two clusters (colored with red and brown) defined by the k-means algorithm plotted with respect to the PCs. Upper left plot: clusters on the plane defined by first and second PCs. Upper right plot: clusters on the plane defined by first and third PCs. Lower plot: clusters on the plane defined by second and third PCs.

Figure 11. The four clusters (colored dark green, light green, red, and orange) defined by the k-means algorithm plotted with respect to the PCs. Upper left plot: clusters on the plane defined by first and second PCs. Upper right plot: clusters on the plane defined by first and third PCs. Lower plot: clusters on the plane defined by second and third PCs.

Figure 12. Values of the three indices (PCs) at the centroid of the four clusters identified by the k-means algorithm.

Table 1. Data Set Description.

Variable	Description
$X_{1}$	The annual total fuel consumption for voyages (m tonnes)
$X_{2}$	Aggregated ${CO}_{2}$ emissions from all voyages between ports under a Member State’s jurisdiction (m tonnes)
$X_{3}$	Aggregated ${CO}_{2}$ emissions from all voyages which departed from ports under a Member State’s jurisdiction (m tonnes)
$X_{4}$	Aggregated ${CO}_{2}$ emissions from all voyages to ports under a Member State’s jurisdiction (m tonnes)
$X_{5}$	${CO}_{2}$ emissions that occurred within ports under a Member State’s jurisdiction at berth (m tonnes)
$X_{6}$	The annual total time spent at sea in voyages (hours)
$X_{7}$	Total distance traveled in nautical miles
$X_{8}$	The weight that a ship can safely carry (tonnes)
$X_{9}$	The distance between the waterline and the keel of a vessel (meters)
$X_{10}$	The maximum length of a vessel between the two extreme points on the hull, measured parallel to the waterline (meters)
$X_{11}$	The width of the boat, measured at its widest point (meters)
$X_{12}$	Measured vertically from the lowest point of the hull, ordinarily from the bottom of the keel to the side of any deck that may be chosen as a reference point (meters)
$X_{13}$	The maximum available space for cargo (cubic meters)
$X_{14}$	Engine power of the ship (kW)

Table 2. PCA loadings for the first three PCs using the initial standardized data set and the reduced data set. The pairwise differences for the two analyses are presented in the last three columns.

	Initial Standardized Data Set			Reduced Data Set			Pairwise Differences among Loadings
Variable	PC1	PC2	PC3	PC1	PC2	PC3	PC1	PC2	PC3
$Z_{1}$	0.099	0.478	−0.084	0.101	0.480	−0.082	0.002	0.001	0.002
$Z_{2}$	−0.119	0.334	0.528	−0.118	0.337	0.537	0.000	0.003	0.009
$Z_{3}$	0.170	0.317	−0.455	0.172	0.317	−0.456	0.002	0.000	0.001
$Z_{4}$	0.192	0.291	−0.454	0.194	0.291	−0.455	0.002	0.000	0.002
$Z_{5}$	0.022	0.296	0.430	0.024	0.293	0.408	0.002	0.003	0.022
$Z_{6}$	−0.159	0.397	0.094	−0.161	0.398	0.102	0.002	0.001	0.007
$Z_{7}$	−0.091	0.465	−0.003	−0.089	0.463	0.001	0.001	0.002	0.004
$Z_{8}$	0.378	−0.040	0.196	0.382	−0.040	0.203	0.005	0.000	0.007
$Z_{9}$	0.226	0.031	0.058	0.191	0.015	0.061	0.035	0.016	0.003
$Z_{10}$	0.372	−0.020	0.068	0.376	−0.020	0.073	0.003	0.000	0.005
$Z_{11}$	0.382	−0.046	0.105	0.385	−0.046	0.108	0.003	0.000	0.003
$Z_{12}$	0.366	0.019	0.109	0.362	0.024	0.102	0.004	0.005	0.007
$Z_{13}$	0.379	−0.049	0.186	0.382	−0.049	0.192	0.004	0.000	0.006
$Z_{14}$	0.347	0.074	−0.064	0.352	0.076	−0.064	0.005	0.001	0.001
Variance Explained	0.431	0.281	0.090	0.455	0.301	0.096	0.025	0.020	0.006
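
The last three columns of Table 2 can be reproduced approximately with a short sketch. It assumes that the reported differences are absolute element-wise differences and that the component signs of the two analyses are already aligned, since PCA loadings are only defined up to sign. The function name `loading_differences` is hypothetical; the two rows are copied from Table 2.

```python
def loading_differences(a, b, ndigits=3):
    """Absolute element-wise differences between two loading tables.

    Both arguments map a variable name to its (PC1, PC2, PC3) loadings.
    """
    return {
        var: tuple(round(abs(x - y), ndigits) for x, y in zip(a[var], b[var]))
        for var in a
    }

# Rows Z8 and Z9 of Table 2: initial standardized vs. reduced data set.
initial = {"Z8": (0.378, -0.040, 0.196), "Z9": (0.226, 0.031, 0.058)}
reduced = {"Z8": (0.382, -0.040, 0.203), "Z9": (0.191, 0.015, 0.061)}

diffs = loading_differences(initial, reduced)
# diffs["Z9"] -> (0.035, 0.016, 0.003)
# diffs["Z8"] -> (0.004, 0.0, 0.007)
```

The Z9 row matches the table exactly. For Z8 the recomputed PC1 difference is 0.004 while the table reports 0.005, presumably because the published differences were computed from unrounded loadings.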

Table 3. The Pearson correlation coefficients and their corresponding p-values (in parentheses) for all the possible pairs of PCs in each cluster.

Large, environmentally friendly ships (Cluster 1)
	Physical Dimensions	Operational Env. Efficiency
Operational Env. Efficiency	0.287 (<0.001)
Operating Region	−0.099 (0.045)	−0.250 (<0.001)
Small, environmentally friendly ships (Cluster 2)
	Physical Dimensions	Operational Env. Efficiency
Operational Env. Efficiency	0.097 (0.015)
Operating Region	−0.184 (<0.001)	0.250 (<0.001)
Large, non-environmentally friendly ships (Cluster 3)
	Physical Dimensions	Operational Env. Efficiency
Operational Env. Efficiency	0.291 (<0.001)
Operating Region	0.198 (<0.001)	−0.333 (<0.001)
Small, non-environmentally friendly ships (Cluster 4)
	Physical Dimensions	Operational Env. Efficiency
Operational Env. Efficiency	0.097 (0.005)
Operating Region	0.119 (<0.001)	−0.248 (<0.001)
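
The within-cluster coefficients in Table 3 could be computed along the following lines. This is a standard-library sketch with hypothetical function names and toy scores, not the authors' code. The p-values in the table additionally require the Student t distribution with n − 2 degrees of freedom applied to the statistic returned by `t_statistic`; a statistics package such as SciPy (`scipy.stats.pearsonr`) provides this directly.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Sample Pearson correlation (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Test statistic for H0: rho = 0 (requires |r| < 1 and n > 2)."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def correlations_by_cluster(scores, labels):
    """Pearson r for every pair of index columns, within each cluster."""
    result = {}
    for cluster in sorted(set(labels)):
        rows = [s for s, lab in zip(scores, labels) if lab == cluster]
        cols = list(zip(*rows))  # one tuple per index (PC)
        result[cluster] = {
            (i, j): pearson_r(cols[i], cols[j])
            for i, j in combinations(range(len(cols)), 2)
        }
    return result

# Toy scores on three indices for two clusters (illustrative values only).
scores = [(1.0, 2.0, 3.0), (2.0, 4.0, 2.0), (3.0, 6.0, 1.0),
          (1.0, 1.0, 1.0), (2.0, 3.0, 2.0), (3.0, 2.0, 3.0)]
labels = [1, 1, 1, 2, 2, 2]

table = correlations_by_cluster(scores, labels)
```

Each entry `table[c][(i, j)]` corresponds to one cell of the lower-triangular layout used in Table 3, with indices 0, 1, and 2 standing for the three PCs.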

Table 4. PCA loadings for the first three PCs for the General Cargo ships (reduced data set) and for the Container ships data set. The pairwise differences between the two analyses are presented in the last three columns.

	Reduced Data Set			Container Ships			Pairwise Differences among Loadings
Variable	PC1	PC2	PC3	PC1	PC2	PC3	PC1	PC2	PC3
$Z_{1}$	0.101	0.480	−0.082	0.202	0.422	0.004	0.101	0.057	0.085
$Z_{2}$	−0.118	0.337	0.537	0.044	0.369	0.545	0.162	0.032	0.008
$Z_{3}$	0.172	0.317	−0.456	0.235	0.184	−0.497	0.063	0.132	0.041
$Z_{4}$	0.194	0.291	−0.455	0.207	0.199	−0.576	0.013	0.092	0.120
$Z_{5}$	0.024	0.293	0.408	0.133	0.319	0.246	0.109	0.025	0.162
$Z_{6}$	−0.161	0.398	0.102	0.148	0.396	−0.096	0.309	0.002	0.198
$Z_{7}$	−0.089	0.463	0.001	0.120	0.448	0.080	0.210	0.014	0.079
$Z_{8}$	0.382	−0.040	0.203	0.330	−0.184	0.057	0.052	0.144	0.146
$Z_{9}$	0.191	0.015	0.061	0.342	−0.073	0.125	0.151	0.088	0.065
$Z_{10}$	0.376	−0.020	0.073	0.365	−0.181	0.078	0.011	0.161	0.005
$Z_{11}$	0.385	−0.046	0.108	0.360	−0.175	0.065	0.026	0.129	0.043
$Z_{12}$	0.362	0.024	0.102	0.364	−0.144	0.124	0.002	0.168	0.022
$Z_{13}$	0.382	−0.049	0.192	0.305	−0.113	0.032	0.078	0.064	0.159
$Z_{14}$	0.352	0.076	−0.064	0.303	−0.128	0.050	0.049	0.204	0.115
Variance Explained	0.455	0.301	0.096	0.437	0.288	0.089	0.018	0.013	0.007

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

